FIN 918 - Textual Analysis in Finance

Spring 2015

The course will be held in block format from 05/18/2015 to 05/20/2015:

Monday, May 18 from 10 a.m. to 12 p.m. (noon) and from 2 p.m. to 4 p.m. in L9, 1-2, room 210
Tuesday, May 19 from 10 a.m. to 12 p.m. (noon) in L9, 1-2, room 001 and from 2 p.m. to 4 p.m. in L9, 1-2, room 409
Wednesday, May 20 from 9 a.m. to 12 p.m. (noon) in L9, 1-2, room 210

ECTS Points: 4 (see the section "Participation, Credits and Grading" for details)


For updates, please check

Instructor: Alexander Hillert
Room: 504
Tel. 181-1642

Motivation and Course Description:
Starting with Antweiler and Frank (2004) and Tetlock (2007), textual analysis has become an increasingly common method in financial research. In this course, students will learn how textual analysis works and how to implement it. The course consists of four parts.
The first part will give an introduction to the major papers on textual analysis. The lecturer will discuss the most commonly used methods for textual analysis, e.g. simple word counts and naïve Bayes. Furthermore, the types of documents most commonly analyzed in the literature so far will be presented. The students will also learn which variables (e.g., market returns, market volatility, firm-level accounting data) have been shown to be predictable from quantified textual information.
In the second part, the most commonly used databases for textual analysis will be presented. These include the standard databases for newspapers and newswires (Factiva and Nexis), the EDGAR system (Electronic Data Gathering, Analysis, and Retrieval) of the Securities and Exchange Commission, and other databases.
The third part deals with the implementation of textual analysis and will introduce the necessary commands of the programming language Python. The students will learn how to find specific files in the EDGAR system and how to download them. Next, they will learn how to modify files and how to prepare the documents for textual analysis, e.g. how to remove HTML code from the files. Last, the software package LIWC will be introduced, and students will learn how to run a simple word-count-based textual analysis with this software.
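To give a rough impression of the kind of program covered in the third part, the following minimal sketch removes HTML code from a document and computes the share of negative words in the remaining text. It is an illustration only and not the course's own code: it uses the Python standard library instead of LIWC, and the word list NEGATIVE_WORDS, the helper names strip_html and negative_share, and the sample text are all hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between HTML tags, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(raw):
    """Return the visible text of an HTML string with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def negative_share(text, negative_words):
    """Return the fraction of words in text that appear in negative_words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in negative_words)
    return hits / len(words)


# Hypothetical mini word list; research applications use much larger dictionaries.
NEGATIVE_WORDS = {"loss", "losses", "decline", "impairment", "litigation"}

# Hypothetical sample document.
sample = ("<html><body><p>The firm reported a <b>loss</b> "
          "and a decline in sales.</p></body></html>")

clean = strip_html(sample)
share = negative_share(clean, NEGATIVE_WORDS)
print(clean)
print(share)
```

In this sample, two of the ten words are on the list, so the negative share is 0.2. A real application would replace the sample string with downloaded filings and the mini list with an established dictionary.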
In the fourth part of the course, students will learn how to use the AutoIt scripting language to obtain “non-ready-to-download” data. AutoIt makes it possible to imitate human web browsing. This feature can be very useful when one needs to obtain data from sources that do not have a clear file and folder structure. For example, the students will learn the AutoIt commands for mouse movements and clicks, for window handling, and for typing text.

No prior knowledge of Python, AutoIt, or any other programming language is required. All necessary commands will be taught during the lectures and the exercise sessions. Nevertheless, some previous programming experience will be helpful. Students should have some basic knowledge of accounting, economics, or finance.

Course Materials:
The lecturer will use presentation slides, which will be sent to the participants by e-mail.
The course is mainly based on papers using textual analysis published in the top finance journals, as well as on recent working papers. Students will receive the core literature of the course as PDF files in the first lecture.
In the third and fourth parts of the course, students will work with the programming languages Python and AutoIt. Both are freely available for download.
Students will need a computer or notebook for the exercises and for the final assignment.

Competences Acquired:
During this course, students will get an overview of the literature on textual analysis in finance, which will allow them to identify potential new research questions in this field more easily. Furthermore, students will learn to program a word-count-based textual analysis. The programming skills taught in the course cover the entire process from data gathering to tone measurement. Last, students will learn how to automate the collection of “non-ready-to-download” data, e.g. Google search volume, by scripting.

Participation, Credits and Grading:
Students can choose between two options of participation. In the elective course option, students have to complete a 48-hour take-home exam and will receive 4 ECTS if they pass the exam. In the workshop option, students do not have to take the final exam, but they will not receive any ECTS. In both cases, course attendance is mandatory. Participation in only some parts of the course is not permitted.


The lecturer reserves the right to make modifications to this syllabus. Modifications (if any) will be announced in class. Students are responsible for all announcements made in class.