welch2000jfe.pl --- 10 June 2001
welch2000jfe.pl shows how to code the herding (imitation) estimation described in
Its most important econometric contribution was the development of an estimation method that is applicable to discrete choice data. Note that discrete choice is a necessary requirement for informational cascades.
The distribution archive contains five files (all in subdirectory welch2000jfe/):
pod2html -noindex -title "welch2000jfe documentation" welch2000jfe.pl > docs.html
You are now looking either at the program code itself (welch2000jfe.pl), or the documentation (docs.html) generated from it.
There is now a web interface to this program at http://welch.econ.brown.edu/academics/welch2000jfe/webfrontend.html . This is basically a form, which simply calls welch2000jfe.pl.
Unfortunately, this also means that the program welch2000jfe.pl must behave slightly differently when it is run by a web server: it permits running without an input file name (it then reads from STDIN), it produces less instructive debugging output (just the processed output), and it exits with code 0 even if there is an error (so that the server does not announce a server error).
The distribution can be downloaded from http://welch.econ.brown.edu/academics/welch2000jfe.tgz . The individual files are at http://welch.econ.brown.edu/academics/
It contains the code, an optimizer, some HTML documentation, and a sample input file. To unpack it, use an unpacking utility such as pkzip (from http://www.pkware.com) or winzip (from http://www.winzip.com). Many computers already have one installed; if not, please download one using the above links.
To just see some sample output, you do not need to install perl, because the welch2000jfe.pl program's output is reproduced in this documentation. For the most part, the welch2000jfe.pl code should be easy to read, even by someone who does not know perl.
Still, the welch2000jfe.pl program can immediately run any arbitrary data set (including yours) and tell you your herding parameter theta. (For small data sets, just use the provided web interface.) So, rather than reinventing the wheel, you may find that downloading the archive and installing perl is well worth it.
Invoking this program with argument FIGURE1 produces the output for the graph in Figure 1 of the paper.
Invoking this program with argument ``MSFT.dat'' uses the enclosed Microsoft dataset. Any such data set has to be in a particular format (fields are space, tab or comma separated):
from to (any other information, such as yyyymmdd, identity, etc.)
from to (any other information, such as yyyymmdd, identity, etc.)
...
from to (any other information, such as yyyymmdd, identity, etc.)
(For a format example, please see the Microsoft dataset in the program itself.) The program determines the range (domain) of feasible values by itself. However, it is about discrete herding, so it truncates each 'from' and/or 'to' into an integer. It also only makes sense if a consensus can be computed as the simple average of past ``to's.'' Necessarily, this means that the domain must be ordered, and the difference between a '1' and a '2' should be similar to the difference between a '4' and a '5', etc.
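The input format described above can be sketched in a few lines of perl. This is only an illustration, not the parsing code from welch2000jfe.pl: the subroutine name parse_line is hypothetical, and skipping blank or incomplete lines is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from welch2000jfe.pl): turn one data line
# into an integer (from, to) pair and ignore any trailing fields.
sub parse_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
    return if $line eq '';                    # assumption: skip blank lines
    my ($from, $to) = split /[\s,]+/, $line;  # space, tab, or comma separated
    return unless defined $to;                # assumption: skip incomplete lines
    return (int($from), int($to));            # truncate each value to an integer
}

for my $line ("3 2 19930104 analystA", "4,3,19930105", "2\t2.9") {
    my @pair = parse_line($line) or next;
    printf "%d -> %d\n", @pair;
}
```

Fields after the first two (date, identity, etc.) are simply dropped here, which matches the statement that the estimation itself only needs 'from' and 'to'.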
The target in this program is assumed to be the prevailing average consensus (the average of all prior ``to's''). Thus, to be able to compute a prevailing consensus, the estimation necessarily begins with a later observation (usually observation 11; this is a global parameter that can be changed at the top of the perl code).
It only makes sense to run this if:
Again, it may not make sense to ask if an agent is herding on himself. So, if you plan to use this program for an optimization for a paper that empirically tests for imitation, please do not forget that you should filter out multiple consecutive choices by the same agent before you run this program.
In contrast to the JFE paper, the sample MSFT data ignores the fact that there may be consecutive recommendations by the same agent and even on the same day. The program simply assumes that each agent is different [even though the file tells us the identity and date]. This makes the code a lot easier here. The user is advised to filter undesirable observations before running the program!
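A pre-filter of the kind recommended above might look as follows. It is a minimal sketch under stated assumptions: the subroutine name and the hash keys (from, to, agent) are hypothetical, and keeping the first observation of each run of consecutive choices by the same agent is only one possible rule. The paper's own filter may differ.

```perl
use strict;
use warnings;

# Hypothetical pre-filter (not part of welch2000jfe.pl): drop an observation
# when it comes from the same agent as the previously kept observation.
sub drop_consecutive_same_agent {
    my (@obs) = @_;
    my @kept;
    my $prev_agent;
    for my $o (@obs) {
        next if defined $prev_agent && $o->{agent} eq $prev_agent;
        push @kept, $o;
        $prev_agent = $o->{agent};
    }
    return @kept;
}

my @obs = (
    { from => 3, to => 2, agent => 'A' },
    { from => 2, to => 2, agent => 'A' },   # dropped: same agent as the line before
    { from => 4, to => 3, agent => 'B' },
    { from => 2, to => 1, agent => 'A' },   # kept: agent B came in between
);
print join(' ', map { $_->{agent} } drop_consecutive_same_agent(@obs)), "\n";
```

The filtered observations would then be written back out in the 'from to ...' format before running the program.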
For the enclosed Microsoft MSFT.dat, the output is as follows:
The NULL probability transition matrix is:
to1 to2 to3 to4 to5
from1 0.4519 0.2212 0.3173 0.0000 0.0096
from2 0.2717 0.3913 0.3370 0.0000 0.0000
from3 0.3301 0.2913 0.3107 0.0485 0.0194
from4 0.1667 0.0000 0.6667 0.1667 0.0000
from5 0.0000 0.0000 0.6667 0.0000 0.3333
(This transition matrix is computed on the fly from the enclosed data and displayed to screen.) The program then shows
The total log-probability for theta=0.5 given this data set is -335.129769269242
First Run: The optimal log-likelihood function is LL(theta=0.0278)= - 330.140
Second Run: The optimal log-likelihood function is LL(theta=0.0278)= - 330.140
Perl was chosen as a language because it is free, universally available for all common computers and operating systems (and these days comes pre-installed on most Unix distributions), concise, and still permits readily understandable code. In fact, even this documentation is directly embedded in the perl code itself.
For more information, see http://www.perl.org/ . Recommended books: Learning Perl by Schwartz and Christiansen, and Perl Cookbook by Christiansen and Torkington.
figure1 shows how we computed the figure on page 378 in the JFE.
functionvalue wraps the actual invocation of the likelihood function. The first argument is theta (possibly to be optimized), the second argument is a perl reference to the (immutable) data information provided by readdata.
computealtvector computes the probability under the alternative hypothesis, given a null probability vector, a theta, and a target.
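This documentation does not spell out the functional form of the alternative hypothesis, so the following is only a sketch under an assumed form, not the code of computealtvector. It assumes that each null probability is multiplied by (1 + (choice - target)^2)^(-theta) and that the row is then renormalized. The subroutine name tilt_toward_target and the explicit $min_choice argument are hypothetical. Please consult the paper and the program itself for the actual specification.

```perl
use strict;
use warnings;

# ASSUMED functional form, for illustration only: weight each null probability
# by (1 + (choice - target)^2)^(-theta), then renormalize so the row sums to 1.
sub tilt_toward_target {
    my ($null, $theta, $target, $min_choice) = @_;
    my @weight;
    for my $j (0 .. $#$null) {
        my $choice = $min_choice + $j;    # column j corresponds to this choice
        push @weight, $null->[$j] * (1 + ($choice - $target)**2) ** (-$theta);
    }
    my $sum = 0;
    $sum += $_ for @weight;
    die "null probability vector sums to zero\n" unless $sum > 0;
    return [ map { $_ / $sum } @weight ];
}

my $null_row = [ 0.25, 0.25, 0.25, 0.25 ];
my $alt_row  = tilt_toward_target($null_row, 0.5, 2, 1);
print join(' ', map { sprintf '%.4f', $_ } @$alt_row), "\n";
```

Under this assumed form, theta = 0 reproduces the null vector unchanged, a positive theta shifts probability toward choices near the target, and a negative theta shifts it away.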
accepts a vector of data, and computes prevailing consensi from past values up until the current observation. It returns a vector of equal dimensions to the input vector (N*1, where N is the number of observations).
It would be very easy to modify this subroutine to use a decaying consensus or to use a moving average instead of a consensus computed from all past values.
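A consensus computation of this kind, including the moving-average variant, can be sketched as follows. The subroutine name prevailing_consensus and the optional $window argument are hypothetical. The first entry is left undefined here because no prior observation exists; the actual program may handle the early observations differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: entry i is the average of the "to" values strictly
# before observation i. With $window given, only the last $window prior
# values are averaged (a simple moving average).
sub prevailing_consensus {
    my ($to, $window) = @_;
    my @consensus;
    for my $i (0 .. $#$to) {
        if ($i == 0) { push @consensus, undef; next; }   # no prior values yet
        my $first = (defined($window) && $i > $window) ? $i - $window : 0;
        my $sum = 0;
        $sum += $to->[$_] for $first .. $i - 1;
        push @consensus, $sum / ($i - $first);
    }
    return \@consensus;    # same length as the input vector
}

my @to = (3, 5, 4, 2);
for my $c (prevailing_consensus(\@to), prevailing_consensus(\@to, 2)) {
    print join(' ', map { defined $_ ? sprintf('%.2f', $_) : 'n/a' } @$c), "\n";
}
```

The double loop is quadratic in the number of observations, which is fine for a demonstration; a running sum would be the obvious refinement for long series.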
accepts a reference to the ``from'' vector and a reference to the ``to'' vector. Outputs a two-dimensional array. Please note: $MinChoice, $MaxChoice, and $NoChoice are global variables that must have been set before this subroutine is called!
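The sample output shown earlier has rows that sum to one, which is what one gets from counting transitions and dividing each row by its total. A minimal sketch of that computation follows. The subroutine name transition_matrix is hypothetical, and the choice range is passed as arguments here rather than read from the global variables the program uses.

```perl
use strict;
use warnings;

# Hypothetical sketch: count from->to transitions and normalize each row.
# $min and $max play the role of the program's global choice range.
sub transition_matrix {
    my ($from, $to, $min, $max) = @_;
    my $n = $max - $min + 1;
    my @count = map { [ (0) x $n ] } 1 .. $n;     # n-by-n matrix of zeros
    for my $i (0 .. $#$from) {
        $count[ $from->[$i] - $min ][ $to->[$i] - $min ]++;
    }
    my @prob;
    for my $row (@count) {
        my $total = 0;
        $total += $_ for @$row;
        # assumption: a 'from' value that never occurs yields a row of zeros
        push @prob, [ map { $total ? $_ / $total : 0 } @$row ];
    }
    return \@prob;
}

my $p = transition_matrix([1, 1, 2, 3], [1, 2, 2, 1], 1, 3);
print join(' ', map { sprintf '%.4f', $_ } @$_), "\n" for @$p;
```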
printmatrix just prints a transition matrix from a reference.
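For readers who want to reproduce a display like the one shown in the sample output, here is a small formatting sketch. The name format_matrix is hypothetical, and the column widths are a guess; the exact layout produced by printmatrix may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: render a matrix (array of array references) with
# 'to' column labels and 'from' row labels, four decimals per entry.
sub format_matrix {
    my ($matrix, $min_choice) = @_;
    my $last = $#$matrix;
    my $text = ' ' x 6;
    $text .= sprintf('%8s', 'to' . ($min_choice + $_)) for 0 .. $last;
    $text .= "\n";
    for my $i (0 .. $last) {
        $text .= sprintf('%-6s', 'from' . ($min_choice + $i));
        $text .= sprintf('%8.4f', $_) for @{ $matrix->[$i] };
        $text .= "\n";
    }
    return $text;
}

print format_matrix([ [ 0.5, 0.5 ], [ 0, 1 ] ], 1);
```

Returning a string instead of printing directly keeps the sketch easy to test; wrapping it in a print reproduces the behavior described above.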
is easier than it looks. It just needs to check that the command line arguments are appropriate and whether it is being run by a web server, and then returns all input files as a simple vector of strings (lines). If it is in web server mode, it also prints the HTML header.
welch2000jfe.pl was written AFTER the paper. Errors in either the paper or in this program have not necessarily propagated to the other.
welch2000jfe.pl is not the most efficient code for the problem at hand, because its intent is to demonstrate usage.
Apologies for the repeated use of the phrase welch2000jfe in file namings. This does not (necessarily) indicate egocentrism as much as a desire to connect the filenames to the standard article citation name.
(C) Ivo Welch, March 2001. Free Distribution Permitted, provided usage is cited and this notice remains in the text. The original paper (Welch, Ivo. ``Herding Among Security Analysts,'' Journal of Financial Economics 58-3 (December 2000), p.369-396.) is copyrighted by the Journal of Financial Economics (Elsevier). I am also very pleased to report that it won a JFE Fama prize.
10 June 2001 complete overhaul.