Economics 1759, Fall 2008, is an experimental course that will teach modern research techniques in financial economics. The skills are useful both in graduate economics courses and in quantitative Wall Street research departments, especially those that work on quantitative equity investing. (Matthew Rothman is the global head of quantitative equity strategy research at Lehman Bros, and holds a PhD from the University of Chicago.)
Returns <- Lagged Firm Size, Lagged Book/Market, Lagged Accruals, 1-mo Momentum, 2-7mo Momentum, Market-Beta
permno mdate alpha beta sealpha sebeta sigmasq
where sigmasq is the variance of the residuals.
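As a rough illustration of where these columns come from, here is a minimal sketch of a one-regressor OLS fit of a stock's returns on the market's returns. The function name `market_model` is hypothetical, and dividing the sum of squared residuals by n-2 is an assumption; the text above only says "variance of the residuals", so check which convention is wanted.

```python
import math

def market_model(stock, market):
    """Fit stock = alpha + beta * market + e by OLS (hypothetical helper).

    Returns a dict keyed by the output column names. sigmasq uses n-2
    degrees of freedom, which is an assumption, not a stated requirement.
    """
    n = len(stock)
    mx = sum(market) / n
    my = sum(stock) / n
    sxx = sum((x - mx) ** 2 for x in market)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market, stock))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual variance, then the usual OLS standard errors.
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in zip(market, stock))
    sigmasq = ssr / (n - 2)
    sebeta = math.sqrt(sigmasq / sxx)
    sealpha = math.sqrt(sigmasq * (1.0 / n + mx ** 2 / sxx))
    return {"alpha": alpha, "beta": beta, "sealpha": sealpha,
            "sebeta": sebeta, "sigmasq": sigmasq}
```

Running this once per permno and mdate over the chosen estimation window would give one output row each; the window length is not specified here.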
More details: For the market and risk-free rates of return, let's switch to Ken French's data in F-F_Research_Data_Factors_daily.zip. That is, please do not use the S&P500, but Ken's "Mkt-RF" column; and as your own stock's rate of return in the market model, please use its return net of the "RF" rate. (Obviously, this may require a small change in your program if you used the S&P500 earlier.) Your output should be a csv file with two columns, PERMNO and BETA. We want to post the results by Friday, so please post your results by Tuesday noon. (Honor Code Note: You are not allowed to look at the programs and results of your colleagues. Yes, there is a Unix log of file accesses.)
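The change described above might look roughly like the following sketch. The names `excess_beta` and `write_betas` are hypothetical, and the four-decimal output format is an assumption. The code assumes all three return series are aligned by date and in the same units; as far as I know, Ken French's files quote returns in percent while CRSP quotes decimals, so check and rescale before regressing.

```python
import csv

def excess_beta(stock_ret, mkt_rf, rf):
    """OLS slope of (stock_ret - rf) on mkt_rf (hypothetical helper).

    All three lists must be date-aligned and in the same units.
    """
    y = [r - f for r, f in zip(stock_ret, rf)]
    n = len(y)
    mx = sum(mkt_rf) / n
    my = sum(y) / n
    sxx = sum((x - mx) ** 2 for x in mkt_rf)
    sxy = sum((x - mx) * (v - my) for x, v in zip(mkt_rf, y))
    return sxy / sxx

def write_betas(betas, out):
    """Write a two-column csv (PERMNO, BETA) to an open text file.

    betas maps permno -> beta; four decimals is an arbitrary choice.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PERMNO", "BETA"])
    for permno in sorted(betas):
        writer.writerow([permno, f"{betas[permno]:.4f}"])
```

For a real file, open it with `open(path, "w", newline="")` and pass the handle to `write_betas`.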
permno gvkey yyyymm this-months-stock-return-from-CRSP ...a set-of-lagged-and-or-computed-variables-from-Compustat and your betas...
Your program must have a reasonably easy way to allow you to change the lagging parameter and the variables that you want in your data set. The output should be gzipped. For example, I may tell you that I want you to use 3-month lagged variables (except the own rate of return, of course), and that I want [a] market-betas; [b] Net-Income/Sales (i.e., net-income and sales may come from a financial statement that is between 3 and 3+11 months old); and [c] Sales. [Sales may be called Revenues.] A typical line may therefore look as follows:
10045,231465,200511,0.123,1.241,0.054,921341.3
10045,231465,200512,-0.123,1.229,0.054,921341.3
and the last two items may be 6 months old. My guess is that for 40 years of data, you should have about 40*12*5,000 ≈ 2-3 million lines of data. Once the data is gzipped, it should be a few megabytes. (Extra-Credit: Think about whether it would be difficult to do YTD versions of your Compustat variables.)
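One way to keep the lagging parameter easy to change is to make it an ordinary function argument. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, each firm's Compustat data is assumed to be a dict from statement month (yyyymm) to a tuple of values, and "3 to 3+11 months old" is read as an inclusive age window.

```python
import csv
import gzip

def month_index(yyyymm):
    """Convert yyyymm (e.g. 200511) to a running month count."""
    return (yyyymm // 100) * 12 + (yyyymm % 100 - 1)

def latest_statement(statements, yyyymm, lag):
    """Most recent statement aged between lag and lag+11 months, or None.

    statements: dict of statement month (yyyymm) -> tuple of values,
    for a single firm (an assumed layout, not one given in the text).
    """
    now = month_index(yyyymm)
    best_age, best_values = None, None
    for stmt_month, values in statements.items():
        age = now - month_index(stmt_month)
        if lag <= age <= lag + 11 and (best_age is None or age < best_age):
            best_age, best_values = age, values
    return best_values

def write_gzipped_csv(path, rows):
    """Write rows as comma-separated lines straight into a gzip file."""
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)
```

With `lag=3`, a December 2004 statement would be matched to the November 2005 return (it is 11 months old), while a December 2005 statement would first become usable for the March 2006 return.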
The draft and abbreviated version of the syllabus is at data-course2.html and possibly data-course2.pdf.