ECON 4400 Final project
This final project will constitute 25% of your final grade. For this project you will write a paper of approximately 5 double-spaced pages (excluding the works-cited page). More than 5 is fine, but please don’t write me a full thesis’s worth of information (I have 80 papers to grade…). In this paper you will estimate a log-wage equation focusing on a particular variable of your choice as the treatment variable. I will provide you with two separate datasets for this project, along with the documentation for each. You will use one of these datasets, depending on which research question you decide to pursue. To begin, review the documentation for both datasets and pick a research question and corresponding treatment variable (e.g. returns to education, returns to experience, returns to tenure, gender or racial wage gaps, …). Your paper should include 6 total sections:
- Introduction
- Data
- Identification strategy
- Results
- Discussion and conclusions
- Works cited
You should do the following in your introduction:
- State your research question
- Motivate your research question: Why should we care about your research question? Is there a policy implication? Are there implications for social welfare? You need not convince me that your research question is the best research question in the history of the world, but demonstrate that you’ve put genuine thought into why it is worthwhile. You are welcome to include any news articles, scholarly articles, or any relevant and legitimate source that you think helps establish why your topic is interesting.
- Discuss relevant pre-existing literature: You MUST discuss at least one peer-reviewed article relevant to the research question you choose. You are welcome to include more than one if you wish. You do not need to give me an in-depth synopsis of the paper. It is sufficient to describe the general research question the paper is trying to answer and to provide its general results (e.g. this paper studied the returns to education and found that for each additional year of education there is a predicted increase in wages of X%). Make sure you properly cite your sources! I will discuss in class a couple of sources you may visit to find articles.
State which of the two datasets you use. Describe the dataset: how many observations there are, what the underlying data source is (e.g. CPS, NLSY), what year(s) the data are from, any specific characteristics of the sample (e.g. is it only one gender…), and how each variable you include is measured (e.g. for education it is years of education, experience is potential experience defined as…, etc.). Out of the two datasets, why was this one best suited to your research question? If you use wage2, you should reference the originating paper (which I will provide) for further details regarding the dataset.
In your identification strategy section, you should detail the models you estimate in order to answer your research question. You must type out each model neatly and properly formatted. I will discuss in class how to write equations in Word. You MUST estimate at least two models:
- A simple regression model of log wages on your treatment variable of interest.
- A multiple regression model of log wages on the treatment variable and any important control variables.
- Rather than simply running a “kitchen sink” regression where you dump all of the other variables into the model as control variables just because you can, you should demonstrate that you put some thought into why each of the additional variables you add may cause omitted variable bias. That is, you should briefly describe how each control variable could be related to BOTH the treatment variable and log wages.
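As an illustration of how the two required models might be typed out, here is a generic sketch. The symbols are placeholder notation assumed for this example, not notation from the course: D_i stands for your treatment variable and X_{1i}, …, X_{ki} for your chosen controls.

```latex
\begin{align*}
\text{Simple regression:}\quad
  & \log(wage_i) = \alpha_0 + \alpha_1 D_i + e_i \\
\text{Multiple regression:}\quad
  & \log(wage_i) = \beta_0 + \beta_1 D_i + \gamma_1 X_{1i} + \dots + \gamma_k X_{ki} + u_i
\end{align*}

% With a single control X_1, the two OLS estimates of the treatment
% coefficient are linked by the omitted variable bias identity:
\[
  \hat{\alpha}_1 = \hat{\beta}_1 + \hat{\gamma}_1 \hat{\delta}_1 ,
\]
% where \hat{\delta}_1 is the slope from regressing X_{1i} on D_i.
```

In the single-control case, the identity shows why a control matters only if it is related to BOTH variables: the simple-regression estimate differs from the multiple-regression estimate only when the control predicts log wages (γ̂₁ ≠ 0) and is correlated with the treatment (δ̂₁ ≠ 0).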
You should estimate each model described in your identification strategy in RStudio. You will notice that neither of the two datasets includes a variable for log wages, so you will first have to create one. The function in R to take the natural log of a variable X is simply “log(X)”. Your results section should include:
- A SINGLE table with results from all of the models you estimated. I will discuss in class how to format the table. This table should include all of your estimates, standard errors for each estimate, number of observations, and adjusted R-squared for each model.
- An interpretation of the coefficient estimate on your treatment variable for each model you estimate. State whether the estimate is statistically significant (and at what levels) in each model.
- Compare the estimate for your treatment variable between the models with and without controls. How did the estimate change when you included the control variables (e.g. what direction and how much)? What implications does that have for omitted variable bias from other unobserved factors?
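The workflow above can be sketched in R as follows. This is a minimal sketch on simulated data, not on the course datasets: the data frame `df`, its variables, and the numbers used to generate it are all assumptions made for illustration, with `educ` standing in for a treatment variable and `exper` for a single control.

```r
# Simulated stand-in data (an assumption for illustration, NOT the course data).
set.seed(4400)
n     <- 500
age   <- sample(25:60, n, replace = TRUE)
educ  <- sample(8:18, n, replace = TRUE)   # hypothetical treatment variable
exper <- age - educ - 6                    # potential experience, correlated with educ
wage  <- exp(0.5 + 0.08 * educ + 0.01 * exper + rnorm(n, sd = 0.4))
df    <- data.frame(wage, educ, exper)

# Step 1: create the log wage variable; log() is the natural log in R.
df$lwage <- log(df$wage)

# Step 2: simple regression of log wages on the treatment variable.
m_simple <- lm(lwage ~ educ, data = df)

# Step 3: multiple regression that adds the control variable.
m_multi <- lm(lwage ~ educ + exper, data = df)

# Estimates, standard errors, t-statistics, and p-values for each model.
coef(summary(m_simple))
coef(summary(m_multi))

# Number of observations and adjusted R-squared for the results table.
nobs(m_simple); summary(m_simple)$adj.r.squared
nobs(m_multi);  summary(m_multi)$adj.r.squared

# How the treatment coefficient changes once the control is included.
b_simple <- coef(m_simple)["educ"]
b_multi  <- coef(m_multi)["educ"]
b_simple - b_multi

# In-sample, that gap equals (coefficient on the control) x (slope from
# regressing the control on the treatment): the omitted variable bias term.
delta <- coef(lm(exper ~ educ, data = df))["educ"]
coef(m_multi)["exper"] * delta

# Log-level interpretation of a one-unit increase in the treatment:
# approximate and exact percent change in wages.
100 * b_multi
100 * (exp(b_multi) - 1)
```

With your own data, you would replace the simulated `df` with the dataset you load and swap in your treatment and control variables. The approximate percent interpretation (100 times the coefficient) is close to the exact one only when the coefficient is small.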
The discussion and conclusions section should wrap up your paper with conclusions about your research question drawn from your results, along with a discussion of the validity of your identification strategy in uncovering the average causal effect of interest. This should include:
- Detail what your results imply about your research question (e.g. my results imply that for each additional year of education the returns to education are…). If this has any specific implications for how you motivated your research question, make sure to mention them (e.g. policy implications).
- Discuss factors that may cause your identification strategy to fail to capture the average causal effect you are seeking. Can you think of any specific variables that you were unable to control for that may cause omitted variable bias? How could future research address this shortcoming?
Cite each and every source you use. Use APA formatting for the works cited. For in-text citations, just use a parenthetical citation with the author name(s) and publication year.
Wage1 documentation (taken from Wooldridge package in R):
Wooldridge Source: These are data from the 1976 Current Population Survey.
A data.frame with 526 observations on 24 variables:
- wage: average hourly earnings
- educ: years of education
- exper: years potential experience, defined as [Age-educ-6]
- tenure: years with current employer
- nonwhite: =1 if nonwhite
- female: =1 if female
- married: =1 if married
- numdep: number of dependents
- smsa: =1 if live in SMSA
- northcen: =1 if live in north central U.S.
- south: =1 if live in southern region
- west: =1 if live in western region
- construc: =1 if work in construc. indus.
- ndurman: =1 if in nondur. manuf. indus.
- trcommpu: =1 if in trans, commun, pub ut
- trade: =1 if in wholesale or retail
- services: =1 if in services indus.
- profserv: =1 if in prof. serv. indus.
- profocc: =1 if in profess. occupation
- clerocc: =1 if in clerical occupation
- servocc: =1 if in service occupation
Wage2 documentation (taken from Wooldridge package in R):
Wooldridge Source: M. Blackburn and D. Neumark (1992), “Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials,” Quarterly Journal of Economics 107, 1421-1436. Professor Neumark kindly provided the data, of which I used just the data for 1980.
A data.frame with 935 observations on 17 variables:
- wage: monthly earnings
- hours: average weekly hours
- IQ: IQ score
- KWW: knowledge of world work score
- educ: years of education
- exper: years of work experience
- tenure: years with current employer
- age: age in years
- married: =1 if married
- black: =1 if black
- south: =1 if live in south
- urban: =1 if live in SMSA
- sibs: number of siblings
- brthord: birth order
- meduc: mother’s education
- feduc: father’s education