Essays on Causal Inference with Endogeneity and Missing Data




Feng, Qian, Ph. D.




This dissertation devises novel yet easy-to-implement estimation and inference procedures that economists can use to solve complicated real-world problems. It provides solutions for situations in which sample selection is entangled with missing data problems and in which treatment effects are heterogeneous but instruments have only limited variation.

In the first chapter, we investigate the problem of missing instruments and develop the generated instrument approach to address it. Specifically, when the missingness of instruments is endogenous, dropping observations can lead to biased estimates. This chapter proposes a methodology that uses all the data for instrumental variables (IV) estimation and delivers consistent estimates under endogenous missingness of instruments. It first forms a generated instrument for every observation in the sample: (a) for observations without instruments, the new instrument is an imputation; (b) for observations with instruments, the new instrument is an inverse propensity score weighted combination of the original instrument and an imputation. Estimation then proceeds using the generated instruments. Asymptotic theorems are established. The new estimator attains the semiparametric efficiency bound and is less biased than existing procedures in simulations. As an illustrative example, we use the NLSYM data set, in which IQ scores are partially missing, and demonstrate that under the new methodology the estimated return to education is larger and more precise than under standard complete-case methods.

In the second chapter, we provide Lasso-type procedures for reduced form regression with many missing instruments. The methodology takes two steps. In the first step, we generate a rich instrument set from the many missing instruments and other observed data. In the second step, IV estimation is conducted based on the generated instrument set. Specifically, the (very) many generated instruments are used to approximate a "pseudo" optimal instrument in the reduced form regression. The approach is shown to yield efficiency gains relative to the generated instrument estimator developed in the first chapter. We also compare the finite sample behavior of the new estimator with that of other Lasso estimators and demonstrate the good performance of the proposed estimator in Monte Carlo experiments.

The third chapter estimates individual treatment effects in a triangular model with binary-valued endogenous treatments. This chapter is based on earlier joint work with Quang Vuong and Haiqing Xu. Following the identification strategy established in Vuong and Xu (forthcoming), we propose a two-stage estimation approach. First, we estimate the counterfactual outcome, and hence the individual treatment effect (ITE), for every observational unit in the sample. Second, we estimate the density of individual treatment effects in the population. Our estimation method does not suffer from the ill-posed inverse problem associated with inverting a nonlinear functional. Asymptotic properties of the proposed method are established, and we study its finite sample properties in Monte Carlo experiments. We also illustrate our approach with an empirical application assessing the effects of 401(k) retirement programs on personal savings. Our results show that a small but statistically significant proportion of individuals experience negative effects, although the majority of ITEs are positive.
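To make the first chapter's generated-instrument idea concrete, the following is a minimal sketch, not the dissertation's estimator. It assumes, as a simplification, that missingness of the instrument depends only on a fully observed covariate `w`. It uses a linear imputation and a logistic propensity model as parametric stand-ins for whatever first-stage estimators the dissertation actually uses. All function and variable names are hypothetical.

```python
import numpy as np


def fit_logit(W, r, iters=25):
    """Logistic regression by Newton's method (illustrative propensity model)."""
    beta = np.zeros(W.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-W @ beta))
        grad = W.T @ (r - p)
        hess = (W * (p * (1.0 - p))[:, None]).T @ W
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def generated_instrument(z, observed, w, floor=0.01):
    """Build a generated instrument for every observation.

    z        : instrument, any value (e.g. NaN) where it is missing
    observed : boolean array, True where z is observed
    w        : fully observed covariate(s)
    Assumption: missingness depends only on w (a simplification).
    """
    n = len(z)
    W = np.column_stack([np.ones(n), w])

    # Imputation: linear regression of z on W, fitted on complete cases.
    coef, *_ = np.linalg.lstsq(W[observed], z[observed], rcond=None)
    z_hat = W @ coef

    # Propensity of observing z, bounded away from zero for stability.
    gamma = fit_logit(W, observed.astype(float))
    p = np.clip(1.0 / (1.0 + np.exp(-W @ gamma)), floor, 1.0)

    # Missing z: pure imputation. Observed z: inverse propensity weighted
    # combination of the original instrument and the imputation.
    z_filled = np.where(observed, z, 0.0)
    z_gen = z_hat + (observed / p) * (z_filled - z_hat)
    return z_gen, z_hat


def iv_estimate(y, d, w, z_gen):
    """Just-identified IV of y on (1, d, w) with instruments (1, z_gen, w)."""
    n = len(y)
    X = np.column_stack([np.ones(n), d, w])
    Z = np.column_stack([np.ones(n), z_gen, w])
    return np.linalg.solve(Z.T @ X, Z.T @ y)
```

Calling `generated_instrument` and passing its first return value to `iv_estimate` uses every observation in the sample; the coefficient on the endogenous regressor `d` is the second element of the returned vector.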


