DiD Summer 2022 update

edwigetia · Jul 2, 2022 · 8c2c281
1 parent f86eb75
commit 8c2c281
Showing 16 changed files with 305 additions and 22 deletions.
diff --git a/docs/reading/04_twitter.md → archive/04_twitter.txt b/docs/reading/04_twitter.md → archive/04_twitter.txt
@@ -2,6 +2,7 @@
 layout: default
 title: Twitter threads
 parent: Resources
+image: "../../../assets/images/DiD.png"
 nav_order: 3
 ---
 
@@ -11,6 +12,9 @@ nav_order: 3
 Some interesting Twitter threads chronologically sorted and in no particular order of importance. Sometimes, really important exchanges take place online that provide great insights. These views are of course subject to change or become formalized later in journal articles etc. So go through these with caution or just take them as archives of the DiD development on Twitter:
 
 
+<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I don’t really get why year FEs don’t mess with DiD estimation when post is year based, anyone have a good explanation?</p>&mdash; Katie Gutiérrez 🏳️‍🌈 (@katie_mo_) <a href="https://twitter.com/katie_mo_/status/1542646346913484800?ref_src=twsrc%5Etfw">June 30, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
+
+
 <br>
 
 <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Calling the DID experts <a href="https://twitter.com/jondr44?ref_src=twsrc%5Etfw">@jondr44</a>, <a href="https://twitter.com/pedrohcgs?ref_src=twsrc%5Etfw">@pedrohcgs</a>, <a href="https://twitter.com/borusyak?ref_src=twsrc%5Etfw">@borusyak</a>, <a href="https://twitter.com/agoodmanbacon?ref_src=twsrc%5Etfw">@agoodmanbacon</a>, <a href="https://twitter.com/jmwooldridge?ref_src=twsrc%5Etfw">@jmwooldridge</a>: I have a very unusual panel dataset with incomplete and irregular periods (both within and across units), staggered rollout, and strong imbalance. It looks like this: <a href="https://t.co/R8Xg7xUXBf">pic.twitter.com/R8Xg7xUXBf</a></p>&mdash; David Schönholzer (@davidfromterra) <a href="https://twitter.com/davidfromterra/status/1491818086340694021?ref_src=twsrc%5Etfw">February 10, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

diff --git a/assets/images/DiD.png b/assets/images/DiD.png
diff --git a/assets/images/DiD.jpg → assets/images/DiD_old.jpg b/assets/images/DiD.jpg → assets/images/DiD_old.jpg
diff --git a/assets/images/DiD_old.png b/assets/images/DiD_old.png
diff --git a/assets/images/allestimators.png b/assets/images/allestimators.png
diff --git a/assets/images/allestimators2.png b/assets/images/allestimators2.png
diff --git a/docs/01_stata.md b/docs/01_stata.md
@@ -1,6 +1,7 @@
 ---
 layout: default
 title: Stata packages
+image: "../../../assets/images/DiD.png"
 nav_order: 2
 ---
 

diff --git a/docs/02_R.md b/docs/02_R.md
@@ -1,6 +1,7 @@
 ---
 layout: default
 title: R packages
+image: "../../../assets/images/DiD.png"
 nav_order: 3
 ---
 

diff --git a/docs/03_julialang.md b/docs/03_julialang.md
@@ -1,6 +1,7 @@
 ---
 layout: default
 title: Julia packages
+image: "../../../assets/images/DiD.png"
 nav_order: 4
 ---
 

diff --git a/docs/code/06_code.md b/docs/code/06_code.md
@@ -5,7 +5,7 @@ nav_order: 6
 permalink: /docs/code
 has_children: true
 mathjax: true
-image: "../../../assets/images/DiD.jpg"
+image: "../../../assets/images/DiD.png"
 ---
 
 # Stata code

diff --git a/docs/code/06_combined.md b/docs/code/06_combined.md
@@ -0,0 +1,269 @@
+---
+layout: default
+title: All estimators
+parent: Stata code
+nav_order: 10
+mathjax: true
+image: "../../../assets/images/DiD.png"
+---
+
+# All estimators
+{: .no_toc }
+
+## Table of contents
+{: .no_toc .text-delta }
+
+1. TOC
+{:toc}
+
+---
+
+# Comparing the estimators
+
+This example follows the [five estimators](https://github.com/borusyak/did_imputation/blob/main/five_estimators_example.png) example, which uses the `event_plot` command to plot several estimators in one graph. We keep the same code structure used in the individual estimator sections. So let's get started.
+
+## Step 1: Create all the variables for all the DiD packages
+
+```applescript
+clear
+clear matrix
+set scheme white_tableau
+
+
+local units = 30
+local start = 1
+local end 	= 60
+
+local time = `end' - `start' + 1
+local obsv = `units' * `time'
+set obs `obsv'
+
+egen id	   = seq(), b(`time')  
+egen t 	   = seq(), f(`start') t(`end') 	
+
+sort  id t
+xtset id t
+
+
+gen Y 	   		= 0		// outcome variable	
+gen D 	   		= 0		// intervention variable
+gen cohort      = .  	// treatment cohort
+gen effect      = .		// treatment effect size
+gen first_treat = .		// when the treatment happens for each cohort
+gen rel_time	= .     // time - first_treat
+
+
+set seed 20211222
+
+
+// determine the number of cohorts and assign them to IDs
+levelsof id, local(lvls)
+foreach x of local lvls {
+	local chrt = runiformint(0,5)	
+	replace cohort = `chrt' if id==`x'
+}
+
+
+// for each cohort determine the timing and treatment effect
+levelsof cohort, local(lvls)  
+foreach x of local lvls {
+	
+	local eff = runiformint(2,10)
+		replace effect = `eff' if cohort==`x'
+			
+	local timing = runiformint(`start',`end' + 20)	// draws after the last period leave the cohort never treated
+	replace first_treat = `timing' if cohort==`x'
+	replace first_treat = . if first_treat > `end'
+		replace D = 1 if cohort==`x' & t>= `timing' 
+}
+
+
+
+replace rel_time = t - first_treat   // relative time
+replace Y = id + t + cond(D==1, effect * rel_time, 0) + rnormal()  // treatment effect
+
+
+// derive the various variables for various estimators
+	*** leads
+	cap drop F_*
+	forval x = 2/10 {  
+		gen F_`x' = rel_time == -`x'
+	}
+
+	
+	*** lags
+	cap drop L_*
+	forval x = 0/10 {
+		gen L_`x' = rel_time ==  `x'
+	}
+	
+
+	
+gen never_treat = first_treat==.  // never treated group
+
+sum first_treat
+gen last_cohort = first_treat==r(max) // last treated 
+
+
+gen gvar = first_treat
+recode gvar (. = 0)     
+  
+```
+
+This gives us the same graph we have been using for all our examples:
+
+```applescript
+xtline Y, overlay legend(off)
+```
+
+<img src="../../../assets/images/test_data.png" height="300">
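+
+A quick look at the simulated design can help when reading the later results. The following is a minimal sketch, not part of the original example: it keeps one row per unit inside `preserve`/`restore` and tabulates the variables created above, so the panel itself is left unchanged.
+
+```applescript
+// sanity check (illustrative): one row per unit, then cross-tabulate the design
+preserve
+	bysort id: keep if _n == 1
+	tab cohort first_treat, missing   // missing first_treat = never treated in the sample
+	tab cohort effect                 // effect size drawn for each cohort
+restore
+```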
+
+
+## Step 2: Run the packages and store the estimates
+
+
+
+```applescript
+************
+*** TWFE ***
+************
+
+reghdfe Y L_* F_*, absorb(id t) cluster(id)
+
+estimates store twfe 
+
+
+*************
+*** csdid ***
+*************
+
+
+csdid Y, ivar(id) time(t) gvar(gvar) notyet
+
+estat event, window(-10 10) estore(csdd) 
+
+
+
+***********************
+*** did_imputation  ***
+***********************
+
+
+did_imputation Y id t first_treat, horizons(0/10) pretrend(10) minn(0) 
+
+estimates store didimp	
+	
+
+***********************
+*** did_multiplegt  ***
+***********************
+
+did_multiplegt Y id t D, robust_dynamic dynamic(10) placebo(10) breps(2) cluster(id)
+
+
+matrix didmgt_b = e(estimates) 
+matrix didmgt_v = e(variances)
+
+
+
+*****************************
+***  eventstudyinteract   ***
+*****************************
+
+
+eventstudyinteract Y L_* F_*, vce(cluster id) absorb(id t) cohort(first_treat) control_cohort(never_treat)	
+
+matrix evtstint_b = e(b_iw) 
+matrix evtstint_v = e(V_iw)
+
+
+*****************************		
+*** did2s (Gardner 2021)  ***
+*****************************
+
+did2s Y, first_stage(id t) second_stage(F_* L_*) treatment(D) cluster(id)
+
+matrix did2s_b = e(b)
+matrix did2s_v = e(V)
+
+******************
+*** stackedev  ***
+******************
+
+gen no_treat = first_treat==.			
+
+	// leads
+	cap drop F_*
+	forval x = 1/10 {  
+		gen     F_`x' = rel_time == -`x'
+		replace F_`x' = 0 if no_treat==1
+	}
+
+	
+	//lags
+	cap drop L_*
+	forval x = 0/10 {
+		gen     L_`x' = rel_time ==  `x'
+		replace L_`x' = 0 if no_treat==1
+	}
+	
+	ren F_1 ref  // reference year
+	
+stackedev Y F_* L_* ref, cohort(first_treat) time(t) never_treat(no_treat) unit_fe(id) clust_unit(id)
+	
+	
+matrix stackedev_b = e(b)
+matrix stackedev_v = e(V)	
+```
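+
+Before plotting, it can help to check what was stored, because `event_plot` matches coefficients by name through `stub_lag()` and `stub_lead()`. The following is a minimal sketch using built-in Stata commands. Which results to inspect is an illustrative choice, and the expected names in the comments are taken from the stubs used in the `event_plot` call in Step 3.
+
+```applescript
+estimates dir            // stored estimates: twfe, csdd, didimp
+matrix list didmgt_b     // column names should follow Effect_# and Placebo_#
+matrix list evtstint_b   // column names should follow L_# and F_#
+```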
+
+
+## Step 3: Put all the estimators together
+
+Here we also use the `colorpalette` command from the palettes package (`ssc install palettes, replace` and `ssc install colrspace, replace`) to control the colors.
+
+```applescript
+colorpalette tableau, nograph	
+
+event_plot 	  twfe	csdd 	didimp 	didmgt_b#didmgt_v 	evtstint_b#evtstint_v 	did2s_b#did2s_v		stackedev_b#stackedev_v	, 	///
+	stub_lag( L_# 	Tp# 	tau# 	Effect_#  			L_#						L_#  				L_# ) 		///
+	stub_lead(F_# 	Tm# 	pre# 	Placebo_#   		F_#						F_# 				F_# )		///
+		together perturb(-0.30(0.10)0.30) trimlead(5) noautolegend 									///
+		plottype(scatter) ciplottype(rspike)  														///
+			lag_opt1(msymbol(+)    mlwidth(0.3) color(black)) 		lag_ci_opt1(color(black)	 lw(0.1)) 	///
+			lag_opt2(msymbol(lgx)  mlwidth(0.3) color("`r(p1)'")) 	lag_ci_opt2(color("`r(p1)'") lw(0.1)) 	///
+			lag_opt3(msymbol(Dh)   mlwidth(0.3) color("`r(p2)'")) 	lag_ci_opt3(color("`r(p2)'") lw(0.1)) 	///
+			lag_opt4(msymbol(Th)   mlwidth(0.3) color("`r(p3)'")) 	lag_ci_opt4(color("`r(p3)'") lw(0.1)) 	///
+			lag_opt5(msymbol(Sh)   mlwidth(0.3) color("`r(p4)'")) 	lag_ci_opt5(color("`r(p4)'") lw(0.1)) 	///
+			lag_opt6(msymbol(Oh)   mlwidth(0.3) color("`r(p5)'")) 	lag_ci_opt6(color("`r(p5)'") lw(0.1)) 	///	 
+			lag_opt7(msymbol(V)    mlwidth(0.3) color("`r(p6)'")) 	lag_ci_opt7(color("`r(p6)'") lw(0.1)) 	///		
+					graph_opt(												///
+								title("DiD estimates") 						///
+								xtitle("") 									///
+								ytitle("Average effect") xlabel(-5(1)10)	///
+								legend(order(1 "TWFE" 3 "csdid (CS 2020)" 5 "did_imputation (BJS 2021)" 7 "did_multiplegt (CD 2020)"  9 "eventstudyinteract (SA 2020)" 11 "did2s (G 2021)" 13 "stackedev (CDLZ 2019)" ) pos(6) rows(2) region(style(none))) 	///
+								xline(-0.5, lc(gs8) lp(dash)) ///
+								yline(   0, lc(gs8) lp(dash)) ///
+							) 
+```
+
+
+<img src="../../../assets/images/allestimators.png" height="300">
+
+
+
+The graph above has some interesting elements. First, the TWFE estimates are clearly wrong. But so are those from `eventstudyinteract` and `stackedev`. All the other estimators give us estimates that are roughly close to the true values (*to be added*). So why do these two packages end up like this? I have no idea! I tested a bunch of different options, but the results stay roughly the same. Two possible reasons are that (a) the estimator itself does not fully correct for heterogeneous treatment effects, or (b) the code of the command does not correctly capture heterogeneous treatment effects.
+
+If there is an error in the code, please report it, together with a correction if possible. You can also try different seeds, cohorts, treatment timings, and effect sizes, and check how the graphs vary. If we throw out the wrong estimators, we can see the results of the remaining packages as follows:
+
+<img src="../../../assets/images/allestimators2.png" height="300">
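+
+One possible benchmark for the true values can be read off the data-generating process in Step 1, where a treated unit's effect at relative time `k` is `effect * k`. The sketch below is an assumption about what that benchmark could look like, not the author's final version: it averages these effects over treated observations at each horizon with equal weights, whereas the estimators may weight cohorts differently. The names `true_effect` and `n_obs` are introduced here for illustration.
+
+```applescript
+// illustrative benchmark: equal-weighted true effect at each relative time
+preserve
+	keep if D==1 & inrange(rel_time, 0, 10)
+	gen true_effect = effect * rel_time          // effect implied by the DGP
+	collapse (mean) true_effect (count) n_obs = true_effect, by(rel_time)
+	list, noobs clean
+restore
+```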
+
+
+
+
+
+
+
+
+
+
+
diff --git a/docs/code/06_twfe.md b/docs/code/06_twfe.md
@@ -4,7 +4,7 @@ title: TWFE
 parent: Stata code
 nav_order: 1
 mathjax: true
-image: "../../../assets/images/DiD.jpg"
+image: "../../../assets/images/DiD.png"
 ---
 
 # The Twoway Fixed Effects (TWFE) model

diff --git a/docs/reading/04_literature.md b/docs/reading/04_literature.md
@@ -2,6 +2,7 @@
 layout: default
 title: Literature
 parent: Resources
+image: "../../../assets/images/DiD.png"
 nav_order: 1
 ---
 
@@ -73,6 +74,14 @@ Carolina Caetano, Brantly Callaway, Stroud Payne, Hugo Sant'Anna Rodrigues (2022
 
 Clément de Chaisemartin, Xavier d'Haultfoeuille, Félix Pasquier, Gonzalo Vazquez-Bare (2022). [Difference-in-Differences Estimators for Treatments Continuously Distributed at Every Period](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4011782).
 
+
+Paul Goldsmith-Pinkham, Peter Hull & Michal Kolesár (2022). [Contamination Bias in Linear Regressions](https://www.nber.org/papers/w30108)
+
+Nandita Mitra, Jason Roy, Dylan Small (2022). [The Future of Causal Inference](https://academic.oup.com/aje/advance-article-abstract/doi/10.1093/aje/kwac108/6618833). American Journal of Epidemiology.
+
+Anna Wysocki, Katherine Lawson, Mijke Rhemtulla (2022). [Statistical Control Requires Causal Justification](https://journals.sagepub.com/doi/10.1177/25152459221095823). Advances in Methods and Practices in Psychological Science.
+
+Susanne Dandl, Torsten Hothorn, Heidi Seibold, Erik Sverdrup, Stefan Wager, Achim Zeileis (2022). [What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?](https://arxiv.org/abs/2206.10323)
 
 ### 2021
 
@@ -96,6 +105,8 @@ Brantly Callaway, Andrew Goodman-Bacon, Pedro H.C. Sant'Anna (2021). [Difference
 
 Xavier D’Haultfoeuille, Stefan Hoderlein, Yuya Sasaki (2021). [Nonparametric Difference-in-Differences in Repeated Cross-Sections with Continuous Treatments](https://arxiv.org/abs/2104.14458).
 
+Bruno Ferman, Cristine Pinto (2021). [Synthetic Controls With Imperfect Pretreatment Fit](https://qeconomics.org/ojs/index.php/qe/article/view/1599). Quantitative Economics.
+
 [John Gardner](https://jrgcmu.github.io/) (2021). [Two-stage differences in differences](https://jrgcmu.github.io/2sdd_current.pdf).
 
 [Andrew Goodman-Bacon](http://goodman-bacon.com/) (2021). [Difference-in-differences with variation in treatment timing](https://www.sciencedirect.com/science/article/abs/pii/S0304407621001445). Journal of Econometrics.
@@ -124,6 +135,8 @@ Tymon Słoczyński (2020). [Interpreting OLS Estimands When Treatment Effects Ar
 
 ### 2019 and earlier
 
+Bruno Ferman, Cristine Pinto (2019). [Inference in Differences-in-Differences with Few Treated Groups and Heteroskedasticity](https://direct.mit.edu/rest/article-abstract/101/3/452/58517/Inference-in-Differences-in-Differences-with-Few). The Review of Economics and Statistics.
+
 [Simon Freyaldenhoven](https://simonfreyaldenhoven.github.io/), [Christian Hansen](https://www.chicagobooth.edu/faculty/directory/h/christian-b-hansen), [Jesse M. Shapiro](https://www.brown.edu/Research/Shapiro/) (2019). [Pre-event Trends in the Panel Event-Study Design](https://www.aeaweb.org/articles?id=10.1257/aer.20180609). American Economic Review.
 
 [Clément de Chaisemartin](https://sites.google.com/site/clementdechaisemartin/) [<img width="12px" src="https://cdn.jsdelivr.net/npm/simple-icons@v5/icons/twitter.svg" />](https://twitter.com/CdeChaisemartin), [Xavier D'Haultfoeuille](https://faculty.crest.fr/xdhaultfoeuille/) (2018). [Fuzzy differences-in-differences](https://academic.oup.com/restud/article-abstract/85/2/999/4096388). The Review of Economic Studies.

diff --git a/docs/reading/04_resources.md b/docs/reading/04_resources.md
@@ -3,6 +3,7 @@ layout: default
 title: Resources
 nav_order: 5
 has_children: true
+image: "../../../assets/images/DiD.png"
 permalink: /docs/resources
 ---