Add R versions of solutions

RuiminAo · May 17, 2022 · 4360658
1 parent e7c25d5

Showing 8 changed files with 749 additions and 34 deletions.
diff --git a/.gitignore b/.gitignore
@@ -48,3 +48,4 @@
 
 ## Word/Ppt Temp files
 **/~$*
+.Rproj.user
diff --git a/Lab/Instructions.md b/Lab/README.md
rename from Lab/Instructions.md
rename to Lab/README.md
diff --git a/Lab/Solutions-R.Rmd b/Lab/Solutions-R.Rmd
@@ -0,0 +1,191 @@
+---
+output: github_document
+---
+
+```{r setup, include = F}
+# devtools::install_github("Hemken/Statamarkdown")
+library(Statamarkdown)
+```
+
+
+# Mixtape SSIV Workshop: Coding Lab
+
+This lab walks through some basic SSIV analyses using data from [Autor, Dorn, and Hanson](https://github.com/Mixtape-Sessions/Shift-Share/blob/main/Readings/Autor_Dorn_Hanson_2013) (ADH, 2013). As discussed in lecture, ADH use a shift-share instrument that aggregates Chinese import shocks across 397 manufacturing industries, with exposure weights given by each industry's share of the location's total (baseline) employment. We will use three cleaned datasets from their setup:
+
+- `adh_shocks.dta`: an industry-by-year dataset of the shocks
+
+- `adh_shares.dta`: a location-by-industry-by-year dataset of the shares
+
+- `adh_noIV.dta`: a location-by-year dataset of the main outcome (manufacturing employment growth, `y`), treatment (local growth of China import exposure, `x`), and other useful variables -- excluding the ADH instrument
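For reference, the location-by-year instrument built in Exercise 1 is a share-weighted sum of industry shocks. In notation chosen here for illustration ($\ell$ a location, $n$ one of the $N = 397$ industries, $t$ a period, $s_{\ell n t}$ the `ind_share` column, $g_{nt}$ the `shock` column), it reads:

$$
z_{\ell t} = \sum_{n=1}^{N} s_{\ell n t}\, g_{nt}
$$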
+
+
+## Exercises:
+
+1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. Comment on the difference in the two IV regression coefficients.
+
+```{r}
+## Example solutions for SSIV Lab
+## Written by Peter Hull & Kyle Butts
+## 5/16/2022 (v1)
+
+# devtools::install_github("kylebutts/ssaggregate") # run once to install
+library(tidyverse) # Data Cleaning
+library(fixest) # Regressions
+library(here) # Relative file paths
+library(haven) # Reading .dta
+library(ssaggregate)
+
+# Construct z
+adh_shares <- haven::read_dta(here("Lab/adh_shares.dta"))
+adh_shocks <- haven::read_dta(here("Lab/adh_shocks.dta"))
+df <- haven::read_dta(here("Lab/adh_noIV.dta"))
+
+df <- adh_shares |> 
+  # Merge shares with shocks
+  left_join(adh_shocks, by=c("industry", "year")) |> 
+  # Create z = sum over industries of (share * shock) for each location-year
+  mutate(z = ind_share * shock) |> 
+  group_by(location, year) |> 
+  summarize(z = sum(z), .groups = "drop") |> 
+  # Merge with no IV dataset
+  left_join(df, by = c("location", "year"))
+
+```
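The merge, multiply, and sum pipeline above can be checked by hand on a tiny example. This base-R sketch uses made-up shares and shocks for two locations and two hypothetical industries `A` and `B`; `merge()` plays the role of `left_join()`, and `aggregate()` the role of `group_by()` plus `summarize()`.

```r
# Toy shares: 2 locations x 2 hypothetical industries, one year (made-up numbers)
shares <- data.frame(
  location  = c(1, 1, 2, 2),
  industry  = c("A", "B", "A", "B"),
  year      = 2000,
  ind_share = c(0.2, 0.1, 0.05, 0.3)
)
# Toy industry shocks for the same year
shocks <- data.frame(
  industry = c("A", "B"),
  year     = 2000,
  shock    = c(1.0, 2.0)
)

# Merge shares with shocks (analogue of left_join)
merged <- merge(shares, shocks, by = c("industry", "year"), all.x = TRUE)

# Share times shock, summed within each location-year (analogue of summarize).
# Note: the formula method of aggregate() drops rows with missing shocks,
# whereas dplyr's sum() would return NA for that location.
merged$z <- merged$ind_share * merged$shock
z_df <- aggregate(z ~ location + year, data = merged, FUN = sum)
print(z_df)
```

Location 1 gets $0.2 \times 1 + 0.1 \times 2 = 0.4$ and location 2 gets $0.05 \times 1 + 0.3 \times 2 = 0.65$.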
+
+```{r}
+
+# Basic SSIV
+df |> 
+  feols(
+    y ~ post | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+```{r}
+# Balance test: same IV with the lagged outcome
+df |> 
+  feols(
+    y_lag ~ post | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+
+*Main IV Estimate:*                      
+*Standard Error:*
+
+*Lag Outcome IV Estimate:*               
+*Standard Error:*
+
+*Comments:*
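With one endogenous regressor and one instrument, the IV coefficient equals the reduced-form coefficient (outcome on instrument) divided by the first-stage coefficient (treatment on instrument), with the same controls and weights in both. The base-R sketch below checks this on simulated data. The names `y`, `x`, `z`, `post`, and `weight` mirror the lab's variables, but the data-generating process is invented, and the matrix formula for just-identified weighted 2SLS stands in for `feols`.

```r
set.seed(1)
n      <- 500
post   <- rep(0:1, each = n / 2)                 # two periods
weight <- runif(n, 1, 10)                        # regression weights
z      <- rnorm(n)                               # instrument
u      <- rnorm(n)                               # unobserved confounder
x      <- 0.8 * z + 0.5 * u + rnorm(n)           # treatment is correlated with u
y      <- 1.5 * x + 0.3 * post + u + rnorm(n)    # outcome

# Reduced form and first stage, both controlling for post and weighted
rf <- coef(lm(y ~ z + post, weights = weight))["z"]
fs <- coef(lm(x ~ z + post, weights = weight))["z"]
beta_ratio <- unname(rf / fs)

# Just-identified weighted 2SLS: beta = (Z' W X)^{-1} Z' W y
X <- cbind(1, post, x)
Z <- cbind(1, post, z)
beta_2sls <- solve(crossprod(Z * weight, X), crossprod(Z * weight, y))[3, 1]

print(c(ratio = beta_ratio, two_sls = beta_2sls))
```

Swapping `y` for a lagged outcome changes only the reduced form, so a sizable lagged-outcome coefficient means the instrument already predicts growth from before the shock period.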
+
+
+
+2.  Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. Comment on how the main IV estimate changes.
+
+```{r}
+# Add sum of shares control
+df <- adh_shares |> 
+  group_by(location, year) |> 
+  summarize(sum_share = sum(ind_share), .groups = "drop") |> 
+  left_join(df, by=c("location", "year"))
+
+summary(df$sum_share)
+```
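One way to see why the `sum_share` control matters: the shares cover only manufacturing industries, so they sum to less than one and the sum varies across locations. For any constant $\bar g$ (for example, an average shock), and with $S_{\ell t}$ denoting `sum_share` in the illustrative notation $s_{\ell n t}$ for shares and $g_{nt}$ for shocks, the instrument splits as:

$$
z_{\ell t} = \sum_n s_{\ell n t}\, g_{nt} = S_{\ell t}\,\bar g + \sum_n s_{\ell n t}\,(g_{nt} - \bar g), \qquad S_{\ell t} = \sum_n s_{\ell n t}
$$

The first term varies across locations only through $S_{\ell t}$. If the average shock is nonzero, the uncontrolled instrument therefore partly compares locations with more or less manufacturing overall, not only locations facing different shocks. A linear `sum_share` control absorbs that term when $\bar g$ is the same in every period.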
+
+```{r}
+# SSIV with Sum of Shares 
+df |> 
+  feols(
+    y ~ post + sum_share | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+```{r}
+# Balance Test with Sum of Shares 
+df |> 
+  feols(
+    y_lag ~ post + sum_share | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+*Main IV Estimate:*                      
+*Standard Error:*
+
+*Lag Outcome IV Estimate:*               
+*Standard Error:*
+
+*Comments:*
+
+
+
+3.  Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. Comment on how both IV estimates change. Can you see why the interaction control is important?
+
+```{r}
+# SSIV with Sum of Shares x Year
+df |> 
+  feols(
+    y ~ post + i(post, sum_share) | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+```{r}
+# Balance Test with Sum of Shares x Year
+df |> 
+  feols(
+    y_lag ~ post + i(post, sum_share) | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+
+```{r}
+# Check why sum of shares x year matters: does the mean shock differ by year?
+adh_shocks |> 
+  feols(shock ~ i(year), cluster = ~industry)
+```
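The regression above asks whether the average shock differs across years. If it does, the part of the instrument that loads the period's mean shock onto the sum of shares, $S_{\ell t}\bar g_t$ (with $S_{\ell t}$ = `sum_share` and $\bar g_t$ the period's mean shock), has a different slope in each period. A single `sum_share` control cannot absorb it, while period-specific slopes like those from `i(post, sum_share)` can. The base-R sketch below illustrates this with made-up numbers (the period means 0.2 and 1.0 are hypothetical), using `sum_share:factor(post)` in `lm()` as the analogue of `i(post, sum_share)`. Weights are omitted because the spanning argument does not depend on them.

```r
set.seed(2)
n_loc     <- 50
post      <- rep(0:1, each = n_loc)          # two stacked periods
sum_share <- runif(2 * n_loc, 0.1, 0.5)      # hypothetical sums of shares
g_bar     <- ifelse(post == 1, 1.0, 0.2)     # hypothetical mean shock by period

# Part of z driven by the period's mean shock: S_lt * g_bar_t
mean_part <- sum_share * g_bar

# Exercise 2 controls: post + sum_share (one common slope)
res_pooled <- resid(lm(mean_part ~ post + sum_share))
# Exercise 3 controls: post + sum_share interacted with each period
res_by_year <- resid(lm(mean_part ~ post + sum_share:factor(post)))

print(c(pooled = max(abs(res_pooled)), by_year = max(abs(res_by_year))))
```

The by-period residuals are zero up to rounding error because $S_{\ell t}\bar g_t$ is an exact linear combination of the interacted controls; the pooled residuals are not.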
+
+*Main IV Estimate:*                      
+*Standard Error:*
+
+*Lag Outcome IV Estimate:*               
+*Standard Error:*
+
+*Comments:*
+
+4.  Use the *ssaggregate* command to run both of the previous IV regressions at the shock level. You should control for year fixed effects in the shock-level IV regressions. The coefficients should be identical to the previous estimates, but the standard errors will be different. Comment on the change.
+
+```{r}
+# Get exposure-robust SEs with ssaggregate 
+ssagg <- ssaggregate(
+  data = df,
+  vars = ~ y + x + y_lag,
+  controls = ~ post + i(post, sum_share),
+  weights = "weight",
+  n = "industry",
+  l = "location",
+  t = "year",
+  s = "ind_share",
+  shares = adh_shares
+)
+
+ssagg_df <- ssagg |> 
+  left_join(adh_shocks, by = c("industry", "year"))
+```
+
+```{r}
+ssagg_df |> 
+  feols(y ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
+```
+
+```{r}
+ssagg_df |> 
+  feols(y_lag ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
+```
+
+*Main IV Estimate:*                      
+*Standard Error:*
+
+*Lag Outcome IV Estimate:*               
+*Standard Error:*
+
+*Comments:*
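The coefficient equivalence in Exercise 4 can be checked on simulated data. The base-R sketch below is a simplified, single-period version of the shift-share aggregation that `ssaggregate` performs, as we understand it: residualize outcome and treatment on the location-level controls (here an intercept and the sum of shares), form industry exposure weights $s_n = \sum_\ell e_\ell s_{\ell n}$ and exposure-weighted industry averages of the residuals, then run a weighted IV across industries with an intercept standing in for the year fixed effects. All data and dimensions are made up, and the package's own weight normalization may differ; rescaling $s_n$ does not change the coefficient.

```r
set.seed(3)
L <- 40; N <- 6                                  # locations and industries (made up)
# Hypothetical location-by-industry shares summing to less than one
s <- matrix(runif(L * N), L, N)
s <- s / rowSums(s) * runif(L, 0.2, 0.6)
e <- runif(L, 1, 5)                              # location weights (like `weight`)
g <- rnorm(N)                                    # industry shocks
z <- as.vector(s %*% g)                          # shift-share instrument
S <- rowSums(s)                                  # sum-of-shares control
x <- z + rnorm(L)
y <- 2 * x + rnorm(L)

# Location level: weighted IV of y on x, instrument z, controlling for S
rf <- coef(lm(y ~ z + S, weights = e))["z"]
fs <- coef(lm(x ~ z + S, weights = e))["z"]
beta_loc <- unname(rf / fs)

# Shock level: residualize on controls, then exposure-weighted industry averages
y_perp <- resid(lm(y ~ S, weights = e))
x_perp <- resid(lm(x ~ S, weights = e))
s_n <- as.vector(t(s) %*% e)                     # industry exposure weights
y_n <- as.vector(t(s) %*% (e * y_perp)) / s_n
x_n <- as.vector(t(s) %*% (e * x_perp)) / s_n

# Shock-level IV with an intercept, weighted by s_n
rf_n <- coef(lm(y_n ~ g, weights = s_n))["g"]
fs_n <- coef(lm(x_n ~ g, weights = s_n))["g"]
beta_shock <- unname(rf_n / fs_n)

print(c(location = beta_loc, shock = beta_shock))
```

Dropping `S` from the location-level controls generally breaks the equality while the shock-level regression keeps its intercept, which is another reason the sum-of-shares control matters.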
+
+