Merge branch 'main' of https://github.com/Mixtape-Sessions/Shift-Share

yizhz · Sep 22, 2023 · 54eb21c · 54eb21c
2 parents a53c1fa + 9476836
commit 54eb21c
Showing 7 changed files with 1,416 additions and 0 deletions.
diff --git a/Lab/Solutions/Solutions-R.Rmd b/Lab/Solutions/Solutions-R.Rmd
@@ -0,0 +1,169 @@
+---
+output: github_document
+---
+
+```{r setup, include = F}
+# devtools::install_github("Hemken/Statamarkdown")
+library(Statamarkdown)
+```
+
+
+# Mixtape SSIV Workshop: Coding Lab
+
+This lab will walk through some basic SSIV analyses using data from [Autor, Dorn, and Hanson](https://github.com/Mixtape-Sessions/Shift-Share/blob/main/Readings/Autor_Dorn_Hanson_2013) (ADH, 2013). As discussed in lecture, ADH use a shift-share instrument aggregating Chinese import shocks across 397 manufacturing industries with exposure weights calculated as the economy-wide share of (baseline) industry employment. We will use three cleaned datasets from their setup:
+
+- `adh_shocks.dta`: an industry-by-year dataset of the shocks
+
+- `adh_shares.dta`: a location-by-industry-by-year dataset of the shares
+
+- `adh_noIV.dta`: a location-by-year dataset of the main outcome (manufacturing employment growth, `y`), treatment (local growth of China import exposure, `x`), and other useful variables -- excluding the ADH instrument
+
+
+## Exercises:
+
+1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. How does the latter IV regression help build support for the former IV regression?
+
+```{r}
+## Example solutions for SSIV Lab
+## Written by Peter Hull & Kyle Butts
+## 5/16/2022 (v1)
+
+# devtools::install_github("kylebutts/ssaggregate") # run once to install
+library(tidyverse) # Data Cleaning
+library(fixest) # Regressions
+library(here) # Relative file paths
+library(haven) # Reading .dta
+library(ssaggregate)
+
+# Construct z
+adh_shares <- haven::read_dta(here("Lab/adh_shares.dta"))
+adh_shocks <- haven::read_dta(here("Lab/adh_shocks.dta"))
+df <- haven::read_dta(here("Lab/adh_noIV.dta"))
+
+df <- adh_shares |> 
+  # Merge shares with shocks
+  left_join(adh_shocks, by=c("industry", "year")) |> 
+  # Create \sum (shock * share)
+  mutate(z = ind_share * shock) |> 
+  group_by(location, year) |> 
+  summarize(z = sum(z), .groups = "drop") |> 
+  # Merge with no IV dataset
+  left_join(df, by = c("location", "year"))
+
+```
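+
+As a quick sanity check on this construction, here is a minimal sketch on made-up toy data, using base R only. The `toy_shares` and `toy_shocks` objects and all of their values are hypothetical and are not taken from ADH. It builds the instrument with the same merge-then-sum logic and confirms that the result matches the matrix product of shares and shocks.
+
+```{r}
+# Toy data (hypothetical, not from ADH): 2 locations, 3 industries, 1 period
+toy_shares <- data.frame(
+  location  = c(1, 1, 1, 2, 2, 2),
+  industry  = c("a", "b", "c", "a", "b", "c"),
+  ind_share = c(0.2, 0.1, 0.0, 0.0, 0.3, 0.3)
+)
+toy_shocks <- data.frame(
+  industry = c("a", "b", "c"),
+  shock    = c(1, 2, 4)
+)
+
+# Long-format construction: merge shocks onto shares, then sum share * shock
+toy_long <- merge(toy_shares, toy_shocks, by = "industry")
+toy_long$z <- toy_long$ind_share * toy_long$shock
+toy_z <- aggregate(z ~ location, data = toy_long, FUN = sum)
+
+# Equivalent matrix form: a location-by-industry share matrix times the shocks
+S <- with(toy_shares, tapply(ind_share, list(location, industry), sum))
+g <- toy_shocks$shock[match(colnames(S), toy_shocks$industry)]
+z_matrix <- as.numeric(S %*% g)
+
+toy_z
+```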
+
+```{r}
+
+# Basic SSIV and balance 
+df |> 
+  feols(
+    y ~ post | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+```{r}
+df |> 
+  feols(
+    y_lag ~ post | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+*Comments:*
+
+The lagged-outcome IV estimate is much smaller than the main IV estimate (-0.131 vs. -0.746) and statistically insignificant. This tells us that regions that would go on to receive a large "China shock" in the post period were not on differential outcome trends in the pre period, which builds support for the validity of the instrument. To do this comparison properly we should use exposure-robust standard errors, e.g. via the `ssaggregate` command used below. But as you'll see below, the standard errors are not too different this way.
+
+
+
+2.  Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. How does the main IV estimate change? Why, intuitively, is this control important to include?
+
+```{r}
+# Add sum of shares control
+df <- adh_shares |> 
+  group_by(location, year) |> 
+  summarize(sum_share = sum(ind_share), .groups = "drop") |> 
+  left_join(df, by=c("location", "year"))
+
+summary(df$sum_share)
+```
+
+```{r}
+# SSIV with Sum of Shares 
+df |> 
+  feols(
+    y ~ post + sum_share | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+```{r}
+# Balance Test with Sum of Shares 
+df |> 
+  feols(
+    y_lag ~ post + sum_share | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+*Comments:*
+
+The sum-of-shares control should be included because ADH is a setting with "incomplete shares" (i.e. the sum of shares is not constant across location-years). Without this control the SSIV uses both the variation in shocks across industries and the average "size" of the shock, which enters through the sum of shares (unless the shocks are mean-zero, which you can see they are not).
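+
+To make the role of the sum of shares concrete, the sketch below uses a made-up share matrix `S` and shock vector `g` (both hypothetical, not ADH data). It splits the instrument into a "level" piece (a constant `mu` times the sum of shares) and a "within" piece built from demeaned shocks. The simple unweighted mean is used for `mu` purely for illustration; the identity holds for any constant. Once `sum_share` is controlled for, only the within piece is left.
+
+```{r}
+# Toy data (hypothetical): rows are locations, columns are industries
+S <- rbind(
+  c(0.10, 0.20, 0.10),  # location 1: shares sum to 0.4
+  c(0.30, 0.30, 0.20),  # location 2: shares sum to 0.8
+  c(0.05, 0.05, 0.10)   # location 3: shares sum to 0.2
+)
+g  <- c(1, 2, 3)        # industry shocks with a nonzero mean
+mu <- mean(g)           # illustrative choice of constant
+
+sum_share <- rowSums(S)
+z <- as.numeric(S %*% g)
+
+# z = (mu * sum of shares) + (shares * demeaned shocks)
+z_level  <- mu * sum_share
+z_within <- as.numeric(S %*% (g - mu))
+
+# After partialling out sum_share, z and z_within leave the same residual
+res_z      <- unname(resid(lm(z ~ sum_share)))
+res_within <- unname(resid(lm(z_within ~ sum_share)))
+```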
+
+
+3.  Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. How do both IV estimates change? Can you see why, intuitively, the interaction control shifts the main IV estimate so much?
+
+```{r}
+# SSIV with Sum of Shares x Year
+df |> 
+  feols(
+    y ~ post + i(post, sum_share) | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+```{r}
+# Balance Test with Sum of Shares x Year
+df |> 
+  feols(
+    y_lag ~ post + i(post, sum_share) | 0 | x ~ z, weights = ~weight, cluster = ~state
+  )
+```
+
+
+```{r}
+# Check why sum of shares matters
+adh_shocks |> 
+  feols(shock ~ i(year), cluster = ~industry)
+```
+
+*Comments:*
+
+Interacting the sum-of-shares control with year isolates the within-year variation in shocks. To see this, take the year fixed effects as the industry-level controls $q_{nt}$ (the "q_n" discussed in class) and note that to leverage this control we need to control for $\sum_n s_{\ell n t} q_{nt} = \left(\sum_n s_{\ell n t}\right) \times \text{period}_t$ in the location-year regression. You can see that the shock mean is quite different across periods (in the post period the average shock is significantly larger), so isolating the within-period variation makes a difference. Without this control the SSIV uses both within- and across-period shock variation, and the economic conditions in the two periods are quite different (causing omitted variable bias).
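+
+One way to see this is to split the instrument by period. The notation here is an assumption introduced for illustration: $g_{nt}$ is the shock to industry $n$ in period $t$, $\mu_t$ is any period-specific constant (for example the period's average shock), and $s_{\ell n t}$ is the exposure share. Then
+
+$$
+z_{\ell t} = \sum_n s_{\ell n t}\, g_{nt} = \mu_t \sum_n s_{\ell n t} + \sum_n s_{\ell n t}\,\left(g_{nt} - \mu_t\right)
+$$
+
+The first term is the sum of shares scaled by a period-specific constant, which is exactly what the sum-of-shares-by-period interaction absorbs. A single, non-interacted sum-of-shares control can only absorb it when $\mu_t$ is the same in both periods.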
+
+
+4.  Use the *ssaggregate* command to run both of the previous IV regressions at the shock level. You should control for year fixed effects in the shock-level IV regressions. The coefficients should be identical to the previous estimates, but the standard errors will be different. Comment on the change.
+
+```{r}
+# Get exposure-robust SEs with ssaggregate 
+ssagg <- ssaggregate(
+  data = df,
+  vars = ~ y + x + y_lag,
+  controls = ~ post + i(post, sum_share),
+  weights = "weight",
+  n = "industry",
+  l = "location",
+  t = "year",
+  s = "ind_share",
+  shares = adh_shares
+)
+
+ssagg_df <- ssagg |> 
+  left_join(adh_shocks, by = c("industry", "year"))
+```
+
+```{r}
+ssagg_df |> 
+  feols(y ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
+```
+
+```{r}
+ssagg_df |> 
+  feols(y_lag ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
+```
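+
+To build intuition for why the coefficients coincide, here is a minimal sketch of the location-level versus shock-level equivalence on made-up data (all values are hypothetical). It skips the controls and regression weights used in the exercise and assumes complete shares (each row of `S` sums to one), so it illustrates the algebra rather than reproducing what `ssaggregate` does internally.
+
+```{r}
+# Toy data (hypothetical): 4 locations (rows), 3 industries (columns)
+S <- rbind(
+  c(0.5, 0.3, 0.2),
+  c(0.2, 0.2, 0.6),
+  c(0.1, 0.7, 0.2),
+  c(0.4, 0.4, 0.2)
+)
+g <- c(1, -2, 3)              # industry shocks
+y <- c(2.0, -1.0, 0.5, 1.5)   # location outcomes
+x <- c(1.0, 0.5, -0.5, 2.0)   # location treatments
+
+# Location-level IV with a constant: cov(z, y) / cov(z, x)
+z <- as.numeric(S %*% g)
+beta_location <- cov(z, y) / cov(z, x)
+
+# Shock-level aggregates: exposure-weighted averages of demeaned y and x
+y_dm  <- y - mean(y)
+x_dm  <- x - mean(x)
+s_n   <- colSums(S)
+y_bar <- as.numeric(t(S) %*% y_dm) / s_n
+x_bar <- as.numeric(t(S) %*% x_dm) / s_n
+
+# Shock-level IV weighted by s_n, instrumenting x_bar with the shocks
+beta_shock <- sum(s_n * g * y_bar) / sum(s_n * g * x_bar)
+
+c(location = beta_location, shock = beta_shock)
+```
+
+The two numbers agree because summing `z * y_dm` over locations is the same as summing `s_n * g * y_bar` over industries. What changes at the shock level is the unit of observation, which is why the standard errors differ.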
+
+