Add R versions of solutions
kylebutts committed May 17, 2022
1 parent e7c25d5 commit 4360658
Showing 8 changed files with 749 additions and 34 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -48,3 +48,4 @@

## Word/Ppt Temp files
**/~$*
.Rproj.user
File renamed without changes.
191 changes: 191 additions & 0 deletions Lab/Solutions-R.Rmd
@@ -0,0 +1,191 @@
---
output: github_document
---

```{r setup, include = F}
# devtools::install_github("Hemken/Statamarkdown")
library(Statamarkdown)
```


# Mixtape SSIV Workshop: Coding Lab

This lab walks through some basic SSIV analyses using data from [Autor, Dorn, and Hanson](https://github.com/Mixtape-Sessions/Shift-Share/blob/main/Readings/Autor_Dorn_Hanson_2013) (ADH, 2013). As discussed in lecture, ADH use a shift-share instrument that aggregates Chinese import shocks across 397 manufacturing industries, with exposure weights calculated as the economy-wide share of (baseline) industry employment. We will use three cleaned datasets from their setup:

- `adh_shocks.dta`: an industry-by-year dataset of the shocks

- `adh_shares.dta`: a location-by-industry-by-year dataset of the shares

- `adh_noIV.dta`: a location-by-year dataset of the main outcome (manufacturing employment growth, `y`), treatment (local growth of China import exposure, `x`), and other useful variables -- excluding the ADH instrument
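
As a minimal sketch of how the shocks and shares combine into a location-level instrument, the toy example below builds `z` as the share-weighted sum of shocks. The numbers are made up for illustration and are not taken from the ADH files; only the column names (`location`, `industry`, `year`, `ind_share`, `shock`) follow the datasets described above. It uses base R only so that it runs without extra packages.

```r
# Toy data (hypothetical values, not from ADH): 2 locations x 2 industries, 1 year
shares <- data.frame(
  location  = c(1, 1, 2, 2),
  industry  = c("a", "b", "a", "b"),
  year      = 2000,
  ind_share = c(0.2, 0.3, 0.5, 0.1)
)
shocks <- data.frame(
  industry = c("a", "b"),
  year     = 2000,
  shock    = c(10, -4)
)

# Attach each industry's shock to every location-by-industry share row
merged <- merge(shares, shocks, by = c("industry", "year"))

# Instrument: sum over industries of share * shock, within location-year
merged$z <- merged$ind_share * merged$shock
toy_z <- aggregate(z ~ location + year, data = merged, FUN = sum)
toy_z <- toy_z[order(toy_z$location), ]
toy_z
```

Here location 1 gets 0.2 × 10 + 0.3 × (−4) = 0.8 and location 2 gets 0.5 × 10 + 0.1 × (−4) = 4.6. The toy shares deliberately sum to less than one within each location, which is the situation the "sum-of-shares" control in the later exercises is meant to address.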


## Exercises:

1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome on the treatment which controls for year (i.e., the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, which captures growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. Comment on the difference in the two IV regression coefficients.

```{r}
## Example solutions for SSIV Lab
## Written by Peter Hull & Kyle Butts
## 5/16/2022 (v1)
# Run once to install; left commented out so the package is not reinstalled on every knit
# devtools::install_github("kylebutts/ssaggregate")
library(tidyverse) # Data Cleaning
library(fixest) # Regressions
library(here) # Relative file paths
library(haven) # Reading .dta
library(ssaggregate)
# Construct z
adh_shares <- haven::read_dta(here("Lab/adh_shares.dta"))
adh_shocks <- haven::read_dta(here("Lab/adh_shocks.dta"))
df <- haven::read_dta(here("Lab/adh_noIV.dta"))
df <- adh_shares |>
  # Merge shares with shocks
  left_join(adh_shocks, by = c("industry", "year")) |>
  # Create \sum (shock * share)
  mutate(z = ind_share * shock) |>
  group_by(location, year) |>
  summarize(z = sum(z)) |>
  # Merge with no IV dataset
  left_join(df, by = c("location", "year"))
```

```{r}
# Basic SSIV and balance
df |>
  feols(
    y ~ post | 0 | x ~ z, weights = ~weight, cluster = ~state
  )
```

```{r}
df |>
  feols(
    y_lag ~ post | 0 | x ~ z, weights = ~weight, cluster = ~state
  )
```


*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*



2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. Comment on how the main IV estimate changes.

```{r}
# Add sum of shares control
df <- adh_shares |>
  group_by(location, year) |>
  summarize(sum_share = sum(ind_share)) |>
  left_join(df, by = c("location", "year"))
summary(df$sum_share)
```

```{r}
# SSIV with Sum of Shares
df |>
  feols(
    y ~ post + sum_share | 0 | x ~ z, weights = ~weight, cluster = ~state
  )
```

```{r}
# Balance Test with Sum of Shares
df |>
  feols(
    y_lag ~ post + sum_share | 0 | x ~ z, weights = ~weight, cluster = ~state
  )
```

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*



3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. Comment on how both IV estimates change. Can you see why the interaction control is important?

```{r}
# SSIV with Sum of Shares x Year
df |>
  feols(
    y ~ post + i(post, sum_share) | 0 | x ~ z, weights = ~weight, cluster = ~state
  )
```

```{r}
# Balance Test with Sum of Shares x Year
df |>
  feols(
    y_lag ~ post + i(post, sum_share) | 0 | x ~ z, weights = ~weight, cluster = ~state
  )
```


```{r}
# Check why sum of shares matters
adh_shocks |>
  feols(shock ~ i(year), cluster = ~industry)
```
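
One way to see why the year interaction matters is the following sketch. The notation is introduced here for illustration and is not from the lab text: $s_{\ell n t}$ is `ind_share`, $g_{nt}$ is `shock`, $S_{\ell t} = \sum_n s_{\ell n t}$ is `sum_share`, and the shocks are assumed to be as-good-as-randomly assigned within a period with a period-specific mean $\mu_t$.

$$
\mathbb{E}\left[z_{\ell t} \mid s\right]
= \sum_n s_{\ell n t}\, \mathbb{E}\left[g_{nt}\right]
= \mu_t \sum_n s_{\ell n t}
= \mu_t S_{\ell t}
$$

Under these assumptions the expected instrument varies with $S_{\ell t}$ at a slope that depends on the period. Controlling for `sum_share` alone absorbs this only if $\mu_t$ is the same in both periods; the regression of `shock` on year indicators above is a check on whether it is.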

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

4. Use the *ssaggregate* command to run both of the previous IV regressions at the shock level. You should control for year fixed effects in the shock-level IV regressions. The coefficients should be identical to the previous estimates, but the standard errors will be different. Comment on the change.

```{r}
# Get exposure-robust SEs with ssaggregate
ssagg <- ssaggregate(
  data = df,
  vars = ~ y + x + y_lag,
  controls = ~ post + i(post, sum_share),
  weights = "weight",
  n = "industry",
  l = "location",
  t = "year",
  s = "ind_share",
  shares = adh_shares
)
ssagg_df <- ssagg |>
  left_join(adh_shocks, by = c("industry", "year"))
```

```{r}
ssagg_df |>
  feols(y ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
```

```{r}
ssagg_df |>
  feols(y_lag ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
```
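
The statement in Exercise 4 that the coefficients are identical can be sketched algebraically. The notation is introduced here for illustration: $e_\ell$ is the regression weight (`weight`), $y_\ell^\perp$ and $x_\ell^\perp$ are the residuals of `y` and `x` from a weighted projection on the controls, and time subscripts are suppressed.

$$
\hat\beta
= \frac{\sum_\ell e_\ell z_\ell\, y_\ell^\perp}{\sum_\ell e_\ell z_\ell\, x_\ell^\perp}
= \frac{\sum_\ell e_\ell \left(\sum_n s_{\ell n} g_n\right) y_\ell^\perp}{\sum_\ell e_\ell \left(\sum_n s_{\ell n} g_n\right) x_\ell^\perp}
= \frac{\sum_n s_n g_n\, \bar y_n^\perp}{\sum_n s_n g_n\, \bar x_n^\perp},
\qquad
s_n = \sum_\ell e_\ell s_{\ell n},
\quad
\bar y_n^\perp = \frac{\sum_\ell e_\ell s_{\ell n}\, y_\ell^\perp}{s_n}
$$

with $\bar x_n^\perp$ defined in the same way as $\bar y_n^\perp$. The last expression has the form of an $s_n$-weighted IV regression of $\bar y_n^\perp$ on $\bar x_n^\perp$ with the shock $g_n$ as the instrument, which matches the shape of the `feols` calls on `ssagg_df` above. Because the residualization includes the year-interacted `sum_share` control, the $s_n$-weighted mean of $\bar y_n^\perp$ is zero within each period, so adding year fixed effects at the shock level leaves the ratio unchanged. Only the standard errors differ, since the shock-level regressions cluster by `industry` rather than by `state`.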

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

