Finished solutions written explanations
kylebutts committed May 18, 2022
1 parent 9db7aaa commit e6c8639
Showing 5 changed files with 95 additions and 129 deletions.
8 changes: 3 additions & 5 deletions Lab/README.md
@@ -11,7 +11,7 @@ This lab will walk through some basic SSIV analyses using data from [Autor, Dorn

## Exercises:

1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. Comment on the difference in the two IV regression coefficients.
1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. How does the latter IV regression help build support for the former IV regression?

*Main IV Estimate:*
*Standard Error:*
@@ -24,7 +24,7 @@ This lab will walk through some basic SSIV analyses using data from [Autor, Dorn



2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. Comment on how the main IV estimate changes.
2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. How does the main IV estimate change? Why, intuitively, is this control important to include?

*Main IV Estimate:*
*Standard Error:*
@@ -34,7 +34,7 @@ This lab will walk through some basic SSIV analyses using data from [Autor, Dorn

*Comments:*

3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. Comment on how both IV estimates change. Can you see why the interaction control is important?
3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. How do both IV estimates change? Can you see why, intuitively, the interaction control shifts the main IV estimate so much?

*Main IV Estimate:*
*Standard Error:*
@@ -51,5 +51,3 @@ This lab will walk through some basic SSIV analyses using data from [Autor, Dorn

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*
38 changes: 8 additions & 30 deletions Lab/Solutions-R.Rmd
@@ -21,7 +21,7 @@ This lab will walk through some basic SSIV analyses using data from [Autor, Dorn

## Exercises:

1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. Comment on the difference in the two IV regression coefficients.
1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. How does the latter IV regression help build support for the former IV regression?

```{r}
## Example solutions for SSIV Lab
@@ -68,18 +68,13 @@ df |>
)
```


*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

The lag outcome IV estimate is much smaller in magnitude than the main IV estimate (-0.131 vs -0.746) and statistically insignificant. This tells us that regions that would receive a large “China shock” in the post period were not on differential outcome trends in the pre period, which builds support for the validity of the instrument. To do this comparison properly we should use exposure-robust standard errors, i.e. those obtained with the *ssaggregate* command used below. As you’ll see below, though, the standard errors are not too different when computed that way.



2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. Comment on how the main IV estimate changes.
2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. How does the main IV estimate change? Why, intuitively, is this control important to include?

```{r}
# Add sum of shares control
@@ -107,17 +102,12 @@ df |>
)
```

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

The sum-of-shares control should be included because ADH is a setting with "incomplete shares" (i.e. the sum of shares is not constant across location-years). Without this control, the SSIV uses both the variation in shocks across industries and the average “size” of the shock through the sum of shares (unless the shocks are mean-zero, which you can see they are not).
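
One way to see this is to split the instrument into two pieces. The notation here is an assumption for illustration and is not taken from the lab itself: $s_{ln}$ is the share of industry $n$ in location $l$, $g_n$ is the shock, $\mu$ is any constant (for example the average shock), and $S_l = \sum_n s_{ln}$ is the sum of shares.

$$
z_l = \sum_n s_{ln} g_n = \sum_n s_{ln} (g_n - \mu) + \mu S_l
$$

With complete shares, $S_l = 1$ for every location and the second term is a constant absorbed by the intercept. With incomplete shares, $S_l$ varies across locations, so unless $\mu = 0$ the instrument picks up variation in $S_l$ directly. Controlling for the sum of shares removes that second piece.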


3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. Comment on how both IV estimates change. Can you see why the interaction control is important?
3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. How do both IV estimates change? Can you see why, intuitively, the interaction control shifts the main IV estimate so much?

```{r}
# SSIV with Sum of Shares x Year
@@ -142,13 +132,9 @@ adh_shocks |>
feols(shock ~ i(year), cluster = ~industry)
```

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*
Interacting the sum-of-shares control with year isolates the within-year variation in shocks. To see this, take the year fixed effects as the industry-level "q_n" discussed in class (here period indicators, $q_{nt}$) and note that to leverage this control we need to control for $\sum_n s_{lnt} q_{nt} = \left(\sum_n s_{lnt}\right) \times \text{period}_t$ in the location-year regression. You can see that the shock mean is quite different across periods (in the post period the average shock is significantly larger), so isolating the within-period variation makes a difference. Without this control the SSIV uses both within- and across-period shock variation, and the economic conditions in the two periods are quite different (causing omitted variable bias).


4. Use the *ssaggregate* command to run both of the previous IV regressions at the shock level. You should control for year fixed effects in the shock-level IV regressions. The coefficients should be identical to the previous estimates, but the standard errors will be different. Comment on the change.

@@ -180,12 +166,4 @@ ssagg_df |>
feols(y_lag ~ 1 | year | x ~ shock, weights = ~s_n, cluster = ~industry)
```

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*


70 changes: 36 additions & 34 deletions Lab/Solutions-R.md
@@ -30,8 +30,9 @@ setup:
variable), clustering by `state`. Then estimate the exact same IV
regression replacing the outcome `y` with the lagged outcome
`y_lag`, capturing growth in manufacturing employment that took
place before the ADH “China Shock” quasi-experiment. Comment on the
difference in the two IV regression coefficients.
place before the ADH “China Shock” quasi-experiment. How does the
latter IV regression help build support for the former IV
regression?

``` r
## Example solutions for SSIV Lab
@@ -132,17 +133,21 @@ df |>
F-test (1st stage), x: stat = 1,146.1, p < 2.2e-16 , on 1 and 1,441 DoF.
Wu-Hausman: stat = 11.2, p = 8.305e-4, on 1 and 1,440 DoF.

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

The lag outcome IV estimate is much smaller in magnitude than the main IV
estimate (-0.131 vs -0.746) and statistically insignificant. This tells
us that regions that would receive a large “China shock” in the post
period were not on differential outcome trends in the pre period, which
builds support for the validity of the instrument. To do this comparison
properly we should use exposure-robust standard errors, i.e. those
obtained with the *ssaggregate* command used below. As you’ll see below,
though, the standard errors are not too different when computed that way.

2. Construct the “sum-of-shares” control from the `adh_shares` dataset
and add this control to both of the previous IV regressions. Comment
on how the main IV estimate changes.
and add this control to both of the previous IV regressions. How
does the main IV estimate change? Why, intuitively, is this control
important to include?

``` r
# Add sum of shares control
@@ -208,18 +213,19 @@ df |>
F-test (1st stage), x: stat = 646.1 , p < 2.2e-16 , on 1 and 1,440 DoF.
Wu-Hausman: stat = 0.138437, p = 0.709894, on 1 and 1,439 DoF.

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

The sum-of-shares control should be included because ADH is a setting
with “incomplete shares” (i.e. the sum of shares is not constant across
location-years). Without this control, the SSIV uses both the variation
in shocks across industries and the average “size” of the shock through
the sum of shares (unless the shocks are mean-zero, which you can see
they are not).

3. Interact the “sum-of-shares” control with year and add this control
to both of the previous IV regressions. Comment on how both IV
estimates change. Can you see why the interaction control is
important?
to both of the previous IV regressions. How do both IV estimates
change? Can you see why, intuitively, the interaction control shifts
the main IV estimate so much?

``` r
# SSIV with Sum of Shares x Year
@@ -285,13 +291,17 @@ adh_shocks |>
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 28.0 Adj. R2: 0.046361

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*
*Comments:* Interacting the sum-of-shares control with year isolates the
within-year variation in shocks. To see this, take the year fixed effects
as the industry-level “q_n” discussed in class (here period indicators,
$q_{nt}$) and note that to leverage this control we need to control for
$\sum_n s_{lnt} q_{nt} = \left(\sum_n s_{lnt}\right) \times \text{period}_t$
in the location-year regression. You can see that the shock mean is quite
different across periods (in the post period the average shock is
significantly larger), so isolating the within-period variation makes a
difference. Without this control the SSIV uses both within- and
across-period shock variation, and the economic conditions in the two
periods are quite different (causing omitted variable bias).

4. Use the *ssaggregate* command to run both of the previous IV
regressions at the shock level. You should control for year fixed
@@ -354,11 +364,3 @@ ssagg_df |>
Within R2: 0.004348
F-test (1st stage), x: stat = 154.8 , p < 2.2e-16 , on 1 and 791 DoF.
Wu-Hausman: stat = 0.63358, p = 0.426284, on 1 and 790 DoF.

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*
34 changes: 9 additions & 25 deletions Lab/Solutions-Stata.Rmd
@@ -21,7 +21,7 @@ This lab will walk through some basic SSIV analyses using data from [Autor, Dorn

## Exercises:

1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. Comment on the difference in the two IV regression coefficients.
1. Construct the ADH (location-by-year) instrument by appropriately combining the data on shocks and shares. Merge this into the `adh_noIV` dataset, and estimate an IV regression of the outcome onto the treatment which controls for year (i.e. the `post` variable) and weights by baseline total employment (the `weight` variable), clustering by `state`. Then estimate the exact same IV regression replacing the outcome `y` with the lagged outcome `y_lag`, capturing growth in manufacturing employment that took place before the ADH "China Shock" quasi-experiment. How does the latter IV regression help build support for the former IV regression?

```{stata, collectcode = T}
/*****************************************/
@@ -54,16 +54,13 @@ ivreg2 y (x=z) post [aw=weight], cluster(state)
ivreg2 y_lag (x=z) post [aw=weight], cluster(state)
```

*Comments:*

*Main IV Estimate:*
*Standard Error:*
The lag outcome IV estimate is much smaller in magnitude than the main IV estimate (-0.131 vs -0.746) and statistically insignificant. This tells us that regions that would receive a large “China shock” in the post period were not on differential outcome trends in the pre period, which builds support for the validity of the instrument. To do this comparison properly we should use exposure-robust standard errors, i.e. those obtained with the *ssaggregate* command used below. As you’ll see below, though, the standard errors are not too different when computed that way.

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. Comment on how the main IV estimate changes.
2. Construct the "sum-of-shares" control from the `adh_shares` dataset and add this control to both of the previous IV regressions. How does the main IV estimate change? Why, intuitively, is this control important to include?

```{stata, collectcode = T}
@@ -88,15 +85,12 @@ ivreg2 y (x=z) post sum_share [aw=weight], cluster(state)
ivreg2 y_lag (x=z) post sum_share [aw=weight], cluster(state)
```

*Main IV Estimate:*
*Standard Error:*
*Comments:*

*Lag Outcome IV Estimate:*
*Standard Error:*
The sum-of-shares control should be included because ADH is a setting with "incomplete shares" (i.e. the sum of shares is not constant across location-years). Without this control, the SSIV uses both the variation in shocks across industries and the average “size” of the shock through the sum of shares (unless the shocks are mean-zero, which you can see they are not).

*Comments:*

3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. Comment on how both IV estimates change. Can you see why the interaction control is important?
3. Interact the "sum-of-shares" control with year and add this control to both of the previous IV regressions. How do both IV estimates change? Can you see why, intuitively, the interaction control shifts the main IV estimate so much?

```{stata, collectcode = T}
/* Interact sum of shares with year */
@@ -120,13 +114,10 @@ reg shock year, cluster(industry)
restore
```

*Main IV Estimate:*
*Standard Error:*
*Comments:*

*Lag Outcome IV Estimate:*
*Standard Error:*
Interacting the sum-of-shares control with year isolates the within-year variation in shocks. To see this, take the year fixed effects as the industry-level "q_n" discussed in class (here period indicators, $q_{nt}$) and note that to leverage this control we need to control for $\sum_n s_{lnt} q_{nt} = \left(\sum_n s_{lnt}\right) \times \text{period}_t$ in the location-year regression. You can see that the shock mean is quite different across periods (in the post period the average shock is significantly larger), so isolating the within-period variation makes a difference. Without this control the SSIV uses both within- and across-period shock variation, and the economic conditions in the two periods are quite different (causing omitted variable bias).

*Comments:*

4. Use the *ssaggregate* command to run both of the previous IV regressions at the shock level. You should control for year fixed effects in the shock-level IV regressions. The coefficients should be identical to the previous estimates, but the standard errors will be different. Comment on the change.

@@ -148,12 +139,5 @@ ivreg2 y (x=shock) i.year [aw=s_n], cluster(industry)
ivreg2 y_lag (x=shock) i.year [aw=s_n], cluster(industry)
```

*Main IV Estimate:*
*Standard Error:*

*Lag Outcome IV Estimate:*
*Standard Error:*

*Comments:*

