These code and data files replicate the results in "In-group bias in the Indian judiciary: Evidence from 5 million criminal cases" by Elliot Ash, Sam Asher, Aditi Bhowmick, Daniel Chen, Tanaya Devi, Christoph Goessman, Paul Novosad, Bilal Siddiqi (2021). A working paper version of the manuscript can be found here.
All data sources used in the paper are available in the paper's data packet. The primary data source is the recently digitized data from the eCourts platform (a semi-public system run by the Indian government to host summary data and full text from orders and judgements in courts across the country), covering the outcomes of close to the universe of criminal cases in India from 2010-2018. The data files are split by year and follow the naming convention "cases_clean_20xx".
The authors have legitimate access to and permission to use the data used in this manuscript.
Dataset | Description |
---|---|
cases_clean_2010 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2010. |
cases_clean_2011 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2011. |
cases_clean_2012 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2012. |
cases_clean_2013 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2013. |
cases_clean_2014 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2014. |
cases_clean_2015 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2015. |
cases_clean_2016 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2016. |
cases_clean_2017 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2017. |
cases_clean_2018 | The file contains data on all criminal court cases from Indian lower Judiciary from the year 2018. |
judges_clean | The file contains data on judges in all courts in the Indian lower judiciary from the eCourts platform. |
poi_master | The file contains data on People of India; only modules used in the data are shared. |
ACLED_India_violence_2005-2023 | The file contains data on violent conflict and protests in India, collected by ACLED (Armed Conflict Location & Event Data) which is an independent, impartial, international non-profit organization collecting data on violent conflict and protest across the world. |
acled_districts | The file contains keys to match the ACLED violence data to Indian districts. |
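
As a quick check before running the build, the naming convention above can be used to confirm that all nine yearly case files are present. The following is a minimal sketch, not part of the replication code: the helper names `expected_case_files` and `missing_case_files` are hypothetical, and the `.dta` extension is taken from the data layout listed at the end of this document.

```python
from pathlib import Path

# Years covered by the case files (cases_clean_2010 ... cases_clean_2018).
YEARS = range(2010, 2019)


def expected_case_files(extension=".dta"):
    """Return the expected case file names, one per year."""
    return [f"cases_clean_{year}{extension}" for year in YEARS]


def missing_case_files(data_dir):
    """List the expected case files that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected_case_files()
            if not (data_dir / name).is_file()]
```

Calling `missing_case_files` on the folder where the data package was unzipped should return an empty list if every yearly file is in place.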
This package is designed to be run on a *nix system with Python 3.2+, Matlab 2019+, and Stata 16+ installed. Data and code folders for the replication must not include spaces. This package may require modification to run on Windows due to the use of some Unix shell commands. This package was tested on a system with about 30 GB of memory.
The file `make_justice.do` describes the build and analysis process in detail.
To regenerate the tables and figures from the paper, take the following steps:
- Download and unzip the replication data package linked at the end of this document.
- Clone this repo (github) or copy all the code into a folder.
- Create a Python environment following the package list in `requirements.yml`. For example:

      conda env create -f requirements.yml -n py_justice
      conda activate py_justice

- Set the following environment variables so that Python will be able to find the data and output paths. From the Unix/OSX shell (before running Stata):

      export TMP=[path to working files]
      export OUT=[destination path for exhibits]
      export JDATA=[folder where the replication data package is unzipped]

- Open the do file `make_justice.do`, and set the globals `out`, `jdata`, `tmp`, and `jcode`. These need to match the environment variables set in the previous step!
  - `$out` is the target folder for all outputs, such as tables and graphs.
  - `$tmp` is the folder for the data files and temporary data files that will be created during the rebuild.
  - `$jdata` is the folder where you unzipped and saved the replication data package.
  - `$jcode` is the code folder of the clone of the replication repo.
- Run the do file `make_justice.do`. This will run through all the other do files to regenerate all of the results.
- We have included all the required programs to generate the main results. However, some of the estimation output commands (like `estout`) may fail if certain Stata packages are missing. These can be replaced by the estimation output commands preferred by the user.
- Please note we use globals for pathnames, which will cause errors if file paths have spaces in them. Please store code and data in paths that contain no spaces.
- This code was tested using Stata 16.0. Run time to generate all results on our server was about 8 hours.
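
Because a missing environment variable or a path with a space only surfaces as an error partway through the build, it can help to check the paths up front. The sketch below is an illustration under stated assumptions, not part of the replication package: `check_paths` is a hypothetical helper, and it only inspects the three environment variables named above (`TMP`, `OUT`, `JDATA`), not the Stata globals.

```python
import os
from pathlib import Path

# Environment variables the steps above ask you to export.
REQUIRED_VARS = ("TMP", "OUT", "JDATA")


def check_paths(env=None):
    """Return a list of human-readable problems with the replication paths."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        # Paths with spaces break the Stata globals used for pathnames.
        if " " in value:
            problems.append(f"{name} contains a space: {value!r}")
        if not Path(value).is_dir():
            problems.append(f"{name} is not an existing folder: {value!r}")
    return problems
```

Running `print(check_paths())` in the activated environment before launching Stata should print an empty list when everything is set up as described.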
The mapping of do files to tables and figures is as follows:
Exhibit | Code filename | Output Filename |
---|---|---|
Figure 1 | make_gender_coefplot.py | g_coef1.png, g_coef2.png, r_coef1.png, r_coef2.png |
Table 1 | judge_summary.do | judge_summary.tex |
Table 2 | table_rct_gender.do | gender_acquitted.tex , gender_decision.tex |
Table 3 | table_rct_religion.do | religion_acquitted.tex , religion_decision.tex |
Table 4 | table_victim_ramadan.do | victim_inter.tex |
Table 5 | test_same_lastname.do | last_names.tex |
Figure 2 | prep_lit_coefs.do , graph_scatter_pub_bias.do | lit_coef.png , pub_bias.png |
Table 6 | graph_scatter_pub_bias.do | pub_bias.tex |
Figure A5 | explore_discretion.do | judge_acquittal_resids.png |
Figure A6 | test_same_lastname_app.do | name_balance_coef_rcap.png |
Figure A7 | test_same_lastname_app.do | rare_names_weighted.png , rare_names_unweighted.png |
Table A2 | table_sample_representativeness.do | table_crime_in_sample.tex , table_state_in_sample.tex |
Table A3 | class_success.do | class_success.tex |
Table A5 | robustness_checks.do | gender_amb.tex |
Table A6 | robustness_checks.do | religion_amb.tex |
Table A7 | table_rct_gender.do | gender_non_convicted.tex |
Table A8 | table_rct_gender.do | gender_acquitted_amb.tex |
Table A9 | table_rct_religion.do | religion_non_convicted.tex |
Table A10 | table_rct_religion.do | religion_acquitted_amb.tex |
Table A11 | summary_stats.do | gbal.tex |
Table A12 | summary_stats.do | rbal.tex |
Table A13 | table_rct_statewise.do | output_sample_accounting_1.tex |
Table A14 | table_rct_statewise.do | output_sample_accounting_2.tex |
Table A15 | table_balance_extended.do | balance_extended_missing.tex |
Table A16 | table_balance_extended.do | balance_extended_lawyers.tex |
Table A17 | table_judge_type_by_crime_cat.do | table_judges_by_crime_category.tex |
Table A18 | explore_ambiguity.do | low_ambiguity_rcts.tex |
Table A19 | table_balance_lawyers.do | balance_lawyers.tex |
Table A20 | table_rct_lawyers.do | lawyers_religion.tex |
Table A21 | table_rct_lawyers.do | lawyers_gender.tex |
Table A22 | table_victim_ramadan.do | victim_inter_all_g.tex |
Table A23 | table_victim_ramadan.do | victim_inter_all_r.tex |
Table A24 | crimes_against_women.do | crimes_against_women.tex |
Table A25 | table_victim_ramadan.do | victim_inter_cy.tex |
Table A26 | table_rct_by_year.do | rct_2year_bins.tex |
Table A27 | tables_election_month.do | table_election_month.tex |
Table A28 | test_same_lastname.do | last_names_loc_year.tex |
Table A29 | test_same_lastname_app.do | surname_freq_table.tex |
Table A30 | table_ingroup_poi.do | table_balance_poi.tex |
Table A31 | table_ingroup_poi.do | table_ingroup_poi.tex |
Table B1 | table_balance.do | random_acq.tex |
The data to replicate this paper is available on Google Drive and at the Harvard Dataverse.
- The Google Drive version is recommended, because Harvard Dataverse requires us to split up the files in strange ways. If you download from the Harvard Dataverse, you need to: (1) unzip all case files separately into the `raw/` subfolder; (2) recombine the large 2018 case file: `cat cases_clean_2018_part_* > cases_clean_2018.zip` and put it into the `raw/` subfolder.
- The layout of data files should look like this when everything is unzipped. `$jdata` should point to the `raw` folder with the `*` below.
.
├── out
├── raw*
│   ├── acled_district_key.dta
│   ├── acled_districts.dta
│   ├── ACLED_India_violence_2005-2023.csv
│   ├── cases_clean_2010.dta
│   ├── cases_clean_2011.dta
│   ├── cases_clean_2012.dta
│   ├── cases_clean_2013.dta
│   ├── cases_clean_2014.dta
│   ├── cases_clean_2015.dta
│   ├── cases_clean_2016.dta
│   ├── cases_clean_2017.dta
│   ├── cases_clean_2018.dta
│   ├── cases_state_key.dta
│   ├── classification
│   │   └── pooled_names_clean_appended.dta
│   ├── judges_clean.dta
│   ├── keys
│   │   ├── acled_district_key.dta
│   │   ├── cases_district_key.dta
│   │   ├── cases_state_key.dta
│   │   ├── disp_name_key.dta
│   │   ├── pc11_court_district_key.dta
│   │   ├── purpose_name_key.dta
│   │   └── type_name_key.dta
│   ├── lit_coefs.dta
│   ├── names
│   ├── poi_master.dta
│   └── raw
│       └── ACLED_India_violence_2005-2023.csv
└── tmp
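
For the Harvard Dataverse download, the `cat` recombination step can also be done from Python, which may be convenient on systems without a Unix shell. This is a minimal sketch, not part of the replication code: `recombine_parts` and `unzip_into` are hypothetical helpers, and the sketch assumes the part files sort correctly by name, as they do for the shell glob in the `cat` command.

```python
import shutil
import zipfile
from pathlib import Path


def recombine_parts(folder, pattern="cases_clean_2018_part_*",
                    target="cases_clean_2018.zip"):
    """Concatenate split parts in sorted name order, like `cat part_* > target`."""
    folder = Path(folder)
    parts = sorted(folder.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern} in {folder}")
    target_path = folder / target
    with open(target_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream each part so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return target_path


def unzip_into(archive, destination):
    """Extract a zip archive into the destination folder (e.g. raw/)."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
```

A typical use would be `unzip_into(recombine_parts("downloads"), "raw")`, where `downloads` is a placeholder for the folder holding the part files.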