This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Adding datasets (#21)
* added datasets/ folder

* added esoph dataset and README
storopoli committed Dec 10, 2023
1 parent e19c504 commit fd6779d
Showing 7 changed files with 4,054 additions and 1 deletion.
40 changes: 39 additions & 1 deletion README.md
@@ -61,6 +61,28 @@ Thousands of users rely on `Stan` for statistical modeling, data analysis, and p
Models specified using `Turing.jl` are easy to read and write — models work the way you write them.
Like everything in Julia, `Turing.jl` is [fast](https://arxiv.org/abs/2002.02702).

## Datasets

- `kidiq` (linear regression): data from a survey of adult American women and their children
(a subsample from the National Longitudinal Survey of Youth).
Source: Gelman and Hill (2007).
- `wells` (logistic regression): a survey of 3200 residents in a small area of Bangladesh suffering
from arsenic contamination of groundwater.
Respondents with elevated arsenic levels in their wells had been encouraged to switch their water source
to a safe public or private well in the nearby area,
and the survey was conducted several years later to
learn which of the affected residents had switched wells.
Source: Gelman and Hill (2007).
- `esoph` (ordinal regression): data from a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France.
Source: Breslow and Day (1980).
- `roaches` (Poisson regression): data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.
Source: Gelman and Hill (2007).
- `duncan` (robust regression): data on occupational prestige that contain several outliers.
Source: Duncan (1961).
- `cheese` (hierarchical models): cheese ratings.
Ten rural and ten urban raters each rated four types of cheese (A, B, C, and D) in two samples.
Source: Boatwright, McCulloch and Rossi (1999).
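
As a quick sanity check on the `cheese` description, the sketch below parses the first rows of `datasets/cheese.csv` (copied verbatim from the file) and averages the rating `y` per cheese. It is only an illustration in Python using the standard library: the helper names `load_rows` and `mean_by` are hypothetical and not part of this repository, and the course itself targets Julia and `Turing.jl`.

```python
import csv
import io
from collections import defaultdict

# First rows of datasets/cheese.csv (rater 1, rural), copied from the file.
CHEESE_EXCERPT = '''"cheese","rater","background","y"
"A",1,"rural",67
"A",1,"rural",66
"B",1,"rural",51
"B",1,"rural",53
"C",1,"rural",75
"C",1,"rural",70
"D",1,"rural",68
"D",1,"rural",66
'''


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["rater"] = int(row["rater"])
        row["y"] = int(row["y"])
        rows.append(row)
    return rows


def mean_by(rows, key, value="y"):
    """Average `value` within each level of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {level: sum(vals) / len(vals) for level, vals in groups.items()}


rows = load_rows(CHEESE_EXCERPT)
means = mean_by(rows, "cheese")

# Full design described above: 10 raters x 2 backgrounds x 4 cheeses x 2 samples.
expected_rows = 10 * 2 * 4 * 2
```

The design arithmetic gives 160 data rows, which together with the header line accounts for the 161 lines of `datasets/cheese.csv`. Note that the `rater` column runs from 1 to 10 within each background, so a grouping variable for a hierarchical model would presumably need the pair (`background`, `rater`) rather than `rater` alone.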

## Author

Jose Storopoli, PhD - [*Lattes* CV](http://lattes.cnpq.br/2281909649311607) - [ORCID](https://orcid.org/0000-0002-0559-5176) - <https://storopoli.io>
@@ -77,7 +99,7 @@ I've made it to be how I would have liked to be introduced to Bayesian statistic

## References

The references are divided in **books**, **papers** and **software**.
The references are divided into **books**, **papers**, **software**, and **datasets**.

### Books

@@ -158,6 +180,14 @@ The papers section of the references are divided into **required** and **complem
problems of p values. *Psychonomic Bulletin & Review*, *14*(5),
779–804.
https://doi.org/10.3758/BF03194105
- Vandekerckhove, J., Matzke, D., Wagenmakers, E.-J., & others. (2015).
Model comparison and the principle of parsimony.
In J. R. Busemeyer, Z. Wang, J. T. Townsend, & A. Eidels (Eds.),
Oxford handbook of computational and mathematical psychology (pp. 300–319).
Oxford University Press.
- Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation
using leave-one-out cross-validation and WAIC. *Statistics and Computing*, *27*(5), 1413–1432.
https://doi.org/10.1007/s11222-016-9696-4

#### Complementary

@@ -221,6 +251,14 @@ The papers section of the references are divided into **required** and **complem
- Tarek, M., Xu, K., Trapp, M., Ge, H., & Ghahramani, Z. (2020). DynamicPPL: Stan-like Speed for Dynamic Probabilistic Models. ArXiv:2002.02702 [Cs, Stat]. http://arxiv.org/abs/2002.02702
- Xu, K., Ge, H., Tebbutt, W., Tarek, M., Trapp, M., & Ghahramani, Z. (2020). AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms. Symposium on Advances in Approximate Bayesian Inference, 1–10. http://proceedings.mlr.press/v118/xu20a.html

### Datasets

- Boatwright, P., McCulloch, R., & Rossi, P. (1999). Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. _Journal of the American Statistical Association_, 94(448), 1063–1073.
- Breslow, N. E. & Day, N. E. (1980). **Statistical Methods in Cancer Research. Volume 1: The Analysis of Case-Control Studies**. IARC Lyon / Oxford University Press.
- Duncan, O. D. (1961). A socioeconomic index for all occupations. Class: Critical Concepts, 1, 388–426.
- Gelman, A., & Hill, J. (2007). **Data analysis using regression and
multilevel/hierarchical models**. Cambridge University Press.

## How to cite

To cite this course, please use:
161 changes: 161 additions & 0 deletions datasets/cheese.csv
@@ -0,0 +1,161 @@
"cheese","rater","background","y"
"A",1,"rural",67
"A",1,"rural",66
"B",1,"rural",51
"B",1,"rural",53
"C",1,"rural",75
"C",1,"rural",70
"D",1,"rural",68
"D",1,"rural",66
"A",2,"rural",76
"A",2,"rural",76
"B",2,"rural",56
"B",2,"rural",65
"C",2,"rural",82
"C",2,"rural",82
"D",2,"rural",81
"D",2,"rural",77
"A",3,"rural",80
"A",3,"rural",84
"B",3,"rural",71
"B",3,"rural",67
"C",3,"rural",82
"C",3,"rural",82
"D",3,"rural",71
"D",3,"rural",70
"A",4,"rural",79
"A",4,"rural",83
"B",4,"rural",63
"B",4,"rural",60
"C",4,"rural",80
"C",4,"rural",83
"D",4,"rural",66
"D",4,"rural",71
"A",5,"rural",74
"A",5,"rural",72
"B",5,"rural",55
"B",5,"rural",58
"C",5,"rural",76
"C",5,"rural",79
"D",5,"rural",75
"D",5,"rural",74
"A",6,"rural",66
"A",6,"rural",71
"B",6,"rural",49
"B",6,"rural",49
"C",6,"rural",72
"C",6,"rural",70
"D",6,"rural",65
"D",6,"rural",65
"A",7,"rural",63
"A",7,"rural",68
"B",7,"rural",38
"B",7,"rural",33
"C",7,"rural",72
"C",7,"rural",71
"D",7,"rural",70
"D",7,"rural",72
"A",8,"rural",67
"A",8,"rural",60
"B",8,"rural",56
"B",8,"rural",48
"C",8,"rural",72
"C",8,"rural",70
"D",8,"rural",67
"D",8,"rural",69
"A",9,"rural",59
"A",9,"rural",65
"B",9,"rural",41
"B",9,"rural",48
"C",9,"rural",61
"C",9,"rural",70
"D",9,"rural",57
"D",9,"rural",59
"A",10,"rural",72
"A",10,"rural",64
"B",10,"rural",55
"B",10,"rural",57
"C",10,"rural",78
"C",10,"rural",78
"D",10,"rural",72
"D",10,"rural",67
"A",1,"urban",67
"A",1,"urban",64
"B",1,"urban",44
"B",1,"urban",54
"C",1,"urban",75
"C",1,"urban",83
"D",1,"urban",64
"D",1,"urban",60
"A",2,"urban",83
"A",2,"urban",85
"B",2,"urban",60
"B",2,"urban",63
"C",2,"urban",91
"C",2,"urban",90
"D",2,"urban",67
"D",2,"urban",69
"A",3,"urban",74
"A",3,"urban",75
"B",3,"urban",58
"B",3,"urban",54
"C",3,"urban",77
"C",3,"urban",80
"D",3,"urban",64
"D",3,"urban",66
"A",4,"urban",81
"A",4,"urban",84
"B",4,"urban",58
"B",4,"urban",62
"C",4,"urban",86
"C",4,"urban",90
"D",4,"urban",71
"D",4,"urban",67
"A",5,"urban",83
"A",5,"urban",79
"B",5,"urban",54
"B",5,"urban",57
"C",5,"urban",91
"C",5,"urban",82
"D",5,"urban",78
"D",5,"urban",84
"A",6,"urban",84
"A",6,"urban",82
"B",6,"urban",65
"B",6,"urban",61
"C",6,"urban",80
"C",6,"urban",87
"D",6,"urban",78
"D",6,"urban",84
"A",7,"urban",89
"A",7,"urban",84
"B",7,"urban",70
"B",7,"urban",76
"C",7,"urban",79
"C",7,"urban",89
"D",7,"urban",89
"D",7,"urban",91
"A",8,"urban",80
"A",8,"urban",80
"B",8,"urban",62
"B",8,"urban",56
"C",8,"urban",85
"C",8,"urban",86
"D",8,"urban",77
"D",8,"urban",79
"A",9,"urban",74
"A",9,"urban",73
"B",9,"urban",56
"B",9,"urban",61
"C",9,"urban",78
"C",9,"urban",79
"D",9,"urban",77
"D",9,"urban",82
"A",10,"urban",77
"A",10,"urban",82
"B",10,"urban",64
"B",10,"urban",57
"C",10,"urban",86
"C",10,"urban",86
"D",10,"urban",81
"D",10,"urban",83
46 changes: 46 additions & 0 deletions datasets/duncan.csv
@@ -0,0 +1,46 @@
"profession","type","income","education","prestige"
"accountant","prof",62,86,82
"pilot","prof",72,76,83
"architect","prof",75,92,90
"author","prof",55,90,76
"chemist","prof",64,86,90
"minister","prof",21,84,87
"professor","prof",64,93,93
"dentist","prof",80,100,90
"reporter","wc",67,87,52
"engineer","prof",72,86,88
"undertaker","prof",42,74,57
"lawyer","prof",76,98,89
"physician","prof",76,97,97
"welfare.worker","prof",41,84,59
"teacher","prof",48,91,73
"conductor","wc",76,34,38
"contractor","prof",53,45,76
"factory.owner","prof",60,56,81
"store.manager","prof",42,44,45
"banker","prof",78,82,92
"bookkeeper","wc",29,72,39
"mail.carrier","wc",48,55,34
"insurance.agent","wc",55,71,41
"store.clerk","wc",29,50,16
"carpenter","bc",21,23,33
"electrician","bc",47,39,53
"RR.engineer","bc",81,28,67
"machinist","bc",36,32,57
"auto.repairman","bc",22,22,26
"plumber","bc",44,25,29
"gas.stn.attendant","bc",15,29,10
"coal.miner","bc",7,7,15
"streetcar.motorman","bc",42,26,19
"taxi.driver","bc",9,19,10
"truck.driver","bc",21,15,13
"machine.operator","bc",21,20,24
"barber","bc",16,26,20
"bartender","bc",16,28,7
"shoe.shiner","bc",9,17,3
"cook","bc",14,22,16
"soda.clerk","bc",12,30,6
"watchman","bc",17,25,11
"janitor","bc",7,20,8
"policeman","bc",34,47,41
"waiter","bc",8,32,10
89 changes: 89 additions & 0 deletions datasets/esoph.csv
@@ -0,0 +1,89 @@
"agegp","alcgp","tobgp","ncases","ncontrols"
"25-34","0-39g/day","0-9g/day",0,40
"25-34","0-39g/day","10-19",0,10
"25-34","0-39g/day","20-29",0,6
"25-34","0-39g/day","30+",0,5
"25-34","40-79","0-9g/day",0,27
"25-34","40-79","10-19",0,7
"25-34","40-79","20-29",0,4
"25-34","40-79","30+",0,7
"25-34","80-119","0-9g/day",0,2
"25-34","80-119","10-19",0,1
"25-34","80-119","30+",0,2
"25-34","120+","0-9g/day",0,1
"25-34","120+","10-19",1,0
"25-34","120+","20-29",0,1
"25-34","120+","30+",0,2
"35-44","0-39g/day","0-9g/day",0,60
"35-44","0-39g/day","10-19",1,13
"35-44","0-39g/day","20-29",0,7
"35-44","0-39g/day","30+",0,8
"35-44","40-79","0-9g/day",0,35
"35-44","40-79","10-19",3,20
"35-44","40-79","20-29",1,13
"35-44","40-79","30+",0,8
"35-44","80-119","0-9g/day",0,11
"35-44","80-119","10-19",0,6
"35-44","80-119","20-29",0,2
"35-44","80-119","30+",0,1
"35-44","120+","0-9g/day",2,1
"35-44","120+","10-19",0,3
"35-44","120+","20-29",2,2
"45-54","0-39g/day","0-9g/day",1,45
"45-54","0-39g/day","10-19",0,18
"45-54","0-39g/day","20-29",0,10
"45-54","0-39g/day","30+",0,4
"45-54","40-79","0-9g/day",6,32
"45-54","40-79","10-19",4,17
"45-54","40-79","20-29",5,10
"45-54","40-79","30+",5,2
"45-54","80-119","0-9g/day",3,13
"45-54","80-119","10-19",6,8
"45-54","80-119","20-29",1,4
"45-54","80-119","30+",2,2
"45-54","120+","0-9g/day",4,0
"45-54","120+","10-19",3,1
"45-54","120+","20-29",2,1
"45-54","120+","30+",4,0
"55-64","0-39g/day","0-9g/day",2,47
"55-64","0-39g/day","10-19",3,19
"55-64","0-39g/day","20-29",3,9
"55-64","0-39g/day","30+",4,2
"55-64","40-79","0-9g/day",9,31
"55-64","40-79","10-19",6,15
"55-64","40-79","20-29",4,13
"55-64","40-79","30+",3,3
"55-64","80-119","0-9g/day",9,9
"55-64","80-119","10-19",8,7
"55-64","80-119","20-29",3,3
"55-64","80-119","30+",4,0
"55-64","120+","0-9g/day",5,5
"55-64","120+","10-19",6,1
"55-64","120+","20-29",2,1
"55-64","120+","30+",5,1
"65-74","0-39g/day","0-9g/day",5,43
"65-74","0-39g/day","10-19",4,10
"65-74","0-39g/day","20-29",2,5
"65-74","0-39g/day","30+",0,2
"65-74","40-79","0-9g/day",17,17
"65-74","40-79","10-19",3,7
"65-74","40-79","20-29",5,4
"65-74","80-119","0-9g/day",6,7
"65-74","80-119","10-19",4,8
"65-74","80-119","20-29",2,1
"65-74","80-119","30+",1,0
"65-74","120+","0-9g/day",3,1
"65-74","120+","10-19",1,1
"65-74","120+","20-29",1,0
"65-74","120+","30+",1,0
"75+","0-39g/day","0-9g/day",1,17
"75+","0-39g/day","10-19",2,4
"75+","0-39g/day","30+",1,2
"75+","40-79","0-9g/day",2,3
"75+","40-79","10-19",1,2
"75+","40-79","20-29",0,3
"75+","40-79","30+",1,0
"75+","80-119","0-9g/day",1,0
"75+","80-119","10-19",1,0
"75+","120+","0-9g/day",2,0
"75+","120+","10-19",1,0