Start a summary check list
rdpeng committed Jan 21, 2014
1 parent 2e3bd19 commit 1fcc5e0
Showing 3 changed files with 393 additions and 0 deletions.
107 changes: 107 additions & 0 deletions 05_ReproducibleResearch/Checklist/index.Rmd
@@ -13,3 +13,110 @@ url:
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
---

## DO: Start With Good Science

* Garbage in, garbage out

* Coherent, focused question simplifies many problems

* Working with good collaborators reinforces good practices

* Something that's interesting to you will (hopefully) motivate good
habits

---

## DON'T: Do Things By Hand



* Editing spreadsheets of data to "clean it up"

    - Removing outliers
    - QA / QC
    - Validating

* Editing tables or figures (e.g. rounding, formatting)

* Downloading data from a web site (clicking links in a web browser)

* Moving data around your computer; splitting / reformatting data files

* "We're just going to do this once...."


Things done by hand need to be precisely documented (this is harder
than it sounds)
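
Manual cleaning steps like these can often be replaced by a short script. The following is a minimal sketch, assuming a hypothetical data frame with a numeric `value` column and an illustrative outlier rule (more than `k` median absolute deviations from the median). Neither the column name nor the rule comes from the slides.

```r
# Hypothetical cleaning step: the outlier rule lives in code instead of
# in undocumented spreadsheet edits.
remove_outliers <- function(df, column = "value", k = 3) {
  x <- df[[column]]
  center <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)
  # Keep non-missing values within k MADs of the median.
  keep <- !is.na(x) & abs(x - center) <= k * spread
  df[keep, , drop = FALSE]
}

# Toy data standing in for a spreadsheet (an assumption for illustration).
raw <- data.frame(id = 1:6, value = c(10, 11, 9, 10, 12, 250))
clean <- remove_outliers(raw)
```

The raw data stay untouched and the rule is documented by the code itself. The same idea applies to downloads: calling `download.file()` from a script records the source URL in code rather than in a browser history.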

---

## DON'T: Point And Click


---

## DO: Teach a Computer


---

## DO: Use Some Version Control

* Slow things down

* Add changes in small chunks (don't just do one massive commit)

* Track / tag snapshots; revert to old versions

* Services like GitHub / BitBucket / SourceForge make it easy to
publish results

---

## DO: Keep Track of Your Software Environment


---

## DO: Keep Track of Your Software Environment


```{r}
sessionInfo()
```
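
One way to keep that record alongside the analysis is to write it to a file that is committed with the code. This is a minimal sketch; the file name `session_info.txt` is an assumption, not something the slides prescribe.

```r
# Capture the printed sessionInfo() output as a character vector and save it.
info_file <- "session_info.txt"  # hypothetical file name
writeLines(capture.output(sessionInfo()), info_file)

# Read it back to confirm what was recorded.
recorded <- readLines(info_file)
```

Regenerating this file whenever the analysis is rerun keeps the recorded R and package versions in step with the results.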

---

## DON'T: Save Output


---

## DO: Think About the Pipeline

* Data analysis is a lengthy process

* How you got to the end is just as important as the end itself

* The farther back in the pipeline you can "preserve" the better
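
A minimal sketch of a pipeline in which every stage after the raw data is a function, so any intermediate or final output can be regenerated instead of being saved by hand. The stage names and the toy data are hypothetical.

```r
# Each stage is a function of the previous stage's output.
load_raw <- function() {
  # Stand-in for reading the original data file (toy data, an assumption).
  data.frame(group = c("a", "a", "b", "b"), value = c(1, 3, 10, 14))
}

process <- function(raw) {
  # Derived variable computed in code, not edited by hand.
  transform(raw, value_log = log(value))
}

summarise_groups <- function(processed) {
  # Mean of value within each group.
  aggregate(value ~ group, data = processed, FUN = mean)
}

run_pipeline <- function() {
  summarise_groups(process(load_raw()))
}

result <- run_pipeline()
```

Because `run_pipeline()` starts from the raw data, rerunning it reproduces `result` without relying on saved intermediate files.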

---

## Summary: Checklist

* Are we doing good science?

* Was any part of this analysis done by hand?
    - If so, are those parts *precisely* documented?

* Have we taught a computer to do as much as possible (i.e. coded)?

* Are we using a version control system?

* Have we documented our software environment?

* Have we saved any output that we cannot reconstruct from original
data + code?

* How far back in the analysis pipeline can we go before our results
are no longer (automatically) reproducible?
159 changes: 159 additions & 0 deletions 05_ReproducibleResearch/Checklist/index.html
@@ -50,10 +50,169 @@ <h2>What to Do and What Not to Do</h2>
<!-- SLIDES -->
<slide class="" id="slide-1" style="background:;">
<hgroup>
<h2>DO: Start With Good Science</h2>
</hgroup>
<article>
<ul>
<li><p>Garbage in, garbage out</p></li>
<li><p>Coherent, focused question simplifies many problems</p></li>
<li><p>Working with good collaborators reinforces good practices</p></li>
<li><p>Something that&#39;s interesting to you will (hopefully) motivate good
habits</p></li>
</ul>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-2" style="background:;">
<hgroup>
<h2>DON&#39;T: Do Things By Hand</h2>
</hgroup>
<article>
<ul>
<li><p>Editing spreadsheets of data to &quot;clean it up&quot;</p>

<ul>
<li>Removing outliers</li>
<li>QA / QC</li>
<li>Validating</li>
</ul></li>
<li><p>Editing tables or figures (e.g. rounding, formatting)</p></li>
<li><p>Downloading data from a web site (clicking links in a web browser)</p></li>
<li><p>Moving data around your computer; splitting / reformatting data files</p></li>
<li><p>&quot;We&#39;re just going to do this once....&quot;</p></li>
</ul>

<p>Things done by hand need to be precisely documented (this is harder
than it sounds)</p>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-3" style="background:;">
<hgroup>
<h2>DON&#39;T: Point And Click</h2>
</hgroup>
<article>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-4" style="background:;">
<hgroup>
<h2>DO: Teach a Computer</h2>
</hgroup>
<article>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-5" style="background:;">
<hgroup>
<h2>DO: Use Some Version Control</h2>
</hgroup>
<article>
<ul>
<li><p>Slow things down</p></li>
<li><p>Add changes in small chunks (don&#39;t just do one massive commit)</p></li>
<li><p>Track / tag snapshots; revert to old versions</p></li>
<li><p>Services like GitHub / BitBucket / SourceForge make it easy to
publish results</p></li>
</ul>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-6" style="background:;">
<hgroup>
<h2>DO: Keep Track of Your Software Environment</h2>
</hgroup>
<article>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-7" style="background:;">
<hgroup>
<h2>DO: Keep Track of Your Software Environment</h2>
</hgroup>
<article>
<pre><code class="r">sessionInfo()
</code></pre>

<pre><code>## R version 3.0.2 Patched (2013-12-30 r64600)
## Platform: x86_64-apple-darwin13.0.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets base
##
## other attached packages:
## [1] slidify_0.3.3
##
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.1 formatR_0.10 knitr_1.5 markdown_0.6.3
## [5] stringr_0.6.2 tools_3.0.2 whisker_0.3-2 yaml_2.1.8
</code></pre>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-8" style="background:;">
<hgroup>
<h2>DON&#39;T: Save Output</h2>
</hgroup>
<article>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-9" style="background:;">
<hgroup>
<h2>DO: Think About the Pipeline</h2>
</hgroup>
<article>
<ul>
<li><p>Data analysis is a lengthy process</p></li>
<li><p>How you got to the end is just as important as the end itself</p></li>
<li><p>The farther back in the pipeline you can &quot;preserve&quot; the better</p></li>
</ul>

</article>
<!-- Presenter Notes -->
</slide>

<slide class="" id="slide-10" style="background:;">
<hgroup>
<h2>Summary: Checklist</h2>
</hgroup>
<article>
<ul>
<li><p>Are we doing good science?</p></li>
<li><p>Was any part of this analysis done by hand?</p>

<ul>
<li>If so, are those parts <em>precisely</em> documented?</li>
</ul></li>
<li><p>Have we taught a computer to do as much as possible (i.e. coded)?</p></li>
<li><p>Are we using a version control system?</p></li>
<li><p>Have we documented our software environment?</p></li>
<li><p>Have we saved any output that we cannot reconstruct from original
data + code?</p></li>
<li><p>How far back in the analysis pipeline can we go before our results
are no longer (automatically) reproducible?</p></li>
</ul>

</article>
<!-- Presenter Notes -->
</slide>