The admixr package provides a convenient R interface to ADMIXTOOLS, a widely used software package for calculating admixture statistics and testing population admixture hypotheses.
A typical ADMIXTOOLS workflow often involves a combination of sed/awk/shell scripting and manual editing to create different configuration files. These are then passed as command-line arguments to one of the ADMIXTOOLS commands and control how a particular analysis is run. The results of such computations are then usually redirected to another file, which the user must parse to extract values of interest, often using command-line utilities again or by manual copy-pasting, before finally analysing them in R, Excel or another program.
This workflow can be a little cumbersome, especially if one wants to explore many hypotheses involving different combinations of populations or data filtering strategies. Most importantly, it makes it difficult to follow the rules of best practice for reproducible science, especially given the need for manual intervention on the command-line or custom shell scripting to orchestrate more complex pipelines.
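To make the manual steps described above more concrete, here is a minimal sketch of what preparing a single D statistic run by hand might look like, written in R only for consistency with the rest of this document. The data prefix `data/snps` and the file names `dstat.pop`, `dstat.par` and `dstat.log` are hypothetical placeholders, and the sketch only prints the command rather than running it.

```r
# Hypothetical EIGENSTRAT prefix; replace with the location of your own data
prefix <- "data/snps"
workdir <- tempdir()

# 1. Population file: one quadruple of populations (W X Y Z) per line
pop_file <- file.path(workdir, "dstat.pop")
writeLines("French Yoruba Vindija Chimp", pop_file)

# 2. Parameter file pointing qpDstat at the data and the population file
par_file <- file.path(workdir, "dstat.par")
writeLines(c(
  paste0("genotypename: ", prefix, ".geno"),
  paste0("snpname: ", prefix, ".snp"),
  paste0("indivname: ", prefix, ".ind"),
  paste0("popfilename: ", pop_file)
), par_file)

# 3. The command one would then run in a shell, with the log redirected
#    to a file that still has to be parsed afterwards
log_file <- file.path(workdir, "dstat.log")
cmd <- sprintf("qpDstat -p %s > %s", par_file, log_file)
cat(cmd, "\n")
```

Every new combination of populations means editing or regenerating these files and parsing a new log, which is the bookkeeping that quickly becomes error-prone.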
admixr makes it possible to perform all stages of an ADMIXTOOLS analysis entirely from R. It provides a set of convenient functions that completely remove the need for "low-level" configuration of individual ADMIXTOOLS programs, allowing users to focus on the analysis itself.
admixr is now published as an Application Note in the journal Bioinformatics. If you use it in your work, please cite the paper! You will be in the excellent company of papers that have used it to do amazing research. 🙂
You can try out admixr without installation directly in your browser! Simply click on the Binder badge, and after a short moment you will get a Binder RStudio cloud session running in your web browser. However, please note that Binder's computational resources are extremely limited, so you might run into issues if you try to run resource-intensive computations.
The package is available on CRAN. You can install it simply by running
install.packages("admixr")
from your R session. This is the recommended procedure for most users.
To install the development version from GitHub (which might be slightly ahead of the stable CRAN release in terms of new features and bug fixes), you need the R package devtools. You can run:
install.packages("devtools")
devtools::install_github("bodkan/admixr")
In order to use the admixr package, you need a working installation of ADMIXTOOLS. You can find installation instructions here.
Furthermore, you also need to make sure that R can find ADMIXTOOLS binaries on the $PATH. You can achieve this by specifying PATH=<path to the location of ADMIXTOOLS programs> in the .Renviron file in your home directory. If R cannot find ADMIXTOOLS utilities, you will get a warning upon loading library(admixr) in your R session.
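As a quick sanity check of the setup described above, the following sketch asks R where it would find one of the ADMIXTOOLS programs. It uses only base R; `qpDstat` is just one example binary, and the install location in the comment is a hypothetical placeholder. Appending `${PATH}` so that existing entries are preserved is a suggestion on my part, not something admixr requires.

```r
# In ~/.Renviron (hypothetical install location; adjust to your system):
#   PATH=/path/to/AdmixTools/bin:${PATH}

# After restarting R, check whether an ADMIXTOOLS program is visible to R
dstat_path <- Sys.which("qpDstat")

if (nzchar(dstat_path)) {
  message("Found qpDstat at: ", dstat_path)
} else {
  # Sys.which() returns an empty string when the program is not found
  message("qpDstat not found; R currently sees this PATH:\n", Sys.getenv("PATH"))
}
```

If the program is not found, the printed value shows what R actually sees, which can differ from the PATH in your interactive shell.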
This is all the code that you need to perform ADMIXTOOLS analyses using this package! No shell scripting, no copy-pasting and manual editing of text files. The only thing you need is a working ADMIXTOOLS installation and a path to EIGENSTRAT data (a trio of ind/snp/geno files), which we call prefix here.
library(admixr)
# download a small testing dataset to a temporary directory and process it for use in R
snp_data <- eigenstrat(download_data())
result <- d(
  W = c("French", "Sardinian"), X = "Yoruba", Y = "Vindija", Z = "Chimp",
  data = snp_data
)
result
Note that a single call to the d function generates all required intermediate config and population files, runs ADMIXTOOLS, parses its log output and returns the result as a data.frame object with the D statistics results. It does all of this behind the scenes, without the user having to deal with low-level technical details.
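Because the result is an ordinary data.frame, further processing needs nothing beyond standard R. The sketch below defines a small hypothetical helper (not part of admixr) that keeps only the rows whose absolute Z-score exceeds a threshold. It assumes the result has a numeric column named `Zscore`, which is my understanding of the output of d() but is worth checking against your installed version. The default threshold of 3 is a common rule of thumb, not a recommendation made by the package.

```r
# Hypothetical helper (not part of admixr): keep rows with |Z| above a threshold.
# Assumes the input data.frame has a numeric column named `Zscore`.
significant_only <- function(df, threshold = 3) {
  stopifnot(is.data.frame(df), "Zscore" %in% names(df))
  df[abs(df$Zscore) > threshold, , drop = FALSE]
}

# Usage with the `result` object from the example above:
# significant_only(result)
```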
To see many more examples of admixr in action, please check out the tutorial vignette.
Recently, a new R package called ADMIXTOOLS 2 appeared on the horizon, offering a re-implementation of several features of the original ADMIXTOOLS suite of command-line programs.
The admixr project is not related to that initiative. It is not a precursor to it, nor is it superseded by it. I have never used ADMIXTOOLS 2 myself, but it appears to offer some very interesting features for fitting complex admixture graphs, which is certainly something that admixr does not do.
The bottom line is this: as long as the original ADMIXTOOLS continues to be developed and maintained, admixr remains relevant and useful and will continue to be supported. ADMIXTOOLS is one of the most battle-tested pieces of software in population genetics. If you're happy with the set of features it provides and with admixr itself, there is no real reason to move away from either of them.