[Note codecov does not check MS Windows-only code]
Join Slack channel and discuss
The package ctrdata provides functions for retrieving (downloading)
information on clinical trials from public registers, and for
aggregating and analysing such information. It can be used for the
European Union Clinical Trials Register (“EUCTR”,
https://www.clinicaltrialsregister.eu/) and for ClinicalTrials.gov
(“CTGOV”, https://clinicaltrials.gov/). Development of ctrdata
started mid 2015 and was motivated by the wish to understand trends in
designs and conduct of trials and their availability for patients. The
package is to be used within the R system.
Last edit 2019-04-21 for version 0.18.9002, with bug fixes and new features:
- dates are now returned as Date types, and some Yes / No fields are returned as logical, by function dbGetFieldsIntoDf(),
- personal annotations can be added when records are retrieved from a register (new options annotate.text and annotate.mode for function ctrLoadQueryIntoDb()), for later use in analysis,
- synonyms of active substances to better find trials can be retrieved with function ctrFindActiveSubstanceSynonyms(), and
- improved functioning with remote Mongo databases, and removed need for local installation of MongoDB.
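To illustrate the first item, the following base R sketch shows the kind of conversion meant: date strings become Date values, and Yes / No strings become logical values. The data frame and its column names are made up for this example, and the code is not taken from dbGetFieldsIntoDf().

```r
# Hypothetical raw values, all stored as strings
raw <- data.frame(
  start_date = c("2015-06-01", "2017-08-15"),
  placebo_used = c("Yes", "No"),
  stringsAsFactors = FALSE
)

typed <- raw
# Date strings to Date type (ISO format assumed for this example)
typed$start_date <- as.Date(typed$start_date, format = "%Y-%m-%d")
# Yes / No strings to logical
typed$placebo_used <- typed$placebo_used == "Yes"

str(typed)
```

With typed columns, date arithmetic and logical subsetting work directly, for example `typed[typed$placebo_used, ]`.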
Main features:
- Protocol-related information on clinical trials is easily retrieved (downloaded) from public online sources: Users define a query using the registers’ web page interfaces and then use ctrdata for retrieving all trials resulting from the query.
- Results-related information on these clinical trials is now included (since August 2017) when information is retrieved (downloaded).
- Retrieved (downloaded) trial information is transformed and stored in a document-centric database (MongoDB), for fast and offline access. This can then be analysed with R (or other systems). Easily re-run a previous query to update a database collection.
- Unique (de-duplicated) clinical trial records are identified (a database collection may hold information from more than one register, and trials may have more than one record in a register). ctrdata also has functions to merge protocol-related information from different registers and to recode it. Vignettes are provided to get started and with detailed examples such as analyses of time trends of details of clinical trial protocols.
Remember to respect the registers’ copyrights and terms and conditions
(see ctrOpenSearchPagesInBrowser(copyright = TRUE)). Please cite this
package in any publication as follows: Ralf Herold (2019). ctrdata: Retrieve and Analyze Information on Clinical Trials from Public Registers. R package version 0.18.1, https://github.com/rfhb/ctrdata
Package ctrdata has been used for example for:
- Blog post on Innovation coming to paediatric research
- Report on The impact of collaboration: The value of UK medical research to EU science and health
Overview of functions used in sequence:
Within R, use the following commands to get and install package ctrdata:
# Release version:
install.packages("ctrdata")
# Development version from github.com:
install.packages("devtools")
# Note build_opts is emptied so that vignettes are built:
devtools::install_github("rfhb/ctrdata", build_opts = "")
Package ctrdata can be found here on CRAN.
Command line tools such as perl and php are only required for
ctrLoadQueryIntoDb(), a main function of package ctrdata. In
Linux and macOS, these are usually already installed.
For MS Windows, install cygwin: In R, run
ctrdata::installCygwinWindowsDoInstall() for an automated
minimal installation into c:\cygwin. Alternatively, install cygwin
manually with packages perl, php-jsonc and php-simplexml (consumes
about 160 MB disk space; administrator credentials not needed).
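To check whether such tools are available, a short base R sketch using Sys.which() may help. The tools checked here (perl and php) are taken from the cygwin packages named above, so the list may be incomplete. On MS Windows, tools installed into c:\cygwin may not be on the search path, so a negative result there is not conclusive.

```r
# Tools to look for; this list is an assumption based on the
# cygwin packages named above and may be incomplete
tools <- c("perl", "php")

# Sys.which() returns the full path of each tool, or "" if not found
tools_found <- Sys.which(tools)
missing_tools <- names(tools_found)[tools_found == ""]

if (length(missing_tools) > 0) {
  message("Not found on the search path: ",
          paste(missing_tools, collapse = ", "))
} else {
  message("All tools found: ", paste(tools, collapse = ", "))
}
```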
A remote or a local mongo database server can be used with the package
ctrdata. Suggested installation instructions for a local database
server are here.
A remote mongo database server such as here could be used; this is shown in the examples vignette.
Name | Function |
---|---|
ctrOpenSearchPagesInBrowser | Open search pages of registers or execute search in web browser |
ctrFindActiveSubstanceSynonyms | Find synonyms and alternative names for an active substance |
ctrGetQueryUrlFromBrowser | Import from clipboard the URL of a search in one of the registers |
ctrLoadQueryIntoDb | Retrieve (download) or update, and annotate, information on clinical trials from register and store in database collection |
dbQueryHistory | Show the history of queries that were downloaded into the database collection |
dbFindFields | Find names of fields in the database collection |
dbFindIdsUniqueTrials | Produce a vector of de-duplicated identifiers of clinical trial records in the database collection |
dbGetFieldsIntoDf | Create a data.frame from records in the database collection with the specified fields |
dfMergeTwoVariablesRelevel | Merge two variables into a single variable, optionally map values to a new set of values |
installCygwinWindowsDoInstall | Convenience function to install a cygwin environment (MS Windows only) |
The aim is to download protocol-related trial information and tabulate the trials’ status.
- Attach package ctrdata:
library(ctrdata)
#> Registered S3 method overwritten by 'rvest':
#> method from
#> read_xml.response xml2
#> Information on this package and how to use it:
#> https://cran.r-project.org/package=ctrdata
#>
#> Please respect the requirements and the copyrights of the
#> clinical trial registers when using their information. Call
#> ctrOpenSearchPagesInBrowser(copyright = TRUE) and visit
#> https://www.clinicaltrialsregister.eu/disclaimer.html
#> https://clinicaltrials.gov/ct2/about-site/terms-conditions#Use
#> Testing helper binaries:
#> completed.
- Open registers’ advanced search pages in browser:
ctrOpenSearchPagesInBrowser()
# Please review and respect register copyrights:
ctrOpenSearchPagesInBrowser(copyright = TRUE)
- Click search parameters and execute search in browser
- Copy address from browser address bar to clipboard
- Get address from clipboard:
q <- ctrGetQueryUrlFromBrowser()
# * Found search query from EUCTR.
q
# query-term query-register
# 1 query=cancer&age=under-18&phase=phase-one EUCTR
- Retrieve protocol-related information, transform, save to database and analyse:
If no parameters are given for a database connection: mongodb is used on localhost, port 27017, database “users”, collection “ctrdata”.
Under the hood, scripts euctr2json.sh and xml2json.php (in
ctrdata/exec) transform EUCTR plain text files and CTGOV xml files to
json format, which is imported into the database.
# Retrieve trials from public register:
ctrLoadQueryIntoDb(paste0("https://www.clinicaltrialsregister.eu/ctr-search/search?",
"query=cancer&age=under-18&phase=phase-one"))
# Alternative:
# ctrLoadQueryIntoDb(q)
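The default connection described above (localhost, port 27017, database “users”, collection “ctrdata”) can also be inspected directly with package mongolite, on which ctrdata builds. The following is a minimal sketch under that assumption; the helper count_records() is hypothetical and not part of ctrdata, and it needs a running database server.

```r
# Defaults as described above; adjust for a remote server
mongo_host <- "localhost"
mongo_port <- 27017L
mongo_db <- "users"
mongo_collection <- "ctrdata"
mongo_url <- sprintf("mongodb://%s:%d", mongo_host, mongo_port)

# Hypothetical helper (not part of ctrdata): count the records
# in the collection, using mongolite directly
count_records <- function(url = mongo_url,
                          db = mongo_db,
                          collection = mongo_collection) {
  if (!requireNamespace("mongolite", quietly = TRUE)) {
    stop("Package 'mongolite' is needed for this sketch")
  }
  con <- mongolite::mongo(collection = collection, db = db, url = url)
  con$count()
}

# Usage (requires a running database server):
# count_records()
```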
Tabulate the status of deduplicated trials
# Get all records that have values in all specified fields.
# Note that b31_... is an element within the array b1_...
result <- dbGetFieldsIntoDf(c("b1_sponsor.b31_and_b32_status_of_the_sponsor",
"p_end_of_trial_status", "a2_eudract_number"))
# Eliminate trials records duplicated by EU Member State:
uniqueids <- dbFindIdsUniqueTrials()
result <- result[ result[["_id"]] %in% uniqueids, ]
# Tabulate the status of the clinical trial on the date of information retrieval
# Note some trials have more than one sponsor and values are concatenated with /.
with (result, table (p_end_of_trial_status, b1_sponsor.b31_and_b32_status_of_the_sponsor))
# b1_sponsor.b31_and_b32_status_of_the_sponsor
# p_end_of_trial_status Commercial Non-Commercial Non-Commercial / Non-Commercial
# Completed 81 32 0
# Ongoing 205 239 12
# Prematurely Ended 15 12 0
# Restarted 0 1 0
# Temporarily Halted 4 1 0
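The de-duplication and tabulation steps above can be tried without a database using made-up data. In this sketch, the records, the shortened column name sponsor_status and the de-duplication rule (keep the first record per EudraCT number, taken as the first 14 characters of the identifier) are assumptions for illustration only; dbFindIdsUniqueTrials() may work differently.

```r
# Made-up records: the first two are country-specific records
# of the same trial (same EudraCT number, different suffix)
result <- data.frame(
  `_id` = c("2010-000001-11-DE", "2010-000001-11-FR", "2012-000002-22-GB"),
  p_end_of_trial_status = c("Ongoing", "Ongoing", "Completed"),
  sponsor_status = c("Commercial", "Commercial", "Non-Commercial"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Simplified stand-in for dbFindIdsUniqueTrials():
# keep the first record for each EudraCT number
eudract <- substr(result[["_id"]], 1, 14)
uniqueids <- result[["_id"]][!duplicated(eudract)]

# Same filtering and tabulation pattern as above
result <- result[result[["_id"]] %in% uniqueids, ]
tab <- with(result, table(p_end_of_trial_status, sponsor_status))
print(tab)
```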
- Explore NoSQL databases other than Mongo
- Explore using the Windows Subsystem for Linux (WSL) instead of cygwin
- Merge results-related information retrieved from different registers (e.g. corresponding endpoints) and prepare for analysis across trials.
- Explore relevance to retrieve previous versions of protocol- and results-related information
- Abstract database access

- Data providers and curators of the clinical trial registers. Please review and respect their copyrights and terms and conditions (ctrOpenSearchPagesInBrowser(copyright = TRUE)).
- This package ctrdata has been made possible based on the work done for R, curl, clipr, mongolite, httr, xml2 and rvest.

- Please file issues and bugs here.
- Package ctrdata should work and is continually tested on Linux, Mac OS X and MS Windows systems. Linux and MS Windows are tested using continuous integration, see badges at the beginning of this document. Please file an issue for any problems.
- The information in the registers may not be fully correct; see this publication on CTGOV.
- No attempts were made to harmonise field names between registers, but dfMergeTwoVariablesRelevel() can be used to merge and map two variables / fields into one. So far, there is no typing of database fields; they are all strings (except for indices).
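The idea of merging two fields from different registers into one variable and mapping its values can be sketched in base R as follows. The column names, values and mapping are made up, and this is not the implementation or the interface of dfMergeTwoVariablesRelevel().

```r
# Made-up data: each trial has a status in only one of two fields
df <- data.frame(
  euctr_status = c("Ongoing", "Completed", NA, NA),
  ctgov_status = c(NA, NA, "Recruiting", "Completed"),
  stringsAsFactors = FALSE
)

# Merge: take the first field where available, else the second
merged <- ifelse(is.na(df$euctr_status), df$ctgov_status, df$euctr_status)

# Map the register-specific values to a common set of values
levels_map <- c(
  "Ongoing" = "ongoing",
  "Recruiting" = "ongoing",
  "Completed" = "completed"
)
df$status <- factor(unname(levels_map[merged]),
                    levels = c("ongoing", "completed"))

table(df$status)
```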