- new function `histgroup_iarc()` to create a variable for groups of malignant neoplasms considered to be histologically 'different' for the purpose of defining multiple tumors, ICD-O-3 (see #100)
- some functions gain a new `quiet` argument to suppress `rlang::warn()` and `rlang::inform()` messages. You can use this when you have checked your results for correctness and want to reduce message output but keep the progress bars.
- `asir()`: add World Standard Population 2000-2025 for option `std_pop = "WHO2000"`, as described here: https://seer.cancer.gov/stdpopulations/world.who.html
- `sir_byfutime()` gains new argument `expect_missing_refstrata_df`. You can define another dataframe that contains strata expected to be missing from `refrates_df` (because they are not explicitly coded with incidence = 0). This can be helpful if `refrates_df` has a lot of strata and strata with 0 incidence have been removed to save storage space. Internally, the rows of `expect_missing_refstrata_df` will be appended to `refrates_df`. This reduces the number of lines reported in attribute `problems_missing_ref_strata`. Default setting is `expect_missing_refstrata_df = NULL`.
- sample data set `data("us_second_cancer")` gains new variable `t_hist` on histology, i.e. the ICD-O-3 code for tumor morphology (4 digits)
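The effect of `expect_missing_refstrata_df` can be illustrated with a minimal Python sketch (not the package's R code). It assumes reference rates are held in a dictionary keyed by a hypothetical stratum tuple of (age group, sex, site). Strata listed as expected to be missing are added with incidence 0, so they no longer appear among the strata reported as missing.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stratum key for this sketch: (age group, sex, site)
Stratum = Tuple[str, str, str]


def fill_expected_missing(
    refrates: Dict[Stratum, float], expect_missing: Iterable[Stratum]
) -> Dict[Stratum, float]:
    """Return a copy of refrates with expected-missing strata added at incidence 0."""
    filled = dict(refrates)
    for stratum in expect_missing:
        # keep an existing rate if the stratum is already present
        filled.setdefault(stratum, 0.0)
    return filled


def missing_ref_strata(
    needed: Iterable[Stratum], refrates: Dict[Stratum, float]
) -> List[Stratum]:
    """List the strata needed for expected counts that have no reference rate."""
    return [s for s in needed if s not in refrates]


refrates = {("60-64", "male", "lung"): 120.5}
needed = [("60-64", "male", "lung"), ("00-04", "male", "prostate")]
print(missing_ref_strata(needed, refrates))
filled = fill_expected_missing(refrates, [("00-04", "male", "prostate")])
print(missing_ref_strata(needed, filled))
```

Unlike a plain append, this sketch keeps an existing rate when a stratum is listed in both places; whether the package behaves the same way is not stated in the entry.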
- no breaking changes in this version
- make `calc_refrates()` more robust for missing `race_var` (Closes #89)
- fix bug in `calc_refrates()` when using `calc_totals == TRUE` (Closes #90)
- fix bug in `calc_refrates()` when using numeric versions of `fill_sites` (Closes #92)
- fix bug in `asir()` that threw an error for a variable that is not needed (Closes #95)
- replace progress bars with those from the `cli` package
- deprecate `verb.()` syntax from tidytable (Closes #94)
- new function `calc_refrates()` to calculate age-, sex-, region- and year-specific reference rates from a long-format dataframe of cancer cases, in which incident cases are counted and then matched with a reference population. The resulting reference rates dataframe can be used directly with the `sir_byfutime()` function.
- functions gain new default `dattype = NULL` and are thus more flexible in taking other source data types (Closes #73)
- functions `asir`, `calc_futime*`, `calc_refrates`, `ir_crosstab_byfutime`, `pat_status*`, `renumber_time_id*`, and `sir_byfutime` are now set to `dattype = NULL` by default. If you relied on the automatic variable naming feature, you need to add `dattype = "seer"` or `dattype = "zfkd"` to your function call.
- fix typo in attribute names: attributes are now correctly named `problems_missing_count_strata` and `problems_missing_fu_strata` (Closes #80)
- `sir_byfutime()`:
    - attributes with notes and problems are now correctly saved to `results_df`
- deprecated functions from the `tidytable` package have been replaced (Closes #71 and #74)
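As a rough illustration of what `calc_refrates()` computes (incident cases counted per stratum, then matched with a reference population), here is a minimal Python sketch. The function name `reference_rates`, the stratum tuple (age group, sex, region, year), the dictionary layout and the scaling to rates per 100,000 are all assumptions for this example, not the package's actual arguments or output format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical stratum key for this sketch: (age group, sex, region, year)
Stratum = Tuple[str, str, str, int]


def reference_rates(
    cases: Iterable[Stratum],
    population: Dict[Stratum, float],
    per: float = 100_000,
) -> Dict[Stratum, float]:
    """Incidence rate per `per` persons for every stratum of the population."""
    counts = Counter(cases)  # incident cases per stratum
    rates: Dict[Stratum, float] = {}
    for stratum, pop in population.items():
        if pop <= 0:
            continue  # no population at risk: rate undefined, stratum skipped
        # strata without any case get an explicit rate of 0
        rates[stratum] = counts[stratum] / pop * per
    return rates


cases = [("60-64", "male", "north", 2010)] * 3 + [("60-64", "female", "north", 2010)]
population = {
    ("60-64", "male", "north", 2010): 30_000,
    ("60-64", "female", "north", 2010): 20_000,
    ("65-69", "male", "north", 2010): 10_000,
}
print(reference_rates(cases, population))
```

Writing explicit zero-rate rows for strata without cases keeps every population stratum in the output, so a later lookup never mistakes "no cases" for "missing stratum".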
- new function `sir_ratio()` and the related `sir_ratio_lci()` and `sir_ratio_uci()` to calculate the ratio of two SIRs/SMRs, i.e. a relative risk, and confidence limits for this ratio.
- tidytable variant of the reshape_long function, i.e. `reshape_long_tt()` ⇒ the `_tt` variants usually have smaller memory use than the tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
- `summarize_sir_results()`:
    - add ability to summarize by a different `site_var` than the one used in `sir_byfutime()`
- `summarize_sir_results()`:
    - PYARs (person-years at risk) are now correctly calculated when using `summarize_site == TRUE`. Previously the results incorrectly counted each site multiple times. (Closes #62)
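The PYAR correction for `summarize_site == TRUE` can be sketched as follows, assuming (as the fix implies) that every per-site result row of one stratum carries the same person-years at risk. Observed and expected counts are summed over sites, while PYARs are taken once. This is a hypothetical Python illustration with made-up field names (`observed`, `expected`, `pyar`), not the code of `summarize_sir_results()`.

```python
from typing import Any, Dict, List


def summarize_over_sites(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Collapse the per-site result rows of ONE stratum into a total row."""
    observed = sum(r["observed"] for r in rows)
    expected = sum(r["expected"] for r in rows)
    # Each site row carries the person-years at risk of the same cohort stratum,
    # so PYARs are taken once; summing them would count the stratum once per site.
    pyars = {r["pyar"] for r in rows}
    if len(pyars) != 1:
        raise ValueError("rows must belong to one stratum with a single PYAR value")
    return {
        "observed": observed,
        "expected": expected,
        "sir": observed / expected,
        "pyar": pyars.pop(),
    }


rows = [
    {"site": "lung", "observed": 10, "expected": 5.0, "pyar": 1000.0},
    {"site": "colon", "observed": 6, "expected": 3.0, "pyar": 1000.0},
]
print(summarize_over_sites(rows))
```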
- `pat_status()`:
    - update default values for `dattype = "zfkd"`
- add R CMD check to GitHub Actions
- new sample data set for standard populations ⇒ `data("standard_population")`
- new sample data set for US population ⇒ `data("population_us")` (Closes #58)
- `sir_byfutime()`: change output of integer columns to numeric to fix bug in `summarize_sir_results()` (Closes #59)
- add examples to function documentation (Closes #56)
- remove "R" from package title (Closes #57)
- update package description (Closes #54)
- update introduction vignette `vignette("introduction")`
- tidytable variants of functions, i.e. `reshape_wide_tt()`, `renumber_time_id_tt()`, `pat_status_tt()`, `vital_status_tt()`, `calc_futime_tt()` ⇒ the `_tt` variants usually have smaller memory use than the tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
- `sir_byfutime()`:
    - is much faster using the tidytable package
    - gained the option `race_var` to optionally stratify SIR calculations by race.
- `summarize_sir_results()`:
    - new function that increases functionality in summarizing results from the `sir_byfutime()` function
    - new option to define a custom `site_var_name`
- new package website https://marianschmidt.github.io/msSPChelpR
- new sample datasets included in the package to demonstrate examples (#36)
- `sir_byfutime()`:
    - options `add_total_row` and `add_total_fu` are replaced by `calc_total_row` and `calc_total_fu`. These are now logical parameters. The positioning of total rows and columns is now handled entirely by the `summarize_sir_results()` function, where total rows can be placed at the top or bottom and total columns at the left or right.
    - option `expcount_src`, including the related parameters `stdpop_df`, `refpop_df`, `std_pop`, `truncate_std_pop` and `pyar_var`, has been removed. Function `sir_byfutime()` will only calculate expected counts based on reference rates, not within the cohort of the dataset. To calculate expected counts based on the cohort, a new function `create_refrates` will be added in the future. (#41)
    - option `collapse_ci` has been removed and added to `summarize_sir_results()` instead.
    - option name for tumor site variable changed from `icdcat_var` to `site_var`
    - option name for age/age group variable changed from `agegroup_var` to `age_var`
    - in total, the parameters `expcount_src`, `futime_src`, `stdpop_df`, `refpop_df`, `std_pop`, `truncate_std_pop`, `pyar_var`, `icdcat_var` and `collapse_ci` have been removed to simplify the function ⇒ make sure you remove these arguments from your `sir_byfutime()` function calls.
- `sir()`:
    - is superseded by `sir_byfutime()`. To migrate your former `sir()` function calls, you can simply use `sir_byfutime(..., futime_breaks = "none")`, which will yield the same results.
- `summarize_sir_results()`:
    - option name for tumor site variable changed from `summarize_icdcat` to `summarize_site`
- `reshape_long_tidyr()`:
    - option `var_selection` is deprecated. Please select variables before running the `reshape_long_*` functions.
- `asir()`:
    - option name for age/age group variable changed from `agegroup_var` to `age_var`
    - option name for tumor site variable changed from `icdcat_var` to `site_var`
- `pat_status()`, `pat_status_tt()`, `vital_status()`, and `vital_status_tt()`:
    - default variable labels are now capitalized.
    - This might break code that relied on using the labels coming out of these functions in later filter or mutate functions.
- `ir_crosstab_byfutime()`:
    - option `futime_breaks` now uses breaks in years instead of months as previously.
    - default `futime_var` is now follow-up time in years
- now requires dplyr version 1.0.0
- now requires the tidytable package
- the default option name for tumor site variable changed from `icdcat_var` to `site_var`. This requires a manual update of calls to `sir_byfutime()` and `asir()` if the option is specified.
- the default variable name for tumor site in all functions has been changed from `t_icdcat` to `t_site`. So the reference data frames used will need to have a `t_site` column.
- the data.table variants of functions (`renumber_time_id_dt()`, `pat_status_dt()`, `reshape_long_dt()`, `reshape_wide_dt()`, `vital_status_dt()`) have been removed for simplicity; please use the tidytable variants, i.e. `reshape_wide_tt()`, `renumber_time_id_tt()`, `pat_status_tt()`, `vital_status_tt()`, `calc_futime_tt()`, instead. They will give the same data.table output and the same performance.
- implement a new, reliable routine to split the df when `reshape_wide()` is used with option `chunks`. Closes #1.
- sorting of columns in wide datasets by `reshape_wide_tidyr()` and `reshape_wide_tt()` is now preserved. Closes #31.
- ensure sorting in `renumber_time_id()` and make sure that `new_time_id_var` is returned as integer.
- fix bug in `pat_status_*(., check = TRUE)` option
- improve internal tests in `sir_byfutime()` so that PYARs do not get lost before running the summary function
- `sir_byfutime()` now also gives correct results if the range of `futime_breaks` is not 0-Inf but smaller
- add `timevar_max` option to `renumber_time_id()` function; use sorting by date of diagnosis instead of old `time_id_var`
- various improvements to `reshape_wide_tidyr()` function
- various improvements to `reshape_wide_dt()` function, which is now much faster and uses `data.table::dcast` instead of `stats::reshape`
- various improvements to `pat_status()` and `pat_status_dt()` functions
- option `summarize_icdcat` in `summarize_sir_results()` is now functional
- update vignette `vignette("introduction")`
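The renumbering described in the `renumber_time_id()` entries above (per-patient sorting by date of diagnosis, integer `new_time_id_var`) can be illustrated with a minimal Python sketch. The field names `case_id`, `dx_date` and `new_time_id` are hypothetical stand-ins for `case_id_var`, the diagnosis date variable and `new_time_id_var`; this is not the implementation of `renumber_time_id()`.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List


def renumber_time_ids(tumors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Assign each patient's tumors an integer sequence 1..n by date of diagnosis."""
    # sort by patient, then by diagnosis date, so groupby sees contiguous patients
    ordered = sorted(tumors, key=itemgetter("case_id", "dx_date"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("case_id")):
        for new_id, tumor in enumerate(group, start=1):
            # copy the record instead of mutating the input
            result.append({**tumor, "new_time_id": new_id})
    return result


tumors = [
    {"case_id": 1, "dx_date": date(2015, 3, 1)},
    {"case_id": 1, "dx_date": date(2010, 5, 2)},
    {"case_id": 2, "dx_date": date(2012, 1, 1)},
]
for t in renumber_time_ids(tumors):
    print(t["case_id"], t["dx_date"], t["new_time_id"])
```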
- fix incomplete check for required variables in `pat_status()` and `pat_status_dt()` functions
- fix error in check for required variables in `renumber_time_id()` that broke functions
- fix bug in check for end of follow-up (FU) time in `pat_status()` and `calc_futime()`
- implement new tidyselect routine using `tidyselect::all_of` in `summarize_sir_results()`
- new faster version of reshape_long based on data.table
- start new vignette on workflow from filtered long dataset to follow-up times `vignette("patstatus_futime")`
- implement new tidyselect routine using `tidyselect::all_of` for vector-based variable selection
- implement correct referencing in `vital_status_dt` and `pat_status_dt`
- add exports from `data.table`
- update documentation for sir and sir_byfutime functions
- make `reshape_long` function work
- new faster version of vital_status function using data.table
- new faster version of pat_status function using data.table
- new faster version of reshape_wide_dt function based on data.table and without problematic slices done by reshape_wide
- new faster version of renumber_time_id function based on data.table
- new function renumber_time_id
- add check to revert status_var to numeric in case it was created with option as_labelled_factor
- fix label bug in life_var_new
- add option as_labelled_factor to vital_status function
- fix newly introduced error in vital_status function
- fix error in vital_status function by replacing sjlabelled::get_label function
- fix error in pat_status and vital_status functions due to change in sjlabelled package
- rebuild description file and manual
- remove nest_legacy functions and use new tidyr syntax, close #19
- make summarize_sir_results function work without break variables
- for function `sir_byfutime` ⇒ make option `add_total_row` work, even if option `ybreak_vars = "none"`
- Make use of `time_id_var` and `case_id_var` coherent across reshape functions
- Fixed issue in Namespace
- Added a `NEWS.md` file to track changes to the package.
- add option `futime_breaks = "none"` to `sir_byfutime` function
- includes a new function to calculate crude (absolute) incidence rates and tabulate them by any number of grouping variables; it can be used as a Table 1 for publications ⇒ the function is called `msSPChelpR::ir_crosstab`
- includes a new function to calculate SIRs (standardized incidence ratios) by whatever strata you desire (unlimited `ybreak_vars`; one `xbreak_var`) and additionally by customized breaks for follow-up time (default: up to 6 months, 0.5-1 year, 1-5 years, 5-10 years, >10 years) ⇒ attention: it only makes sense to stratify results (`ybreak_vars` or `xbreak_var`) by variables measured at baseline, not by variables that depend on the occurrence of an SPC ⇒ the function is called `msSPChelpR::sir_byfutime` ⇒ depending on the number of stratification variables you are using, this function may produce a very long results data.frame, so please use it together with the new function `msSPChelpR::summarize_sir_results`
- includes a new function to summarize results dataframes from SIR calculations
- new reshape functions that are faster and use less memory
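The default follow-up categories listed in the `sir_byfutime` entry above can be reproduced with a short Python sketch that assigns a follow-up time in years to its interval. The function name `futime_category` is hypothetical. Treating intervals as closed on the left and open on the right (so exactly 0.5 years falls into 0.5-1) is an assumption of this sketch; the package's boundary handling is not described here. Passing the breaks `[0, inf]` yields a single category, analogous to `futime_breaks = "none"`.

```python
from bisect import bisect_right
from typing import Sequence

# Default follow-up breaks in years, as listed in the entry: up to 0.5, 0.5-1, 1-5, 5-10, >10
DEFAULT_BREAKS = [0.0, 0.5, 1.0, 5.0, 10.0, float("inf")]


def futime_category(years: float, breaks: Sequence[float] = DEFAULT_BREAKS) -> str:
    """Label the interval [lower, upper) of `breaks` that contains `years`."""
    if not breaks[0] <= years < breaks[-1]:
        raise ValueError(f"follow-up time {years} is outside of the breaks")
    # index of the last break that is <= years
    i = bisect_right(breaks, years) - 1
    return f"{breaks[i]:g}-{breaks[i + 1]:g}"


for t in (0.2, 0.5, 3, 12):
    print(t, futime_category(t))
# a single open-ended category, analogous to futime_breaks = "none"
print(futime_category(7, [0.0, float("inf")]))
```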