- added a function, `pairwise_comparison()`, that runs pairwise comparisons between models on the output of `eval_forecasts()`
- added functionality to compute relative skill within `eval_forecasts()`
- added a function to visualise pairwise comparisons
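A minimal usage sketch for the pairwise-comparison workflow described above; the contents of `forecasts` and the exact arguments of `pairwise_comparison()` (e.g. `metric`) are assumptions for illustration, not a confirmed interface:

```r
library(scoringutils)

# `forecasts` is assumed to be a data.frame of quantile forecasts with a
# `model` column, in the long format expected by eval_forecasts()
scores <- eval_forecasts(forecasts)

# run pairwise comparisons between models on the scores returned above
# (the `metric` argument name is an assumption)
pairwise <- pairwise_comparison(scores, metric = "interval_score")
```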
- The WIS definition change introduced in version 0.1.5 was partly corrected such that the difference in weighting is only introduced when summarising over scores from different interval ranges
- `eval_forecasts()` can now handle a separate forecast and truth data set as input
- `eval_forecasts()` now supports scoring point forecasts alongside quantiles in a quantile-based format. Currently the only metric used is the absolute error
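A sketch of passing the forecast and truth data separately, as described above; the argument names `forecasts`, `truth_data` and `merge_by` are assumptions rather than a confirmed signature:

```r
library(scoringutils)

# forecast_data and truth_data are assumed to share the columns listed in
# `merge_by`; point forecasts would sit in the same quantile-based format
scores <- eval_forecasts(
  forecasts = forecast_data,
  truth_data = truth_data,
  merge_by = c("location", "target_end_date")
)
```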
- Many functions, especially `eval_forecasts()`, got a major rewrite. While functionality should be unchanged, the code should now be easier to maintain.
- Some of the data-handling functions were renamed, but the old names are still supported for now.
- changed the default definition of the weighted interval score. Previously,
the median prediction was counted twice, but it is now only counted once. If you
want to go back to the old behaviour, you can call the `interval_score()` function
with the argument `count_median_twice = FALSE`.
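Based on the description above, such a call could look roughly like the following; apart from `count_median_twice`, the argument names and values are illustrative and should be checked against the function documentation:

```r
library(scoringutils)

# interval scores for two observations and a single 90% prediction interval,
# with the median-counting behaviour set explicitly as described above
interval_score(
  true_values = c(100, 120),
  lower = c(85, 90),
  upper = c(130, 145),
  interval_range = 90,
  count_median_twice = FALSE
)
```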
- we added basic plotting functionality to visualise scores. You can now
easily obtain diagnostic plots based on scores as produced by `eval_forecasts()`:
  - `correlation_plot()` shows correlation between metrics
  - `range_plot()` shows the contribution of different prediction intervals to some chosen metric
  - `score_heatmap()` visualises scores as a heatmap
  - `score_table()` shows a coloured summary table of scores
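A sketch of how these plotting helpers might be called on the output of `eval_forecasts()`; beyond the function names listed above, the argument names are assumptions:

```r
library(scoringutils)

# summarise scores at the level needed for plotting (columns are illustrative)
scores <- eval_forecasts(forecasts, summarise_by = c("model", "range"))

score_table(scores)                        # coloured summary table
score_heatmap(scores, metric = "bias")     # heatmap of a chosen metric
range_plot(scores, y = "interval_score")   # contribution of each interval range
correlation_plot(scores)                   # correlation between metrics
```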
- renamed "calibration" to "coverage"
- renamed "true_values" to "true_value" in data.frames
- renamed "predictions" to "prediction" in data.frames
- renamed "is_overprediction" to "overprediction"
- renamed "is_underprediction" to "underprediction"
- the `by` argument in `eval_forecasts()` now has a slightly changed meaning. It now denotes the lowest possible grouping unit, i.e. the unit of one observation, and needs to be specified explicitly. The default is now `NULL`. The reason for this change is that most metrics need scoring on the observation level and this is the most consistent implementation of this principle. The pit function now receives its grouping from `summarise_by`. In a similar spirit, `summarise_by` has to be specified explicitly and e.g. no longer assumes that you want 'range' to be included.
- for the interval score, `weigh = TRUE` is now the default option.
- (potentially planned) rename true_values to true_value and predictions to prediction.
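To illustrate the changed meaning of `by` and `summarise_by` described above, a hypothetical call could look as follows; the column names are placeholders for whatever defines a single observation in your data:

```r
library(scoringutils)

# `by` defines the unit of a single observation (scores are computed at this
# level); `summarise_by` defines the level at which scores are aggregated
scores <- eval_forecasts(
  forecasts,
  by = c("model", "location", "target_end_date", "horizon"),
  summarise_by = c("model", "horizon")
)
```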
- updated quantile evaluation metrics in `eval_forecasts()`. Bias as well as calibration now take all quantiles into account.
- included an option to summarise scores according to a `summarise_by` argument in `eval_forecasts()`. The summary can return the mean, the standard deviation, as well as an arbitrary set of quantiles.
- `eval_forecasts()` can now return pit histograms.
- switched to ggplot2 for plotting
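A sketch of requesting additional summary statistics; the `sd` and `quantiles` argument names are assumptions based on the description above:

```r
library(scoringutils)

# return the mean together with the standard deviation and selected quantiles
# of the scores, aggregated per model
scores <- eval_forecasts(
  forecasts,
  summarise_by = c("model"),
  sd = TRUE,
  quantiles = c(0.05, 0.25, 0.5, 0.75, 0.95)
)
```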
- all scores in `eval_forecasts()` were consistently renamed to lower case. `Interval_score` is now `interval_score`, `CRPS` is now `crps`, etc.
- included support for grouping scores according to a vector of column names in `eval_forecasts()`
- included support for passing down arguments to lower-level functions in `eval_forecasts()`
- included support for three new metrics to score quantiles with `eval_forecasts()`: bias, sharpness and calibration
- example data now has a horizon column to illustrate the use of grouping
- documentation updated to explain the above listed changes
- included support for long as well as wide input formats for
quantile forecasts that are scored with `eval_forecasts()`
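For reference, a minimal sketch of the long quantile format; the column names follow those used elsewhere in these notes (`true_value`, `prediction`, `quantile`) and the values are made up:

```r
library(scoringutils)

# one row per combination of forecast unit and quantile level
forecasts_long <- data.frame(
  model = "model_a",
  horizon = 1,
  true_value = 25,
  quantile = c(0.05, 0.25, 0.5, 0.75, 0.95),
  prediction = c(10, 18, 24, 30, 42)
)

scores <- eval_forecasts(forecasts_long)
```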
- updated documentation for `eval_forecasts()`
- added badges to the Readme