Bug: tbl_svysummary() INDEX error #2092

Closed
ddsjoberg opened this issue Dec 3, 2024 · 3 comments

@ddsjoberg
Owner

dat_dput <- 
  structure(
    list(
      region = c(3, 3, 3, 3, 3, 3, 3, 3, 7, 7), 
      year = c(1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972, 1972), 
      weights = c(0.663196271930943, 0.917370028585327, 0.897412512251031, 1.06634082743438, 0.94432371066466, 0.526887241987567, 
                  0.526887241987567, 0.546578869586901, 0.283198307048893, 0.494322145606406)), 
    row.names = c(NA, 10L), class = "data.frame")
 

dat_svy <- dat_dput |> 
  srvyr::as_survey_design(
    ids = region, 
    strata = year, 
    weights = weights, 
    nest = TRUE)

cardx::ard_continuous(dat_svy, variables = weights)
#> {cards} data frame: 3 x 8
#>   variable   context stat_name stat_label  stat fmt_fn
#> 1  weights continuo…    median     Median 0.897      1
#> 2  weights continuo…       p25  25% Perc… 0.527      1
#> 3  weights continuo…       p75  75% Perc… 0.944      1
#> ℹ 2 more variables: warning, error
gtsummary::tbl_svysummary(dat_svy, include = weights)
#> Warning in svymean.survey.design2(reformulate2(variable), design = data, :
#> Sample size greater than population size: are weights correctly scaled?
#> Error in tapply(y, by, sum, na.rm = na.rm, default = 0L): 'INDEX' is of length zero

Created on 2024-12-03 with reprex v2.1.1

ddsjoberg added a commit to insightsengineering/cardx that referenced this issue Dec 28, 2024
**What changes are proposed in this pull request?**
* Update in `ard_missing.survey.design()` where we can now tabulate the
missing rate of design variables, such as the weights.

**Reference GitHub issue associated with pull request.** _e.g., 'closes
#<issue number>'_
ddsjoberg/gtsummary#2092


--------------------------------------------------------------------------------

Pre-review Checklist (if item does not apply, mark it as complete)
- [x] **All** GitHub Action workflows pass with a ✅
- [x] PR branch has pulled the most recent updates from master branch:
`usethis::pr_merge_main()`
- [x] If a bug was fixed, a unit test was added.
- [x] If a new `ard_*()` function was added, it passes the ARD
structural checks from `cards::check_ard_structure()`.
- [x] If a new `ard_*()` function was added, `set_cli_abort_call()` has
been set.
- [x] If a new `ard_*()` function was added and it depends on another
package (such as, `broom`), `is_pkg_installed("broom")` has been set in
the function call and the following added to the roxygen comments:
`@examplesIf do.call(asNamespace("cardx")$is_pkg_installed, list(pkg =
"broom"))`
- [x] Code coverage is suitable for any new functions/features
(generally, 100% coverage for new code): `devtools::test_coverage()`

Reviewer Checklist (if item does not apply, mark it as complete)

- [ ] If a bug was fixed, a unit test was added.
- [ ] Code coverage is suitable for any new functions/features:
`devtools::test_coverage()`

When the branch is ready to be merged:
- [ ] Update `NEWS.md` with the changes from this pull request under the
heading "`# cardx (development version)`". If there is an issue
associated with the pull request, reference it in parentheses at the end
of the update (see `NEWS.md` for examples).
- [ ] **All** GitHub Action workflows pass with a ✅
- [ ] Approve Pull Request
- [ ] Merge the PR. Please use "Squash and merge" or "Rebase and merge".
@ddsjoberg
Owner Author

Hi @larmarange, it looks like this error arises from survey::svytable(). Before I close this out, is there something I am missing? Should we be supporting this type of tabulation?

library(magrittr)

dplyr::tribble(
  ~region, ~year,          ~weights,
  3,        1972, 0.663196271930943,
  3,        1972, 0.917370028585327,
  3,        1972, 0.897412512251031,
  3,        1972,  1.06634082743438,
  3,        1972,  0.94432371066466,
  3,        1972, 0.526887241987567,
  3,        1972, 0.526887241987567,
  3,        1972, 0.546578869586901,
  7,        1972, 0.283198307048893,
  7,        1972, 0.494322145606406
) %>%
  survey::svydesign(
    data = .,
    ids = ~region,
    strata = ~year,
    weights = ~weights,
    nest = TRUE
  ) %>%
  survey::svytable(formula = ~weights, design = .)
#> Error in tapply(y, by, sum, na.rm = na.rm, default = 0L): 'INDEX' is of length zero

Created on 2024-12-27 with reprex v2.1.1

@larmarange
Collaborator

Yes, svytable() has a bug when applied to the weighting variable. The issue could be raised with the maintainers of the survey package.

However, I do not see when one would need to generate such a table. I understand that you may be interested in looking at the weighting variable, but in that case you would use tbl_summary() (unweighted statistics). I do not see what the purpose would be of describing the weight variable using the weights themselves...
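
For readers who land here with the same error, the unweighted route suggested above might look like the sketch below. It reuses the 10 rows from the reprex and calls gtsummary::tbl_summary() on the plain data frame rather than on the survey design. The `type` override is an assumption on my part: the weights column has only 9 distinct values, and as far as I understand gtsummary's defaults, a numeric column with fewer than 10 distinct values is summarized as categorical unless told otherwise.

```r
# Same 10 rows as the reprex in this issue, as a plain data frame
dat <- data.frame(
  region = c(3, 3, 3, 3, 3, 3, 3, 3, 7, 7),
  year = rep(1972, 10),
  weights = c(
    0.663196271930943, 0.917370028585327, 0.897412512251031,
    1.06634082743438, 0.94432371066466, 0.526887241987567,
    0.526887241987567, 0.546578869586901, 0.283198307048893,
    0.494322145606406
  )
)

# Unweighted summary of the weighting variable itself.
# Assumption: with only 9 distinct values the column would default to a
# categorical summary, so the continuous type is requested explicitly.
tbl <- gtsummary::tbl_summary(
  dat,
  include = weights,
  type = list(weights ~ "continuous")
)

tbl
```

If the data only exist as a survey design object, the underlying data frame should be available as `dat_svy$variables` and can be passed to tbl_summary() in the same way.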

@ddsjoberg
Owner Author

Thanks for the details, @larmarange! I'll close this out. If someone raises it with the survey package in the future, that will be sufficient.
