Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New parameter 'seed' and code cleanup on geom_sina(). #104

Merged
merged 6 commits into from
Nov 14, 2018

Conversation

sidiropoulos
Copy link
Contributor

@sidiropoulos sidiropoulos commented Aug 15, 2018

Fixes #90
I reset my master fork and recreated the previous pull request.

@sidiropoulos sidiropoulos changed the title New parameter 'seed' and code cleanup on geom_sina(). Fixes #90 New parameter 'seed' and code cleanup on geom_sina(). Aug 15, 2018
@sidiropoulos
Copy link
Contributor Author

@thomasp85 can you have a look in merging the pull request? It only affects sina.R and it fixes a reported bug and adds a requested seed parameter.

@thomasp85
Copy link
Owner

I generally hate having seed as an argument. It will affect all code that runs afterwards, but this is not apparent to the user, who often thinks that it only affects the code within the function... setting set.seed(1) before plotting achieves the same but makes it obvious that the global state is changed.

Is there a very real reason to have it as an argument?

@sidiropoulos
Copy link
Contributor Author

It mainly serves reproducibility purposes.
What do you mean by "will affect all code that runs after the function"?

I tried consecutive calls of geom_sina, one with seed and one without and there is no memory of the seed in the second one.

p <- ggplot(midwest, aes(state, popdensity)) + scale_y_log10()
p + geom_sina(seed = 1) + geom_sina(color = "2")

image

@thomasp85
Copy link
Owner

The memory is in how it affects all subsequent sampling.

p <- ggplot(midwest, aes(state, popdensity)) + scale_y_log10()
p + geom_sina(seed = 1) + geom_sina(color = "2")
sample(10)

If you call the above block repeatedly, you'll notice that the sample call will always return the same sequence, even though it is supposed to be random. This will confuse people that only expect the seed argument to affect the produced plot and not code that follows it. On the other hand, the following block makes it obvious that you are meddling with the global state:

set.seed(10)
p <- ggplot(midwest, aes(state, popdensity)) + scale_y_log10()
p + geom_sina() + geom_sina(color = "2")
sample(10)

and people will not be surprised that all code will be affected by it.

So, if the seed argument is to remain it will need to be implemented in such a way that it does not affect the global environment. One way of achieving that is to getting a new seed prior to setting it so your code will be:

if (!is.null(params$seed)) {
  new_seed <- sample(.Machine$integer.max, 1)
  set.seed(params$seed)
  on.exit(set.seed(new_seed))
}

Another way would simply be to educate users about using set.seed() when they want to ensure reproducibility....

@sidiropoulos
Copy link
Contributor Author

I see, thanks for clarifying that. I followed the first of your suggestions and removed the parameter completely, as it seems like the more neat solution.

@thomasp85
Copy link
Owner

Have you been requested to add it? In that case you should probably forward the set.seed() solution to them

@sidiropoulos
Copy link
Contributor Author

I just did. Is the rest of the request ok? I am working on this request on a separate branch and would like to get done with this one first before I make any new merges.

@thomasp85 thomasp85 merged commit 9ce6079 into thomasp85:master Nov 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants