Suggestion: categorical scatter plot showing distribution with large datasets #2767

Open
tengjuilin opened this issue Mar 21, 2022 · 1 comment

Comments

@tengjuilin

Is there a way to show the distribution of data points in a categorical scatter plot for large datasets, so that the horizontal spread of the points at each Y value is proportional to the number of points at that value (like in the "standard" graph below)?

I'm aware that stripplot can plot all the points, but with large datasets it's hard to see the distribution (like the "expanded" graph). swarmplot typically gives "smile" shapes and is not suitable for large datasets (like the "classic" graph). violinplot shows the distribution but does not plot the individual data points.

Thanks for the great package!


(Image source: https://www.graphpad.com/guides/prism/latest/user-guide/using_graphing_each_replicate.htm)

image
(Additional example image source: https://www.graphpad.com/guides/prism/latest/user-guide/better-looking-graphs.htm)

@mwaskom
Owner

mwaskom commented Mar 21, 2022

One way I have seen this done is to make a stripplot where the width of the jitter is proportional to a kernel density estimate. That seems principled, but would not be easily implemented with the current seaborn internals. But to be honest, I don't feel like the three options here give me a particularly good understanding of what the actual shape of the distribution looks like besides "symmetric and leptokurtic" ... there's sort of a fundamental upper limit on how much information you can convey about large datasets while preserving the representation of individual datapoints (but maybe "symmetric and leptokurtic" is enough?).
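
For illustration, here is a minimal sketch of the density-scaled jitter idea, written outside seaborn. It assumes only NumPy, and `density_jitter` is a hypothetical helper, not a seaborn function. The density is approximated by smoothing a histogram with a Gaussian kernel using Scott's rule-of-thumb bandwidth; that choice, the grid size, and the default width are assumptions made for this example rather than anything specified in the thread.

```python
import numpy as np


def density_jitter(y, max_width=0.4, grid_size=200, seed=None):
    """Return horizontal offsets whose spread follows a density estimate of y.

    Hypothetical helper for illustration; not part of seaborn.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    # Scott's rule-of-thumb bandwidth (an assumption of this sketch).
    bandwidth = y.std(ddof=1) * y.size ** (-1 / 5)
    if bandwidth == 0:
        # Degenerate case: all values identical, so use plain uniform jitter.
        return rng.uniform(-max_width, max_width, size=y.size)

    # Bin the data, then smooth the bin counts with a Gaussian kernel.
    # Working on a fixed grid keeps the cost low for large datasets.
    counts, edges = np.histogram(y, bins=grid_size)
    centers = 0.5 * (edges[:-1] + edges[1:])
    z = (centers[:, None] - centers[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2) @ counts

    # Scale so the densest y value gets the full jitter width.
    density = density / density.max()
    half_width = max_width * np.interp(y, centers, density)

    # Uniform jitter within [-half_width, +half_width] for each point.
    return rng.uniform(-1.0, 1.0, size=y.size) * half_width
```

The offsets can then be added to a category's x position and drawn with an ordinary scatter call, for example matplotlib's `ax.scatter(position + offsets, y, s=2)`, where `position` is the category's location on the axis.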
