
Mixscape reproducibility #683

Merged — 2 commits merged into main on Jan 9, 2025
Conversation

Lilly-May (Collaborator)

PR Checklist

  • Referenced issue is linked (closes Reproducibility of mixscape #671)
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

  • As described in Reproducibility of mixscape #671, running Mixscape multiple times can produce varying results due to the non-deterministic GaussianMixture model. I added a seed parameter to pt.tl.Mixscape.mixscape (see the sketch after this list).
  • In the original Mixscape implementation, there is a parameter called n_dims, which specifies the number of dimensions from the chosen representation (usually PCA) to be used for determining the nearest neighbors. I have added this parameter to our implementation of pt.tl.Mixscape.perturbation_signature and set its default value to 15, which is also the default used in the original implementation.
  • I added a test for pt.tl.Mixscape.perturbation_signature that not only verifies that a layer has been added to the AnnData object but also checks the correctness of the computed scores.
  • Input required: The perturbation signature was previously calculated by subtracting the average expression of the N nearest control (NT) cells from the observed expression. However, in the original implementation, this subtraction is performed in the reverse order (see here). This only affects the sign of the result, but aligning our calculation with the original implementation would make it easier to compare scores. I suggest adapting our calculation to match the original approach, but I'm happy to discuss this further.
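For illustration, here is a minimal sketch of how such a seed could be forwarded to scikit-learn's GaussianMixture, which is the non-deterministic step. The helper name and settings are illustrative only, not the actual pertpy internals:

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_mixture(pert_scores: np.ndarray, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: posterior probabilities of a cell being perturbed.

    Fixing random_state makes repeated runs deterministic, which is the point
    of the new seed parameter.
    """
    scores = pert_scores.reshape(-1, 1)
    gmm = GaussianMixture(
        n_components=2,     # perturbed vs. escaping (non-perturbed) cells
        random_state=seed,  # the seed parameter would be threaded through here
    )
    return gmm.fit(scores).predict_proba(scores)
```

Similarly, the n_dims parameter would restrict the neighbor search to the first n_dims components of the chosen representation, e.g. adata.obsm["X_pca"][:, :n_dims].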

@github-actions bot added the bug (Something isn't working) label on Dec 9, 2024
@Zethson (Member) commented on Dec 9, 2024

The failing tests are unrelated to this PR, and I'm hoping to fix them next week. Just ping me when you'd like a review.

@Lilly-May (Collaborator, Author)

Originally, I wanted to add more to this PR, but I think it's easier to create a separate one. Feel free to review if you have time, @Zethson!

@Lilly-May requested a review from @Zethson on January 8, 2025, 09:12
@Zethson (Member) left a comment


Great, thank you!

  1. Input required: The perturbation signature was previously calculated by subtracting the average expression of the N nearest control (NT) cells from the observed expression. However, in the original implementation, this subtraction is performed in the reverse order (see here). This only affects the sign of the result, but aligning our calculation with the original implementation would make it easier to compare scores. I suggest adapting our calculation to match the original approach, but I'm happy to discuss this further.

Yes, let's adopt the way they did it. Whenever we can align with them, let's do it.
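For clarity, a toy sketch of the two subtraction orders described above; the variable names are illustrative only and the values are made up:

```python
import numpy as np

# Toy values: observed expression of one perturbed cell and the mean
# expression of its N nearest non-targeting (NT) control neighbors.
observed = np.array([2.0, 0.5, 1.0])
nt_neighbor_mean = np.array([1.5, 1.0, 1.0])

sig_previous = observed - nt_neighbor_mean  # previous pertpy order
sig_original = nt_neighbor_mean - observed  # order used by the original implementation

assert np.allclose(sig_original, -sig_previous)  # results differ only in sign
```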

  2. I had to merge main into your branch to get the CI to run and hopefully pass. You may now also have pulled in the code affected by Why is np.round().astype("int64") applied to post_prob in Mixscape? #694. I'll have a look at this ASAP, but your results may now differ.

    copy: bool | None = False,
):
    """Identify perturbed and non-perturbed gRNA expressing cells that accounts for multiple treatments/conditions/chemical perturbations.

-   The implementation resembles https://satijalab.org/seurat/reference/runmixscape
+   The implementation resembles https://satijalab.org/seurat/reference/runmixscape. Note that in the original implementation, the
+   perturbation signature is calculated on unscaled data by default and we therefore recommend to do the same.
Inline review comment (Member):

Should we test somewhere whether the data is unscaled and print a warning if it isn't? We can at least test whether the input is count or normalized data. There's code somewhere in pertpy for this, I think. If not, please ping me.
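A rough sketch of the kind of heuristic such a check could use; pertpy may already ship a similar utility, and the criterion here (scaled, i.e. z-scored, data typically contains negative values) is only an assumption for illustration:

```python
import warnings

import numpy as np
from scipy.sparse import issparse


def warn_if_possibly_scaled(X) -> None:
    """Illustrative heuristic: warn if the matrix looks z-scored (scaled)."""
    values = X.data if issparse(X) else np.asarray(X)
    if values.size and values.min() < 0:
        warnings.warn(
            "Input contains negative values and may be scaled; the mixscape "
            "perturbation signature is recommended on unscaled data.",
            UserWarning,
        )
```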

@codecov-commenter commented on Jan 9, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 64.89%. Comparing base (9bba130) to head (9be8648).
Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pertpy/tools/_mixscape.py | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #683      +/-   ##
==========================================
- Coverage   65.56%   64.89%   -0.68%     
==========================================
  Files          47       46       -1     
  Lines        6105     5994     -111     
==========================================
- Hits         4003     3890     -113     
- Misses       2102     2104       +2     
| Files with missing lines | Coverage Δ |
| --- | --- |
| pertpy/tools/_mixscape.py | 80.28% <75.00%> (+1.16%) ⬆️ |

@Zethson merged commit 218ccb3 into main on Jan 9, 2025
5 checks passed
Labels: bug (Something isn't working)
Projects: none yet
Development: successfully merging this pull request may close the issue "Reproducibility of mixscape"
3 participants