
Mixscape reproducibility #683

Merged — 2 commits merged into main on Jan 9, 2025
Conversation

Lilly-May (Collaborator)

PR Checklist

  • Referenced issue is linked (closes Reproducibility of mixscape #671)
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

  • As described in Reproducibility of mixscape #671, running Mixscape multiple times can produce varying results due to the non-deterministic GaussianMixture model. I added a seed parameter to pt.tl.Mixscape.mixscape (see the sketch after this list).
  • In the original Mixscape implementation, there is a parameter called n_dims, which specifies the number of dimensions from the chosen representation (usually PCA) to be used for determining the nearest neighbors. I have added this parameter to our implementation of pt.tl.Mixscape.perturbation_signature and set its default value to 15, which is also the default used in the original implementation.
  • I added a test for pt.tl.Mixscape.perturbation_signature that not only verifies that a layer has been added to the AnnData object but also checks the correctness of the computed scores.
  • Input required: The perturbation signature was previously calculated by subtracting the average expression of the N nearest control (NT) cells from the observed expression. However, in the original implementation, this subtraction is performed in the reverse order (see here). This only affects the sign of the result, but aligning our calculation with the original implementation would make it easier to compare scores. I suggest adapting our calculation to match the original approach, but I'm happy to discuss this further.
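For illustration, here is a minimal sketch of how such a seed could be forwarded to scikit-learn's GaussianMixture, which is the non-deterministic step. The helper name and settings are illustrative only, not the actual pertpy internals:

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_mixture(pert_scores: np.ndarray, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: posterior probabilities of a cell being perturbed.

    Fixing random_state makes repeated runs deterministic, which is the point
    of the new seed parameter.
    """
    scores = pert_scores.reshape(-1, 1)
    gmm = GaussianMixture(
        n_components=2,     # perturbed vs. escaping (non-perturbed) cells
        random_state=seed,  # the seed parameter would be threaded through here
    )
    return gmm.fit(scores).predict_proba(scores)
```

Similarly, the n_dims parameter would restrict the neighbor search to the first n_dims components of the chosen representation, e.g. adata.obsm["X_pca"][:, :n_dims].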

@github-actions bot added the bug (Something isn't working) label on Dec 9, 2024
@Zethson (Member) commented on Dec 9, 2024

The failing tests are unrelated to this PR, and I'm hoping to fix them next week. Just ping me when you'd like a review.

@Lilly-May (Collaborator, Author)

Originally, I wanted to add more to this PR, but I think it's easier to create a separate one. Feel free to review if you have time, @Zethson!

@Lilly-May requested a review from @Zethson on January 8, 2025, 09:12
@Zethson (Member) left a comment


Great, thank you!

  1. Input required: The perturbation signature was previously calculated by subtracting the average expression of the N nearest control (NT) cells from the observed expression. However, in the original implementation, this subtraction is performed in the reverse order (see here). This only affects the sign of the result, but aligning our calculation with the original implementation would make it easier to compare scores. I suggest adapting our calculation to match the original approach, but I'm happy to discuss this further.

Yes, let's adopt the way they did it. Whenever we can align with them, let's do it.
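For clarity, a toy sketch of the two subtraction orders described above; the variable names are illustrative only and the values are made up:

```python
import numpy as np

# Toy values: observed expression of one perturbed cell and the mean
# expression of its N nearest non-targeting (NT) control neighbors.
observed = np.array([2.0, 0.5, 1.0])
nt_neighbor_mean = np.array([1.5, 1.0, 1.0])

sig_previous = observed - nt_neighbor_mean  # previous pertpy order
sig_original = nt_neighbor_mean - observed  # order used by the original implementation

assert np.allclose(sig_original, -sig_previous)  # results differ only in sign
```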

  2. I had to merge main into your branch to get the CI to run and hopefully pass. You may now also have pulled in the code affected by Why is np.round().astype("int64") applied to post_prob in Mixscape? #694. I'll have a look at this ASAP, but your results may now differ.

    copy: bool | None = False,
):
    """Identify perturbed and non-perturbed gRNA expressing cells that accounts for multiple treatments/conditions/chemical perturbations.

-   The implementation resembles https://satijalab.org/seurat/reference/runmixscape
+   The implementation resembles https://satijalab.org/seurat/reference/runmixscape. Note that in the original implementation, the
+   perturbation signature is calculated on unscaled data by default and we therefore recommend to do the same.
Inline review comment (Member):

Should we test somewhere whether the data is unscaled and print a warning if it isn't? We can at least test whether the input is count or normalized data. There's code somewhere in pertpy for this, I think. If not, please ping me.
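A rough sketch of the kind of heuristic such a check could use; pertpy may already ship a similar utility, and the criterion here (scaled, i.e. z-scored, data typically contains negative values) is only an assumption for illustration:

```python
import warnings

import numpy as np
from scipy.sparse import issparse


def warn_if_possibly_scaled(X) -> None:
    """Illustrative heuristic: warn if the matrix looks z-scored (scaled)."""
    values = X.data if issparse(X) else np.asarray(X)
    if values.size and values.min() < 0:
        warnings.warn(
            "Input contains negative values and may be scaled; the mixscape "
            "perturbation signature is recommended on unscaled data.",
            UserWarning,
        )
```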

@codecov-commenter commented on Jan 9, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 64.89%. Comparing base (9bba130) to head (9be8648).
Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pertpy/tools/_mixscape.py | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #683      +/-   ##
==========================================
- Coverage   65.56%   64.89%   -0.68%     
==========================================
  Files          47       46       -1     
  Lines        6105     5994     -111     
==========================================
- Hits         4003     3890     -113     
- Misses       2102     2104       +2     
| Files with missing lines | Coverage Δ |
| --- | --- |
| pertpy/tools/_mixscape.py | 80.28% <75.00%> (+1.16%) ⬆️ |

@Zethson merged commit 218ccb3 into main on Jan 9, 2025
5 checks passed
Labels: bug (Something isn't working)
Projects: none yet
Development: successfully merging this pull request may close the issue "Reproducibility of mixscape"
3 participants