Feature Request: n_components=1 #82

Open
thiswillbeyourgithub opened this issue Nov 21, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@thiswillbeyourgithub

Hi,

It seems setting n_components to 1 errors out because PaCMAP does not support dimensions less than 2.

I think supporting this would be nice: I was planning on testing it with repeng, a small library that nudges an LLM by injecting a 1D vector obtained by dimension reduction of the model's internal representations of a dataset of paired good vs. bad texts.

Is that something that will never happen or is there a way?

Thanks!

@hyhuang00
Collaborator

Thank you for your interest in PaCMAP. I think that the algorithm will be able to run in the case when n_components=1, but I haven't personally tried to do so. As a result, we put an error statement that serves as a sanity check. You can try to remove the three lines and see if it works.

@thiswillbeyourgithub
Author

Hi!

Thanks a lot for the quick reply. I can attest that, so far, simply removing the check works without crashing.

As I can't properly evaluate the quality of the dimension reduction in this scenario, what do you think of replacing the error with a warning that this has not been thoroughly evaluated and is at the user's own risk? If you then released a new version, that would simplify the reproduction instructions for my projects. The alternative is to instruct users on how to patch the library.

Thanks!
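
A minimal sketch of the warning-instead-of-error idea suggested above. The function name `check_n_components` and the message text are hypothetical; this is not PaCMAP's actual source, only an illustration of downgrading the n_components=1 check from an exception to a warning.

```python
import warnings


def check_n_components(n_components):
    """Hypothetical validator (not PaCMAP's real code)."""
    if n_components < 1:
        # Zero or negative dimensions are never meaningful.
        raise ValueError("n_components must be at least 1")
    if n_components < 2:
        # Previously a hard error; here only a warning, as proposed in this thread.
        warnings.warn(
            "n_components=1 has not been thoroughly evaluated; "
            "use at your own risk.",
            UserWarning,
            stacklevel=2,
        )
    return n_components
```

With a check of this shape, callers who want strict behaviour could still turn the warning back into an error through the standard `warnings` filters.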

@mathematicalmichael
Contributor

mathematicalmichael commented Nov 26, 2024

hi @thiswillbeyourgithub and @hyhuang00.
funny enough, I'm actually evaluating this exact same use case at the moment.

I have a particularly good test scenario @thiswillbeyourgithub: sample the RGB colorspace, reduce it to 1 dimension, and compare the result to the ordering (invariant to start position) that arises from the H (hue) channel of HSV (from matplotlib.colors import rgb_to_hsv).

I'm evaluating the source code and paper as well, to think through the implications of trying this. I'm really glad to know that code-wise, it worked. I'll try this on my branch, feature/unit-dimension.
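
One possible way to score the RGB-to-hue test described above, sketched under stated assumptions. The metric name `hue_travel` is hypothetical and not from the thread: it sorts the points by their 1D embedding, then sums the circular hue steps around that ordering, closing the loop so the score does not depend on the start position or direction. A perfect hue ordering gives roughly 1.0 (one trip around the colour wheel); scrambled orderings give much larger values. To stay dependency-free it uses `colorsys` from the standard library in place of matplotlib's `rgb_to_hsv`, and the "embeddings" are stand-ins, not PaCMAP output.

```python
import colorsys
import random


def hue_travel(embedding_1d, hues):
    """Total circular hue distance walked when visiting points in embedding order.

    Hues are in [0, 1). The walk is closed (last point back to first), so the
    result is invariant to where the ordering starts and to its direction.
    """
    order = sorted(range(len(hues)), key=lambda i: embedding_1d[i])
    total = 0.0
    for a, b in zip(order, order[1:] + order[:1]):
        d = abs(hues[a] - hues[b])
        total += min(d, 1.0 - d)  # shortest way around the hue circle
    return total


rng = random.Random(0)
rgb = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
hues = [colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in rgb]

# Stand-in embeddings: an ideal one (hue with a shifted start) and a random one.
ideal = [(h + 0.3) % 1.0 for h in hues]
scrambled = [rng.random() for _ in hues]

print("ideal:", hue_travel(ideal, hues))
print("scrambled:", hue_travel(scrambled, hues))
```

Replacing `ideal` with the first column of an actual 1D embedding would give a single number to compare across parameter settings; near-grey samples have poorly defined hue and may be worth filtering out first.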

@hyhuang00 added the enhancement (New feature or request) label on Nov 26, 2024
@mathematicalmichael
Contributor

each of these uses "white" as its 12 o'clock value, which in HSV space should land next to "black" and "red" as well.

[image: hsv_sorted_colors_circle]
keep this in mind when looking at the images: ideally you want to see red at the top, but ultimately the ordering is what matters (reds across from greens, blues across from yellows)

here are the default parameters of PaCMAP (fwiw, I had apply_pca off since the dimension was 3, but it seems to make no difference) on my dataset:

[image: 21pacmap_sorted_colors_circle]
[image: 42pacmap_sorted_colors_circle]
honestly not so great, and fairly comparable to UMAP with its defaults.

with a bit of tuning, I was able to get better results though...

what I LOVE about PaCMAP is that I get intuitive visual results when tuning MN_ratio and FP_ratio by small amounts with a fixed seed: I can see some level of continuity in the transform.

(1.0, 1.0) with 50 neighbors
[image: 21pacmap_sorted_colors_circle]

(1.0, 1.2)
[image: 21pacmap_sorted_colors_circle]

(1.0, 1.4)
[image: 21pacmap_sorted_colors_circle]

(1.0, 1.0) with 100 neighbors is starting to look interesting.

[image: 21pacmap_sorted_colors_circle]

but I imagine it may take a bit of work to see if the RGB -> H mapping can be discovered. Certainly am trying.
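
For anyone wanting to repeat a sweep like the one above, here is a rough sketch. Several things are assumptions rather than facts from this thread: that the tuples are (MN_ratio, FP_ratio), that the two middle runs also used 50 neighbors, and the helper names `sweep_configs` and `embed_1d`. The `PaCMAP` constructor arguments used are the library's documented ones, but `n_components=1` only runs on a build where the dimension check has been removed, as discussed earlier.

```python
def sweep_configs():
    """Settings mirroring the runs listed above (tuple meaning is assumed)."""
    return [
        {"n_neighbors": 50, "MN_ratio": 1.0, "FP_ratio": 1.0},
        {"n_neighbors": 50, "MN_ratio": 1.0, "FP_ratio": 1.2},  # neighbors assumed
        {"n_neighbors": 50, "MN_ratio": 1.0, "FP_ratio": 1.4},  # neighbors assumed
        {"n_neighbors": 100, "MN_ratio": 1.0, "FP_ratio": 1.0},
    ]


def embed_1d(X, config, seed):
    """Reduce X (an n-by-3 NumPy array of RGB values) to one dimension.

    Needs a PaCMAP build that accepts n_components=1 (patched, per this thread).
    """
    import pacmap  # imported lazily so the configs can be inspected without it

    reducer = pacmap.PaCMAP(
        n_components=1,
        apply_pca=False,    # the input is only 3-dimensional
        random_state=seed,  # fixed seed so runs are comparable across configs
        **config,
    )
    return reducer.fit_transform(X)[:, 0]
```

A loop such as `for cfg in sweep_configs(): y = embed_1d(X, cfg, seed=21)` would then produce one ordering per setting; the seed value here is arbitrary.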


3 participants