Dealing with challenging data sets #9
Comments
Hi Ivan,

Thank you for using our package! Are you trying to discover a particular structure in your dataset? Do you know what kind of structure it might look like? For general purposes (such as discovering cluster structures) it may be useful to increase the FP_ratio while keeping all the other hyperparameters at their defaults. Keep in mind that PaCMAP can only discover structure that already exists in the dataset, so you may want to know what kind of structure you would like to find before you perform the dimensionality reduction.
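For illustration, a minimal sketch of that suggestion, assuming the `n_dims` keyword used elsewhere in this thread and a placeholder array `X` (the default FP_ratio is 2.0; the value 10.0 below is just an example):

```python
import numpy as np
import pacmap

# Placeholder data standing in for the real dataset.
X = np.random.rand(1000, 20).astype(np.float32)

# Keep every hyperparameter at its default except FP_ratio, which is raised
# above its default of 2.0 to strengthen the repulsive force from further pairs.
reducer = pacmap.PaCMAP(n_dims=2, FP_ratio=10.0)
X_2d = reducer.fit_transform(X)
```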
Hi @hyhuang00,

Many thanks for your suggestions. I believe the dataset has regions of high density that are interconnected or show some degree of overlap. For that reason, I believe PaCMAP has some challenges unfolding the original structure. I will give it a try with an increased FP_ratio and let you know how it worked.

Ivan
Hi Ivan,
DR methods don’t tend to preserve high density; they tend to spread points out a bit so they look nice in 2D, which sounds like it might be the opposite of what you’re looking for. Anyway, please let us know if you are able to get the structure you’re looking for.
Cheers,
Cynthia
Hi Cynthia,

Thanks for the advice. I am sharing results I obtained by generating my own user-specified nearest neighbors:

```python
import numpy as np
import pacmap
from pynndescent import NNDescent

# Reconstructed from the flattened snippet in the original comment; the first
# argument to NNDescent and the tail of the PaCMAP call were lost in the
# original formatting and are assumed here.
tree = NNDescent(train_attr_cube, metric='minkowski', metric_kwds={'p': 0.3},
                 n_neighbors=n_neighbors)

nbrs = np.zeros((input_data.shape[0], n_neighbors), dtype=np.int32)
scaled_dist = np.ones((train_attr_cube.shape[0], n_neighbors), dtype=np.float32)

pair_neighbors = pacmap.pacmap.sample_neighbors_pair(train_attr_cube, scaled_dist,
                                                     nbrs, np.int32(n_neighbors))

embedding = pacmap.PaCMAP(n_dims=2, n_neighbors=n_neighbors, MN_ratio=0.05,
                          FP_ratio=20.0, lr=1.0)  # call truncated in the original
```

Note that n_neighbors is set according to the rule used in PaCMAP, and FP_ratio was tried at 2.0, 10.0, and 20.0. I noticed that with an increased FP_ratio, the computation time increased as well. The results are in the attached zip file. Any comments/suggestions?

Ivan
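One piece the snippet above leaves implicit is how the sampled pairs reach PaCMAP. A sketch of one way to wire them in, assuming the `pair_neighbors` constructor argument documented in the pacmap README (the exact keyword may differ in the 2021 version used in this thread):

```python
# Hand the precomputed pairs to PaCMAP instead of letting it resample them;
# pair_neighbors here is the array built by sample_neighbors_pair above.
embedding = pacmap.PaCMAP(n_dims=2, n_neighbors=n_neighbors,
                          MN_ratio=0.05, FP_ratio=20.0, lr=1.0,
                          pair_neighbors=pair_neighbors)
X_2d = embedding.fit_transform(train_attr_cube.astype(np.float32))
```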
Hi,
Many thanks for such a great package! I found the dimensionality reduction approach proposed in PaCMAP very interesting compared to other techniques, so I decided to give it a try with my data sets.

I tried different initial conditions, and in all tests I always get a "blob".

So, I am looking for your suggestions/comments. I have provided a Python script with one of my data sets (see attached file).
Many thanks,
Ivan
testing_dim_reduction.zip
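For readers without the attachment, a minimal skeleton of the kind of experiment described above, assuming a hypothetical placeholder file `my_dataset.npy` and the `init` options documented for `fit_transform`:

```python
import numpy as np
import pacmap

# Placeholder path standing in for the attached dataset.
X = np.load("my_dataset.npy").astype(np.float32)

# Try both documented initializations; per the report above, both
# collapse the embedding into a single "blob".
for init in ("pca", "random"):
    reducer = pacmap.PaCMAP(n_dims=2)
    X_2d = reducer.fit_transform(X, init=init)
```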