Default t-SNE parameters #22
Comments
Thanks!

Learning rate: The default being used is 200. Thanks for bringing this up; I completely overlooked it. The SG-tSNE implementation has not exposed this parameter. I have created an issue here and hope that the developers fix it soon. Once that is up, I will set the learning rate dynamically based on the number of cells, as has been suggested.

SG part: You are mostly correct, the rows of the affinity matrix each sum up to 1. The value of lambda is set to 1 by default.

Initialization: Thank you for bringing this to our notice. There is a major error in the explanation and it will be corrected in the preprint asap. So, we perform PCA reduction (…)
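For reference, a minimal sketch of the kind of dynamic learning-rate rule being discussed; the function name and the lower bound of 200 are illustrative choices, and the divisors follow the n/12 and n/48 heuristics mentioned elsewhere in this thread:

```python
def suggested_learning_rate(n_cells, early_exaggeration=12, gradient_factor=1):
    """Heuristic discussed in this thread: learning rate ~ n / alpha, divided
    by 4 more if the implementation keeps the factor 4 in the gradient (as
    sklearn does), with the common default of 200 as a lower bound."""
    return max(200.0, n_cells / (early_exaggeration * gradient_factor))

# With 4 million cells and the factor-4 gradient convention this gives roughly
# 4_000_000 / 48 ~ 83,000 -- on the order of the ~100000 figure mentioned in
# the discussion, rather than the fixed default of 200.
print(suggested_learning_rate(4_000_000, early_exaggeration=12, gradient_factor=4))
```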
Thanks for the replies.

Re learning rate -- great! Some remarks here: if the SG-tSNE implementation has the factor 4 in the gradient (as sklearn does), then the heuristic should be n/48 and not n/12. Also, the 12 here comes from the early exaggeration, so I'd suggest you consider changing the default alpha to 12 and early_iter to 250 too, but if you prefer to keep your defaults, then maybe the learning rate should be n/10 (or n/40). In any case, for n = 4 million you will change the learning rate from 200 to something like 100000, which is a HUGE change, so you should definitely run your benchmarks and see if everything still holds. I have no experience with the SG-tSNE implementation so don't know if there are any caveats here.

SG part: yes, I noticed that you set lambda to 1. This is what makes the affinities sum to 1, if I understood correctly. The UMAP weights will not sum to 1, so you need to fix that before running t-SNE, and it seems that SG-tSNE will do that for you, i.e. choose some gamma exponents so that the weights in each row sum to 1. Correct me if I am wrong.

Re initialization -- this makes perfect sense and is actually what I suspected you are doing :-) As we show in https://www.nature.com/articles/s41587-020-00809-z, it's important to have an informative initialization, but there are many possible choices for what is informative. I can see two caveats here: (1) when you perform the PCA on … And (2) the way you do it, you will have lots of points that are exactly overlapping in the initialization. I found this to cause a lot of numerical problems, especially for Barnes-Hut but also for FFT the way it's implemented in FIt-SNE. So it's actually beneficial to add a tiny amount of noise to the initialization. But I don't know if it matters in the SG-tSNE implementation. In any case, your results look reasonable, so maybe none of these caveats matters for you.
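To illustrate the jitter suggestion at the end of the comment above, here is a small hedged sketch; the noise scale of 1e-4 is an arbitrary illustrative value, not one prescribed in the thread:

```python
import numpy as np

def jitter_initialization(init, scale=1e-4, seed=0):
    """Add a tiny amount of Gaussian noise to an (n_cells, 2) initialization
    so that no two points start at exactly the same coordinates, which can
    otherwise cause numerical problems in Barnes-Hut / FIt-SNE style code."""
    rng = np.random.default_rng(seed)
    init = np.asarray(init, dtype=np.float64)
    return init + rng.normal(0.0, scale, size=init.shape)
```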
Learning rate: These are some fantastic suggestions! Thanks. It will be interesting to see how the embedding benefits from an increased learning rate. I expect that the runtime will be prolonged when using higher values for the learning rate. It will be interesting to compare the results between a low learning rate + more iterations and a high learning rate + fewer iterations. From the articles you shared, it seems that setting a high learning rate solves issues that a large number of iterations (under a reasonable limit) might not be able to solve. It will also be interesting to see how the local neighborhood is preserved in these comparisons.

SG part: Yes, lambda is set to 1, but this is not what makes the row affinities sum to 1. This happens here in the SG-tSNE code as a pre-processing step. The lambda rescaling happens here. I think that the lambda rescaling step is akin to smooth_knn_dist. Hence, by default, the lambda rescaling is turned off (by setting the value to 1). In UMAP, the equivalent to lambda would be …

Initialization: Again, thank you for the very constructive feedback here.
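As a rough illustration of the pre-processing step described above (making each row of the affinity matrix sum to 1), here is a generic row-stochastic normalization in scipy; this is a sketch of the idea, not the actual SG-t-SNE-Π code being linked to:

```python
import numpy as np
from scipy import sparse

def row_normalize(P):
    """Rescale each row of a sparse affinity matrix so that it sums to 1."""
    P = sparse.csr_matrix(P, dtype=np.float64)
    row_sums = np.asarray(P.sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0  # guard against empty rows
    return sparse.diags(1.0 / row_sums) @ P
```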
Yes, but what a high learning rate solves is the incomplete early exaggeration phase, because increasing the number of iterations without increasing the length of the early exaggeration phase won't help it. I don't think this will matter much for local neighborhood preservation though... So I am actually very curious to see how the learning rate will affect your local neighborhood metric.
Hmm, I am not so sure. The first link in your comment goes to a place where the entire P matrix is normalized to sum to 1, but not the individual rows... Equations 5-6 in the original paper suggest to me that the rows are normalized to 1 before that, via lambda rescaling. But I see now that, due to Equation 6, they will sum to 1 independent of the value of lambda.
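For readers without the paper open, my reading of the λ-rescaling being referenced, reconstructed only from this discussion (so the notation may not match Equations 5-6 exactly): each row i gets its own exponent γ_i chosen so that the powered weights sum to λ, and the row is then divided by λ, which is why the rescaled rows sum to 1 regardless of the value of λ:

```latex
\sum_{j} p_{ij}^{\,\gamma_i} = \lambda ,
\qquad
\hat{p}_{ij} = \frac{p_{ij}^{\,\gamma_i}}{\lambda}
\quad\Longrightarrow\quad
\sum_{j} \hat{p}_{ij} = 1 .
```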
Great paper and great package. Amazing work!
I am specifically interested in your UMAP/t-SNE comparisons and benchmarks and am now trying to figure out the SG-t-SNE-Pi default parameters that you use. As far as I understood your API, your defaults are
max_iter=500, early_iter=200, alpha=10
where alpha denotes the early exaggeration coefficient. I noticed that 10 and 200 are slightly different from the default values in most existing t-SNE implementations (12 and 250), and I wonder why. But they are pretty close, so it does not really matter. What is not mentioned, though, is the learning rate. The learning rate can have a huge influence on t-SNE embeddings and the speed of convergence. See https://www.nature.com/articles/s41467-019-13056-x and https://www.nature.com/articles/s41467-019-13055-y, which recommend setting the learning rate to n/12, where n is the sample size. What learning rate is used by the SG-t-SNE-Pi implementation that you use?

Unrelated: if I understood the "SG" part and your implementation correctly, you construct a kNN graph using k=10, then assign UMAP weights to the edges, and then, when running t-SNE, SG-t-SNE will normalize each row of the affinity matrix to sum to 1. Then symmetrize and run t-SNE as usual. Right? If I understood correctly, then this is pretty much exactly how it should be implemented in Scanpy soon, see pending scverse/scanpy#1561 by @pavlin-policar. Nice.
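To make the pipeline described above concrete, here is a hedged sketch; the edge weights use a simple Gaussian-kernel placeholder rather than the actual UMAP fuzzy-simplicial-set weights, and the final t-SNE step is omitted, since the SG-t-SNE-Π API details are not shown in this thread:

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def knn_affinity_graph(X, k=10):
    """kNN graph (k=10 as in the thread) -> edge weights -> row-normalize
    so each row sums to 1 -> symmetrize. The result would then be handed
    to the t-SNE optimizer."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]  # drop the self-neighbor

    # Placeholder weights; the package reportedly assigns UMAP weights here.
    w = np.exp(-dist**2 / (dist[:, -1:] ** 2 + 1e-12))

    n = X.shape[0]
    rows = np.repeat(np.arange(n), k)
    P = sparse.csr_matrix((w.ravel(), (rows, idx.ravel())), shape=(n, n))

    P = sparse.diags(1.0 / np.asarray(P.sum(axis=1)).ravel()) @ P  # rows sum to 1
    return (P + P.T) / 2  # symmetrize
```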
Finally, I am not entirely sure I understood your initialization approach. It's great that you use the same initialization for t-SNE and UMAP (another relevant paper here: https://www.nature.com/articles/s41587-020-00809-z). But I am confused by the following bit:
Is this a binary matrix that has 1 in position ij if cell i belongs to cluster j? If so, I'm not quite sure what's the point of running PCA on such a matrix? I'm probably misunderstanding.
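For concreteness, this is the interpretation the question above describes (PCA on a binary cell-by-cluster membership matrix), shown only as a hedged illustration of that reading; the maintainer's reply earlier in the thread notes that the preprint's explanation of the initialization will be corrected:

```python
import numpy as np
from sklearn.decomposition import PCA

def membership_pca_init(cluster_labels, n_components=2):
    """PCA on a binary membership matrix M with M[i, j] = 1 iff cell i is
    in cluster j. All cells of a cluster share the same row, so they land
    on exactly the same point -- one reason a tiny jitter (see above) helps."""
    labels = np.asarray(cluster_labels)
    clusters, inverse = np.unique(labels, return_inverse=True)
    M = np.zeros((labels.size, clusters.size))
    M[np.arange(labels.size), inverse] = 1.0
    return PCA(n_components=n_components).fit_transform(M)
```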