Is it possible to run DBSCAN in distributed memory? #1149

Open
lykos98 opened this issue Sep 20, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@lykos98

lykos98 commented Sep 20, 2024

Hi!
I found this repository by reading the newly published paper on the advancements of the library. The paper mentions that you were able to run DBSCAN on 2*10^12 points, but the documentation does not make clear whether, and how, the actual DBSCAN implementation works in distributed memory.

Could you provide an example of applying the algorithm with MPI when the data is scattered across different processes?
Thank you!
F

@aprokop
Contributor

aprokop commented Sep 20, 2024

Hi @lykos98. In the paper, the application (HACC) was responsible for the distributed-memory part. It constructs local domains with halos that guarantee that the clusters are fully contained in the local data. That is what ArborX's algorithms were run on. We do not have a distributed DBSCAN implementation at the moment.
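
To illustrate the halo idea described in this comment, here is a minimal 1D sketch. It is not HACC or ArborX code: the function name `split_with_halo`, the slab bounds, and the halo width are all hypothetical, and the ranks are simulated in a single process rather than with MPI. Each simulated rank keeps the points it owns plus "ghost" copies of points within a fixed distance of its boundaries, so a small cluster straddling a boundary is visible as a whole in the local data.

```python
def split_with_halo(points, bounds, halo):
    """Return one (owned, ghosts) pair per slab [lo, hi).

    owned  : points inside the slab
    ghosts : points outside the slab but within `halo` of its boundary
    All names here are illustrative assumptions, not a library API.
    """
    parts = []
    for lo, hi in bounds:
        owned = [p for p in points if lo <= p < hi]
        ghosts = [p for p in points
                  if (lo - halo <= p < lo) or (hi <= p < hi + halo)]
        parts.append((owned, ghosts))
    return parts


# Seven points on a line, split between two simulated ranks at x = 1.0.
points = [0.1, 0.4, 0.9, 1.0, 1.1, 1.6, 1.9]
bounds = [(0.0, 1.0), (1.0, 2.0)]
parts = split_with_halo(points, bounds, halo=0.25)

for rank, (owned, ghosts) in enumerate(parts):
    print(f"rank {rank}: owned={owned} ghosts={ghosts}")
```

In this toy setup the tight group 0.9, 1.0, 1.1 crosses the boundary, yet both simulated ranks hold all three points, so a purely local clustering run can see the whole group. The halo width has to be chosen by the application so that this holds for every cluster of interest.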

@lykos98
Author

lykos98 commented Sep 23, 2024

Hi, thank you for the quick answer! I understand that implementing it in a truly distributed fashion is quite difficult; nevertheless, the performance reported in the paper is impressive. Congratulations!

@lykos98 lykos98 closed this as completed Sep 23, 2024
@aprokop
Contributor

aprokop commented Sep 23, 2024

@lykos98 I'm not sure it's that difficult. In the back of my mind, I always thought that if we needed to implement it, we could combine our local algorithm with the distributed part from the Patwary et al. paper.
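
One way to picture "local algorithm plus a distributed merge" is the sketch below. It is an assumption about the general shape of such a scheme, not the method of the referenced paper and not ArborX code: each simulated rank is assumed to have already clustered its own points (including ghost copies), and a disjoint-set structure then joins local clusters that share a point. The names `DisjointSet` and `merge_local_labels` are hypothetical, the input is assumed to contain only clustered core points (no noise, no border points), and no MPI communication is shown.

```python
class DisjointSet:
    """Minimal union-find keyed by arbitrary hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_local_labels(local_labels):
    """Merge per-rank cluster labels into global ones.

    local_labels: list with one dict per rank, {point_id: local_label}.
    Assumes every listed point is a core point of its local cluster.
    Returns {point_id: global_label}, where a global label is the
    (rank, local_label) pair chosen as the set representative.
    """
    ds = DisjointSet()
    first_seen = {}
    for rank, labels in enumerate(local_labels):
        for pid, lab in labels.items():
            key = (rank, lab)
            ds.find(key)  # register the local cluster
            if pid in first_seen:
                # Same point clustered on two ranks: join the clusters.
                ds.union(first_seen[pid], key)
            else:
                first_seen[pid] = key
    return {pid: ds.find(key) for pid, key in first_seen.items()}


# Point 3 is owned by rank 0 and present as a ghost on rank 1.
local_labels = [{1: 0, 2: 0, 3: 0}, {3: 5, 4: 5, 7: 6}]
global_labels = merge_local_labels(local_labels)
print(global_labels)
```

Here points 1, 2, 3, and 4 end up in one global cluster because point 3 links the two local clusters, while point 7 stays separate. A real implementation would also need to handle noise and border points and to exchange the boundary information between processes.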

@lykos98
Author

lykos98 commented Sep 30, 2024

I'm sorry for not getting back to you sooner; I missed the notification. Thank you for the reference! I had already come across that paper a while ago. In my previous comment, I did not mean "difficult" in the algorithmic sense but on the implementation side, which needs a non-negligible amount of effort to get working. I'll keep an eye on this repository. Thank you again!

@aprokop
Contributor

aprokop commented Nov 6, 2024

We are going to implement this.

@aprokop aprokop reopened this Nov 6, 2024
@aprokop aprokop added the enhancement New feature or request label Nov 6, 2024