Is it possible to run DBSCAN in distributed memory? #1149

lykos98 · 2024-09-20T17:35:37Z

Hi!
I found this repository through the newly published paper on recent advances in the library. The paper mentions that you were able to run DBSCAN on 2*10^12 points, but the documentation does not make clear whether, or how, the current DBSCAN implementation works in distributed memory.

Could you provide an example of the algorithm applied with MPI, where the data is scattered across different processes?
Thank you!
F

aprokop · 2024-09-20T17:59:36Z

Hi @lykos98. In the paper, the application (HACC) was responsible for the distributed-memory part. It constructs local domains with halos that guarantee that the clusters are fully contained in the local data. That is what ArborX's algorithms were run on. We do not have a distributed DBSCAN implementation at the moment.
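To make the halo idea concrete, here is a minimal single-process sketch in plain Python. It is not HACC's or ArborX's code: the 1D setting, the brute-force `dbscan_1d`, the helper names, and the "leftmost member decides ownership" rule are all assumptions made for illustration. It also assumes the halo width is at least the largest cluster extent plus `eps`, so that every cluster is complete on the rank that reports it.

```python
def dbscan_1d(xs, eps, min_pts):
    """Brute-force DBSCAN on 1D points; returns one label per point (-1 = noise)."""
    n = len(xs)
    neighbors = [[j for j in range(n) if abs(xs[i] - xs[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]  # neighbor count includes the point itself
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through core points only
            j = stack.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if core[k]:
                        stack.append(k)
        cluster += 1
    return labels


def clusters_of(xs, labels):
    """Group point coordinates by cluster label, dropping noise."""
    groups = {}
    for x, lab in zip(xs, labels):
        if lab >= 0:
            groups.setdefault(lab, []).append(x)
    return [sorted(g) for g in groups.values()]


def clusters_with_halos(points, edges, halo, eps, min_pts):
    """Simulate ranks that own [edges[r], edges[r+1]) and also see ghost points within `halo`."""
    found = set()
    for lo, hi in zip(edges[:-1], edges[1:]):
        local = [x for x in points if lo - halo <= x < hi + halo]  # owned + ghost points
        for members in clusters_of(local, dbscan_1d(local, eps, min_pts)):
            # Hypothetical ownership rule: the rank owning the leftmost member reports the cluster.
            if lo <= members[0] < hi:
                found.add(tuple(members))
    return found


points = [1.0, 1.2, 1.4, 4.8, 4.95, 5.1, 5.3, 8.0, 8.2, 3.0]
found = clusters_with_halos(points, edges=[0.0, 5.0, 10.0], halo=1.0, eps=0.25, min_pts=2)
```

In this example the cluster straddling the boundary at 5.0 is seen in full by both simulated ranks, but only the rank owning its leftmost point reports it, so no communication between ranks is needed after the ghost exchange. The sketch ignores tie-breaking for border points that are reachable from two clusters.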

lykos98 · 2024-09-23T07:28:41Z

Hi, thank you for the quick answer! I understand that implementing it in a truly distributed fashion is quite difficult; nevertheless, the performance reported in the paper is impressive. Congratulations!

aprokop · 2024-09-23T10:21:29Z

@lykos98 I'm not sure it's that difficult. In the back of my mind, I always thought that if we needed to implement it, we could combine our local algorithm with the distributed part from the Patwary et al. paper.
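As a rough illustration of what "local algorithm plus a distributed part" could look like, the sketch below merges per-rank DBSCAN results with a disjoint-set (union-find) structure. This is an assumption about the general approach, not ArborX code and not a reproduction of the paper's algorithm: the record layout `(global_id, local_label, is_core)`, the `DisjointSet` class, and `merge_local_clusters` are hypothetical names, and the ranks are simulated as plain lists rather than exchanged over MPI. It assumes each rank has already clustered its owned points plus ghost points within `eps` of its boundary.

```python
class DisjointSet:
    """Union-find over arbitrary hashable keys, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path we just walked
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_local_clusters(per_rank):
    """per_rank[r] lists (global_id, local_label, is_core) for rank r's owned and ghost points."""
    # A ghost point may look non-core locally because some of its neighbors are missing,
    # so treat a point as core if any rank saw it as core.
    core = {gid for records in per_rank for gid, _, is_core in records if is_core}

    ds = DisjointSet()
    for rank, records in enumerate(per_rank):
        for gid, label, _ in records:
            if label >= 0 and gid in core:
                # A core point shared by two ranks ties their local clusters together.
                ds.union(("point", gid), ("cluster", rank, label))

    final, ids = {}, {}
    for rank, records in enumerate(per_rank):
        for gid, label, _ in records:
            if label < 0:
                final.setdefault(gid, -1)
            elif final.get(gid, -1) == -1:
                root = ds.find(("cluster", rank, label))
                final[gid] = ids.setdefault(root, len(ids))
    return final


# Rank 0 owns points 0-2 and sees 3 as a ghost; rank 1 owns 3-8 and sees 2 as a ghost.
rank0 = [(0, 0, True), (1, 0, True), (2, 0, True), (3, 0, False)]
rank1 = [(2, 0, False), (3, 0, True), (4, 0, True), (5, 0, True),
         (6, 1, True), (7, 1, True), (8, -1, False)]
global_labels = merge_local_clusters([rank0, rank1])
```

Here points 2 and 3 are core on their owning ranks, so the two local clusters that contain them are merged into one global cluster, while the cluster of points 6 and 7 stays separate and point 8 remains noise.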

lykos98 · 2024-09-30T10:15:32Z

I'm sorry for not getting back to you sooner; I missed the notification. Thank you for the reference! I had already come across that paper a while ago. In my previous comment I did not mean "difficult" in the algorithmic sense, but on the implementation side, which needs a non-negligible amount of effort to get working. I'll keep an eye on this repository, thank you again!

aprokop · 2024-11-06T13:32:03Z

We are going to implement this.

lykos98 closed this as completed Sep 23, 2024

aprokop reopened this Nov 6, 2024

aprokop added the enhancement label Nov 6, 2024
