will-lauer/datasketches-spark

Apache® DataSketches™ Spark Library

This repo is still an early-stage work in progress.

There have been multiple attempts to integrate Apache DataSketches into Apache Spark, including one built into Spark itself as of v3.5. All of them are useful work, but each has limitations. Some restrict the types of sketches available (e.g. native Spark provides only HLL). Others limit flexibility and functionality, for example by forcing HLL and Theta sketches behind a common interface, which precludes exposing the set operations that Theta supports but HLL cannot, or by using global parameters that control the size of every sketch instance in a query. These constraints are undesirable for developers who want to use sketches in their queries or data systems. This library aims to remove them and give that choice back to developers.
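To make the interface argument concrete, here is a minimal, hypothetical Scala sketch of the alternative design: each sketch family exposes only the set operations it can actually support, and every instance carries its own size parameter instead of reading a global setting. The trait names, the `lgK` field, and the `ToyHll`/`ToyTheta` classes are invented for illustration and are not this library's API. Exact `Set[Long]` values stand in for real sketch state so the example runs without dependencies; only the shape of the interfaces matters here.

```scala
// Hypothetical design sketch, NOT this library's API.
// Exact Set[Long] values stand in for real HLL/Theta sketch state.

// Operations every distinct-counting sketch family can support.
trait CountSketch[S <: CountSketch[S]] {
  def lgK: Int          // per-instance size parameter, no global config
  def estimate: Double
  def union(other: S): S
}

// Families (such as Theta) that can also intersect and take differences.
trait SetOpSketch[S <: SetOpSketch[S]] extends CountSketch[S] {
  def intersect(other: S): S
  def aNotB(other: S): S
}

// HLL-like stand-in: union only.
final case class ToyHll(lgK: Int, items: Set[Long]) extends CountSketch[ToyHll] {
  def estimate: Double = items.size.toDouble
  // Toy choice: the merged result keeps the smaller size parameter.
  def union(other: ToyHll): ToyHll =
    ToyHll(math.min(lgK, other.lgK), items ++ other.items)
}

// Theta-like stand-in: union, intersection, and difference.
final case class ToyTheta(lgK: Int, items: Set[Long]) extends SetOpSketch[ToyTheta] {
  def estimate: Double = items.size.toDouble
  def union(other: ToyTheta): ToyTheta =
    ToyTheta(math.min(lgK, other.lgK), items ++ other.items)
  def intersect(other: ToyTheta): ToyTheta =
    ToyTheta(math.min(lgK, other.lgK), items intersect other.items)
  def aNotB(other: ToyTheta): ToyTheta =
    ToyTheta(math.min(lgK, other.lgK), items diff other.items)
}

// Generic code can demand exactly the capability it needs.
def overlap[S <: SetOpSketch[S]](a: S, b: S): Double = a.intersect(b).estimate

// Two Theta-like instances with different, per-instance size parameters.
val t1 = ToyTheta(lgK = 12, items = (0L until 1000L).toSet)
val t2 = ToyTheta(lgK = 14, items = (500L until 1500L).toSet)
println(overlap(t1, t2)) // 500.0

// Does not compile: ToyHll is not a SetOpSketch, so intersection is unavailable.
// overlap(ToyHll(12, Set(1L)), ToyHll(12, Set(1L)))
```

The F-bounded traits let the compiler reject an intersection request on an HLL-like type, rather than a shared lowest-common-denominator interface hiding intersection from Theta as well.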

Languages

  • Scala 100.0%