User defined aggregate function to calculate percentiles with Spark with linear complexity

ivanlukomskiy/sparkPercentileUdaf

Overview

Spark user-defined aggregate function (UDAF) that calculates percentiles using the quickselect algorithm.

It performs linear interpolation between adjacent ranks.

Quickselect gives the calculation linear time complexity on average.

Null values in the input are ignored.
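
The behaviour described above can be sketched in plain Python. This is only an illustration, not the code from this repository: it does not use the Spark API, the names `quickselect` and `percentile` are hypothetical, and the rank convention `p * (n - 1)` with `p` in `[0, 1]` is an assumption, since the exact convention used by the UDAF is not stated here.

```python
import random
from typing import Iterable, List, Optional


def quickselect(values: List[float], k: int) -> float:
    """Return the k-th smallest element (0-based), reordering values in place."""
    lo, hi = 0, len(values) - 1
    while lo < hi:
        # A random pivot keeps the expected running time linear.
        pivot = values[random.randint(lo, hi)]
        i, j = lo, hi
        # Hoare-style partition: smaller items move left, larger items move right.
        while i <= j:
            while values[i] < pivot:
                i += 1
            while values[j] > pivot:
                j -= 1
            if i <= j:
                values[i], values[j] = values[j], values[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j          # target is in the left part
        elif k >= i:
            lo = i          # target is in the right part
        else:
            return values[k]  # target sits between the parts and equals the pivot
    return values[k]


def percentile(data: Iterable[Optional[float]], p: float) -> Optional[float]:
    """Percentile for p in [0, 1], interpolating linearly between adjacent ranks."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    values = [v for v in data if v is not None]  # null inputs are ignored
    if not values:
        return None
    rank = p * (len(values) - 1)   # assumed rank convention
    lower = int(rank)              # floor, because rank is non-negative
    fraction = rank - lower
    low_value = quickselect(values, lower)
    if fraction == 0:
        return float(low_value)
    high_value = quickselect(values, lower + 1)
    return low_value + fraction * (high_value - low_value)
```

For example, `percentile([1, 2, 3, 4, None], 0.5)` drops the null, lands on rank 1.5, and interpolates halfway between 2 and 3. Each call to `quickselect` is linear on average, so two calls keep the whole calculation linear on average; with an unlucky sequence of pivots quickselect can degrade to quadratic time.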

Benchmarking results

Percentiles benchmarking

The other two benchmarks were run on the same machine under the same conditions and can be used to compare performance with the percentile calculation:

Quickselect benchmarking

Quicksort benchmarking

Tests

Percentiles tests

License

Apache 2.0

This code was developed by me during my work at SBDA Group.
