Skip to content

Reservoir Sampling algorithm of K items from an unknown N itemsize data stream (probability K/N to be sampled) - space complexity O(K)

License

Notifications You must be signed in to change notification settings

ioannapap/reservoir_sampling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

reservoir_sampling

A modified Reservoir Sampling algorithm that samples Κ items from a stream of N items, uniformly at random, so that each element has probability K/N to appear in the sample. The algorithm should work in a single pass over the data, reading the items one by one, without prior knowledge of the size of the stream N, and using O(Κ) of memory.

SETUP: (Prerequisite: Python)

This algorithm is implemented in Python3 and runs with the following command line format in Terminal:

python3 reservoir.py sample_number_K < input_file.txt

sample_number_K: any positive integer number

input_file.txt: any .txt file

Note that, it is necessary for the files to be in the same exact folder and run the above Terminal commands in that exact folder.

About

Reservoir Sampling algorithm of K items from an unknown N itemsize data stream (probability K/N to be sampled) - space complexity O(K)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages