---
layout: home
pageClass: my-index-page
hero:
  name: 1 Billion Row Challenge
  tagline: Calculate the min, max, and average of <b>1 billion</b> measurements
  image: /hero.png
  actions:
    - theme: brand
      text: Accept the challenge
      link: "#💪-the-challenge"
    - theme: alt
      text: Original blog post
features:
  - icon:
      src: /c.png
    title: 1BRC in C/C++
    details: Try your hand at processing 12 GB of text using low-level C code! ⚡
    linkText: Submit your solution!
  - icon:
      src: /python.png
    title: 1BRC in Python
    details: Use the power of snakes to read 1 billion lines of text! 🐍
    linkText: Submit your solution!
  - icon:
      src: /go.png
    title: 1BRC in Go
    details: Go get started to see if you can average 1B measurements in Go! 🐹
    linkText: Submit your solution!
  - icon:
      src: /javascript.png
    title: 1BRC in JavaScript
    details: Wrangle with the world's most popular programming language to process 1B rows! 💻
    linkText: Submit your solution!
  - icon:
      src: /rust.png
    title: 1BRC in Rust
    details: Embrace your inner iron crab and read a ginormous file in Rust! 🦀
    linkText: Submit your solution!
  - icon:
      src: /zig.png
    title: 1BRC in Zig
    details: Use this new language to process 1B rows of text! 🧩
    linkText: Submit your solution!
  - icon:
      src: /php.png
    title: 1BRC in PHP
    details: ElePHPants are not as slow as one might think! 🐘
    linkText: Submit your solution!
  - icon:
      src: /CSharp.png
    title: 1BRC in C#
    details: Sharpen your wide span&lt;T&gt;(ing) skills and refresh your memory&lt;T&gt;
    linkText: Submit your solution!
  - icon:
      src: /java.png
    title: <s>1BRC in Java</s> <small><i>closed</i></small>
    details: <s>The original 1BRC language! 🎉</s>
    linkText: View historical submissions
  - icon:
      src: /julia.png
    title: <s>1BRC in Julia</s> <small><i>closed</i></small>
    details: <s>The original 1BRC language! 🎉</s>
    linkText: View historical submissions
---
<style> .my-index-page .VPHomeHero .actions .action .VPButton:not(:link) { background-color: transparent; } </style>

Don't see your favorite language listed above? Open an Issue to add it!

Choose one of the languages listed above to see the language-specific leaderboard and instructions for submitting your solution to that language's repository.

Global leaderboard

TODO: Make sure this is up-to-date

| | Time | Solution | Language | Author |
|---|---|---|---|---|
| 1. | 6.159s | link | Java | royvanrijn |
| 2. | 6.532s | link | Java | Thomas Wuerthinger |
| 3. | 7.620s | link | Java | Quan Anh Mai |
| 4. | 9.062s | link | Java | obourgain |
| 5. | 9.338s | link | Java | Elliot Barlas |
| 6. | 10.589s | link | Java | Artsiom Korzun |
| 7. | 10.613s | link | Java | Sam Pullara |
| 8. | 11.038s | link | Java | Andrew Sun |
| 9. | 11.222s | link | Java | Jamie Stansfield |
| 10. | 13.277s | link | Java | Yavuz Tas |
| | 4m 13.449s | link | Java | Reference implementation |

You can view language-specific leaderboards on each language's competition page.

💪 The challenge

Your mission, should you choose to accept it, is to write a program that retrieves temperature measurement values from a text file and calculates the min, mean, and max temperature per weather station. There's just one caveat: the file has 1,000,000,000 rows! That's more than 10 GB of data! 😱

The text file has a simple structure with one measurement value per row:

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
Hamburg;34.2
St. John's;15.2
Cracow;12.6
... etc. ...

The program should print out the min, mean, and max values per station, ordered alphabetically by station name. The expected format varies slightly from language to language, but the following example shows what the output could look like for three of the stations above:

Bulawayo;8.9;22.1;35.2
Hamburg;12.0;23.1;34.2
Palembang;38.8;39.9;41.0
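
As a starting point, the task above can be sketched in a few lines of Python. This is a naive baseline under stated assumptions, not any repository's reference implementation: the function names `aggregate` and `format_results` are hypothetical, values are printed with one decimal place using Python's default rounding, and "alphabetical" is taken to mean plain string ordering. Check your language's page for the exact expected format.

```python
from typing import Dict, Iterable, List


def aggregate(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Collect [min, max, sum, count] per station from 'name;value' rows."""
    stats: Dict[str, List[float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate a trailing blank line
        # Station names never contain ';', so the last ';' is the separator.
        name, _, raw = line.rpartition(";")
        value = float(raw)
        entry = stats.get(name)
        if entry is None:
            stats[name] = [value, value, value, 1]
        else:
            entry[0] = min(entry[0], value)
            entry[1] = max(entry[1], value)
            entry[2] += value
            entry[3] += 1
    return stats


def format_results(stats: Dict[str, List[float]]) -> List[str]:
    """Render 'name;min;mean;max' rows, sorted by station name."""
    # Assumption: one decimal place and plain string ordering.
    return [
        f"{name};{low:.1f};{total / count:.1f};{high:.1f}"
        for name, (low, high, total, count) in sorted(stats.items())
    ]


if __name__ == "__main__":
    sample = ["Hamburg;12.0", "Bulawayo;8.9", "Palembang;38.8", "Hamburg;34.2"]
    for row in format_results(aggregate(sample)):
        print(row)
```

Because `aggregate` accepts any iterable of strings, an open text file can be passed directly. A loop like this is far too slow for a billion rows, which is exactly where the challenge begins.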

Oh, and the input.txt file is different for each submission since it's generated on demand. So no hard-coding the results! 😉

Choose a language from the cards at the top of this page to get started! 🚀

Rules and limits

  • No external library dependencies may be used. That means no lodash, no numpy, no Boost, no nothing. You're limited to the standard library of your language.

  • Implementations must be provided as a single source file. Try to keep it relatively short; don't copy-paste a library into your solution as a cheat.

  • The computation must happen at application runtime; you cannot process the measurements file at build time.

  • Input value ranges are as follows:

    • Station name: non-null UTF-8 string with a minimum length of 1 character and a maximum length of 100 bytes (i.e. this could be 100 one-byte characters, or 50 two-byte characters, etc.)
    • Temperature value: non-null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit
  • There is a maximum of 10,000 unique station names.

  • Implementations must not rely on specifics of a given data set. Any valid station name as per the constraints above and any data distribution (number of measurements per station) must be supported.
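
One consequence of the value range above: because every temperature has exactly one fractional digit, it can be held as a whole number of tenths between -999 and 999, which avoids floating-point parsing in the hot loop. The sketch below illustrates that idea only. The helper names `parse_tenths` and `format_tenths` are hypothetical, and the code assumes the input already follows the rules, so it does no validation.

```python
def parse_tenths(raw: str) -> int:
    """Parse a value such as '-12.3' into -123 (tenths of a degree).

    Relies on the rule that there is always exactly one fractional digit.
    """
    negative = raw.startswith("-")
    digits = raw[1:] if negative else raw
    whole, _, fraction = digits.partition(".")
    tenths = int(whole) * 10 + int(fraction)
    return -tenths if negative else tenths


def format_tenths(tenths: int) -> str:
    """Turn an integer number of tenths back into text such as '-12.3'."""
    sign = "-" if tenths < 0 else ""
    whole, fraction = divmod(abs(tenths), 10)
    return f"{sign}{whole}.{fraction}"
```

Min and max can then be tracked as plain integers. Only the mean needs a division, and how that result should be rounded depends on your language's expected output format.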

Entering the challenge

Some languages have special instructions but in general here's what you can expect:

  1. Create a fork of the 1BRC repository for your language on your own GitHub profile. This will let you submit your solution via a pull request.

  2. Create a new implementation file in the repository. How you do this varies by language. For example, in JavaScript you might create a new src/<username>.js file, while in C++ you might create a new src/<username>.cpp file. It's recommended to copy the default reference solution and modify it from there.

  3. Make that implementation fast. Really fast.

  4. Test & benchmark your solution! There are usually language-specific instructions on how to do this, but in general you run <some-command> bench <username> to compare your solution against the reference implementation. If the outputs differ, fix your implementation before submitting it.

  5. Create a pull request against the upstream repository! 🎉 There are usually additional instructions in the pull request template about what to include, such as how long the run took on your computer and your computer's specs.

  6. Someone or some robot will run your solution "officially" on the same hardware as everyone else's solution (so no hardware differences) and report the results. If you're the fastest, you win! 🏆 If not, you'll still probably go on the leaderboard. 🥉
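
The comparison in step 4 is normally handled by each repository's own bench command. If you want a quick manual check while developing, a sketch like the following finds the first row where your output and the reference output disagree. The function name `first_difference` is hypothetical, and the code assumes both outputs are available as sequences of lines.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_difference(
    expected: Iterable[str], actual: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (line number, expected row, actual row) for the first mismatch.

    Returns None when both outputs are identical. A missing row on either
    side is reported as None in the corresponding position.
    """
    pairs = zip_longest(expected, actual)
    for number, (want, got) in enumerate(pairs, start=1):
        if want != got:
            return number, want, got
    return None
```

Reporting only the first mismatch keeps the result readable when a single rounding bug affects thousands of rows.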

If you'd like to discuss any potential ideas for implementing 1BRC with the community, you can use the GitHub Discussions of this @1brc GitHub organization or the language-specific repository discussions. Please keep it friendly and civil.

Prize 🎁

If you enter this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed on the leaderboard above. Rumor has it that the winner of the Java competition (the original challenge language) may receive a unique 1️⃣🐝🏎️ t-shirt, too!

FAQ

Make sure you check your language-specific FAQ as well. 😉

What is the encoding of the measurements.txt file?

The file is encoded as UTF-8.

Can I make assumptions on the names of the weather stations showing up in the data set?

No. While only a fixed set of station names is used by the data set generator, any solution should work with arbitrary UTF-8 station names. For the sake of simplicity, names are guaranteed to contain no ; character.
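
The guarantee about `;` makes byte-level splitting safe even for non-ASCII names: in UTF-8, every byte of a multi-byte character is 0x80 or higher, so the ASCII byte for `;` can only ever be the separator. The sketch below shows this. The name `split_row` and the constant `MAX_NAME_BYTES` are hypothetical, and the length check simply mirrors the 100-byte limit from the rules.

```python
from typing import Tuple

MAX_NAME_BYTES = 100  # limit taken from the rules above


def split_row(row: bytes) -> Tuple[str, str]:
    """Split one raw 'name;value' row into (station name, value text)."""
    row = row.rstrip(b"\n")
    # Bytes of multi-byte UTF-8 characters are all >= 0x80, so the first
    # 0x3B byte (';') cannot be part of a station name.
    name, separator, value = row.partition(b";")
    if not separator or not name or len(name) > MAX_NAME_BYTES:
        raise ValueError(f"malformed row: {row!r}")
    return name.decode("utf-8"), value.decode("ascii")
```

For example, `split_row("São Paulo;-3.4\n".encode("utf-8"))` yields the name `São Paulo` and the value text `-3.4`. A fast solution would likely keep the name as bytes and decode it only when printing.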

Can I copy code from other submissions?

Yes, you can. The primary focus of the challenge is about learning something new, rather than "winning". When you do so, please give credit to the relevant source submissions. Please don't re-submit other entries with no or only trivial improvements.

My solution runs in 2 sec on my machine. Am I the fastest 1BRC-er in the world?

Probably not. 😊 1BRC results are reported in wall-clock time, so results of different implementations are only comparable when obtained on the same machine. If, for instance, an implementation is faster on a 32-core workstation than on the 8-core evaluation instance, that doesn't allow for any conclusions. When sharing 1BRC results, always share the result of running the baseline implementation on the same hardware as well.
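
If you want to report your own numbers in that spirit, one rough approach is to time both the baseline and your solution on the same machine and quote the ratio. The sketch below uses Python's `time.perf_counter`. The helper names `wall_clock_seconds` and `speedup` are hypothetical, and a single run like this ignores warm-up and file-cache effects that the official evaluation may handle differently.

```python
import time
from typing import Callable


def wall_clock_seconds(run: Callable[[], object]) -> float:
    """Measure the elapsed wall-clock time of one call to run()."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

Quoting both raw times alongside the ratio lets readers judge the result even if their hardware differs from yours.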

Why 1️⃣🐝🏎️?

It's the abbreviation of the project name: the One Billion Row Challenge.