CSV file reader that counts occurrences of e-mail domains.
- Buffered reading (bufio package),
- Parallel processing (e-mail domains are processed concurrently by worker goroutines),
- Optimized data structures (structs to improve readability and maintainability),
- Benchmark tests (benchmarking input data files of different sizes),
- Code profiling (pprof tool to identify specific bottlenecks).
-
To override the config variables, change the values in the .env file. The default values are:
CONCURRENCY=4
INPUT_CSV_FILE_PATH_DEFAULT=./data/test/customers_3k_lines.csv
INPUT_CSV_FILE_PATH_0_LINES=../data/test/customers_0_lines.csv
INPUT_CSV_FILE_PATH_10_LINES=../data/test/customers_10_lines.csv
INPUT_CSV_FILE_PATH_3K_LINES=../data/test/customers_3k_lines.csv
INPUT_CSV_FILE_PATH_10M_LINES=../data/test/customers_10m_lines.csv*
READ_BUFFER_SIZE_IN_BYTES=4096
* The customers_10m_lines.csv file is stored only locally because of its size (over 500 MB). It is used in the benchmark tests.
-
Run the program:
make run
-
Run the tests:
make test
-
Run the benchmarks:
make benchmark
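Benchmarking inputs of different sizes can be sketched with Go's `testing` package. This is an illustration, not the project's benchmark suite. The input is synthetic and built in memory with row counts echoing the 0/10/3k test files. `makeCSV`, `countDomainsSimple` and `benchmarkRows` are hypothetical helpers. In a `_test.go` file the body would normally live in a `BenchmarkXxx(b *testing.B)` function run by `go test -bench`; here `testing.Benchmark` is used so the sketch runs on its own.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"testing"
)

// makeCSV builds a synthetic customers CSV (header plus n rows). The column
// layout is an assumption, not the format of the project's data files.
func makeCSV(n int) string {
	domains := []string{"example.com", "test.org", "mail.net"}
	var sb strings.Builder
	sb.WriteString("name,email\n")
	for i := 0; i < n; i++ {
		fmt.Fprintf(&sb, "user%d,user%d@%s\n", i, i, domains[i%len(domains)])
	}
	return sb.String()
}

// countDomainsSimple is a hypothetical single-goroutine counter used as the code
// under test. It skips the header and takes everything after the last '@' on a line.
func countDomainsSimple(data string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(data))
	sc.Scan() // skip the header row (harmless on empty input)
	for sc.Scan() {
		line := sc.Text()
		if at := strings.LastIndexByte(line, '@'); at >= 0 && at < len(line)-1 {
			counts[strings.ToLower(line[at+1:])]++
		}
	}
	return counts
}

// benchmarkRows returns a benchmark body for an input of n rows. The input is
// built once, before the timed loop starts.
func benchmarkRows(n int) func(b *testing.B) {
	data := makeCSV(n)
	return func(b *testing.B) {
		b.SetBytes(int64(len(data))) // report throughput in MB/s
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			countDomainsSimple(data)
		}
	}
}

func main() {
	for _, n := range []int{0, 10, 3000} {
		res := testing.Benchmark(benchmarkRows(n))
		fmt.Printf("%4d rows: %s %s\n", n, res.String(), res.MemString())
	}
}
```

When the benchmarks live in a `_test.go` file, `go test -bench . -benchmem -cpuprofile cpu.out` writes a CPU profile that can be inspected with `go tool pprof cpu.out`. Whether `make benchmark` passes these flags is not stated in this README.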