There are plenty of other CSV parsers in the wild, but I had a hard time finding what I wanted. Specifically, I wanted something with an interface similar to Python's csv
module. Furthermore, I wanted support for special use cases such as calculating statistics on very large files. Thus, this library was created with the following goals in mind:
This CSV parser uses multiple threads to simultaneously pull data from disk and parse it. Furthermore, it is capable of incremental streaming (parsing files larger than RAM) and of quickly converting fields to numeric data types.
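To illustrate what reading and parsing at the same time means, here is a minimal producer/consumer sketch in standard C++11. It is an illustration under stated assumptions, not this library's internals: `ChunkQueue` and `count_records` are hypothetical names, the "disk" is a vector of strings, and "parsing" only counts newline characters. A real parser would also have to carry partial records (and quoted newlines) across chunk boundaries.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue of raw text chunks shared by a reader thread
// and a parser thread.
class ChunkQueue {
public:
    void push(std::string chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            chunks_.push(std::move(chunk));
        }
        cv_.notify_one();
    }

    // Signals that no more chunks will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_one();
    }

    // Waits for a chunk; returns false once the queue is closed and drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !chunks_.empty() || closed_; });
        if (chunks_.empty()) return false;
        out = std::move(chunks_.front());
        chunks_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> chunks_;
    bool closed_ = false;
};

// Hypothetical example: a producer thread stands in for disk reads while the
// calling thread "parses" each chunk, here by counting record terminators.
std::size_t count_records(const std::vector<std::string>& chunks_on_disk) {
    ChunkQueue queue;
    std::thread reader([&queue, &chunks_on_disk] {
        for (const std::string& chunk : chunks_on_disk) queue.push(chunk);
        queue.close();
    });

    std::size_t records = 0;
    std::string chunk;
    while (queue.pop(chunk)) {
        for (char c : chunk) {
            if (c == '\n') ++records;
        }
    }
    reader.join();
    return records;
}
```

The reader thread can keep filling the queue while the consumer works on earlier chunks, which is what lets I/O and parsing overlap.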
This CSV parser is much more than a fancy string splitter and follows every guideline from RFC 4180. At the same time, it is robust and capable of handling deviations from the standard. An optional strict parsing mode can be enabled to sniff out errors in files.
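As a rough sketch of what following RFC 4180 involves beyond splitting on commas, the hypothetical function `parse_rfc4180` below handles quoted fields, embedded delimiters and line breaks, and doubled quotes. It is a simplified illustration (no strict-mode error reporting, and a stray quote inside an unquoted field simply opens a quoted section), not this library's parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal RFC 4180-style parser sketch (hypothetical, not the library's code).
// - fields are separated by `delim`; records end at LF
// - CR characters outside quotes are dropped, so CRLF and LF both end a record
// - text between `quote` characters may contain delimiters and line breaks
// - a doubled quote ("") inside a quoted section is an escaped quote
// - blank lines are skipped
std::vector<std::vector<std::string>> parse_rfc4180(const std::string& text,
                                                    char delim = ',', char quote = '"') {
    std::vector<std::vector<std::string>> rows;
    std::vector<std::string> row;
    std::string field;
    bool in_quotes = false;
    bool started = false;  // has the current record produced any content yet?

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (in_quotes) {
            if (c != quote) {
                field += c;                   // delimiters and newlines kept verbatim
            } else if (i + 1 < text.size() && text[i + 1] == quote) {
                field += quote;               // "" becomes a literal quote
                ++i;
            } else {
                in_quotes = false;            // closing quote
            }
        } else if (c == quote) {
            in_quotes = true;
            started = true;
        } else if (c == delim) {
            row.push_back(field);
            field.clear();
            started = true;
        } else if (c == '\n') {
            if (started) {
                row.push_back(field);
                rows.push_back(row);
            }
            row.clear();
            field.clear();
            started = false;
        } else if (c != '\r') {
            field += c;
            started = true;
        }
    }
    if (started) {  // final record without a trailing newline
        row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}
```

For example, `parse_rfc4180("a,\"b,c\"\r\n")` yields a single record with the two fields `a` and `b,c`.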
This CSV parser will handle ANSI and UTF-8 encoded files. It does not try to decode UTF-8, except for detecting and stripping byte order marks.
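Detecting a UTF-8 byte order mark only requires checking for the three bytes EF BB BF at the start of the data. A minimal sketch, using a hypothetical helper name `strip_utf8_bom` (the library's own handling may differ):

```cpp
#include <string>

// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF), if present.
// No other decoding is performed; all remaining bytes are returned untouched.
std::string strip_utf8_bom(const std::string& data) {
    static const char bom[] = "\xEF\xBB\xBF";
    if (data.size() >= 3 && data.compare(0, 3, bom) == 0) {
        return data.substr(3);
    }
    return data;
}
```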
Easy to Use and Well-Documented
In addition to being easy on your computer's hardware, this library is also easy on you, the developer. Some helpful features include:
- Decent ability to guess the dialect of a file (CSV, tab-delimited, etc.)
- Ability to handle common deviations from the CSV standard, such as inconsistent row lengths, and leading comments
- Ability to manually set the delimiter and quoting character of the parser
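The guessing algorithm is not described here, so the following is only a hypothetical heuristic showing the general idea of dialect guessing: for each candidate delimiter, count the fields on each line of a sample and prefer the delimiter that produces the largest block of lines sharing the same field count. `guess_delimiter` is an illustrative name, and for brevity the sketch ignores quoting, which a real guesser would need to respect.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical heuristic (not the library's guesser). For each candidate,
// find how many lines share each field count, and score the candidate by
// (field count) * (number of lines with that count). Quoting is ignored.
char guess_delimiter(const std::string& sample,
                     const std::vector<char>& candidates = {',', '\t', ';', '|'}) {
    char best = candidates.empty() ? ',' : candidates.front();
    std::size_t best_score = 0;
    for (char delim : candidates) {
        std::map<std::size_t, std::size_t> width_counts;  // fields per line -> lines
        std::istringstream lines(sample);
        std::string line;
        while (std::getline(lines, line)) {
            if (line.empty()) continue;
            std::size_t fields = 1;
            for (char c : line) {
                if (c == delim) ++fields;
            }
            ++width_counts[fields];
        }
        for (const auto& wc : width_counts) {
            // A width of 1 means the delimiter never split the line
            if (wc.first > 1 && wc.first * wc.second > best_score) {
                best_score = wc.first * wc.second;
                best = delim;
            }
        }
    }
    return best;
}
```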
This CSV parser has an extensive test suite and is checked for memory safety with Valgrind. If you still manage to find a bug, do not hesitate to report it.
Building and Compatibility (latest stable version)
This library was developed with Microsoft Visual Studio and is compatible with g++ and clang.
All of the code required to build this library, aside from the C++ standard library, is contained under include/.
C++11 is the minimal version required. This library makes extensive use of string views, either through Martin Moene's string view library or std::string_view when compiling with C++17. Please be aware of this if you use parts of the public API that return string views.
This library is available as a single .hpp file under single_include/csv.hpp. This header includes all necessary internal and external dependencies.
If you're including this in another CMake project, you can simply clone this repo into your project directory, and add the following to your CMakeLists.txt:
# Optional: Defaults to C++17
# set(CSV_CXX_STANDARD 11)
add_subdirectory(csv-parser)
# ...
add_executable(<your program> ...)
target_link_libraries(<your program> csv)
With this library, you can easily stream over a large file without reading its entirety into memory.
C++ Style
# include "csv.hpp"
# include <iostream>
using namespace csv;
...
CSVReader reader("very_big_file.csv");
for (CSVRow& row: reader) { // Input iterator
    for (CSVField& field: row) {
        // By default, get<>() produces a std::string.
        // A more efficient get<string_view>() is also available, where the resulting
        // string_view is valid as long as the parent CSVRow is alive
        std::cout << field.get<>() << ...
    }
}
...
Old-Fashioned C Style Loop
...
CSVReader reader("very_big_file.csv");
CSVRow row;
while (reader.read_row(row)) {
    // Do stuff with row here
}
...
Retrieving values by column name is a cheap, constant-time operation.
# include "csv.hpp"
using namespace csv;
...
CSVReader reader("very_big_file.csv");
double sum = 0;
for (auto& row: reader) {
    // Note: Can also use index of column with [] operator
    sum += row["Total Salary"].get<double>();
}
...
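One common way to make name-based access constant time (on average) is to build a hash map from header names to column positions once, then reuse it for every row. The sketch below assumes that design; `ColumnIndex` is a hypothetical name and not part of this library's API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a name -> position table built once from the header,
// so each lookup is a single hash-map probe (average O(1)) plus an index.
class ColumnIndex {
public:
    explicit ColumnIndex(const std::vector<std::string>& header) {
        for (std::size_t i = 0; i < header.size(); ++i) positions_[header[i]] = i;
    }

    // Returns the position of `name`, or throws if the header lacks it.
    std::size_t operator[](const std::string& name) const {
        auto it = positions_.find(name);
        if (it == positions_.end()) throw std::out_of_range("no such column: " + name);
        return it->second;
    }

private:
    std::unordered_map<std::string, std::size_t> positions_;
};
```

Because the table is shared by all rows, the per-row cost of `row["Total Salary"]` in this design is one hash lookup followed by an ordinary vector index.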
If your CSV has lots of numeric values, you can also have this parser (lazily) convert them to the proper data type. Type checking is performed on conversions to prevent undefined behavior.
Note: Conversions to floating point types are not currently checked for loss of precision.
# include "csv.hpp"
using namespace csv;
...
CSVReader reader("very_big_file.csv");
for (auto& row: reader) {
    if (row["timestamp"].is_int()) {
        // Can use get<>() with any signed integer type
        row["timestamp"].get<int>();
        // ..
    }
}
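The kind of check that keeps integer conversions free of undefined behavior can be sketched with `std::strtoll` plus a range test against the target type. This hypothetical `checked_int` is only an illustration of the idea, not the library's conversion code: it rejects empty or partially numeric text and values that do not fit in `T` (note that `strtoll` itself accepts leading whitespace).

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical checked string -> signed integer conversion.
template <typename T>
T checked_int(const std::string& text) {
    static_assert(std::numeric_limits<T>::is_integer && std::numeric_limits<T>::is_signed,
                  "T must be a signed integer type");
    if (text.empty()) throw std::invalid_argument("empty field");

    errno = 0;
    char* end = nullptr;
    const long long value = std::strtoll(text.c_str(), &end, 10);

    // Every character must be consumed, e.g. "12.5" or "12a" are rejected
    if (end != text.c_str() + text.size())
        throw std::invalid_argument("not an integer: " + text);

    // Overflow of long long itself, or a value outside T's range
    if (errno == ERANGE || value < std::numeric_limits<T>::min() ||
        value > std::numeric_limits<T>::max())
        throw std::overflow_error("out of range: " + text);

    return static_cast<T>(value);
}
```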
Although the CSV parser has a decent guessing mechanism, in some cases it is preferable to specify the exact parameters of a file.
# include "csv.hpp"
# include ...
using namespace csv;
CSVFormat format;
format.delimiter('\t')
      .quote('~')
      .header_row(2);   // Header is on 3rd row (zero-indexed)
// Alternatively, we can use format.delimiter({ '\t', ',', ... })
// to tell the CSV guesser which delimiters to try out
CSVReader reader("wierd_csv_dialect.csv", format);
for (auto& row: reader) {
    // Do stuff with rows here
}
# include "csv.hpp"
using namespace csv;
...
// Method 1: Using parse()
std::string csv_string = "Actor,Character\r\n"
    "Will Ferrell,Ricky Bobby\r\n"
    "John C. Reilly,Cal Naughton Jr.\r\n"
    "Sacha Baron Cohen,Jean Giard\r\n";
auto rows = parse(csv_string);
for (auto& r: rows) {
    // Do stuff with row here
}
// Method 2: Using the _csv literal operator
// (a distinct name avoids redeclaring `rows` in the same scope)
auto rows2 = "Actor,Character\r\n"
    "Will Ferrell,Ricky Bobby\r\n"
    "John C. Reilly,Cal Naughton Jr.\r\n"
    "Sacha Baron Cohen,Jean Giard\r\n"_csv;
for (auto& r: rows2) {
    // Do stuff with row here
}
# include "csv.hpp"
# include ...
using namespace csv;
using namespace std;
...
stringstream ss; // Can also use ifstream, etc.
auto writer = make_csv_writer(ss);
writer << vector<string>({ "A", "B", "C" })
       << deque<string>({ "I'm", "too", "tired" })
       << list<string>({ "to", "write", "documentation" });
...
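When writing, fields that contain the delimiter, the quote character, or a line break must be quoted, with embedded quotes doubled. The helpers below (`quote_field` and `write_row`, both hypothetical names) sketch that RFC 4180 rule using CRLF line endings as the RFC specifies; they are not the implementation behind `make_csv_writer`, whose exact output may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: quote a field only if it contains the delimiter,
// the quote character, or a line break; embedded quotes are doubled.
std::string quote_field(const std::string& field, char delim = ',', char quote = '"') {
    if (field.find_first_of(std::string{delim, quote, '\r', '\n'}) == std::string::npos)
        return field;
    std::string out(1, quote);
    for (char c : field) {
        if (c == quote) out += quote;  // " becomes ""
        out += c;
    }
    out += quote;
    return out;
}

// Hypothetical helper: join quoted fields into one CRLF-terminated record.
std::string write_row(const std::vector<std::string>& row, char delim = ',') {
    std::string line;
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (i > 0) line += delim;
        line += quote_field(row[i], delim);
    }
    return line + "\r\n";
}
```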
Bug reports, feature requests, and so on are always welcome. Feel free to leave a note in the Issues section.