PARQUET-451: Add RowGroupReader helper class and refactor parquet_reader.cc into DebugPrint #23
ColumnReader* reader = make_column_reader(&col.meta_data,
    &this->parent_->metadata_.schema[i + 1], input.release());

column_readers_[i] = std::shared_ptr<ColumnReader>(reader);
There is a non-zero chance of a memory leak if the program crashes before or during line 121. This is marginally better:
column_readers_[i].reset(make_column_reader(...));
I'll make make_column_reader return a shared_ptr, then combine these two lines.
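For readers following the thread, the change proposed here can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR: the ColumnReader struct is a simplified stand-in, and make_column_reader_sketch and build_readers are hypothetical names. The point is that the factory hands back a std::shared_ptr directly, so the caller never holds an unowned raw pointer between allocation and assignment.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified stand-in for the real ColumnReader; its single field is an
// assumption made for this sketch.
struct ColumnReader {
  explicit ColumnReader(int column_index) : column_index(column_index) {}
  int column_index;
};

// Hypothetical factory: ownership is established at the allocation site,
// so there is no window in which a raw pointer could be leaked.
std::shared_ptr<ColumnReader> make_column_reader_sketch(int column_index) {
  return std::make_shared<ColumnReader>(column_index);
}

// Fills a vector of readers using the single combined line discussed above.
std::vector<std::shared_ptr<ColumnReader>> build_readers(int num_columns) {
  assert(num_columns >= 0);
  std::vector<std::shared_ptr<ColumnReader>> column_readers(num_columns);
  for (int i = 0; i < num_columns; ++i) {
    // Create and store in one statement; no intermediate raw pointer.
    column_readers[i] = make_column_reader_sketch(i);
  }
  return column_readers;
}
```

With this shape, the two original lines collapse into the single assignment inside the loop.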
switch-on-type statements. Add parquet::SchemaElement* member to Decoder<T>, for FLBA metadata.
parquet_reader.cc into ParquetFileReader::DebugPrint
@nongli this has been rebased (the conflicts weren't too bad), and here's a green build (https://travis-ci.org/wesm/parquet-cpp/builds/105058183). I've been hassling Travis CI about the build problems; they are having infrastructure issues that have caused the build queue to stall. I think it's ridiculous, and the ASF is supposed to have 30 concurrent build slaves, so this really should not be happening. Please let me know what other code comments you have and we can get this merged. Then @majetideepak can refactor #24 to be compatible with this revamped code structure.
+1, LGTM. @nongli
@@ -269,4 +167,36 @@ bool ColumnReader::ReadNewPage() {
  return true;
}

std::shared_ptr<ColumnReader> make_column_reader(const parquet::ColumnMetaData* metadata,
MakeColumnReader
Done. What do you think about making this a static method like ColumnReader::Make?
Yea, that would be better.
ok done.
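As a rough illustration of the static-method shape agreed on in this thread, here is a minimal sketch. It is not the implementation that was merged: PhysicalType, TypedColumnReader, type_name, and describe are hypothetical names, and the real signature takes column metadata and a schema element rather than a bare enum. It only shows a static ColumnReader::Make that switches on type and returns a std::shared_ptr to the base class.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical subset of physical types, for illustration only.
enum class PhysicalType { BOOLEAN, INT32, INT64 };

class ColumnReader {
 public:
  virtual ~ColumnReader() = default;
  virtual std::string type_name() const = 0;

  // Static factory: the only place that needs to know the concrete types.
  static std::shared_ptr<ColumnReader> Make(PhysicalType type);
};

// Hypothetical typed reader; T is the C++ value type for the column.
template <typename T>
class TypedColumnReader : public ColumnReader {
 public:
  explicit TypedColumnReader(std::string name) : name_(std::move(name)) {}
  std::string type_name() const override { return name_; }

 private:
  std::string name_;
};

std::shared_ptr<ColumnReader> ColumnReader::Make(PhysicalType type) {
  switch (type) {
    case PhysicalType::BOOLEAN:
      return std::make_shared<TypedColumnReader<bool>>("BOOLEAN");
    case PhysicalType::INT32:
      return std::make_shared<TypedColumnReader<int32_t>>("INT32");
    case PhysicalType::INT64:
      return std::make_shared<TypedColumnReader<int64_t>>("INT64");
  }
  return nullptr;  // Unknown type: the caller must check for null.
}

// Example use: callers depend only on the base class interface.
std::string describe(PhysicalType type) {
  std::shared_ptr<ColumnReader> reader = ColumnReader::Make(type);
  assert(reader != nullptr);
  return reader->type_name();
}
```

Keeping the factory as a static member groups construction with the class it constructs, which is the naming benefit raised in the review.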
This looks good. Can we update reader-test to read all the data files in the repo and verify some results (as another PR)?
Opened https://issues.apache.org/jira/browse/PARQUET-475. I plan to write some more smoke tests to verify the data in those files. In short order (next few weeks), we really need to be able to round-trip data to files so that unit tests can generate test data and verify it can be read and written successfully.
@nongli thank you, good to go
I closed my pull request to avoid complicated rebasing with the current refactored code.
Merged. Thanks!
thank you sir!
This also addresses PARQUET-433 and PARQUET-453.