As a simplified example, let's say I'm parsing a file like this, which contains five lines but only three records (with two fields: a String and an Int):
abc,123
def,456
ghi,"j
k
l"
Lines 0 and 1 parse okay, so I process them immediately.
The last three lines constitute one record because of the quotes, and parsing fails because j\nk\nl is not a valid number. So I want to extract that record for separate, manual processing. To do that, I need a verbatim copy of the line(s) which couldn't be parsed:
ghi,"j
k
l"
Alternatively, the cell might represent an enum (really case objects, used to encode a record type), with a CellDecoder which decodes one of N strings into one of N enums. But if an unexpected String shows up, I need to save the entire record for later processing.
I'm dealing with a case like this right now, but I've had to make the assumption that there are never any quotes (or any multi-line records). Thus I can split the file/stream into lines and parse the lines one at a time, converting each line to Either[String, MyRow] (where a String is a bad line, and MyRow is a successfully parsed record). But this code could be simpler if the parser simply returned an error which included the offending line(s), plus it would work with quotes and multi-line records.
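To make the workaround concrete, here is a minimal sketch of the line-at-a-time approach in plain Scala (2.13+ for `toIntOption`). It does not use kantan.csv at all; `MyRow`, `parseLine` and the `LineByLine` wrapper are hypothetical names chosen for illustration, and the field layout is the String/Int pair from the example file.

```scala
object LineByLine {
  // Hypothetical record type: one String field and one Int field.
  final case class MyRow(name: String, value: Int)

  // Parses a single physical line, assuming no quotes and no embedded newlines.
  // Left carries the untouched line so it can be set aside and reprocessed later.
  def parseLine(line: String): Either[String, MyRow] =
    line.split(",", -1) match {
      case Array(name, value) => value.toIntOption.map(MyRow(name, _)).toRight(line)
      case _                  => Left(line)
    }

  // The example file from this issue: five lines, three records.
  val input = "abc,123\ndef,456\nghi,\"j\nk\nl\""

  // Splitting on newlines first is exactly the assumption that breaks here.
  val results: List[Either[String, MyRow]] = input.split("\n").toList.map(parseLine)
}
```

On the example file this yields two `Right` values followed by three separate `Left` values, because the quoted record has already been cut into three pieces before parsing starts. That is the limitation described above.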
Note that a CellDecoder[Either[String, MyCell]] won't work, because I need the entire line (or multi-line record) to reprocess it.
A RowDecoder[Either[String, MyRow]] is closer, but it looks like RowDecoders only receive a Seq[String], and I need the entire original line (or lines) as a String, unaltered. Trying to convert the Seq[String] back to a String is bound to involve some loss (e.g. if there were extra fields at the end, or a trailing comma).
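A small illustration of why the round trip is lossy, in plain Scala with no library involved. The cell values are written by hand and assume standard CSV quoting rules, under which the raw lines `x,"y"` and `x,y` both decode to the same two cells; the `RoundTrip` wrapper and its member names are hypothetical.

```scala
object RoundTrip {
  // Two different raw lines that a standard CSV parser decodes to the same cells.
  val rawQuoted = "x,\"y\""
  val rawPlain  = "x,y"

  // The cells a decoder would be handed for either line (written by hand here).
  val cells: Seq[String] = Seq("x", "y")

  // Naive re-encoding: the optional quotes from rawQuoted cannot be recovered.
  val rebuilt: String = cells.mkString(",")
}
```

Here `rebuilt` matches `rawPlain` but not `rawQuoted`, so once only the cells are available there is no way to tell which form the input originally had.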
Can you explain why you’d want a String rather than a CSV row (a Seq[String])? I think the rest of your issue is now clear, but I don’t understand why you’d want to re-implement the csv parsing logic yourself.
@nrinaudo I wouldn't want to re-implement the parsing as such, but if the row processing has gone terribly wrong for whatever reason, I want to save the record in its original, unaltered form. This way, after my code (or kantan.csv config) is fixed/updated, the data can be reparsed and reprocessed from the beginning.
I'm planning to send these unhandled records to a kind of dead letter queue, and move them back to the input queue when the code is ready. So I need a String which contains one or more CSV records, since that's what the input queue carries. And when things are going wrong, I don't want to introduce any unnecessary changes by converting String to Seq[String] and back to String.
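As a rough sketch of the behaviour being asked for, the following plain Scala (2.13+) splits the input into records while treating newlines inside double quotes as data, keeps each record's original text, and routes undecodable records to a dead-letter list verbatim. This is not kantan.csv's API; `splitRecords`, `decode`, `MyRow` and `DeadLetter` are hypothetical names, and the decoder is a deliberately simple stand-in.

```scala
object DeadLetter {
  // Hypothetical record type: one String field and one Int field.
  final case class MyRow(name: String, value: Int)

  // Splits text into records; a newline only ends a record outside double quotes.
  // An escaped quote ("") toggles the flag twice, so the state stays correct.
  // Each result is the record's original text, minus its terminating '\n'.
  def splitRecords(text: String): List[String] = {
    val records  = List.newBuilder[String]
    val current  = new StringBuilder
    var inQuotes = false
    text.foreach { c =>
      if (c == '"') inQuotes = !inQuotes
      if (c == '\n' && !inQuotes) {
        records += current.result()
        current.clear()
      } else current += c
    }
    if (current.nonEmpty) records += current.result()
    records.result()
  }

  // Stand-in decoder: Left carries the raw record, unaltered, for reprocessing.
  def decode(raw: String): Either[String, MyRow] =
    raw.split(",", 2) match {
      case Array(name, value) => value.toIntOption.map(MyRow(name, _)).toRight(raw)
      case _                  => Left(raw)
    }

  val input = "abc,123\ndef,456\nghi,\"j\nk\nl\""

  // Failed records go to the dead-letter list; decoded rows are processed normally.
  val (deadLetters, rows) = splitRecords(input).map(decode).partitionMap(identity)
}
```

With the example file, `rows` holds the first two records and `deadLetters` holds the three-line record as a single string, quotes and newlines intact. Note that this sketch only recognises `\n` as a record terminator, so a `\r` from Windows line endings would stay attached to the record.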
Related: #183