Append to an existing file? #160

Open
joshua-cooper opened this issue Sep 25, 2020 · 6 comments

@joshua-cooper

Is it possible to append to an existing file using this crate?

Writer will always add headers so I don't think it can be used for this.

@poros
Collaborator

poros commented Sep 29, 2020

I honestly never tried, but I think that the Avro reader should be able to pick up a new header just fine. Have you given it a try already?

poros added the question label Sep 29, 2020

@joshua-cooper
Author

joshua-cooper commented Oct 6, 2020

Correct me if I'm wrong, but to append using the reader I would need to read the entire file into memory first. Since the files I'm working with can be arbitrarily large, that won't be possible.

I think there needs to be a way to opt out of writing the headers to get around this. Perhaps it's possible with the lower-level parts of the crate, but I haven't had any luck so far.
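
As a rough illustration of why the whole file may not need to be read: per the Avro specification, an object container file starts with a 4-byte magic (`Obj` followed by the byte 1), a metadata map (which carries the schema and codec), and a 16-byte sync marker, and every data block after that ends with the same marker. The sketch below uses only the Rust standard library and is not part of avro-rs; `Header`, `read_header`, `read_long`, `read_bytes`, and `invalid` are hypothetical names. It stops reading as soon as the sync marker has been consumed.

```rust
use std::io::{self, Read};

// Hypothetical helper: build an "invalid data" I/O error.
fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Decode one Avro `long` (zig-zag encoded, variable length).
fn read_long<R: Read>(r: &mut R) -> io::Result<i64> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let mut byte = [0u8; 1];
        r.read_exact(&mut byte)?;
        if shift >= 64 {
            return Err(invalid("varint is too long"));
        }
        value |= u64::from(byte[0] & 0x7f) << shift;
        if byte[0] & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo the zig-zag mapping.
    Ok(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Read a length-prefixed byte string (Avro `bytes` or `string`).
fn read_bytes<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let len = read_long(r)?;
    if len < 0 {
        return Err(invalid("negative length"));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// The parts of the file header that matter for appending.
struct Header {
    /// Raw metadata entries, e.g. "avro.schema" and "avro.codec".
    metadata: Vec<(String, Vec<u8>)>,
    /// Marker that terminates every data block in this file.
    sync_marker: [u8; 16],
}

/// Parse only the header; the data blocks after it are never touched.
fn read_header<R: Read>(r: &mut R) -> io::Result<Header> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != b"Obj\x01" {
        return Err(invalid("not an Avro object container file"));
    }
    let mut metadata = Vec::new();
    loop {
        // The map is written as blocks of entries, ended by a zero count.
        let mut count = read_long(r)?;
        if count == 0 {
            break;
        }
        if count < 0 {
            // A negative count is followed by the block's size in bytes.
            read_long(r)?;
            count = count.checked_neg().ok_or_else(|| invalid("bad block count"))?;
        }
        for _ in 0..count {
            let key = String::from_utf8(read_bytes(r)?)
                .map_err(|_| invalid("metadata key is not UTF-8"))?;
            let value = read_bytes(r)?;
            metadata.push((key, value));
        }
    }
    let mut sync_marker = [0u8; 16];
    r.read_exact(&mut sync_marker)?;
    Ok(Header { metadata, sync_marker })
}
```

Called on a `BufReader` over the existing file, this would read only the first few hundred bytes or so, depending on schema size. Any data block later written at the end of that file would have to end with the same `sync_marker`, and its records would have to match the schema stored under `avro.schema`.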

@JuliDi

JuliDi commented Nov 3, 2020

Did you find any solution for this?

I am looking for a way to append additional fields/data to a file, preferably without reading the whole file first and then writing it all out again.

Something like this, using the README example.
First schema:

    let raw_schema = r#"
        {
            "type": "record",
            "name": "test",
            "doc": "just for testing purposes",
            "fields": [
                {"name": "a", "type": "long", "default": 42},
                {"name": "b", "type": "string"}
            ]
        }
    "#;

Updated schema:

    let raw_schema = r#"
        {
            "type": "record",
            "name": "test",
            "doc": "just for testing purposes",
            "fields": [
                {"name": "a", "type": "long", "default": 42},
                {"name": "b", "type": "string"},
                {"name": "c", "type": "long", "default": 43}
            ]
        }
    "#;

And now I would like to read (with the updated schema) an Avro file that has been created with the first schema and append the field "c" to that file. Is this possible with avro-rs?

Opening the file and just doing something like

    let mut record = Record::new(writer.schema()).unwrap();
    record.put("c", 33i64);

    // schema validation happens here
    writer.append(record).unwrap();

does not work for me (validation error).
If this were (in theory) the right approach, I could provide a complete example of what I have tried.

@joshua-cooper
Author

> Did you find any solution for this?

Not yet unfortunately.

@poros
Collaborator

poros commented Nov 27, 2020

Have you tried the to_avro_datum (https://docs.rs/avro-rs/0.11.0/avro_rs/fn.to_avro_datum.html) and from_avro_datum (https://docs.rs/avro-rs/0.11.0/avro_rs/fn.from_avro_datum.html) functions, by any chance? They skip some of the validation and header handling that the writer does, so they may work better in a "seek file then read" kind of scenario.
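
One way the header-less route might be used for appending, sketched under some assumptions: the bytes of each record come from something like `to_avro_datum` (which returns a `Vec<u8>` without any file header), the existing file uses the `null` codec, and the file's 16-byte sync marker has already been recovered from its header. Per the Avro specification, a data block is a record count, the payload size in bytes, the concatenated records, and then the sync marker, with both numbers written as zig-zag varint longs. The code is standard-library Rust only and is not avro-rs API; `write_long`, `write_block`, and `append_block` are hypothetical names.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Encode an Avro `long`: zig-zag mapping, then a base-128 varint.
fn write_long<W: Write>(w: &mut W, n: i64) -> io::Result<()> {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    loop {
        let byte = (z & 0x7f) as u8;
        z >>= 7;
        if z == 0 {
            // Last byte: high bit clear.
            return w.write_all(&[byte]);
        }
        // More bytes follow: high bit set.
        w.write_all(&[byte | 0x80])?;
    }
}

/// Write one container-file data block (assumes the "null" codec).
///
/// `datums` holds already-encoded records, one `Vec<u8>` each.
/// `sync_marker` must be the marker stored in the target file's header.
fn write_block<W: Write>(
    w: &mut W,
    datums: &[Vec<u8>],
    sync_marker: &[u8; 16],
) -> io::Result<()> {
    if datums.is_empty() {
        return Ok(()); // nothing to write
    }
    let payload: Vec<u8> = datums.concat();
    write_long(w, datums.len() as i64)?; // number of records in the block
    write_long(w, payload.len() as i64)?; // size of the payload in bytes
    w.write_all(&payload)?;
    w.write_all(sync_marker)
}

/// Open an existing file in append mode and add one block at the end.
/// The bytes already in the file are neither read nor rewritten.
fn append_block(
    path: &Path,
    datums: &[Vec<u8>],
    sync_marker: &[u8; 16],
) -> io::Result<()> {
    let file = OpenOptions::new().append(true).open(path)?;
    let mut out = BufWriter::new(file);
    write_block(&mut out, datums, sync_marker)?;
    out.flush()
}
```

This only helps when the new records use exactly the schema stored in the file header; it would not add a new field such as "c" to records that are already in the file. If the file was written with a compression codec, the payload would have to be compressed the same way before its size is written.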

@JuliDi

JuliDi commented Dec 8, 2020

Thanks, I'll give that a try!
