Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - ActionKV Index Maintenance - maybe bug #11

Conversation

Michael-J-Ward
Copy link

Hey - want to start out by saying thank again for the great book!

The insert_but_ignore_index code appears incorrect to me. If the underlying file is pointing at position in_the_middle when insert is called, the KV will be correctly appended at the end of the file, but the index will say the KV is at position in_the_middle

let next_byte = SeekFrom::End(0);
let current_position = f.seek(SeekFrom::Current(0))?;
f.seek(next_byte)?;
f.write_u32::<LittleEndian>(checksum)?;
f.write_u32::<LittleEndian>(key_len as u32)?;
f.write_u32::<LittleEndian>(val_len as u32)?;
f.write_all(&mut tmp)?;
Ok(current_position)

I wrote test_index_maintenance to confirm my suspicion, expecting it to fail. But the test passes!

To understand what was happening, I altered get_at to print out the file' position after performing the read - and then the test fails!

    pub fn get_at(&mut self, position: u64) -> io::Result<KeyValuePair> {
        let mut f = BufReader::new(&mut self.f);
        f.seek(SeekFrom::Start(position))?;
        let kv = ActionKV::process_record(&mut f)?;

        // Even though `f.seek` does take `&mut self`, my understanding is calling it
        // with `SeekFrom::Current(0)` simply says "where am I at in the stream" without
        // altering anything. In fact there's even a convenience function that claims
        // exactly this: https://doc.rust-lang.org/std/io/trait.Seek.html#method.stream_position
        //
        // However, commenting these two lines makes `test_index_maintenance` pass
        let final_pos = f.seek(SeekFrom::Current(0))?;
        println!("after reading at {}, final_pos is {}", position, final_pos);
        Ok(kv)
    }

So now there's at least these few options on what's going on.

  1. .seek(SeekFrom::Current(0)) is not actually a functional no-op, and calling it breaks the system in some expected way. i.e. what I thought looked like a bug in .insert isn't a bug.
  2. There is a bug in .insert and for some magic the .seek(SeekFrom::Current(0)) sheds light on it. (Maybe something due to the extra buffer flush?
  3. .seek(SeekFrom::Current(0)) is intended to be a no-op, but there is a bug in it that this case triggered.

I'd assume 1 is by far the most likely, followed by 2 and then 3 a long way off.

Either way hope this helps and interested to hear what the answer is!

(Related: #4 since it also mentions the in memory index)

Reproducing the doc comment in `get_at` here for posterity

Even though `f.seek` does take `&mut self`, my understanding is calling it
with `SeekFrom::Current(0)` simply says "where am I at in the stream" without
altering anything. In fact there's even a convenience function that claims
exactly this: https://doc.rust-lang.org/std/io/trait.Seek.html#method.stream_position

However, commenting these two lines makes `test_index_maintenance` pass
The method no longer assumes the storage file is already pointing
at the end of the file.

Previously, this bug was covered up small storage sizes by the
BufReader used in `get`. When the entire file is smaller than the
buffer, any read to the file will result in the file's seek
position pointing to the end of the file.

My debug print statement revealed the bug because calling
BufReader.seek(SeekFrom::Current(0)) was resetting the
file position to the same location it would have been
without the buffer.
@Michael-J-Ward
Copy link
Author

Michael-J-Ward commented Aug 2, 2020

So the answer is that BufReader conceals the bug anytime the underlying storage file is smaller than the buffer.

In that case, any read results in BufReader reading in the entire file and that file's seek position pointing to the end.

My debug print statement "revealed" the bug because it sets the seek position back to it's "natural" position.

If you'd like to verify this yourself, run the test as in 8fb3d55, and then alter the final_pos line to retrieve the current seek from the underlying storage file instead of the buffer.

See: https://doc.rust-lang.org/src/std/io/buffered.rs.html#374-379

        if let SeekFrom::Current(n) = pos {
            let remainder = (self.cap - self.pos) as i64;
            // it should be safe to assume that remainder fits within an i64 as the alternative
            // means we managed to allocate 8 exbibytes and that's absurd.
            // But it's not out of the realm of possibility for some weird underlying reader to
            // support seeking by i64::min_value() so we need to handle underflow when subtracting
            // remainder.
            if let Some(offset) = n.checked_sub(remainder) {
                result = self.inner.seek(SeekFrom::Current(offset))?;
            } else {
                // seek backwards by our remainder, and then by the offset
                self.inner.seek(SeekFrom::Current(-remainder))?;
                self.discard_buffer();
                result = self.inner.seek(SeekFrom::Current(n))?;
            }

@timClicks
Copy link
Contributor

Thank you very much for looking into this @Michael-J-Ward.

@timClicks
Copy link
Contributor

I'll mark this PR as merged as I update my manuscript (which is held in a private repo on Manning's infrastructure).

@timClicks
Copy link
Contributor

Thanks again for the input @Michael-J-Ward.

@timClicks timClicks closed this Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants