Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup StringBuilder methods #211

Merged
merged 36 commits into from
Jan 2, 2025
Merged

Conversation

airblast-dev
Copy link
Contributor

I haven't added the necessary tests yet so marking this as draft.

Many of StringBuilder's methods had logic that really should have been their own function with safe wrappers.
This PR moves some logic to their own methods and allows public calls to allocate space without pushing bytes.

This should also reduce the effect of the memory leaks from String as String::from_bytes will now only allocate the minimal amount needed. This shouldn't change performance in any call site as the capacity is not stored, meaning no length changes are possible once a String is constructed. This only helps to reduce the allocated size where arguments require a String and a user calls String::from_bytes.

As part of the cleanup a redundant null check was removed from StringBuilder::push_bytes as libc::realloc already handles the null case.

I have made the commit messages to make things are easy to follow.
Will mark this ready for review once I have added some tests.

use `StringBuilder::reserve` to avoid code duplication in
`StringBuilder::push_bytes`.

The null check and call to malloc was removed as `libc::realloc` already
handles the null case properly.

The extra capacity check was removed as `StringBuilder::reserve` only
allocates if needed.
The builder would allocate extra capacity even though it is not possible
to change the length of the buffer after initialization.

This makes it so only the space needed gets allocated.
`StringBuilder::reserve` and `StringBuilder::reserve_exact` already
handle the needed space for the null byte.
@airblast-dev
Copy link
Contributor Author

Fixed things locally, will push the latest version with new tests added and existing ones passing.

While mainly a fix for previous commits, this adds a few safety checks
and optimizations.

In cases of inserting even 0 bytes an allocation would be created with a
single null byte. This makes it so an allocation is only made if bytes
are actually being inserted.

NULL checks have been added for realloc as it may return a NULL pointer
if it is unable to allocate. Instead of UB it will now panic.
this should be reverted once a more sophisticated string mechanism in
place

until then this at least decreases the amount that is leaked
We already call `StringBuilder::reserve_exact` so we already know the
allocation cannot be shrinked further. This avoids an extra trip to the
allocator in `String::from_bytes`.
@airblast-dev
Copy link
Contributor Author

airblast-dev commented Dec 31, 2024

Added a few safety checks to avoid UB in edge cases. Passes the existing tests and the new ones.

The refactor makes it so String either points to null with a length of 0, or points to an allocation with a length bigger than zero. The old version would allocate even when pushing 0 bytes.

Lastly, a String allocation is now shrunk to its minimal size when calling StringBuilder::finish (this is not done in StringBuilder::finish_unchecked for cases where extra capacity is known to not exist such as String::from_bytes).

Should be ready for a review :)

Edit: In case the LOC is too much I can split this into multiple parts, though I think it would be pretty annoying to deal with as most of the changes are connected.

@airblast-dev airblast-dev marked this pull request as ready for review December 31, 2024 13:26
Copy link
Owner

@noib3 noib3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good after comments are addressed and tests pass. Thanks!

crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
crates/types/src/string.rs Outdated Show resolved Hide resolved
@noib3
Copy link
Owner

noib3 commented Jan 1, 2025

Note that a couple of tests are segfaulting.

@airblast-dev
Copy link
Contributor Author

Yeah, not really sure why. Weird as nothing seems wrong in the String tests. Looking into it.

@airblast-dev
Copy link
Contributor Author

airblast-dev commented Jan 1, 2025

Looked into the neovim source. Down the call chains the string is copied with xstrdup which doesn't handle the null ptr causing segfaults.

Not sure if it was intentional but String::from_bytes always allocating even for zero bytes was avoiding this problem.
This also means even String::new should allocate in case it is passed somewhere and it is attempted to be copied with memcpy or xstrdup...

Many of the functions work without an issue but to be safe its likely best to always allocate. For example the failing function relies on the null byte while some of them check the length, in order to even find the null byte the null pointer gets read and causes a segfault or the null pointer is used in a memcpy with mostly the same outcome.

I originally though that it could be an issue, but String::new() returning a null ptr made me think this was fine.

Not really sure what to do from here. Should I refactor the code to always allocate? or just leave this PR for the time being?

@noib3
Copy link
Owner

noib3 commented Jan 1, 2025

Having to allocate even for empty strings kinda sucks, but it's an upstream issue. I think, at least for the time being, we should just add a comment explaining why we allocate even when not necessary.

in order to even find the null byte the null pointer gets read

I'd have to look into it, but why does Neovim even need strings to be null-terminated when they have a len field?

@airblast-dev
Copy link
Contributor Author

airblast-dev commented Jan 1, 2025

I'd have to look into it, but why does Neovim even need strings to be null-terminated when they have a len field?

Not really sure tbh. I will look into the neovim source code again if I'm misunderstanding something or if there is an underlying bug somewhere. Regardless I am pretty sure that null pointers are a no-no. In case you want to try it out, set the minimum argument to StringBuilder::with_capacity to 1 inside String::from_bytes and all of the tests will pass.

After reading the implementation of nvim_create_user_command I decided to test if :lua vim.api.nvim_create_user_command('B', '\0bc\0d', {}) would register properly. Nope, it just registers an empty string. Essentially this means that a neovim string can contain null bytes as much as it wants, but it doesn't mean it will work properly depending on the code path. This confirms that the null byte must always exist at the end of the string even if the pointer is null or the length is zero.

A few notes in case someone is trying to fix things:

  • String must always contain a null byte at its end regardless of if the len is 0 or the pointer is null.
  • String may contain a null byte inside of it, however depending on the code path the string may be cut off at the first null byte.
  • String may point to null in some functions (functions that do a length check and/or null check). But it may cause UB or segfaults in others.

@noib3
Copy link
Owner

noib3 commented Jan 1, 2025

I'll do some more investigation to better understand strings' usage in Neovim.

As far as this PR is concerned, I say we just remove or comment out the if cap == 0 check in StringBuilder::with_capacity(), which should fix the segfaults.

@noib3 noib3 merged commit f5f8fb5 into noib3:main Jan 2, 2025
12 checks passed
@noib3
Copy link
Owner

noib3 commented Jan 2, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants