Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache bucket locations #15

Merged
merged 10 commits into from
Jul 18, 2024

Conversation

henrikskog
Copy link
Contributor

@henrikskog henrikskog commented Jul 12, 2024

Fixes #14

Run with RUST_BACKTRACE=1 cargo test test_load_all_buckets -- --nocapture. This requires that you have an account authenticated with some s3 buckets though. I'll remove the tests if this gets approved. It's just the easiest way I have found to do manual testing.

  • Creates a file in $DEFAULT_STU_DIR/cache.json that is currently only used for the bucket locations
  • Fetches bucket locations from this cache if it gets a bucket that has been fetched before
  • Uses moka for the cache. I write the whole cache to file each time put is ran. Super inefficient, but does not really matter maybe? The cache is loaded from file at startup.
  • Factored out the check if a bucket is in a region to a seperate function to not bloat the other function too much

Very very rough and needs a lot of cleanup with logging and error handling probably. I thought it might be smart to hear your thoughts about the approach here before fixing up all the small things with the code. What do you think?

@henrikskog henrikskog force-pushed the cache-bucket-locations branch from b591851 to f636e90 Compare July 12, 2024 21:05
@henrikskog henrikskog force-pushed the cache-bucket-locations branch from f636e90 to d34ebb4 Compare July 12, 2024 21:16
@henrikskog henrikskog marked this pull request as draft July 13, 2024 18:13
@henrikskog
Copy link
Contributor Author

Marked as draft as this is not something that is ready to merge.

@lusingander
Copy link
Owner

Thank you for addressing this!

At this stage, I believe implementing the bare minimum is sufficient.

  • A library like moka doesn't seem necessary because there are no repeated references or updates while the application is running (at least for now).
  • I don't think multiple updates are much of a problem, but it seems better to read the entire cache first and save the entire cache at the end.
    • There are several possible places to do this, and any would be fine for now:
      • Read it before creating the client and pass it to new
      • Read it in new
      • Read it in load_all_buckets
      • Write it after load_all_buckets
      • Write it in load_all_buckets
      • (However, it seems better to be consistent about whether you do it inside or outside load_all_buckets.)
  • The cache file location looks fine.
  • Refactoring the function seems like a good idea.

@henrikskog
Copy link
Contributor Author

henrikskog commented Jul 16, 2024

A library like moka doesn't seem necessary because there are no repeated references or updates while the application is running (at least for now).

I removed the moka dep and created the cache manually instead

I don't think multiple updates are much of a problem, but it seems better to read the entire cache first and save the entire cache at the end.

I added the location check to the load_bucket method as well. Now the cache is written at the end of both load_bucket and load_buckets. IMO it would be clean to do it at the end of get_bucket_region at the expense of writing multiple times to file. This way the cache logic is only a concern of that spesific function.

(However, it seems better to be consistent about whether you do it inside or outside load_all_buckets.)

Hmm I'm now reading in the new method and writing inside the load-bucket methods. I think it's nice to keep it as an internal concern in Client.rs instead of doing it in app.rs?

@henrikskog henrikskog marked this pull request as ready for review July 16, 2024 19:18
Copy link
Owner

@lusingander lusingander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing this 🙂

  • I don't think the update to Cargo.lock is necessary, so could you please remove it?
  • Could you fix the compiler warnings that are causing the CI to fail?

src/cache.rs Outdated Show resolved Hide resolved
src/cache.rs Outdated Show resolved Hide resolved
src/client.rs Outdated Show resolved Hide resolved
src/client.rs Outdated Show resolved Hide resolved
src/client.rs Outdated Show resolved Hide resolved
src/client.rs Outdated Show resolved Hide resolved
@henrikskog
Copy link
Contributor Author

Thanks for the review. Knowing when to panick and when to propagate errors is a bit tricky I think.

@henrikskog henrikskog requested a review from lusingander July 18, 2024 11:44
Copy link
Owner

@lusingander lusingander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have a few more comments 🙂

  • You need to address the lint errors. The CI runs cargo clippy --all-targets --all-features -- -D warnings.

Knowing when to panick and when to propagate errors is a bit tricky I think.

  • While I haven't strictly defined rules, at least get_bucket_region returns AppError, so I thought it would be good to handle that consistently.

src/cache.rs Outdated Show resolved Hide resolved
@henrikskog henrikskog requested a review from lusingander July 18, 2024 13:16
@lusingander
Copy link
Owner

Looks good, thanks!

@lusingander lusingander merged commit cc6f75c into lusingander:master Jul 18, 2024
1 check passed
@henrikskog henrikskog deleted the cache-bucket-locations branch July 18, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache location of s3 buckets
2 participants