Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataApi] Get BlobCount By AccountID #541

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

siddimore
Copy link
Contributor

@siddimore siddimore commented May 3, 2024

Why are these changes needed?

These changes enable querying blob count by AccountId

Changes

  1. Add GSI AccountId
  2. Add method to GetBlobMetadataCount by AccountId
  3. Assign accountKey to accountId in BlobAuthHeader.
  4. Update DataApiHandler to getBlobMetadataCount by AccountId
  5. Define DataApi Endpt for getting BlobCount by AccountId

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@siddimore siddimore marked this pull request as draft May 5, 2024 22:01
@siddimore siddimore marked this pull request as ready for review May 6, 2024 00:43
@siddimore siddimore requested a review from dmanc May 6, 2024 04:43
@siddimore siddimore changed the title [Disperser] Assign acconuntKey to accountId [DataApi] Get BlobCount By AccountID May 6, 2024
@siddimore siddimore requested review from jianoaix and pschork May 7, 2024 01:18
Copy link
Contributor

@ian-shim ian-shim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.
Is the primary purpose of adding the new field and index to support count query?
I feel like getting the actual metadata might be more useful. Maybe we can support that later?

disperser/dataapi/server.go Outdated Show resolved Hide resolved
q.mu.RLock()
defer q.mu.RUnlock()
count := int32(0)
for _, meta := range q.Metadata {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This for loop can potentially be really large right?

Copy link
Contributor Author

@siddimore siddimore May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true but this is only for testing purpose.....

@siddimore
Copy link
Contributor Author

Overall looks good to me. Is the primary purpose of adding the new field and index to support count query? I feel like getting the actual metadata might be more useful. Maybe we can support that later?

@ian-shim yes that is correct this is a stop gap solution to get blobCount by AccountID. I have suggested couple more approaches in a document that make it more real-time based on DynamodB streams

// GetBlobMetadataByAccount Count returns the count of all the metadata with the given status
// Because this function scans the entire index, it should only be used for status with a limited number of items.
// It should only be used to filter "Processing" status. To support other status, a streaming version should be implemented.
func (s *BlobMetadataStore) GetBlobMetadataCountByAccountID(ctx context.Context, accountID core.AccountID) (int32, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want the total amount of data rather than the total number of blobs. Or perhaps both.

Let's do another sync on the exact use case for this work!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mooselumph i think if we want total amount of data everytime than better approach is to have a lambda invoke on DynamodB stream.....which only processes INSERT_EVENT and can be used to update:

  1. Amount of Data
  2. Count of blobs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now temporarily i will just update Batcher to update amount of data after Blob is confirmed and increment count

// Update AccountID to accountKey
// This is a combination of origin and authenticatedAddress
// AccountId is later used to track blobs sent by the same account
blob.RequestHeader.BlobAuthHeader.AccountID = accountKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need to change depending on if the authenticated endpoint is being used right ?

@@ -244,6 +250,7 @@ func (s *server) Start() error {
{
feed.GET("/blobs", s.FetchBlobsHandler)
feed.GET("/blobs/:blob_key", s.FetchBlobHandler)
feed.GET("/blobs/count/:accountId", s.FetchBlobCountByAccountIdHandler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should definitely cache these endpoints.

@dmanc dmanc mentioned this pull request Jun 7, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants