[Filestore] Cache inconsistency during concurrent stat and write operations #2737

Closed
debnatkh opened this issue Dec 19, 2024 · 2 comments

Comments

@debnatkh
Collaborator

Symptoms:

filestore-client ls --node 6821 --json | jq '.content[0]'
{
  "Type": 1,
  "Links": 1,
  "Size": 262144,
  "Name": "",
  "Uid": 114,
  "MTime": 1734112031734435,
  "ATime": 1734111198060740,
  "Mode": 384,
  "Id": 6821,
  "CTime": 1734111198060740,
  "Gid": 124
}
filestore-client stat --path '/proller-find-corruption-pgbench1/postgresql/pg_xact/0088' --json | jq
{
  "Id": 6821,
  "Type": 1,
  "Mode": 384,
  "Uid": 114,
  "Gid": 124,
  "ATime": 1734111198060740,
  "MTime": 1734112005215331,
  "CTime": 1734111198060740,
  "Size": 253952,
  "Links": 1
}

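So in this case the state of the cache and localdb diverged: the two outputs disagree on Size (262144 vs 253952) and MTime. The first request (ls) is supposed to read from localdb, unlike the second request (stat), which may be served from the cache.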
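
To make the divergence explicit, here is a minimal sketch that compares the two attribute sets shown above and reports the fields that differ. The helper name `diverging_fields` is hypothetical and is not part of `filestore-client`; the values are copied from the two outputs.

```python
# Attributes copied from the `ls --node 6821` output above.
ls_attrs = {
    "Id": 6821, "Type": 1, "Mode": 384, "Uid": 114, "Gid": 124,
    "ATime": 1734111198060740, "MTime": 1734112031734435,
    "CTime": 1734111198060740, "Size": 262144, "Links": 1, "Name": "",
}

# Attributes copied from the `stat --path ...` output above.
stat_attrs = {
    "Id": 6821, "Type": 1, "Mode": 384, "Uid": 114, "Gid": 124,
    "ATime": 1734111198060740, "MTime": 1734112005215331,
    "CTime": 1734111198060740, "Size": 253952, "Links": 1,
}


def diverging_fields(a, b):
    """Return {field: (a_value, b_value)} for fields present in both dicts that differ."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


diff = diverging_fields(ls_attrs, stat_attrs)
for field, (ls_value, stat_value) in diff.items():
    print(f"{field}: ls={ls_value} stat={stat_value}")
```

Only MTime and Size differ; every other shared attribute, including Id, matches, so both outputs describe the same node at two different points in time.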

@debnatkh added the filestore and 2024Q4 labels Dec 19, 2024
@debnatkh self-assigned this Dec 19, 2024
@qkrorlqr
Collaborator

A race of the following kind:

  1. prepare GetNodeAttr - reads old size, stores it in args
  2. execute GetNodeAttr - NOP
  3. prepare WriteData
  4. execute WriteData - updates size, invalidates cache (but not args of the concurrently running GetNodeAttr)
  5. complete GetNodeAttr - writes old size to cache
  6. complete WriteData
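
The six steps above can be replayed deterministically with a toy model. This is only a sketch: `Store`, `GetNodeAttr` and `WriteData` below are hypothetical Python stand-ins, not the filestore classes, and the sizes are borrowed from the symptom output purely for illustration.

```python
class Store:
    """Toy stand-in for localdb plus an attribute cache (not filestore code)."""

    def __init__(self, node_id, size):
        self.db = {node_id: size}
        self.cache = {}


class GetNodeAttr:
    def __init__(self, store, node_id):
        self.store, self.node_id, self.args = store, node_id, {}

    def prepare(self):
        # Reads the current size and keeps it in the transaction args.
        self.args["size"] = self.store.db[self.node_id]

    def execute(self):
        pass  # read-only transaction: nothing to write

    def complete(self):
        # Publishes whatever prepare() saw, even if it is stale by now.
        self.store.cache[self.node_id] = self.args["size"]


class WriteData:
    def __init__(self, store, node_id, new_size):
        self.store, self.node_id, self.new_size = store, node_id, new_size

    def prepare(self):
        pass

    def execute(self):
        self.store.db[self.node_id] = self.new_size
        # Invalidates the cache entry, but not the args of in-flight readers.
        self.store.cache.pop(self.node_id, None)

    def complete(self):
        pass


store = Store(6821, 253952)
get_attr = GetNodeAttr(store, 6821)
write = WriteData(store, 6821, 262144)

get_attr.prepare()   # 1. reads old size, stores it in args
get_attr.execute()   # 2. NOP
write.prepare()      # 3.
write.execute()      # 4. updates size, invalidates cache
get_attr.complete()  # 5. writes old size to cache
write.complete()     # 6.

print("localdb size:", store.db[6821])
print("cached size: ", store.cache[6821])
```

After the replay the model's cache holds the old size while its localdb holds the new one, which is the same shape of divergence as in the symptom.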

One thing may be confusing here: why doesn't GetNodeAttr go straight from reading the data to complete? It has nothing to commit, since this transaction writes nothing.

Apparently, though, GetNodeAttr did not complete immediately but had to wait, because there was a third transaction that had started before GetNodeAttr and was still waiting for its commit.

So, in full detail, the order is:

  1. prepare SomeOtherRWTx
  2. execute SomeOtherRWTx
  3. prepare GetNodeAttr
  4. execute GetNodeAttr
  5. … complete GetNodeAttr delayed till SomeOtherRWTx completes …
  6. prepare WriteData
  7. execute WriteData
  8. complete SomeOtherRWTx
  9. complete GetNodeAttr
  10. complete WriteData
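
The delayed completion can be illustrated with a small model in which complete callbacks fire strictly in start order. This is an assumption made for the sketch, based on the "apparently" in the comment above; `InOrderCompletions` is a hypothetical class and does not reflect the actual executor implementation.

```python
from collections import deque


class InOrderCompletions:
    """Toy model: a transaction cannot complete while an earlier one
    is still waiting for its commit."""

    def __init__(self):
        self.queue = deque()    # transaction names in start order
        self.committed = set()  # transactions that no longer wait for a commit
        self.log = []

    def start(self, name, read_only=False):
        self.log.append(f"prepare {name}")
        self.log.append(f"execute {name}")
        self.queue.append(name)
        if read_only:
            # Nothing to commit, so the transaction is ready right away...
            self.committed.add(name)
        # ...but it still waits for everything queued ahead of it.
        self._drain()

    def commit(self, name):
        self.committed.add(name)
        self._drain()

    def _drain(self):
        # Complete transactions from the head of the queue only.
        while self.queue and self.queue[0] in self.committed:
            self.log.append(f"complete {self.queue.popleft()}")


tx = InOrderCompletions()
tx.start("SomeOtherRWTx")
tx.start("GetNodeAttr", read_only=True)
tx.start("WriteData")
tx.commit("SomeOtherRWTx")
tx.commit("WriteData")
print("\n".join(tx.log))
```

In this model GetNodeAttr completes only after SomeOtherRWTx does, which leaves a window for WriteData to execute in between. With no earlier read-write transaction in the queue, the same read-only transaction would complete as soon as it starts.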

@qkrorlqr
Collaborator

There is a similar race in ReadAheadCache (also fixed in #2741).
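
The thread does not describe what #2741 actually changes. Purely as an illustration of one common way to close this class of race, the sketch below tags each cache fill with a per-key generation number and drops the fill if an invalidation happened in between. `GuardedCache` and all its methods are hypothetical names, not filestore APIs.

```python
class GuardedCache:
    """Cache that rejects fills started before the latest invalidation."""

    def __init__(self):
        self.entries = {}
        self.generation = {}  # key -> number of invalidations seen so far

    def begin_fill(self, key):
        """Call when the read starts; returns a token for finish_fill."""
        return self.generation.get(key, 0)

    def invalidate(self, key):
        self.generation[key] = self.generation.get(key, 0) + 1
        self.entries.pop(key, None)

    def finish_fill(self, key, token, value):
        """Store value only if no invalidation happened since begin_fill."""
        if token != self.generation.get(key, 0):
            return False  # the value may be stale, so drop it
        self.entries[key] = value
        return True


cache = GuardedCache()
token = cache.begin_fill(6821)                    # reader starts and sees the old size
cache.invalidate(6821)                            # concurrent write lands
stored = cache.finish_fill(6821, token, 253952)   # late fill with the old size
print(stored, cache.entries)
```

The late fill is rejected, so the next reader has to go back to the source of truth. The generation map in this sketch grows without bound; a real cache would need to bound or recycle it.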

@qkrorlqr closed this as completed Jan 7, 2025