
Potential race condition obtaining data from CDS-beta #110

Closed
stefan-maxar opened this issue Aug 7, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@stefan-maxar

What happened?

When submitting a request to CDS-beta, we occasionally get the following error from the CDS API. Resubmitting the exact same JSON payload succeeds. So something isn't "waiting" long enough, but it only happens randomly.

```
2024-08-07T12:04:45.331Z c.retrieve(cds_archive,request,out_file)
2024-08-07T12:04:45.332Z File "/usr/local/lib/python3.11/site-packages/cads_api_client/legacy_api_client.py", line 149, in retrieve
2024-08-07T12:04:45.332Z result = self.logging_decorator(self.client.submit_and_wait_on_result)(
2024-08-07T12:04:45.332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-07T12:04:45.332Z File "/usr/local/lib/python3.11/site-packages/cads_api_client/legacy_api_client.py", line 134, in wrapper
2024-08-07T12:04:45.332Z return func(*args, **kwargs)
2024-08-07T12:04:45.332Z ^^^^^^^^^^^^^^^^^^^^^
2024-08-07T12:04:45.332Z File "/usr/local/lib/python3.11/site-packages/cads_api_client/api_client.py", line 77, in submit_and_wait_on_result
2024-08-07T12:04:45.332Z return self.retrieve_api.submit_and_wait_on_result(
2024-08-07T12:04:45.332Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-08-07T12:04:45.332Z File "/usr/local/lib/python3.11/site-packages/cads_api_client/processing.py", line 451, in submit_and_wait_on_result
2024-08-07T12:04:45.332Z return remote.make_results()
2024-08-07T12:04:45.332Z ^^^^^^^^^^^^^^^^^^^^^
2024-08-07T12:04:45.332Z File "/usr/local/lib/python3.11/site-packages/cads_api_client/processing.py", line 260, in make_results
2024-08-07T12:04:45.332Z raise ValueError(f"Result not ready, job is {status}")
2024-08-07T12:04:45.332Z ValueError: Result not ready, job is running
```

What are the steps to reproduce the bug?

The error is sporadic and can occur at random when submitting requests to CDS-beta via the Python API. We have noticed it more often for pressure-level requests such as the following:

```python
{'variable': ['u_component_of_wind'], 'product_type': ['reanalysis'], 'pressure_level': ['250', '850', '1000', '500'], 'year': ['2024'], 'month': ['04'], 'day': ['09'], 'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'], 'format': 'grib', 'nocache': 'afnqsPppsdL'}
```

Our CDS client configuration looks like this (we understand that verify=1 is not implemented yet):

```python
c = cdsapi.Client(url=cds_url,key=cds_key,verify=1,quiet=True)
```

Version

0.7.0

Platform (OS and architecture)

AmazonLinux 2023 with Python 3.11

Relevant log output

No response

Accompanying data

No response

Organisation

No response

stefan-maxar added the bug (Something isn't working) label on Aug 7, 2024
@sdoerner84

sdoerner84 commented Aug 15, 2024

If it helps, and if I follow the stack trace correctly, then in cads_api_client/processing.py:
On a successful submit, the method waits 1 s (see line 231) for the request to complete and multiplies that waiting time by 1.5 (line 242) with each attempt until it reaches 120 s (see line 179). The maximum number of times it checks whether the request is complete is 500 (passed in via **retry_options, legacy_api_client.py, line 91). When this maximum is reached, the method seems to attempt to download a request that is still running. This can of course happen if the queue wait time is very long.
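To make the described behaviour concrete, here is a minimal sketch of a capped exponential-backoff poll using the parameters quoted above (1 s initial wait, growth factor 1.5, 120 s cap, 500 attempts). It is not the cads-api-client code: `wait_for_job`, the `get_status` callable and the terminal state names `"successful"`/`"failed"` are hypothetical, chosen for illustration. Unlike the behaviour described, it raises explicitly when the attempts run out instead of trying to fetch a result that does not exist yet.

```python
import time
from typing import Callable

# Hypothetical terminal states; only "running" appears in the traceback above.
TERMINAL_STATES = {"successful", "failed"}


def wait_for_job(
    get_status: Callable[[], str],
    initial_sleep: float = 1.0,   # first wait, as described in the comment
    factor: float = 1.5,          # wait grows by this factor after each check
    max_sleep: float = 120.0,     # cap on any single wait
    max_attempts: int = 500,      # retry limit passed via retry_options
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() with capped exponential backoff.

    Returns the first terminal status seen. If the job is still not finished
    after max_attempts checks, raise instead of fetching a missing result.
    """
    delay = initial_sleep
    for _ in range(max_attempts):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(delay)
        # Grow the wait, but never beyond the cap.
        delay = min(delay * factor, max_sleep)
    raise TimeoutError(f"job still not finished after {max_attempts} status checks")
```

In this sketch the waits are 1, 1.5, 2.25, ... s until the 120 s cap is hit on the 13th check, so a job that never finishes exhausts all 500 checks only after roughly 58,800 s (about 16 hours) of cumulative waiting. If the parameters are as described, hitting the limit would therefore require a very long queue.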

I submitted this issue as a ticket to ECMWF support almost a week ago and was told yesterday that the Common Data Store Engine team would take care of it, but I have not heard anything from them yet.

Possible workaround:
The client does not accept the relevant arguments; instead it reports "This is a beta version. The following parameters have not been implemented yet: {'wait_until_complete': False, 'delete': False}." Sadly, that renders the method suggested by "fridgerator" here useless. Currently, the only option I see is to hard-code a higher value in your local copy of legacy_api_client.py, line 91. Maybe 1000? I have not been able to test this workaround yet, as it requires my request to sit in a long queue.

Side note: this error seems to be independent of the actual request.

@veenstrajelmer

veenstrajelmer commented Aug 19, 2024

I am also encountering this issue; it would be helpful if it were fixed in the toolbox.

@sdoerner84 did you also receive an ECMWF support issue number by any chance? And did you provide them with this issue number?

@sdoerner84

@veenstrajelmer To my knowledge, I did not get a support issue number, and I would not know what to provide to whom, or where (sorry, I'm a complete beginner on GitHub). I have a support ticket number and was only told that "[ECMWF support is] forwarding [my] request to the Common Data Store Engine team for their attention."

@siggemannen

I got hit by this as well. A possible workaround is to catch the exception and retry later.
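A minimal sketch of that workaround, assuming the failure always surfaces as the `ValueError("Result not ready, ...")` shown in the original traceback. The helper `retrieve_with_retry` and its `attempts`/`wait_seconds` parameters are hypothetical and not part of cdsapi or cads-api-client; it takes the retrieval function as an argument so it can wrap `c.retrieve`.

```python
import time
from typing import Any, Callable


def retrieve_with_retry(
    retrieve: Callable[..., Any],
    *args: Any,
    attempts: int = 3,
    wait_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call retrieve(*args), resubmitting after a pause if the job wasn't ready.

    Only the "Result not ready" ValueError from the traceback above is
    retried; any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return retrieve(*args)
        except ValueError as exc:
            if "Result not ready" not in str(exc) or attempt == attempts:
                raise  # unrelated error, or no attempts left
            sleep(wait_seconds)  # give the backend time before resubmitting
```

With the client from the issue this would be called as `retrieve_with_retry(c.retrieve, cds_archive, request, out_file)`. Note that each retry submits the request again, so it may be queued anew rather than picking up the earlier job.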

@akrherz

akrherz commented Aug 29, 2024

Is the hope that this is fixed by ecmwf-projects/cads-api-client#59?

@stefan-maxar
Author

Got a note via an ECMWF support ticket that this should be resolved. I will close this out in a couple of days if I don't encounter the issue again.

@veenstrajelmer

veenstrajelmer commented Oct 17, 2024

@stefan-maxar was it indeed fixed? Could you link the related PR to this issue?

@stefan-maxar
Author

stefan-maxar commented Oct 17, 2024

@veenstrajelmer from what I've seen, yes, it was fixed. I don't have a PR, as the fix was an internal change (to the new CDS archive, I believe) by ECMWF's Data Stores team.

@veenstrajelmer

Ok, thanks for the clarification and great that it is fixed!

5 participants