
fix(ingest/unity): add exception handling #12639

Merged
merged 3 commits into from
Feb 17, 2025

Conversation

anshbansal
Collaborator

This change prevents the stacktrace below from stopping ingestion partway through a run:

  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 467, in process_schemas
    yield from self.process_tables(schema)
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/datahub/ingestion/source/unity/source.py", line 472, in process_tables
    for table in self.unity_catalog_api_proxy.tables(schema=schema):
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/datahub/ingestion/source/unity/proxy.py", line 181, in tables
    for table in response:
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/databricks/sdk/service/catalog.py", line 12048, in list
    json = self._api.do('GET', '/api/2.1/unity-catalog/tables', query=query, headers=headers)
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/databricks/sdk/core.py", line 77, in do
    return self._api_client.do(method=method,
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/databricks/sdk/_base_client.py", line 186, in do
    response = call(method,
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/databricks/sdk/retries.py", line 55, in wrapper
    raise err
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/databricks/sdk/retries.py", line 34, in wrapper
    return func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-unity-catalog-bac61a203a4fea43/lib/python3.10/site-packages/databricks/sdk/_base_client.py", line 278, in _perform
    raise error from None
databricks.sdk.errors.platform.BadRequest: There is a table name change on the delta sharing server side. We currently don't support table name coalescing.
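The traceback shows the `BadRequest` surfacing while iterating the paginated table listing, so the whole run dies on one bad schema. A minimal sketch of the kind of fix the PR title describes (catch per-schema errors, report them, and keep going) might look like this; the helper names here (`iter_tables_safely`, `report_warning`) are illustrative, not the actual DataHub source code:

```python
# Hypothetical sketch: wrap iteration of each schema's table listing so an
# SDK error on one schema is reported and skipped instead of aborting ingestion.
# NOTE: proxy.tables() returns a generator, so exceptions surface during
# iteration -- the try block must wrap the loop, not just the call.

def iter_tables_safely(list_tables, schemas, report_warning):
    """Yield tables per schema, reporting and skipping schemas that error."""
    for schema in schemas:
        try:
            for table in list_tables(schema):
                yield table
        except Exception as exc:  # in practice, databricks.sdk platform errors
            report_warning(f"Failed to list tables for schema {schema}: {exc}")


# Minimal demo with a stubbed lister standing in for the Unity Catalog proxy.
def fake_lister(schema):
    if schema == "bad":
        raise RuntimeError("table name change on the delta sharing server side")
    yield f"{schema}.t1"


warnings = []
tables = list(iter_tables_safely(fake_lister, ["good", "bad"], warnings.append))
```

With this shape, `tables` contains only the results from the healthy schema and the failure is recorded as a warning rather than propagating.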

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Feb 14, 2025

codecov bot commented Feb 14, 2025

❌ 2 Tests Failed:

Tests completed: 2217 | Failed: 2 | Passed: 2215 | Skipped: 50
View the full list of 2 ❄️ flaky tests
tests.entity_versioning.test_versioning::test_link_unlink_three_versions_unlink_middle_and_latest

Flake rate in main: 29.82% (Passed 40 times, Failed 17 times)

Stack Traces | 8.95s run time
graph_client = DataHubGraph: configured to talk to http://localhost:8080 with token: eyJh**********-X0U

    @pytest.fixture(scope="function", autouse=True)
    def ingest_cleanup_data(graph_client: DataHubGraph):
        try:
            for urn in ENTITY_URN_OBJS:
                graph_client.emit_mcp(
                    MetadataChangeProposalWrapper(
                        entityUrn=urn.urn(),
                        aspect=DatasetKeyClass(
                            platform=urn.platform, name=urn.name, origin=urn.env
                        ),
                    )
                )
            for i in [2, 1, 0, 0, 1, 2]:
>               graph_client.unlink_asset_from_version_set(ENTITY_URNS[i])

tests/entity_versioning/test_versioning.py:31: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...../ingestion/graph/entity_versioning.py:164: in unlink_asset_from_version_set
    response = self.execute_graphql(self.UNLINK_VERSION_MUTATION, variables)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DataHubGraph: configured to talk to http://localhost:8080 with token: eyJh**********-X0U
query = '\n        mutation($input: UnlinkVersionInput!) {\n            unlinkAssetVersion(input: $input) {\n                urn\n            }\n        }\n    '
variables = {'input': {'unlinkedEntity': 'urn:li:dataset:(urn:li:dataPlatform:snowflake,versioning_0,PROD)', 'versionSet': 'urn:li:versionSet:(12345678910,dataset)'}}
operation_name = None, format_exception = True

    def execute_graphql(
        self,
        query: str,
        variables: Optional[Dict] = None,
        operation_name: Optional[str] = None,
        format_exception: bool = True,
    ) -> Dict:
        url = f"{self.config.server}/api/graphql"
    
        body: Dict = {
            "query": query,
        }
        if variables:
            body["variables"] = variables
        if operation_name:
            body["operationName"] = operation_name
    
        logger.debug(
            f"Executing {operation_name or ''} graphql query: {query} with variables: {json.dumps(variables)}"
        )
        result = self._post_generic(url, body)
        if result.get("errors"):
            if format_exception:
>               raise GraphError(f"Error executing graphql query: {result['errors']}")
E               datahub.configuration.common.GraphError: Error executing graphql query: [{'message': 'An unknown error occurred.', 'locations': [{'line': 3, 'column': 13}], 'path': ['unlinkAssetVersion'], 'extensions': {'code': 500, 'type': 'SERVER_ERROR', 'classification': 'DataFetchingException'}}]

...../ingestion/graph/client.py:1188: GraphError
tests.entity_versioning.test_versioning::test_link_unlink_three_versions_unlink_and_relink

Flake rate in main: 29.82% (Passed 40 times, Failed 17 times)

Stack Traces | 30.2s run time
graph_client = DataHubGraph: configured to talk to http://localhost:8080 with token: eyJh**********-X0U

    @pytest.fixture(scope="function", autouse=True)
    def ingest_cleanup_data(graph_client: DataHubGraph):
        try:
            for urn in ENTITY_URN_OBJS:
                graph_client.emit_mcp(
                    MetadataChangeProposalWrapper(
                        entityUrn=urn.urn(),
                        aspect=DatasetKeyClass(
                            platform=urn.platform, name=urn.name, origin=urn.env
                        ),
                    )
                )
            for i in [2, 1, 0, 0, 1, 2]:
                graph_client.unlink_asset_from_version_set(ENTITY_URNS[i])
            yield
        finally:
            for i in [2, 1, 0, 0, 1, 2]:
>               graph_client.unlink_asset_from_version_set(ENTITY_URNS[i])

tests/entity_versioning/test_versioning.py:35: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...../ingestion/graph/entity_versioning.py:164: in unlink_asset_from_version_set
    response = self.execute_graphql(self.UNLINK_VERSION_MUTATION, variables)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DataHubGraph: configured to talk to http://localhost:8080 with token: eyJh**********-X0U
query = '\n        mutation($input: UnlinkVersionInput!) {\n            unlinkAssetVersion(input: $input) {\n                urn\n            }\n        }\n    '
variables = {'input': {'unlinkedEntity': 'urn:li:dataset:(urn:li:dataPlatform:snowflake,versioning_1,PROD)', 'versionSet': 'urn:li:versionSet:(12345678910,dataset)'}}
operation_name = None, format_exception = True

    def execute_graphql(
        self,
        query: str,
        variables: Optional[Dict] = None,
        operation_name: Optional[str] = None,
        format_exception: bool = True,
    ) -> Dict:
        url = f"{self.config.server}/api/graphql"
    
        body: Dict = {
            "query": query,
        }
        if variables:
            body["variables"] = variables
        if operation_name:
            body["operationName"] = operation_name
    
        logger.debug(
            f"Executing {operation_name or ''} graphql query: {query} with variables: {json.dumps(variables)}"
        )
        result = self._post_generic(url, body)
        if result.get("errors"):
            if format_exception:
>               raise GraphError(f"Error executing graphql query: {result['errors']}")
E               datahub.configuration.common.GraphError: Error executing graphql query: [{'message': 'An unknown error occurred.', 'locations': [{'line': 3, 'column': 13}], 'path': ['unlinkAssetVersion'], 'extensions': {'code': 500, 'type': 'SERVER_ERROR', 'classification': 'DataFetchingException'}}]

...../ingestion/graph/client.py:1188: GraphError


@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Feb 14, 2025
@datahub-cyborg datahub-cyborg bot added needs-review Label for PRs that need review from a maintainer. and removed pending-submitter-response Issue/request has been reviewed but requires a response from the submitter labels Feb 17, 2025
Contributor

@sgomezvillamor sgomezvillamor left a comment


LGTM

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Feb 17, 2025
@anshbansal anshbansal merged commit 71d1092 into master Feb 17, 2025
179 of 187 checks passed
@anshbansal anshbansal deleted the ab-2025-feb-14-add-error-handling branch February 17, 2025 10:16
Labels
ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge

3 participants