feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model) #12632

Open
wants to merge 28 commits into
base: master

ryota-cloud
Collaborator

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions bot added the ingestion and community-contribution labels on Feb 13, 2025
@datahub-cyborg bot added the needs-review label on Feb 13, 2025

codecov bot commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 32.90323% with 104 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ingestion/src/datahub/ingestion/source/vertexai.py | 32.90% | 104 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ingestion/src/datahub/ingestion/source/vertexai.py | 32.90% <32.90%> (ø) |

... and 59 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cb67726...49b480e.

@ryota-cloud changed the title from "(WIP) feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model)" to "feat(ingestion) Adding vertexAI ingestion source (v1 - model group and model)" on Feb 24, 2025
Collaborator

@hsheth2 left a comment

Every entity needs container aspects

Collaborator

improve docs

type: vertexai
config:
  project_id: "acryl-poc"
  region: "us-west2"

Collaborator

where do the creds come from?

Collaborator

do something similar to bq source - support either env var or creds
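
One way to read this suggestion, as a minimal sketch only: accept an optional credential block in the source config, and fall back to the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable when it is absent. The helper name `resolve_credentials_path` and the shape of the `credential` dict are hypothetical; they are not taken from this PR or from the BigQuery source.

```python
import json
import os
import tempfile
from typing import Optional


def resolve_credentials_path(credential: Optional[dict] = None) -> Optional[str]:
    """Return a path to a service-account key file, or None for default auth.

    Hypothetical helper: `credential` stands in for an optional config block.
    """
    if credential:
        # Explicit creds in the recipe win: persist them to a temp file and
        # point the standard Google env var at that file.
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fp:
            json.dump(credential, fp)
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = fp.name
        return fp.name
    # Otherwise rely on whatever the environment already provides (may be None).
    return os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
```

With this shape, a recipe that omits the credential block keeps working in environments where the variable is already set, which also answers where the creds come from in the example recipe above.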

Comment on lines 187 to 189
yield self._get_data_process_properties_workunit(job)
yield from self._get_job_output_workunit(job)
yield from self._get_job_input_workunit(job)

Collaborator

generate the full data process entity in one go

maybe add generate_training_job() that calls each one individually
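
A rough sketch of what that grouping could look like. The three `_get_*` method names come from the snippet under review and `generate_training_job` from the comment above; the class name, the string "work units", and `get_workunits` are stand-ins so the example runs on its own, not the PR's actual types.

```python
from typing import Iterable, List


class VertexAISourceSketch:
    """Stand-in for the source class; work units are plain strings here."""

    def _get_data_process_properties_workunit(self, job: str) -> str:
        return f"{job}:properties"

    def _get_job_output_workunit(self, job: str) -> Iterable[str]:
        yield f"{job}:output"

    def _get_job_input_workunit(self, job: str) -> Iterable[str]:
        yield f"{job}:input"

    def generate_training_job(self, job: str) -> Iterable[str]:
        # Everything that describes one training job is emitted from one place.
        yield self._get_data_process_properties_workunit(job)
        yield from self._get_job_output_workunit(job)
        yield from self._get_job_input_workunit(job)

    def get_workunits(self, jobs: List[str]) -> Iterable[str]:
        # The caller only needs to know about the per-job generator.
        for job in jobs:
            yield from self.generate_training_job(job)
```

The caller then deals with a single generator per job, and adding another aspect for training jobs only touches `generate_training_job`.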

description=f"Dataset: {ds.display_name} for training job",
customProperties={
    "displayName": ds.display_name,
    "resourceName": ds.resource_name,
},

Collaborator

datasets need more aspects than this

use the new sdk's Dataset type

return urn


def _make_vertexai_name(self,

Collaborator

might be too generic

return urn


def _make_vertexai_name(self,

Collaborator

_make_job_name
_make_dataset_name

...
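
A minimal sketch of the split being suggested, assuming the generic helper is replaced by one small function per entity type. The body of `_make_vertexai_name` is not visible in this thread, so the `project_id.entity_type.entity_id` format, the class name, and the shared `_make_name` helper are all assumptions for illustration.

```python
class VertexAINamingSketch:
    """Stand-in class showing per-entity name helpers."""

    def __init__(self, project_id: str) -> None:
        self.project_id = project_id

    def _make_name(self, entity_type: str, entity_id: str) -> str:
        # Assumed format; the real one in the PR is not shown in this thread.
        return f"{self.project_id}.{entity_type}.{entity_id}"

    def _make_job_name(self, job_id: str) -> str:
        return self._make_name("job", job_id)

    def _make_dataset_name(self, dataset_id: str) -> str:
        return self._make_name("dataset", dataset_id)
```

Call sites then read as `self._make_job_name(job_id)` rather than passing an entity type string around, which makes it harder to mix up job and dataset names.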

@datahub-cyborg bot added the pending-submitter-response label and removed the needs-review label on Feb 25, 2025
@datahub-cyborg bot added the needs-review label and removed the pending-submitter-response label on Feb 25, 2025
Labels: community-contribution, ingestion, needs-review
2 participants