Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Re-ingest local files on updates #5656

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

feat: Re-ingest local files on updates #5656

wants to merge 11 commits into from

Conversation

k-anshul
Copy link
Member

@k-anshul k-anshul commented Sep 10, 2024

closes #5479

Sample model file:

connector: local_file
path: "data/a.csv"
invalidate_on_change: true

output:
  connector: duckdb
  materialize: true

@k-anshul k-anshul self-assigned this Sep 10, 2024
@k-anshul k-anshul changed the title feat: Re-ingest local files if they are updated on model reconcile feat: Re-ingest local files on updates Sep 17, 2024
runtime/compilers/rillv1/parse_model.go Outdated Show resolved Hide resolved
runtime/compilers/rillv1/parse_model.go Outdated Show resolved Hide resolved
runtime/reconcilers/project_parser.go Outdated Show resolved Hide resolved
runtime/reconcilers/project_parser.go Outdated Show resolved Hide resolved
runtime/compilers/rillv1/parser.go Outdated Show resolved Hide resolved
runtime/drivers/duckdb/model_executor_localfile_self.go Outdated Show resolved Hide resolved
runtime/drivers/duckdb/model_manager.go Outdated Show resolved Hide resolved
runtime/reconcilers/project_parser.go Outdated Show resolved Hide resolved
runtime/compilers/rillv1/parse_model.go Show resolved Hide resolved
Comment on lines +124 to +131
root, err := p.Repo.Root(ctx)
if err != nil {
return err
}

// Update parser's localDataToResourcepath map to track which resources depend on the local file
for _, entry := range entries {
localPaths = append(localPaths, filepath.Join(root, entry.Path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same parser is only used for one repo, so could it just store the project-relative path? That would reduce dependency on the local filesystem.

I suppose there's a case on local where you can reference a path outside of the project. I would actually prefer we do not hash or auto-refresh for files outside the project because a) it's a workflow we should discourage, and b) the watcher doesn't (and shouldn't) watch them.

Comment on lines +133 to +134
if !ok {
resources = make(map[string]any)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nesting is getting very deep here, consider pulling the inputConnector == "local_file" handling into a separate function to support early return: https://github.com/uber-go/guide/blob/master/style.md#reduce-nesting

Comment on lines +299 to +302
file, err := os.Open(path)
if err != nil {
return "", err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This again becomes the only place with a hard dependency on a real local file system in the rillv1 package – any chance we could put this under an interface? I realize using RepoStore.Get won't do, but perhaps this could be solved by adding a buffered GetReader(ctx, path) (io.Reader, error) function to RepoStore?

Comment on lines +187 to 192
// Mapping of local data files to resource paths that depend on them
localDataToResourcepath map[string]map[string]any

// Internal state
resourcesForPath map[string][]*Resource // Reverse index of Resource.Paths
resourcesForUnspecifiedRef map[string][]*Resource // Reverse index of Resource.rawRefs where kind=ResourceKindUnspecified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the parser already keeps state of the paths for a particular ResourceName (through Parser.Resources[name]), could it just store the resource names that use a local data file?

Also, can it put it under the // Internal state section? Ordering of fields indicate their importance and relationship, and this is for example a less central piece of state than resourcesForPath.

For example:

Suggested change
// Mapping of local data files to resource paths that depend on them
localDataToResourcepath map[string]map[string]any
// Internal state
resourcesForPath map[string][]*Resource // Reverse index of Resource.Paths
resourcesForUnspecifiedRef map[string][]*Resource // Reverse index of Resource.rawRefs where kind=ResourceKindUnspecified
// Internal state
resourcesForPath map[string][]*Resource // Reverse index of Resource.Paths
resourcesForUnspecifiedRef map[string][]*Resource // Reverse index of Resource.rawRefs where kind=ResourceKindUnspecified
resourceNamesForDataPaths map[string][]ResourceName // Index of local data files to resources that depend on them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Runtime: Re-ingest local files if they are updated
2 participants