Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding signac diff functionality #247

Merged
merged 46 commits into from
Dec 14, 2019
Merged

adding signac diff functionality #247

merged 46 commits into from
Dec 14, 2019

Conversation

atravitz
Copy link
Collaborator

@atravitz atravitz commented Nov 22, 2019

Description

compares two or more jobs, returns jobs ids with only their keys that are not shared by all other jobs.

Motivation and Context

several requests for this functionality, often when two state points only vary by one or two values.

Resolves #248.

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change1

1The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog.

@atravitz atravitz requested a review from a team as a code owner November 22, 2019 18:54
@ghost ghost requested review from vyasr and removed request for a team November 22, 2019 18:54
@bdice bdice added the enhancement New feature or request label Nov 23, 2019
@bdice bdice added this to the v1.3.0 milestone Nov 23, 2019
@atravitz atravitz mentioned this pull request Nov 25, 2019
@vyasr
Copy link
Contributor

vyasr commented Dec 4, 2019

Discussion on the appropriate scope and API of this PR is happening at #248. Once everything there is finalized we can update this PR.

@atravitz atravitz requested a review from a team as a code owner December 10, 2019 17:10
@atravitz atravitz requested review from cbkerr and removed request for a team December 10, 2019 17:10
@bdice bdice self-requested a review December 11, 2019 18:56
signac/__main__.py Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/diff.py Outdated Show resolved Hide resolved
signac/diff.py Outdated Show resolved Hide resolved
signac/diff.py Outdated Show resolved Hide resolved
tests/test_diff.py Outdated Show resolved Hide resolved
tests/test_diff.py Outdated Show resolved Hide resolved
tests/test_shell.py Outdated Show resolved Hide resolved
Copy link
Member

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make a couple commits to this branch to add proper docs.

signac/diff.py Outdated Show resolved Hide resolved
signac/diff.py Outdated Show resolved Hide resolved
@bdice
Copy link
Member

bdice commented Dec 13, 2019

Reviewers @vyasr @cbkerr and author @atravitz: I made some final touch-ups to this PR, and I am satisfied with the feature's quality and completeness for merging to master. I will be making a release tomorrow afternoon, probably around 5pm. If you have any review comments, it would be great to hear those and incorporate any needed changes before then.

Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the performance, for 10,000 jobs with a trivial statepoint ({'i': i} for i in range(int(1e4))) the diff took less than a second. For 100,000 jobs it can take a while to just open all the jobs (perhaps many minutes depending on whether any of them are in the FS cache and the nature of the filesystem), but the diffing itself is less than ten seconds, so my concerns about performance aren't too significant IMO. No need for progressbars.

changelog.txt Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/__main__.py Outdated Show resolved Hide resolved
signac/diff.py Outdated Show resolved Hide resolved
tests/test_diff.py Outdated Show resolved Hide resolved
tests/test_diff.py Outdated Show resolved Hide resolved
tests/test_diff.py Outdated Show resolved Hide resolved
tests/test_diff.py Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor

vyasr commented Dec 13, 2019

Regarding the different use-cases outlined in #248, I think the existing API is sufficiently general to allow this. We would simply need to add an extra argument to the Python and CLI such as diff_by that would default to statepoint (which I think is the sensible default for diffing jobs). The output would be of the same form, except that if we chose to diff files the leaf nodes of the dict would probably have to be a list of files instead of a mapping of key-value pairs. Document and data diffing should be transparently supported using the same API and output structure.

@atravitz atravitz requested a review from a team as a code owner December 13, 2019 18:05
@atravitz
Copy link
Collaborator Author

suggestions from @vyasr have been addressed

@bdice bdice requested a review from vyasr December 13, 2019 18:52
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nearly there! Just a few very small changes.

signac/diff.py Outdated Show resolved Hide resolved
signac/diff.py Show resolved Hide resolved
@bdice bdice requested a review from vyasr December 14, 2019 00:47
@bdice bdice merged commit 6e5f1aa into master Dec 14, 2019
@bdice bdice deleted the sp_diff branch December 14, 2019 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

signac diff
4 participants