| 1 | +# Repo Diff Trees |
| 2 | + |
| 3 | +repo_diff_trees.py compares two repo source trees and outputs reports on the |
| 4 | +findings. |
| 5 | + |
| 6 | +The output is in CSV format and is easy to import into a spreadsheet. |
| 7 | + |
| 8 | +In addition to importing to a spreadsheet, you can also create your own |
| 9 | +Data Studio dashboard like [this one](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU). |
| 10 | + |
| 11 | +If you wish to create your own dashboard, follow the instructions below: |
| 12 | + |
| 13 | +1. Sync the two repo workspaces you wish to compare. Example: |
| 14 | + |
| 15 | +``` |
| 16 | +mkdir android-8.0.0_r1 |
| 17 | +cd android-8.0.0_r1 |
| 18 | +repo init \ |
| 19 | + --manifest-url=https://android.googlesource.com/platform/manifest \ |
| 20 | + --manifest-branch=android-8.0.0_r1 |
| 21 | +# Adjust the number of parallel jobs to your needs |
| 22 | +repo sync --current-branch --no-clone-bundle --no-tags --jobs=8 |
| 23 | +cd .. |
| 24 | +mkdir android-8.0.0_r11 |
| 25 | +cd android-8.0.0_r11 |
| 26 | +repo init \ |
| 27 | + --manifest-url=https://android.googlesource.com/platform/manifest \ |
| 28 | + --manifest-branch=android-8.0.0_r11 |
| 29 | +# Adjust the number of parallel jobs to your needs |
| 30 | +repo sync --current-branch --no-clone-bundle --no-tags --jobs=8 |
| 31 | +cd .. |
| 32 | +``` |
| 33 | + |
| 34 | +2. Run repo_diff_trees.py. Example: |
| 35 | + |
| 36 | +``` |
| 37 | +python repo_diff_trees.py --exclusions_file=android_exclusions.txt \ |
| 38 | + android-8.0.0_r1 android-8.0.0_r11 |
| 39 | +``` |
| 40 | + |
| 41 | +3. Create a [new Google spreadsheet](https://docs.google.com/spreadsheets/create). |
| 42 | +4. Import projects.csv to a new sheet. |
| 43 | +5. Create a [new data source in Data Studio](https://datastudio.google.com/datasources/create). |
| 44 | +6. Connect your new data source to the projects.csv sheet in the Google spreadsheet. |
| 45 | +7. Add a "Count Diff Status" field by selecting the menu next to the "Diff |
| 46 | + Status" field and selecting "Count". |
| 47 | +8. Copy the [Data Studio dashboard sample](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU). |
| 48 | + Make sure you are logged into your Google account and you have agreed to Data Studio's terms of service. Once |
| 49 | + this is done you should get a link to "Make a copy of this report". |
| 50 | +9. Select your own data source for your copy of the dashboard when prompted. |
| 51 | +10. You may see a "Configuration Incomplete" message under |
| 52 | + the "Modified Projects" pie chart. To address this, select the pie chart, |
| 53 | + then replace the "Invalid Metric" field with "Count Diff Status". |
| 54 | + |
| 55 | +## Analysis method |
| 56 | + |
| 57 | +repo_diff_trees.py goes through several stages when comparing two repo |
| 58 | +source trees: |
| 59 | + |
| 60 | +1. Match projects in source tree A with projects in source tree B. |
| 61 | +2. Diff projects that have a match. |
| 62 | +3. Find commits in source tree B that are not in source tree A. |
| 63 | + |
| 64 | +The first two steps are self-explanatory. The method |
| 65 | +of finding commits only in B is explained below. |
| 66 | + |
| 67 | +## Finding commits not upstream |
| 68 | + |
| 69 | +After matching up projects in both source trees |
| 70 | +and diffing them, the last stage is to iterate |
| 71 | +through each matching project pair and find |
| 72 | +the commits that exist in the downstream project (B) but not the |
| 73 | +upstream project (A). |
| 74 | + |
| 75 | +'git cherry' is a useful tool that finds changes |
| 76 | +which exist in one branch but not another. It does so |
| 77 | +not only by finding which commits were merged |
| 78 | +to both branches, but also by matching cherry picked |
| 79 | +commits. |
| 80 | + |
| 81 | +However, there are many instances where a change in one branch |
| 82 | +can have an equivalent in another branch without being a merge |
| 83 | +or a cherry pick. Some examples are: |
| 84 | + |
| 85 | +* Commits that were squashed with other commits |
| 86 | +* Commits that were reauthored |
| 87 | + |
| 88 | +'git cherry' will not recognize these commits as having an equivalent, |
| 89 | +yet they clearly do. |
| 90 | + |
| 91 | +This is addressed in three steps: |
| 92 | + |
| 93 | +1. First, list the "git cherry" commits. This gives us the |
| 94 | + list of changes for which "git cherry" could not find an equivalent. |
| 95 | +2. Then "git blame" the entire project's source tree and compile |
| 96 | + a list of changes that actually have lines of code in the tree. |
| 97 | +3. Finally, find the intersection: 'git cherry' changes |
| 98 | + that have lines of code in the final source tree. |
| 99 | + |
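The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in repo_diff_trees.py: the function names are hypothetical, it assumes both the upstream and downstream revisions are reachable from a single git repository, and it assumes 40-character SHA-1 commit hashes. It uses only `git cherry`, `git ls-tree`, and `git blame --line-porcelain`.

```python
import subprocess

HEX_DIGITS = set("0123456789abcdef")


def git(repo, *args):
    """Run a git command in `repo`; return stdout, or '' if git fails."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True, text=True, errors="replace",
    )
    return result.stdout if result.returncode == 0 else ""


def parse_cherry(output):
    """Return hashes that `git cherry` marked '+' (no equivalent upstream)."""
    commits = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "+":
            commits.append(parts[1])
    return commits


def parse_blame(output):
    """Return the commit hashes named in `git blame --line-porcelain` output."""
    commits = set()
    for line in output.splitlines():
        if line.startswith("\t"):
            continue  # file content lines are prefixed with a tab
        fields = line.split()
        # Header lines look like: <sha> <orig line> <final line> [<count>]
        if (len(fields) >= 3 and len(fields[0]) == 40
                and set(fields[0]) <= HEX_DIGITS
                and fields[1].isdigit() and fields[2].isdigit()):
            commits.add(fields[0])
    return commits


def commits_not_upstream(cherry_commits, blamed_commits):
    """Keep the 'git cherry' commits that still own lines in the tree."""
    return [c for c in cherry_commits if c in blamed_commits]


def find_downstream_only(repo, upstream, downstream):
    """Hypothetical driver: cherry list, blame every file, then intersect."""
    cherry = parse_cherry(git(repo, "cherry", upstream, downstream))
    blamed = set()
    paths = git(repo, "ls-tree", "-r", "--name-only", "-z", downstream)
    for path in filter(None, paths.split("\0")):
        blamed |= parse_blame(
            git(repo, "blame", "--line-porcelain", downstream, "--", path))
    return commits_not_upstream(cherry, blamed)
```

Blaming every file is slow on large projects, so a real run would likely want to parallelize or cache that step. A call such as `find_downstream_only("path/to/project", "upstream-rev", "downstream-rev")` would return the surviving commit hashes in the order `git cherry` listed them.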
| 100 | + |
| 101 | +## Caveats |
| 102 | + |
| 103 | +The method described above has proven effective on Android |
| 104 | +source trees. It does have shortcomings. |
| 105 | + |
| 106 | +* It does not find commits that only delete lines of code. |
| 107 | +* It does not take into account merge conflict resolutions. |