
Commit 6126856

Treehugger Robot authored and Gerrit Code Review committed
Merge "Add README.md"
2 parents 039d8db + 7514722 commit 6126856


4 files changed (+109, -2 lines changed)

tools/repo_diff/README.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
# Repo Diff Trees

repo_diff_trees.py compares two repo source trees and outputs reports on the
findings.

The output is in CSV format and is easily consumable in a spreadsheet.

In addition to importing the output into a spreadsheet, you can also create your own
Data Studio dashboard like [this one](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).

If you wish to create your own dashboard, follow the instructions below:

1. Sync the two repo workspaces you wish to compare. Example:

   ```
   mkdir android-8.0.0_r1
   cd android-8.0.0_r1
   repo init \
     --manifest-url=https://android.googlesource.com/platform/manifest \
     --manifest-branch=android-8.0.0_r1
   # Adjust the number of parallel jobs to your needs
   repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
   cd ..
   mkdir android-8.0.0_r11
   cd android-8.0.0_r11
   repo init \
     --manifest-url=https://android.googlesource.com/platform/manifest \
     --manifest-branch=android-8.0.0_r11
   # Adjust the number of parallel jobs to your needs
   repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
   cd ..
   ```

2. Run repo_diff_trees.py. Example:

   ```
   python repo_diff_trees.py --exclusions_file=android_exclusions.txt \
     android-8.0.0_r1 android-8.0.0_r11
   ```

3. Create a [new Google spreadsheet](https://docs.google.com/spreadsheets/create).
4. Import projects.csv into a new sheet.
5. Create a [new data source in Data Studio](https://datastudio.google.com/datasources/create).
6. Connect your new data source to that sheet in the Google spreadsheet.
7. Add a "Count Diff Status" field by opening the menu next to the "Diff
   Status" field and selecting "Count".
8. Copy the [Data Studio dashboard sample](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).
   Make sure you are logged in to your Google account and have agreed to Data Studio's terms of service. Once
   this is done, you should see a link to "Make a copy of this report".
9. Select your own data source for your copy of the dashboard when prompted.
10. You may see a "Configuration Incomplete" message under
    the "Modified Projects" pie chart. To address this, select the pie chart,
    then replace the "Invalid Metric" field with "Count Diff Status".
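
If you only want a quick tally without building a dashboard, the "Count Diff Status" metric from step 7 can be approximated locally. This is a minimal sketch, assuming the CSV produced in step 2 has a header row containing a `Diff Status` column (the field name used above); the sample rows in the demo are hypothetical and the real column set may differ.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_diff_status(csv_file: TextIO, column: str = "Diff Status") -> Counter:
    """Count how many projects fall under each value of `column`.

    Assumes the first CSV row is a header that includes `column`.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical sample rows; real project names and status values may differ.
    sample = io.StringIO(
        "Project,Diff Status\n"
        "platform/bionic,modified\n"
        "platform/art,identical\n"
        "vendor/foo,new\n"
        "platform/libcore,modified\n"
    )
    for status, count in count_diff_status(sample).most_common():
        print(f"{status}: {count}")
```

To run it on real output, open the file with `newline=""` (as the `csv` module recommends) and pass the file object to `count_diff_status`.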

## Analysis method

repo_diff_trees.py goes through several stages when comparing two repo
source trees:

1. Match projects in source tree A with projects in source tree B.
2. Diff projects that have a match.
3. Find commits in source tree B that are not in source tree A.

The first two steps are self-explanatory. The method
of finding commits only in B is explained below.
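
As an illustration of the first stage, one simple strategy is to pair projects that share the same name in both trees. This is a hypothetical sketch: `match_projects` and its inputs (dicts mapping a project name to its path in each tree) are illustrative only, and repo_diff_trees.py may use different or additional matching rules.

```python
from typing import Dict, List, Tuple


def match_projects(
    tree_a: Dict[str, str], tree_b: Dict[str, str]
) -> Tuple[List[Tuple[str, str, str]], List[str], List[str]]:
    """Pair projects by name (hypothetical helper).

    `tree_a` and `tree_b` map a project name to its path in each tree.
    Returns (matched, only_in_a, only_in_b), where each matched entry is
    (name, path_in_a, path_in_b). All lists are sorted by project name.
    """
    common = tree_a.keys() & tree_b.keys()
    matched = [(name, tree_a[name], tree_b[name]) for name in sorted(common)]
    only_in_a = sorted(tree_a.keys() - tree_b.keys())
    only_in_b = sorted(tree_b.keys() - tree_a.keys())
    return matched, only_in_a, only_in_b
```

Matched pairs would then be diffed (stage 2), while projects listed only in A or only in B have no counterpart under this naming rule.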

## Finding commits not upstream

After matching up projects in both source trees
and diffing them, the last stage is to iterate
through each matched project pair and find
the commits that exist in the downstream project (B) but not the
upstream project (A).

'git cherry' is a useful tool that finds changes
which exist in one branch but not another. It does so
not only by finding commits that were merged
to both branches, but also by matching cherry-picked
commits.
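
To see what 'git cherry' reports for one matched pair, you can run it inside the downstream project. This is a minimal sketch, assuming both histories are reachable from that repository (for example, the upstream branch has been fetched as a remote-tracking ref); `upstream` and `head` are placeholder ref names. `git cherry <upstream> <head>` prints one line per commit in `<head>`: `+ <sha>` when no equivalent was found upstream, `- <sha>` when one was.

```python
import subprocess
from typing import List


def parse_git_cherry(output: str) -> List[str]:
    """Return the SHAs that 'git cherry' marked with '+' (no upstream equivalent)."""
    shas = []
    for line in output.splitlines():
        # Lines look like "+ <sha>" or "- <sha>" (plus a subject when -v is used).
        if line.startswith("+ "):
            shas.append(line[2:].split()[0])
    return shas


def commits_without_equivalent(project_dir: str, upstream: str, head: str) -> List[str]:
    """Run 'git cherry <upstream> <head>' in project_dir and parse the result."""
    result = subprocess.run(
        ["git", "cherry", upstream, head],
        cwd=project_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. on an unknown ref
    )
    return parse_git_cherry(result.stdout)
```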

However, there are many instances where a change in one branch
can have an equivalent in another branch without being a merge
or a cherry pick. Some examples are:

* Commits that were squashed with other commits
* Commits that were reauthored

'git cherry' will not recognize these commits as having an equivalent,
yet they clearly do.

This is addressed in three steps:

1. First, list the "git cherry" commits; this gives us the
   list of changes for which "git cherry" could not find an equivalent.
2. Then "git blame" the entire project's source tree and compile
   a list of changes that actually have lines of code in the tree.
3. Finally, find the intersection: 'git cherry' changes
   that have lines of code in the final source tree.
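
Steps 2 and 3 can be sketched as follows. This is an illustrative approach, not necessarily how repo_diff_trees.py implements it: it lists tracked files with `git ls-files -z`, runs `git blame --line-porcelain` on each, collects the commit SHA from every line header, and intersects that set with the 'git cherry' results. The helper names are hypothetical, and blaming every file can be slow on large projects.

```python
import re
import subprocess
from typing import Iterable, List, Set

# 'git blame --line-porcelain' emits this header before every source line:
# "<40-hex sha> <original line> <final line> [<lines in group>]"
_HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+(?: \d+)?$")


def parse_blame_porcelain(output: str) -> Set[str]:
    """Collect the SHAs of commits that own at least one line in the output."""
    shas = set()
    for line in output.splitlines():
        # Content lines start with a tab, so they never match the header regex.
        match = _HEADER.match(line)
        if match:
            shas.add(match.group(1))
    return shas


def blamed_commits(project_dir: str) -> Set[str]:
    """Blame every tracked file in project_dir and return the owning SHAs."""
    listing = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=project_dir, capture_output=True, text=True, check=True,
    ).stdout
    shas: Set[str] = set()
    for path in filter(None, listing.split("\0")):
        result = subprocess.run(
            ["git", "blame", "--line-porcelain", "--", path],
            cwd=project_dir, capture_output=True, text=True, errors="replace",
        )
        # Skip entries git cannot blame (for example, submodule links).
        if result.returncode == 0:
            shas |= parse_blame_porcelain(result.stdout)
    return shas


def commits_in_final_tree(cherry_only: Iterable[str], blamed: Set[str]) -> List[str]:
    """Keep the 'git cherry' commits that still own lines in the final tree."""
    return [sha for sha in cherry_only if sha in blamed]
```

Both 'git cherry' and the porcelain blame output use full 40-character SHAs, so the two sets can be compared directly.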

## Caveats

The method described above has proven effective on Android
source trees. It does have shortcomings:

* It does not find commits that only delete lines of code.
* It does not take into account merge conflict resolutions.

tools/repo_diff/repo_diff_android.py

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@
 import argparse
 import os
 import subprocess
-import repo_diff_downstream
+import repo_diff_trees
 
 HELP_MSG = "Diff a repo (downstream) and its upstream"
 
@@ -152,7 +152,7 @@ def diff(manifest_url, manifest_branch, tag, upstream_manifest_url,
                    upstream_workspace)
 
   # do the comparison
-  repo_diff_downstream.diff(
+  repo_diff_trees.diff(
       upstream_workspace,
       workspace,
       os.path.abspath("project.csv"),
