Preparing private docs
okhat committed Jul 30, 2023
1 parent 4970569 commit 625b1ef
Showing 3 changed files with 187 additions and 1 deletion.
1 change: 0 additions & 1 deletion README.md
@@ -86,4 +86,3 @@ We'll rebrand to the **DSPy** framework soon.
<img align="center" src="docs/images/DSPy7.png" width="260px" />
</p>
<p align="left">

52 changes: 52 additions & 0 deletions private_docs/sync_to_public.sh
@@ -0,0 +1,52 @@
#!/bin/bash

# Exit if insufficient arguments are provided
if [ "$#" -lt 1 ]; then
    echo "Usage: ./sync_to_public.sh <branch-name> [--squash]"
    exit 1
fi

BRANCH_NAME=$1
SQUASH_COMMITS=false

# Check if the squash option is provided
if [ "$#" -eq 2 ] && [ "$2" == "--squash" ]; then
    SQUASH_COMMITS=true
fi

# Variables
PRIVATE_REMOTE="origin"
PUBLIC_REMOTE="upstream"
CURRENT_DATE=$(date +%Y%m%d%H%M%S)
TEMP_BRANCH="temp-public-sync-$CURRENT_DATE"

# Ensure we are on the specified branch and have the latest changes
git checkout "$BRANCH_NAME"
git pull "$PRIVATE_REMOTE" "$BRANCH_NAME"

# Create a new temporary branch first, so the private branch's history stays untouched
git checkout -b "$TEMP_BRANCH"

# If squash option is true, squash the commits (on the temporary branch only)
if [ "$SQUASH_COMMITS" = true ]; then
    # Count the number of commits since the last public sync
    NUM_COMMITS=$(git rev-list --count "$PUBLIC_REMOTE/$BRANCH_NAME..HEAD")

    # Squash the last NUM_COMMITS into one
    if [ "$NUM_COMMITS" -gt 1 ]; then
        git reset --soft "HEAD~$NUM_COMMITS" && git commit
    fi
fi

# Remove private_* directories and commit
git rm -r private_* 2> /dev/null
git commit -m "Remove private content for public sync"

# Push the temporary branch to the corresponding branch of the public repo
git push "$PUBLIC_REMOTE" "$TEMP_BRANCH:$BRANCH_NAME"

# Return to the specified branch and delete the temporary branch
git checkout "$BRANCH_NAME"
git branch -D "$TEMP_BRANCH"

echo "Sync completed successfully!"
135 changes: 135 additions & 0 deletions private_docs/workflow.md
@@ -0,0 +1,135 @@
## Dual Repository Workflow: Public and Private Repositories on GitHub

### 1) Introduction

When developing sensitive projects with the intent to periodically release public versions, a dual repository workflow offers the advantage of private development while ensuring periodic transparent releases. This document details the steps and practices for managing both public and private repositories on GitHub, with a specific focus on ensuring that certain private contents never get pushed to the public repository.

### 2) Setup [do this once]

You do this once per clone (i.e., every time you need a new clone to work from).

Step 1: **Initialization**

- Private Repository: A repository (`okhat/dsp-private`) where all development work occurs, including potentially sensitive data.
- Public Repository: A repository (`stanfordnlp/dsp`) where sanitized, non-sensitive work is pushed for public consumption.

Step 2: **Clone the Private Repository**

```bash
git clone https://github.com/okhat/dsp-private.git
cd dsp-private
```


Step 3: **Add the Public Repository as an Upstream Remote**

```bash
git remote add upstream https://github.com/stanfordnlp/dsp.git
```
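Before moving on, it can help to confirm that both remotes are configured. The snippet below is a minimal sketch and not part of the repository: the function name `missing_remotes` is hypothetical. It reads remote names on standard input (one per line, as `git remote` prints them) and prints any expected remote that is absent.

```bash
# Hypothetical helper (not part of this repo): print each expected remote
# name that does not appear in the list supplied on stdin.
missing_remotes() {
    mr_configured=$(cat)    # remote names, one per line
    for mr_name in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string
        if ! printf '%s\n' "$mr_configured" | grep -qxF -- "$mr_name"; then
            echo "$mr_name"
        fi
    done
}

# Usage inside a clone (assumes the remote names used in this document):
#   git remote | missing_remotes origin upstream
```

If the usage line prints nothing, both `origin` and `upstream` are present.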

### 3) Development Workflow [you do this every day]

This is what you do in the private repo day to day.

Step 1: **Pull Latest Changes from Private Repository**

Ensure you're always working with the latest data from the private repo:

```bash
git checkout main
git pull origin main
```

Step 2: **Development** (this is the actual work!)

Conduct your development as usual.

When adding sensitive or private content, place it in a top-level directory whose name starts with `private_`; the sync script removes only paths matching `private_*`.

We already have `private_docs` for example.


Step 3: **Commit Changes in Private Repository**

```bash
git add .
git commit -m "Your descriptive commit message"
git push origin main
```

### 4) Pulling Updates from the Public Repository [you do this when needed]

If the public repository receives contributions, you'll want to incorporate them into your private repository at some point:

```bash
git checkout main
git pull upstream main # pull and merge anything from public upstream
git push origin main # push now any recent things to the private origin
```

Always do this before syncing back to the public repo, so that any merge conflicts are resolved in the private repo first.

The same steps apply to branches other than `main`.


### 5) Syncing to Public Repository [you do this when needed]

Step 1: **Use the Sync Script**

The script, `sync_to_public.sh`, is located in the `private_docs/` directory (i.e., alongside this document).

```bash
# First, ensure you have the latest changes from the public repo
# and deal with any potential merge conflicts.
git checkout <branch-name>
git pull upstream <branch-name>

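# Once everything is up-to-date and conflicts (if any) are resolved,
# proceed to sync your changes to the public repo. Run the script from the
# repository root so that its private_* pattern matches the private directories.
./private_docs/sync_to_public.sh <branch-name> [--squash]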
```

The specified `<branch-name>` could be `main`, but perhaps you're working on a feature or fix and want to merge from/to a specific branch like `feature_branch_name`. In such cases, replace `<branch-name>` with the name of your desired branch.

You can also optionally use the `--squash` argument. When used, this will squash all recent commits from the private repo into a single commit before syncing to the public repo. This is particularly useful when you want a cleaner and more consolidated commit history in the public repository.

We can agree on the better approach on a case-by-case basis.

This script will:
- Pull the latest changes from the private repository.
- Create a temporary branch with a unique name.
- Remove the `private_*` directories from this temporary branch.
- Push this sanitized temporary branch to the (selected) branch of the public repository.
- Return you to the (selected) branch of your private repository and delete the temporary branch.
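As an extra safeguard, one could check a list of tracked paths for private content before or after a sync. The following is a minimal sketch under stated assumptions: `find_private_paths` is a hypothetical helper, not part of the repository, and it flags any path component (directory or file name) that starts with `private_`, including nested ones that the script's top-level `private_*` pattern would not remove.

```bash
# Hypothetical helper (not part of this repo): read file paths on stdin and
# print those containing a path component that starts with "private_".
find_private_paths() {
    # (^|/) anchors the match at the start of a path component;
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '(^|/)private_' || true
}

# Usage (assumes the public remote is named "upstream", as in this document):
#   git fetch upstream
#   git ls-tree -r --name-only upstream/<branch-name> | find_private_paths
```

Empty output from the usage lines means no tracked path on the public branch matches the pattern.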


#### FAQ: Is this squash safe, *IF* we follow the pull upstream sequence?

The `--squash` option in the provided script squashes commits in the private repo, but it does this in a temporary branch. So, the original commit history in the branch you're working on (e.g., main, feature_branch_name, etc.) in the private repo remains intact.

Here's a breakdown of how the squashing process in the script works:

- You check out the branch you want to sync (e.g., main).
- The script then creates a temporary branch off of your current branch.
- Within this temporary branch, the script squashes the commits.
- The changes (now squashed into one commit) are pushed to the public repo from this temporary branch.
- The script then deletes the temporary branch and returns you to your original branch.
- Your original branch in the private repo remains unchanged, so all your detailed commit history is still there.

In short: Yes, it's safe. If you decide later that you didn't want things squashed in the public repo, you can always sync again from the private repo without the `--squash` option, as the original commits remain intact in your private repo.
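The idea described in this FAQ, squashing on a temporary branch while the original branch keeps its history, can be tried out in a throwaway repository. The following is an illustrative sketch only: it does not call `sync_to_public.sh`, and the directory, branch, and commit names are all made up for the demo. It is meant to be saved and run as a script.

```bash
# Demo in a throwaway repository: squash on a temporary branch and confirm
# that the original branch keeps its full history. All names are illustrative.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false          # avoid signing prompts in the demo

# Three small commits on main
for i in 1 2 3; do
    echo "change $i" >> notes.txt
    git add notes.txt
    git commit -q -m "commit $i"
done

# Squash the last two commits, but only on a temporary branch
git checkout -q -b temp-squash-demo
git reset -q --soft HEAD~2
git commit -q -m "squashed commits 2 and 3"

temp_count=$(git rev-list --count HEAD)  # commits on the temporary branch
main_count=$(git rev-list --count main)  # commits on main
echo "temp branch: $temp_count commits, main: $main_count commits"

# Clean up the temporary branch, as the sync script does after pushing
git checkout -q main
git branch -q -D temp-squash-demo
```

The temporary branch ends up with two commits (the first commit plus the squashed one), while `main` still has all three.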


Step 2: **Review Public Repo**

After the script has been run, always take a moment to check the public repository on GitHub to ensure everything looks as expected.



### 6) Conclusion

This dual-repo system allows you to maintain a private workspace with potentially sensitive or incomplete work while providing the ability to periodically release sanitized versions to the public. Using the provided script ensures that sensitive directories (`private_*`) remain in the private repo and never get exposed to the public.

Remember to always back up important data, understand the implications of each step in the workflow, and check both repositories regularly to ensure their integrity.


