Changing history, or How to Git pretty
OpenSky’s engineering and product teams have an ongoing lunchtime presentation series called Lunch and Learn. A couple of weeks ago, I gave a talk entitled “Lunch and learn2git”. This article is an expansion on that presentation, and covers lessons learned from using Git in the open source community.
Lesson one: Code is messy
… what with all this branching.
The above graph represents approximately a week of development in OpenSky’s primary repository. In a given two-week period, we create (and merge) between 75 and 150 discrete feature branches. We began using internal pull requests 77 days ago, and we just closed pull request 718. As you can imagine, it gets a little crazy.
Swimlanes make sense of the chaos
One of my favorite ways to wrap my head around development graphs like this comes from Vincent Driessen’s git-flow branching model. It takes the standard graph visualization and turns it into swimlanes:
The full article is well worth a read. For the purposes of today’s discussion, it is probably enough to know that at OpenSky we use a combination of git-flow and GitHub flow, and that GitHub flow is the more relevant of the two for the topic at hand.
git-flow
At a high level, git-flow enforces a branching model with two primary branches: develop and production. All development, unsurprisingly, happens in the develop branch. More specifically, it happens in feature branches, which are merged back into develop. This develop branch is always production-ready. It is periodically tagged, merged into production, and released into the wild.
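The model above can be sketched with plain git commands rather than the git-flow command-line extension. This is a throwaway demo that assumes a scratch repository in a temporary directory. The file name, the feature/awesomesauce branch and the v1.0 tag are made-up names for illustration.

```shell
#!/bin/sh
# Illustrative only: a scratch repo showing the develop/production/feature model.
# File, branch and tag names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# production holds released code; develop branches off it.
git checkout -q -b production
echo "v0" > app.txt
git add app.txt
git commit -q -m "Initial release"
git checkout -q -b develop

# Work happens in a feature branch cut from develop...
git checkout -q -b feature/awesomesauce develop
echo "awesomesauce" >> app.txt
git add app.txt
git commit -q -m "Add awesomesauce"

# ...which is merged back into develop.
git checkout -q develop
git merge -q --no-ff -m "Merge feature/awesomesauce into develop" feature/awesomesauce

# Periodically, develop is merged into production and tagged as a release.
git checkout -q production
git merge -q --no-ff -m "Release v1.0" develop
git tag v1.0
```

The --no-ff flag forces a merge commit even when a fast-forward would be possible. That keeps each feature visible as its own lane in the graph.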
So rather than worrying about the hundred-ish currently active branches in our repo, I only care about three: develop, production, and whatever feature branch I’m currently working in.
With this mental filter in place, our repository becomes much easier to understand:
GitHub flow
Scott Chacon from GitHub posted a response to git-flow, which outlines GitHub’s development workflow. It’s similar, but optimized for GitHub’s style of continuous deployment.
Like git-flow, GitHub’s workflow involves a production-ready branch, but in this case it’s the master branch. All features branch from master, and development happens in those feature branches. When a feature is ready for release, the developer opens a pull request, inviting code review and feedback. Once the feature has had enough eyeballs on it, the pull request is merged into master and released.
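Here is a rough sketch of one GitHub flow cycle from the command line. It assumes a local bare repository standing in for GitHub, since opening and reviewing the pull request itself happens in GitHub’s web interface and is only marked by a comment here. Paths and branch names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: a local bare repo plays the role of GitHub.
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git init -q "$base/work"
cd "$base/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$base/origin.git"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch -M master            # master is always deployable
git push -q origin master

# Every feature gets its own branch off master.
git checkout -q -b feature/awesomesauce master
echo "awesomesauce" > sauce.txt
git add sauce.txt
git commit -q -m "Add awesomesauce"

# Publish the branch; on GitHub you would now open a pull request from it.
git push -q -u origin feature/awesomesauce

# After review, the pull request is merged into master and released.
git checkout -q master
git merge -q --no-ff -m "Merge pull request: feature/awesomesauce" feature/awesomesauce
git push -q origin master
```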
While we’re not (yet) doing continuous deployment, OpenSky has moved toward a GitHub-style workflow, and the primary reason is pull requests.
Pull requests
These are the big awesome in GitHub’s workflow.
GitHub has an amazing code review system called Pull Requests that I fear not enough people know about. Many people use it for open source work – fork a project, update the project, send a pull request to the maintainer. However, it can also easily be used as an internal code review system, which is what we do.
Actually, we use it more as a branch conversation view more than a pull request. You can send pull requests from one branch to another in a single project (public or private) in GitHub, so you can use them to say “I need help or review on this” in addition to “Please merge this in”.
Read Scott’s article for more reasons you should use pull requests as a regular part of your development workflow. We do, and they have become one of the best code review and collaboration tools we use.
Lesson two: Pretty pull requests get merged
OpenSky pull request dynamics mirror those that I’ve observed in open source projects. In both cases, as a developer you have one primary objective:
You’ve made something awesome, and you want to contribute.
Great! We — OpenSky and open source — welcome your contribution. But in the attention economy we’ve built, we have to face a tough reality: beauty really does matter.
Given a queue full of pull requests, the easiest ones to grok will be reviewed first, will get more¹ feedback, and as a result, will be merged first.
So how do you make your pull request pretty?
Make the most of your commit message
A model commit message looks something like this:
Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body. The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase can get confused if you run the
two together.

Write your commit message in the present tense: "Fix bug" and not "Fixed
bug." This convention matches up with commit messages generated by
commands like git merge and git revert.

Further paragraphs come after blank lines.

 - Bullet points are okay, too

 - Typically a hyphen or asterisk is used for the bullet, preceded by a
   single space, with blank lines in between, but conventions vary here

 - Use a hanging indent
The summary line is critical. GitHub’s recent design changes enforce this brevity as well. If you can’t find a way to summarize your commit in 50 characters or less, you probably should make two commits instead.
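The following is a hypothetical helper, not a tool from the original talk. It shows how the two hard rules above (a summary of 50 characters or less, then a blank line) could be checked automatically. The function name check_commit_msg is made up. One assumed way to use it is to call it as check_commit_msg "$1" from a .git/hooks/commit-msg script, since git passes that hook the path of the file holding the proposed message.

```shell
#!/bin/sh
# check_commit_msg FILE (hypothetical helper)
# Fails if the summary line is empty or longer than 50 characters,
# or if the summary is not followed by a blank line.
check_commit_msg() {
    # Ignore comment lines, which git strips from the final message by default.
    msg=$(grep -v '^#' "$1" || true)
    summary=$(printf '%s\n' "$msg" | sed -n '1p')
    second=$(printf '%s\n' "$msg" | sed -n '2p')

    if [ -z "$summary" ]; then
        echo "Empty summary line" >&2
        return 1
    fi
    if [ "${#summary}" -gt 50 ]; then
        echo "Summary is ${#summary} characters; keep it to 50 or less" >&2
        return 1
    fi
    if [ -n "$second" ]; then
        echo "Separate the summary from the body with a blank line" >&2
        return 1
    fi
    return 0
}
```

Wrapping the body at 72 characters and choosing bullets over prose are left to human judgment here.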
- Err on the side of bullets.
- They’re concise, descriptive, and easier to grok at a glance than prose.
- Too many bullets? You probably should make two commits instead.
Make exactly as many commits as you need
Too many is confusing. Too few will result in epic diffs. So strive for just the right number.
But this is easier said than done, right? For now, just commit a bit too often. I’ll show you how to clean that up in a minute.
Nobody likes a drive-by
Each commit should do one thing. Each pull request should do one (bigger) thing.
Obviously, the exact definition of “one thing” is very context-dependent, but it’s a solid principle. A few examples:
- If you’re editing a file and you reformat whitespace on the lines you change, that’s great. It belongs in your commit. If you reformat all the whitespace in the file, that’s a second “thing”, and it belongs in a second commit. Maintaining this distinction will make your life easier when you need someone to review your pull request, or when you have to resolve the next failed merge.
  Sometimes your whitespace drive-by is accidental. Perhaps you have an awesome IDE which likes to reformat things for you. Maybe you accidentally did git add . instead of staging individual files like a responsible adult. It’s still your mess. Take responsibility.
- A feature drive-by is also bad, but this is a harder line to draw. If you make disparate functional changes in a single commit, it’s nearly impossible to write a pretty first line for the commit message. Similarly, many open source projects will refuse to accept pull requests containing multiple features, even if they would accept all the changes independently. Maintain a clear purpose for the code in each commit, for each of the commits in your feature branch, and for your eventual pull request. If you come across something unrelated that needs changing, git stash what you’re working on and commit the other change where it belongs.
- Use gitx or git add -p² to stage partial-file changesets. I often end up with file changes that should ideally be split into multiple commits, or even committed into multiple branches. Partial-file commits are a lifesaver. Use them.
  On a related note, git add . and git commit -a are the enemy of clean commits³.
Don’t bite off more than I can chew
If you send me an epic pull request, I’m likely to stare blankly at it for a second then move on to a shorter one.
I’ll come back to it, but if it’s too intense there’s a distinct possibility that it’ll stay in the tl;dr queue indefinitely.
Try to isolate minimum viable pull requests. Can you release anything right now?
There are undoubtedly groups of functionality which are prerequisites for your feature but can be released independently of it. For example, if you need to refactor a heavy controller into a service before you can write a related controller or add new functionality, then break that into a discrete pull request. As long as it doesn’t break current code, this sort of thing could be released at any time. Open a pull request now, and it will most likely be merged into production by the time you’re finishing up your primary feature.
Lesson three: How to change history
So you have a mess on your hands.
Maybe you took my advice back there and committed too many times. Maybe you merged develop into your feature branch way too often and your history is littered with unnecessary merge commits. Maybe you opened a pull request and my only response was “Arrrgghhhh! Rebase!”
We can fix that.
Here are a few of the many ways to change git history, along with a light smattering of advice and best practices.
Revert, what?
In Git, revert doesn’t do what you think it does.
You can think of git revert as an anti-commit. If you added a file in the last commit, git revert HEAD will delete that file in the next commit. If you removed a line in commit 123, git revert 123 will add that line again. It’s not doing this by forgetting about the original commit; it’s doing it by committing a brand new anti-commit to cancel out the effects of the old one.
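A short demo of the anti-commit idea in a scratch repository follows. File names are hypothetical. --no-edit simply accepts git’s default “Revert …” message instead of opening an editor. Passing a commit hash instead of HEAD reverts that older commit in the same way.

```shell
#!/bin/sh
# Illustrative only: revert adds a new commit instead of deleting the old one.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "keep me" > keep.txt
git add keep.txt
git commit -q -m "Add keep.txt"

echo "oops" > oops.txt
git add oops.txt
git commit -q -m "Add oops.txt"

# Undo the last commit by committing its inverse.
git revert --no-edit HEAD >/dev/null

git log --oneline    # three commits: the original is still in history
```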
Guess what happens if you try to git revert a merge?
Yeah. Don’t do that.
There’s a good chance one of the next few options is a better choice for your current needs. But sometimes you want to use git revert anyway.
Your last commit can be --amended
Did you forget a file, or write a lame commit message for your last commit? As long as you haven’t pushed your latest commit upstream, you can use git commit --amend to fix it up.
git add myawesomefile.txt
git commit --amend
or…
git commit --amend -m "With a better message..."
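Both fixes can be walked through end to end in a scratch repository. File names are hypothetical. --no-edit keeps the existing message when you only want to fold in a forgotten file. Either way the branch still has the same number of commits, because the last one is replaced rather than added to.

```shell
#!/bin/sh
# Illustrative only: repairing the most recent (unpushed) commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

echo "awesome" > myawesomefile.txt
echo "helper" > helper.txt
git add myawesomefile.txt
git commit -q -m "stuff"          # oops: forgot helper.txt, and a lame message

# Fold the forgotten file into the last commit, keeping its message...
git add helper.txt
git commit -q --amend --no-edit

# ...then give it a better message.
git commit -q --amend -m "Add myawesomefile.txt and its helper"
```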
Just throw the last commit away
If you committed something dumb, you can actually just remove it. This removes both the commit and all of its changes from your code, along with any uncommitted changes in your working tree. If that’s what you want, simply git reset --hard HEAD^.
Bam. Your last commit is gone.
If you’ve been on a dumb-commit rampage for a while and need to get rid of more, git reset --hard HEAD~N will make the last N commits disappear.
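Here is a scratch-repository demo of both forms. File names are hypothetical. Note that the uncommitted edit to file1.txt is thrown away too, which is exactly what --hard means.

```shell
#!/bin/sh
# Illustrative only: throwing away unpushed commits with reset --hard.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

for n in 1 2 3 4; do
    echo "$n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "Commit $n"
done
echo "scratch" >> file1.txt   # an uncommitted edit, also discarded below

git reset -q --hard HEAD^     # drop "Commit 4" and its changes
git reset -q --hard HEAD~2    # drop "Commit 3" and "Commit 2" as well
```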
You can’t use this method to get rid of commits if you’ve already pushed ’em upstream. Once published, these commits are like those stupid trick birthday candles that you can’t ever really blow out. The next time you pull, your commits will be merged back into your current branch.
Reset and re-commit
This is the second easiest method of changing history. If you haven’t published your changes, sometimes the easiest thing to do is pretend you never committed them. For example, git reset HEAD^ will put your branch pointer back the way it was right before your last commit. The commit’s changes stay in your working tree, unstaged, giving you a mulligan on staging and committing them.
Most of the time you will use git reset COMMITISH, where COMMITISH is any commit, tag or branch name in recent history.
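For example, you can split a commit that did two things into two commits. This sketch uses a scratch repository and hypothetical file names. git reset without --hard (the default “mixed” mode) moves the branch back and unstages the changes, but leaves the files themselves untouched.

```shell
#!/bin/sh
# Illustrative only: undo the last commit, then re-commit it in two pieces.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

echo "fix" > fix.txt
echo "feature" > feature.txt
git add fix.txt feature.txt
git commit -q -m "Fix bug and add feature"   # two "things" in one commit

git reset -q HEAD^        # commit is gone; fix.txt and feature.txt remain on disk

git add fix.txt
git commit -q -m "Fix bug"
git add feature.txt
git commit -q -m "Add feature"
```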
You shouldn’t use the “reset and re-commit” method to fix up any commits which have already been published, as you’re guaranteed to give everyone downstream merge conflicts and rage issues.
Rebase before (or after) opening your pull request
A simple rebase can be used to remove your merge commits and make like they never happened. Assuming you’re working in the feature/awesomesauce feature branch, you can git fetch and then git rebase origin/develop to re-apply all of your awesomesauce changes onto the current tip of the develop branch.
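The following sketch assumes a local bare repository standing in for origin and simulates other people’s work by committing to develop directly. Names are hypothetical. It creates a stray merge commit in the feature branch, then rebases it away. By default, git rebase replays only the branch’s own non-merge commits, so the merge commit does not survive.

```shell
#!/bin/sh
# Illustrative only: rebasing a feature branch to drop merge commits.
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git init -q "$base/work"
cd "$base/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$base/origin.git"

git checkout -q -b develop
echo "1" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git push -q origin develop

git checkout -q -b feature/awesomesauce
echo "sauce" > sauce.txt
git add sauce.txt
git commit -q -m "Add awesomesauce"

# Meanwhile develop moves on (simulated by committing to it directly)...
git checkout -q develop
echo "2" > other.txt
git add other.txt
git commit -q -m "Unrelated develop work"
git push -q origin develop

# ...and merging develop into the feature leaves a merge commit behind.
git checkout -q feature/awesomesauce
git merge -q --no-edit develop

# Rebase instead: replay only the feature's own commits on top of develop.
git fetch -q origin
git rebase -q origin/develop
```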
For most things, you’ll use the more complicated “interactive rebase”. Among other things, an interactive rebase will allow you to
- Fix commit messages
- Permanently remove a commit
- Squash commits into something that makes sense
- Split a commit in two (or more)
Be sure to follow that last link for a more in-depth look at interactive rebasing.
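Interactive rebase normally opens an editor with a list of commits to pick, squash or reword. To keep this sketch non-interactive, it swaps in a related technique. git commit --fixup marks a follow-up commit, and git rebase -i --autosquash arranges for it to be folded into its target. Setting GIT_SEQUENCE_EDITOR to : accepts the generated list without opening an editor. The repository and file names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: squash a follow-up fix into the commit it fixes.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# A follow-up fix, recorded as "fixup! Add feature".
echo "feature, now working" > feature.txt
git add feature.txt
git commit -q --fixup=HEAD

# Rewrite the last two commits, folding the fixup into "Add feature".
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash HEAD~2
```

In day-to-day use you would usually run git rebase -i without the environment variable and edit the list by hand.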
origin doesn’t like it when you change history.
If you rebase or --amend something that you’ve already pushed upstream, origin will refuse your rewritten commits.
This is because your histories have diverged and are no longer compatible.
You’ll have to push --force.
DANGER, Will Robinson.
If you use git push --force without explicitly specifying a remote and branch, and your push.default setting is matching (the default before Git 2.0), it will overwrite every remote branch that has a matching local branch. This usually includes origin/production and origin/develop. This is a Really Bad Thing.
Never, ever, ever use git push --force without spelling out the remote and branch: git push --force origin mybranchname.
Ever.
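This scratch-repository sketch shows the whole sequence, with a local bare repository standing in for origin and hypothetical branch names. It publishes a branch, amends the published commit, watches the plain push get refused, then force-pushes that one branch by name. origin’s develop is never touched.

```shell
#!/bin/sh
# Illustrative only: rewriting a published feature branch safely.
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git init -q "$base/work"
cd "$base/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$base/origin.git"

git checkout -q -b develop
echo "1" > base.txt
git add base.txt
git commit -q -m "Initial commit"

git checkout -q -b feature/awesomesauce
echo "sauce" > sauce.txt
git add sauce.txt
git commit -q -m "Add sauce"
git push -q origin develop feature/awesomesauce

# Rewrite the already-published feature commit.
git commit -q --amend -m "Add awesomesauce"

# A plain push is refused because the histories have diverged...
if git push -q origin feature/awesomesauce 2>/dev/null; then
    echo "unexpected: push was accepted" >&2
    exit 1
fi

# ...so force-push, naming the remote AND the branch explicitly.
git push -q --force origin feature/awesomesauce
```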
Also, be sure to read the next few bits before you use push --force.
Don’t rebase develop or master
Seriously, it’s not worth it. Rebase feature branches. Rebase hotfixes. Rebase your local unpublished work all you want.
But don’t rebase develop or master.
Don’t pee in the pool
Changing history sometimes has far-reaching implications, and if you’re going to go this route, you should realize the consequences of your actions.
Back to the swimlanes for a minute.
If you pee in your own swimlane, you’re the only person who has to deal with it. But as soon as you pee in someone else’s swimlane, you’ve crossed a line.
Be aware of who is downstream
Basically, you should only change history if (1) you are the only one affected, or (2) you hate everyone downstream from you.
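Before rewriting anything, a quick heuristic is to check whether a commit is already reachable from a remote-tracking branch. If it is, someone downstream may have it. The helper name is_published is hypothetical, and the check is only as fresh as your last git fetch. It also cannot see copies that other people already have locally. The demo repository and branch names are illustrative.

```shell
#!/bin/sh
# Illustrative only: which commits are still safe to rewrite?
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git init -q "$base/work"
cd "$base/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$base/origin.git"

git checkout -q -b develop
echo "1" > a.txt
git add a.txt
git commit -q -m "Published work"
git push -q origin develop
git fetch -q origin

echo "2" > b.txt
git add b.txt
git commit -q -m "Local work"

# is_published COMMIT (hypothetical helper): succeeds if any
# remote-tracking branch already contains COMMIT.
is_published() {
    [ -n "$(git branch -r --contains "$1")" ]
}

# Commits that exist only locally, which only you are affected by.
git log --format='%h %s' origin/develop..HEAD
```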
Wow. That’s a lot to keep straight.
I know, right? But look! I made a flowchart!