Changing history, or How to Git pretty
OpenSky’s engineering and product teams have an ongoing lunchtime presentation series called Lunch and Learn. A couple of weeks ago, I gave a talk entitled “Lunch and learn2git”. This article is an expansion on that presentation, and covers lessons learned from using Git in the open source community.
Lesson one: Code is messy
… what with all this branching.
The above graph represents approximately a week of development in OpenSky’s primary repository. In a given two-week period, we branch (and merge) between 75 and 150 discrete feature branches. We began using internal pull requests 77 days ago, and we just closed pull request 718. As you can imagine, it gets a little crazy.
Swimlanes make sense of the chaos
One of my favorite ways to wrap my head around development graphs like this comes from the git-flow branching model. It takes the standard graph visualization and turns it into swimlanes:
The full article is well worth a read. For the purposes of our discussion today, it is probably enough to know that at OpenSky we use a combination of git-flow and GitHub flow, which is a bit more relevant for the topic at hand.
At a high level, git-flow enforces a branching model with two primary branches: develop and production. Development, unsurprisingly, happens in the develop branch. More specifically, it happens in feature branches which are merged back into develop. The develop branch is always production-ready. It is periodically tagged and merged into production, and released into the wild.
So rather than worrying about the hundred-ish currently active branches in our repo, all I care about are three: develop, production and whatever feature branch I’m currently working in.
With this mental filter in place, our repository becomes much easier to understand:
Scott Chacon from GitHub posted a response to git-flow, which outlines GitHub’s development workflow. It’s similar, but optimized for GitHub’s style of continuous deployment.
Like git-flow, GitHub’s workflow involves a production-ready branch, but in this case it’s the master branch. All features branch from master, and development happens there. When a feature is ready for release, the developer opens a pull request, inviting code review and feedback. Once the feature has had enough eyeballs on it, the pull request is merged into master and released.
While we’re not (yet) doing continuous deployment, OpenSky has moved toward a GitHub-style workflow, and the primary reason is pull requests.
These are the big awesome in GitHub’s workflow.
GitHub has an amazing code review system called Pull Requests that I fear not enough people know about. Many people use it for open source work – fork a project, update the project, send a pull request to the maintainer. However, it can also easily be used as an internal code review system, which is what we do.
Actually, we use it more as a branch conversation view more than a pull request. You can send pull requests from one branch to another in a single project (public or private) in GitHub, so you can use them to say “I need help or review on this” in addition to “Please merge this in”.
Read Scott’s article for more reasons you should use pull requests as a regular part of your development workflow. We do, and they have become one of the best code review and collaboration tools we use.
Lesson two: Pretty pull requests get merged
OpenSky pull request dynamics mirror those that I’ve observed in open source projects. In both cases, as a developer you have one primary objective:
You’ve made something awesome, and you want to contribute.
Great! We — OpenSky and open source — welcome your contribution. But in the attention economy we’ve built, we have to face a tough reality: beauty really does matter.
Given a queue full of pull requests, the easiest ones to grok will be reviewed first, will get more feedback, and as a result, will be merged first.
So how do you make your pull request pretty?
Make the most of your commit message
A model commit message looks something like this:
    Capitalized, short (50 chars or less) summary

    More detailed explanatory text, if necessary. Wrap it to about 72
    characters or so. In some contexts, the first line is treated as the
    subject of an email and the rest of the text as the body. The blank
    line separating the summary from the body is critical (unless you omit
    the body entirely); tools like rebase can get confused if you run the
    two together.

    Write your commit message in the present tense: "Fix bug" and not
    "Fixed bug." This convention matches up with commit messages generated
    by commands like git merge and git revert.

    Further paragraphs come after blank lines.

    - Bullet points are okay, too

    - Typically a hyphen or asterisk is used for the bullet, preceded by a
      single space, with blank lines in between, but conventions vary here

    - Use a hanging indent
The summary line is critical. GitHub’s recent design changes enforce this brevity as well. If you can’t find a way to summarize your commit in 50 characters or less, you probably should make two commits instead.
- Err on the side of bullets.
- They’re concise, descriptive, and easier to grok at a glance than prose.
- Too many bullets? You probably should make two commits instead.
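As a rough sketch of the summary-plus-bullets shape, here is one way to write such a message from the command line. The throwaway repository, the identity, and the file name greeting.txt are all made up for illustration; the only Git behavior relied on is that each -m becomes its own paragraph.

```shell
set -e
# Throwaway repository so the sketch is self-contained (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > greeting.txt
git add greeting.txt

# The first -m is the summary line; the second -m becomes the body,
# separated from the summary by a blank line.
git commit -q -m "Add greeting file" -m "- Introduce greeting.txt as a placeholder
- Keep the summary under 50 characters"

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "summary (${#subject} chars): $subject"
```

For anything longer than a couple of bullets, plain git commit (which opens your editor) is usually more comfortable than stacking -m flags.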
Make exactly as many commits as you need
Too many is confusing. Too few will result in epic diffs. So strive for just the right number.
But this is easier said than done, right? For now, just commit a bit too often. I’ll show you how to clean that up in a minute.
Nobody likes a drive-by
Each commit should do one thing. Each pull request should do one (bigger) thing.
Obviously, the exact definition of “one thing” is very context-dependent, but it’s a solid principle. A few examples:
If you’re editing a file and you reformat whitespace on the lines you change, that’s great. It belongs in your commit. If you reformat all the whitespace in the file, that’s a second “thing”, and belongs in a second commit. Maintaining this distinction will make your life easier when you need someone to review your pull request, or when you have to resolve the next failed merge.
Sometimes your whitespace drive-by is accidental. Perhaps you have an awesome IDE which likes to reformat things for you. Maybe you accidentally did git add . instead of staging individual files like a responsible adult. It’s still your mess. Take responsibility.
A feature drive-by is also bad, but this is a harder line to draw. If you make disparate functional changes in a single commit, it’s nearly impossible to write a pretty first line for the commit message. Similarly, many open source projects will refuse to accept pull requests containing multiple features, even if they would accept all the changes independently. Maintain a clear purpose for the code in each commit, for each of the commits in your feature branch, and for your eventual pull request. If you come across something unrelated to change, git stash what you’re working on and commit the other change where it belongs.
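A minimal sketch of that stash-and-detour move, under some assumptions: the repository, identity, and file names (README.txt, feature.txt) are hypothetical, and the unrelated fix is committed on the same branch for brevity, where in practice it might belong on a different one.

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

printf 'teh title\n' > README.txt
printf 'feature v1\n' > feature.txt
git add README.txt feature.txt
git commit -q -m "Add README and feature"

# Half-finished feature work in progress.
printf 'feature v2 (unfinished)\n' > feature.txt

# Set the feature work aside, make the unrelated fix as its own commit,
# then pick the feature work back up.
git stash -q
printf 'the title\n' > README.txt
git add README.txt
git commit -q -m "Fix typo in README title"
git stash pop -q

last_subject=$(git log -1 --format=%s)
pending=$(git diff --name-only)
```

After the pop, the typo fix sits in its own commit and the unfinished feature change is back in the working tree, still uncommitted.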
Use git add -p to stage partial-file changesets. I often end up with file changes that should ideally be split into multiple commits, or even committed into multiple branches. Partial-file commits are a lifesaver.
On a related note, git add . and git commit -a are the enemy of clean commits.
Don’t bite off more than I can chew
If you send me an epic pull request, I’m likely to stare blankly at it for a second then move on to a shorter one.
I’ll come back to it, but if it’s too intense there’s a distinct possibility that it’ll stay in the
tl;dr queue indefinitely.
Try to isolate minimum viable pull requests. Can you release anything right now?
There are undoubtedly groups of functionality which are pre-requisites for your feature, which can be released independently of your feature. For example, if you need to refactor a heavy controller into a service before you can write a related controller or add new functionality, then break that into a discrete pull request. As long as it doesn’t break current code, this sort of thing could be released at any time. Open a pull request now, and it will most likely be merged into production by the time you’re finishing up your primary feature.
Lesson three: How to change history
So you have a mess on your hands.
Maybe you took my advice back there and committed too many times. Maybe you merged
develop into your feature branch way
too often and your history is littered with unnecessary merge commits. Maybe you opened a pull request and my only
response was “Arrrgghhhh! Rebase!”
We can fix that.
Here are a few of the many ways to change git history, along with a light smattering of advice and best practices.
In Git, revert doesn’t do what you think it does.
You can think of git revert as an anti-commit. If you added a file in the last commit, git revert HEAD will delete that file in the next commit. If you removed a line in commit 123, git revert 123 will add that line again. It’s not doing this by forgetting about the original commit; it’s doing it by committing a brand new anti-commit to cancel out the effects of the old one.
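To see the anti-commit in action, here is a small sketch that reverts the most recent commit with git revert HEAD. The repository, identity, and file names are made up for illustration.

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base file"

echo "oops" > mistake.txt
git add mistake.txt
git commit -q -m "Add mistake file"

# Revert the most recent commit. History grows by one commit;
# nothing is forgotten.
git revert --no-edit HEAD > /dev/null

commit_count=$(git rev-list --count HEAD)
last_subject=$(git log -1 --format=%s)
git log --oneline
```

The file is gone from the working tree, but both the original commit and its revert are still in the log.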
Guess what happens if you try to
git revert a merge?
Yeah. Don’t do that.
There’s a good chance one of the next few options is a better choice for your current needs. But sometimes you want to
git revert anyway.
Your last commit can be amended
Did you forget a file, or write a lame commit message for your last commit? As long as you haven’t pushed your latest
commit upstream, you can use
git commit --amend to fix it up.
git add myawesomefile.txt
git commit --amend
git commit --amend -m "With a better message..."
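Both fixes can be combined in one go. The sketch below, in a hypothetical throwaway repository, adds a forgotten file and replaces a sloppy message, then shows that the result is still a single commit with a new hash.

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "one" > first.txt
git add first.txt
git commit -q -m "Add frist file"   # typo, and a file was forgotten
before=$(git rev-parse HEAD)

# Stage the forgotten file and rewrite the last commit in place.
echo "two" > myawesomefile.txt
git add myawesomefile.txt
git commit -q --amend -m "Add first file and myawesomefile"
after=$(git rev-parse HEAD)

commit_count=$(git rev-list --count HEAD)
files=$(git ls-tree --name-only HEAD | tr '\n' ' ')
```

The changed hash is the reason amending is only safe before you push: as far as Git is concerned, the amended commit is a different commit.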
Just throw the last commit away
If you committed something dumb, you can actually just remove it. This will remove both the commit and all changes in
your code… but if that’s what you want, simply
git reset --hard HEAD^.
Bam. Your last commit is gone.
If you’ve been on a dumb-commit rampage for a while, and need to get rid of more,
git reset --hard HEAD~N will
disappear N commits.
You can’t use this method to get rid of commits if you’ve already pushed ‘em upstream. If published, these commits are like those stupid trick birthday candles that you can’t ever really blow out. The next time you pull, your commits will be merged back into your current branch.
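A quick sketch of both forms on unpublished commits, using a hypothetical throwaway repository with four tiny commits:

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

for n in 1 2 3 4; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file $n"
done

# Throw away the last commit, including its changes on disk.
git reset -q --hard HEAD^
after_one=$(git rev-list --count HEAD)

# Throw away two more.
git reset -q --hard HEAD~2
after_three=$(git rev-list --count HEAD)
```

Only the first commit and file1.txt survive; the other files are removed from the working tree along with their commits.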
Reset and re-commit
This is the second easiest method of changing history. If you haven’t published your changes, sometimes the easiest
thing to do is pretend you never committed your changes. For example,
git reset HEAD^ will put your pointer back the way it was right before your last commit (uncommitted changes and all), giving you a mulligan on staging and committing.
Most of the time you will use
git reset COMMITISH, where
COMMITISH is any commit, tag or branch name in recent history.
You shouldn’t use the “reset and re-commit” method to fix up any commits which have already been published, as you’re guaranteed to give everyone downstream merge conflicts and rage issues.
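As a sketch of the mulligan, the example below takes one lumpy, unpublished commit and redoes it as two. The repository, identity, and file names are hypothetical.

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Add a and b in one lump"

# Move the branch pointer back one commit. The files stay on disk,
# they are just no longer committed or staged.
git reset -q HEAD^

# Second try: one commit per change.
git add a.txt
git commit -q -m "Add a"
git add b.txt
git commit -q -m "Add b"

subjects=$(git log --format=%s | tr '\n' '|')
```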
Rebase before (or after) opening your pull request
A simple rebase can be used to remove your merge commits and make like they never happened. Assuming you’re working in the feature/awesomesauce feature branch, you can git rebase origin/develop to re-apply all of your awesomesauce changes onto the current HEAD of the develop branch.
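Here is a self-contained sketch of a rebase wiping out a merge commit. To avoid needing a remote, it rebases onto a local develop branch rather than origin/develop; that substitution, along with the identity and file names, is an assumption made for the example.

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/develop   # name the initial branch develop
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

git checkout -q -b feature/awesomesauce
echo "sauce" > sauce.txt
git add sauce.txt
git commit -q -m "Add sauce"

# develop moves on...
git checkout -q develop
echo "other" > other.txt
git add other.txt
git commit -q -m "Add other"

# ...and gets merged into the feature branch, leaving a merge commit.
git checkout -q feature/awesomesauce
git merge -q --no-edit develop
echo "more" > more.txt
git add more.txt
git commit -q -m "Add more sauce"
merges_before=$(git rev-list --merges --count develop..HEAD)

# Replay only the feature commits on top of develop.
git rebase -q develop
merges_after=$(git rev-list --merges --count develop..HEAD)
feature_commits=$(git rev-list --count develop..HEAD)
```

After the rebase, the feature branch is a straight line of two commits sitting on top of develop.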
For most things, you’ll use the more complicated “interactive rebase”. Among other things, an interactive rebase will allow you to
- Fix commit messages
- Permanently remove a commit
- Squash commits into something that makes sense
- Split a commit in two (or more)
Be sure to follow that last link for a more in-depth look at interactive rebasing.
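For a taste of squashing, here is a sketch that folds a WIP commit into the one before it. An interactive rebase normally opens the todo list in your editor; to keep the example non-interactive, a sed one-liner stands in for the editor via the GIT_SEQUENCE_EDITOR environment variable. The repository, identity, and file names are hypothetical.

```shell
set -e
# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

echo "draft" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

echo "final" > feature.txt
git add feature.txt
git commit -q -m "WIP: fix feature"

# The todo list has one "pick" line per commit, oldest first.
# Changing the second "pick" to "fixup" melds the WIP commit into
# "Add feature" and discards the WIP message.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/fixup/'" git rebase -q -i HEAD~2

commit_count=$(git rev-list --count HEAD)
last_subject=$(git log -1 --format=%s)
```

When doing this by hand, you would run git rebase -i HEAD~2 and edit the same word in your editor; use squash instead of fixup if you want to combine the messages too.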
origin doesn’t like it when you change history.
If you rebase or --amend something that you’ve already pushed upstream, origin will refuse your rewritten commits. This is because your histories have diverged and are no longer compatible. You’ll have to git push --force.
DANGER, Will Robinson.
If you use git push --force without explicitly specifying a remote and branch, it can overwrite the remote branch of all tracking branches (this was the default push behavior before Git 2.0, and it still happens when push.default is set to matching). This usually includes origin/develop. This is a Really Bad Thing.
Never, ever, ever use git push --force without specifying a remote and branch: git push --force origin mybranchname.
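The sketch below plays this out against a local bare repository standing in for origin. The paths, identity, and file name are hypothetical; the point is the rejected plain push followed by a force push that names both the remote and the branch.

```shell
set -e
# A bare repository plays the role of origin (hypothetical paths).
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/mybranchname   # name the initial branch
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file"
git push -q origin mybranchname

# Rewrite the published commit, so local and remote histories diverge.
git commit -q --amend -m "Add file, with a better message"

# A plain push is refused because it is not a fast-forward.
if git push -q origin mybranchname 2>/dev/null; then
  plain_push="accepted"
else
  plain_push="rejected"
fi

# Force-push only the named branch on the named remote.
git push -q --force origin mybranchname

remote_head=$(git ls-remote origin refs/heads/mybranchname | cut -f1)
local_head=$(git rev-parse HEAD)
```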
Also, be sure to read the next few bits before you use git push --force.
Seriously, it’s not worth it. Rebase feature branches. Rebase hotfixes. Rebase your local unpublished work all you want.
But don’t rebase
Don’t pee in the pool
Changing history sometimes has far-reaching implications, and if you’re going to go this route, you should realize the consequences of your actions.
Back to the swimlanes for a minute.
If you pee in your own swimlane, you’re the only person who has to deal with it. But as soon as you pee in someone else’s swimlane, you’ve crossed a line.
Be aware of who is downstream
Basically, you should only change history if (1) you are the only one affected, or (2) you hate everyone downstream from you.
Wow. That’s a lot to keep straight.
I know, right? But look! I made a flowchart!