Changing history, or How to Git pretty

OpenSky’s engineering and product teams have an ongoing lunchtime presentation series called Lunch and Learn. A couple of weeks ago, I gave a talk entitled “Lunch and learn2git”. This article is an expansion on that presentation, and covers lessons learned from using Git in the open source community.

Lesson one: Code is messy

… what with all this branching.

OpenSky project branches

The above graph represents approximately a week of development in OpenSky’s primary repository. In a given two-week period, we branch (and merge) between 75 and 150 discrete feature branches. We began using internal pull requests 77 days ago, and we just closed pull request 718. As you can imagine, it gets a little crazy.

Swimlanes make sense of the chaos

One of my favorite ways to wrap my head around development graphs like this comes from Vincent Driessen’s git-flow branching model. It takes the standard graph visualization and turns it into swimlanes:

Yay swimlanes!

The full article is well worth a read. For the purposes of our discussion today, it is probably enough to know that at OpenSky we use a combination of git-flow and GitHub flow, which is a bit more relevant for the topic at hand.

git-flow

At a high level, git-flow enforces a branching model with two primary branches: develop and production. All development, unsurprisingly, happens in the develop branch. More specifically, it happens in feature branches which are merged back into develop. This develop branch is always production-ready. It is periodically tagged and merged into production, and released into the wild.

So rather than worrying about the hundred-ish currently active branches in our repo, all I care about are three: develop, production and whatever feature branch I’m currently working in.

With this mental filter in place, our repository becomes much easier to understand:

git-flow swimlanes that I actually care about

GitHub flow

Scott Chacon from GitHub posted a response to git-flow, which outlines GitHub’s development workflow. It’s similar, but optimized for GitHub’s style of continuous deployment.

Like git-flow, GitHub’s workflow involves a production-ready branch, but in this case it’s the master branch. All features branch from master, and development happens there. When a feature is ready for release, the developer opens a pull request, inviting code review and feedback. Once the feature has had enough eyeballs on it, the pull request is merged into master and released.

While we’re not (yet) doing continuous deployment, OpenSky has moved toward a GitHub-style workflow, and the primary reason is pull requests.

Pull requests

These are the big awesome in GitHub’s workflow.

GitHub has an amazing code review system called Pull Requests that I fear not enough people know about. Many people use it for open source work - fork a project, update the project, send a pull request to the maintainer. However, it can also easily be used as an internal code review system, which is what we do.

Actually, we use it more as a branch conversation view than as a pull request. You can send pull requests from one branch to another in a single project (public or private) in GitHub, so you can use them to say “I need help or review on this” in addition to “Please merge this in”.

Read Scott’s article for more reasons you should use pull requests as a regular part of your development workflow. We do, and they have become one of the best code review and collaboration tools we use.

Lesson two: Pretty pull requests get merged

OpenSky pull request dynamics mirror those that I’ve observed in open source projects. In both cases, as a developer you have one primary objective:

You’ve made something awesome, and you want to contribute.

Great! We — OpenSky and open source — welcome your contribution. But in the attention economy we’ve built, we have to face a tough reality: beauty really does matter.

Given a queue full of pull requests, the easiest ones to grok will be reviewed first, will get more[1] feedback, and as a result, will be merged first.

So how do you make your pull request pretty?

Make the most of your commit message

A model commit message looks something like this:

Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary.  Wrap it to about 72
characters or so.  In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body.  The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase can get confused if you run the
two together.

Write your commit message in the present tense: "Fix bug" and not "Fixed
bug."  This convention matches up with commit messages generated by
commands like git merge and git revert.

Further paragraphs come after blank lines.

 - Bullet points are okay, too

 - Typically a hyphen or asterisk is used for the bullet, preceded by a
   single space, with blank lines in between, but conventions vary here

 - Use a hanging indent

The summary line is critical. GitHub’s recent design changes enforce this brevity as well. If you can’t find a way to summarize your commit in 50 characters or less, you probably should make two commits instead.

  • Err on the side of bullets.
  • They’re concise, descriptive, and easier to grok at a glance than prose.
  • Too many bullets? You probably should make two commits instead.

Make exactly as many commits as you need

Too many is confusing. Too few will result in epic diffs. So strive for just the right number.

But this is easier said than done, right? For now, just commit a bit too often. I’ll show you how to clean that up in a minute.

Nobody likes a drive-by

Each commit should do one thing. Each pull request should do one (bigger) thing.

Obviously, the exact definition of “one thing” is very context-dependent, but it’s a solid principle. A few examples:

  • If you’re editing a file and you reformat whitespace on the lines you change, that’s great. It belongs in your commit. If you reformat all the whitespace in the file, that’s a second “thing”, and belongs in a second commit. Maintaining this distinction will make your life easier when you need someone to review your pull request, or when you have to resolve the next failed merge.

    Sometimes your whitespace drive-by is accidental. Perhaps you have an awesome IDE which likes to reformat things for you. Maybe you accidentally did git add . instead of staging individual files like a responsible adult. It’s still your mess. Take responsibility.

  • A feature drive-by is also bad, but this is a harder line to draw. If you make disparate functional changes in a single commit, it’s nearly impossible to write a pretty first line for the commit message. Similarly, many open source projects will refuse to accept pull requests containing multiple features, even if they would accept all the changes independently. Maintain a clear purpose for the code in each commit, for each of the commits in your feature branch, and for your eventual pull request. If you come across something unrelated that needs to change, git stash what you’re working on and commit the other change where it belongs.

  • Use gitx or git add -p[2] to stage partial-file changesets. I often end up with file changes that should ideally be split into multiple commits, or even committed into multiple branches. Partial-file commits are a lifesaver.

    Use them.

    On a related note, git add . and git commit -a are the enemy of clean commits[3].

Don’t bite off more than I can chew

If you send me an epic pull request, I’m likely to stare blankly at it for a second then move on to a shorter one. I’ll come back to it, but if it’s too intense there’s a distinct possibility that it’ll stay in the tl;dr queue indefinitely.

Try to isolate minimum viable pull requests. Can you release anything right now?

There are undoubtedly groups of functionality which are prerequisites for your feature, and which can be released independently of it. For example, if you need to refactor a heavy controller into a service before you can write a related controller or add new functionality, then break that into a discrete pull request. As long as it doesn’t break current code, this sort of thing could be released at any time. Open a pull request now, and it will most likely be merged into production by the time you’re finishing up your primary feature.

Lesson three: How to change history

So you have a mess on your hands.

Maybe you took my advice back there and committed too many times. Maybe you merged develop into your feature branch way too often and your history is littered with unnecessary merge commits. Maybe you opened a pull request and my only response was “Arrrgghhhh! Rebase!”

We can fix that.

Here are a few of the many ways to change git history, along with a light smattering of advice and best practices.

Revert, what?

In Git, revert doesn’t do what you think it does.

You can think of git revert as an anti-commit. If you added a file in the last commit, git revert HEAD will delete that file in the next commit. If you removed a line in commit 123, git revert 123 will add that line again. It doesn’t do this by forgetting about the original commit; it commits a brand new anti-commit to cancel out the effects of the old one.
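A minimal sketch of the anti-commit in action, reverting the most recent commit in a scratch repository. The file names and identity are made up for illustration.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Demo Dev"

echo "hello" > README
git add README
git commit -q -m "Add README"

echo "oops" > mistake.txt
git add mistake.txt
git commit -q -m "Add mistake.txt"

# Revert the most recent commit; --no-edit accepts the generated message.
git revert --no-edit HEAD

# History grows: the original two commits plus the anti-commit.
git log --oneline
```

Note that the commit adding mistake.txt is still in the log. The file is gone from the working tree, but the history of the mistake is not.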

Guess what happens if you try to git revert a merge?

Yeah. Don’t do that.

There’s a good chance one of the next few options is a better choice for your current needs. But sometimes you want to use git revert anyway.

Your last commit can be --amended

Did you forget a file, or write a lame commit message for your last commit? As long as you haven’t pushed your latest commit upstream, you can use git commit --amend to fix it up.

git add myawesomefile.txt
git commit --amend

or…

git commit --amend -m "With a better message..."

Just throw the last commit away

If you committed something dumb, you can actually just remove it. This will remove both the commit and all changes in your code… but if that’s what you want, simply git reset --hard HEAD^.

Bam. Your last commit is gone.

If you’ve been on a dumb-commit rampage for a while, and need to get rid of more, git reset --hard HEAD~N will disappear N commits.
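For instance, here is a small sketch that makes four commits in a scratch repository and then throws the last two away. The file names and identity are placeholders.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Demo Dev"

for n in 1 2 3 4; do
  echo "change $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "Add file$n.txt"
done

# Drop the last two commits, along with their changes.
git reset -q --hard HEAD~2

# Only the first two commits (and their files) remain.
git log --oneline
ls
```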

You can’t use this method to get rid of commits if you’ve already pushed ’em upstream. If published, these commits are like those stupid trick birthday candles that you can’t ever really blow out. The next time you pull, your commits will be merged back into your current branch.

Reset and re-commit

This is the second easiest method of changing history. If you haven’t published your changes, sometimes the easiest thing to do is pretend you never committed your changes. For example, git reset HEAD^ will put your pointer back the way it was right before your last commit — uncommitted changes and all — giving you a mulligan on staging and committing your changes.

Most of the time you will use git reset COMMITISH, where COMMITISH is any commit, tag or branch name in recent history.
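The mulligan might look something like this. It is only a sketch in a scratch repository, with invented file names and messages: one sloppy commit is undone with git reset, and the same changes are re-committed as two focused commits.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Demo Dev"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app"

# One sloppy commit that does two things.
echo "v2" > app.txt
echo "docs" > docs.txt
git add app.txt docs.txt
git commit -q -m "Stuff"

# Undo the commit but keep its changes in the working tree.
git reset -q HEAD^

# Re-commit the same changes as two focused commits.
git add app.txt
git commit -q -m "Update app to v2"
git add docs.txt
git commit -q -m "Add docs"

git log --oneline
```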

You shouldn’t use the “reset and re-commit” method to fix up any commits which have already been published, as you’re guaranteed to give everyone downstream merge conflicts and rage issues.

Rebase before (or after) opening your pull request

A simple rebase can be used to remove your merge commits and make like they never happened. Assuming you’re working in the feature/awesomesauce feature branch, you can git rebase origin/develop to re-apply all of your awesomesauce changes onto the current HEAD of the develop branch.
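As a rough illustration, the sketch below builds a feature branch with an unnecessary merge commit and then rebases it away. To keep it self-contained it assumes a local develop branch standing in for origin/develop; the file names and identity are also made up.

```shell
set -eu

# Throwaway repository; a local "develop" stands in for origin/develop.
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b develop
git config user.email "dev@example.com"
git config user.name "Demo Dev"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature/awesomesauce
echo "sauce" > sauce.txt
git add sauce.txt
git commit -q -m "Add sauce"

# Meanwhile, develop moves on.
git checkout -q develop
echo "more" > more.txt
git add more.txt
git commit -q -m "More develop work"

# The habit that litters history: merging develop into the feature branch.
git checkout -q feature/awesomesauce
git merge -q --no-edit develop
echo "extra" >> sauce.txt
git commit -q -am "More sauce"

# Replay the feature commits on top of develop; the merge commit is dropped.
git rebase -q develop
git log --oneline --graph
```

After the rebase the graph is a straight line: the develop commits first, then the two awesomesauce commits on top.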

For most things, you’ll use the more complicated “interactive rebase”. Among other things, an interactive rebase will allow you to

  • Fix commit messages
  • Permanently remove a commit
  • Squash commits into something that makes sense
  • Split a commit in two (or more)

Be sure to follow that last link for a more in-depth look at interactive rebasing.
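Normally git rebase -i opens the list of commits in your editor, and you change pick to squash, fixup, reword, edit or drop by hand. The sketch below does the same thing unattended by pointing GIT_SEQUENCE_EDITOR at a sed command, purely so the example can run without a human. The repository, file names and messages are all invented.

```shell
set -eu

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Demo Dev"

echo "hello" > README
git add README
git commit -q -m "Add README"

echo "serach form" > search.txt
git add search.txt
git commit -q -m "Add search form"

echo "search from" > search.txt
git commit -q -am "Fix typo in search form"

echo "search form" > search.txt
git commit -q -am "Fix another typo"

# Keep the first "pick" in the todo list and turn the rest into "fixup",
# which folds those commits into the first one and discards their messages.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/fixup/'" git rebase -i HEAD~3

git log --oneline
```

The three search form commits collapse into a single “Add search form” commit containing the final, typo-free version of the file.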

origin doesn’t like it when you change history.

If you rebase or --amend something that you’ve already pushed upstream, origin will refuse your rewritten commits. This is because your histories have diverged and are no longer compatible.

You’ll have to push --force.

DANGER, Will Robinson.

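If you use git push --force without explicitly specifying a remote and branch, Git may force-push every local branch that has a matching branch on the remote. This was the default behavior before Git 2.0, and it still is wherever push.default is set to matching. The affected branches usually include origin/production and origin/develop. This is a Really Bad Thing.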

Never, ever, ever use git push --force without specifying both the remote and the branch: git push --force origin mybranchname.

Ever.
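To see the whole dance end to end, here is a small sketch that uses a local bare repository as a stand-in for origin. The paths, file name and identity are placeholders. It amends a published commit, watches the plain push get refused, and then force-pushes one explicitly named branch.

```shell
set -eu

# A local bare repository plays the part of origin; all paths are hypothetical.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git checkout -q -b mybranchname
git config user.email "dev@example.com"
git config user.name "Demo Dev"
git remote add origin "$work/origin.git"

echo "first draft" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q origin mybranchname

# Rewrite the commit that was just published.
git commit -q --amend -m "Add feature, with a better message"

# A plain push is refused, because the histories have diverged.
if ! git push -q origin mybranchname 2>/dev/null; then
  echo "push rejected, as expected"
fi

# Force the push, naming both the remote and the branch.
git push -q --force origin mybranchname
```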

Also, be sure to read the next few bits before you use push --force.

Don’t rebase develop or master

Seriously, it’s not worth it. Rebase feature branches. Rebase hotfixes. Rebase your local unpublished work all you want.

But don’t rebase develop or master.

Don’t pee in the pool

Changing history sometimes has far-reaching implications, and if you’re going to go this route, you should realize the consequences of your actions.

Back to the swimlanes for a minute.

Peeing in your own swimlane: gross, but your problem

Peeing in my swimlane: crossing the line

If you pee in your own swimlane, you’re the only person who has to deal with it. But as soon as you pee in someone else’s swimlane, you’ve crossed a line.

Be aware of who is downstream

Basically, you should only change history if (1) you are the only one affected, or (2) you hate everyone downstream from you.

Wow. That’s a lot to keep straight.

I know, right? But look! I made a flowchart!

Cleaning up your own mess

  1. Better?

  2. And its pals: git commit -p, git checkout -p, git reset -p and git stash -p.

  3. And the friend of committing .swp files.