How to perform a gentle migration to Git from SVN

There are many reasons to use Git, and many different ways of migrating from SVN to Git. After experimenting with a few, I found the wonderful git svn command to be the perfect tool for bringing all our SVN history into our Git repository slowly but surely. Here's how I did it.

Assumptions and Goals

I have an existing SVN repository that a team of four or five developers has been contributing to on a regular basis for the last few years. We have a main development trunk that we regularly update from and commit to. We want to migrate all of our code over to a Git repository, but until we have Jenkins/Nexus/Bugzilla/etc. hooked up to Git, we want to keep using Subversion, and any commits to Subversion should end up in Git as well. Once Jenkins/Nexus/Bugzilla/etc. are hooked up to Git, we can retire Subversion.

Create an authors file

The point of this step is to map each SVN username to a Git author name and email address. We'll create an authors.txt file that we'll later hand to git svn when we initialize the repository.

First I checked out the trunk of our SVN repository. You can skip this step if you already have a local copy checked out.

$ mkdir svn-migration
$ cd svn-migration
$ svn co http://build.company.intra/svn/repo/web-project/project-parent/trunk

Then wait a few minutes for the checkout to finish. Now we'll go into the trunk and run svn log, piping its output into grep or awk to generate our list of authors. I've seen two commands floating around the internet; the second one is the one that worked for me.

$ cd trunk
$ svn log --xml | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'

or

$ cd trunk
$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt

After a few minutes I ended up with a file that looked roughly like this:

$ more authors.txt
alices = alices <alices>
jimb = jimb <jimb>
susanq = susanq <susanq>
bobo = bobo <bobo>
stephen = stephen <stephen>
katyp = katyp <katyp>
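The awk pipeline can be tried without an SVN server. The sketch below wraps the same awk program in a function (svn_authors_template is a hypothetical name) and feeds it a few lines shaped like svn log -q output; the usernames and dates are made-up sample data.

```shell
# Turn `svn log -q` output (read on stdin) into placeholder authors.txt lines,
# using the same awk program as the command above.
# Entries look like "r42 | jimb | 2014-06-02 10:15:00 -0700 (Mon, 02 Jun 2014)";
# the dashed separator lines don't start with "r", so awk skips them.
svn_authors_template() {
    awk -F '|' '/^r/ {
        sub("^ ", "", $2)   # drop the space after the first "|"
        sub(" $", "", $2)   # drop the space before the second "|"
        print $2 " = " $2 " <" $2 ">"
    }' | sort -u
}

# Demo with made-up log lines (no SVN server needed):
printf '%s\n' \
    '------------------------------------------------------------------------' \
    'r3 | jimb | 2014-06-02 10:15:00 -0700 (Mon, 02 Jun 2014)' \
    '------------------------------------------------------------------------' \
    'r2 | alices | 2014-06-01 09:00:00 -0700 (Sun, 01 Jun 2014)' \
    '------------------------------------------------------------------------' \
    'r1 | jimb | 2014-05-30 17:45:00 -0700 (Fri, 30 May 2014)' |
    svn_authors_template
```

The demo prints alices = alices <alices> and jimb = jimb <jimb>, one per line, since sort -u collapses the repeated jimb entry. Inside a real working copy, the command above amounts to svn log -q | svn_authors_template > authors.txt.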

The format is already the one git svn expects (svnuser = Full Name <email>), but the names and email addresses are just the SVN usernames repeated, so I manually edited authors.txt to look like this:

alices = Alice Swoo <alice.swoo@company.com>
jimb = Jim Bimbo <jim.bimbo@company.com>
susanq = Susan Queue <susan.queue@company.com>
bobo = Bob Obo <bob.obo@company.com>
stephen = Stephen Yep <stephen.yep@company.com>
katyp = Katy Perry <katy.perry@company.com>

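If some SVN revisions have no author (svn log shows them as (no author)), git svn stops with an error when it reaches them, because every author it encounters must be listed in authors.txt. You can fix that by adding this line to your authors.txt: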

(no author) = noauthor <noauthor@noauthor>
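Before handing the file to git svn, it can help to catch entries that still carry placeholder addresses. A sketch, where check_authors_file is a hypothetical helper: it checks for the svnuser = Full Name <email> shape used above, and requiring an @ in the address is this sketch's own heuristic for spotting leftover placeholders, not a git svn requirement.

```shell
# Hypothetical check: print (with line numbers) every non-blank line of an authors
# file that doesn't look like "svnuser = Full Name <someone@host>".
# Returns 0 if every entry looks fine, 1 if something was printed, 2 if unreadable.
check_authors_file() {
    if [ ! -r "$1" ]; then
        echo "check_authors_file: cannot read $1" >&2
        return 2
    fi
    # grep -v prints the lines matching neither alternative and exits 0 if it printed any.
    if grep -nvE '^[[:space:]]*$|^[^=]+ = [^<>]+ <[^<>@]+@[^<>]+>$' "$1"; then
        return 1
    fi
    return 0
}

# Usage:
#   check_authors_file authors.txt && echo "authors.txt looks complete"
```

Run against the untouched generated file, every line would be reported, because placeholders like alices = alices <alices> have no @ in the address.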

Configure your Git settings

Make sure Git knows about you. I ran

$ git config --list

and it returned nothing. Time to add myself:

$ git config --global user.name "Stephen Yep"
$ git config --global user.email "stephen.yep@company.com"

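I also told Git about the authors.txt file and confirmed the setting was there. Keep in mind that git config stores the path exactly as typed, so a relative path like this one only works if authors.txt can be found from wherever you later run git svn; a full path to the file is safer: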

$ git config --global svn.authorsfile authors.txt
$ git config --list
user.name=Stephen Yep
user.email=stephen.yep@company.com
svn.authorsfile=authors.txt
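A minimal POSIX shell sketch for recording authors.txt as an absolute path instead of a relative one. abs_path is a hypothetical helper, not part of Git or SVN; it avoids realpath because that utility isn't available everywhere.

```shell
# Hypothetical helper: print the absolute, symlink-resolved directory of an
# existing file followed by its name. The subshell body keeps `dir` local.
abs_path() (
    [ -e "$1" ] || { echo "abs_path: $1 does not exist" >&2; exit 1; }
    dir="$(cd "$(dirname "$1")" && pwd -P)" || exit 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
)

# Usage, from the directory that contains authors.txt:
#   git config --global svn.authorsfile "$(abs_path authors.txt)"
#   git config --global --get svn.authorsfile
```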

Try out that git svn command

Here's where I ran into trouble. I created a new folder where I was going to initialize the Git repository and fetch from the existing SVN for the first time.

$ git svn
git: 'svn' is not a git command. See 'git --help'.

Did you mean one of these?
    fsck
    mv
    show

(Note that on some versions of Git the hyphen is necessary, so the command may be git-svn.)

If I had been on Ubuntu or another Linux flavor that uses apt-get as its package manager, I could have just run sudo apt-get install git-svn. But I was on a RHEL 6.5 system. I tried all sorts of things. I upgraded my version of Git. I compiled Git from source. I upgraded my version of SVN. I banged my head on the desk. At one point I got something like this:

$ git svn
Can't locate SVN/Core.pm in @INC (@INC contains:      /usr/lib/perl5/site_perl/5.10.0
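That error means Perl could not load SVN::Core, part of the Subversion Perl bindings that git svn (itself a Perl script) is built on. A rough diagnostic sketch that reports which piece is missing; check_git_svn is a hypothetical name, and it assumes git-svn lives in Git's exec path (as in common Linux packages) and that the perl on PATH is the one git svn uses, which is not true of every installation.

```shell
# Hypothetical diagnostic: report whether git, the git-svn command, perl and the
# Subversion Perl bindings are all present. Subshell body keeps variables local.
check_git_svn() (
    if ! command -v git >/dev/null 2>&1; then
        echo "git: not found"
        exit 1
    fi
    exec_path="$(git --exec-path)"
    if [ ! -e "$exec_path/git-svn" ]; then
        echo "git svn: no git-svn in $exec_path (not installed)"
        exit 1
    fi
    if ! command -v perl >/dev/null 2>&1; then
        echo "git svn: installed, but perl is not on PATH"
        exit 1
    fi
    # "Can't locate SVN/Core.pm" means this module can't be loaded.
    if ! perl -MSVN::Core -e 1 >/dev/null 2>&1; then
        echo "git svn: installed, but perl cannot load SVN::Core (Subversion Perl bindings)"
        exit 1
    fi
    echo "git svn: OK"
)

# Usage:
#   check_git_svn
```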

Then I realized I could try this command out on a Windows machine instead of the RHEL 6.5 machine.

$ git --version
git version 1.9.5.msysgit.1
$ git svn
fatal: Not a git repository (or any of the parent directories): .git
Unable to find .git directory

Okay, great, the command is installed and available to use. Let's carry on then.

$ mkdir new-git-repo-from-svn
$ cd new-git-repo-from-svn
$ git svn init http://build.company.intra/svn/repo/web-project/project-parent --stdlayout

It then asked me for my SVN username and password. Note that --stdlayout tells git svn to look for trunk, branches and tags directories underneath the URL you give it, so the URL should point at the project root (the directory that contains trunk), not at trunk itself. git svn init only sets up the configuration, so you'll still end up with an empty directory afterward; follow it with git svn fetch to actually start pulling down the files from SVN and matching up the revisions and authors.

$ git svn fetch
(many hours later)

For a big project this is likely the longest step. Conveniently, if it gets interrupted you can run git svn fetch again and it picks up where it left off.
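Because a rerun resumes rather than starting over, a long import over a flaky connection can be wrapped in a simple retry loop. A sketch, where retry is a hypothetical helper and the default 30-second delay is an arbitrary choice:

```shell
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping RETRY_DELAY seconds (default 30) between attempts.
retry() {
    max_attempts="$1"
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        sleep "${RETRY_DELAY:-30}"
        attempt=$((attempt + 1))
        echo "retry: starting attempt $attempt of $max_attempts: $*" >&2
    done
    return 0
}

# Usage, inside new-git-repo-from-svn:
#   retry 10 git svn fetch
```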

Once it's all done you should see your familiar project structure in the new folder, only this time it's a Git repository that also tracks SVN. You won't see any .svn folders, though. I confirmed there were no Git remotes yet and only a master branch:

$ git remote -v
(nothing listed)
$ git branch
* master

Okay. Perfect. Since we're going to put this repository on our company's internal GitLab instance, I'll add the remote and push to it for the first time.

$ git remote add origin git@gitlab.company.intra:GroupName/our-super-rad-project.git
$ git push -u origin master

I checked our GitLab page and was happy to see our first push. It showed 3,017 commits, so all our SVN history came along with it.
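To double-check an import like this, one option is to compare how many revisions svn log reports for trunk with how many commits Git has on master. SVN revision numbers are repository-wide, so the latest revision number is usually higher than the number of revisions that touched this one project; counting log entries is the fairer comparison. A rough sketch, where count_svn_revisions is a hypothetical helper; the two counts can legitimately differ a little (for example, in how each tool follows copied or moved paths), so treat a large gap as something to investigate rather than proof of a problem.

```shell
# Hypothetical helper: count the log entries in `svn log -q` output read on stdin.
# Each entry starts with a line such as "r3017 | jimb | 2014-06-02 ...".
count_svn_revisions() {
    grep -c '^r[0-9][0-9]* |'
}

# Usage (the first from anywhere, the second inside the git svn repository):
#   svn log -q http://build.company.intra/svn/repo/web-project/project-parent/trunk | count_svn_revisions
#   git rev-list --count master
```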

Keeping Git in sync with SVN

Okay, so now that I have the Git repository set up, imagine a few days pass and the developers on my team continue to commit to our "old fashioned" SVN repository. How do I keep the Git repository up to date?

Well, in the same repository where I ran git svn init, a few more commands bring the Git repository up to date (git svn rebase fetches new SVN revisions by itself, so the separate git svn fetch is optional but harmless):

(something changed in SVN)
$ git svn fetch
$ git svn rebase
$ git push

Cool, now if I look at GitLab (where I just pushed the changes), I see the latest SVN commits there.
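If you'd rather not type these by hand every few days, the same three steps fit in a small shell function you could call from a script or a cron job. A sketch under assumptions: sync_svn_to_git and its DRY_RUN switch are hypothetical names, it expects to be pointed at the git svn repository, and it assumes nobody commits directly to the Git side, so the push is a fast-forward.

```shell
# Hypothetical sync step, mirroring the three commands above. Runs in a subshell
# so the `cd` doesn't leak; set DRY_RUN=1 to print the commands instead of running them.
sync_svn_to_git() (
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    }
    cd "${1:-.}" || exit 1
    run git svn fetch || exit 1    # optional: git svn rebase also fetches
    run git svn rebase || exit 1   # bring master up to the newest SVN revision
    run git push                   # send the new commits to the GitLab remote
)

# Usage (the path is a placeholder):
#   sync_svn_to_git /path/to/new-git-repo-from-svn
#   DRY_RUN=1 sync_svn_to_git /path/to/new-git-repo-from-svn
```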

Next steps

This concludes the gentle migration to Git from SVN. Our team will keep committing to SVN for now because other tools (like Maven, Nexus, Jenkins, and Bugzilla) still depend on it, but we now have an identical Git repository that we can start hooking those tools up to, allowing us to move to Git gradually.

A note on the --no-metadata parameter for git svn init

I came across many migration guides that suggested doing a git init like this:

$ git svn init http://build.company.intra/svn/repo/web-project/project-parent --stdlayout --no-metadata
$ git svn fetch

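The problem is that --no-metadata leaves out the git-svn-id: line that git svn normally appends to every commit message, and git svn relies on those lines to tie Git history back to SVN. The git svn documentation describes the option as suitable only for one-shot imports, since git svn can't fetch again without the metadata. It also means commands like these fail if you included the --no-metadata parameter: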

$ git svn log .
Unable to determine upstream SVN information from working tree history
$ git svn info .
Unable to determine upstream SVN information from working tree history

So I would recommend against that parameter. The SVN metadata lets you match up SVN revision numbers with Git commits, say for bug-tracking purposes. It lives in a one-line git-svn-id: entry at the end of each commit message (plus bookkeeping under .git/svn), not in .svn folders, so it won't litter your project, and I don't see any reason not to keep it.
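For one-off lookups, git svn find-rev already translates in either direction (for example, git svn find-rev r3017 prints the matching commit). If you want the whole mapping at once, say to load into a bug tracker, the git-svn-id: lines can be parsed directly. A sketch; svn_rev_map is a hypothetical helper, and the commit hashes and UUID in the test data are made up.

```shell
# Hypothetical helper: read `git log --format='commit %H%n%B'` output on stdin and
# print "rREVISION COMMIT" for every commit whose message has a git-svn-id: line.
# Such lines look like "git-svn-id: <svn url>@<revision> <repository uuid>".
svn_rev_map() {
    awk '
        /^commit [0-9a-f]+$/ { sha = $2; next }
        /^git-svn-id: / {
            rev = $2
            sub(/.*@/, "", rev)   # keep what follows the last "@": the revision
            print "r" rev " " sha
        }
    '
}

# Usage, inside the git svn repository:
#   git log --format='commit %H%n%B' | svn_rev_map
```

Commits created with --no-metadata have no git-svn-id: line, so they simply don't appear in the output.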

Resources