Using git annex

I remember being a young pup using Debian and reading the mailing lists. I was always seeing the name Joey Hess answering the tough questions. I had great respect for him then, and it has remained to this day. I have probably used his unclutter program for ten years and it has never crashed. I recently installed ikiwiki to run this blog, etckeeper to track changes in my config files, and I figured that I should try his other recent software: git-annex.

I came at it with two objectives: to be able to sync my podcasts and to be able to manage my media files better. The second has worked really well, and the first is hopefully getting there soon. I will concentrate on the second for the moment, since that is what I have working the best. There are multiple use-cases described on git-annex's website – I will just go through mine.

All of my media files are on a terabyte external drive. When I am at home, I have it mounted as an NFS drive on /media/mybook. I have videos there, split into tv and movies folders, as well as music in the music folder. All of the folders are treated the same, so I will just go through the procedure I used in the music folder.

First, I set up the annex:

~ $ cd /media/mybook/music
/media/mybook/music $ git init .
/media/mybook/music $ git annex init mybook-music
/media/mybook/music $ emacs .git/config

At this point, I set the backends keyword in the [annex] section to either SHA1E or SHA256E. There is a tradeoff between speed and safety: SHA1E is faster, while SHA256E makes it even less likely that two different files end up with the same hash. The E at the end of the backend name means that the file's extension is kept as part of the key under which git-annex stores the content. This is important for some mp3 players and other programs that cannot tell what type a file is without an extension. The next command takes a long time with a lot of files:

/media/mybook/music $ git annex add .
[All the files are hashed and their contents are put into .git/annex]
[You will see that all your files are now symlinks to files in .git/annex]
/media/mybook/music $ git commit -m "Added my files"
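The backend can also be set without opening .git/config in an editor. Here is a minimal sketch, assuming the key is annex.backends as in the [annex] section I edited by hand (newer git-annex releases, as far as I know, prefer the singular annex.backend and still honor the old name). The helper name set_annex_backend is made up for illustration.

```shell
# Hypothetical helper: record the preferred backend in a repository's
# .git/config without opening an editor.
# usage: set_annex_backend <repo-dir> <backend>
set_annex_backend() {
    repo=$1
    backend=$2
    # Writes "backends = <backend>" into the [annex] section of .git/config
    git -C "$repo" config annex.backends "$backend"
    # Print the stored value back as a sanity check
    git -C "$repo" config --get annex.backends
}
```

It would be called as set_annex_backend /media/mybook/music SHA256E, after git annex init and before git annex add, since the setting only affects files added afterwards.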

Now it is possible to clone this repository back on the laptop:

~ $ git clone /media/mybook/music
~ $ cd music
~/music $ git annex init laptop-music
~/music $ git remote add mybook-music /media/mybook/music
~/music $ cd /media/mybook/music
/media/mybook/music $ git remote add laptop-music ~/music

And now all of the cool stuff can begin. If you look inside the ~/music directory, it appears that all your files are there. However, they are not taking up any space: they are broken symlinks to objects in ~/music/.git/annex that have not been fetched yet. If I want to listen to some of the music, I copy it over to the laptop with the following command:

~/music $ git annex get --from mybook-music Beatles/

I can now take my laptop on the plane and listen to the Beatles. When I'm back from my trip, I can free up the space with:

~/music $ git annex drop Beatles/
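Since a file whose content is not present is just a broken symlink, plain find is enough to see what is currently missing from a checkout. This is a small sketch that does not involve git-annex at all; the function name list_missing is my own invention.

```shell
# List symlinks under <dir> whose target does not exist, i.e. annexed
# files whose content has not been fetched (or has been dropped).
# The .git directory itself is skipped.
# usage: list_missing <dir>
list_missing() {
    find "$1" -name .git -prune -o -type l ! -exec test -e {} \; -print
}
```

Running list_missing ~/music after the drop above should list the Beatles files again, because their symlinks point at objects that are no longer in ~/music/.git/annex.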

Suppose I obtain some new music, say I am getting into the Rolling Stones. I can use git annex add to add the Stones to my collection in one repository, git commit them, and then git pull in the other repository so that it learns about the new files. After that, I can use git annex copy or git annex move to put the file contents where I want them.
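Spelled out, that round trip might look roughly like the sketch below. It assumes the remote names laptop-music and mybook-music set up earlier, and that the branch is called master (newer versions of git may name it main, so it can be passed as a fourth argument). The function name add_album is hypothetical.

```shell
# Sketch of the add / commit / pull / move round trip.
# usage: add_album <laptop-repo> <drive-repo> <album-dir> [branch]
add_album() {
    laptop=$1
    drive=$2
    album=$3
    branch=${4:-master}   # assumption: default branch name

    # 1. On the laptop: hash the new files into the annex, commit the symlinks
    (cd "$laptop" && git annex add "$album" && git commit -m "Added $album") || return 1

    # 2. On the drive: pull the commit so the (still broken) symlinks appear there
    (cd "$drive" && git pull laptop-music "$branch") || return 1

    # 3. Back on the laptop: move the file contents over to the drive
    (cd "$laptop" && git annex move --to mybook-music "$album")
}
```

For my setup the call would be add_album ~/music /media/mybook/music "Rolling Stones". Using git annex copy instead of move in the last step would keep the contents on the laptop as well.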

For now, I have a queue of unseen videos and music that I want to hear on my laptop, while the bulk of my media collection sits on my external drive. I can take what I want with me and not worry about my whole collection using up all of my hard drive. I can see what I have in a particular repository with the git annex find command.
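As an illustration of git annex find, here is a sketch with two small wrappers. As I understand it, find with no options lists the files whose content is present in the current repository, and the --in and --not matching options narrow that down. The function names are made up.

```shell
# Files whose content is present in this repository.
# usage: annex_here <repo>
annex_here() {
    (cd "$1" && git annex find)
}

# Files whose content is on <remote> but not in this repository,
# i.e. what I could still fetch with "git annex get".
# usage: annex_missing_here <repo> <remote>
annex_missing_here() {
    (cd "$1" && git annex find --in="$2" --not --in=here)
}
```

On the laptop, annex_here ~/music would show my current queue, and annex_missing_here ~/music mybook-music would show everything that still lives only on the external drive, as far as the laptop's tracking information knows.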

As I said, there are many other uses for git-annex and features that I have not yet learned. One thing that I am interested in trying out is using the web as a repository.
