Using git annex
I remember being a young pup using Debian and reading the mailing lists. I was always seeing the name Joey Hess answering the tough questions. I had great respect for him then, and it has remained to this day. I have probably used his unclutter program for ten years and it has never crashed. I recently installed ikiwiki to run this blog and etckeeper to track changes in my config files, and I figured that I should try his other recent software: git-annex.
I came at it with two objectives: to be able to sync my podcasts and to be able to manage my media files better. The second has worked really well, and the first is hopefully getting there soon. I will concentrate on the second for the moment, since that is what I have working the best. There are multiple use-cases described on git-annex's website – I will just go through mine.
All of my media files are on a terabyte external drive. When I am at home, I have it mounted as an NFS drive on /media/mybook. I have videos there, split into tv and movies folders, as well as music in the music folder. All of the folders are treated the same, so I will just go through the procedure I used in the music folder.
First, I set up the annex:
~ $ cd /media/mybook/music
/media/mybook/music $ git init .
/media/mybook/music $ git annex init mybook-music
/media/mybook/music $ emacs .git/config
At this point, I set the backends keyword in the [annex] section to either SHA1E or SHA256E. There is a tradeoff there between speed and safety: it is a bit more likely that two different files hash to the same value with SHA1E, but it is faster than SHA256E. The E at the end of the backend name means that the filename extension is preserved in the key that git-annex generates for each file. This is important for some mp3 players and other programs that cannot tell what type a file is without an extension. The next command takes a long time with a lot of files:
/media/mybook/music $ git annex add .
[All the files are hashed and their contents are put into .git/annex]
[You will see that all your files are now symlinks to files in .git/annex]
/media/mybook/music $ git commit -m "Added my files"
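As a rough illustration of the backend choice described above, here is a minimal sketch that compares the digest sizes of SHA-1 and SHA-256 and then sets the backends key with git config instead of opening an editor. It works in a throwaway repository made with mktemp, so nothing real is touched; the variable names are only for illustration. The key name follows what I used above, and newer git-annex releases may document it as annex.backend instead, so check the man page for your version. In a real setup the key has to be set before running git annex add.

```shell
# Compare digest sizes: SHA-1 gives 160 bits (40 hex digits),
# SHA-256 gives 256 bits (64 hex digits).
sha1_hex=$(printf 'hello' | sha1sum | cut -d' ' -f1)
sha256_hex=$(printf 'hello' | sha256sum | cut -d' ' -f1)
echo "SHA-1 digest length:   ${#sha1_hex}"
echo "SHA-256 digest length: ${#sha256_hex}"

# Set the backend from the command line in a scratch repository.
# The key name "annex.backends" is taken from the text above and is an
# assumption about the installed git-annex version.
scratch=$(mktemp -d)
git init -q "$scratch"
git -C "$scratch" config annex.backends SHA256E

# Read the value back to confirm it landed in the [annex] section.
backend=$(git -C "$scratch" config --get annex.backends)
echo "annex.backends = $backend"

rm -rf "$scratch"
```

The longer digest is what makes an accidental collision less likely with SHA256E, at the cost of more hashing work per file.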
Now it is possible to clone this repository back on the laptop:
~ $ git clone /media/mybook/music
~ $ cd music
~/music $ git annex init laptop-music
~/music $ git remote add mybook-music /media/mybook/music
~/music $ cd /media/mybook/music
/media/mybook/music $ git remote add laptop-music ~/music
And now all of the cool stuff can begin. You can look inside the ~/music directory and see that it appears that all your files are there. However, they are not taking up any space. What is happening is that they are broken symlinks to objects in ~/music/.git/annex. If I want to listen to some of the music, I copy it over to the laptop with the following command:
~/music $ git annex get --from mybook-music Beatles/
I can now take my laptop on the plane and listen to the Beatles. When I'm back from my trip, I can free up the space by doing:
~/music $ git annex drop Beatles/
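The broken-symlink behaviour described above can be seen with plain shell tools. The sketch below does not use git-annex at all: it builds a throwaway directory with a made-up, much simpler layout than the real .git/annex object store, then checks which symlinks resolve. The directory and file names are hypothetical.

```shell
# Throwaway directory standing in for a clone. The layout is hypothetical
# and far simpler than git-annex's real object store.
demo=$(mktemp -d)
mkdir -p "$demo/objects"

# One link whose content is present, one whose content is missing.
echo "audio data" > "$demo/objects/present.mp3"
ln -s "objects/present.mp3" "$demo/here.mp3"
ln -s "objects/missing.mp3" "$demo/elsewhere.mp3"

# -L is true for any symlink; -e follows the link and is false when the
# target does not exist, which is the state a file is in after a drop.
present=""
missing=""
for f in "$demo"/*.mp3; do
    if [ -L "$f" ] && [ -e "$f" ]; then
        present="$present $(basename "$f")"
    elif [ -L "$f" ]; then
        missing="$missing $(basename "$f")"
    fi
done

echo "content present:$present"
echo "content missing:$missing"

rm -rf "$demo"
```

A dangling link costs almost no disk space, which is why a fresh clone can list the whole collection without holding any of it.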
Suppose I obtain some new music, getting into the Rolling Stones. Then I can use git annex add to add the Stones into my collection, git commit them, and git pull in the other repository to duplicate the information about the new files. I can then use git annex copy or git annex move to put the files where I want them.
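The steps above can be sketched as a small script. This is only an outline under some assumptions: the new album sits in ~/music on the laptop, the remotes are named as in the setup earlier, and the branch is called master. The run helper and the DRY_RUN variable are hypothetical conveniences so that the script prints the commands by default instead of touching a real repository.

```shell
# Print each command; execute it only when DRY_RUN is set to 0.
# (run and DRY_RUN are illustrative helpers, not part of git-annex.)
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "+ $*"   # informational only; quoting is not preserved here
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Assumed for illustration: repository locations, album directory,
# remote names from the setup above, and a branch called "master".
laptop="$HOME/music"
mybook="/media/mybook/music"
album="Rolling Stones"

add_album() {
    # On the laptop: annex the new files and record them in git.
    run git -C "$laptop" annex add "$album"
    run git -C "$laptop" commit -m "Added $album"
    # On the external drive: pull so the new symlinks show up there too.
    run git -C "$mybook" pull laptop-music master
    # Send the file contents to the drive; "move" instead of "copy"
    # would also free the space on the laptop.
    run git -C "$laptop" annex copy --to mybook-music "$album"
}

add_album
```

Running it as is only prints the four commands; setting DRY_RUN=0 would execute them for real.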
For now, I have a queue of unseen videos and music that I want to hear on my laptop, while the bulk of my media collection sits on my external drive. I can take what I want with me and not worry about my whole collection using up all of my hard drive. I can see what I have in a particular repository with the git annex find command.
As I said, there are many other uses for git-annex and features that I have not yet learned. One thing that I am interested in trying out is using the web as a repository.