Here's how to set up a local cache of annexed files that can be used
to avoid repeated downloads.

An example use case: Your CI system is operating on a git-annex repository,
so every time it runs it makes a fresh clone of the repository and uses
`git-annex get` to download a lot of data into it.

We'll create a cache repository, set it as a remote of the other git-annex
repositories, and configure git-annex to check the cache first before other
more expensive ways of retrieving content. The cache can be cleaned out
whenever you like with simple unix commands.

Some other nice properties: when used on a filesystem like BTRFS with COW
support, content from the cache can populate multiple other repositories
without using any additional disk space. And git-annex repositories that
are otherwise unrelated can share use of the cache if they happen to
contain a common file.

You'll need git-annex 6.20180802 or newer to follow these instructions.

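To check whether the installed git-annex meets that version requirement, a sketch along these lines may help. It assumes GNU `sort -V` is available and that `git annex version --raw` prints only the version string; `version_at_least` is a hypothetical helper name, not part of git-annex.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on GNU sort's -V (version sort), which is an assumption about your system.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=6.20180802
# Only query git-annex when it is actually installed.
if command -v git-annex >/dev/null 2>&1; then
    installed=$(git annex version --raw)
    if version_at_least "$installed" "$required"; then
        echo "git-annex $installed is new enough"
    else
        echo "git-annex $installed is older than $required" >&2
    fi
fi
```

Version strings with a build suffix may sort slightly differently, so treat the result as a hint rather than a guarantee.
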
## creating the cache

First let's create a new, empty git-annex repository. It will be put in
~/.annex-cache in the example, but for best results, put it in the same
filesystem as your other git-annex repositories.

    git init --bare ~/.annex-cache
    cd ~/.annex-cache
    git annex init
    git config annex.hardlink true
    git annex untrust here

The cache does not need to be a git annex repository; any kind of special
remote can be used as a cache too. But, using a git repository lets
annex.hardlink be used to make hard links between the cache and
repositories using it.

The cache is made untrusted, because its contents can be cleaned at any
time; other repositories should not trust it to retain content.

## making repositories use the cache

Now in each git-annex repository that you want to use the cache, add it as
a remote, and configure it as follows:

    cd my-repository
    git remote add cache ~/.annex-cache
    git config remote.cache.annex-speculate-present true
    git config remote.cache.annex-cost 10
    git config remote.cache.annex-pull false
    git config remote.cache.annex-push false
    git config remote.cache.fetch do-not-fetch-from-this-remote:

The annex-speculate-present setting is the essential part. It makes
git-annex know that the cache repository may contain the content of any
annexed file. So, when getting a file, git-annex will try the cache
repository first.

The low annex-cost makes git-annex try to get content from the cache remote
before any other remotes.

The annex-pull and annex-push settings prevent `git-annex sync` from
pulling and pushing to the remote, and the remote.cache.fetch setting
further prevents git commands from fetching from it or pushing to it. The
cache repository will remain an empty git repository (except for the
content of annexed files). This means that the same cache can be used with
multiple different git-annex repositories, without intermingling their git
data.

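When several repositories need the same settings, the commands above could be wrapped in a small function. This is only a sketch: `add_annex_cache` is a hypothetical name, the cache location defaults to `~/.annex-cache` as in the example, and it assumes a git new enough to support `git -C`.

```shell
# Hypothetical helper: apply the cache-remote settings from this page to one repository.
# Usage: add_annex_cache REPO_DIR [CACHE_DIR]
add_annex_cache() {
    repo=$1
    cache=${2:-$HOME/.annex-cache}
    git -C "$repo" remote add cache "$cache" &&
    git -C "$repo" config remote.cache.annex-speculate-present true &&
    git -C "$repo" config remote.cache.annex-cost 10 &&
    git -C "$repo" config remote.cache.annex-pull false &&
    git -C "$repo" config remote.cache.annex-push false &&
    # Overwrite the default fetch refspec so plain git never fetches from the cache.
    git -C "$repo" config remote.cache.fetch do-not-fetch-from-this-remote:
}
```

It might then be called once per repository, for example `add_annex_cache my-repository`. It stops at the first failing step, such as a remote named cache already existing.
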
## populating the cache

For the cache to be used, you need to get file contents into it somehow.
A simple way to do that is, in a git-annex repository that already
contains the content of files:

    git annex copy --to cache

You could run that anytime after you get content. There are also ways to
automate it, but getting some files into the cache manually is a good
enough start.

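One way to run that copy right after every download, for instance in a CI job, is a small wrapper like the sketch below. `get_and_cache` is a hypothetical name, and the sketch assumes the remote is called `cache` as configured earlier on this page.

```shell
# Hypothetical wrapper: get content, then copy whatever was fetched into the cache remote.
# Arguments (file names or options) are passed through to both commands unchanged.
get_and_cache() {
    git annex get "$@" &&
    git annex copy --to cache "$@"
}
```

For example, `get_and_cache data/` would get the files under a `data/` directory (a made-up path) and then copy them to the cache. The copy step is skipped if the get fails.
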
## cleaning the cache

You can safely remove content from the cache at any time to free up disk
space.

To remove everything:

    cd ~/.annex-cache
    git annex drop --force

To remove files that have not been requested from the cache for the past day:

    cd ~/.annex-cache
    git annex drop --force --not --accessedwithin=1d

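If the cleanup should only happen once the cache has grown past some size, the drop command could be guarded by a disk usage check. The following is a sketch under assumptions: `prune_cache_if_over` is a hypothetical name, the limit is given in KiB, and `du -sk` is used to measure the whole cache directory.

```shell
# Hypothetical helper: prune the cache only when it uses more than LIMIT_KIB of disk.
# Usage: prune_cache_if_over CACHE_DIR LIMIT_KIB
prune_cache_if_over() {
    cache=$1
    limit_kib=$2
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size.
    used_kib=$(du -sk "$cache" | cut -f1)
    if [ "$used_kib" -gt "$limit_kib" ]; then
        # Same command as above: drop content not requested within the past day.
        (cd "$cache" && git annex drop --force --not --accessedwithin=1d)
    fi
}
```

A call such as `prune_cache_if_over ~/.annex-cache 10485760` would prune only when the cache exceeds roughly 10 GiB. It could be run periodically, for example from cron.
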
## automatically populating the cache

The assistant can be used to automatically populate the cache with files
that git-annex downloads into a repository.

## more caches

The example above used a local cache on the same system. However, it's also
possible to have a cache repository shared among computers on a LAN.