Here's how to set up a local cache of annexed files, that can be used to avoid repeated downloads. An example use case: Your CI system is operating on a git-annex repository, so every time it runs it makes a fresh clone of the repository and uses `git-annex get` to download a lot of data into it. We'll create a cache repository, set it as a remote of the other git-annex repositories, and configure git-annex to check the cache first before other more expensive ways of retrieving content. The cache can be cleaned out whenever you like with simple unix commands. Some other nice properties -- When used on a system like BTRFS with COW support, content from the cache can populate multiple other repositories without using any additional disk space. And, git-annex repositories that are otherwise unrelated can share use of the cache if they happen to contain a common file. You'll need git-annex 6.20180802 or newer to follow these instructions. ## creating the cache First let's create a new, empty git-annex repository. It will be put in ~/.annex-cache in the example, but for best results, put it in the same filesystem as your other git-annex repositories. git init --bare ~/.annex-cache cd ~/.annex-cache git annex init git config annex.hardlink true git annex untrust here The cache does not need to be a git annex repository; any kind of special remote can be used as a cache too. But, using a git repository lets annex.hardlink be used to make hard links between the cache and repositories using it. The cache is made untrusted, because its contents can be cleaned at any time; other repositories should not trust it to retain content. ## making repositories use the cache Now in each git-annex repository that you want to use the cache, add it as a remote, and configure it as follows: cd my-repository git remote add cache ~/.annex-cache git config remote.cache.annex-speculate-present true git config remote.cache.annex-cost 10 git config remote.cache.annex-pull false git config remote.cache.annex-push false git config remote.cache.fetch do-not-fetch-from-this-remote: The annex-speculate-present setting is the essential part. It makes git-annex know that the cache repository may contain the content of any annexed file. So, when getting a file, git-annex will try the cache repository first. The low annex-cost makes git-annex try to get content from the cache remote before any other remotes. The annex-pull and annex-push settings prevent `git-annex sync` from pulling and pushing to the remote, and the remote.cache.fetch setting further prevents git commands from fetching from it or pushing to it. The cache repository will remain an empty git repository (except for the content of annexed files). This means that the same cache can be used with multiple different git-annex repositories, without intermingling their git data. ## populating the cache For the cache to be used, you need to get file contents into it somehow. A simple way to do that is, in a git-annex repository that already contains the content of files: git annex copy --to cache You could run that anytime after you get content. There are also ways to automate it, but getting some files into the cache manually is a good enough start. ## cleaning the cache You safely can remove content from the cache at any time to free up disk space. To remove everything: cd ~/.annex-cache git annex drop --force To remove files that have not been requested from the cache for the past day: cd ~/.annex-cache git annex drop --force --not --accessedwithin=1d ## automatically populating the cache The assistant can be used to automatically populate the cache with files that git-annex downloads into a repository. ## more caches The example above used a local cache on the same system. However, it's also possible to have a cache repository shared amoung computers on a LAN.