Speed up syncing of modified versions of existing files.

One simple way is to find the key of the old version of a file that's
being transferred, so it can be used as the basis for rsync, or any
other similar transfer protocol.
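
A minimal sketch of that idea, in Haskell, with hypothetical paths and
names throughout: seed the transfer temp file with the old version's object
(if it is present locally), then let rsync's delta algorithm send only the
changed parts.

    import Control.Monad (when)
    import System.Directory (copyFile, doesFileExist)
    import System.Process (callProcess)

    downloadWithBasis
      :: FilePath   -- path to the old key's object, used as the basis
      -> String     -- rsync source, eg "remote:.../KEY" (illustrative)
      -> FilePath   -- temp file the new version is downloaded to
      -> IO ()
    downloadWithBasis oldObject remoteSrc tmpFile = do
      haveOld <- doesFileExist oldObject
      when haveOld $ copyFile oldObject tmpFile
      -- --inplace makes rsync reuse the seeded data as the delta basis.
      callProcess "rsync" ["--inplace", "--partial", remoteSrc, tmpFile]
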
For remotes that don't use rsync, use a rolling checksum based chunker,
such as BuzHash. This will produce [[chunks]], which can be stored on the
remote as regular Keys -- where unlike the fixed size chunk keys, the
SHA256 part of these keys is the checksum of the chunk they contain.
Once that's done, it's easy to avoid uploading chunks that have been sent
to the remote before.
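
As an illustration only, here is a content-defined chunker in the BuzHash
style, in Haskell. The window size, substitution table, and boundary mask
are placeholder choices, not proposed parameters; a real BuzHash table would
be 256 fixed random 32-bit values.

    import qualified Data.ByteString as B
    import Data.Bits (xor, (.&.), rotateL)
    import Data.Word (Word8, Word32)

    -- Placeholder substitution table.
    table :: Word8 -> Word32
    table b = (fromIntegral b * 2654435761) `xor` 0x9e3779b9

    windowSize :: Int
    windowSize = 48

    -- Cut a chunk when the low 13 bits of the rolling hash are all ones,
    -- giving roughly 8 KiB chunks on average.
    boundaryMask :: Word32
    boundaryMask = 0x1fff

    -- Split content into content-defined chunks.
    chunk :: B.ByteString -> [B.ByteString]
    chunk bs
      | B.null bs = []
      | otherwise =
          let n = findBoundary bs
              (c, rest) = B.splitAt n bs
          in c : chunk rest

    -- Length of the first chunk: scan until the rolling hash over the
    -- last windowSize bytes hits the boundary mask, or input runs out.
    findBoundary :: B.ByteString -> Int
    findBoundary bs = go 0 0
      where
        len = B.length bs
        go h i
          | i >= len = len
          | i >= windowSize && (h' .&. boundaryMask) == boundaryMask = i + 1
          | otherwise = go h' (i + 1)
          where
            entering = table (B.index bs i)
            leaving
              | i >= windowSize = rotateL (table (B.index bs (i - windowSize))) windowSize
              | otherwise = 0
            h' = (rotateL h 1 `xor` leaving) `xor` entering

Each chunk would then be hashed (eg with SHA256) to form its key, so chunks
already present on the remote can be skipped.
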
When retrieving a new version of a file, there would need to be a way to get
the list of chunk keys that constitute the new version. Probably best to
store this list on the remote. Then there needs to be a way to find which
of those chunks are available in locally present files, so that the locally
available chunks can be extracted, and combined with the chunks that need
to be downloaded, to reconstitute the file.
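
A sketch of the retrieval step, in Haskell, with the local chunk index and
the download action left abstract (both are assumptions, not existing
interfaces):

    import qualified Data.ByteString as B
    import qualified Data.Map.Strict as M

    -- A chunk key, eg the SHA256-based key described above (format illustrative).
    type ChunkKey = String

    -- Reassemble a file from its chunk-key list: take each chunk from the
    -- local index when possible, otherwise download it from the remote.
    reconstitute
      :: M.Map ChunkKey B.ByteString    -- chunks extracted from locally present files
      -> (ChunkKey -> IO B.ByteString)  -- downloads one chunk from the remote
      -> [ChunkKey]                     -- chunk list for the new version, stored on the remote
      -> FilePath                       -- where to write the reconstituted file
      -> IO ()
    reconstitute localChunks download keys dest = do
      parts <- mapM fetch keys
      B.writeFile dest (B.concat parts)
      where
        fetch k = case M.lookup k localChunks of
          Just c  -> return c
          Nothing -> download k
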
To find which chunks are locally available, here are 2 ideas:

1. Use a single basis file, eg an old version of the file. Re-chunk it, and
   use its chunks. Slow, but simple. (Sketched below.)
2. Some kind of database of locally available chunks. Would need to be kept
   up-to-date as files are added, and as files are downloaded.
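
A sketch of idea 1, in Haskell, assuming the cryptonite package for SHA256
and taking the rolling-checksum chunker as a parameter; the resulting map is
what the retrieval sketch above consumes.

    import qualified Data.ByteString as B
    import qualified Data.Map.Strict as M
    import Crypto.Hash (SHA256 (..), hashWith)

    -- Re-chunk a single basis file (eg the old version of the file) with the
    -- same chunker used for upload, and index its chunks by checksum. The map
    -- keys here are bare SHA256 hex strings, standing in for real chunk keys.
    indexBasisFile
      :: (B.ByteString -> [B.ByteString])  -- the rolling-checksum chunker
      -> FilePath                          -- locally present basis file
      -> IO (M.Map String B.ByteString)
    indexBasisFile chunker path = do
      content <- B.readFile path
      return $ M.fromList
        [ (show (hashWith SHA256 c), c) | c <- chunker content ]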