implement updating the ContentIdentifier db with info from the git-annex branch

untested

This won't be super slow, but it does need to diff two likely large
trees, and since the git-annex branch rarely sits still, it will most
likely be run at the beginning of every import.
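
The todo item removed by this commit describes the update as: stash the ref of the git-annex branch last updated from, diff that ref against the current branch, read each changed .cid file, store it, and update the stashed ref. A minimal sketch of that loop follows, in Python rather than git-annex's Haskell. Everything in it is an assumption for illustration: trees are modelled as plain dicts, `ContentIdentifierDb`, `diff_trees` and `update_from_branch` are hypothetical names, and the one-identifier-per-line .cid format is invented.

```python
from typing import Dict, List, Optional, Set, Tuple

# Stand-in for a git tree: path -> file content (assumption, not git-annex's model).
Tree = Dict[str, str]


class ContentIdentifierDb:
    """Toy stand-in for the ContentIdentifier database (hypothetical)."""

    def __init__(self) -> None:
        # Ref of the git-annex branch the database was last updated from.
        self.last_ref: Optional[str] = None
        # Stored entries: (path of the .cid file, identifier line).
        self.cids: Set[Tuple[str, str]] = set()


def diff_trees(old: Tree, new: Tree) -> List[str]:
    """Paths that are new or whose content changed between two trees."""
    return sorted(p for p, content in new.items() if old.get(p) != content)


def update_from_branch(db: ContentIdentifierDb, refs: Dict[str, Tree],
                       current_ref: str) -> int:
    """Update db from the branch at current_ref; return the number of .cid files read."""
    if db.last_ref == current_ref:
        return 0  # the branch has not moved since the last update
    old_tree = refs.get(db.last_ref, {}) if db.last_ref else {}
    new_tree = refs[current_ref]
    files_read = 0
    for path in diff_trees(old_tree, new_tree):
        if not path.endswith(".cid"):
            continue  # only .cid files carry content identifiers
        files_read += 1
        for line in new_tree[path].splitlines():
            if line.strip():
                db.cids.add((path, line.strip()))
    db.last_ref = current_ref  # update the stashed ref
    return files_read
```

Even in this toy form, the cost is dominated by the diff over both trees, which is why the message expects the step to run at the start of nearly every import.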

A possible speed improvement would be to only run this when the database
did not contain a ContentIdentifier. But that would only speed up
imports when no file on the special remote has a new version, so that
at most renames of existing files are being imported.
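
A rough sketch of that check, in Python for illustration only. `needs_update` is a hypothetical name, and content identifiers are modelled as plain strings, which is an assumption.

```python
from typing import Iterable, Set


def needs_update(known_cids: Set[str], listed_cids: Iterable[str]) -> bool:
    """True if the remote listed any content identifier the database lacks."""
    return any(cid not in known_cids for cid in listed_cids)
```

Any new version of a file yields an identifier the database has not seen, so the check returns True and the diff runs anyway. Only a listing made up entirely of known identifiers, such as one containing only renames, skips it.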

A better speed improvement would be to record something in the git-annex
branch that indicates when an import has been run, and only do the diff
if the git-annex branch has a record of a newer import than we've seen
before. Then, it would only run when there is in fact new
ContentIdentifier information available from a remote. Certainly doable,
but I didn't want to complicate things yet.
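
One way the import-record idea could look, as a hedged Python sketch. The message only says "something" would be recorded in the branch. Modelling that as an increasing integer marker is an assumption, and `ImportState` and `maybe_update` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImportState:
    # Newest import marker this clone has already processed (assumed integer).
    last_seen_import: int = 0


def maybe_update(state: ImportState, branch_import_marker: int,
                 run_diff: Callable[[], None]) -> bool:
    """Run the expensive tree diff only if the branch records a newer import."""
    if branch_import_marker <= state.last_seen_import:
        return False  # no new ContentIdentifier information to pick up
    run_diff()
    state.last_seen_import = branch_import_marker
    return True
```

With this gate, ordinary git-annex branch churn that does not involve an import would no longer trigger the diff.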
Joey Hess 2019-03-06 18:04:30 -04:00
parent 12e4906657
commit ee251b2e2e
4 changed files with 72 additions and 24 deletions

@@ -17,19 +17,9 @@ this.
* Need to support annex-tracking-branch configuration, which documentation
says makes git-annex sync and assistant do imports.
* Database.ContentIdentifier needs a way to update the database with
information coming from the git-annex branch. This will allow multiple
clones to import from the same remote, and share content identifier
information among them.
It will only need to be updated when listContents returns a
ContentIdentifier that is not already known in the database.
How to do the update: Stash the ref of the last git-annex branch it's
updated from in the database. Diff between that ref and the current
git-annex branch. For each file in the diff that's a .cid file, read
the file from the branch, and store into the database.
Update the stashed ref.
* Test behavior when multiple repos import from same special remote;
the second importer should not re-download as long as it has pulled
from the first importer.
* When on an adjusted unlocked branch, need to import the files unlocked.
Also, the tracking branch code needs to know about such branches,