Works, more or less. --dead is not implemented, and so far a new branch
is made, but keys no longer present anywhere are not scrubbed.
git annex sync fails to push the synced/git-annex branch after a forget,
because it's not a fast-forward of the existing synced branch. Could be
fixed by making git-annex sync use assistant-style sync branches.
The reversion was that, if a file was git rm'd, but still in branches, it
would not be seen as used. Looking at both the added and the removed (or
changed) files from the diff-index is a cheap way to fix that.
Instead of populating the second-level Bloom filter with every key
referenced in every Git reference, consider only those which differ
from what's referenced in the index.
Incidentaly, unlike with its old behavior, staged
modifications/deletion/... will now be detected by 'unused'.
Credits to joeyh for the algorithm. :-)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.
web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.
This commit was sponsored by Mark Hepburn. Thanks!
This is a simple approach for setting up a mirroring repository.
It will work with any type of remotes.
Mirror --from is more expensive than mirror --to in general.
OTOH, mirror --from will get the file from any remote that has it, not only
the named mirror remote. And if the named mirror remote is not the fastest
available remote with a file, that can speed things up.
It would be possible to make the assistant or watch command do a more
dynamic mirroring, that didn't need to scan every time.
Note that --deduplicate currently checksums each file twice,
once to see if it's a known key, and once when importing it.
Perhaps this could be revisited and the extra checksum gotten rid of,
at the cost of not locking down the file when adding it.
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..
Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.
Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.
A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.
Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.
Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.
There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.
This commit was sponsored by Alexander Brem. Thanks!
When there's no extension, don't use "none", but "".
When there is an extension, it starts with a dot, so don't put a redundant
dot in the default format.
This was the last place in git-annex that could remove data referred to by
the git history, without being forced.
Like drop, dropunused checks remotes, and honors the global annex.numcopies
setting. (However, .gitattributes settings cannot apply to unused files.)
In direct mode, it's best to whenever possible not move direct mode files
out of the way, and so I made unannex avoid touching the direct mode file at
all.
That actually turns out to be easy, because in direct mode, unlike indirect
mode, the pre-commit hook won't get confused if the unannexed file later
gets added back by git add. So there's no need to commit the unannex right
away; it can be staged for the user to commit later. This also means that
unannex in direct mode is a lot faster than in indirect mode!
Another subtle bit is the bookkeeping that is done when unannexing a direct
mode file. The inode cache needs to be removed so that when uninit runs
getKeysPresent, it doesn't see the cache and think the key is still
present and crash when it's not.
This commit is sponsored by Douglas Butts. Thanks!
A common failure mode for direct mode has been for files to end up still
stored in indirect mode. While I hope that doesn't happen anymore, fsck
should deal with it.