The annex object for it may have been modified due to hard link, and
that should be cleaned up when the new version is added. If another
associated file has the old key's content, that's linked into the annex
object. Otherwise, update location log to reflect that content has been
lost.
When a v6 unlocked files is removed from the work tree,
unused doesn't show it. When it gets removed from the index,
unused does show it. This is the same as a locked file.
This only adds 1 stat to each file fscked for locked files, so
added overhead is minimal.
For unlocked files it has to access the database to see if a file
is modified.
If multiple files point to the same annex object, the user may want to
modify them independently, so don't use a hard link.
Also, check diskreserve when copying.
Note that the implementation uses replaceFile, so that the actual
replacement of the work tree file is atomic. This seems a good property to
have!
It would be possible for unlock in v6 mode to be run on files that do not
have their content present. However, that would be a behavior change from
before, and I don't see any immediate need to support it, so I didn't
implement it.
This avoids querying the database when the content file doen't exist
(or otherwise fails the provided check). However, it does add overhead of
querying the database, and will certianly impact performance.
The Keys database can hold multiple inode caches for a given key. One for
the annex object, and one for each pointer file, which may not be hard
linked to it.
Inode caches for a key are recorded when its content is added to the annex,
but only if it has known pointer files. This is to avoid the overhead of
maintaining the database when not needed.
When the smudge filter outputs a file's content, the inode cache is not
updated, because git's smudge interface doesn't let us write the file. So,
dropping will fall back to doing an expensive verification then. Ideally,
git's interface would be improved, and then the inode cache could be
updated then too.
This removes ambiguity, because while someone might have "WORM--foo" in a
file that's not intended to be a git-annex pointer file,
"annex/objects/WORM--foo" is less likely.
Also, 664cc987e8 had a caveat about symlink
targets being parsed as pointer files, and now the same parser is used for
both.
I did not include any hash directories before the key in the pointer file,
as they're not needed. However, if they were included, the parser would
still work ok.
Note that this changes the default behavior of git add in a newly
initialized repository; it will add files to the annex.
Don't like that this could break workflows, but it's necessary in order for
any pointer files in the repo to be handled by git-annex.
Since all places where a repo is used in direct mode need to have git-annex
upgraded before the repo can safely be converted to v6, the upgrade needs
to be manual for now.
I suppose that at some point I'll want to drop all the direct mode support
code. At that point, will stop supporting v5, and will need to auto-upgrade
any remaining v5 repos. If possible, I'd like to carry the direct mode
support for say, a year or so, to give people plenty of time to upgrade and
avoid disruption.
importfeed just calls addurl functions, so inherits this from it.
Note that addurl still generates a temp file, and uses that key to download
the file. It just adds it to the work tree at the end when the file is small.
Implemented with no additional overhead of compares etc.
This is safe to do for presence logs because of their locality of change;
a given repo's presence logs are only ever changed in that repo, or in a
repo that has just been actively changing the content of that repo.
So, we don't need to worry about a split-brain situation where there'd
be disagreement about the location of a key in a repo. And so, it's ok to
not update the timestamp when that's the only change that would be made
due to logging presence info.
* When annex objects are received into git repositories, their checksums are
verified then too.
* To get the old, faster, behavior of not verifying checksums, set
annex.verify=false, or remote.<name>.annex-verify=false.
* setkey, rekey: These commands also now verify that the provided file
matches the key, unless annex.verify=false.
* reinject: Already verified content; this can now be disabled by
setting annex.verify=false.
recvkey and reinject already did verification, so removed now duplicate
code from them. fsck still does its own verification, which is ok since it
does not use getViaTmp, so verification doesn't happen twice when using fsck
--from.
Seems easy, but git ls-files can't list the right subset of files.
So, I wrote a whole new parser for git status output, and converted the
status command to use that.
There are a few other small behavior changes. The order changed. Unlocked
files show as T. In indirect mode, deleted files were not shown before, and
that's fixed. Regular files checked directly into git and modified
were not shown before, and are now.