The annex object for it may have been modified due to hard link, and
that should be cleaned up when the new version is added. If another
associated file has the old key's content, that's linked into the annex
object. Otherwise, update location log to reflect that content has been
lost.
1. git add file
2. git commit
3. modify file
4. git commit
5. git reset HEAD^
Before this fix, that resulted in git saying the file was modified. And
indeed, it didn't have the content it should in the just checked out ref,
because step 3 modified the object file for the old key.
This only adds 1 stat to each file fscked for locked files, so
added overhead is minimal.
For unlocked files it has to access the database to see if a file
is modified.
If multiple files point to the same annex object, the user may want to
modify them independently, so don't use a hard link.
Also, check diskreserve when copying.
Note that the implementation uses replaceFile, so that the actual
replacement of the work tree file is atomic. This seems a good property to
have!
It would be possible for unlock in v6 mode to be run on files that do not
have their content present. However, that would be a behavior change from
before, and I don't see any immediate need to support it, so I didn't
implement it.
Before the smudge filter added a trailing newline, but other things that
wrote formatPointer to a file did not.
also some new pointer staging code to use later
The Keys database can hold multiple inode caches for a given key. One for
the annex object, and one for each pointer file, which may not be hard
linked to it.
Inode caches for a key are recorded when its content is added to the annex,
but only if it has known pointer files. This is to avoid the overhead of
maintaining the database when not needed.
When the smudge filter outputs a file's content, the inode cache is not
updated, because git's smudge interface doesn't let us write the file. So,
dropping will fall back to doing an expensive verification then. Ideally,
git's interface would be improved, and then the inode cache could be
updated then too.
Renamed the db to keys, since it is various info about a Keys.
Dropping a key will update its pointer files, as long as their content can
be verified to be unmodified. This falls back to checksum verification, but
I want it to use an InodeCache of the key, for speed. But, I have not made
anything populate that cache yet.
This removes ambiguity, because while someone might have "WORM--foo" in a
file that's not intended to be a git-annex pointer file,
"annex/objects/WORM--foo" is less likely.
Also, 664cc987e8 had a caveat about symlink
targets being parsed as pointer files, and now the same parser is used for
both.
I did not include any hash directories before the key in the pointer file,
as they're not needed. However, if they were included, the parser would
still work ok.
Backend.lookupFile is changed to always fall back to catKey when
operating on a file that's not a symlink.
catKey is changed to understand pointer files, as well as annex symlinks.
Before, catKey needed a file mode witness, to be sure it was looking at a
symlink. That was complicated stuff. Now, it doesn't actually care if a
file in git is a symlink or not; in either case asking git for the content
of the file will get the pointer to the key.
This does mean that git-annex will treat a link
foo -> WORM--bar as a git-annex file, and also treats
a regular file containing annex/objects/WORM--bar as a git-annex file.
Calling catKey could make git-annex commands need to do more work than
before. This would especially be the case if a repo contained many regular
files, and only a few annexed files, as now git-annex will need to ask
git about the contents of the regular files.
Since all places where a repo is used in direct mode need to have git-annex
upgraded before the repo can safely be converted to v6, the upgrade needs
to be manual for now.
I suppose that at some point I'll want to drop all the direct mode support
code. At that point, will stop supporting v5, and will need to auto-upgrade
any remaining v5 repos. If possible, I'd like to carry the direct mode
support for say, a year or so, to give people plenty of time to upgrade and
avoid disruption.
The git filter config can be used to map the single git-annex command to
the 2 actions, and this avoids "git annex clean" being used for this thing,
it might have a better use for that name later.
importfeed just calls addurl functions, so inherits this from it.
Note that addurl still generates a temp file, and uses that key to download
the file. It just adds it to the work tree at the end when the file is small.
Commands that want to use it have to run their seek action inside
allowConcurrentOutput. Which seems reasonable; perhaps some future command
will want to support the -J flag but not use regions.
The region state moved from Annex to MessageState. This makes sense
organizationally, and note that some uses of onLocal use a different Annex
state, but pass the MessageState into it, which is what is needed.
sideAction is for things not generally related to the current action being
performed. And, it adds a newline after the side action. This was not the
right thing to use for stuff like "checksum", where doing a checksum is
part of the git annex get process, and indeed we want it to display
"(checksum...) ok"
There should be no behavior changes in this commit, it just adds a more
expressive data type and adjusts code that had been passing around a [UUID]
or sometimes a Maybe Remote to instead use [VerifiedCopy].
Although, since some functions were taking two different [UUID] lists,
there's some potential for me to have gotten it horribly wrong.
Also, rename lockContent to lockContentExclusive
inAnnexSafe should perhaps be eliminated, and instead use
`lockContentShared inAnnex`. However, I'm waiting on that, as there are
only 2 call sites for inAnnexSafe and it's fiddly.
In c6632ee5c8, it actually only handled
uploading objects to a shared repository. To avoid verification when
downloading objects from a shared repository, was a lot harder.
On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.
As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.
It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
* When annex objects are received into git repositories, their checksums are
verified then too.
* To get the old, faster, behavior of not verifying checksums, set
annex.verify=false, or remote.<name>.annex-verify=false.
* setkey, rekey: These commands also now verify that the provided file
matches the key, unless annex.verify=false.
* reinject: Already verified content; this can now be disabled by
setting annex.verify=false.
recvkey and reinject already did verification, so removed now duplicate
code from them. fsck still does its own verification, which is ok since it
does not use getViaTmp, so verification doesn't happen twice when using fsck
--from.
Seems easy, but git ls-files can't list the right subset of files.
So, I wrote a whole new parser for git status output, and converted the
status command to use that.
There are a few other small behavior changes. The order changed. Unlocked
files show as T. In indirect mode, deleted files were not shown before, and
that's fixed. Regular files checked directly into git and modified
were not shown before, and are now.
Fix typo in commit 160d4b9 ("convert Unused, and remove some dead code
for old style option parsing", 2015-07-10), the "git-annex unused
--used-refspec" option was incorrectly changed to --unused-refspec.
Ben Boeckel had a patch, but..
Actually, that was not the only place that used ScheduleIncremental when
built w/o database. Since the data type doesn't need database stuff,
I've instead fixed this build problem by exposing the
ScheduleIncremental constructor to database-less builds.
Note that I had one in Annex.Action.startup too, but it resulted in a weird
message printed by ssh, "channel 2: bad ext data". I don't know why, but
it only happened when transferinfo was run, so I wonder
if 983a95f021 introduced a fragility somehow.
This was potentially a hole in the readonly mode armor even before my last
commit. If the user could push a git-annex branch to a repo, they could get
git-annex-shell to initialize the repo. After my last commit, the user
didn't even need to be allowed to push a branch to init the repo, so
this hole certianly needs to be closed now.
Now it suffices to run git remote add, followed by git-annex sync. Now the
remote is automatically initialized for use by git-annex, where before the
git-annex branch had to manually be pushed before using git-annex sync.
Note that this involved changes to git-annex-shell, so if the remote is
using an old version, the manual push is still needed.
Implementation required git-annex-shell be changed, so configlist can
autoinit a repository even when no git-annex branch has been pushed yet.
Unfortunate because we'll have to wait for it to get deployed to servers
before being able to rely on this change in the documentation.
Did consider making git-annex sync push the git-annex branch to repos that
didn't have a uuid, but this seemed difficult to do without complicating it
in messy ways.
It would be cleaner to split a command out from configlist to handle
the initialization. But this is difficult without sacrificing backwards
compatability, for users of old git-annex versions which would not use the
new command.
Git.Ref.headSha doesn't really work in direct mode as there's not a head,
so it was actually diffing against the empty tree and so not removing any
deleted files. Get the sha of the current branch instead, which is the same
thing Command.Sync does.
* proxy: Fix proxy git commit of non-annexed files in direct mode.
* proxy: If a non-proxied git command, such as git revert
would normally fail because of unstaged files in the work tree,
make the proxied command fail the same way.
* Perform a clean shutdown when --time-limit is reached.
This includes running queued git commands, and cleanup actions normally
run when a command is finished.
* fsck: Commit incremental fsck database when --time-limit is reached.
Previously, some of the last files fscked did not make it into the
database when using --time-limit.
Note that this changes Annex.addCleanup hooks, to run after --time-limit
expires. Fsck was using such a hook to clean up after a
--incremental-schedule, and that shouldn't run when --time-limit exipires
it. So, instead, moved that cleanup code to be run by cleanupIncremental.
Resulted in some data type juggling.
I've seen rss feeds that have no permalinks, only guids (which are
sometimes in the form of permalinks, argh/sigh).
I had previously avoided trusting guids to be globally unique, because my
survey of rss feeds that I subscribe to shows a lot of pretty bad
"guids" like "2 at http://serialpodcast.org" or even worse "oth20150401-hq".
Worry was that two podcasts that are generating guids so badly, that
there's no guarantee they're actually globally unique.
But, I'm seeing too many url changes that result in redundant files, so
let's try this. If feeds are so broken that guids overlap, they could just
as well incorrectly call them permalinks too.