Using fail here causes a "user error" exception to be thrown, which implies
the user is at fault in its wording, which is incorrect.
Also audited for other uses of fail in git-annex; the others are in monadic
contexts where fail may not throw an exception, and involve user input, so
kept them as-is.
The repo path is typically relative, not absolute, so
providing it to absPathFrom doesn't yield an absolute path.
This is not a bug, just unclear documentation.
Indeed, there seem to be no reason to simplifyPath here, which absPathFrom
does, so instead just combine the repo path and the TopFilePath.
Also, removed an export of the TopFilePath constructor; asTopFilePath
is provided to construct one as-is.
Several tricky parts:
* When the conflict is just between the same key being locked and unlocked,
the unlocked version wins, and the file is not renamed in this case.
* Need to update associated file map when conflict resolution renames
an unlocked file.
* git merge runs the smudge filter on the conflicting file, and actually
overwrites the file with the same content it had before, and so
invalidates its inode cache. This makes it difficult to know when it's
safe to remove such files as conflict cruft, without going so far as to
compare their entire contents.
Dealt with this by preventing the smudge filter from populating the file
when a merge is run. However, that also prevents the smudge filter being
run for non-conflicting files, so eg moving a file won't put its new
content into place.
* Ideally, if a merge or a merge conflict resolution renames an unlocked
file, the file in the work tree can just be moved, rather than copying
the content to a new worktree file.
This is attempted to be done in merge conflict resolution, but
due to git merge's behavior of running smudge filters, what actually
seems to happen is the old worktree file with the content is deleted and
rewritten as a pointer file, so doesn't get reused.
So, this is probably not as efficient as it optimally could be.
If that becomes a problem, could look into running the merge in a separate
worktree and updating the real worktree more efficiently, similarly to the
direct mode merge. However, the direct mode merge had a lot of bugs, and
I'd rather not use that more error-prone method unless really needed.
Writes are optimised by queueing up multiple writes when possible.
The queue is flushed after the Annex monad action finishes. That makes it
happen on program termination, and also whenever a nested Annex monad action
finishes.
Reads are optimised by checking once (per AnnexState) if the database
exists. If the database doesn't exist yet, all reads return mempty.
Reads also cause queued writes to be flushed, so reads will always be
consistent with writes (as long as they're made inside the same Annex monad).
A future optimisation path would be to determine when that's not necessary,
which is probably most of the time, and avoid flushing unncessarily.
Design notes for this commit:
- separate reads from writes
- reuse a handle which is left open until program
exit or until the MVar goes out of scope (and autoclosed then)
- writes are queued
- queue is flushed periodically
- immediate queue flush before any read
- auto-flush queue when database handle is garbage collected
- flush queue on exit from Annex monad
(Note that this may happen repeatedly for a single database connection;
or a connection may be reused for multiple Annex monad actions,
possibly even concurrent ones.)
- if database does not exist (or is empty) the handle
is not opened by reads; reads instead return empty results
- writes open the handle if it was not open previously
Caused by AMP.. Since I've finally upgraded my dev laptop to 7.10,
I may start missing imports that are not needed with it but are with older
versions..
http://bugs.debian.org/807341
* Fix insecure temporary permissions when git-annex repair is used in
in a corrupted git repository.
Other calls to withTmpDir didn't leak any potentially private data,
but repair clones the git repository to a temp directory which is made
using the user's umask. Thus, it might expose a git repo that is
otherwise locked down.
* Fix potential denial of service attack when creating temp dirs.
Since withTmpDir used easily predictable temporary directory names,
an attacker could create foo.0, foo.1, etc and as long as it managed to
keep ahead of it, could prevent it from ever returning.
I'd rate this as a low utility DOS attack. Most attackers in a position
to do this could just fill up the disk /tmp is on to prevent anything
from writing temp files. And few parts of git-annex use withTmpDir
anyway, so DOS potential is quite low.
Examined all callers of withTmpDir and satisfied myself that
switching to mkdtmp and so getting a mode 700 temp dir wouldn't break any
of them.
Note that withTmpDirIn continues to not force temp dir to 700.
But it's only used for temp directories inside .git/annex/wherever/
so that is not a problem.
Also re-audited all other uses of temp files and dirs in git-annex.
The Keys database can hold multiple inode caches for a given key. One for
the annex object, and one for each pointer file, which may not be hard
linked to it.
Inode caches for a key are recorded when its content is added to the annex,
but only if it has known pointer files. This is to avoid the overhead of
maintaining the database when not needed.
When the smudge filter outputs a file's content, the inode cache is not
updated, because git's smudge interface doesn't let us write the file. So,
dropping will fall back to doing an expensive verification then. Ideally,
git's interface would be improved, and then the inode cache could be
updated then too.
Had everything available, just didn't combine the progress meter with the
other places progress is sent to update it. (And to a remote repo already
did show progress.)
Most special remotes should already display progress meters with -J,
same as without it. One exception to this is the web, since it relies on
wget/curl progress display without -J. Still todo..
There's a potential race, but it's detected and just results in the other
process failing to take the side lock, so possibly retrying one second
later on. The race window is quite narrow so the extra delay is minor.
Left the side lock files mode 666 because an interruption can leave a side
lock file created by another user for a shared repository. When this
happens, the non-owning user can't delete it (+t) but can still lock it,
and so the code falls back to acting as it did before this commit.
This is less portable, since currently sidelocks rely on /dev/shm.
But, I've seen crazy lustre inconsistencies that make me not trust the
link() method at all, so what can you do.
I have a strace taken on a lustre filesystem on which link() returned 0,
but didn't actually succeed, since the file already existed.
One of the linux man pages recommended using link followed by checking like
this. I was reading it yesterday, but cannot find it now.
Fixes a recent-ish build warning on about 64 bit vs non.
This is the method used by the disk-free-space library, and I tested it to
yield the same results on even 10 tb drives on OSX -- so it's getting 64
bit values.