The ssh setup first runs ssh to the real hostname, to probe if a ssh key is
needed. If one is, it generates a mangled hostname that uses a key. This
mangled hostname was being used to ssh into the server to set up the key.
But if the server already had the key set up, and it was locked down, the
setup would fail. This changes it to use the real hostname when sshing in
to set up the key, which avoids the problem.
Note that it will redundantly set up the key on the ssh server. But it's
the same key; the ssh key generation code uses the key if it already
exists.
A common failure mode for direct mode has been for files to end up still
stored in indirect mode. While I hope that doesn't happen anymore, fsck
should deal with it.
This is a compromise. I would like to nice every thread except for the
webapp thread, but it's not practical to do so. That would need every
thread to run as a bound thread, which could add significant overhead.
And any forkIO would escape the nice level.
Better to have a working test suite that doesn't test a few things
than no working test suite.
Most of the disabled stuff is because for some reason "git annex sync"
doesn't work when run inside the test suite. Looks like PATH problems.
The directory and rsync special remotes seem broken on Windows, or
maybe the tests are. Pretty sure the hook special remote test is broken.
Yeah, that didn't actually work. Got error messages like it couldn't read
from the control socket, so probably ssh doesn't really support that on
Windows, at least the cygwin ssh build I'm using.
This write permission frobbing is very appropriate in indirect mode,
since annexed objects are stored as immutably as can be managed. But not
in direct mode, where files should be able to be modified at any time.
There are already sufficient guards that there's no need to prevent a file
being written to while it's being ingested, in direct mode. The inode cache
will detect (most) types of modifications, and the add will fail. Then a
re-add should be done. The assistant should get another inotify change
event, and automatically add the new version of the file.
Fuzz tests have shown that git cat-file --batch sometimes stops running.
It's not yet known why (no error message; repo seems ok). But this is
something we can deal with in the CoProcess framework, since all 3 types of
long-running git processes should be restartable if they fail.
Note that, as implemented, only IO errors are caught. So an error thrown
by the reveiver, when it sees something that is not valid output from
git cat-file (etc) will not cause a restart. I don't want it to retry
if git commands change their output or are just outputting garbage.
This does mean that if the command did a partial output and crashed in the
middle, it would still not be restarted.
There is currently no guard against restarting a command repeatedly, if,
for example, it crashes repeatedly on startup.
The checkpresent hook can return either True or, False, or fail with a message
if it cannot successfully check the remote. Currently for glacier, when
--trust-glacier is not set, it always returns False. Crucially, in the case
when a file is in glacier, this is telling git-annex it's not there, so copy
re-uploads it. This is not desirable; it breaks using glacier-cli to retreive
that file later, and it wastes money/bandwidth.
What if it instead, when the glacier inventory is missing a
file, it returns False. And when the glacier inventory has a file, unless
--trust-glacier is set, it *fails*.
The result would be:
* `git annex copy --to glacier` would only send things not listed in inventory. If a file is listed in the inventory, `copy`
would complain that --trust-glacier` is not set, and not re-upload the file.
* `git annex drop` would only trust that glacier has a file when --trust-glacier is set. Behavior unchanged.
* `git annex move --to glacier`, when the file is not listed in inventory, would send the file, and delete it locally. Behavior unchanged.
* `git annex move --to glacier`, when the file is listed in inventory, would only trust that glacier has the file when --trust-glacier is set
* `git annex copy --from glacier` / `git annex get`, when the file is located in glacier, would trust the location log, and attempt to get the file from glacier.
Ie, when there'a a conflicted merge we may get foo.variant-xxxx
created in a merge. If a second merge conflict occurs on that new file,
it was not falling back to putting in the whole key (which should stop
the merge conflicts happening for good, but is ugly).
The current manual mode preferred content expression is:
"present and (((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1))) or (not copies=semitrusted+:1))"
The old matcher misparsed this, to basically:
OR (present and (...)) (not copies=semitrusted+:1))
The paren handling and indeed the whole conversion from tokens to the
matcher was just wrong. The new way may not be the cleverest, but I think
it is correct, and you can see how it pattern matches structurally against
the expressions when parsing them.
That expression is now parsed to:
MAnd (MOp <function>)
(MOr (MOr (MAnd (MOp <function>) (MOp <function>)) (MNot (MOr (MOp <function>) (MOp <function>))))
(MNot (MOp <function>)))
Which appears correct, and behaves correct in testing.
Also threw in a simplifier, so the final generated Matcher has less
unnecessary clutter in it. Mostly so that I could more easily read &
confirm them.
Also, added a simple test of the Matcher to the test suite.
There is a small chance of badly formed preferred content expressions
behaving differently than before due to this rewrite.
I noticed that when my modem hung up and redialed, my xmpp client was left
sending messages into the void. This will also handle any idle
disconnection issues.
This bug was turned up by the test suite, running fsck in direct mode.
A repository was cloned, was put into direct mode, was fscked, and fsck
incorrectly said that no copy existed of a file, that was actually present
in origin.
This turned out to occur because fsck first did a Annex.Branch.change,
recording that it did not locally have the file. That was recorded in the
journal. Since neither the git annex direct not the fsck had yet needed to
read any info from the branch, but had only made changes to it, the
origin/git-annex branch was not yet merged in. So the journal got a
location log entry written to it, but this did not include
the location log info for the origin. When fsck then did a
Annex.Branch.get, it trusted the journal was cosnsitent, and returned it,
again w/o merging from origin/git-annex. This latter behavior is the
actual bug.
Refer to commit e9bfa8eaed for the thinking
behind it being ok to make a change to a file on the branch, without
first merging the branch. That thinking still stands. However, it means
that files in the journal cannot be trusted to be consistent if the branch
has not been merged. So, to fix, just enure the branch gets merged, even
when reading from the journal.
In tests, this does not seem to cause any extra merging. Except, of course,
in the one case described above. But git annex add, etc, are able to make
changes w/o first merging the branch.
The bug was in movein, which just replaceFile'd the file with a symlink,
even if it already had the desired content, before trying to pull the
content out of the annex and replace the symlink with it.
That was ok-ish for non conflicted merges, where if the file existed it would
be an old version of the content. But for conflicted merges, the automatic
merge resolver has already run, and will have already put the desired
content into the file for the local variant.
Also, made removeDirect not trust that the associated files map is correct.
Only if it can verify that another file has the content will it not move it
into .git/annex/objects.
As seen in this bug report, the lifted exception handling using the StateT
monad throws away state changes when an action throws an exception.
http://git-annex.branchable.com/bugs/git_annex_fork_bombs_on_gpg_file/
.. Which can result in cached values being redundantly calculated, or other
possibly worse bugs when the annex state gets out of sync with reality.
This switches from a StateT AnnexState to a ReaderT (MVar AnnexState).
All changes to the state go via the MVar. So when an Annex action is
running inside an exception handler, and it makes some changes, they
immediately go into affect in the MVar. If it then throws an exception
(or even crashes its thread!), the state changes are still in effect.
The MonadCatchIO-transformers change is actually only incidental.
I could have kept on using lifted-base for the exception handling.
However, I'd have needed to write a new instance of MonadBaseControl
for the new monad.. and I didn't write the old instance.. I begged Bas
and he kindly sent it to me. Happily, MonadCatchIO-transformers is
able to derive a MonadCatchIO instance for my monad.
This is a deep level change. It passes the test suite! What could it break?
Well.. The most likely breakage would be to code that runs an Annex action
in an exception handler, and *wants* state changes to be thrown away.
Perhaps the state changes leaves the state inconsistent, or wrong. Since
there are relatively few places in git-annex that catch exceptions in the
Annex monad, and the AnnexState is generally just used to cache calculated
data, this is unlikely to be a problem.
Oh yeah, this change also makes Assistant.Types.ThreadedMonad a bit
redundant. It's now entirely possible to run concurrent Annex actions in
different threads, all sharing access to the same state! The ThreadedMonad
just adds some extra work on top of that, with its own MVar, and avoids
such actions possibly stepping on one-another's toes. I have not gotten
rid of it, but might try that later. Being able to run concurrent Annex
actions would simplify parts of the Assistant code.
Run the same code git-annex used to get the sha, including its sanity
checking. Much better than old grep. Should detect FreeBSD systems with
sha commands that output in stange format.
This after fielding a bug where git-annex was built with a sha256 program
whose output checked out, but was then run with one that output lines
like:
SHA256 (file) = <sha here>
Which it then parsed as having a SHA256 of "SHA256"!
Now the output of the command is required to be of the right length,
and contain only the right characters.
It's possible for files in indirect mode to have a direct mode mapping
file. Probably from when they were in direct mode. In this case,
toDirectGen tried to copy the content from the direct mode file that the
mapping said had it. But, being in indirect mode, it didn't really have the
content. So it did nothing. This fix makes it always move the content from
.git/annex/objects/ when it's there.
Observed that the pushed refs were received, but not merged into master.
The merger never saw an add event for these refs. Either git is not writing
to a new file and renaming it into place, or the inotify code didn't notice
that. Changed it to also watch for modify events and that seems to have
fixed it!
Note that the note on df88c51334 turned out
to be wrong. Multiple repository pairing over XMPP does work, because the
annex-uuid of the xmpp remote is updated to the uuid of the new repo
when pairing takes place. So the push from it is accepted. (And the other
UUIDs are listed in uuid.log, so pushes from those repositories also are
accepted of course.)
The root of the problem is that toInodeCache sees a non-symlink, and so
goes on and generates a new inode cache for the dummy symlink.
Any place that toInodeCache, or sameFileStatus, or genInodeCache are called
may need to deal with this case. Although many of them are ok. For example,
prepSendAnnex calls sameInodeCache, which calls genInodeCache.. but if
the file content is not present, the InodeCache generated for its standin
file is appropriately not the same, and so it returns Nothing.
I've audited some, but have to say I'm not happy with this; it should be
handled at the type level somehow, or a toInodeCache wrapper be used that
is aware of dummy symlinks.
(The Watcher already dealt with it, via the guardSymlinkStandin function.)
This is so git remotes on servers without git-annex installed can be used
to keep clients' git repos in sync.
This is a behavior change, but since annex-sync can be set to disable
syncing with a remote, I think it's acceptable.
assistant: Work around horrible, terrible, very bad behavior of
gnome-keyring, by not storing special-purpose ssh keys in ~/.ssh/*.pub.
Apparently gnome-keyring apparently will load and indiscriminately use such
keys in some cases, even if they are not using any of the standard ssh key
names. Instead store the keys in ~/.ssh/annex/, which gnome-keyring will
not check.
Note that neither I nor #debian-devel were able to quite reproduce this
problem, but I believe it exists, and that this fixes it. And it certianly
won't hurt anything..
Most remotes have meters in their implementations of retrieveKeyFile
already. Simply hooking these up to the transfer log makes that information
available. Easy peasy.
This is particularly valuable information for encrypted remotes, which
otherwise bypass the assistant's polling of temp files, and so don't have
good progress bars yet.
Still some work to do here (see progressbars.mdwn changes), but this
is entirely an improvement from the lack of progress bars for encrypted
downloads.
* addurl: Register transfer so the webapp can see it.
* addurl: Automatically retry downloads that fail, as long as some
additional content was downloaded.
Fixed by storing a list of cached inodes for a key, instead of just one.
Backwards compatability note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version, and has
multiple items. It will succees if just one. So old git-annexes will have
even worse behavior when there are duplicated files, if that is possible.
I don't think it will be a problem. (Famous last words.)
Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
* since this is a crippled filesystem anyway, git-annex doesn't use
symlinks on it
* so there's no reason to use the mixed case hash directories that we're
stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
test case will recover and it will all just work.
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.
But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.
(It also probably avoids confusing displays in file managers.)
My test case for this bug is to have the assistant running and syncing to
a remote, and create a file in the annex. Then at the command line run
git annex drop. The assistant sees that the file is gone, sees it's a wanted
file, and downloads it from the remote.
With a directory special remote and a small file, I was seeing around 1
time in 3, a race where the file got unstaged from git after it got
downloaded.
Looking at what direct mode content managing code does in this case, it
deletes the symlink, and then adds the file content back. It would be
possible, sometimes, to avoid removing the symlink and do this atomically.
And I probably should.. but in some cases, particularly where the file
needs to be run through `cp` (multiple direct mode files with same
content), there's no way to atomically replace the symlink with the
content.
Anyway, the bug turns out to be something that the watcher does right for
indirect mode, but not for direct mode. When it got an add event, it
checked to see if this was a new file, or one we've already added. In the
latter case, no add event was queued. But that means that only the rm event
is queued, and so it unstages the file.
Fixed by queueing an add event even when the file is already in git.
Tested by running hundreds of drops in a loop; file remained staged.
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.
A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
There was confusion in different parts of the progress bar code about
whether an update contained the total number of bytes transferred, or the
number of bytes transferred since the last update. One way this bug
showed up was progress bars that seemed to stick at zero for a long time.
In order to fix it comprehensively, I add a new BytesProcessed data type,
that is explicitly a total quantity of bytes, not a delta.
Note that this doesn't necessarily fix every problem with progress bars.
Particularly, buffering can now cause progress bars to seem to run ahead
of transfers, reaching 100% when data is still being uploaded.
When a page is loaded, the javascript requests an notification url, and
does long polling on the url to be informed of changes. But if a change
occured before the notification url was requested, it would not be notified
of that change, and so the page display would not update.
I fixed this by *always* updating the page display after it gets
the notification url. This is extra work, but the overhead is not noticable
in the other overhead of loading a page.
(A nicer way would be to somehow record the version of a page initially
loaded, and then compare it with the current version when getting the
notification url, and only force an update if it's changed. But getting
the "version" of the different parts of the page that use long polling
is difficult.)
Move all the binaries and libraries under a bundle/ subdirectory;
so when it's in PATH only git-annex, runshell, and git-annex-webapp
will be available.
A long time ago I made Remote be an instance of the Ord typeclass, with an
implementation that compared the costs of Remotes. That seemed like a good
idea at the time, as it saved typing.. But at the time I was still making
custom Read and Show instances too. I've since learned that this is *not* a
good idea, and neither is making custom Ord instances, without deep thought
about the possible sets of values in a type. Haskell typeclasses are not a
toy.
This Ord instance came around and bit me when I put Remotes into a Set,
because now remotes with the same cost appeared to be in the Set even if
they were not. Also affected putting Remotes into a Map.
Rarely does a bug go this deep. I've fixed it comprehensively, first
removing the Ord instance entirely, and fixing the places that wanted to
order remotes by cost to do it explicitly. Then adding back an Ord instance
that is much more sane. Also by checking the rest of the Ord instances in
the code base (which were all ok).
While doing that, I found lots of places that kept remotes in Maps and
Sets. All of it was probably subtly broken in one way or another before
this fix, but it would be hard to say exactly how the bugs would
manifest.
This may work around google talk's horrible presence handling, in which
clients often don't learn about other clients, at least when using the same
account. This way, every time we start a git push over xmpp, we'll waste
bandwidth asking clients to please try again to identify themselves.
Just before starting a transfer, do one last check that it's still
preferred content.
I was just doing this for uploads, as part of the smarter flood filling
bug, but realized it's also possible for a download that was preferred
content to change to not be before the download begins, so check that too.
* addurl: Escape invalid characters in urls, rather than failing to
use an invalid url.
* addurl: Properly handle url-escaped characters in file:// urls.
The comments correctly noted that the remote could drop the key and
yet False be returned due to some problem that occurred afterwards.
For example, if it's a network remote, it could drop the key just
as the network goes down, and so things timeout and a nonzero exit
from ssh is propigated through and False returned.
However... Most of the time, this scenario will not have happened.
False will mean the remote was not available or could not drop the key
at all.
So, instead of assuming the worst, just trust the status we have.
If we get it wrong, and the scenario above happened, our location
log will think the remote has the key. But the remote's location
log (assuming it has one) will know it dropped it, and the next sync
will regain consistency.
For a special remote, with no location log, our location log will be wrong,
but this is no different than the situation where someone else dropped
the key from the remote and we've not synced with them. The standard
paranoia about not trusting the location log to be the last word about
whether a remote has a key will save us from these situations. Ie,
if we try to drop the file, we'll actively check the remote,
and determine the inconsistency then.
This cleaned up the code quite a bit; now the committer just looks at the
Change to see if it's a change that needs to have a transfer queued for it.
If I later want to add dropping keys for files that were removed, or
something like that, this should make it straightforward.
This also fixes a bug. In direct mode, moving a file out of an archive
directory failed to start a transfer to get its content. The problem
was that the file had not been committed to git yet, and so the transfer
code didn't want to touch it, since fileKey failed to get its key.
Only starting transfers after a commit avoids this problem.
Noticed that, At startup or network reconnect, git push messages were sent,
often before presence info has been gathered, so were not sent to any
buddies.
To fix this, keep track of which buddies have seen such messages,
and when new presence is received from a buddy that has not yet seen it,
resend.
This is done only for push initiation messages, so very little data needs
to be stored.
Make manualPull send push requests over XMPP.
When reconnecting with remotes, those that are XMPP remotes cannot
immediately be pulled from and scanned, so instead maintain a set of
(probably) desynced remotes, and put XMPP remotes on it. (This set could be
used in other ways later, if we can detect we're out of sync with other
types of remotes.)
The merger handles detecting when a XMPP push is received from a desynced
remote, and triggers a scan then, if they have in fact diverged.
This has one known bug: A single XMPP remote can have multiple clients
behind it. When this happens, only the UUID of one client is recorded
as the UUID of the XMPP remote. Pushes from the other XMPP clients will not
trigger a scan. If the client whose UUID is expected responds to the push
request, it'll work, but when that client is offline, we're SOL.
For example, copy --to such a remote would send the file, but as NoUUID was
its uuid, no location log update was done. And drop --from such a remote
would not do anything, because NoUUID is not listed in the location log..
assistant: Fix bug in direct mode that could occur when a symlink is moved
out of an archive directory, and resulted in the file not being set to
direct mode when it was transferred.
The bug was that the direct mode mapping was not up-to-date when the
transferrer finished. So, finding no direct mode place to store the object,
it was put into .git/annex in indirect mode.
To fix this, just make the watcher update the direct mode mapping to
include the new file before it starts the transfer. (Seems we don't need to
update it to remove the old file if the link was moved, because the direct
mode code will notice it's not present and the mapping gets updated for its
removal later.)
The reason this was a race, and was probably not seen often is because
the committer came along and updated the direct mode mapping as part of
adding the moved symlink. But when the file was sufficiently small or
the remote sufficiently fast, this could happen after the transfer
finished.
Looking through the git sources (documentation is unclear),
it seems commit doesn't ever trigger git-gc, mostly fetching and merging
seems to. I cannot easily override the setting in all those places, so
instead set gc.auto in git config when initializing a repository with
the assistant.
This does mean that the user cannot set gc.auto=0 and completely avoid
repacks, as the assistant does it daily. But, it only does it after there
are 100x the default number of loose objects, so this is probably not going
to be too annoying.
A transfer is queued, but if the file has already been transferred to the
remote before, the transfer is skipped. In this case, it needs to perform
any actions it would normally take after finishing the transfer, like
dropping the local object.
This cannot completely guard against a runaway log event, and only runs
every hour anyway, but it should avoid most problems with very
long-running, active assistants using up too much space.
The transfer queue can grow larger than 10 when queueing transfers for
files that were just received, as well as requeueing failed transfers.
I probably need to do some work to prevent that, as it could use a lot of
RAM. But for now, cap the number of displayed transfers in the webapp, to
avoid flooding the browser.
I have seen some other programs do this, and think it's pretty cool. Means
you can test wherever it's deployed, as well as at build time.
My other reason for doing it is less happy. Cabal's handling of test suites
sucks, requiring duplicated info, and even when that's done, it fails to
preprocess hsc files here. Building it in avoids that and avoids having
to explicitly tell cabal to enable test suites, which would then make it
link the test executable every time, which is unnecessarily slow.
This also has the benefit that now "make fast test" does a max speed build
and tests it.
The only thing lost is ./ghci
Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.
Two fixes. First, and most importantly, relax the isLinkToAnnex check
to only look for /annex/objects/, not [^|/].git/annex/objects. If
GIT_DIR is used with a detached work tree, the git directory is
not necessarily named .git.
There are important caveats with doing that at all, since git-annex will
make symlinks that point at GIT_DIR, which means that the relative path
between GIT_DIR and GIT_WORK_TREE needs to remain stable across all clones
of the repository.
----
The other fix is just fixing crazy and wrong code that, when GIT_DIR is
set, expects to still find a git repository in the path below the work
tree, and uses some of its configuration, and some of GIT_DIR. What was I
thinking, and why can't I seem to get this code right?
In general, git-annex does not try to preserve file permissions. For
example, they don't round trip through special remotes. So it's ok to not
preserve them for git remotes either.
On crippled filesystems, rsync has been observed failing after the file
was transferred because it couldn't set some permission or other.
Adding a file that is already annexed, but has been modified, was broken in
direct mode.
This fix makes the new content be added. It does have the problem that
re-running `git annex add` will checksum and re-add the content repeatedly,
until it's committed. This happens because the key associated with the file
does not change until the new one gets committed, so it keeps thinking the
file has changed.
Refactored annex link code into nice clean new library.
Audited and dealt with calls to createSymbolicLink.
Remaining calls are all safe, because:
Annex/Link.hs: ( liftIO $ createSymbolicLink linktarget file
only when core.symlinks=true
Assistant/WebApp/Configurators/Local.hs: createSymbolicLink link link
test if symlinks can be made
Command/Fix.hs: liftIO $ createSymbolicLink link file
command only works in indirect mode
Command/FromKey.hs: liftIO $ createSymbolicLink link file
command only works in indirect mode
Command/Indirect.hs: liftIO $ createSymbolicLink l f
refuses to run if core.symlinks=false
Init.hs: createSymbolicLink f f2
test if symlinks can be made
Remote/Directory.hs: go [file] = catchBoolIO $ createSymbolicLink file f >> return True
fast key linking; catches failure to make symlink and falls back to copy
Remote/Git.hs: liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True
ditto
Upgrade/V1.hs: liftIO $ createSymbolicLink link f
v1 repos could not be on a filesystem w/o symlinks
Audited and dealt with calls to readSymbolicLink.
Remaining calls are all safe, because:
Annex/Link.hs: ( liftIO $ catchMaybeIO $ readSymbolicLink file
only when core.symlinks=true
Assistant/Threads/Watcher.hs: ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file))
code that fixes real symlinks when inotify sees them
It's ok to not fix psdueo-symlinks.
Assistant/Threads/Watcher.hs: mlink <- liftIO (catchMaybeIO $ readSymbolicLink file)
ditto
Command/Fix.hs: stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do
command only works in indirect mode
Upgrade/V1.hs: getsymlink = takeFileName <$> readSymbolicLink file
v1 repos could not be on a filesystem w/o symlinks
Audited and dealt with calls to isSymbolicLink.
(Typically used with getSymbolicLinkStatus, but that is just used because
getFileStatus is not as robust; it also works on pseudolinks.)
Remaining calls are all safe, because:
Assistant/Threads/SanityChecker.hs: | isSymbolicLink s -> addsymlink file ms
only handles staging of symlinks that were somehow not staged
(might need to be updated to support pseudolinks, but this is
only a belt-and-suspenders check anyway, and I've never seen the code run)
Command/Add.hs: if isSymbolicLink s || not (isRegularFile s)
avoids adding symlinks to the annex, so not relevant
Command/Indirect.hs: | isSymbolicLink s -> void $ flip whenAnnexed f $
only allowed on systems that support symlinks
Command/Indirect.hs: whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do
ditto
Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f
used to find unlocked files, only relevant in indirect mode
Utility/FSEvents.hs: | Files.isSymbolicLink s = runhook addSymlinkHook $ Just s
Utility/FSEvents.hs: | Files.isSymbolicLink s ->
Utility/INotify.hs: | Files.isSymbolicLink s ->
Utility/INotify.hs: checkfiletype Files.isSymbolicLink addSymlinkHook f
Utility/Kqueue.hs: | Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change
all above are lower-level, not relevant
Audited and dealt with calls to isSymLink.
Remaining calls are all safe, because:
Annex/Direct.hs: | isSymLink (getmode item) =
This is looking at git diff-tree objects, not files on disk
Command/Unused.hs: | isSymLink (LsTree.mode l) = do
This is looking at git ls-tree, not file on disk
Utility/FileMode.hs:isSymLink :: FileMode -> Bool
Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode
low-level
Done!!
In indirect mode, now checks the inode cache to detect changes to a file.
Note that a file can still be changed if a process has it open for write,
after landing in the annex.
In direct mode, some checking of the inode cache was done before, but
from a much later point, so fewer modifications could be detected. Now it's
as good as indirect mode.
On crippled filesystems, no lock down is done before starting to add a
file, so checking the inode cache is the only protection we have.
git annex init probes for crippled filesystems, and sets direct mode, as
well as `annex.crippledfilesystem`.
Avoid manipulating permissions of files on crippled filesystems.
That would likely cause an exception to be thrown.
Very basic support in Command.Add for cripped filesystems; avoids the lock
down entirely since doing it needs both permissions and hard links.
Will make this better soon.
Been meaning to do this for some time; Android port was last straw.
Note that newer versions of the uuid library have a Data.UUID.V4 that
generates random UUIDs slightly more cleanly, but Debian has an old version
of the library, so I do it slightly round-about.
These files were left behind, and made getKeysPresent find keys that were
not present. It would be expensive to make getKeysPresent check that the
actual key files are present (it just lists the directories). But that's not
needed if we just clean up the stale cache and mapping files.
To handle systems that were in direct mode and got switched back with stale
direct mode files, made cleanObjectLoc remove all files in the key's directory.
git annex unused will still list keys that are gone but for which the stale
direct mode files exists. To deal with that, made dropunused remove the key's
directory even if the key does not seem to be present.
Making the pre-commit hook look at git diff-index to find changed direct
mode files and update the mappings works pretty well.
One case where it does not work is when a file is git annex added, and then
git rmed, and then this is committed. That's a no-op commit, so the hook
probably doesn't even run, and it certianly never notices that the file
was deleted, so the mapping will still have the original filename in it.
For this and other reasons, it's important that the mappings still be
treated as possibly inconsistent.
Also, the assistant now allows the pre-commit hook to run when in direct
mode, so the mappings also get updated there.
The most common way for a mapping to be stale is when a file was deleted,
or renamed. Nothing updates the mappings for deletions yet.
But they can also become stale in other ways. For example a file can
be modified.
So, the mapping is not trusted to be consistent. When we get a key,
only replace symlinks that still point to that key with its content.
When we drop a key, only put back symlinks for files that still have
the direct mode content.
New setting, can be used to disable autocommit of changed files by the
assistant, while it still does data syncing and other tasks.
Also wired into webapp UI
Union merges involving two or more repositories could sometimes result in
data from one repository getting lost. This could result in the location
log data becoming wrong, and fsck being needed to fix it.
NB: I audited for any other occurrences of this problem. There are other
places than union merge where multiple changes are fed into update-index
in a stream, but they all involve working copy files being staged, or their
deletion being staged, and in this case it's fine for the later changes
to override the earlier ones.