Fixed by storing a list of cached inodes for a key, instead of just one.
Backwards compatability note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version, and has
multiple items. It will succees if just one. So old git-annexes will have
even worse behavior when there are duplicated files, if that is possible.
I don't think it will be a problem. (Famous last words.)
Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
* since this is a crippled filesystem anyway, git-annex doesn't use
symlinks on it
* so there's no reason to use the mixed case hash directories that we're
stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
test case will recover and it will all just work.
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.
But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.
(It also probably avoids confusing displays in file managers.)
My test case for this bug is to have the assistant running and syncing to
a remote, and create a file in the annex. Then at the command line run
git annex drop. The assistant sees that the file is gone, sees it's a wanted
file, and downloads it from the remote.
With a directory special remote and a small file, I was seeing around 1
time in 3, a race where the file got unstaged from git after it got
downloaded.
Looking at what direct mode content managing code does in this case, it
deletes the symlink, and then adds the file content back. It would be
possible, sometimes, to avoid removing the symlink and do this atomically.
And I probably should.. but in some cases, particularly where the file
needs to be run through `cp` (multiple direct mode files with same
content), there's no way to atomically replace the symlink with the
content.
Anyway, the bug turns out to be something that the watcher does right for
indirect mode, but not for direct mode. When it got an add event, it
checked to see if this was a new file, or one we've already added. In the
latter case, no add event was queued. But that means that only the rm event
is queued, and so it unstages the file.
Fixed by queueing an add event even when the file is already in git.
Tested by running hundreds of drops in a loop; file remained staged.
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.
A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
There was confusion in different parts of the progress bar code about
whether an update contained the total number of bytes transferred, or the
number of bytes transferred since the last update. One way this bug
showed up was progress bars that seemed to stick at zero for a long time.
In order to fix it comprehensively, I add a new BytesProcessed data type,
that is explicitly a total quantity of bytes, not a delta.
Note that this doesn't necessarily fix every problem with progress bars.
Particularly, buffering can now cause progress bars to seem to run ahead
of transfers, reaching 100% when data is still being uploaded.
When a page is loaded, the javascript requests an notification url, and
does long polling on the url to be informed of changes. But if a change
occured before the notification url was requested, it would not be notified
of that change, and so the page display would not update.
I fixed this by *always* updating the page display after it gets
the notification url. This is extra work, but the overhead is not noticable
in the other overhead of loading a page.
(A nicer way would be to somehow record the version of a page initially
loaded, and then compare it with the current version when getting the
notification url, and only force an update if it's changed. But getting
the "version" of the different parts of the page that use long polling
is difficult.)
Move all the binaries and libraries under a bundle/ subdirectory;
so when it's in PATH only git-annex, runshell, and git-annex-webapp
will be available.
A long time ago I made Remote be an instance of the Ord typeclass, with an
implementation that compared the costs of Remotes. That seemed like a good
idea at the time, as it saved typing.. But at the time I was still making
custom Read and Show instances too. I've since learned that this is *not* a
good idea, and neither is making custom Ord instances, without deep thought
about the possible sets of values in a type. Haskell typeclasses are not a
toy.
This Ord instance came around and bit me when I put Remotes into a Set,
because now remotes with the same cost appeared to be in the Set even if
they were not. Also affected putting Remotes into a Map.
Rarely does a bug go this deep. I've fixed it comprehensively, first
removing the Ord instance entirely, and fixing the places that wanted to
order remotes by cost to do it explicitly. Then adding back an Ord instance
that is much more sane. Also by checking the rest of the Ord instances in
the code base (which were all ok).
While doing that, I found lots of places that kept remotes in Maps and
Sets. All of it was probably subtly broken in one way or another before
this fix, but it would be hard to say exactly how the bugs would
manifest.
This may work around google talk's horrible presence handling, in which
clients often don't learn about other clients, at least when using the same
account. This way, every time we start a git push over xmpp, we'll waste
bandwidth asking clients to please try again to identify themselves.
Just before starting a transfer, do one last check that it's still
preferred content.
I was just doing this for uploads, as part of the smarter flood filling
bug, but realized it's also possible for a download that was preferred
content to change to not be before the download begins, so check that too.
* addurl: Escape invalid characters in urls, rather than failing to
use an invalid url.
* addurl: Properly handle url-escaped characters in file:// urls.
The comments correctly noted that the remote could drop the key and
yet False be returned due to some problem that occurred afterwards.
For example, if it's a network remote, it could drop the key just
as the network goes down, and so things timeout and a nonzero exit
from ssh is propigated through and False returned.
However... Most of the time, this scenario will not have happened.
False will mean the remote was not available or could not drop the key
at all.
So, instead of assuming the worst, just trust the status we have.
If we get it wrong, and the scenario above happened, our location
log will think the remote has the key. But the remote's location
log (assuming it has one) will know it dropped it, and the next sync
will regain consistency.
For a special remote, with no location log, our location log will be wrong,
but this is no different than the situation where someone else dropped
the key from the remote and we've not synced with them. The standard
paranoia about not trusting the location log to be the last word about
whether a remote has a key will save us from these situations. Ie,
if we try to drop the file, we'll actively check the remote,
and determine the inconsistency then.
This cleaned up the code quite a bit; now the committer just looks at the
Change to see if it's a change that needs to have a transfer queued for it.
If I later want to add dropping keys for files that were removed, or
something like that, this should make it straightforward.
This also fixes a bug. In direct mode, moving a file out of an archive
directory failed to start a transfer to get its content. The problem
was that the file had not been committed to git yet, and so the transfer
code didn't want to touch it, since fileKey failed to get its key.
Only starting transfers after a commit avoids this problem.
Noticed that, At startup or network reconnect, git push messages were sent,
often before presence info has been gathered, so were not sent to any
buddies.
To fix this, keep track of which buddies have seen such messages,
and when new presence is received from a buddy that has not yet seen it,
resend.
This is done only for push initiation messages, so very little data needs
to be stored.
Make manualPull send push requests over XMPP.
When reconnecting with remotes, those that are XMPP remotes cannot
immediately be pulled from and scanned, so instead maintain a set of
(probably) desynced remotes, and put XMPP remotes on it. (This set could be
used in other ways later, if we can detect we're out of sync with other
types of remotes.)
The merger handles detecting when a XMPP push is received from a desynced
remote, and triggers a scan then, if they have in fact diverged.
This has one known bug: A single XMPP remote can have multiple clients
behind it. When this happens, only the UUID of one client is recorded
as the UUID of the XMPP remote. Pushes from the other XMPP clients will not
trigger a scan. If the client whose UUID is expected responds to the push
request, it'll work, but when that client is offline, we're SOL.
For example, copy --to such a remote would send the file, but as NoUUID was
its uuid, no location log update was done. And drop --from such a remote
would not do anything, because NoUUID is not listed in the location log..
assistant: Fix bug in direct mode that could occur when a symlink is moved
out of an archive directory, and resulted in the file not being set to
direct mode when it was transferred.
The bug was that the direct mode mapping was not up-to-date when the
transferrer finished. So, finding no direct mode place to store the object,
it was put into .git/annex in indirect mode.
To fix this, just make the watcher update the direct mode mapping to
include the new file before it starts the transfer. (Seems we don't need to
update it to remove the old file if the link was moved, because the direct
mode code will notice it's not present and the mapping gets updated for its
removal later.)
The reason this was a race, and was probably not seen often is because
the committer came along and updated the direct mode mapping as part of
adding the moved symlink. But when the file was sufficiently small or
the remote sufficiently fast, this could happen after the transfer
finished.
Looking through the git sources (documentation is unclear),
it seems commit doesn't ever trigger git-gc, mostly fetching and merging
seems to. I cannot easily override the setting in all those places, so
instead set gc.auto in git config when initializing a repository with
the assistant.
This does mean that the user cannot set gc.auto=0 and completely avoid
repacks, as the assistant does it daily. But, it only does it after there
are 100x the default number of loose objects, so this is probably not going
to be too annoying.
A transfer is queued, but if the file has already been transferred to the
remote before, the transfer is skipped. In this case, it needs to perform
any actions it would normally take after finishing the transfer, like
dropping the local object.
This cannot completely guard against a runaway log event, and only runs
every hour anyway, but it should avoid most problems with very
long-running, active assistants using up too much space.
The transfer queue can grow larger than 10 when queueing transfers for
files that were just received, as well as requeueing failed transfers.
I probably need to do some work to prevent that, as it could use a lot of
RAM. But for now, cap the number of displayed transfers in the webapp, to
avoid flooding the browser.
I have seen some other programs do this, and think it's pretty cool. Means
you can test wherever it's deployed, as well as at build time.
My other reason for doing it is less happy. Cabal's handling of test suites
sucks, requiring duplicated info, and even when that's done, it fails to
preprocess hsc files here. Building it in avoids that and avoids having
to explicitly tell cabal to enable test suites, which would then make it
link the test executable every time, which is unnecessarily slow.
This also has the benefit that now "make fast test" does a max speed build
and tests it.
The only thing lost is ./ghci
Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.