These files were left behind, and made getKeysPresent find keys that were
not present. It would be expensive to make getKeysPresent check that the
actual key files are present (it just lists the directories). But that's not
needed if we just clean up the stale cache and mapping files.
To handle systems that were in direct mode and got switched back with stale
direct mode files, made cleanObjectLoc remove all files in the key's directory.
git annex unused will still list keys that are gone but for which the stale
direct mode files exists. To deal with that, made dropunused remove the key's
directory even if the key does not seem to be present.
Making the pre-commit hook look at git diff-index to find changed direct
mode files and update the mappings works pretty well.
One case where it does not work is when a file is git annex added, and then
git rmed, and then this is committed. That's a no-op commit, so the hook
probably doesn't even run, and it certianly never notices that the file
was deleted, so the mapping will still have the original filename in it.
For this and other reasons, it's important that the mappings still be
treated as possibly inconsistent.
Also, the assistant now allows the pre-commit hook to run when in direct
mode, so the mappings also get updated there.
The most common way for a mapping to be stale is when a file was deleted,
or renamed. Nothing updates the mappings for deletions yet.
But they can also become stale in other ways. For example a file can
be modified.
So, the mapping is not trusted to be consistent. When we get a key,
only replace symlinks that still point to that key with its content.
When we drop a key, only put back symlinks for files that still have
the direct mode content.
New setting, can be used to disable autocommit of changed files by the
assistant, while it still does data syncing and other tasks.
Also wired into webapp UI
Union merges involving two or more repositories could sometimes result in
data from one repository getting lost. This could result in the location
log data becoming wrong, and fsck being needed to fix it.
NB: I audited for any other occurrences of this problem. There are other
places than union merge where multiple changes are fed into update-index
in a stream, but they all involve working copy files being staged, or their
deletion being staged, and in this case it's fine for the later changes
to override the earlier ones.
It used to not log to daemon.log when a repository was first created, and
when starting the webapp. Now both do. Redirecting stdout and stderr to the
log is tricky when starting the webapp, because the web browser may want to
communicate with the user. (Either a console web browser, or web.browser = echo)
This is handled by restoring the original fds when running the browser.
A while ago I added code to support recent versions of shakespeare-js,
(commit fe11b3a940). But it seems that resulted
in quoting of all strings inserted into javascript files, which means it's
now impossible to do the type of metaprogramming that longpolling.julius
relied on. I have found another way to accomplish the same thing without
needing to generate unique function names. Hopefully it's portable.
Opinion of shakespeare-js now at rock bottom. One of these days, this
needs to be redone to use Fay.
since some systems may have configuration problems or other issues that
prevent web browsers from connecting to the right localhost IP for the
webapp.
Tested on both ipv4 and ipv6 localhost. Url for the latter looks like:
http://[::1]:50676
The expensive scan uses lookupFile, but in direct mode, that doesn't work
for files that are present. So the scan was not finding things that are
present that need to be uploaded. (It did find things not present that
needed to be downloaded.)
Now lookupFile also works in direct mode. Note that it still prefers
symlinks on disk to info committed to git, in direct mode. This is
necessary to make things like Assistant.Threads.Watcher.onAddSymlink
work correctly, when given a new symlink not yet checked into git (or
replacing a file checked into git).
Would like to also have restart UI, but that's rather harder to do,
seems it'd need to start another copy of the webapp, and redirect the
browser to its new url, but running two assistants in the same repo at
the same time isn't good.
* SHA*E backends: Exclude non-alphanumeric characters from extensions.
* migrate: Remove leading \ in SHA* checksums, and non-alphanumerics
from extensions of SHA*E keys.
* Bugfix: Remove leading \ from checksums output by sha*sum commands,
when the filename contains \ or a newline. Closes: #696384
* fsck: Still accept checksums with a leading \ as valid, now that
above bug is fixed.
* migrate: Remove leading \ in checksums
The newline after the filename was included in it.
This was generally benign -- mostly these filenames are just displayed,
and the newline didn't matter.
But in the assistant, it caused unexpected dropping of preferred
content.
A characteristic of this bug is that the drop was displayed like this:
drop some_file
ok
* get/copy --auto: Transfer data even if it would exceed numcopies,
when preferred content settings want it.
* drop --auto: Fix dropping content when there are no preferred content
settings.
It was doubly broken; both missing a slash, and containing
"runshell git-annex", while some parts of the code expected it to be a
simple path to a program. This appears to include the transfer queue
runner, and the code that starts a new assistant process when switching to
another repository in the webapp.
Ensure that each file has something written to it, even if the bytestring
chunk size is greater than the configured chunksize.
This means we may write a bit larger than the configured value, but only
when the configured value is very small; ie, < 8 kb.
Files are now written to a tmp directory in the remote, and once all
chunks are written, etc, it's moved into the final place atomically.
For now, checkpresent still checks every single chunk of a file, because
the old method could leave partially transferred files with some chunks
present and others not.
Both the directory and webdav special remotes used to have to buffer
the whole file contents before it could be decrypted, as they read
from chunks. Now the chunks are streamed through gpg with no buffering.
This adds a dep on haskell's async library, but since that's been
added to the recent haskell platform release, it should not be
much hardship to my poor long-suffering library chasing users.
Wrote a better git remote name sanitizer. Git blows up on lots of weird
stuff, especially if it starts the remote name, but I managed to get
some common punctuation working.
Adjust build deps to ensure that only a fixed version of the library will
be used.
Also, removed the bound thread stuff, which I now think was (probably)
a red herring.
MountWatcher can't do this, because it uses the session dbus,
and won't have access to the new DBUS_SESSION_BUS_ADDRESS if a new session
is started.
Bumped dbus library version, FD leak in it is fixed.
For now, when dbus goes away, the assistant keeps running but does not fall
back or reconnect. To do so needs more changes to the DBus library; in
particular a connectSessionWith and connectSystemWith to let me specify
my own clientThreadRunner.
So, it might be called sha1sum, or on some other OS, it might be called
sha1. It might be hidden away off of PATH on that OS. That's just expected
insanity; UNIX has been this way since 1980's. And these days, nobody even
gives the flying flip about standards that we briefly did in the 90's
after the first round of unix wars.
But it's the 2010's now, and we've certainly learned something.
So, let's make it so sometimes sha1 is a crazy program that wants to run as
root so it can lock memory while prompting for a passphrase, and outputting
binary garbage. Yes, that'd be wise. Let's package that in major Linux
distros, too, so users can stumble over it.
bup 0.25 does not accept that; and bup split reads from stdin by
default if no file is given. I'm not sure what version of bup changed this.
This only affected bup special remotes that were encrypted.
Monitors git-annex branch for changes, which are noticed by the Merger
thread whenever the branch ref is changed (either due to an incoming push,
or a local change), and refreshes cached config values for modified config
files.
Rate limited to run no more often than once per minute. This is important
because frequent git-annex branch changes happen when files are being
added, or transferred, etc.
A primary use case is that, when preferred content changes are made,
and get pushed to remotes, the remotes start honoring those settings.
Other use cases include propigating repository description and trust
changes to remotes, and learning when a remote has added a new special
remote, so the webapp can present the GUI to enable that special remote
locally.
Also added a uuid.log cache. All other config files already had caches.
in= was problimatic in two ways. First, it referred to a remote by name,
but preferred content expressions can be evaluated elsewhere, where that
remote doesn't exist, or a different remote has the same name. This name
lookup code could error out at runtime. Secondly, in= seemed pretty useless.
in=here did not cause content to be gotten, but it did let present content
be dropped.
present is more useful, although "not present" is unstable and should be
avoided.
When in a subdir, both the normal filepath, and the filepath relative to
the top of the git repo are needed for matching. The former for key lookup,
and the latter for include/exclude to match against. Previously, key lookup
didn't work in this situation.
The old code was just wrong in taking fromPath of GIT_DIR -- that made an
localUnknown location with the GIT_DIR in it, which only worked by
accident, and failed in submodules.
Now that this is handled correctly, git-annex can be used in git submodules.
Also, fixed infelicity where Git.CurrentRepo and Git.Config.updateLocation
were both dealing with core.worktree. Now updateLocation handles it for
Local as well as for LocalUnknown repos.
Aka solve the github problem.
Note that it's possible the initial configlist will fail for some network
reason etc, and then the fetch succeeds. In this case, a usable remote gets
disabled. But it does print a message, and this only happens once per
remote, so that seems ok.
There was one forkProcess lurking in test.hs, and that seems to be
responsible for recent buildd failures on amd64 and armhf. I was able to
reproduce it pretty easily on amd64, and even once on i386, and it was
clearly that same bad old threaded runtime hang. So removing this
forkProcess should fix it. Odd that it lurked for some months before
popping up.
Setting GIT_INDEX_FILE clobbers the rest of the environment, making git
not read ~/.gitconfig, and blow up if GECOS didn't have a name for the
user.
I'm not entirely happy with getEnvironment being run every time now,
that's somewhat expensive. It may make sense to just set GIT_COMMITTER_*
and GIT_AUTHOR_*, but I worry that clobbering the rest could break PATH,
or GIT_PATH, or something else that might be used by a command run in here.
And caching the environment is not a good idea either; it can change..
webapp: Adds newly created repositories to one of these groups:
clients, drives, servers
This is heuristic, but it's a pretty good heuristic, and can always be
configured.
Both when queueing downloads, and uploads, consults the preferred content
settings.
I didn't make it check yet when requeing failed transfers or queuing
deferred downloads; dealing with the preferred content settings (or indeed,
other settings) changing while the assistant is running still needs work.
Incomplete; I need to finish parsing and saving. This will also be used
for editing transfer control expresssions.
Removed the group display from the status output, I didn't really
like that format, and vicfg can be used to see as well as edit rempository
group membership.
Makes it safe to use git annex unlock with the watcher/assistant.
And also to mix use of the watcher/assistant with regular files stored in git.
Long ago, I had avoided doing this check, except during the startup scan,
because it would be slow to run ls-files repeatedly.
But then I added the lsof check, and to make that fast, got it to detect
batch file adds. So let's move the ls-files check to also occur when it'll
have a batch, and can check them all with one call.
This does slow down adding a single file by just a bit, but really only
a little bit. (The lsof check is probably more expensive.) It also
speeds up the startup scan, especially when there are lots of new files
found by the scan.
Also, fixed the sleep for annex.delayadd to not run while the threadstate
lock is held, so it doesn't unnecessarily freeze everything else.
Also, --force no longer makes it skip the lsof check, which was not
documented, and seems never a good idea.