Avoid performing repository fixups for submodules and git-worktrees
when there's a .noannex file that will prevent git-annex from being
used in the repository.
This change is ok as long as the .noannex file is really going to prevent
git-annex from being used. But, init --force could override the file.
Which would result in the repo being initialized without the fixups
having run.
To avoid that situation decided to change init, to not let --force be used
to override a .noannex file. Instead the user can just delete the file.
If the worktree file already exists, and is annexed and uses the same
key, avoid failing, nothing needs to be done.
Had to add lookupFileNotHidden to handle the case where an adjust --hide-missing
is in use, and the worktree file was hidden due to the object content
being missing. lookupFile would return the key of the hidden file,
but it makes sense that after fromkey succeeds, the worktree must
contain the file it was supposed to set up.
Need to create the directory after the lock is held, not before.
The other racing process would need to shut down at just the wrong time,
running cleanupOtherTmp.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
I am very doubtful that commit 613e747d91
was right about this doing anything, and I've verified that without it,
ctrl-c sends sigINT to child processes, and git-annex get does not
continue to the next item.
It seems likely that the real problem back then was something catching
the async exception.
Hard to see how installing a default signal handler could cause any
change from default behavior either.
One reason to want to get rid of this cruft now is that tasty has a
sigINT handler of its own, and this would override it.
(Tasty is not currently setting that handler up the way git-annex uses
it, due to a problem in tasty, but that will hopefully change.)
* Switch to using .git/annex/othertmp for tmp files other than partial
downloads, and make stale files left in that directory when git-annex
is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
delete anything lingering in there after it's 1 week old.
Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
Adding that field broke the Read/Show serialization back-compat,
and also the Eq and Ord instances were not blinded to it, which broke
git annex fsck and probably more.
I think that the new approach used in formatKeyVariety will be nearly
as fast, but have not benchmarked it.
This reverts commit 4536c93bb2.
That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.
It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.
Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer to common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
Now there's a ByteString used all the way from disk to Key.
The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.
Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
A keyName could contain "/", though this is unlikely and certianly only
ever could happen with WORM keys.
The change to addunused to escape that is no problem at all.
The change to VariantFile to escape it means that different versions of
git-annex could resolve a merge conflict differently in this case, which
is unfortunate. There would be different .variant files used, so the two
resolutions would themselves merge together without additional
conflicts, but the user would have to clean up the extra .variant
files.
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.
Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
Not likely to be any speed gain here, but this completes porting every
log file over.
And, it let me get rid of code copied from ghc and modified, so
simplifying the licensing.
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.
There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.
Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...
For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1 group2 " due to excess
whitespace, which would be even more confusing behavior.
The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
There should be some speed gains here, especially for chunk and remote
state logs, which are queried once per key.
Now only old-format uuid-based logs still need to be converted to attoparsec.
Mostly didn't push the ByteStrings down very deep, but all of these log
files are not written to frequently at all, so slight remaining
innefficiency doesn't matter.
In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in
git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was
last touched by that ancient git-annex version, the descriptions of remotes
would appear missing when used with this version of git-annex. That is such minor
breakage, and so unlikely to still be a problem for any repos, that it was not
worth forward-porting that code to ByteString.
Probably not any particular speedup in this, since most of these logs
are not written to often. Possibly chunk log writing is sped up, but
writes to chunk logs are interleaved with expensive data transfers to
remotes, so unlikely to be a noticiable speedup.
Most of the individual logs are not converted yet, only presense logs
have an efficient ByteString Builder implemented so far. The rest
convert to and from String.
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.
FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.
Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.
The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.
Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.
When -J is not used, the extra progress meter didn't display,
but an extra blank line did get output, which is also fixed.
This commit was sponsored by John Pellman on Patreon.
init: When --version=5 is passed on a crippled filesystem, use a v5 direct
mode repo as requested, rather than upgrading to v7 adjusted unlocked.
Fixed test suite on crippled filesystems, making it request --version=5
to test direct mode.
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.
Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
* Fix bug upgrading from direct mode to v7: when files in the repository
were already committed as v7 unlocked files elsewhere, and the
content was present in the direct mode repository, the annexed files
got their full content checked into git.
* Fix bug that caused v7 unlocked files in a direct mode repository
to get locked when committing.
This commit was sponsored by Nick Piper on Patreon.
When a file was already unlocked, but the annex object was present, the
upgrade process populated the unlocked file, but neglected to update the
index.
This commit was sponsored by Jochen Bartl on Patreon.
* findref: Support file matching options: --include, --exclude,
--want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
--exclude, --want-get, --want-drop to filenames from the branch.
Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.
This commit was sponsored by Jake Vosloo on Patreon.
dropunused: When an unused object file has gotten modified, eg due to
annex.thin being set, don't silently skip it, but display a warning and let
--force drop it.
This commit was sponsored by Ethan Aubin.
* init: When a crippled filesystem causes an adjusted unlocked branch to
be used, set repo version to 7, which it neglected to do before.
* init: When on a crippled filesystem, and the git version is too old
to use an adjusted unlocked branch, fall back to using direct mode.
This commit was sponsored by Ilya Shlyakhter on Patreon.
Seems that youtube-dl --get-filename on a playlist lists all the filenames
for the playlist, which can take quite some time. The code already only
took the first name, so --no-playlist can speed it up a lot.
This commit was sponsored by Brett Eisenberg on Patreon.
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.
This commit was sponsored by Trenton Cronholm on Patreon.
Fix hang when transferring the same objects to two different clients at the
same time. (Or when annex.pidlock is used, two different objects to the
same or different clients.)
Could also potentially occur if a client was downloading an object and
somehow lost connection but that git-annex-shell was still running and
holding the transfer lock.
This does not guarantee that, if `transfer` fails for some other reason,
a DATA response will be made.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Should be redundant, but test suite is ending up with
a lot of extra sqlite connections before unused keys database handles
get garbage collected.
While running the test suite, I often saw 2-4+ open fds to the same
repo's keys database. After this change, it seems to mostly have 1,
occasionally 2.
And that might explain some of the strange sqlite failures in the test suite.
Especially the failures of test_lock_v7_force, where the keys database
gets renamed to a new directory out from under sqlite.
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).
Including on Windows.
With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
This is safe, because while the annex object ends up executable,
there were already at least two other cases where it ended up executable:
1. git add an an executable file
2. chmod +x of a a non-executable worktree file that was hard linked to the
annex object
After copy/hard link, it always fixes up the permissions to match the mode
of the worktree file, so when an executable annex object gets hard linked
to a non-executable worktree file, its execute bit gets removed.
Commit b7c8bf5274 already *said* it would do
this; I suspect the line of code I've removed was included in that commit
accidentially.
Also improves annex.thin documentation.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
It's auto-upgraded to 5, so does not need to be listed there.
Let's keep supportedVersions for versions that git-annex will actually
use without autoupgrading or demanding an upgrade.
init: When in a crippled filesystem, initialize a v7 repository using an
adjusted unlocked branch, instead of a direct mode repository.
Direct mode is deprecated, so this makes sense to do already I hope.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Install new git hooks in this version.
This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.
I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.
The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.
newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.
This commit was sponsored by John Pellman on Patreon.
Usually, git won't run clean filter when a file is unmodified. But, when
git checkout runs git annex smudge --update, it populates the pointer
runs git update-index, which sees the file has changed and runs
git annex smudge --clean, which was checksumming the file unncessarily
as it re-ingested it.
With annex.thin set, this is the difference between git checkout of a
branch with a 1 gb file taking 30s and 0.1s.
This commit was sponsored by Brett Eisenberg on Patreon.
* init, upgrade: Install git post-checkout and post-merge hooks that run
git annex smudge --update.
* precommit: Run git annex smudge --update, because the post-merge
hook is not run when there is a merge conflict. So the work tree will
be updated when a commit is made to resolve the merge conflict.
* precommit: Run git annex smudge --update, because the post-merge
hook is not run when there is a merge conflict. So the work tree will
be updated when a commit is made to resolve the merge conflict.
* Note that git has no hooks run after git stash or git cherry-pick,
so the user will have to manually run git annex smudge --update
after such commands.
Nothing currently installs the hooks into v6 repos that already exist.
Something will need to be done about that, either move this behavior to v7,
or document that the user will need to manually fix up their v6 repos.
This commit was sponsored by Eric Drechsel on Patreon.
The smuge filter no longer provides git with annexed file content, to
avoid a git memory leak, and because that did not honor annex.thin.
git annex smudge --update has to be run after a checkout to update
unlocked files in the working tree with annexed file contents.
No hooks yet to run it.
This commit was sponsored by Nick Piper on Patreon.
It was trying to git annex adjust when in a direct mode repo, and that
of course fails. What I don't understand though, is how the test suite
managed to work before, when it was clearly checking the wrong thing.
Since the right way to fix it was obvious, I have not bisected.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.
This commit was sponsored by Henrik Riomar.
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.
Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.
When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.
This commit was sponsored by Jochen Bartl on Patreon.
Both Command.Sync and Annex.Ingest had their own versions of this.
The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem
to need it. That is only checking to see if it's in an adjusted unlocked
branch, and when in an adjusted branch, the branch does in fact exist,
so the added check that Git.Branch.current does is fine.
This commit was sponsored by Denis Dzyubenko on Patreon.
* At long last there's a way to hide annexed files whose content
is missing from the working tree: git-annex adjust --hide-missing
* When already in an adjusted branch, running git-annex adjust
again will update the branch as needed. This is mostly
useful with --hide-missing to hide/unhide files after their content
has been dropped or received.
Still needs integration with sync and the assistant, and not as fast as it
could be, but already usable.
This commit was sponsored by Ethan Aubin.
Combinations like --hide-misssing --unlocked seem very useful. On the
other hand, combining --fix with --unlock doesn't make sense because a
file can be either unlocked or a symlink that can be fixed, but not
both.
Changed the serialization of HideMissingAdjustment in passing, but it
has not actually been used yet so nothing will be broken.
This commit was sponsored by Trenton Cronholm on Patreon.
That could cause git-annex to get confused about whether a locked file's
content was present, when the object file got touched.
Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell if it's still got the right
content.
Had to suppress output when inAnnex calls isUnmodified, otherwise
"(checksum...)" would be printed in places it ought not to be,
eg "git annex get" could turn out not need to get anything, and
so only display that.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.
[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/
This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.
Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.
Some documentation specific to the Android app (screenshots etc) needs
to be updated still.
This commit was sponsored by Brett Eisenberg on Patreon.
Inverted logic added as part of the url security fix made it always use
curl when annex.security.allowed-http-addresses=all unless annex.web-options
was set.
That nobody noticed kind of makes me wonder if anyone uses
annex.web-options..
This commit was sponsored by Denis Dzyubenko on Patreon.
* init: Improve generated post-receive hook, so it won't fail when
run on a system whose git-annex is too old to support git-annex post-receive
* init: Update the post-receive hook when re-run in an existing repository.
This commit was sponsored by Jack Hill on Patreon.
v6: Fix annex object file permissions when git-annex add is run on a
modified unlocked file, and in some related cases.
If a hard link is made, don't freeze it; annex.thin
uses writable object files.
Also: For some reason, linkToAnnex used to thawContent src. I can see no
reason why it needed to do that, so I eliminated that.
This commit was sponsored by Brock Spratlen on Patreon.
Make git-annex sync and the assistant skip trying to drop from appendonly
remotes since it's just going to fail.
git-annex drop and similar commands will still try to drop from
appendonly, so the user will see failure messages when they try to do
that. To do otherwise would be confusing since the user has explicitly
asked for a drop with those commands.
This commit was supported by the NSF-funded DataLad project.
Probably not noticed until now because the queue is large enough that two
threads each filling theirs at the same time and flushing is unlikely to
happen.
Also made explicit that each worker thread gets its own queue.
I think that was the case before, but if something was put in the queue
before worker threads were forked off, they could have each inherited the
same queue.
Could have gone with a single shared queue, but per-worker queues is more
efficient, because a worker can add lots of stuff to its own queue without
any locking.
This commit was sponsored by Ole-Morten Duesund on Patreon.
Avoids annex.largefiles inconsitency and also avoids a lot of
unneccessary calls to the clean filter when a large repo's clone
is being initialized.
This commit was supported by the NSF-funded DataLad project.
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.
Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.
Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.
This commit was supported by the NSF-funded DataLad project.