Initremote sets that, so after both initremote and enableremote,
the git config will be set.
Any remote that does not use Annex.SpecialRemote won't set
annex-config-uuid. But that's only Remote.Git, which doesn't use
RemoteConfig anyway.
This avoids some extra work, but I don't think it was possible for two ssh
endpoint discoveries run concurrently to both prompt for the ssh password;
Annex.Ssh itself deals with concurrency.
This is mostly groundwork for http password prompting.
tryGitConfigRead may run ensureInitialized first, but when checkuuid = false,
that is skipped. So, make sure it's run before all onLocal actions.
ensureInitialized is inexpensive, so the extra call by tryGitConfigRead
is not a big deal. But since it was easy to do, I made it only be run
once by all calls to onLocal.
A few calls to onLocal didn't call ensureInitialized before. Notably,
the checkPresent action didn't, and does now.
That means that there's a guarantee that any necessary repo upgrades
will be run before the checkPresent action runs in the repo. Which is
important especially for the direct mode conversion, because without
that upgrade, the checkPresent action would need to support direct mode
still. Now I can remove the last bits of direct mode support in
Annex.Content without worrying that it will break accessing remotes
that have not been upgraded.
This does necessarily mean that checkPresent needs to write to the disk
when performing such a repo upgrade. The other remote actions already
did, so retrieval from a readonly remote that needed to be upgraded would
fail. Having checkPresent also fail doesn't seem like a large reversion,
especially since it already failed in the default case when checkuuid = true.
Prompted by the test suite on windows failing to with "export foo failed"
and no information about what went wrong.
Note that only storeExportWithContentIdentifier has been converted.
storeExport still returns a Bool and so exceptions may be hidden.
However, storeExportWithContentIdentifier has many more failure modes,
since it needs to avoid overwriting modified files. So it's more
important it have better error display.
The test suite was intermittently failing with rsync complaining it
could not write to dest.
get foo (from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
20 100% 0.00kB/s 0:00:00 ^M 20 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)
(from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
20 100% 0.00kB/s 0:00:00 ^M 20 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)
rsync: open "/home/joey/src/git-annex/.t/tmprepo1103/.git/annex/tmp/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77" failed: Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]
It seems that the first rsync actually transferred the file, but then for some
reason git-annex thinks it failed, so it retries. The second rsync then fails
because the first rsync copied the file mode over and so the file is not
writable now.
So, this fixes that problem, but leaves open the question of why git-annex
would think rsync failed when it wrote the file and didn't output any
error message. Possibly a bug in rsyncProgress that either hides an
error message, or somehow makes rsync unhappy?
Using Logs.RemoteState for this means that if the same key gets uploaded
twice to a git-lfs remote, but somehow has different content the two
times (eg it's an URL key with non-stable content), the sha256/size of
the newer content uploaded will overwrite what was remembered before. That
seems ok; it just means that git-annex will request the newer version of
the content when downloading from git-lfs.
It will remember the sha256 and size if both are not known, or if only
the sha256 is not known but the size is known, it only remembers the
sha256, to avoid wasting space on the size. I did not add special case
for when the sha256 is known and the size is not, because it's been a
long time since git-annex created SHA256 keys without a size.
(See doc/upgrades/SHA_size.mdwn)
The protocol design allows the server to respond with some other object;
if a server for some reason a server did that, it would not be right for
git-annex to download its content. I don't think it would be a security
hole, since git-annex is downloading a specific key and will verify the
key's content. Seems like a good idea to belt-and-suspenders test for
such a misuse of the protocol.
This is a special remote and a git remote at the same time; git can pull
and push to it and git-annex can use it as a special remote.
Remote.Git has to check if it's configured as a git-lfs special remote
and sets it up as one if so.
Object methods not implemented yet.
In 40ecf58d4b I changed the license of code I
wrote from GPL to AGPL. But, two files containing code I wrote combined
with code by others were updated to say their license is AGPL, while in
fact part of it was (the code I wrote) but part remained under the original
license (the code written by others).
Remote/Ddar.hs is now changed entirely back to GPL 3.
Annex/DirHashes.hs stays AGPL, but I broke out Utility/MD5.hs with the code
not written by me, and corrected its license statement to GPL-2, which
is the actual version of the GPL included with the code in its original
distribution at http://www.cs.ox.ac.uk/people/ian.lynagh/md5/
Improved probing when CoW copies can be made between files on the same
drive. Now supports CoW between BTRFS subvolumes. And, falls back to rsync
instead of using cp when CoW won't work, eg copies between repos on the
same EXT4 filesystem.
Rather than trying cp --reflink=always for each file copied to a remote,
it's tried once and if it fails it falls back to using rsync thereafter
for the lifetime of the Remote object. That avoids overhead of calling cp
which while small, will add up over a large number of files.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.
Avoid a delay at startup when concurrency is enabled and there are
rsync or gcrypt special remotes, which was caused by git-annex
opening a ssh connection to the remote too early.
sshOptions makes a connection to the ssh server if one is not already open,
when concurrency is enabled. Avoid doing that at startup, when the remote
list is being built, but the remote may not be used at all.
Instead, rsync/gcrypt now runs sshOptions once per ssh connection to the
server. This should not be significant overhead since Remote.Git already
has the same overhead (as do Bup and Ddar).
When a remote is configured to be readonly, don't allow changing what's
exported to it.
This was missed in the original export remote implementation, but it makes
sense for a readonly export remote to not be allowed to change.
* Added mimeencoding= term to annex.largefiles expressions.
This is probably mostly useful to match non-text files with eg
"mimeencoding=binary"
* git-annex matchexpression: Added --mimeencoding option.
As well as adding the necessary methods, a few other changes to the adb
remote:
* Use ".annextmp" extension for temp files, to avoid conflict with other
temp files.
* Stop using "echo $?" to get exit status of command inside adb.
There were two problems; first the "echo" just before it meant it was
always 0! And secondly, it seems kind of random on my phone whether it's
1 or 0, not dependant on whether the command seems to have succeeded.
Unfortunately, "port" has to be set by default, or the old git-annex
will crash when trying to enable the S3 remote.
So, when protocol=https is specified, it needs to override port=80,
since it may be a default setting.
protocol=https implies port=443 and
port=443 implies protocol=https
-- this was necessary because the existing configs set port=443, but
with a protocol setting, users will naturally want to use it, and then
there's no need for them to supply the default https port. So we keep
back-compat, add a nicer way to enable https, and also add support for
non-standard https ports.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Avoid a warning message when renameExport is not supported, and just
fallback to deleting with a subsequent re-upload. Especially needed for
importtree remotes, where renameExport needs to be disabled.
This changes the external special remote protocol, but in a
backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older
version of git-annex will cause it to make renameExport return False.
This is not super efficient; it would be better to lock the database
once and build up a queue of changes and flush once.
But, storeExportWithContentIdentifier is likely going to be the really
expensive part, so let's do the simple thing and only optimise later if
needed.
git-annex: thread blocked indefinitely in an STM transaction
failed
git-annex: sqlite query crashed
CallStack (from HasCallStack):
error, called at ./Database/Handle.hs:98:42 in main:Database.Handle
failed
This needs further investigation.
Use same, simpler method to make only one thread open the export db as
is used for the ContentIdentifier db.
And, always update the export db once before using.
Had to add two more API calls to override export APIs that are not safe
for use in combination with import.
It's unfortunate that removeExportDirectory is documented to be allowed
to remove non-empty directories. I'm not entirely sure why it's that
way, my best guess is it was intended to make it easy to implement with
just rm -rf.
For now, it's only allowed when exporttree=yes is also set.
That simplified the implementation, but could later be changed if
there's a remote that makes sense to be an import but not an export.
However, it may work just as well to make a remote be readonly to
prevent export to it while still allowing import.
Not sure if my reasoning about the races really holds.
It would certianly be possible to better guard against races by using
Linux-specific renameat2 with RENAME_EXCHANGE or RENAME_NOREPLACE.
Or by using link and relying on it not overwriting existing files -- but
that would need a filesystem that supports hard links and directory can
be used in filesystems that don't.
This does not avoid all possible races, but it does avoid all likely
ones, and is demonstratably better than git's own handling of races
where files get modified at the same time as it's updating the working
tree.
The main thing this won't detect are not unlikely races where part
of a file gets changed while it's being copied and then the file is
restored to its original condition before the modification check.
No, it's more likely that the limitations of checking inode, size,
and mtime won't detect certian modifications, involving eg mmapped
files.
Made some api changes.
listImportableContents needs to provide the size
of the data, so the downloader can check disk free space.
retrieveExportWithContentIdentifier is passed the filepath to write to
Use temporary "CID" key during download of a ContentIdentifier from a
remote, so withTmp can be used and then move the content to the real key
once it's known.
Installing git-annex with stack rsync won't be available.
Also, using the git-annex installer with 64 bit git installs a non-working
rsync binary because it's linked with libraries provided by 32 bit git.
xporting files with '#' or '?' in their name won't work because urls get
truncated on those. Fail in a better way in this case, and avoid failing
when removing such files from the export, so after the user has renamed the
problem files the export will succeed.
This gets back any speed lost in commit
9cebfd7002, and speeds up all uses of S3
remotes that operate on them more than once.
This commit was sponsored by Brett Eisenberg on Patreon.
Pushed the ResourceT out into larger code blocks, and made sure that
the the http result from a sendS3Handle is processed inside the same
ResourceT block.
I don't think this fixes any bugs, but it allows getting rid of a scary
comment.
This commit was sponsored by Eric Drechsel on Patreon.
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.
Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.
As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.
This commit was sponsored by Nick Piper on Patreon.
resourcePrepare does not cause the resource to only be prepared once.
The http manager should be reused, which does avoid http connection
overhead, but not because of the use of resourcePrepare.
I seem to have thought that a Preparer was only run once when a remote
is accessed multiple times, but that is not in fact the case. prepareS3Handle
is run once per access. So, there is no point to it.
That there is some duplicate work done on each access is now apparent.
Luckily, the http manager is reused, so only one http connection is
made. But the S3 creds are loaded repeatedly. Room for improvement here.
This commit was sponsored by Jack Hill on Patreon.
When key-based retrieval from a S3 remote with exporttree=yes
appendonly=yes fails, fall back to trying to retrieve from the exported
tree. This allows downloads of files that were exported to such a remote
before versioning was enabled on it.
This is useful at least for a transition for users who got into that
situation, so they can download content from their S3 remote. May want to
remove this in the future though, since normally trying to download the
second time is only extra work.
This commit was sponsored by Brock Spratlen on Patreon.
Like the earlier fixed one in Command.Export, it occurred when the same
tree was exported by multiple clones. Previous fix was incomplete since
several other places looked at the list of exported trees to detect when
there was an export conflict. Added a single unified function to avoid
missing any places it needed to be fixed.
This commit was sponsored by mo on Patreon.