Commit graph

769 commits

Author SHA1 Message Date
Joey Hess
c3f47ba389
make .noannex file prevent repo fixups
Avoid performing repository fixups for submodules and git-worktrees
when there's a .noannex file that will prevent git-annex from being
used in the repository.

This change is ok as long as the .noannex file is really going to prevent
git-annex from being used. But, init --force could override the file.
Which would result in the repo being initialized without the fixups
having run.

To avoid that situation decided to change init, to not let --force be used
to override a .noannex file. Instead the user can just delete the file.
2019-02-05 14:43:23 -04:00
Joey Hess
b080699a95
fromkey --json
* fromkey: Added --json.
* fromkey --batch output changed to support using it with --json.
  The old output was not parseable for any useful information, so
  this is not expected to break anything.
2019-02-05 14:03:29 -04:00
Joey Hess
7b46b43c48
fromkey: Made idempotent
If the worktree file already exists, and is annexed and uses the same
key, avoid failing, nothing needs to be done.

Had to add lookupFileNotHidden to handle the case where an adjust --hide-missing
is in use, and the worktree file was hidden due to the object content
being missing. lookupFile would return the key of the hidden file,
but it makes sense that after fromkey succeeds, the worktree must
contain the file it was supposed to set up.
2019-02-05 13:13:13 -04:00
Joey Hess
a64fca92f6
Fix race in cleanup of othertmp directory that could result in a failure attempting to access it.
Need to create the directory after the lock is held, not before.

The other racing process would need to shut down at just the wrong time,
running cleanupOtherTmp.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2019-02-02 13:56:31 -04:00
Joey Hess
7b9701675e
Display progress bar when getting files from export remotes
And moved the progress bar display into storeExport as well.

This commit was sponsored by John Pellman on Patreon.
2019-01-31 13:34:12 -04:00
Joey Hess
ab689cf0cd
Improved speed of S3 remote by only loading S3 creds once
This gets back any speed lost in commit
9cebfd7002, and speeds up all uses of S3
remotes that operate on them more than once.

This commit was sponsored by Brett Eisenberg on Patreon.
2019-01-30 16:20:14 -04:00
Joey Hess
720e5fda5c
export retrieval fallback to handle S3 remote with partially missing version IDs
When key-based retrieval from a S3 remote with exporttree=yes
appendonly=yes fails, fall back to trying to retrieve from the exported
tree. This allows downloads of files that were exported to such a remote
before versioning was enabled on it.

This is useful at least for a transition for users who got into that
situation, so they can download content from their S3 remote. May want to
remove this in the future though, since normally trying to download the
second time is only extra work.

This commit was sponsored by Brock Spratlen on Patreon.
2019-01-30 13:23:03 -04:00
Joey Hess
ad1d422dd7
fix false positive in export conflict detection
Like the earlier fixed one in Command.Export, it occurred when the same
tree was exported by multiple clones. Previous fix was incomplete since
several other places looked at the list of exported trees to detect when
there was an export conflict. Added a single unified function to avoid
missing any places it needed to be fixed.

This commit was sponsored by mo on Patreon.
2019-01-30 12:36:30 -04:00
Joey Hess
4cf7deb57e
releasing package git-annex version 7.20190129 2019-01-29 15:21:44 -04:00
Joey Hess
a8f1add4d1
S3: Detect when version=yes but an exported file lacks versioning, and refuse to delete it, to avoid data loss.
This commit was sponsored by Denis Dzyubenko on Patreon.
2019-01-29 15:07:27 -04:00
Joey Hess
bb9817ceae
enableremote S3: Do not let versioning=yes be set on existing remote
Because when git-annex lacks S3 version IDs for files stored in the bucket,
deleting them would cause data loss.

Also because git-annex is not able to download unversioned objects from a bucket
when versioning=yes.

This also prevents setting versioning=no. While that would perhaps be
possible to do safely, it would add complexity, and would mean that if
the user accidentially did enableremote versioning=no, they would not be
able to undo it.

This commit was sponsored by Trenton Cronholm on Patreon.
2019-01-29 14:09:50 -04:00
Joey Hess
ee011b3cbb
initremote S3: Automatically enable versioning in S3 buckets when configured with versioning=yes.
Needs not yet released version 0.22 of aws library; with older versions
asks the user to configure the bucket versioning themselves.

Note that S3 endpoints that don't support versioning will cause putBucketVersioning
to throw an exception, so initremote will fail.

This commit was sponsored by Jake Vosloo on Patreon.
2019-01-29 13:46:04 -04:00
Joey Hess
669b305de2
S3: Send a Content-Type header when storing objects in S3
So exports to public buckets can be linked to from web pages.

(When git-annex is built with MagicMime support.)

Thanks to Jared Cosulich for the idea.
2019-01-23 13:08:47 -04:00
Joey Hess
f918e8798f
releasing package git-annex version 7.20190122 2019-01-22 12:28:14 -04:00
Joey Hess
6ec7295870
Android: For armv71 architecture, use the armel build
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2019-01-22 11:50:29 -04:00
Joey Hess
9a4406e5e7
webapp: remove configurators for obsolste cloud storage services
* webapp: Remove configurator for box.com repository, since their
  webdav support is going away at the end of this January.
* webapp: Remove configurator for gitlab, which stopped supporting git-annex
  some time ago.

This commit was sponsored by Brock Spratlen on Patreon.
2019-01-22 11:48:35 -04:00
Joey Hess
112bb82fc2
Windows: If 64 bit git is installed, use it when installing git-annex.
However, rsync still won't work with 64 bit git and
this is still not the documented way to install it.

So, if both 64 and 32 are installed, go with 32.

And if neither git can be found, default to 32.
2019-01-21 15:51:48 -04:00
Joey Hess
e38b654096
Estimated time to completion display shortened from eg "1h1m1s" to "1h1m"
Because seconds accuracy over such a time is unlikely to be accurate.
Also, it was possible to get a ridiculous "1y1d1h1m1s" if stalled or
very slow.
2019-01-21 00:04:35 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
8555169e71
testremote: Support testing readonly remotes with the --test-readonly option
This commit was sponsored by Ilya Shlyakhter on Patreon.
2019-01-17 12:44:52 -04:00
Joey Hess
d79ac08532
devblog 2019-01-14 19:00:38 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
1791447cc8
avoid creating work tree files in subdirectories in an edge case
A keyName could contain "/", though this is unlikely and certianly only
ever could happen with WORM keys.

The change to addunused to escape that is no problem at all.

The change to VariantFile to escape it means that different versions of
git-annex could resolve a merge conflict differently in this case, which
is unfortunate. There would be different .variant files used, so the two
resolutions would themselves merge together without additional
conflicts, but the user would have to clean up the extra .variant
files.
2019-01-14 13:14:25 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
2eadb6cd68
convert transitions.log to attoparsec and bytestring-builder
Not likely to be any speed gain here, but this completes porting every
log file over.

And, it let me get rid of code copied from ghc and modified, so
simplifying the licensing.
2019-01-10 17:13:30 -04:00
Joey Hess
591e4b145f
convert old uuid-based log parsers to attoparsec
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.

There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo  bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo  bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.

Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...

For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1  group2 " due to excess
whitespace, which would be even more confusing behavior.

The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
2019-01-10 16:34:20 -04:00
Joey Hess
2fef43dd71
convert all per-uuid log files to use Builder
Mostly didn't push the ByteStrings down very deep, but all of these log
files are not written to frequently at all, so slight remaining
innefficiency doesn't matter.

In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in
git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was
last touched by that ancient git-annex version, the descriptions of remotes
would appear missing when used with this version of git-annex. That is such minor
breakage, and so unlikely to still be a problem for any repos, that it was not
worth forward-porting that code to ByteString.
2019-01-09 14:00:35 -04:00
Joey Hess
ccd75c60d2
correct ghc version number 2019-01-05 16:07:53 -04:00
Joey Hess
2e0e557e75
Support being built with ghc 8.0.1 (MonadFail)
Tested on an older ghc by enabling MonadFailDesugaring globally.

In TransferQueue, the lack of a MonadFail for STM exposed what would
normally be a bug in the pattern matching, although in this case an
earlier check that the queue was not empty avoided a pattern match
failure.
2019-01-05 11:55:15 -04:00
Joey Hess
11d6e2e260
new improved benchmark command that can benchmark anything git-annex does 2019-01-04 13:46:36 -04:00
Joey Hess
3ba6e9bb96
use attoparsec parser for String parsing, 10x speedup
This is not as efficient as using ByteStrings throughout, but converting
the String to ByteString is actually significantly faster than the old
parser.

    benchmarking parse/old
    time                 9.657 μs   (9.600 μs .. 9.732 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 9.703 μs   (9.645 μs .. 9.785 μs)
    std dev              231.6 ns   (161.5 ns .. 323.7 ns)
    variance introduced by outliers: 25% (moderately inflated)

    benchmarking parse/new
    time                 834.6 ns   (797.1 ns .. 886.9 ns)
                         0.987 R²   (0.976 R² .. 0.999 R²)
    mean                 816.4 ns   (802.7 ns .. 845.1 ns)
    std dev              62.39 ns   (37.66 ns .. 108.4 ns)
    variance introduced by outliers: 82% (severely inflated)

There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
2019-01-02 13:28:44 -04:00
Joey Hess
6512b40bac
importfeed: Better error message when downloading the feed fails
It used to display the "bad feed content" message indicating there were no
enclosures found, which was misleading when the http request for the feed
failed.

This commit was sponsored by Ewen McNeill on Patreon.
2018-12-30 16:14:55 -04:00
Joey Hess
a26514d67e
Fix doubled progress display when downloading an url when -J is used.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.

Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.

When -J is not used, the extra progress meter didn't display,
but an extra blank line did get output, which is also fixed.

This commit was sponsored by John Pellman on Patreon.
2018-12-30 12:29:49 -04:00
Joey Hess
365286279f
unused: Update suggested git log message to see where data was previously used so it will also work with v7 unlocked pointer files. 2018-12-19 13:53:49 -04:00
Joey Hess
5759e93444
honor init --version=5 on crippled filesystem
init: When --version=5 is passed on a crippled filesystem, use a v5 direct
mode repo as requested, rather than upgrading to v7 adjusted unlocked.

Fixed test suite on crippled filesystems, making it request --version=5
to test direct mode.
2018-12-19 13:17:04 -04:00
Joey Hess
14971414dc
Make test suite work better when the temp directory is on NFS.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.

Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
2018-12-19 12:44:56 -04:00
Joey Hess
6d381df0e6
sync --content: Fix dropping unwanted content from the local repository
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.

Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
2018-12-18 13:58:12 -04:00
Joey Hess
426bdbf113
releasing package git-annex version 7.20181211 2018-12-11 16:33:30 -04:00
Joey Hess
bbf7dcc193
fix bugs involving v7 unlocked files and direct mode
* Fix bug upgrading from direct mode to v7: when files in the repository
  were already committed as v7 unlocked files elsewhere, and the
  content was present in the direct mode repository, the annexed files
  got their full content checked into git.
* Fix bug that caused v7 unlocked files in a direct mode repository
  to get locked when committing.

This commit was sponsored by Nick Piper on Patreon.
2018-12-11 13:47:35 -04:00
Joey Hess
11dbb829bc
Fix a case where upgrade to v7 caused git to think that unlocked files were modified
When a file was already unlocked, but the annex object was present, the
upgrade process populated the unlocked file, but neglected to update the
index.

This commit was sponsored by Jochen Bartl on Patreon.
2018-12-11 13:05:03 -04:00
Joey Hess
3f587d447a
fix webdav reversion
webdav: When initializing, avoid trying to make a directory at the top of
the webdav server, which could never accomplish anything and failed on
nextcloud servers. (Reversion introduced in version 6.20170925.)

This commit was sponsored by mo on patreon.
2018-12-10 12:49:51 -04:00
Joey Hess
904be4e6be
add --branch option to git-annex find and mildly deprecate findref in favor of it
No deprecation warning at run time, just one on the man page.

One thing findref remains able to do that find cannot is to run in a bare
repo. Find was made to refuse to run in a bare repo because it seemed
confusing for it to not list any files ever in that situation. It would be
better for find --branch to work in a bare repo but not without --branch
but I don't currently have a way to do that.

Probably a better solution would be to make git-annex in a bare repo
default to --branch master or something like that instead of --all.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-12-09 14:10:37 -04:00
Joey Hess
029ae8d4db
support findred and --branch with file matching options
* findref: Support file matching options: --include, --exclude,
  --want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
  --exclude, --want-get, --want-drop to filenames from the branch.
  Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.

This commit was sponsored by Jake Vosloo on Patreon.
2018-12-09 13:38:35 -04:00
Joey Hess
4579dd6201
S3: Improve diagnostics when a remote is configured with exporttree and versioning, but no S3 version id has been recorded for a key.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.

When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.

In both cases, display a hopefully useful warning.

This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.

Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-12-06 13:44:37 -04:00
Joey Hess
1d16605f93
releasing package git-annex version 7.20181205 2018-12-05 16:19:11 -04:00
Joey Hess
ab7746a2ae
annex.cachecreds: New config to allow disabling of credentials caching for special remotes.
Note that it does not prevent storing p2p access tokens or multicast
encryption keys, since those are not cached; the previous commit
established the distinction.

How well this works depends on how often getRemoteCredPair is called and
how expensive it is. In some cases setting this will result in an annoying
number of gpg password prompts and/or slowdowns due to reading creds
from the git-annex branch and decrypting, which could be improved by calling
getRemoteCredPair less often.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2018-12-04 14:16:56 -04:00
Joey Hess
aa8243df4c
dropunused edge case when annex.thin caused unused object to be modified
dropunused: When an unused object file has gotten modified, eg due to
annex.thin being set, don't silently skip it, but display a warning and let
--force drop it.

This commit was sponsored by Ethan Aubin.
2018-12-04 12:20:34 -04:00
Joey Hess
b8f9dea27d
add exportedtree to info
info: When used with an exporttree remote, includes an "exportedtree" info,
which is the tree last exported to the remote. During an export conflict,
multiple values will be listed.

This commit was sponsored by John Pellman on Patreon.
2018-12-03 14:36:00 -04:00
Joey Hess
865d556103
fix init in cripped filesystem version issues
* init: When a crippled filesystem causes an adjusted unlocked branch to
  be used, set repo version to 7, which it neglected to do before.
* init: When on a crippled filesystem, and the git version is too old
  to use an adjusted unlocked branch, fall back to using direct mode.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2018-12-03 12:57:23 -04:00
Joey Hess
19372e47ea
Fix build without concurrent-output.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-12-03 12:33:00 -04:00
Joey Hess
ecdba3ed3f
When running youtube-dl to get a filename, pass --no-playlist
Seems that youtube-dl --get-filename on a playlist lists all the filenames
for the playlist, which can take quite some time. The code already only
took the first name, so --no-playlist can speed it up a lot.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-11-28 17:14:47 -04:00
Joey Hess
5a741c624e
Make bittorrent special remote work w/o btshowmetainfo installed when it was build with torrentparser. Thanks, Robert Schütz 2018-11-27 12:31:06 -04:00
Joey Hess
f81eaaf411
releasing package git-annex version 7.20181121 2018-11-21 14:24:04 -04:00
Yaroslav Halchenko
e80bb8bc4b
Meld ReproNim into Yarik/DataLad's identity 2018-11-21 14:04:28 -04:00
Joey Hess
95506d17f2
Updated stack.yaml to lts-12.19
And added stack-lts-9.9.yaml to support old versions of stack.
The i386 ancient autobuilder needs stack-lts-9.9.yaml; the OSX autobuilder
may also use it for a while, and it's needed to build on eg debian stable.
2018-11-20 14:00:02 -04:00
Joey Hess
e8f57a2254
typo 2018-11-20 12:02:21 -04:00
Joey Hess
7eddee0a67
add thanks 2018-11-20 11:57:12 -04:00
Joey Hess
ec896c1cd3
remove stack.yaml update item
That didn't actually happen, newer lts like that one are not supported
by the version of stack in Debian stable, used for the i386-ancient
autobuild, and generally I want git-annex to be buildable on stable
releases of linux distros etc. So stack.yaml is going to be stuck on old
versions for some time until some years after stack stops breaking backwards
compatability.
2018-11-20 11:52:29 -04:00
Joey Hess
f62114e5ad
Merge branch 'remove-esqueleto' 2018-11-20 11:50:04 -04:00
Joey Hess
3c1e5ac0a3
changelog for now fixed crash 2018-11-19 18:59:45 -04:00
Joey Hess
39fbaa0682
catch all (non-async) exceptions when running a commandAction
When a command is operating on multiple files and there's an error with
one, try harder to continue to the rest. (As was already done for many
types of errors including IO errors.)

This handles cases like lockContentForRemoval throwing an exception when
the content is already locked. Just because a drop of one file fails, does
not mean it shouldn't go on to try to drop other files.

I looked over uses of `giveup` in Command/*; there are too many to check
them all extensively, but none stood out as being problems that should let
one commandAction stop running other commandActions. Worst case, something
bad will happen and rather than stopping right away with an error,
git-annex will display multiple errors as it fails over and over on each
file. I don't think I ever really intended `error`/`giveup` to stop other
commandActions; this was a relic of old confusion over haskell exception
handling.

Test suite passes.

This commit was sponsored by Ethan Aubin.
2018-11-15 15:59:43 -04:00
Joey Hess
c8bd5710b1
check onlyActionOn in Drop
* drop -J: Avoid processing the same key twice at the same time when
  multiple annexes files use it.

This prevents a drop of a key conflicting with another drop of the same
key.

This commit was sponsored by Brock Spratlen on Patreon.
2018-11-15 15:43:51 -04:00
Joey Hess
71cc9cfaa2
improve smudge --clean behavior on outside work tree files
smudge: When passed a file located outside the working tree, eg by git
diff, avoid erroring out.

This commit was sponsored by Ewen McNeill on Patreon.
2018-11-15 13:04:40 -04:00
Joey Hess
c3fa1f2b08
avoid redundant export uploads
export, sync --content: Avoid unnecessarily trying to upload files to an
exporttree remote that already contains the files.

When the export was origianly made in one repo and now git-annex is
running in a different repo, the export database is not yet populated with
information about the exportLocation of files. So, it was trying to upload
the files to the export, even when it already contained them.

sync --content would first download the content from the export, and then
re-upload the content back.

And this also led to "not available" failures for each file that was not
locally present yet.

Fix: Just use checkPresentExport before uploading; if it succeeds update
the database.

This is a surprising oversight, it's possible it fixes a reversion because
I would have thought I'd have noticed this problem when originally
developing exporttree remotes.

This commit was sponsored by Jochen Bartl on Patreon.
2018-11-14 11:47:40 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
ff9bd9620e
Fix resume of download of url when the whole file content is already actually downloaded
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-12 16:08:47 -04:00
Joey Hess
5ce078da92
bash completion fix
Fix bash completion of "git annex" to propertly handle files with spaces
and other problem characters. (Completion of "git-annex" already did.)

This commit was sponsored by Jake Vosloo on Patreon.
2018-11-12 13:23:05 -04:00
Joey Hess
46dc52a317
update 2018-11-10 12:30:39 -04:00
Joey Hess
f78f97780c
Fix build with persistent-sqlite older than 2.6.3.
This commit was sponsored by Jack Hill on Patreon.
2018-11-09 13:09:02 -04:00
Joey Hess
6ecd55a9fa
Fixed some other potential hangs in the P2P protocol
Finishes the start made in 983c9d5a53, by
handling the case where `transfer` fails for some other reason, and so the
ReadContent callback does not get run. I don't know of a case where
`transfer` does fail other than the locking dealt with in that commit, but
it's good to have a guarantee.

StoreContent and StoreContentTo had a similar problem.
Things like `getViaTmp` may decide not to run the transfer action.
And `transfer` could certianly fail, if another transfer of the same
object was in progress. (Or a different object when annex.pidlock is set.)

If the transfer action was not run, the content of the object would
not all get consumed, and so would get interpreted as protocol commands,
which would not go well.

My approach to fixing all of these things is to set a TVar only
once all the data in the transfer is known to have been read/written.
This way the internals of `transfer`, `getViaTmp` etc don't matter.

So in ReadContent, it checks if the transfer completed.
If not, as long as it didn't throw an exception, send empty and Invalid
data to the callback. On an exception the state of the protocol is unknown
so it has to raise ProtoFailureException and close the connection,
same as before.

In StoreContent, if the transfer did not complete
some portion of the DATA has been read, so the protocol is in an unknown
state and it has to close the conection as well.

(The ProtoFailureMessage used here matches the one in Annex.Transfer, which
is the most likely reason. Not ideal to duplicate it..)

StoreContent did not ever close the protocol connection before. So this is
a protocol change, but only in an exceptional circumstance, and it's not
going to break anything, because clients already need to deal with the
connection breaking at any point.

The way this new behavior looks (here origin has annex.pidlock = true so will
only accept one upload to it at a time):

git annex copy --to origin -J2
copy x (to origin...) ok
copy y (to origin...)
  Lost connection (fd:25: hGetChar: end of file)

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-06 14:52:32 -04:00
Joey Hess
983c9d5a53
git-annex-shell: fix transfer hang
Fix hang when transferring the same objects to two different clients at the
same time. (Or when annex.pidlock is used, two different objects to the
same or different clients.)

Could also potentially occur if a client was downloading an object and
somehow lost connection but that git-annex-shell was still running and
holding the transfer lock.

This does not guarantee that, if `transfer` fails for some other reason,
a DATA response will be made.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-06 13:00:37 -04:00
Joey Hess
3016e94dbb
releasing package git-annex version 7.20181105 2018-11-05 13:33:36 -04:00
Joey Hess
76a25fdcf0
Fix test suite failure when git-annex test is not run inside a git repository
Not the first time this kind of test suite breakage has happened..
It would be good to avoid somehow it looking up from .t and finding a git
repo. But just running the test suite from time to time outside of
git-annex would also let me notice these before the distribution packagers
do.

This commit was sponsored by mo on Patreon.
2018-11-05 13:31:49 -04:00
Joey Hess
abe4b7ebd6
importfeed: Avoid erroring out when a feed has been repeatedly broken
That can leave other imported files not checked into git, because the git
command queue is not flushed when git-annex errors out. And since it only
happens once git-annex has concluded a feed is broken, it's an intermittent
bug, worst kind. Been seeing it for a while, only tracked down today.

Instead, by returning False, git-annex importfeed will cleanly shutdown and
still exit nonzero.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-11-04 17:41:49 -04:00
Joey Hess
0b053b9611
Fix a P2P protocol hang
When readContent got Nothing from prepSendAnnex, it did not run its
callback, and the callback is what sends the DATA reply.

sendContent checks with contentSize that the object file is present, but
that doesn't really guarantee that prepSendAnnex won't return Nothing.

So, it was possible for a P2P protocol GET to not receive a response,
and appear to hang. When what it's really doing is waiting for the next
protocol command.

This seems most likely to happen when the annex is in direct mode, and the
file being requested has been modified. It could also happen in an indirect
mode repository if genInodeCache somehow failed. Perhaps due to a race
with a drop of the content file.

Fixed by making readContent behave the way its spec said it should,
and run the callback with L.empty in this case.

Note that, it's finee for readContent to send any amount of data
to the callback, including L.empty. sendBytes deals with that
by making sure it sends exactly the specified number of bytes,
aborting the protocol if it's too short. So, when L.empty is sent,
the protocol will end up aborting.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-02 13:41:50 -04:00
Joey Hess
2ca408dc33
Increase minimum QuickCheck version. 2018-10-31 15:53:22 -04:00
Joey Hess
8f9278787f
releasing package git-annex version 7.20181031 2018-10-31 15:46:57 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
Joey Hess
4431b82bce
migrate: Fix failure to migrate from URL keys. (Reversion introduced in version 6.20180926) 2018-10-29 16:36:36 -04:00
Joey Hess
c472c268c4
webapp: Fixed a crash when adding a git remote.
Reversion introduced in 2b66492d6e which added a new cache that needs to be
cleared.
2018-10-29 16:01:08 -04:00
Joey Hess
a622488758
remove CHECKURL-MULTI single url response special case
Removed undocumented special case in handling of a CHECKURL-MULTI response
with only a single file listed. Rather than ignoring the url that was in
the response, use it. This allows external special remotes that want to
provide some better url to do so, although I don't entirely agree with
using CHECKURL-MULTI to accomplish that. I'm more of the feeling that an
undocumented special case that throws data away is just not a good idea.

This could in theory break some external special remote program that relied
on the current behavior, but its seems unlikely that it would because such
a program must already handle the multiple url case, unless it only ever
provides a single url response to CHECKURL-MULTI.

Make addurl --file work with a single item CHECKURL-MULTI response.
It already did for external special remotes due to the special case,
but now it also will for builtin ones like the BitTorrent special remote.

This commit was sponsored by Ilya Shlyakhter on Patron.
2018-10-29 14:52:12 -04:00
Joey Hess
3af29b3ba9
When annex.thin is set, allow hard links to be made between executable work tree files and annex objects.
This is safe, because while the annex object ends up executable,
there were already at least two other cases where it ended up executable:

1. git add an an executable file
2. chmod +x of a a non-executable worktree file that was hard linked to the
   annex object

After copy/hard link, it always fixes up the permissions to match the mode
of the worktree file, so when an executable annex object gets hard linked
to a non-executable worktree file, its execute bit gets removed.

Commit b7c8bf5274 already *said* it would do
this; I suspect the line of code I've removed was included in that commit
accidentially.

Also improves annex.thin documentation.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-26 13:51:43 -04:00
Joey Hess
9f87133bf5
snap --version= to auto-upgrade
This makes --version=6 still work, despite v6 not being in
supportedVersions. Which is useful for scripts that use it.

I didn't document it on the man page, because it's indistinguishable
from an automatic upgrade after initting as v6.
2018-10-26 11:44:05 -04:00
Joey Hess
d59995b9ee
default to v7 adjusted unlocked in crippled filesystem
init: When in a crippled filesystem, initialize a v7 repository using an
adjusted unlocked branch, instead of a direct mode repository.

Direct mode is deprecated, so this makes sense to do already I hope.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-10-25 18:49:57 -04:00
Joey Hess
b996b38b4f
fix autoupgrade from v6 to go to v7, not v5
v3 and v4 still autoupgrade to v5

And a few more upgrade doc updates.
2018-10-25 18:40:04 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
ca7de61454
git post-checkout and post-merge hooks
* init, upgrade: Install git post-checkout and post-merge hooks that run
  git annex smudge --update.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* Note that git has no hooks run after git stash or git cherry-pick,
  so the user will have to manually run git annex smudge --update
  after such commands.

Nothing currently installs the hooks into v6 repos that already exist.
Something will need to be done about that, either move this behavior to v7,
or document that the user will need to manually fix up their v6 repos.

This commit was sponsored by Eric Drechsel on Patreon.
2018-10-25 15:59:51 -04:00
Joey Hess
917a2c6095
defer updating unlocked files until after smudge filter
The smuge filter no longer provides git with annexed file content, to
avoid a git memory leak, and because that did not honor annex.thin.

git annex smudge --update has to be run after a checkout to update
unlocked files in the working tree with annexed file contents.

No hooks yet to run it.

This commit was sponsored by Nick Piper on Patreon.
2018-10-25 15:08:20 -04:00
Joey Hess
c24e255de1
Fix concurrency bug that occurred on the first download from an exporttree remote
Block other threads while the export database is being constructed (or
updated) by the first thread to try to access it.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 12:59:10 -04:00
Joey Hess
d0b0589146
link to tip 2018-10-20 14:25:03 -04:00
Joey Hess
4a6ebb1034
make sync update adjusted branch to hide/unhide
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.

This commit was sponsored by Henrik Riomar.
2018-10-20 14:22:28 -04:00
Joey Hess
4a788fbb3b
sync --content now supports --hide-missing adjusted branches
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.

Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.

When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-19 17:51:25 -04:00
Joey Hess
24838547e2
adjust --hide-missing
* At long last there's a way to hide annexed files whose content
  is missing from the working tree: git-annex adjust --hide-missing
* When already in an adjusted branch, running git-annex adjust
  again will update the branch as needed. This is mostly
  useful with --hide-missing to hide/unhide files after their content
  has been dropped or received.

Still needs integration with sync and the assistant, and not as fast as it
could be, but already usable.

This commit was sponsored by Ethan Aubin.
2018-10-18 15:32:42 -04:00
Joey Hess
b2bafdb2fc
v6: Fix database inconsistency
That could cause git-annex to get confused about whether a locked file's
content was present, when the object file got touched.

Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell if it's still got the right
content.

Had to suppress output when inAnnex calls isUnmodified, otherwise
"(checksum...)" would be printed in places it ought not to be,
eg "git annex get" could turn out not need to get anything, and
so only display that.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-10-16 13:51:37 -04:00
Joey Hess
42842ea0ea
runshell: Use system locales when built with GIT_ANNEX_PACKAGE_INSTALL set
This is to work around https://github.com/datalad/datalad/issues/2769
which I don't know how to reproduce outside that environment, nor do I
understand the root cause of. For some time, Neurodebian has been working
around it by building its standalone debs with a patch that disables use
of the locales bundled with the standalone build, letting the system
locales be used.

Using the system locales is asking for trouble if there's
significant version skew between the system and bundled glibc, and
possibly also if the architeciture is different, or whatever. That's why
git-annex bundles and uses its own locales, because numerous users
reported real problems with using the system locales.

... However, in the specific case of the Neurodebian standalone debs,
the deb is built on a system very like the one it's targeted to be
installed on. Or well, so they assure me, although doc/install/Ubuntu.mdwn
also promotes those for use across all versions of Ubuntu, and the deb
is built avoiding xz so it will work with old versions of dpkg, so I wonder
how true it is. It does seem that, at least currently, there is no bad
version skew in the locales of the systems the deb is used on, since
it's already been using the system locales for some time.

Anyway, since the Neurodebian build already is setting
GIT_ANNEX_PACKAGE_INSTALL=1 in runshell, I made runshell use system
locales when that's set. This is a small scope creep for
GIT_ANNEX_PACKAGE_INSTALL, but it's not documented and AFAIK only used
for the Neurodebian build, so that seems ok. This will let them stop
carrying their patch for this forward.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-13 15:04:10 -04:00
Joey Hess
d14983ee68
webapp: fix termux detection
The bundled uname -o says Linux in termux; have runshell on Android
delete it so the termux one is used instead.

This fixes the webapp so it will enter Android mode.

This commit was sponsored by mo on Patreon.
2018-10-13 12:08:27 -04:00
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
426f0f3f4b
releasing package git-annex version 6.20181011 2018-10-11 13:50:53 -04:00
Joey Hess
0240775f32
adding arm64 build, and improved termux installation process
* Added arm64 Linux standalone build. (No autobuilder yet.)
* Improved termux installation process.

Added git-annex-install.sh script to avoid user needing to type as much in
termux. The scope of this script is limited; runshell handles the rest.

Runshell runs termux-fix-shebang on the shell scripts. The problem is
the bundled bin/sh script, deleting that script also works, but then the
others probably use the system Android /bin/sh, which could be old or
broken or not posix or whatever. Using termux sh to run the scripts is
better.

This commit was sponsored by Eric Drechsel on Patreon.
2018-10-11 13:32:00 -04:00
Joey Hess
a97ef366fa
Linux standalone: Avoid using bundled cp before envionment is fully set up.
On android arm64, I saw the cp fail with "Bad system call", because proot
has not run yet. runshell only recently started using cp, and it's bundled
with git-annex, so this fixes a reversion.

This commit was sponsored by Nick Piper on Patreon.
2018-10-10 16:02:25 -04:00
Joey Hess
6f0d8870df
Fix crash when exporttree is set to a bad value.
Made it impossible to recover from setting a bad value since enableremote
to change it would crash.

This commit was sponsored by Henrik Riomar on Patreon.
2018-10-10 10:44:54 -04:00
Joey Hess
def5d8b02c
Fix potential crash in exporttree database due to failure to honor uniqueness constraint
I don't know the circumstances, but have a report of this:

git-annex: failed to commit changes to sqlite database: Just SQLite3 returned
ErrorConstraint while attempting to perform step.

All 3 tables in the export db have uniqueness constraints on them,
insertUnique is used for all the rest, but this use of insertMany
means it doesn't check the constraint. I guess that's what caused the
crash, but I have not been able to test it yet.

Use putMany when available, as it should be faster than mapM of insertMany.

This commit was sponsored by Brock Spratlen on Patreon.
2018-10-09 16:56:33 -04:00
Joey Hess
91b799d1a6
export: Fix false positive in export conflict detection
It occurred when the same tree was exported by multiple clones. nub out
identical trees.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-09 15:54:12 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
4b793fb077
Fix reversion in support of annex.web-options
Inverted logic added as part of the url security fix made it always use
curl when annex.security.allowed-http-addresses=all unless annex.web-options
was set.

That nobody noticed kind of makes me wonder if anyone uses
annex.web-options..

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-04 13:43:29 -04:00
Joey Hess
6ba3dea566
annex.jobs
Added annex.jobs setting, which is like using the -J option.

Of course, -J overrides annex.jobs.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-10-04 12:47:27 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
9adee3f2fb
sync: Warn when a remote's export is not updated to the current tree because export tracking is not configured.
Only display the warning when the current branch has a tree that is not
the same as the tree in the export.

Note that it doesn't check to see if the current tree is
in incompleteExportedTreeish; it might be worth checking that and reminding
the user about an incomplete export, but when export tracking is not
configured, they are probably not in the right clone of the repository to
resolve the incomplete export.

This commit was sponsored by Ethan Aubin.
2018-09-27 15:41:18 -04:00
Joey Hess
012d67c3eb
releasing package git-annex version 6.20180926 2018-09-26 13:16:54 -04:00
Joey Hess
bc31b93c77
remote.name.annex-security-allow-unverified-downloads
Added remote.name.annex-security-allow-unverified-downloads, a per-remote
setting for annex.security.allow-unverified-downloads.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-25 15:34:47 -04:00
Joey Hess
177e45517f
improve back-compat of post-receive hook
* init: Improve generated post-receive hook, so it won't fail when
  run on a system whose git-annex is too old to support git-annex post-receive
* init: Update the post-receive hook when re-run in an existing repository.

This commit was sponsored by Jack Hill on Patreon.
2018-09-25 15:02:12 -04:00
Joey Hess
16cbecbd09
Revert "clean P2P protocol shutdown on EOF"
This reverts commit b18fb1e343.

That broke support for old git-annex-shell before p2pstdio was added.

The immediate problem is that postAuth had a fallthrough case
that sent an error back to the peer, but sending an error back when the
connection is closed is surely not going to work.

But thinking about it some more, making every function that uses receiveMessage
need to handle ProtocolEOF adds a lot of complication, so I don't want
to do that.

The commit only cleaned up the test suite output a tiny bit, so I'm just
gonna revert it for now.
2018-09-25 14:04:12 -04:00
Joey Hess
4ecba916a1
annex.maxextensionlength
Added annex.maxextensionlength for use cases where extensions longer than 4
characters are needed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-09-24 12:10:18 -04:00
Joey Hess
cc82f81227
More FreeBSD build fixes.
Untested, on FreeBSD but enough to fix the listed build errors.

Seems that System.Posix.Files must have used to export this stuff and it
was split.

This commit was sponsored by Peter on Patreon.
2018-09-24 11:25:56 -04:00
Joey Hess
1d1054faa6
added -z
Added -z option to git-annex commands that use --batch, useful for
supporting filenames containing newlines.

It only controls input to --batch, the output will still be line delimited
unless --json or etc is used to get some other output. While git often
makes -z affect both input and output, I don't like trying them together,
and making it affect output would have been a significant complication,
and also git-annex output is generally not intended to be machine parsed,
unless using --json or a format option.

Commands that take pairs like "file key" still separate them with a space
in --batch mode. All such commands take care to support filenames with
spaces when parsing that, so there was no need to change it, and it would
have needed significant changes to the batch machinery to separate tose
with a null.

To make fromkey and registerurl support -z, I had to give them a --batch
option. The implicit batch mode they enter when not provided with input
parameters does not support -z as that would have complicated option
parsing. Seemed better to move these toward using the same --batch as
everything else, though the implicit batch mode can still be used.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-09-20 16:11:47 -04:00
Joey Hess
2aae6e84af
Support newlines in filenames.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.

Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.

Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.

I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.

This commit was sponsored by Ewen McNeill on Patreon.
2018-09-20 13:45:44 -04:00
Yaroslav Halchenko
672973149f
BF: add netbase to Depends:
see https://github.com/nipy/heudiconv/issues/260 for more
context, but it seems to be required on a lean docker instances
for git annex to be usable
2018-09-19 15:28:59 -04:00
Joey Hess
b3c9c59d3d
--debug urls
When git-annex used wget and curl, --debug would show urls. So there can't
be any new security problem with doing so.

This commit was sponsored by John Pellman on Patreon.
2018-09-14 12:46:39 -04:00
Joey Hess
773084c49b
S3: Fix url construction bug
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.

As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.

This commit was sponsored by Eric Drechsel on Patreon.
2018-09-14 12:25:23 -04:00
Joey Hess
547d01fd0e
releasing package git-annex version 6.20180913 2018-09-13 15:50:50 -04:00
Joey Hess
677038199c
fix build with older aws
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.

This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.

This commit was sponsored by Nick Piper on Patreon.
2018-09-13 09:58:39 -04:00
Joey Hess
2743224658
change v6 git-annex add of staged unmodified unlocked file
v6: When a file is unlocked but has not been modified, and the unlocking is
only staged, git-annex add did not lock it. Now it will, for consistency
with how modified files are handled and with v5.

Note the removal of the sameInodeCache check. Otherwise it would see
that the unmodified file is unmodified and stop there. That check seems to have
been copied from the direct mode branch. But, direct mode had a specific
reason to check for unmodified content, that does not apply to v6.

The second pass means there is potential for a race, eg the unlocked
file could be modified in between the first and second passes.
No problem with that, since both passes do the same thing.

This commit was sponsored by Jake Vosloo on Patreon.
2018-09-12 14:00:05 -04:00
Joey Hess
942b466293
wording 2018-09-11 16:03:58 -04:00
Joey Hess
fdbdf64d87
fix reversions due to undocumented and buggy git behavior
* Don't use GIT_PREFIX when GIT_WORK_TREE=. because it seems git
  does not intend GIT_WORK_TREE to be relative to GIT_PREFIX in that
  case, despite GIT_WORK_TREE=.. being relative to GIT_PREFIX.
* Don't use GIT_PREFIX to fix up a relative GIT_DIR, because
  git 2.11 sets GIT_PREFIX set to a path it's not relative to.
  and apparently GIT_DIR is never relative to GIT_PREFIX.

Commit e50ed4ba48 led us down this path
by working around a git bug by relying on the barely documented GIT_PREFIX.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-09-11 15:54:21 -04:00
Joey Hess
7407a80c27
S3: Support AWS_SESSION_TOKEN
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-09-05 15:53:57 -04:00
Joey Hess
b600ad71ce
make linkToAnnex freezeContent the object file
v6: Fix annex object file permissions when git-annex add is run on a
modified unlocked file, and in some related cases.

If a hard link is made, don't freeze it; annex.thin
uses writable object files.

Also: For some reason, linkToAnnex used to thawContent src. I can see no
reason why it needed to do that, so I eliminated that.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-05 15:27:22 -04:00
Joey Hess
69907e397f
revert a few problem areas of git-annex.cabal patch 2018-09-05 11:47:00 -04:00
Joey Hess
69d4c8dce6
devblog 2018-08-30 15:52:44 -04:00
Joey Hess
f54c72d2e1
Fix build on FreeBSD
This must have been broken for years..

This commit was sponsored by Jack Hill on Patreon.
2018-08-29 12:09:03 -04:00
Joey Hess
c565340adc
stop using external hash programs, since cryptonite is faster
In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.

Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..

This commit was sponsored by Henrik Riomar on Patreon.
2018-08-28 18:10:58 -04:00
Joey Hess
759a87ad70
fix git command queue to be concurrency safe
Probably not noticed until now because the queue is large enough that two
threads each filling theirs at the same time and flushing is unlikely to
happen.

Also made explicit that each worker thread gets its own queue.
I think that was the case before, but if something was put in the queue
before worker threads were forked off, they could have each inherited the
same queue.

Could have gone with a single shared queue, but per-worker queues is more
efficient, because a worker can add lots of stuff to its own queue without
any locking.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-08-28 13:16:33 -04:00
Joey Hess
10138056dc
v6: avoid accidental conversion when annex.largefiles is not configured
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.

Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.

Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.

This commit was supported by the NSF-funded DataLad project.
2018-08-27 14:51:10 -04:00
Joey Hess
50fa17aee6
v6: recover from race between git mv and git-annex get/drop
Update pointer file next time reconcileStaged is run to recover from the
race.

Note that restagePointerFile causes git to run the clean filter,
and that will run reconcileStaged. So, normally by the time the git
annex get/drop command finishes, the race has already been dealt with.
It may be that, in some case, that won't happen and the race will be
dealt with at a later point. git-annex could run reconcileStaged at
shutdown if that becomes a problem.

This does not handle the situation where the git mv is committed before
git-annex gets a chance to run again. git commit does run the clean
filter, and that happens to re-inject the content if it was supposed to
be dropped but is still populated. But, the case where the file was
supposed to be gotten but is not populated is not handled yet.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 15:56:43 -04:00
Joey Hess
5e56d9b620
v6: Update associated files database when git has staged changes to pointer files
This commit was supported by the NSF-funded DataLad project.
2018-08-21 17:02:20 -04:00
Joey Hess
fa44bca8b3
linux standalone: When LOCPATH is already set, use it instead of the bundled locales.
It can be set to an empty string to use the system locales too. Of course
whether that will work depends on the amount of divergence.

This commit was supported by the NSF-funded DataLad project.
2018-08-20 12:20:54 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
a96972015d
massive v6 add speed/memory improvement
v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.

When built with an older git, still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.

This commit was supported by the NSF-funded DataLad project.
2018-08-09 18:17:46 -04:00
Joey Hess
12460fcea6
make --batch honor matching options
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.

Affected commands: find, add, whereis, drop, copy, move, get

In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?

Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.

git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-08-08 12:07:06 -04:00
Joey Hess
947599aad4
releasing package git-annex version 6.20180807 2018-08-07 16:22:16 -04:00
Joey Hess
2503cd63d0
prep release 2018-08-06 20:30:38 -04:00
Joey Hess
4c918437ab
Fix git-annex branch data loss that could occur after git-annex forget --drop-dead
Added getStaged, to get the versions of git-annex branch files staged in its
index, and use during transitions so the result of merging sibling branches
is used.

The catFileStop in performTransitionsLocked is absolutely necessary,
without that the bug still occurred, because git cat-file was already
running and was looking at the old index file.

Note that getLocal still has cat-file look at the git-annex branch, not the
index. It might be faster if it looked at the index, but probably only
marginally so, and I've not benchmarked it to see if it's faster at all. I
didn't want to change unrelated behavior as part of this bug fix. And as
the need for catFileStop shows, using the index file has added
complications.

Anyway, it still seems fine for getLocal to look at the git-annex branch,
because normally the index file is updated just before the git-annex branch
is committed, and so they'll contain the same information. It's only during
a transition that the two diverge.

This commit was sponsored by Paul Walmsley in honor of Mark Phillips.
2018-08-06 17:36:30 -04:00
Joey Hess
38ddd6072d
addurl: Include filename in --json-progress output when known. 2018-08-06 12:53:44 -04:00
Joey Hess
1a02fc1159
Fix wrong sorting of remotes when using -J
It was sorting by uuid, rather than cost!

Avoid future bugs of this kind by changing the Ord to primarily compare
by cost, with uuid only used when the cost is the same.

This commit was supported by the NSF-funded DataLad project.
2018-08-03 13:10:50 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
6e6c9cc6d3
Added --accessedwithin matching option.
Useful for dropping old objects from cache repositories.

But also, quite a genrally useful thing to have..

Rather than imitiating find's -atime and other options, all of which are
pretty horrible to use, I made this match files accessed within a time
period, using the same duration format used by git-annex schedule and
--limit-time

In passing, changed the --limit-time option parser to parse the
duration, instead of having it later throw an error.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 15:34:03 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
2884637cab
S3: Support credential-less download from remotes configured with public=yes exporttree=yes.
This commit was supported by the NSF-funded DataLad project.
2018-07-31 16:32:43 -04:00
Joey Hess
e1ab01f94d
Fix reversion in display of http 404 errors.
Switch to using http-client for large file downloads caused the reversion;
the code for displaying a 404 response was instead displaying the raw html
document, which is not useful.

This commit was sponsored by Ryan Newton on Patreon.
2018-07-31 12:15:26 -04:00
Joey Hess
e8ff5d8c66
releasing package git-annex version 6.20180719 2018-07-19 13:53:59 -04:00
Joey Hess
d986b57134
reorder 2018-07-18 14:48:06 -04:00
Joey Hess
22ff136230
prep for release tomorrow 2018-07-18 14:45:44 -04:00
Joey Hess
081f8e57c6
Support working trees set up by git-worktree.
Support working trees set up by git-worktree, by setting up some symlinks
such that git-annex links work right.

Also improved support for repositories created with --separate-git-dir.
At least recent git makes a .git file for those (older may have used a
symlink?), so that also needs to be converted to a symlink.

This commit was sponsored by Nick Piper on Patreon.
2018-07-18 14:27:26 -04:00
Joey Hess
e50ed4ba48
work around git bug
Work around git bug that runs smudge/clean filters at the top of the
repository while passing them a relative GIT_WORK_TREE that may point
outside of the repository, by using GIT_PREFIX to get back to the
subdirectory where a relative GIT_WORK_TREE is valid.

git devs have been informed of the bug and may fix it, which could conveivably
break this fix, but as it is, this works back to git 1.7.6.

This commit was sponsored by Jochen Bartl on Patreon.
2018-07-17 14:27:39 -04:00
Joey Hess
50609da787
fix User-Agent reversion
Send User-Agent and any configured annex.http-headers when downloading with
http, fixes reversion introduced when switching to http-client.

This commit was sponsored by mo on Patreon.
2018-07-16 11:56:47 -04:00
Joey Hess
cc2cb46857
unused --from: Allow specifiying a repository by uuid or description.
This commit was sponsored by Jake Vosloo on Patreon.
2018-07-11 16:01:35 -04:00
Joey Hess
25ec8ec4c6
update re writable HOME with standalone bundle 2018-07-10 14:22:37 -04:00
Joey Hess
e802323071
deal with the persistent locpath issue
linux standalone: Generate locale files in ~/.cache/git-annex/locales/ so
they're available even when the standalone tarball is installed in a
directory owned by root.

This avoids a full-on reference counting cleanup hell, by letting old
locale caches linger as long as the standalone bundle directory associated
with them is still around. Old ones get cleaned up.

In the case where the directory has a new bundle unpacked over top of it,
the old locale cache is invalidated and rebuilt. Of course, running
programs using that may get confused, but this was already the case, and
unpacking over top of a bundle is probably not a good idea anyhow.

To support that, added a buildid file, which only needs to be unique across
builds of git-annex with different libc versions. sha1sum of git-annex
seems good enough for that.

Removed debian/patches/standalone-no-LOCPATH as it's no longer
necessary.

This commit was supported by the NSF-funded DataLad project.
2018-07-10 12:13:19 -04:00
Joey Hess
3dd7f450c1
fix p2p --pair
p2p --pair: Fix interception of the magic-wormhole pairing code, which
since 0.8.2 it has sent to stderr rather than stdout.

This is highly annoying because I had asked the magic wormhole developers
for a machine-readable way to get the data, and instead they changed how
the data was output, and didn't even mention this in my issue, or in the
changelog.

Seems this needs to be tested periodically to make sure it's still working.

This commit was sponsored by Ethan Aubin.
2018-07-04 15:14:03 -04:00
Joey Hess
9f3a346f25
fix nested exception bug
Fix reversion introduced in version 6.20180316 that caused git-annex to
stop processing files when unable to contact a ssh remote.

The bug was not in any of the changed lines, but this one in inAnnex:

P2PHelper.checkpresent (Ssh.runProto rmt connpool (cantCheck rmt) fallback) key

cantCheck throws an exception, but that parameter to runProto expects a
value, which it returns. So, inAnnex is returning a Bool containing an
exception. This defeats the usual checks for checkPresent throwing an
exception, crashing git-annex.

Fixed by making runProto take an `Annex a` instead of an `a`, so
passing cantCheck to it doesn't nest exceptions.

This commit was sponsored by andrea rota.
2018-07-03 13:10:43 -04:00
Joey Hess
14557a3ff6
git-annex.cabal: Fix network version.
Needed for hostAddressToTuple.

Which means the build flag for the network-uri split is no longer needed.
2018-07-01 13:07:24 -04:00
Joey Hess
a63bbd868b
make addurl of media url fail when youtube-dl is disabled
addurl: When security configuration prevents downloads with youtube-dl,
still check if the url is one that it supports, and fail downloading it,
instead of downloading the raw web page.
2018-06-28 13:01:18 -04:00
Joey Hess
dc6cb6aa5f
Merge branch 'later' 2018-06-25 21:59:20 -04:00
Joey Hess
3160cadba3 git-annex version 6.20180626
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKKUAw1IH6rcvbA8l2xLbD/BfjzgFAlstCaQACgkQ2xLbD/Bf
 jzh5nxAAn7D9soTI0ex6AVDDo2CjOyTTDVrIcl2h5XizfuUD3ev5P0TR3BZmzpAb
 MI6uaZ8kxqZ/eGAsBTyH9PsV7QVYIdht9t89ytP4xWyTQiOgjyJeA6PnJl4zVK9z
 Y8Of3mlylaz+97+sndljpsvy/KHENrHI7HHd+qxAu7wKysJxG6fJB7CjremkjaCI
 zAwg3mIy72ZKyuR/8hL9puJN9fdfw1ulkzQR+he007e/HkurPCwgRAOYW/Aa2tpY
 Oigdb9a6/0nl/VnOS8ZyHrSPRrhLH9c4IBmsdC1Xt5NDVmID/sWgD9uPF9dsHSMF
 OM25QdSlJ5cSNg+/XCpmmhC9MjgKkuVNpZ/fWBaHFs6KYgGhtZcAayQdz5AmMS2N
 HTPWB1IxZiV5TQHQpLbdH/q3RfNtRq1G1tc24zpd/zdhzijeTM6D8n4No6LXNq8X
 7U0qcrp9TdLOpBCTf6Jrg/7qFaXddHoEW1e3KrsOmB0hlYHuNxfY4bs0+ROeXGOT
 00koezcbF8kEI0ekoDvJjtVqaUq+608YjJZ5v7dE0vbtTj0KGbl5EHwC9atUluCX
 MHyTDY89uq68g4HIDytL001ZLvE3EUGJc4jh3+OMDzuZSKB5uwJIIky+qIaQu34K
 QJrZuyAIY0sVFV6LUX9nwqTW6Nnx/bB+kZ6k0+gx+Lpf7pUpE+o=
 =kex4
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKKUAw1IH6rcvbA8l2xLbD/BfjzgFAlsxnX4ACgkQ2xLbD/Bf
 jzjK1xAAnJ58ZxLyTYlCZRcKiR81UHS/Mk6+SDAjRIRbT0SsY+6gSP55XKjrcuOb
 Jatp+6cNNSgk2lBpn37mq+rYIqboFh9moDRK7JSh1mDHCVtIwdARGblFRfuwaWPi
 xHnu+Pj43+SP7OF+8qP8/kDM+js3iMS+0gvBBz8pQN/yJDROXii6u0eONOd7vbER
 iRY9QpJdj5lp3hjaWfXt5iJC0re0eOAY4eUSHPsFIASysShnn33dFPOZ2hbhRKjR
 unQHUVIUE+ehmW3w9qIqn+9v2kca7laGK11cvzYRpmu/9rrvpf+RF1h42S8822dP
 CKHvxDkBGbyqTA+F9/6zpU1i9/ARgHFDpScRcdq7ZJi9FbWabKDklHCsgxwrkdXb
 +FXgb7N5Sa4+eVDNUf4rxldtLPX53nrtZ3IqrGiCWApCvbysNyP5kE0nix02l9z2
 xzY2vlpicx7TOMoO9mZesSFNgRzuFAbbya/zDJrz+xfgSRYXRYg58yTpmhpTFvSI
 h3Fw6+MYvehvRdAweLtoQt2p/UV2MAWrTpNzFoqgf2OCQOiH97ACDHn8Yki9rnQi
 NuMsqv9WOYQs4SaygDZMKemgAxftf3uaXiBW0RzHHwwWnDjHhqsEioOvOhNNyZbz
 U3OjKrH1JZlkNHlIBQD4BsWGLlIct66ZTU3k2OxPEp+mpEG/Xi4=
 =p+cW
 -----END PGP SIGNATURE-----

Merge tag '6.20180626' - previously embargoed security release
2018-06-25 21:56:43 -04:00
Joey Hess
6091b7b9db
info: Display uuid and description when a repository is identified by uuid, and for "here". 2018-06-24 17:38:18 -04:00
Joey Hess
a5228ac765
Support configuring remote.web.annex-cost and remote.bittorrent.annex-cost
Seems that has never worked before due to oversight.
2018-06-24 17:31:22 -04:00
Joey Hess
57dc30a029
finalize release 2018-06-22 10:37:01 -04:00
Joey Hess
dab55715da
add link to advistory 2018-06-22 10:27:22 -04:00
Joey Hess
787e46a44b
note that glacier was also limited 2018-06-21 16:40:31 -04:00
Joey Hess
a5460132a6
update version 2018-06-21 14:56:04 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
f34faad9aa
finalize changelog for release 2018-06-19 11:41:50 -04:00
Joey Hess
c81b879d39
got a CVE number 2018-06-18 17:56:18 -04:00
Joey Hess
3c0a538335
allow ftp urls by default
They're no worse than http certianly. And, the backport of these
security fixes has to deal with wget, which supports http https and ftp
and has no way to turn off individual schemes, so this will make that
easier.
2018-06-18 15:37:17 -04:00
Joey Hess
cc08135e65
prevent using local http proxies per annex.security.allowed-http-addresses
A local http proxy would bypass the security configuration. So,
the security configuration has to be applied when choosing whether to
use the proxy.

While http rebinding attacks against the dns lookup of the proxy IP
address seem very unlikely, this implementation does prevent them, since
it resolves the IP address once, checks it, and then reconfigures
http-client's proxy using the resolved address.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-06-18 13:32:20 -04:00
Joey Hess
e62c4543c3
default to not using youtube-dl, for security
Pity, but same reasoning as curl applies to it.

This commit was sponsored by Peter on Patreon.
2018-06-17 14:51:02 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
3f0d875b55
Include uname in standalone builds. 2018-06-16 10:02:05 -04:00
Joey Hess
b6e4ed9aa7
export: re-send lost exported files after fsck notices they're gone
When content has been lost from an export remote and  git-annex fsck --from
remote has noticed it's gone, re-running git-annex export or git-annex sync
--content will re-upload it.

Note that normally there's no way to remove a single file from an export.
doc/design/exporting_trees_to_special_remotes.mdwn talks about this
in the section "dropping from exports and copying to exports". But, if
a file is somehow deleted or corrupted on the export, and fsck notices
this, it will update the location log to say it's missing.

So, checking the location log when determining if a file needs to be sent
to the export will let such missing files be added back in. There's
otherwise no way to do so. It does not fall afoul of the races documented
in the abovementioned section, I think.

This commit was sponsored by Ryan Newton on Patreon.
2018-06-14 12:22:12 -04:00
Joey Hess
760f66829a
display p2pstdio stderr after auth
Display error messages that come from git-annex-shell when the p2p protocol
is used, so that diskreserve messages, IO errors, etc from the remote side
are visible again.

Felt like it should perhaps use outputError, so --json-error-messages would
include these, but as an async IO action, it can't, and this would need
MessageState to be converted to a tvar. Anyway, when not using p2pstdio,
that's not done; nor is it done for stderr from external special remotes
or other commands, so punted on the idea for now.

This commit was sponsored by mo on Patreon.
2018-06-12 14:59:05 -04:00
Joey Hess
90a3afb60f
adb: Android serial numbers are not all 16 characters long, so accept other lengths.
I can't find any documentation of how long it should be. Hard to imagine
it being shorter than 4 characters though, so put that in as a conservative
lower bound.

This commit was sponsored by Nick Piper on Patreon.
2018-06-12 13:56:01 -04:00
Joey Hess
c3c28f7617
add GETINFO to external protocol (for ronnypfa)
External special remotes can now add info to `git annex info $remote`, by
replying to the GETINFO message.

Had to generalize some helpers to allow consuming multiple messages from
the remote.

The code added to Remote/* here is AGPL licensed, thus changed the license
of the files.

This commit was sponsored by Jake Vosloo on Patreon.
2018-06-08 11:56:24 -04:00
Joey Hess
0f566ed242
removal of the rest of remoteGitConfig
In keyUrls, the GitConfig is used only by annexLocations
to support configured Differences. Since such configurations affect all
clones of a repository, the local repo's GitConfig must have the same
information as the remote's GitConfig would have. So, used getGitConfig
to get the local GitConfig, which is cached and so available cheaply.

That actually fixed a bug noone had ever noticed: keyUrls is
used for remotes accessed over http. The full git config of such a
remote is normally not available, so the remoteGitConfig that keyUrls
used would not have the necessary information in it.

In copyFromRemoteCheap', it uses gitAnnexLocation,
which does need the GitConfig of the remote repo itself in order to
check if it's crippled, supports symlinks, etc. So, made the
State include that GitConfig, cached. The use of gitAnnexLocation is
within a (not $ Git.repoIsUrl repo) guard, so it's local, and so
its git config will always be read and available.

(Note that gitAnnexLocation in turn calls annexLocations, so the
Differences config it uses in this case comes from the remote repo's
GitConfig and not from the local repo's GitConfig. As explained above
this is ok since they must have the same value.)

Not very happy with this mess of different GitConfigs not type-safe and
some read only sometimes etc. Very hairy. Think I got it this change
right. Test suite passes..

This commit was sponsored by Ethan Aubin.
2018-06-05 14:48:37 -04:00
Joey Hess
fc5888300f
fix annex-checkuuid
Fixed annex-checkuuid implementation, so that remotes configured that way
can be used. This was 100% broken from the first commit of it, oops.

This commit was sponsored by Øyvind Andersen Holm.
2018-06-04 16:52:22 -04:00
Joey Hess
2e6a6024c2
avoid unncessary version output differences in different contexts
Show operating system and repository version list when run outside
a git repo too.

Also made it only display the local repository version when in a git-annex
repo. Before it showed "unknown" when run in a git repo that was not
git-annex initialized. That seemed like confusing behavior.

This commit was sponsored by Jochen Bartl on Patreon.
2018-06-04 12:26:18 -04:00
Joey Hess
1c8ee99b46
Fix build with ghc 8.4+, which broke due to the Semigroup Monoid change
https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid

I am not happy with the fragile pile of CPP boilerplate required to support
ghc back to 7.0, which git-annex still targets for both the android build
and the standalone build targeting old linux kernels. It makes me unlikely
to want to use Semigroup more in git-annex, because the benefit of the
abstraction is swamped by the ugliness. I actually considered ripping out
all the Semigroup instances, but some are needed to use
optparse-applicative.

The problem, I think, is they made this transaction on too fast a timeline.
(Although ironically, work on it started in 2015 or earlier!)
In particular, Debian oldstable is not out of security support, and it's
not possible to follow the simpler workarounds documented on the wiki and
have it build on oldstable (because the semigroups package in it is too
old).

I have only tested this build with ghc 8.2.2, not the newer and older
versions that branches of the CPP support. So there could be typoes, we'll
see.

This commit was sponsored by Brock Spratlen on Patreon.
2018-05-30 12:28:43 -04:00
Joey Hess
33834140e6
releasing package git-annex version 6.20180529 2018-05-29 13:06:56 -04:00
Joey Hess
c3064edac9
setpresentkey: Added --batch support (for ronnypfa)
This commit was sponsored by Peter on Patreon.
2018-05-27 14:56:14 -04:00
Joey Hess
85f9360d9b
GIT_ANNEX_SHELL_APPENDONLY
Makes it allow writes, but not deletion of annexed content. Note that
securing pushes to the git repository is left up to the user.

This commit was sponsored by Jack Hill on Patreon.
2018-05-25 13:17:56 -04:00
Joey Hess
4b748970ad
reorder 2018-05-25 12:10:49 -04:00
Joey Hess
2da2ae0919
fix migration bug and make fsck warn
* migrate: Fix bug in migration between eg SHA256 and SHA256E,
  that caused the extension to be included in SHA256 keys,
  and omitted from SHA256E keys.
  (Bug introduced in version 6.20170214)
* migrate: Check for above bug when migrating from SHA256 to SHA256
  (and same for SHA1 to SHA1 etc), and remove the extension that should
  not be in the SHA256 key.
* fsck: Detect and warn when keys need an upgrade, either to fix up
  from the above migrate bug, or to add missing size information
  (a long ago transition), or because of a few other past key related
  bugs.

This commit was sponsored by Henrik Riomar on Patreon.
2018-05-23 14:07:51 -04:00
Joey Hess
caaedb2993
fix http-client gzip decompression bug
Prevent haskell http-client from decompressing gzip files, so downloads of
such files works the same as it used to with wget and curl.

Explicitly setting accept-encoding to "identity" is probably not needed,
but that's what wget sends (curl does not send the header), and since
http-client is trying to be excessively smart, it seems we need to set
hAcceptEncoding to something to prevent it from inserting its own,
and this seems better than some hack like "".

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-05-21 15:10:25 -04:00
Joey Hess
2fabd7cdb5
remove the older move --force, which never behaved as documented and seems useless
* move: --force was accidentially enabling two unrelated behaviors
  since 6.20180427. The older behavior, which has never been well
  documented and seems almost entirely useless, has been removed.
* copy: --force no longer does anything.

This commit was sponsored by Øyvind Andersen Holm.
2018-05-21 13:21:19 -04:00
Joey Hess
5204e1dd9d
Workaround for bug in an old version of cryptonite that broke https downloads, by using curl for downloads when git-annex is built with it.
This commit was supported by the NSF-funded DataLad project.
2018-05-20 14:12:37 -04:00
Joey Hess
442e607b0a
Don't allow entering a view with staged or unstaged changes.
In some cases, unstaged changes are safe, eg dotfiles in the top which
are not affected by a view. Or non-annexed files in general which would
prevent view branch checkout from proceeding. But in other cases,
particularly unstaged changes to annexed files, entering a view would wipe
out those changes! And so don't allow entering a view with any unstaged
changes.

Staged changes are not safe when entering a view, because the changes get
committed to the view branch, and so the user is unlikely to remember them
when they exit the view, and so will effectively lose them, even if they're
still present in the view branch.

Also, improved the git status parser, although the improvement turned out
to not really be needed.

This commit was sponsored by Eric Drechsel on Patreon.
2018-05-14 16:51:06 -04:00
Joey Hess
d7021d420f
reuse hashes of dotfiles/dirs/submodules when entering view
This fixes a crash when a git submodule has a name starting with a dot.
Such a submodule might contain dotfiles that are intended to be used when
inside the view (since a dot-directory that's not a submodule was already
preserved when entering a view). So, rather than eliminating the submodule
from the view, its git ls-files --stage hash is copied over into the view.

dotfiles/dirs have their git ls-files --stage hashes similarly copied over
to the view. This is more efficient and simpler than the old method,
and also won't break if git ever adds a new type of tree item, like was
done with submodules.

Since the content of dotfiles in the working tree is no longer hashed
when entering a view, when there are unstaged modifications, they are
not included in the view branch. Entering the view branch still works,
but git checkout shows "M .dotfile", and git diff will show the unstaged
changes. This seems like an improvement over the old behavior.

Also made Command.View not delete empty directories that are submodules
when entering a view, while still deleting other empty directories.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 15:35:20 -04:00
Joey Hess
0632c49c22
releasing package git-annex version 6.20180509 2018-05-09 16:20:43 -04:00
Joey Hess
db720f6a9c
Display error message when http download fails.
* Display error message when http download fails.

  There's nothing in the http-client library to nicely format a http
  exception, so in some cases it has to fall back to using show on it.
  Seems better than just saying "it failed" or only showing the http
  status code.

* Avoid forward retry when 0 bytes were received.

  forwardRetry was comparing Nothing to Just 0, and so thought there had
  been progress made when 0 bytes were received.

This commit was supported by the NSF-funded DataLad project.
2018-05-08 16:11:45 -04:00
Joey Hess
c0ffd02ac5
close almost all old Android app bug reports
The old git-annex Android app is now deprecated in favor of running
git-annex in termux. I suspect all or nearly all of these no longer apply.

This commit was sponsored by Jochen Bartl on Patreon.
2018-05-08 15:00:46 -04:00
Joey Hess
7dc28dc705
Support building with hinotify-0.3.10.
Kept backwards compat with old versions via a shim.

This commit was sponsored by mo on Patreon.
2018-05-08 14:43:06 -04:00