Commit graph

1202 commits

Author SHA1 Message Date
Joey Hess
591e4b145f
convert old uuid-based log parsers to attoparsec
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.

There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo  bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo  bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.

Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...

For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1  group2 " due to excess
whitespace, which would be even more confusing behavior.

The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
2019-01-10 16:34:20 -04:00
Joey Hess
66603d6f75
attoparsec parsers for all new-format uuid-based logs
There should be some speed gains here, especially for chunk and remote
state logs, which are queried once per key.

Now only old-format uuid-based logs still need to be converted to attoparsec.
2019-01-10 13:30:36 -04:00
Joey Hess
1928b82867
marginally faster VectorClock Builder
show of a POSIXTime is 7-bit ascii, so no need to use the filesystem
encoding on it
2019-01-09 14:17:00 -04:00
Joey Hess
232b1a08f3
simplification now that all logs use Builder 2019-01-09 14:10:05 -04:00
Joey Hess
2fef43dd71
convert all per-uuid log files to use Builder
Mostly didn't push the ByteStrings down very deep, but all of these log
files are not written to frequently at all, so slight remaining
innefficiency doesn't matter.

In Logs.UUID, removed the fixBadUUID code that cleaned up after a bug in
git-annex versions 3.20111105-3.20111110. In the unlikely event that a repo was
last touched by that ancient git-annex version, the descriptions of remotes
would appear missing when used with this version of git-annex. That is such minor
breakage, and so unlikely to still be a problem for any repos, that it was not
worth forward-porting that code to ByteString.
2019-01-09 14:00:35 -04:00
Joey Hess
de4980ef85
simplify Show instance by deriving 2019-01-09 13:13:31 -04:00
Joey Hess
2d46038754
converting more log files to use Builder
Probably not any particular speedup in this, since most of these logs
are not written to often. Possibly chunk log writing is sped up, but
writes to chunk logs are interleaved with expensive data transfers to
remotes, so unlikely to be a noticiable speedup.
2019-01-09 13:06:37 -04:00
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
ef8ddaa713
attoparsec parser for presence logs 2019-01-03 15:27:29 -04:00
Joey Hess
bfc9039ead
convert git-annex branch access to ByteStrings and Builders
Most of the individual logs are not converted yet, only presense logs
have an efficient ByteString Builder implemented so far. The rest
convert to and from String.
2019-01-03 13:21:48 -04:00
Joey Hess
53905490df
convert Git.HashObject to use ByteStrings
Both lazy and strict, because sometimes it's more efficient to build a
small strict bytestring, and other times better to lazily stream.
2019-01-03 13:21:01 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
894716512d
add a UUIDDesc type containing a ByteString
Groundwork for handling uuid.log using ByteString
2019-01-01 16:17:54 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
9cc6d5549b
convert UUID from String to ByteString
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.

FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.

Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.

The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
2019-01-01 14:45:33 -04:00
Joey Hess
84e71dae2e
comment typo 2018-12-30 15:51:20 -04:00
Joey Hess
a26514d67e
Fix doubled progress display when downloading an url when -J is used.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.

Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.

When -J is not used, the extra progress meter didn't display,
but an extra blank line did get output, which is also fixed.

This commit was sponsored by John Pellman on Patreon.
2018-12-30 12:29:49 -04:00
Joey Hess
5759e93444
honor init --version=5 on crippled filesystem
init: When --version=5 is passed on a crippled filesystem, use a v5 direct
mode repo as requested, rather than upgrading to v7 adjusted unlocked.

Fixed test suite on crippled filesystems, making it request --version=5
to test direct mode.
2018-12-19 13:17:04 -04:00
Joey Hess
6d381df0e6
sync --content: Fix dropping unwanted content from the local repository
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.

Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
2018-12-18 13:58:12 -04:00
Joey Hess
bbf7dcc193
fix bugs involving v7 unlocked files and direct mode
* Fix bug upgrading from direct mode to v7: when files in the repository
  were already committed as v7 unlocked files elsewhere, and the
  content was present in the direct mode repository, the annexed files
  got their full content checked into git.
* Fix bug that caused v7 unlocked files in a direct mode repository
  to get locked when committing.

This commit was sponsored by Nick Piper on Patreon.
2018-12-11 13:47:35 -04:00
Joey Hess
992110c1be
remove debug 2018-12-11 13:10:33 -04:00
Joey Hess
11dbb829bc
Fix a case where upgrade to v7 caused git to think that unlocked files were modified
When a file was already unlocked, but the annex object was present, the
upgrade process populated the unlocked file, but neglected to update the
index.

This commit was sponsored by Jochen Bartl on Patreon.
2018-12-11 13:05:03 -04:00
Joey Hess
029ae8d4db
support findred and --branch with file matching options
* findref: Support file matching options: --include, --exclude,
  --want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
  --exclude, --want-get, --want-drop to filenames from the branch.
  Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.

This commit was sponsored by Jake Vosloo on Patreon.
2018-12-09 13:38:35 -04:00
Joey Hess
aa8243df4c
dropunused edge case when annex.thin caused unused object to be modified
dropunused: When an unused object file has gotten modified, eg due to
annex.thin being set, don't silently skip it, but display a warning and let
--force drop it.

This commit was sponsored by Ethan Aubin.
2018-12-04 12:20:34 -04:00
Joey Hess
865d556103
fix init in cripped filesystem version issues
* init: When a crippled filesystem causes an adjusted unlocked branch to
  be used, set repo version to 7, which it neglected to do before.
* init: When on a crippled filesystem, and the git version is too old
  to use an adjusted unlocked branch, fall back to using direct mode.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2018-12-03 12:57:23 -04:00
Joey Hess
efbf889e36
clarify comment 2018-11-30 12:37:45 -04:00
Joey Hess
ecdba3ed3f
When running youtube-dl to get a filename, pass --no-playlist
Seems that youtube-dl --get-filename on a playlist lists all the filenames
for the playlist, which can take quite some time. The code already only
took the first name, so --no-playlist can speed it up a lot.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-11-28 17:14:47 -04:00
Joey Hess
65bb30bcf5
fix accidental commit 2018-11-20 11:43:33 -04:00
Joey Hess
9c0cece35a
followup 2018-11-19 18:12:03 -04:00
Joey Hess
9127fe4821
add DebugLocks build flag
Using the method described in
https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell
but my own code to implement it, and with callstacks added.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-19 15:02:43 -04:00
Joey Hess
370757087d
catch lockContentForRemoval exception
removeKey should not throw exceptions, so catch exception there

In Assistant.Unused, keep trying to drop other keys if one drop fails
2018-11-15 15:39:57 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
983c9d5a53
git-annex-shell: fix transfer hang
Fix hang when transferring the same objects to two different clients at the
same time. (Or when annex.pidlock is used, two different objects to the
same or different clients.)

Could also potentially occur if a client was downloading an object and
somehow lost connection but that git-annex-shell was still running and
holding the transfer lock.

This does not guarantee that, if `transfer` fails for some other reason,
a DATA response will be made.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-06 13:00:37 -04:00
Joey Hess
1c71f563e0
explicitly close keys db in saveState
Should be redundant, but test suite is ending up with
a lot of extra sqlite connections before unused keys database handles
get garbage collected.

While running the test suite, I often saw 2-4+ open fds to the same
repo's keys database. After this change, it seems to mostly have 1,
occasionally 2.

And that might explain some of the strange sqlite failures in the test suite.
Especially the failures of test_lock_v7_force, where the keys database
gets renamed to a new directory out from under sqlite.
2018-10-30 22:19:32 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
Joey Hess
2e9f128dea
moved module and relicensed 2018-10-29 23:13:36 -04:00
Joey Hess
5d97898a7c
touch files with high-resolution timestamp
Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older)
so not really a new dep.
2018-10-29 22:25:21 -04:00
Joey Hess
497846d740
don't probe support for git-annex smudge --update
Any git-annex not supporting that doesn't support v7 repositories,
so will refuse to work in a repository that has this hook installed.
2018-10-26 14:37:43 -04:00
Joey Hess
3af29b3ba9
When annex.thin is set, allow hard links to be made between executable work tree files and annex objects.
This is safe, because while the annex object ends up executable,
there were already at least two other cases where it ended up executable:

1. git add an an executable file
2. chmod +x of a a non-executable worktree file that was hard linked to the
   annex object

After copy/hard link, it always fixes up the permissions to match the mode
of the worktree file, so when an executable annex object gets hard linked
to a non-executable worktree file, its execute bit gets removed.

Commit b7c8bf5274 already *said* it would do
this; I suspect the line of code I've removed was included in that commit
accidentially.

Also improves annex.thin documentation.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-26 13:51:43 -04:00
Joey Hess
e2c894d3df
remove debug prints 2018-10-26 12:56:40 -04:00
Joey Hess
679146384b
remove 3 from supportedVersions (no behavior change)
It's auto-upgraded to 5, so does not need to be listed there.
Let's keep supportedVersions for versions that git-annex will actually
use without autoupgrading or demanding an upgrade.
2018-10-25 18:50:44 -04:00
Joey Hess
d59995b9ee
default to v7 adjusted unlocked in crippled filesystem
init: When in a crippled filesystem, initialize a v7 repository using an
adjusted unlocked branch, instead of a direct mode repository.

Direct mode is deprecated, so this makes sense to do already I hope.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-10-25 18:49:57 -04:00
Joey Hess
5bb4308e1f
bump versionForAdjustedClone to v7 2018-10-25 18:46:16 -04:00
Joey Hess
b996b38b4f
fix autoupgrade from v6 to go to v7, not v5
v3 and v4 still autoupgrade to v5

And a few more upgrade doc updates.
2018-10-25 18:40:04 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
c28ca8294f
optimize smudge --clean of unmodified file
Usually, git won't run clean filter when a file is unmodified. But, when
git checkout runs git annex smudge --update, it populates the pointer
runs git update-index, which sees the file has changed and runs
git annex smudge --clean, which was checksumming the file unncessarily
as it re-ingested it.

With annex.thin set, this is the difference between git checkout of a
branch with a 1 gb file taking 30s and 0.1s.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-25 16:46:46 -04:00
Joey Hess
daa259ec6a
remove unused import 2018-10-25 16:25:21 -04:00
Joey Hess
ca7de61454
git post-checkout and post-merge hooks
* init, upgrade: Install git post-checkout and post-merge hooks that run
  git annex smudge --update.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* Note that git has no hooks run after git stash or git cherry-pick,
  so the user will have to manually run git annex smudge --update
  after such commands.

Nothing currently installs the hooks into v6 repos that already exist.
Something will need to be done about that, either move this behavior to v7,
or document that the user will need to manually fix up their v6 repos.

This commit was sponsored by Eric Drechsel on Patreon.
2018-10-25 15:59:51 -04:00
Joey Hess
917a2c6095
defer updating unlocked files until after smudge filter
The smuge filter no longer provides git with annexed file content, to
avoid a git memory leak, and because that did not honor annex.thin.

git annex smudge --update has to be run after a checkout to update
unlocked files in the working tree with annexed file contents.

No hooks yet to run it.

This commit was sponsored by Nick Piper on Patreon.
2018-10-25 15:08:20 -04:00
Joey Hess
94aa0e2f64
fix strange test failure
It was trying to git annex adjust when in a direct mode repo, and that
of course fails. What I don't understand though, is how the test suite
managed to work before, when it was clearly checking the wrong thing.
Since the right way to fix it was obvious, I have not bisected.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 16:51:09 -04:00
Joey Hess
63cf3381f0
fix incomplete renaming of function 2018-10-22 16:44:18 -04:00
Joey Hess
4a6ebb1034
make sync update adjusted branch to hide/unhide
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.

This commit was sponsored by Henrik Riomar.
2018-10-20 14:22:28 -04:00
Joey Hess
1191d3d22d
document --force 2018-10-20 11:53:35 -04:00
Joey Hess
4a788fbb3b
sync --content now supports --hide-missing adjusted branches
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.

Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.

When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-19 17:51:25 -04:00
Joey Hess
8be5a7269a
refactor getCurrentBranch
Both Command.Sync and Annex.Ingest had their own versions of this.

The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem
to need it. That is only checking to see if it's in an adjusted unlocked
branch, and when in an adjusted branch, the branch does in fact exist,
so the added check that Git.Branch.current does is fine.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-19 17:29:18 -04:00
Joey Hess
24838547e2
adjust --hide-missing
* At long last there's a way to hide annexed files whose content
  is missing from the working tree: git-annex adjust --hide-missing
* When already in an adjusted branch, running git-annex adjust
  again will update the branch as needed. This is mostly
  useful with --hide-missing to hide/unhide files after their content
  has been dropped or received.

Still needs integration with sync and the assistant, and not as fast as it
could be, but already usable.

This commit was sponsored by Ethan Aubin.
2018-10-18 15:32:42 -04:00
Joey Hess
a6c8de84b6
improve types to allow combining some adjustments
Combinations like --hide-misssing --unlocked seem very useful. On the
other hand, combining --fix with --unlock doesn't make sense because a
file can be either unlocked or a symlink that can be fixed, but not
both.

Changed the serialization of HideMissingAdjustment in passing, but it
has not actually been used yet so nothing will be broken.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-10-18 12:59:05 -04:00
Joey Hess
b2bafdb2fc
v6: Fix database inconsistency
That could cause git-annex to get confused about whether a locked file's
content was present, when the object file got touched.

Unfortunately this means more work sometimes when annex.thin is set,
since it has to checksum the file to tell if it's still got the right
content.

Had to suppress output when inAnnex calls isUnmodified, otherwise
"(checksum...)" would be printed in places it ought not to be,
eg "git annex get" could turn out not need to get anything, and
so only display that.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-10-16 13:51:37 -04:00
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
a9dd087074
centralized "yes"/"no" parsing
This commit was sponsored by Jack Hill on Patreon.
2018-10-10 11:14:27 -04:00
Joey Hess
4b793fb077
Fix reversion in support of annex.web-options
Inverted logic added as part of the url security fix made it always use
curl when annex.security.allowed-http-addresses=all unless annex.web-options
was set.

That nobody noticed kind of makes me wonder if anyone uses
annex.web-options..

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-04 13:43:29 -04:00
Joey Hess
177e45517f
improve back-compat of post-receive hook
* init: Improve generated post-receive hook, so it won't fail when
  run on a system whose git-annex is too old to support git-annex post-receive
* init: Update the post-receive hook when re-run in an existing repository.

This commit was sponsored by Jack Hill on Patreon.
2018-09-25 15:02:12 -04:00
Joey Hess
c63d28b39b
fix build with older ghc
null only used to work on lists, not sets.

Fixes the Android build and probably also i386ancient.

This commit was sponsored by mo on Patreon.
2018-09-12 14:10:08 -04:00
Joey Hess
fcff64f8bb
optimisation: avoid stat call
This commit was sponsored by Paul Walmsley on Patreon.
2018-09-05 17:26:12 -04:00
Joey Hess
b600ad71ce
make linkToAnnex freezeContent the object file
v6: Fix annex object file permissions when git-annex add is run on a
modified unlocked file, and in some related cases.

If a hard link is made, don't freeze it; annex.thin
uses writable object files.

Also: For some reason, linkToAnnex used to thawContent src. I can see no
reason why it needed to do that, so I eliminated that.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-05 15:27:22 -04:00
Joey Hess
0a7c5a9982
dropdead per-remote metadata
Had to refactor pure code into separate modules so it is accessible
inside Annex.Branch.Transitions.

This commit was sponsored by Peter on Patreon.
2018-09-05 13:52:46 -04:00
Joey Hess
813ee2357c
improve message 2018-09-02 16:08:00 -04:00
Joey Hess
76f32012af
avoid sync/assistant drop from appendonly
Make git-annex sync and the assistant skip trying to drop from appendonly
remotes since it's just going to fail.

git-annex drop and similar commands will still try to drop from
appendonly, so the user will see failure messages when they try to do
that. To do otherwise would be confusing since the user has explicitly
asked for a drop with those commands.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:23:57 -04:00
Joey Hess
759a87ad70
fix git command queue to be concurrency safe
Probably not noticed until now because the queue is large enough that two
threads each filling theirs at the same time and flushing is unlikely to
happen.

Also made explicit that each worker thread gets its own queue.
I think that was the case before, but if something was put in the queue
before worker threads were forked off, they could have each inherited the
same queue.

Could have gone with a single shared queue, but per-worker queues is more
efficient, because a worker can add lots of stuff to its own queue without
any locking.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-08-28 13:16:33 -04:00
Joey Hess
401a79675b
run git status before enabling clean filter
Avoids annex.largefiles inconsitency and also avoids a lot of
unneccessary calls to the clean filter when a large repo's clone
is being initialized.

This commit was supported by the NSF-funded DataLad project.
2018-08-28 10:36:22 -04:00
Joey Hess
10138056dc
v6: avoid accidental conversion when annex.largefiles is not configured
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.

Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.

Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.

This commit was supported by the NSF-funded DataLad project.
2018-08-27 14:51:10 -04:00
Joey Hess
9ff1c62a4d
fix race
If a pointer file is being populated and something modifies it at the
same time, there was a race there the modified file's InodeCache
could get added into the keys database.

Note that replaceFile normally renames the temp file into place, so the
inode cache caculated for the temp file will still be good. If it has to
fall back to a copy, the worktree file won't be put in the inode cache.
This has the same result as if the worktree file gets touched, and will
be handled the same way. Eg, when dropping, isUnmodified will do an
expensive comparison and notice that the worktree file does have the
same content, and so drop it.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 15:22:52 -04:00
Joey Hess
e094cf3377
split out modules from Annex.Content 2018-08-22 14:45:53 -04:00
Joey Hess
18ecf41917
avoid running reconcileStaged when the index has not changed
This commit was supported by the NSF-funded DataLad project.
2018-08-22 13:04:12 -04:00
Joey Hess
54d49eeac8
avoid update-index race
This commit was supported by the NSF-funded DataLad project.
2018-08-17 16:03:40 -04:00
Joey Hess
82c5dd8a01
queueing of internal IO actions on files
This would be better if getInternalFiles were
more polymorphic, but I can't see a good
way to accomplish that without messing with Data.Typeable,
which seemed like overkill.

Reverted CommandAction back to the simpler version.

This commit was sponsored by Eric Drechsel on Patreon.
2018-08-17 13:28:21 -04:00
Joey Hess
0f25d48639
pass absolute path to update-index
Test suite found a case where this is necessary.

And the man page says this, although current behavior is not as
documented..

           Note that files beginning with .  are discarded.
           This includes ./file and dir/./file. If you don’t want
           this, then use cleaner names.

This may hit path length limits on Windows. shrug

This commit was supported by the NSF-funded DataLad project.
2018-08-16 16:00:29 -04:00
Joey Hess
82a239675f
narrow the race where a file gets modified before update-index
Check just before running update-index if the worktree file's content is
still the same, don't update it when it's been modified. This narrows
the race window a lot, from possibly minutes or hours, to seconds or
less.

(Use replaceFile so that the worktree update happens atomically,
allowing the InodeCache of the new worktree file to itself be gathered
w/o any other race.)

This doesn't eliminate the race; it can still occur in the window before
update-index runs. When annex.queue is large, a lot of files will be
statted by the checks, and so the window may still be large enough to be a
problem.

When only a few files are being processed, the window is as small as it
is in the race where a modification gets overwritten by git-annex when
it updates the worktree. Or maybe as small as whatever race git
checkout/pull/merge may have when the worktree gets modified during it.
Still, I've kept a todo about this race.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 15:56:43 -04:00
Joey Hess
82cfcfc838
better index file refresh method
Use git update-index --refresh, since it's a little bit more
efficient and the user can be told to run it if a locked index prevents
git-annex from running it.

This also fixes the problem where an annexed file was deleted in the index
and a get of another file that uses the same key caused the index update to
add back the deleted file. update-index will not add back the deleted file.

Documented in tips/unlocked_files.mdwn the gotcha that the index update
may conflict with other operations. I can't see any way to possibly avoid
that conflict.

One new todo about a race that causes a modification to be accidentially
staged.

Note that the assistant only flushes the git command queue when it
commits a modification. I have not tested the assistant with v6 unlocked
files, but assume most users of the assistant won't care if the index
shows a file as modified for a while.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:16:24 -04:00
Joey Hess
5e87389f40
refactor 2018-08-15 13:46:28 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
4c918437ab
Fix git-annex branch data loss that could occur after git-annex forget --drop-dead
Added getStaged, to get the versions of git-annex branch files staged in its
index, and use during transitions so the result of merging sibling branches
is used.

The catFileStop in performTransitionsLocked is absolutely necessary,
without that the bug still occurred, because git cat-file was already
running and was looking at the old index file.

Note that getLocal still has cat-file look at the git-annex branch, not the
index. It might be faster if it looked at the index, but probably only
marginally so, and I've not benchmarked it to see if it's faster at all. I
didn't want to change unrelated behavior as part of this bug fix. And as
the need for catFileStop shows, using the index file has added
complications.

Anyway, it still seems fine for getLocal to look at the git-annex branch,
because normally the index file is updated just before the git-annex branch
is committed, and so they'll contain the same information. It's only during
a transition that the two diverge.

This commit was sponsored by Paul Walmsley in honor of Mark Phillips.
2018-08-06 17:36:30 -04:00
Joey Hess
1a02fc1159
Fix wrong sorting of remotes when using -J
It was sorting by uuid, rather than cost!

Avoid future bugs of this kind by changing the Ord to primarily compare
by cost, with uuid only used when the cost is the same.

This commit was supported by the NSF-funded DataLad project.
2018-08-03 13:10:50 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
1a9f5ecdb8
fix android/old ghc build 2018-07-19 13:27:29 -04:00
Joey Hess
081f8e57c6
Support working trees set up by git-worktree.
Support working trees set up by git-worktree, by setting up some symlinks
such that git-annex links work right.

Also improved support for repositories created with --separate-git-dir.
At least recent git makes a .git file for those (older may have used a
symlink?), so that also needs to be converted to a symlink.

This commit was sponsored by Nick Piper on Patreon.
2018-07-18 14:27:26 -04:00
Joey Hess
d3f06ad112
avoid unneccessary Maybe 2018-07-16 12:06:06 -04:00
Joey Hess
a63bbd868b
make addurl of media url fail when youtube-dl is disabled
addurl: When security configuration prevents downloads with youtube-dl,
still check if the url is one that it supports, and fail downloading it,
instead of downloading the raw web page.
2018-06-28 13:01:18 -04:00
Joey Hess
2c62f8e63d
remove empty tmp workdir on failure
No point in keeping an empty tmp workdir around.

The associated tmp object file is retained even if empty, didn't want to
deal with any possible races with something else downloading to that
file at the same time this would check if it's empty. Anyhow, temp
object files are normally retained, and this will get cleaned out the
same way those do, by dropunused.
2018-06-28 12:58:11 -04:00
Joey Hess
eb8a8976a9
comment 2018-06-21 20:54:02 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
923578ad78
improve error message
This commit was sponsored by Jack Hill on Patreon.
2018-06-19 14:21:41 -04:00
Joey Hess
cc08135e65
prevent using local http proxies per annex.security.allowed-http-addresses
A local http proxy would bypass the security configuration. So,
the security configuration has to be applied when choosing whether to
use the proxy.

While http rebinding attacks against the dns lookup of the proxy IP
address seem very unlikely, this implementation does prevent them, since
it resolves the IP address once, checks it, and then reconfigures
http-client's proxy using the resolved address.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-06-18 13:32:20 -04:00
Joey Hess
e62c4543c3
default to not using youtube-dl, for security
Pity, but same reasoning as curl applies to it.

This commit was sponsored by Peter on Patreon.
2018-06-17 14:51:02 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
b94294a43d
remove no longer needed uuid check in prepSocket
Since 3dd43df9c2, the socket warmup does
not run git-annex-shell on the remote host, and the point of this check
was to avoid error messages running git-annex-shell when it was not
installed. So the check is not needed any longer.

Also, this is one of only two uses of remoteGitConfig, which
I want to get rid of for reasons explained in
fc5888300f.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-05 12:51:17 -04:00
Joey Hess
d7021d420f
reuse hashes of dotfiles/dirs/submodules when entering view
This fixes a crash when a git submodule has a name starting with a dot.
Such a submodule might contain dotfiles that are intended to be used when
inside the view (since a dot-directory that's not a submodule was already
preserved when entering a view). So, rather than eliminating the submodule
from the view, its git ls-files --stage hash is copied over into the view.

dotfiles/dirs have their git ls-files --stage hashes similarly copied over
to the view. This is more efficient and simpler than the old method,
and also won't break if git ever adds a new type of tree item, like was
done with submodules.

Since the content of dotfiles in the working tree is no longer hashed
when entering a view, when there are unstaged modifications, they are
not included in the view branch. Entering the view branch still works,
but git checkout shows "M .dotfile", and git diff will show the unstaged
changes. This seems like an improvement over the old behavior.

Also made Command.View not delete empty directories that are submodules
when entering a view, while still deleting other empty directories.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 15:35:20 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
db720f6a9c
Display error message when http download fails.
* Display error message when http download fails.

  There's nothing in the http-client library to nicely format a http
  exception, so in some cases it has to fall back to using show on it.
  Seems better than just saying "it failed" or only showing the http
  status code.

* Avoid forward retry when 0 bytes were received.

  forwardRetry was comparing Nothing to Just 0, and so thought there had
  been progress made when 0 bytes were received.

This commit was supported by the NSF-funded DataLad project.
2018-05-08 16:11:45 -04:00