Commit graph

2291 commits

Author SHA1 Message Date
Joey Hess
ca1a3caaa8
refactor 2019-03-09 13:34:57 -04:00
Joey Hess
633021e135
--no-push and remote.name.annex-push prevent exporting trees to special remotes
Users may want sync to only export, or only import and this is broadly
analagous to push and pull, so it makes sense to use the same
configuration for it.
2019-03-09 13:21:49 -04:00
Joey Hess
d9ee048d85
doc updates for import 2019-03-09 13:10:30 -04:00
Joey Hess
e412129523
concurrency and status messages when downloading from import 2019-03-08 12:33:44 -04:00
Joey Hess
e3a704224f
fix export db locking deadlock 2019-03-07 16:06:02 -04:00
Joey Hess
be6085cfe5
fix option parser
Alternative doesn't combine the subparsers the way I wanted.
Unfortunately this new parser has suboptimal usage because everything is
all jumbled together.
2019-03-06 13:10:29 -04:00
Joey Hess
5767b1b00d
avoid updating tracking branch when transfer to export throws exception 2019-03-05 16:51:13 -04:00
Joey Hess
aaacf431d8
handle importtree=yes config
For now, it's only allowed when exporttree=yes is also set.
That simplified the implementation, but could later be changed if
there's a remote that makes sense to be an import but not an export.
However, it may work just as well to make a remote be readonly to
prevent export to it while still allowing import.
2019-03-04 16:07:35 -04:00
Joey Hess
18d7a1dbbb
make export and sync update special remote tracking branch
The branch is only updated once the export is 100% complete. This way,
if an export is started but interrupted and so the remote does not yet
contain some of the files, an import will make a commit on the old
branch, and so won't delete the missing files.
2019-03-01 16:35:48 -04:00
Joey Hess
519cadd1de
refactor RemoteTrackingBranch
Not specific to Import; export will use it too.
2019-03-01 14:47:56 -04:00
Joey Hess
d28b0a8bd0
use disconnected history for import tracking branch
This avoids the first merge from it deleting all files in the current
branch, which was very surpring and unwanted behavior.
2019-03-01 14:33:29 -04:00
Joey Hess
45aacd888b
import downloader complete (untested)
Made some api changes.

listImportableContents needs to provide the size
of the data, so the downloader can check disk free space.

retrieveExportWithContentIdentifier is passed the filepath to write to

Use temporary "CID" key during download of a ContentIdentifier from a
remote, so withTmp can be used and then move the content to the real key
once it's known.
2019-02-27 13:15:02 -04:00
Joey Hess
f4b773e9a1
incomplete action to download files from import 2019-02-26 15:25:28 -04:00
Joey Hess
b6e2a5e9c2
reorg 2019-02-26 14:22:08 -04:00
Joey Hess
e4e464da65
import command is updating tracking branch 2019-02-26 13:15:48 -04:00
Joey Hess
5afe4135c2
import --from option parsing 2019-02-26 12:06:19 -04:00
Joey Hess
4747fa923d
export: Deprecated the --tracking option.
Instead, users can configure remote.<name>.annex-tracking-branch themselves.
2019-02-23 15:54:33 -04:00
Joey Hess
8fdea8f444
WIP
Added graftTree but it's buggy.

Should use graftTree in Annex.Branch.graftTreeish; it will be faster
than the current implementation there.

Started Annex.Import, but untested and it doesn't yet handle tree
grafting.
2019-02-21 17:32:59 -04:00
Joey Hess
7c25cc7715
fix build 2019-02-20 17:31:08 -04:00
Joey Hess
c3f47ba389
make .noannex file prevent repo fixups
Avoid performing repository fixups for submodules and git-worktrees
when there's a .noannex file that will prevent git-annex from being
used in the repository.

This change is ok as long as the .noannex file is really going to prevent
git-annex from being used. But, init --force could override the file.
Which would result in the repo being initialized without the fixups
having run.

To avoid that situation decided to change init, to not let --force be used
to override a .noannex file. Instead the user can just delete the file.
2019-02-05 14:43:23 -04:00
Joey Hess
b080699a95
fromkey --json
* fromkey: Added --json.
* fromkey --batch output changed to support using it with --json.
  The old output was not parseable for any useful information, so
  this is not expected to break anything.
2019-02-05 14:03:29 -04:00
Joey Hess
7b46b43c48
fromkey: Made idempotent
If the worktree file already exists, and is annexed and uses the same
key, avoid failing, nothing needs to be done.

Had to add lookupFileNotHidden to handle the case where an adjust --hide-missing
is in use, and the worktree file was hidden due to the object content
being missing. lookupFile would return the key of the hidden file,
but it makes sense that after fromkey succeeds, the worktree must
contain the file it was supposed to set up.
2019-02-05 13:13:13 -04:00
Joey Hess
7b9701675e
Display progress bar when getting files from export remotes
And moved the progress bar display into storeExport as well.

This commit was sponsored by John Pellman on Patreon.
2019-01-31 13:34:12 -04:00
Joey Hess
9cebfd7002
purify exportActions
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.

Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.

As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.

This commit was sponsored by Nick Piper on Patreon.
2019-01-30 15:11:40 -04:00
Joey Hess
ad1d422dd7
fix false positive in export conflict detection
Like the earlier fixed one in Command.Export, it occurred when the same
tree was exported by multiple clones. Previous fix was incomplete since
several other places looked at the list of exported trees to detect when
there was an export conflict. Added a single unified function to avoid
missing any places it needed to be fixed.

This commit was sponsored by mo on Patreon.
2019-01-30 12:36:30 -04:00
Joey Hess
a9593a43e9
explain why numcopies is not checked in performUnexport 2019-01-26 12:52:56 -04:00
Joey Hess
68198e803e
fix build with old version of optparse-applicative 2019-01-18 14:20:44 -04:00
Joey Hess
50a9a77148
fix build with old version of feed 2019-01-18 14:16:22 -04:00
Joey Hess
f8e7ea77fc
check present when testing readonly too
The object is supposed to be present on the readonly remote; have to
assume the location log is right about that, so the presence check
should succeed.
2019-01-17 16:08:25 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
8555169e71
testremote: Support testing readonly remotes with the --test-readonly option
This commit was sponsored by Ilya Shlyakhter on Patreon.
2019-01-17 12:44:52 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
f0a57825e2
shorten some too-long descriptions 2019-01-16 14:16:32 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
Joey Hess
0acbbf208f
use fileKey here
This doesn't change behavior in any way worth mentioning, but it's the
right thing to do.
2019-01-14 13:22:33 -04:00
Joey Hess
303e828b7c
rest of the deserializeKey renameing 2019-01-14 13:17:47 -04:00
Joey Hess
1791447cc8
avoid creating work tree files in subdirectories in an edge case
A keyName could contain "/", though this is unlikely and certianly only
ever could happen with WORM keys.

The change to addunused to escape that is no problem at all.

The change to VariantFile to escape it means that different versions of
git-annex could resolve a merge conflict differently in this case, which
is unfortunate. There would be different .variant files used, so the two
resolutions would themselves merge together without additional
conflicts, but the user would have to clean up the extra .variant
files.
2019-01-14 13:14:25 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
6f66b53a30
newtype Group to ByteString
This may speed up queries for things in groups, due to Eq and Ord being faster.
2019-01-09 15:05:49 -04:00
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
5ba14b5095
build cleanrly when benchmark flag is not enabled 2019-01-05 08:09:28 -04:00
Joey Hess
11d6e2e260
new improved benchmark command that can benchmark anything git-annex does 2019-01-04 13:46:36 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
894716512d
add a UUIDDesc type containing a ByteString
Groundwork for handling uuid.log using ByteString
2019-01-01 16:17:54 -04:00
Joey Hess
9cc6d5549b
convert UUID from String to ByteString
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.

FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.

Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.

The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
2019-01-01 14:45:33 -04:00
Joey Hess
f4bde87525
fix layout 2019-01-01 12:31:03 -04:00
Joey Hess
6512b40bac
importfeed: Better error message when downloading the feed fails
It used to display the "bad feed content" message indicating there were no
enclosures found, which was misleading when the http request for the feed
failed.

This commit was sponsored by Ewen McNeill on Patreon.
2018-12-30 16:14:55 -04:00
Joey Hess
f943138508
avoid unnecessary monad 2018-12-30 15:59:15 -04:00
Joey Hess
365286279f
unused: Update suggested git log message to see where data was previously used so it will also work with v7 unlocked pointer files. 2018-12-19 13:53:49 -04:00
Joey Hess
6d381df0e6
sync --content: Fix dropping unwanted content from the local repository
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.

Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
2018-12-18 13:58:12 -04:00
Joey Hess
904be4e6be
add --branch option to git-annex find and mildly deprecate findref in favor of it
No deprecation warning at run time, just one on the man page.

One thing findref remains able to do that find cannot is to run in a bare
repo. Find was made to refuse to run in a bare repo because it seemed
confusing for it to not list any files ever in that situation. It would be
better for find --branch to work in a bare repo but not without --branch
but I don't currently have a way to do that.

Probably a better solution would be to make git-annex in a bare repo
default to --branch master or something like that instead of --all.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-12-09 14:10:37 -04:00
Joey Hess
029ae8d4db
support findred and --branch with file matching options
* findref: Support file matching options: --include, --exclude,
  --want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin
* Commands supporting --branch now apply file matching options --include,
  --exclude, --want-get, --want-drop to filenames from the branch.
  Previously, combining --branch with those would fail to match anything.
* add, import, findref: Support --time-limit.

This commit was sponsored by Jake Vosloo on Patreon.
2018-12-09 13:38:35 -04:00
Joey Hess
e89bb4361b
distinguish between cached and uncached creds
p2p and multicast creds are not cached the same way that s3 and webdav
creds are. The difference is that p2p and multicast obtain the creds
themselves, as part of a process like pairing. So they're storing the
only extant copy of the creds. In s3 and webdav etc the creds are
provided by the cloud storage provider.

This is a fine difference, but I do think it's a reasonable difference.
If the user wants to prevent s3 and webdav etc creds from being stored
unencrypted on disk, they won't feel the same about p2p auth tokens
used for tor, or a multicast encryption key, or for that matter their
local ssh private key.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-12-04 14:09:18 -04:00
Joey Hess
aa8243df4c
dropunused edge case when annex.thin caused unused object to be modified
dropunused: When an unused object file has gotten modified, eg due to
annex.thin being set, don't silently skip it, but display a warning and let
--force drop it.

This commit was sponsored by Ethan Aubin.
2018-12-04 12:20:34 -04:00
Joey Hess
83109affd1
remove leftovers from removed TestSuite build flag
Test suite is always built, so this can be simplified.
2018-11-19 12:39:16 -04:00
Joey Hess
c8bd5710b1
check onlyActionOn in Drop
* drop -J: Avoid processing the same key twice at the same time when
  multiple annexes files use it.

This prevents a drop of a key conflicting with another drop of the same
key.

This commit was sponsored by Brock Spratlen on Patreon.
2018-11-15 15:43:51 -04:00
Joey Hess
71cc9cfaa2
improve smudge --clean behavior on outside work tree files
smudge: When passed a file located outside the working tree, eg by git
diff, avoid erroring out.

This commit was sponsored by Ewen McNeill on Patreon.
2018-11-15 13:04:40 -04:00
Joey Hess
c3fa1f2b08
avoid redundant export uploads
export, sync --content: Avoid unnecessarily trying to upload files to an
exporttree remote that already contains the files.

When the export was origianly made in one repo and now git-annex is
running in a different repo, the export database is not yet populated with
information about the exportLocation of files. So, it was trying to upload
the files to the export, even when it already contained them.

sync --content would first download the content from the export, and then
re-upload the content back.

And this also led to "not available" failures for each file that was not
locally present yet.

Fix: Just use checkPresentExport before uploading; if it succeeds update
the database.

This is a surprising oversight, it's possible it fixes a reversion because
I would have thought I'd have noticed this problem when originally
developing exporttree remotes.

This commit was sponsored by Jochen Bartl on Patreon.
2018-11-14 11:47:40 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
abe4b7ebd6
importfeed: Avoid erroring out when a feed has been repeatedly broken
That can leave other imported files not checked into git, because the git
command queue is not flushed when git-annex errors out. And since it only
happens once git-annex has concluded a feed is broken, it's an intermittent
bug, worst kind. Been seeing it for a while, only tracked down today.

Instead, by returning False, git-annex importfeed will cleanly shutdown and
still exit nonzero.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-11-04 17:41:49 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
Joey Hess
2e9f128dea
moved module and relicensed 2018-10-29 23:13:36 -04:00
Joey Hess
5d97898a7c
touch files with high-resolution timestamp
Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older)
so not really a new dep.
2018-10-29 22:25:21 -04:00
Joey Hess
4431b82bce
migrate: Fix failure to migrate from URL keys. (Reversion introduced in version 6.20180926) 2018-10-29 16:36:36 -04:00
Joey Hess
a622488758
remove CHECKURL-MULTI single url response special case
Removed undocumented special case in handling of a CHECKURL-MULTI response
with only a single file listed. Rather than ignoring the url that was in
the response, use it. This allows external special remotes that want to
provide some better url to do so, although I don't entirely agree with
using CHECKURL-MULTI to accomplish that. I'm more of the feeling that an
undocumented special case that throws data away is just not a good idea.

This could in theory break some external special remote program that relied
on the current behavior, but its seems unlikely that it would because such
a program must already handle the multiple url case, unless it only ever
provides a single url response to CHECKURL-MULTI.

Make addurl --file work with a single item CHECKURL-MULTI response.
It already did for external special remotes due to the special case,
but now it also will for builtin ones like the BitTorrent special remote.

This commit was sponsored by Ilya Shlyakhter on Patron.
2018-10-29 14:52:12 -04:00
Joey Hess
9f87133bf5
snap --version= to auto-upgrade
This makes --version=6 still work, despite v6 not being in
supportedVersions. Which is useful for scripts that use it.

I didn't document it on the man page, because it's indistinguishable
from an automatic upgrade after initting as v6.
2018-10-26 11:44:05 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
c28ca8294f
optimize smudge --clean of unmodified file
Usually, git won't run clean filter when a file is unmodified. But, when
git checkout runs git annex smudge --update, it populates the pointer
runs git update-index, which sees the file has changed and runs
git annex smudge --clean, which was checksumming the file unncessarily
as it re-ingested it.

With annex.thin set, this is the difference between git checkout of a
branch with a 1 gb file taking 30s and 0.1s.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-25 16:46:46 -04:00
Joey Hess
ca7de61454
git post-checkout and post-merge hooks
* init, upgrade: Install git post-checkout and post-merge hooks that run
  git annex smudge --update.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* Note that git has no hooks run after git stash or git cherry-pick,
  so the user will have to manually run git annex smudge --update
  after such commands.

Nothing currently installs the hooks into v6 repos that already exist.
Something will need to be done about that, either move this behavior to v7,
or document that the user will need to manually fix up their v6 repos.

This commit was sponsored by Eric Drechsel on Patreon.
2018-10-25 15:59:51 -04:00
Joey Hess
917a2c6095
defer updating unlocked files until after smudge filter
The smuge filter no longer provides git with annexed file content, to
avoid a git memory leak, and because that did not honor annex.thin.

git annex smudge --update has to be run after a checkout to update
unlocked files in the working tree with annexed file contents.

No hooks yet to run it.

This commit was sponsored by Nick Piper on Patreon.
2018-10-25 15:08:20 -04:00
Joey Hess
f2a4db724c
remove redundant test
populatePointerFile checks the same thing
2018-10-25 14:31:45 -04:00
Joey Hess
fcca7adaff
instrument P2P --debug with connection and thread info
For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 15:52:11 -04:00
Joey Hess
4a6ebb1034
make sync update adjusted branch to hide/unhide
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.

This commit was sponsored by Henrik Riomar.
2018-10-20 14:22:28 -04:00
Joey Hess
4a788fbb3b
sync --content now supports --hide-missing adjusted branches
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.

Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.

When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-19 17:51:25 -04:00
Joey Hess
8be5a7269a
refactor getCurrentBranch
Both Command.Sync and Annex.Ingest had their own versions of this.

The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem
to need it. That is only checking to see if it's in an adjusted unlocked
branch, and when in an adjusted branch, the branch does in fact exist,
so the added check that Git.Branch.current does is fine.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-19 17:29:18 -04:00
Joey Hess
24838547e2
adjust --hide-missing
* At long last there's a way to hide annexed files whose content
  is missing from the working tree: git-annex adjust --hide-missing
* When already in an adjusted branch, running git-annex adjust
  again will update the branch as needed. This is mostly
  useful with --hide-missing to hide/unhide files after their content
  has been dropped or received.

Still needs integration with sync and the assistant, and not as fast as it
could be, but already usable.

This commit was sponsored by Ethan Aubin.
2018-10-18 15:32:42 -04:00
Joey Hess
a6c8de84b6
improve types to allow combining some adjustments
Combinations like --hide-misssing --unlocked seem very useful. On the
other hand, combining --fix with --unlock doesn't make sense because a
file can be either unlocked or a symlink that can be fixed, but not
both.

Changed the serialization of HideMissingAdjustment in passing, but it
has not actually been used yet so nothing will be broken.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-10-18 12:59:05 -04:00
Joey Hess
558520d27a
fix rekey/migrate bookkeeping in v6
After 220317df5a the test suite still
detected a problem; migrate of an unlocked file replaced it with a
pointer file rather than a file with the content.

This was a bookeeping problem; the worktree file was being copied to the object
file and the inode cache updated, but if that database write didn't get
flushed in time, later checks would think the content was not present.
Fixed by copying the object file to the worktree file instead, which
avoids needing to update the inode cache.

Also, only copy when there's a hard link to break, not always.

This commit was sponsored by Brock Spratlen on Patreon.
2018-10-16 17:18:21 -04:00
Joey Hess
220317df5a
v6: fix migrate of unlocked file
After commit b2bafdb2fc the test suite
threw up a failure migrating unlocked files.

I'm not clear how that commit broke it (presumably by inAnnex reporting
the right information now), but the actual problem is plain:
The inodecache for the worktree file is generated, but then the file is
replaced with a copy (unncessarily unless annex.link is set, but the
code always does so) and so linkToAnnex/linkAnnex then fails because it
notices the inode cache is not valid.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-16 16:45:08 -04:00
Joey Hess
bdf6783b92
improve error message 2018-10-16 15:52:40 -04:00
Joey Hess
40dba8e933
prevent find running in bare repo 2018-10-16 10:44:09 -04:00
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
91b799d1a6
export: Fix false positive in export conflict detection
It occurred when the same tree was exported by multiple clones. nub out
identical trees.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-09 15:54:12 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
53526136e8
move commandAction out of CmdLine.Seek
This is groundwork for nested seek loops, eg seeking over all files and
then performing commandActions on a list of remotes, which can be done
concurrently.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-01 14:12:06 -04:00
Joey Hess
9adee3f2fb
sync: Warn when a remote's export is not updated to the current tree because export tracking is not configured.
Only display the warning when the current branch has a tree that is not
the same as the tree in the export.

Note that it doesn't check to see if the current tree is
in incompleteExportedTreeish; it might be worth checking that and reminding
the user about an incomplete export, but when export tracking is not
configured, they are probably not in the right clone of the repository to
resolve the incomplete export.

This commit was sponsored by Ethan Aubin.
2018-09-27 15:41:18 -04:00
Joey Hess
6134431254
clean P2P protocol shutdown on EOF try 2
Same goal as b18fb1e343 but without
breaking backwards compatability. Just return IO exceptions when running
the P2P protocol, so that git-annex-shell can detect eof and avoid the
ugly message.

This commit was sponsored by Ethan Aubin.
2018-09-25 16:49:59 -04:00
Joey Hess
4ecba916a1
annex.maxextensionlength
Added annex.maxextensionlength for use cases where extensions longer than 4
characters are needed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-09-24 12:10:18 -04:00
Joey Hess
1d1054faa6
added -z
Added -z option to git-annex commands that use --batch, useful for
supporting filenames containing newlines.

It only controls input to --batch, the output will still be line delimited
unless --json or etc is used to get some other output. While git often
makes -z affect both input and output, I don't like trying them together,
and making it affect output would have been a significant complication,
and also git-annex output is generally not intended to be machine parsed,
unless using --json or a format option.

Commands that take pairs like "file key" still separate them with a space
in --batch mode. All such commands take care to support filenames with
spaces when parsing that, so there was no need to change it, and it would
have needed significant changes to the batch machinery to separate tose
with a null.

To make fromkey and registerurl support -z, I had to give them a --batch
option. The implicit batch mode they enter when not provided with input
parameters does not support -z as that would have complicated option
parsing. Seemed better to move these toward using the same --batch as
everything else, though the implicit batch mode can still be used.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-09-20 16:11:47 -04:00
Joey Hess
50217f62a1
avoid duplicate add action for v6 unlocked modified file
The new second pass sees the file as type changed because the first
pass's changes have typically not reached git yet. So, have to
explicitly check for unmodified files in the second pass.

Note that, if the file has been touched but not really modified,
the first pass will handle it, and so the second pass does nothing.

This commit was sponsored by Jochen Bartl on Patreon.
2018-09-12 15:20:34 -04:00
Joey Hess
2743224658
change v6 git-annex add of staged unmodified unlocked file
v6: When a file is unlocked but has not been modified, and the unlocking is
only staged, git-annex add did not lock it. Now it will, for consistency
with how modified files are handled and with v5.

Note the removal of the sameInodeCache check. Otherwise it would see
that the unmodified file is unmodified and stop there. That check seems to have
been copied from the direct mode branch. But, direct mode had a specific
reason to check for unmodified content, that does not apply to v6.

The second pass means there is potential for a race, eg the unlocked
file could be modified in between the first and second passes.
No problem with that, since both passes do the same thing.

This commit was sponsored by Jake Vosloo on Patreon.
2018-09-12 14:00:05 -04:00
Joey Hess
fcff64f8bb
optimisation: avoid stat call
This commit was sponsored by Paul Walmsley on Patreon.
2018-09-05 17:26:12 -04:00
Joey Hess
d65a081f3f
improve message 2018-09-02 16:17:50 -04:00
Joey Hess
d0ef049cca
comment typo 2018-09-02 16:16:08 -04:00
Joey Hess
5c99f6247e
per-remote metadata storage
Actually very straightforward reuse of the metadata log file code.
Although I had to add a todo item as git-annex forget won't clean up
dead remote's metadata yet.

This would be worth adding to the external special remote interface
sometime. Have not opened a todo though, guess I'll wait until something
needs it.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 12:23:22 -04:00
Joey Hess
76f32012af
avoid sync/assistant drop from appendonly
Make git-annex sync and the assistant skip trying to drop from appendonly
remotes since it's just going to fail.

git-annex drop and similar commands will still try to drop from
appendonly, so the user will see failure messages when they try to do
that. To do otherwise would be confusing since the user has explicitly
asked for a drop with those commands.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:23:57 -04:00
Joey Hess
8b39db20b5
export appendonly support
Make `git annex export` check appendonly when removing a file from an
export, and not update the location log, since the remote still contains
the content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:18:20 -04:00
Joey Hess
6001b3cf45
fix build warning 2018-08-28 13:17:06 -04:00
Joey Hess
10138056dc
v6: avoid accidental conversion when annex.largefiles is not configured
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.

Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.

Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.

This commit was supported by the NSF-funded DataLad project.
2018-08-27 14:51:10 -04:00
Joey Hess
98fd7ec6c9
recover from race between git mv+commit and git-annex get
Last of the known v6 races.

This also makes git add of a pointer file populate it when its content
is present in the annex. Which makes sense to do, I think.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 16:01:50 -04:00
Joey Hess
7ee3b02d49
replace stack trace with an explanation 2018-08-20 21:26:07 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
a96972015d
massive v6 add speed/memory improvement
v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.

When built with an older git, still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.

This commit was supported by the NSF-funded DataLad project.
2018-08-09 18:17:46 -04:00
Joey Hess
12460fcea6
make --batch honor matching options
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.

Affected commands: find, add, whereis, drop, copy, move, get

In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?

Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.

git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-08-08 12:07:06 -04:00
Joey Hess
4d4d238a08
add missing type signature 2018-08-06 15:41:44 -04:00
Joey Hess
38ddd6072d
addurl: Include filename in --json-progress output when known. 2018-08-06 12:53:44 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
cc2cb46857
unused --from: Allow specifiying a repository by uuid or description.
This commit was sponsored by Jake Vosloo on Patreon.
2018-07-11 16:01:35 -04:00
Joey Hess
79ac177ea5
improve tmp file cleanup
If youtubeDl fails, remove the tmp file. Here tmp is the
html file downloaded to check if the url is html, not what youtube-dl
might have started to download. If the tmp file were retained, a
re-run of addurl would try to resume downloading it, which the web
server might not support, causing the resume to fail.
And it's a smallish html page anyway so no benefit to
keeping it for such a resume.
2018-06-28 12:51:51 -04:00
Joey Hess
dc6cb6aa5f
Merge branch 'later' 2018-06-25 21:59:20 -04:00
Joey Hess
6091b7b9db
info: Display uuid and description when a repository is identified by uuid, and for "here". 2018-06-24 17:38:18 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
391a83c985
remove unused value 2018-06-14 12:32:36 -04:00
Joey Hess
b6e4ed9aa7
export: re-send lost exported files after fsck notices they're gone
When content has been lost from an export remote and  git-annex fsck --from
remote has noticed it's gone, re-running git-annex export or git-annex sync
--content will re-upload it.

Note that normally there's no way to remove a single file from an export.
doc/design/exporting_trees_to_special_remotes.mdwn talks about this
in the section "dropping from exports and copying to exports". But, if
a file is somehow deleted or corrupted on the export, and fsck notices
this, it will update the location log to say it's missing.

So, checking the location log when determining if a file needs to be sent
to the export will let such missing files be added back in. There's
otherwise no way to do so. It does not fall afoul of the races documented
in the abovementioned section, I think.

This commit was sponsored by Ryan Newton on Patreon.
2018-06-14 12:22:12 -04:00
Joey Hess
a5f598a6aa
remove use of remoteGitConfig
Unfortunately one more use remains..

This should be just as fast as the other method. The remote's Git.Repo
has already had its config read, so Annex.new's call to Git.Config.read
is a noop.

Thid commit was sponsored by andrea rota.
2018-06-05 13:15:04 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
2e6a6024c2
avoid unncessary version output differences in different contexts
Show operating system and repository version list when run outside
a git repo too.

Also made it only display the local repository version when in a git-annex
repo. Before it showed "unknown" when run in a git repo that was not
git-annex initialized. That seemed like confusing behavior.

This commit was sponsored by Jochen Bartl on Patreon.
2018-06-04 12:26:18 -04:00
Joey Hess
1c8ee99b46
Fix build with ghc 8.4+, which broke due to the Semigroup Monoid change
https://prime.haskell.org/wiki/Libraries/Proposals/SemigroupMonoid

I am not happy with the fragile pile of CPP boilerplate required to support
ghc back to 7.0, which git-annex still targets for both the android build
and the standalone build targeting old linux kernels. It makes me unlikely
to want to use Semigroup more in git-annex, because the benefit of the
abstraction is swamped by the ugliness. I actually considered ripping out
all the Semigroup instances, but some are needed to use
optparse-applicative.

The problem, I think, is they made this transaction on too fast a timeline.
(Although ironically, work on it started in 2015 or earlier!)
In particular, Debian oldstable is not out of security support, and it's
not possible to follow the simpler workarounds documented on the wiki and
have it build on oldstable (because the semigroups package in it is too
old).

I have only tested this build with ghc 8.2.2, not the newer and older
versions that branches of the CPP support. So there could be typoes, we'll
see.

This commit was sponsored by Brock Spratlen on Patreon.
2018-05-30 12:28:43 -04:00
Joey Hess
c3064edac9
setpresentkey: Added --batch support (for ronnypfa)
This commit was sponsored by Peter on Patreon.
2018-05-27 14:56:14 -04:00
Joey Hess
85f9360d9b
GIT_ANNEX_SHELL_APPENDONLY
Makes it allow writes, but not deletion of annexed content. Note that
securing pushes to the git repository is left up to the user.

This commit was sponsored by Jack Hill on Patreon.
2018-05-25 13:17:56 -04:00
Joey Hess
2da2ae0919
fix migration bug and make fsck warn
* migrate: Fix bug in migration between eg SHA256 and SHA256E,
  that caused the extension to be included in SHA256 keys,
  and omitted from SHA256E keys.
  (Bug introduced in version 6.20170214)
* migrate: Check for above bug when migrating from SHA256 to SHA256
  (and same for SHA1 to SHA1 etc), and remove the extension that should
  not be in the SHA256 key.
* fsck: Detect and warn when keys need an upgrade, either to fix up
  from the above migrate bug, or to add missing size information
  (a long ago transition), or because of a few other past key related
  bugs.

This commit was sponsored by Henrik Riomar on Patreon.
2018-05-23 14:07:51 -04:00
Joey Hess
2fabd7cdb5
remove the older move --force, which never behaved as documented and seems useless
* move: --force was accidentially enabling two unrelated behaviors
  since 6.20180427. The older behavior, which has never been well
  documented and seems almost entirely useless, has been removed.
* copy: --force no longer does anything.

This commit was sponsored by Øyvind Andersen Holm.
2018-05-21 13:21:19 -04:00
Joey Hess
442e607b0a
Don't allow entering a view with staged or unstaged changes.
In some cases, unstaged changes are safe, eg dotfiles in the top which
are not affected by a view. Or non-annexed files in general which would
prevent view branch checkout from proceeding. But in other cases,
particularly unstaged changes to annexed files, entering a view would wipe
out those changes! And so don't allow entering a view with any unstaged
changes.

Staged changes are not safe when entering a view, because the changes get
committed to the view branch, and so the user is unlikely to remember them
when they exit the view, and so will effectively lose them, even if they're
still present in the view branch.

Also, improved the git status parser, although the improvement turned out
to not really be needed.

This commit was sponsored by Eric Drechsel on Patreon.
2018-05-14 16:51:06 -04:00
Joey Hess
d7021d420f
reuse hashes of dotfiles/dirs/submodules when entering view
This fixes a crash when a git submodule has a name starting with a dot.
Such a submodule might contain dotfiles that are intended to be used when
inside the view (since a dot-directory that's not a submodule was already
preserved when entering a view). So, rather than eliminating the submodule
from the view, its git ls-files --stage hash is copied over into the view.

dotfiles/dirs have their git ls-files --stage hashes similarly copied over
to the view. This is more efficient and simpler than the old method,
and also won't break if git ever adds a new type of tree item, like was
done with submodules.

Since the content of dotfiles in the working tree is no longer hashed
when entering a view, when there are unstaged modifications, they are
not included in the view branch. Entering the view branch still works,
but git checkout shows "M .dotfile", and git diff will show the unstaged
changes. This seems like an improvement over the old behavior.

Also made Command.View not delete empty directories that are submodules
when entering a view, while still deleting other empty directories.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 15:35:20 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
2fc768ce72
avoid git annex info remote buffering list of keys
This leaves git annex unused --from remote still using loggedKeysFor
and buffering more than ought to be necessary, but I can't see a way to
improve that.
2018-04-26 16:13:05 -04:00
Joey Hess
bea0ad220a
avoid --all buffering list of all keys
In Annex.Branch.branch, the (++) was killing laziness.
Rewrote so it streams lazily.

filterM also kills laziness, so made loggedKeys use a Unchecked type,
and check if the key is dead in the seek loop.

Note that loggedKeysFor still buffers, so git-annex info <remote> and
git-annex unused --from remote still use more memory than necessary.

Also removed some unused functions from Annex.Journal.
2018-04-26 16:00:20 -04:00
Joey Hess
9807e5bead
fix webapp opening in termux
Open real url not html shim since android and file:// urls is a nasty
kettle of fish.

This commit was sponsored by John Pellman on Patreon.
2018-04-25 14:38:42 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
f56594af9e
finish fixing inverted Ord for TrustLevel
Flipped all comparisons. When a TrustLevel list was wanted from Trusted
downwards, used Down to compare it in that order.

This commit was sponsored by mo on Patreon.
2018-04-13 15:17:54 -04:00
Joey Hess
a0e4b9678b
fix inverted Ord for TrustLevel (intermediate commit)
This commit removes the Ord and Enum instances, commenting out all code
that depends on them, to make sure that all code effected by the
inversion fix has been identified.

(Assuming no ifdefs involve TrustLevel.)

The next commit will fix up all the identified code.
2018-04-13 14:50:14 -04:00
Joey Hess
1831cc4a7d
remove unused import 2018-04-13 14:43:29 -04:00
Joey Hess
64980db7d9
move: Avoid drops that make bad situations worse, but otherwise allow
See the big comment at the bottom of Command.Drop for the full details.

(The --safe/--unsafe options were never released.)

This commit was sponsored by Jake Vosloo on Patreon.
2018-04-13 14:36:43 -04:00
Joey Hess
4b8c289154
display addurl url not file
The file gets displayed after download is complete, so this is the
simplest way to avoid redundant display.
2018-04-13 01:37:46 -04:00
Joey Hess
4cda021acc
remove redundant meter
This was stacked with another one, resulting in an extra newline
2018-04-13 01:23:09 -04:00
Joey Hess
b4a2bcaf4c
add missing newline between importfeed and subsequent addurl
got lost when wget was eliminated
2018-04-13 01:12:22 -04:00
Joey Hess
af8546990d
move: --safe/--unsafe and potential drop race fix
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.

This commit was sponsored by Ethan Aubin.
2018-04-09 16:20:10 -04:00
Joey Hess
ae530f043e
disentagle copy and move option parsing 2018-04-09 14:38:46 -04:00
Joey Hess
0106752db2
refactor FromToHereOptions 2018-04-09 14:29:28 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
6cb5b7294f
info: Changed sorting of numcopies stats table, so it's ordered by the variance from the desired number of copies.
Compare these...

numcopies stats:
	numcopies -1: 1986
	numcopies +0: 1170
	numcopies -2: 769
	numcopies +1: 716
	numcopies -4: 696
	numcopies -3: 485
	numcopies -6: 230
	numcopies -5: 111
	numcopies -7: 91
	numcopies -9: 9

numcopies stats:
	numcopies +1: 716
	numcopies +0: 1170
	numcopies -1: 1986
	numcopies -2: 769
	numcopies -3: 485
	numcopies -4: 696
	numcopies -5: 111
	numcopies -6: 230
	numcopies -7: 91
	numcopies -9: 9

I feel that the former is a jumbled mess that doesn't tell much overall,
while the second shows pretty clearly that most files are within 1 degree
of the desired number of copies, with some outliers without enough.
2018-04-05 14:54:39 -04:00
Joey Hess
817ebb5765
info: Added "combined size of repositories containing these files" stat
when run on a directory

This commit was sponsored by andrea rota.
2018-04-05 14:44:58 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
2ec07bc29f
Avoid running annex.http-headers-command more than once. 2018-04-04 15:15:08 -04:00
Joey Hess
46d4316954
implement annex.retry et al
Added annex.retry, annex.retry-delay, and per-remote versions to configure
transfer retries.

This commit was supported by the NSF-funded DataLad project.
2018-03-29 13:04:07 -04:00
Joey Hess
ae75eb06bc
exporttree support for adb special remote
This commit was sponsored by Michael Magin.
2018-03-27 16:28:41 -04:00
Joey Hess
ed81762c86
avoid compiler warning
add type sig so it's clear createtfile returns unit
2018-03-15 13:21:32 -04:00
Joey Hess
10d3b7fc62
Fix reversion introduced in 6.20171214 that caused concurrent transfers to incorrectly fail with "transfer already in progress".
Avoid creating transfer info file before transfer lock is created and
locked.

The wrong order for one thing caused transfer info to be overwritten
when a transfer was already in progress.

But worse, it caused checkTransfer to see the transfer info,
and so lock the transfer lock in order to verify the transfer was not in
progress. Which in a concurrent situation, prevented the transferrer
from locking the transfer lock, so it failed with "transfer already in
progress".

Note that the transferinfo command does not lock the transfer lock
before creating the transfer info. But, that's only run after
recvkey is running, and recvkey does lock the transfer lock, so that
seems more or less ok. (Other than being a super complicated legacy mess
that the P2P code has mostly obsoleted now.)

This commit was supported by the NSF-funded DataLad project.
2018-03-14 18:55:34 -04:00
Joey Hess
31e1adc005
deal with unlocked files
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
2018-03-13 14:27:14 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
596af7cbc4
move protocol version stuff to the Net free monad
Needs to be in Net not Local, so that Net actions can take the protocol
version into account.

This commit was sponsored by an anonymous bitcoin donor.
2018-03-12 15:20:51 -04:00
Joey Hess
c81768d425
version the P2P protocol
Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.

So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.

Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-12 14:36:35 -04:00
Joey Hess
6a59bc4845
use P2P protocol for drop
Not yet used for everything else, but this is enough to
verify that it works, and do some benchmarking.

Some bugfixes included, which got it working. Also fallback to old
actions has been verified to work correctly.

Benchmarked dropping one thousand files from a ssh remote on localhost.
Using the old git-annex	40.867 seconds.
With the P2P protocol	9.905 seconds!

This commit was sponsored by Jochen Bartl on Patreon.
2018-03-08 16:56:17 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
6ddfa9807b
implemented git-annex-shell p2pstdio
Not yet used by git-annex, but this will allow faster transfers etc than
using individual ssh connections and rsync.

Not called git-annex-shell p2p, because git-annex p2p does something
else and I don't want two subcommands with the same name between the two
for sanity reasons.

This commit was sponsored by Øyvind Andersen Holm.
2018-03-07 15:38:01 -04:00
Joey Hess
f4103744c3
make sure that lockContentShared is always paired with an inAnnex check
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.

That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.

Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)

And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.

So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-07 14:23:52 -04:00
Joey Hess
ba53f60801
refactor 2018-03-06 15:14:53 -04:00
Joey Hess
db057dcff0
fix sync bug in direct mode
sync: Fix bug that prevented pulling changes into direct mode repositories
that were committed to remotes using git commit rather than git-annex sync.

This commit was supported by the NSF-funded DataLad project.
2018-02-26 14:10:03 -04:00
Joey Hess
cb3b73df6c
importfeed: Fix a failure when downloading with youtube-dl and the destination subdirectory does not exist yet.
Noticed while running this (which a user posted in a comment they deleted
for some reason):

git-annex importfeed https://vimeo.com/logiingimars/videos/rss

The filename that youtube-dl suggests included a subdirectory,
which didn't exist, so renaming to it failed.

This commit was sponsored by mo on Patreon.
2018-02-22 13:20:19 -04:00
Joey Hess
6583448bab
add --json-error-messages (not yet implemented)
Added --json-error-messages option, which includes error messages in the
json output, rather than outputting them to stderr.

The actual rediretion of errors is not implemented yet, this is only
the docs and option plumbing.

This commit was supported by the NSF-funded DataLad project.
2018-02-19 14:32:15 -04:00
Joey Hess
42ba888875
optimise for case where there are no required contents
Avoid reading location log in this case.
2018-02-08 14:16:00 -04:00
Joey Hess
7f5c6a28a6
fsck: Warn when required content is not present in the repository that requires it.
This commit was sponsored by Jack Hill on Patreon.
2018-02-08 14:08:41 -04:00
Joey Hess
cfbfb3ab9a
inprogress: Avoid showing failures for files not in progress. 2018-01-24 20:43:19 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
24df95f0f6
Fix several places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account.
git grep writeFile finds some more that might also be problems, but
for now I've concentrated on .git/annex/ log files. There are certianly
cases where writeFile is not a problem too.

This commit was sponsored by mo on Patreon.
2018-01-02 17:25:25 -04:00
Joey Hess
75366d3c34
split BuildInfo and BuildFlags
The problem with combining these is that Build.Standalone etc need only
the BuildInfo, and since not built with cabal, the BuildFlags ifdefs
were causing bogus warnings.
2018-01-02 13:47:51 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
0b0d8ad54b
fix build 2017-12-31 15:06:33 -04:00
Joey Hess
fcdd9ce788
repeated addurl behavior reversion fix
addurl: When the file youtube-dl will download is already an annexed file,
don't download it again and fail to overwrite it, instead just do nothing,
like it used to when quvi was used.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-12-31 14:55:51 -04:00
Joey Hess
67338fd7ac
Added inprogress command for accessing files as they are being downloaded.
Chose to make this only handle files actively being downloaded, not temp
files for downloads that were interrupted or files that have been fully
downloaded.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-12-28 11:46:39 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
3cc94c1667
.noannex file
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.

This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.

This commit was supported by the NSF-funded DataLad project.
2017-12-13 14:34:32 -04:00
Joey Hess
bd7f8be121
fix recorded url when using --file with external special remote
The youtube changes accidentially caused the OtherDownloader url to not
get used here, which broke datalad's test suite luckily.

This commit was supported by the NSF-funded DataLad project.
2017-12-11 13:41:41 -04:00
Joey Hess
cfdfe4df6c
lookupkey absolute path support
lookupkey: Support being given an absolute filename to a file within the
current git repository.

This commit was supported by the NSF-funded DataLad project.
2017-12-08 15:35:02 -04:00
Joey Hess
fc845e6530
more lambda-case conversion 2017-12-05 15:00:50 -04:00
Joey Hess
5e95d54604
make --raw avoid ever running youtube-dl
added DownloadOptions type to avoid needing two different Bool params
for some functions.

This commit was sponsored by Thom May on Patreon.
2017-11-30 17:06:15 -04:00
Joey Hess
67ab567bc7
display filename when file already has url
Otherwise it's confusing what happened..
2017-11-30 15:06:21 -04:00
Joey Hess
7c88633121
improve error message
checkCanAdd can be called on annexed files too, when youtube-dl is in
use.
2017-11-30 15:00:53 -04:00
Joey Hess
bbedc1c265
check youtube-dl for --fast and --relaxed when adding new file
The filename comes from youtube-dl also.

This commit was sponsored by Denis Dzyubenko on Patreon.
2017-11-30 14:57:20 -04:00
Joey Hess
2528e3ddb0
rethought --relaxed change
Better to make it not be surprising and slow, than surprising and fast.
--raw can be used when it needs to be really fast.

Implemented adding a youtube-dl supported url to an existing file.

This commit was sponsored by andrea rota.
2017-11-30 14:13:20 -04:00
Joey Hess
8a0038ec23
avoid warning when youtube-dl is not installed
If a user does not have it installed, don't warn on every imported item
about it.
2017-11-30 13:43:55 -04:00
Joey Hess
a7b4358c05
honor --file when downloading with youtube-dl
This used to be done with quvi, and got broken in the transition.
2017-11-30 13:24:52 -04:00
Joey Hess
24f27ec39d
convert importfeed to youtube-dl
Fully working, including --fast/--relaxed.

Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.

importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.

Remove old quvi modules.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-29 17:30:02 -04:00
Joey Hess
99bebdface
youtube-dl working
Including resuming and cleanup of incomplete downloads.

Still todo: --fast, --relaxed, importfeed, disk reserve checking,
quvi code cleanup.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-11-29 16:40:32 -04:00
Joey Hess
4e7e1fcff4
add gitAnnexTmpWorkDir and withTmpWorkDir
Needed to run youtube-dl in, but could also be useful for other stuff.

The tricky part of this was making the workdir be cleaned up whenever the
tmp object file is cleaned up.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-11-29 13:53:39 -04:00
Joey Hess
3febb79c8f
wip 2017-11-28 17:17:40 -04:00
Joey Hess
4781ca297b
showStart variant for when there's no worktree file
Clean up some uses of showStart with "" for the file,
or in some cases, a non-filename description string. That would
generate bad json, although none of the commands doing that
supported --json.

Using "" for the file resulted in output like "foo  rest";
now the extra space is eliminated.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-11-28 15:14:16 -04:00
Joey Hess
f5edb16729
Display progress meter when uploading a key without size information
Getting the size by statting the content file.

This commit was supported by the NSF-funded DataLad project.
2017-11-14 16:40:49 -04:00
Joey Hess
07c4be500d
clean up build warnings on Windows 2017-11-14 14:14:10 -04:00
Joey Hess
1d0bf44173
testremote: Test exporttree.
As long as the class of remotes supports exporting, it's tested whether
or not the remote is configured with exporttree=yes.

Also, made testremote of a remote configured with exporttree=yes
disable that configuration for testing non-export storage.

This commit was supported by the NSF-funded DataLad project.
2017-11-08 14:22:11 -04:00
Joey Hess
75ec0227f8
unlock, lock: Support --json. 2017-10-30 14:44:11 -04:00
Joey Hess
e1ac299ad0
better dup key with -J fix
This avoids all the complication about redundant work discussed in
the previous try at fixing this. At the expense of needing each command
that could have the problem to be patched to simply wrap the action in
onlyActionOn once the key is known. But there do not seem to be many
such commands.

onlyActionOn' should not be used with a CommandStart (or CommandPerform),
although the types do allow it. onlyActionOn handles running the whole
CommandStart chain. I couldn't immediately see a way to avoid mistken
use of onlyActionOn'.

This commit was supported by the NSF-funded DataLad project.
2017-10-17 18:48:53 -04:00
Joey Hess
68a49adcda
Improve behavior when -J transfers multiple files that point to the same key
After a false start, I found a fairly non-intrusive way to deal with it.
Although it only handles transfers -- there may be issues with eg
concurrent dropping of the same key, or other operations.

There is no added overhead when -J is not used, other than an added
inAnnex check. When -J is used, it has to maintain and check a small
Set, which should be negligible overhead.

It could output some message saying that the transfer is being done by
another thread. Or it could even display the same progress info for both
files that are being downloaded since they have the same content. But I
opted to keep it simple, since this is rather an edge case, so it just
doesn't say anything about the transfer of the file until the other
thread finishes.

Since the deferred transfer action still runs, actions that do more than
transfer content will still get a chance to do their other work. (An
example of something that needs to do such other work is P2P.Annex,
where the download always needs to receive the content from the peer.)
And, if the first thread fails to complete a transfer, the second thread
can resume it.

But, this unfortunately means that there's a risk of redundant work
being done to transfer a key that just got transferred.
That's not ideal, but should never cause breakage; the same
thing can occur when running two separate git-annex processes.

The get/move/copy/mirror --from commands had extra inAnnex checks added,
inside the download actions. Without those checks, the first thread
downloaded the content, and then the second thread woke up and
downloaded the same content redundantly.

move/copy/mirror --to is left doing redundant uploads for now. It
would need a second checkPresent of the remote inside the upload
to avoid them, which would be expensive. A better way to avoid
redundant work needs to be found..

This commit was supported by the NSF-funded DataLad project.
2017-10-17 17:10:50 -04:00
Joey Hess
85ed38a574
Avoid repeated checking that files passed on the command line exist.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.

Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.

This commit was supported by the NSF-funded DataLad project.
2017-10-16 14:10:20 -04:00
Joey Hess
a2bf0c5b6d
avoid warning 2017-10-16 12:54:00 -04:00
Joey Hess
f403c23bc6
copy, move: Behave same with --fast when sending to remotes located on a local disk as when sending to other remotes.
Let --fast override use of hasKey even when hasKeyCheap.
2017-09-29 16:30:43 -04:00
Joey Hess
e8c9a5c515
sync: Added --cleanup, which removes local and remote synced/ branches.
Also deletes any tagged pushes that the assistant might have done,
since those would also prevent resetting a branch back.

This commit was sponsored by andrea rota.
2017-09-28 14:58:48 -04:00
Joey Hess
812d90022b
metadata: Added --remove-all.
Motivation is to remove all metadata when it gets copied from a previous
version of the file, and that is not deisrable.

This commit was supported by the NSF-funded DataLad project.
2017-09-28 12:36:10 -04:00
Joey Hess
d71c65ca0a
add exporter thread to assistant
This is similar to the pusher thread, but a separate thread because git
pushes can be done in parallel with exports, and updating a big export
should not prevent other git pushes going out in the meantime.

The exportThread only runs at most every 30 seconds, since updating an
export is more expensive than pushing. This may need to be tuned.

Added a separate channel for export commits; the committer records a
commit in that channel.

Also, reconnectRemotes records a dummy commit, to make the exporter
thread wake up and make sure all exports are up-to-date. So,
connecting a drive with a directory special remote export will
immediately update it, and getting online will automatically
update S3 and WebDAV exports.

The transfer queue is not involved in exports. Instead, failed
exports are retried much like failed pushes.

This commit was sponsored by Ewen McNeill.
2017-09-20 15:29:13 -04:00
Joey Hess
28eba8e9c6
update transfer info and notify when exporting
Same as is done for all other transfers of content, so the webapp will
display progress bars etc.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-09-20 12:58:23 -04:00
Joey Hess
a6c0ed6698
export --fast sets up but does not populate export
sync --content finishes
2017-09-19 14:26:03 -04:00
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
527f734492
configuration and docs for tracking exports
Not yet handled by sync or assistant.

This commit was sponsored by Nick Daly on Patreon.
2017-09-19 13:05:43 -04:00
Joey Hess
f4be3c3f89
merge changes made on other repos into ExportTree
Now when one repository has exported a tree, another repository can get
files from the export, after syncing.

There's a bug: While the database update works, somehow the database on
disk does not get updated, and so the database update is run the next
time, etc. Wasn't able to figure out why yet.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-09-18 19:21:41 -04:00
Joey Hess
0ad7e36dc1
update ExportTree table efficiently
Use same diff and key lookup except when the whole tree has to be
scanned.

This commit was sponsored by Peter Hogg on Patreon.
2017-09-18 14:27:50 -04:00
Joey Hess
b03d77c211
add ExportTree table to export db
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.

Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.

And some code refactoring.

This commit was sponsored by Francois Marier on Patreon.
2017-09-18 13:59:59 -04:00
Joey Hess
486902389d
lock to avoid more than one export to a remote at a time
This commit was sponsored by Jack Hill on Patreon.
2017-09-18 12:38:07 -04:00
Joey Hess
e1f5c90c92
split out Types.Export 2017-09-15 16:46:03 -04:00
Joey Hess
e54a05612e
avoid unncessary db queries when exported directory can't be empty
In rename foo/bar to foo/baz, foo can't be empty.

In delete zxyyz, there's no exported directory (top doesn't count).
2017-09-15 16:30:49 -04:00
Joey Hess
c633144d28
remove empty directories when removing from export
The subtle part of this is what happens when the remote fails to remove
an empty directory. The removal from the export needs to fail in that
case, so the removal will be tried again later. However, removeExportLocation
has already been run and changed the export db, so if the next run
checks getExportLocation, it might decide nothing remains to be done,
leaving the empty directory.

Dealt with that by making removeEmptyDirectories, handle a failure
by calling addExportLocation, reverting the database changes so the next
run will be guaranteed to try deleting the empty directory again.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-09-15 15:22:53 -04:00
Joey Hess
ab271ba6ca
trust level overridden message adjusted for forced untrusted export remotes 2017-09-13 12:08:46 -04:00
Joey Hess
301c959edf
remove debug print 2017-09-12 17:00:39 -04:00
Joey Hess
9c3622882b
export: cache connections for S3 and webdav 2017-09-12 16:59:04 -04:00
Joey Hess
8de516ad2c
leave export logged as incomplete if initial renames fail
This way, the temp files that might be left due to failure will be
cleaned up next time.

Also, nub the list of incomplete exports to avoid repeatedly adding the
same tree to it when running export repeatedly when it's failing.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 14:21:15 -04:00
Joey Hess
4d3a464e83
export to webdav
This basically works, but there's a bug when renaming a file that leaves
a .git-annex-temp-content-key file in the webdav store, that never gets
cleaned up.

Also, exporting files with spaces to box.com seems to fail; perhaps it
does not support it?

This commit was supported by the NSF-funded DataLad project.
2017-09-12 14:10:09 -04:00
Joey Hess
cd5f405623
interrupted export recovery bugfixes
When an export was interrupted, the sqlite database won't have been
committed necessarily. Also, the interrupted export might have been
run in an entirely different repository. There's not a significant speed
benefit in checking getExportLocation in this case anyway, so avoid it.

Also, remove the old filename from the export database.

Recovery from interrupted exports is now tested working.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 15:51:31 -04:00
Joey Hess
a48b52c056
avoid renaming to temp files before deleting
Only rename when actually ncessary.

The diff gets buffered in memory. Probably git has to buffer a diff in
memory when generating it as well, so this memory usage should not be a
problem, even when the diff is very large. I hope.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 14:32:47 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
4f657ba918
bugfix 2017-09-06 15:59:02 -04:00
Joey Hess
cae3704a44
export file renaming
This is seriously super hairy. It has to handle interrupted exports,
which may be resumed with the same or a different tree. It also has to
recover from export conflicts, which could cause the wrong content
to be renamed to a file.

I think this works, or is close to working. See the update to the design
for how it works.

This is definitely not optimal, in that it does more renames than are
necessary. It would probably be worth finding the keys that are really
renamed and only renaming those. But let's get the "simple" approach to
work first..

This commit was supported by the NSF-funded DataLad project.
2017-09-06 15:44:10 -04:00
Joey Hess
0fa948b402
record incomplete exports in export.log
Not yet used, but essential for resuming cleanly.

Note that, in normmal operation, only one commit is made to export.log
during an export; the incomplete version only gets to the journal and
is then overwritten.

This commit was supported by the NSF-funded DataLad project.
2017-09-06 13:45:03 -04:00
Joey Hess
4da763439b
use export db to correctly handle duplicate files
Removed uncorrect UniqueKey key in db schema; a key can appear multiple
times with different files.

The database has to be flushed after each removal. But when adding files
to the export, lots of changes are able to be queued up w/o flushing.
So it's still fairly efficient.

If large removals of files from exports are too slow, an alternative
would be to make two passes over the diff, one pass queueing deletions
from the database, then a flush and the a second pass updating the
location log. But that would use more memory, and need to look up
exportKey twice per removed file, so I've avoided such optimisation yet.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 14:39:32 -04:00
Joey Hess
2c90ed1fea
flush queued changes to export db on exit 2017-09-04 14:00:54 -04:00
Joey Hess
42eaa340fe
remove some backtraces on user errors 2017-09-04 13:55:49 -04:00
Joey Hess
7eb9889bfd
track exported files in a sqlite database
Went with a separate db per export remote, rather than a single export
database. Mostly because there will probably not be a lot of separate
export remotes, and it might be convenient to be able to delete a given
remote's export database.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:53:08 -04:00
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
Joey Hess
a4328b49d2
refactor ExportActions
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.

This commit was supported by the NSF-funded DataLad project
2017-09-01 13:05:09 -04:00
Joey Hess
5483ea90ec
graft exported tree into git-annex branch
So it will be available later and elsewhere, even after GC.

I first though to use git update-index to do this, but feeding it a line
with a tree object seems to always cause it to generate a git subtree
merge. So, fell back to using the Git.Tree interface to maniupulate the
trees, and not involving the git-annex branch index file at all.

This commit was sponsored by Andreas Karlsson.
2017-08-31 18:06:49 -04:00
Joey Hess
978885247e
implement export.log and resolve export conflicts
Incremental export updates work now too.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-08-31 15:47:23 -04:00
Joey Hess
7c7af82578
resuming exports
Make a pass over the whole exported tree, and upload anything that has
not yet reached the export. Update location log when exporting.

Note that the synthesized keys for non-annexed files are stored in the
location log too.

Some cases involving files in the tree with the same content are not
handled correctly yet.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-08-31 13:33:50 -04:00
Joey Hess
e662aceeac
improve type 2017-08-31 12:47:08 -04:00
Joey Hess
4694e49158
fix error message when content to export is not locally available 2017-08-31 12:39:10 -04:00
Joey Hess
9f3630f4e0
initial export command
Very basic operation works, but of course this is only the beginning.

This commit was sponsored by Nick Daly on Patreon.
2017-08-29 15:10:01 -04:00
Joey Hess
5f732717d8
toFeed was unused so remove 2017-08-28 12:51:25 -04:00
Joey Hess
ee2f096e3b
Support building with feed-1.0, while still supporting older versions.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-08-28 12:29:28 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
2eb6309d3e
move, copy: Support --batch. 2017-08-15 12:39:10 -04:00
Joey Hess
2cecc8d2a3
Added GIT_ANNEX_VECTOR_CLOCK environment variable
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.

Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).

There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.

A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)

So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.

Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 14:19:58 -04:00
Joey Hess
81a861326d
fsck: Support --json.
One use case is to get a list of files that fsck fails on, in order to eg,
drop them from a remote.

This commit was sponsored by Nick Daly on Patreon.
2017-06-26 13:40:57 -04:00
Joey Hess
02df5c5932
support --to=. as shorthand for --to=here 2017-06-01 13:12:42 -04:00
Joey Hess
94351daba6
configuration to disable automatic merge conflict resolution
* Added annex.resolvemerge configuration, which can be set to false to
  disable the usual automatic merge conflict resolution done by git-annex
  sync and the assistant.
* sync: Added --no-resolvemerge option.

Note that disabling merge conflict resolution is probably not a good idea
in a direct mode repo or adjusted branch. Since updates to both are done
outside the usual work tree, if it fails the tree is not left in a
conflicted state, and it would be hard to manually resolve the conflict.
Still, made annex.resolvemerge be supported in those cases for consistency.

This commit was sponsored by Riku Voipio.
2017-06-01 12:51:01 -04:00
Joey Hess
bb18026b2c
move --to=here
* move --to=here moves from all reachable remotes to the local repository.

The output of move --from remote is changed slightly, when the remote and
local both have the content. It used to say:
move foo ok
Now:
move foo (from theremote...) ok

That was done so that, when move --to=here is used and the content is
locally present and also in several remotes, it's clear which remotes the
content gets dropped from.

Note that move --to=here will report an error if a non-reachable remote
contains the file, even if the local repository also contains the file. I
think that's reasonable; the user may be intending to move all other copies
of the file from remotes.

OTOH, if a copy of the file is believed to be present in some repository
that is not a configured remote, move --to=here does not report an error.
So a little bit inconsistent, but erroring in this case feels wrong.

copy --to=here came along for free, but it's basically the same behavior as
git-annex get, and probably with not as good messages in edge cases
(especially on failure), so I've not documented it.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-05-31 17:00:18 -04:00
Joey Hess
5ee6912cf3
support parsing options like --to=here
Reworked remote name parsing to allow things like that. Command.Move
uses it for --to=here, although there's not yet an implementation of
that option.

This commit was sponsored by Ignacio on Patreon.
2017-05-31 16:49:28 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
0ec2f3b20f
rename to avoid name conflict 2017-05-11 18:31:14 -04:00
Joey Hess
db1600b2de
de-Maybe remoteGitConfig
It's always set, so does not need to be a Maybe.
2017-05-11 16:05:01 -04:00