Commit graph

2041 commits

Author SHA1 Message Date
Joey Hess
ed81762c86
avoid compiler warning
add type sig so it's clear createtfile returns unit
2018-03-15 13:21:32 -04:00
Joey Hess
10d3b7fc62
Fix reversion introduced in 6.20171214 that caused concurrent transfers to incorrectly fail with "transfer already in progress".
Avoid creating transfer info file before transfer lock is created and
locked.

The wrong order for one thing caused transfer info to be overwritten
when a transfer was already in progress.

But worse, it caused checkTransfer to see the transfer info,
and so lock the transfer lock in order to verify the transfer was not in
progress. Which in a concurrent situation, prevented the transferrer
from locking the transfer lock, so it failed with "transfer already in
progress".

Note that the transferinfo command does not lock the transfer lock
before creating the transfer info. But, that's only run after
recvkey is running, and recvkey does lock the transfer lock, so that
seems more or less ok. (Other than being a super complicated legacy mess
that the P2P code has mostly obsoleted now.)

This commit was supported by the NSF-funded DataLad project.
2018-03-14 18:55:34 -04:00
Joey Hess
31e1adc005
deal with unlocked files
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
2018-03-13 14:27:14 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
596af7cbc4
move protocol version stuff to the Net free monad
Needs to be in Net not Local, so that Net actions can take the protocol
version into account.

This commit was sponsored by an anonymous bitcoin donor.
2018-03-12 15:20:51 -04:00
Joey Hess
c81768d425
version the P2P protocol
Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.

So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.

Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-12 14:36:35 -04:00
Joey Hess
6a59bc4845
use P2P protocol for drop
Not yet used for everything else, but this is enough to
verify that it works, and do some benchmarking.

Some bugfixes included, which got it working. Also fallback to old
actions has been verified to work correctly.

Benchmarked dropping one thousand files from a ssh remote on localhost.
Using the old git-annex	40.867 seconds.
With the P2P protocol	9.905 seconds!

This commit was sponsored by Jochen Bartl on Patreon.
2018-03-08 16:56:17 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
6ddfa9807b
implemented git-annex-shell p2pstdio
Not yet used by git-annex, but this will allow faster transfers etc than
using individual ssh connections and rsync.

Not called git-annex-shell p2p, because git-annex p2p does something
else and I don't want two subcommands with the same name between the two
for sanity reasons.

This commit was sponsored by Øyvind Andersen Holm.
2018-03-07 15:38:01 -04:00
Joey Hess
f4103744c3
make sure that lockContentShared is always paired with an inAnnex check
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.

That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.

Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)

And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.

So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-07 14:23:52 -04:00
Joey Hess
ba53f60801
refactor 2018-03-06 15:14:53 -04:00
Joey Hess
db057dcff0
fix sync bug in direct mode
sync: Fix bug that prevented pulling changes into direct mode repositories
that were committed to remotes using git commit rather than git-annex sync.

This commit was supported by the NSF-funded DataLad project.
2018-02-26 14:10:03 -04:00
Joey Hess
cb3b73df6c
importfeed: Fix a failure when downloading with youtube-dl and the destination subdirectory does not exist yet.
Noticed while running this (which a user posted in a comment they deleted
for some reason):

git-annex importfeed https://vimeo.com/logiingimars/videos/rss

The filename that youtube-dl suggests included a subdirectory,
which didn't exist, so renaming to it failed.

This commit was sponsored by mo on Patreon.
2018-02-22 13:20:19 -04:00
Joey Hess
6583448bab
add --json-error-messages (not yet implemented)
Added --json-error-messages option, which includes error messages in the
json output, rather than outputting them to stderr.

The actual rediretion of errors is not implemented yet, this is only
the docs and option plumbing.

This commit was supported by the NSF-funded DataLad project.
2018-02-19 14:32:15 -04:00
Joey Hess
42ba888875
optimise for case where there are no required contents
Avoid reading location log in this case.
2018-02-08 14:16:00 -04:00
Joey Hess
7f5c6a28a6
fsck: Warn when required content is not present in the repository that requires it.
This commit was sponsored by Jack Hill on Patreon.
2018-02-08 14:08:41 -04:00
Joey Hess
cfbfb3ab9a
inprogress: Avoid showing failures for files not in progress. 2018-01-24 20:43:19 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
24df95f0f6
Fix several places where files in .git/annex/ were written with modes that did not take the core.sharedRepository config into account.
git grep writeFile finds some more that might also be problems, but
for now I've concentrated on .git/annex/ log files. There are certianly
cases where writeFile is not a problem too.

This commit was sponsored by mo on Patreon.
2018-01-02 17:25:25 -04:00
Joey Hess
75366d3c34
split BuildInfo and BuildFlags
The problem with combining these is that Build.Standalone etc need only
the BuildInfo, and since not built with cabal, the BuildFlags ifdefs
were causing bogus warnings.
2018-01-02 13:47:51 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
0b0d8ad54b
fix build 2017-12-31 15:06:33 -04:00
Joey Hess
fcdd9ce788
repeated addurl behavior reversion fix
addurl: When the file youtube-dl will download is already an annexed file,
don't download it again and fail to overwrite it, instead just do nothing,
like it used to when quvi was used.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-12-31 14:55:51 -04:00
Joey Hess
67338fd7ac
Added inprogress command for accessing files as they are being downloaded.
Chose to make this only handle files actively being downloaded, not temp
files for downloads that were interrupted or files that have been fully
downloaded.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-12-28 11:46:39 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
3cc94c1667
.noannex file
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.

This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.

This commit was supported by the NSF-funded DataLad project.
2017-12-13 14:34:32 -04:00
Joey Hess
bd7f8be121
fix recorded url when using --file with external special remote
The youtube changes accidentially caused the OtherDownloader url to not
get used here, which broke datalad's test suite luckily.

This commit was supported by the NSF-funded DataLad project.
2017-12-11 13:41:41 -04:00
Joey Hess
cfdfe4df6c
lookupkey absolute path support
lookupkey: Support being given an absolute filename to a file within the
current git repository.

This commit was supported by the NSF-funded DataLad project.
2017-12-08 15:35:02 -04:00
Joey Hess
fc845e6530
more lambda-case conversion 2017-12-05 15:00:50 -04:00
Joey Hess
5e95d54604
make --raw avoid ever running youtube-dl
added DownloadOptions type to avoid needing two different Bool params
for some functions.

This commit was sponsored by Thom May on Patreon.
2017-11-30 17:06:15 -04:00
Joey Hess
67ab567bc7
display filename when file already has url
Otherwise it's confusing what happened..
2017-11-30 15:06:21 -04:00
Joey Hess
7c88633121
improve error message
checkCanAdd can be called on annexed files too, when youtube-dl is in
use.
2017-11-30 15:00:53 -04:00
Joey Hess
bbedc1c265
check youtube-dl for --fast and --relaxed when adding new file
The filename comes from youtube-dl also.

This commit was sponsored by Denis Dzyubenko on Patreon.
2017-11-30 14:57:20 -04:00
Joey Hess
2528e3ddb0
rethought --relaxed change
Better to make it not be surprising and slow, than surprising and fast.
--raw can be used when it needs to be really fast.

Implemented adding a youtube-dl supported url to an existing file.

This commit was sponsored by andrea rota.
2017-11-30 14:13:20 -04:00
Joey Hess
8a0038ec23
avoid warning when youtube-dl is not installed
If a user does not have it installed, don't warn on every imported item
about it.
2017-11-30 13:43:55 -04:00
Joey Hess
a7b4358c05
honor --file when downloading with youtube-dl
This used to be done with quvi, and got broken in the transition.
2017-11-30 13:24:52 -04:00
Joey Hess
24f27ec39d
convert importfeed to youtube-dl
Fully working, including --fast/--relaxed.

Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.

importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.

Remove old quvi modules.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-29 17:30:02 -04:00
Joey Hess
99bebdface
youtube-dl working
Including resuming and cleanup of incomplete downloads.

Still todo: --fast, --relaxed, importfeed, disk reserve checking,
quvi code cleanup.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-11-29 16:40:32 -04:00
Joey Hess
4e7e1fcff4
add gitAnnexTmpWorkDir and withTmpWorkDir
Needed to run youtube-dl in, but could also be useful for other stuff.

The tricky part of this was making the workdir be cleaned up whenever the
tmp object file is cleaned up.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-11-29 13:53:39 -04:00
Joey Hess
3febb79c8f
wip 2017-11-28 17:17:40 -04:00
Joey Hess
4781ca297b
showStart variant for when there's no worktree file
Clean up some uses of showStart with "" for the file,
or in some cases, a non-filename description string. That would
generate bad json, although none of the commands doing that
supported --json.

Using "" for the file resulted in output like "foo  rest";
now the extra space is eliminated.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-11-28 15:14:16 -04:00
Joey Hess
f5edb16729
Display progress meter when uploading a key without size information
Getting the size by statting the content file.

This commit was supported by the NSF-funded DataLad project.
2017-11-14 16:40:49 -04:00
Joey Hess
07c4be500d
clean up build warnings on Windows 2017-11-14 14:14:10 -04:00
Joey Hess
1d0bf44173
testremote: Test exporttree.
As long as the class of remotes supports exporting, it's tested whether
or not the remote is configured with exporttree=yes.

Also, made testremote of a remote configured with exporttree=yes
disable that configuration for testing non-export storage.

This commit was supported by the NSF-funded DataLad project.
2017-11-08 14:22:11 -04:00
Joey Hess
75ec0227f8
unlock, lock: Support --json. 2017-10-30 14:44:11 -04:00
Joey Hess
e1ac299ad0
better dup key with -J fix
This avoids all the complication about redundant work discussed in
the previous try at fixing this. At the expense of needing each command
that could have the problem to be patched to simply wrap the action in
onlyActionOn once the key is known. But there do not seem to be many
such commands.

onlyActionOn' should not be used with a CommandStart (or CommandPerform),
although the types do allow it. onlyActionOn handles running the whole
CommandStart chain. I couldn't immediately see a way to avoid mistken
use of onlyActionOn'.

This commit was supported by the NSF-funded DataLad project.
2017-10-17 18:48:53 -04:00
Joey Hess
68a49adcda
Improve behavior when -J transfers multiple files that point to the same key
After a false start, I found a fairly non-intrusive way to deal with it.
Although it only handles transfers -- there may be issues with eg
concurrent dropping of the same key, or other operations.

There is no added overhead when -J is not used, other than an added
inAnnex check. When -J is used, it has to maintain and check a small
Set, which should be negligible overhead.

It could output some message saying that the transfer is being done by
another thread. Or it could even display the same progress info for both
files that are being downloaded since they have the same content. But I
opted to keep it simple, since this is rather an edge case, so it just
doesn't say anything about the transfer of the file until the other
thread finishes.

Since the deferred transfer action still runs, actions that do more than
transfer content will still get a chance to do their other work. (An
example of something that needs to do such other work is P2P.Annex,
where the download always needs to receive the content from the peer.)
And, if the first thread fails to complete a transfer, the second thread
can resume it.

But, this unfortunately means that there's a risk of redundant work
being done to transfer a key that just got transferred.
That's not ideal, but should never cause breakage; the same
thing can occur when running two separate git-annex processes.

The get/move/copy/mirror --from commands had extra inAnnex checks added,
inside the download actions. Without those checks, the first thread
downloaded the content, and then the second thread woke up and
downloaded the same content redundantly.

move/copy/mirror --to is left doing redundant uploads for now. It
would need a second checkPresent of the remote inside the upload
to avoid them, which would be expensive. A better way to avoid
redundant work needs to be found..

This commit was supported by the NSF-funded DataLad project.
2017-10-17 17:10:50 -04:00
Joey Hess
85ed38a574
Avoid repeated checking that files passed on the command line exist.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.

Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.

This commit was supported by the NSF-funded DataLad project.
2017-10-16 14:10:20 -04:00
Joey Hess
a2bf0c5b6d
avoid warning 2017-10-16 12:54:00 -04:00
Joey Hess
f403c23bc6
copy, move: Behave same with --fast when sending to remotes located on a local disk as when sending to other remotes.
Let --fast override use of hasKey even when hasKeyCheap.
2017-09-29 16:30:43 -04:00
Joey Hess
e8c9a5c515
sync: Added --cleanup, which removes local and remote synced/ branches.
Also deletes any tagged pushes that the assistant might have done,
since those would also prevent resetting a branch back.

This commit was sponsored by andrea rota.
2017-09-28 14:58:48 -04:00
Joey Hess
812d90022b
metadata: Added --remove-all.
Motivation is to remove all metadata when it gets copied from a previous
version of the file, and that is not deisrable.

This commit was supported by the NSF-funded DataLad project.
2017-09-28 12:36:10 -04:00
Joey Hess
d71c65ca0a
add exporter thread to assistant
This is similar to the pusher thread, but a separate thread because git
pushes can be done in parallel with exports, and updating a big export
should not prevent other git pushes going out in the meantime.

The exportThread only runs at most every 30 seconds, since updating an
export is more expensive than pushing. This may need to be tuned.

Added a separate channel for export commits; the committer records a
commit in that channel.

Also, reconnectRemotes records a dummy commit, to make the exporter
thread wake up and make sure all exports are up-to-date. So,
connecting a drive with a directory special remote export will
immediately update it, and getting online will automatically
update S3 and WebDAV exports.

The transfer queue is not involved in exports. Instead, failed
exports are retried much like failed pushes.

This commit was sponsored by Ewen McNeill.
2017-09-20 15:29:13 -04:00
Joey Hess
28eba8e9c6
update transfer info and notify when exporting
Same as is done for all other transfers of content, so the webapp will
display progress bars etc.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-09-20 12:58:23 -04:00
Joey Hess
a6c0ed6698
export --fast sets up but does not populate export
sync --content finishes
2017-09-19 14:26:03 -04:00
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
527f734492
configuration and docs for tracking exports
Not yet handled by sync or assistant.

This commit was sponsored by Nick Daly on Patreon.
2017-09-19 13:05:43 -04:00
Joey Hess
f4be3c3f89
merge changes made on other repos into ExportTree
Now when one repository has exported a tree, another repository can get
files from the export, after syncing.

There's a bug: While the database update works, somehow the database on
disk does not get updated, and so the database update is run the next
time, etc. Wasn't able to figure out why yet.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-09-18 19:21:41 -04:00
Joey Hess
0ad7e36dc1
update ExportTree table efficiently
Use same diff and key lookup except when the whole tree has to be
scanned.

This commit was sponsored by Peter Hogg on Patreon.
2017-09-18 14:27:50 -04:00
Joey Hess
b03d77c211
add ExportTree table to export db
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.

Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.

And some code refactoring.

This commit was sponsored by Francois Marier on Patreon.
2017-09-18 13:59:59 -04:00
Joey Hess
486902389d
lock to avoid more than one export to a remote at a time
This commit was sponsored by Jack Hill on Patreon.
2017-09-18 12:38:07 -04:00
Joey Hess
e1f5c90c92
split out Types.Export 2017-09-15 16:46:03 -04:00
Joey Hess
e54a05612e
avoid unncessary db queries when exported directory can't be empty
In rename foo/bar to foo/baz, foo can't be empty.

In delete zxyyz, there's no exported directory (top doesn't count).
2017-09-15 16:30:49 -04:00
Joey Hess
c633144d28
remove empty directories when removing from export
The subtle part of this is what happens when the remote fails to remove
an empty directory. The removal from the export needs to fail in that
case, so the removal will be tried again later. However, removeExportLocation
has already been run and changed the export db, so if the next run
checks getExportLocation, it might decide nothing remains to be done,
leaving the empty directory.

Dealt with that by making removeEmptyDirectories, handle a failure
by calling addExportLocation, reverting the database changes so the next
run will be guaranteed to try deleting the empty directory again.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-09-15 15:22:53 -04:00
Joey Hess
ab271ba6ca
trust level overridden message adjusted for forced untrusted export remotes 2017-09-13 12:08:46 -04:00
Joey Hess
301c959edf
remove debug print 2017-09-12 17:00:39 -04:00
Joey Hess
9c3622882b
export: cache connections for S3 and webdav 2017-09-12 16:59:04 -04:00
Joey Hess
8de516ad2c
leave export logged as incomplete if initial renames fail
This way, the temp files that might be left due to failure will be
cleaned up next time.

Also, nub the list of incomplete exports to avoid repeatedly adding the
same tree to it when running export repeatedly when it's failing.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 14:21:15 -04:00
Joey Hess
4d3a464e83
export to webdav
This basically works, but there's a bug when renaming a file that leaves
a .git-annex-temp-content-key file in the webdav store, that never gets
cleaned up.

Also, exporting files with spaces to box.com seems to fail; perhaps it
does not support it?

This commit was supported by the NSF-funded DataLad project.
2017-09-12 14:10:09 -04:00
Joey Hess
cd5f405623
interrupted export recovery bugfixes
When an export was interrupted, the sqlite database won't have been
committed necessarily. Also, the interrupted export might have been
run in an entirely different repository. There's not a significant speed
benefit in checking getExportLocation in this case anyway, so avoid it.

Also, remove the old filename from the export database.

Recovery from interrupted exports is now tested working.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 15:51:31 -04:00
Joey Hess
a48b52c056
avoid renaming to temp files before deleting
Only rename when actually ncessary.

The diff gets buffered in memory. Probably git has to buffer a diff in
memory when generating it as well, so this memory usage should not be a
problem, even when the diff is very large. I hope.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 14:32:47 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
4f657ba918
bugfix 2017-09-06 15:59:02 -04:00
Joey Hess
cae3704a44
export file renaming
This is seriously super hairy. It has to handle interrupted exports,
which may be resumed with the same or a different tree. It also has to
recover from export conflicts, which could cause the wrong content
to be renamed to a file.

I think this works, or is close to working. See the update to the design
for how it works.

This is definitely not optimal, in that it does more renames than are
necessary. It would probably be worth finding the keys that are really
renamed and only renaming those. But let's get the "simple" approach to
work first..

This commit was supported by the NSF-funded DataLad project.
2017-09-06 15:44:10 -04:00
Joey Hess
0fa948b402
record incomplete exports in export.log
Not yet used, but essential for resuming cleanly.

Note that, in normmal operation, only one commit is made to export.log
during an export; the incomplete version only gets to the journal and
is then overwritten.

This commit was supported by the NSF-funded DataLad project.
2017-09-06 13:45:03 -04:00
Joey Hess
4da763439b
use export db to correctly handle duplicate files
Removed uncorrect UniqueKey key in db schema; a key can appear multiple
times with different files.

The database has to be flushed after each removal. But when adding files
to the export, lots of changes are able to be queued up w/o flushing.
So it's still fairly efficient.

If large removals of files from exports are too slow, an alternative
would be to make two passes over the diff, one pass queueing deletions
from the database, then a flush and the a second pass updating the
location log. But that would use more memory, and need to look up
exportKey twice per removed file, so I've avoided such optimisation yet.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 14:39:32 -04:00
Joey Hess
2c90ed1fea
flush queued changes to export db on exit 2017-09-04 14:00:54 -04:00
Joey Hess
42eaa340fe
remove some backtraces on user errors 2017-09-04 13:55:49 -04:00
Joey Hess
7eb9889bfd
track exported files in a sqlite database
Went with a separate db per export remote, rather than a single export
database. Mostly because there will probably not be a lot of separate
export remotes, and it might be convenient to be able to delete a given
remote's export database.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:53:08 -04:00
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
Joey Hess
a4328b49d2
refactor ExportActions
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.

This commit was supported by the NSF-funded DataLad project
2017-09-01 13:05:09 -04:00
Joey Hess
5483ea90ec
graft exported tree into git-annex branch
So it will be available later and elsewhere, even after GC.

I first though to use git update-index to do this, but feeding it a line
with a tree object seems to always cause it to generate a git subtree
merge. So, fell back to using the Git.Tree interface to maniupulate the
trees, and not involving the git-annex branch index file at all.

This commit was sponsored by Andreas Karlsson.
2017-08-31 18:06:49 -04:00
Joey Hess
978885247e
implement export.log and resolve export conflicts
Incremental export updates work now too.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-08-31 15:47:23 -04:00
Joey Hess
7c7af82578
resuming exports
Make a pass over the whole exported tree, and upload anything that has
not yet reached the export. Update location log when exporting.

Note that the synthesized keys for non-annexed files are stored in the
location log too.

Some cases involving files in the tree with the same content are not
handled correctly yet.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-08-31 13:33:50 -04:00
Joey Hess
e662aceeac
improve type 2017-08-31 12:47:08 -04:00
Joey Hess
4694e49158
fix error message when content to export is not locally available 2017-08-31 12:39:10 -04:00
Joey Hess
9f3630f4e0
initial export command
Very basic operation works, but of course this is only the beginning.

This commit was sponsored by Nick Daly on Patreon.
2017-08-29 15:10:01 -04:00
Joey Hess
5f732717d8
toFeed was unused so remove 2017-08-28 12:51:25 -04:00
Joey Hess
ee2f096e3b
Support building with feed-1.0, while still supporting older versions.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-08-28 12:29:28 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
2eb6309d3e
move, copy: Support --batch. 2017-08-15 12:39:10 -04:00
Joey Hess
2cecc8d2a3
Added GIT_ANNEX_VECTOR_CLOCK environment variable
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.

Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).

There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.

A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)

So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.

Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 14:19:58 -04:00
Joey Hess
81a861326d
fsck: Support --json.
One use case is to get a list of files that fsck fails on, in order to eg,
drop them from a remote.

This commit was sponsored by Nick Daly on Patreon.
2017-06-26 13:40:57 -04:00
Joey Hess
02df5c5932
support --to=. as shorthand for --to=here 2017-06-01 13:12:42 -04:00
Joey Hess
94351daba6
configuration to disable automatic merge conflict resolution
* Added annex.resolvemerge configuration, which can be set to false to
  disable the usual automatic merge conflict resolution done by git-annex
  sync and the assistant.
* sync: Added --no-resolvemerge option.

Note that disabling merge conflict resolution is probably not a good idea
in a direct mode repo or adjusted branch. Since updates to both are done
outside the usual work tree, if it fails the tree is not left in a
conflicted state, and it would be hard to manually resolve the conflict.
Still, made annex.resolvemerge be supported in those cases for consistency.

This commit was sponsored by Riku Voipio.
2017-06-01 12:51:01 -04:00
Joey Hess
bb18026b2c
move --to=here
* move --to=here moves from all reachable remotes to the local repository.

The output of move --from remote is changed slightly, when the remote and
local both have the content. It used to say:
move foo ok
Now:
move foo (from theremote...) ok

That was done so that, when move --to=here is used and the content is
locally present and also in several remotes, it's clear which remotes the
content gets dropped from.

Note that move --to=here will report an error if a non-reachable remote
contains the file, even if the local repository also contains the file. I
think that's reasonable; the user may be intending to move all other copies
of the file from remotes.

OTOH, if a copy of the file is believed to be present in some repository
that is not a configured remote, move --to=here does not report an error.
So a little bit inconsistent, but erroring in this case feels wrong.

copy --to=here came along for free, but it's basically the same behavior as
git-annex get, and probably with not as good messages in edge cases
(especially on failure), so I've not documented it.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-05-31 17:00:18 -04:00
Joey Hess
5ee6912cf3
support parsing options like --to=here
Reworked remote name parsing to allow things like that. Command.Move
uses it for --to=here, although there's not yet an implementation of
that option.

This commit was sponsored by Ignacio on Patreon.
2017-05-31 16:49:28 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
0ec2f3b20f
rename to avoid name conflict 2017-05-11 18:31:14 -04:00
Joey Hess
db1600b2de
de-Maybe remoteGitConfig
It's always set, so does not need to be a Maybe.
2017-05-11 16:05:01 -04:00
Joey Hess
4c1e3210fa
annex.backend is the new name for what was annex.backends
It takes a single key-value backend, rather than the unncessary and confusing list.
The old option still works if set.

Simplified some old old code too.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-05-09 15:04:07 -04:00
Joey Hess
e3184e54c9
version: Added "dependency versions" line.
This commit was sponsored by Anthony DeRobertis on Patreon.
2017-04-07 18:16:11 -04:00
Joey Hess
6896ac06e8
git annex add -u now supported, analagous to git add -u
Unlike git add -u, git annex add -u does not update the index for files
removed from the working tree. But then, "git add ." stages removals,
and "git annex add ." does not, so that's an existing divergence.

Seems that --update --batch would need to run git ls-files once per line of
batch input, which would surely be too slow, so just throw an error for
that.

This commit was supported by the NSF-funded DataLad project.
2017-04-07 15:55:45 -04:00
Joey Hess
99984967eb
enableremote: Fix re-enabling of existing gcrypt remotes, so that eg, encryption key changes take effect.
They were silently ignored, a reversion introduced in 6.20160527.

I don't like this regular git remote special case in enableremote, but I
can't see a way to get rid of it. So, check if the existing remote is
a Remote.Git

This commit was sponsored by Trenton Cronholm on Patreon.
2017-04-07 13:51:09 -04:00
Joey Hess
f406d16525
enableremote: When enabling a non-special remote, param=value parameters can't be used, so error out if any are provided.
This commit was sponsored by Riku Voipio.
2017-04-07 13:14:53 -04:00
Joey Hess
29e73f76ef
Added remote.<name>.annex-push and remote.<name>.annex-pull
The former can be useful to make remotes that don't get fully synced with
local changes, which comes up in a lot of situations.

The latter was mostly added for symmetry, but could be useful (though less
likely to be).

Implementing `remote.<name>.annex-pull` was a bit tricky, as there's no one
place where git-annex pulls/fetches from remotes. I audited all
instances of "fetch" and "pull". A few cases were left not checking this
config:

* Git.Repair can try to pull missing refs from a remote, and if the local
  repo is corrupted, that seems a reasonable thing to do even though
  the config would normally prevent it.
* Assistant.WebApp.Gpg and Remote.Gcrypt and Remote.Git do fetches
  as part of the setup process of a remote. The config would probably not
  be set then, and having the setup fail seems worse than honoring it if it
  is already set.

I have not prevented all the code that does a "merge" from merging branches
from remotes with remote.<name>.annex-pull=false. That could perhaps
be done, but it would need a way to map from branch name to remote name,
and the way refspecs work makes that hard to get really correct. So if the
user fetches manually, the git-annex branch will get merged, for example.
Anther way of looking at/justifying this is that the setting is called
"annex-pull", not "annex-merge".

This commit was supported by the NSF-funded DataLad project.
2017-04-05 13:22:35 -04:00
Joey Hess
c6d5d8f9bf
fix windows build 2017-04-05 11:19:29 -04:00
Joey Hess
256eb4807e
add missing "do"
Unsure how it got committed in an uncompilable state before..
2017-04-03 14:52:54 -04:00
Joey Hess
c3970f6c1a
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting.
This commit was supported by the NSF-funded DataLad project.
2017-03-30 19:35:30 -04:00
Joey Hess
64f924dc93
sync --content-of=path
For when you want to sync only some files' contents, not the whole working
tree.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-03-20 16:00:48 -04:00
Joey Hess
faecd73f32
Support GIT_SSH and GIT_SSH_COMMAND
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.

So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.

Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.

(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-03-17 16:20:37 -04:00
Joey Hess
1c4e5f65fc
Drop support for building with old versions of directory, feed, and http-types. 2017-03-10 15:57:41 -04:00
Joey Hess
55b178a6ba
minor cleanup 2017-03-10 15:03:33 -04:00
Joey Hess
71a05b0d25
use ActionItem rather than String
This changes fsck -A warnings to include the name of the key,
which is a bit redundant in one case, but was missing in another case.
2017-03-10 14:13:10 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
f90e2d0893
fix fsck bug introduced in 301aff34c4
Got two Maybe FilePaths crossed. Test suite caught it.
Slightly improved types to avoid this mistake.
2017-03-10 12:11:00 -04:00
Joey Hess
301aff34c4
fsck -q: When a file has bad content, include the name of the file in the warning message.
This commit was sponsored by Alexander Thompson on Patreon.
2017-03-08 15:15:20 -04:00
Joey Hess
874232f1a6
status: Propigate nonzero exit code from git status. 2017-03-02 14:09:42 -04:00
Joey Hess
ddf68b7c48
improve display of checking known urls
Display it as a separate action, so it ends with a newline
2017-02-28 14:41:08 -04:00
Joey Hess
a62802af08
remove old debug print 2017-02-28 14:41:00 -04:00
Joey Hess
75029536e5
squelch a couple of warnings about moveAnnex return code 2017-02-28 12:49:17 -04:00
Joey Hess
e53070c1ff
inheritable annex.securehashesonly
* init: When annex.securehashesonly has been set with git-annex config,
  copy that value to the annex.securehashesonly git config.
* config --set: As well as setting value in git-annex branch,
  set local gitconfig. This is needed especially for
  annex.securehashesonly, which is read only from local gitconfig and not
  the git-annex branch.

doc/todo/sha1_collision_embedding_in_git-annex_keys.mdwn has the
rationalle for doing it this way. There's no perfect solution; this
seems to be the least-bad one.

This commit was supported by the NSF-funded DataLad project.
2017-02-27 16:08:23 -04:00
Joey Hess
942e0174b3
make fsck check annex.securehashesonly, and new tip for working around SHA1 collisions with git-annex
This commit was sponsored by andrea rota.
2017-02-27 13:55:15 -04:00
Joey Hess
07f1e638ee
annex.securehashesonly
Cryptographically secure hashes can be forced to be used in a repository,
by setting annex.securehashesonly. This does not prevent the git repository
from containing files with insecure hashes, but it does prevent the content
of such files from being pulled into .git/annex/objects from another
repository.

We want to make sure that at no point does git-annex accept content into
.git/annex/objects that is hashed with an insecure key. Here's how it
was done:

* .git/annex/objects/xx/yy/KEY/ is kept frozen, so nothing can be
  written to it normally
* So every place that writes content must call, thawContent or modifyContent.
  We can audit for these, and be sure we've considered all cases.
* The main functions are moveAnnex, and linkToAnnex; these were made to
  check annex.securehashesonly, and are the main security boundary
  for annex.securehashesonly.
* Most other calls to modifyContent deal with other files in the KEY
  directory (inode cache etc). The other ones that mess with the content
  are:
	- Annex.Direct.toDirectGen, in which content already in the
	  annex directory is moved to the direct mode file, so not relevant.
	- fix and lock, which don't add new content
	- Command.ReKey.linkKey, which manually unlocks it to make a
	  copy.
* All other calls to thawContent appear safe.

Made moveAnnex return a Bool, so checked all callsites and made them
deal with a failure in appropriate ways.

linkToAnnex simply returns LinkAnnexFailed; all callsites already deal
with it failing in appropriate ways.

This commit was sponsored by Riku Voipio.
2017-02-27 13:33:59 -04:00
Joey Hess
27eca014be
fix up Read instance incompatability caused by recent commit
9c4650358c changed the Read instance for
Key.

I've checked all uses of that instance (by removing it and seeing what
breaks), and they're all limited to the webapp, except one.
That is GitAnnexDistribution's Read instance.

So, 9c4650358c would have broken upgrades
of git-annex from downloads.kitenet.net. Once the .info files there got
updated for a new release, old releases would have failed to parse them
and never upgraded.

To fix this, I found a way to make the .info files that contain
GitAnnexDistribution values be readable by the old version of git-annex.

This commit was sponsored by Ewen McNeill.
2017-02-24 18:59:12 -04:00
Joey Hess
9c4650358c
add KeyVariety type
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.

This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.

Benchmarks ran in my big repo:

old git-annex info:

real	0m3.338s
user	0m3.124s
sys	0m0.244s

new git-annex info:

real	0m3.216s
user	0m3.024s
sys	0m0.220s

new git-annex find:

real	0m7.138s
user	0m6.924s
sys	0m0.252s

old git-annex find:

real	0m7.433s
user	0m7.240s
sys	0m0.232s

Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.

This commit was supported by the NSF-funded DataLad project.
2017-02-24 15:16:56 -04:00
Joey Hess
3afc7d83f2
noCommit for PostReceive
This was noticed because it broke the datalad test suite, which pushed
to the remote and then fetched to check if it had received the expected
branches. Auto-init caused the git-annex branch on the remote to
diverge, breaking that test.

https://github.com/datalad/datalad/issues/1319#issuecomment-281649518

The auto-init still happens, it's staged in the journal, and will be
commited by some later git-annex command when it runs. Which is fine,
it's the same as that later command doing the auto-init.

This commit was supported by the NSF-funded DataLad project
2017-02-23 18:37:02 -04:00
Joey Hess
75a15e1ad7
status: Pass --ignore-submodules=when option on to git status.
Didn't make --ignore-submodules without a value be handled because I can't
see a way to make optparse-applicative parse that. I've opened a bug
requesting a way to do that:
https://github.com/pcapriotti/optparse-applicative/issues/243
2017-02-20 17:01:24 -04:00
Joey Hess
e6857e75a6
sync hack to make updateInstead work on eg FAT
sync: When syncing with a local repository located on a crippled
filesystem, run the post-receive hook there, since it wouldn't get run
otherwise. This makes pushing to repos on FAT-formatted removable drives
update them when receive.denyCurrentBranch=updateInstead.

Made Remote.Git export onLocal, which was cleaned up to not have so many
caveats about its use.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-02-17 15:21:52 -04:00
Joey Hess
d074532aff
post-recive hook to make updateInstead work in direct mode and adjusted branches
* Added post-recieve hook, which makes updateInstead work with direct
  mode and adjusted branches.
* init: Set up the post-receive hook.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-02-17 14:04:43 -04:00
Joey Hess
4594bece40
make git-annex:git-annex push quiet again
Recent changes had a side effect of displaying errors in the fairly
common case when this push fails. Since the synced/git-annex push
is always forced, those errors are noise, so hide again.

This means 3 separate pushes are done now, where before it only made 2.
A bit more expensive, but ssh connection caching eliminates most of
the costs.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-02-17 14:03:17 -04:00
Joey Hess
d0651bb567
make query commands not output extraneous messages
config group groupwanted numcopies schedule wanted required:  Avoid
displaying extraneous messages about repository auto-init, git-annex branch
merging, etc, when being used to get information.
2017-02-16 13:24:35 -04:00
Joey Hess
a73c8ce4a1
sync: Improve integration with receive.denyCurrentBranch=updateInstead
By displaying error messages from the remote then it fails to update
its checked out branch.

Error messages in the default receive.denyCurrentBranch are still
suppressed, which matches user expectations.

This commit was sponsored by Nick Daly on Patreon.
2017-02-15 16:13:30 -04:00
Joey Hess
f07af03018
Run ssh with -n whenever input is not being piped into it
... to avoid it consuming stdin that it shouldn't.

This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.

Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.

Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
2017-02-15 15:08:46 -04:00
Joey Hess
2af5f727a9
forgot to compile last commit; fix mistakes 2017-02-15 13:55:06 -04:00
Joey Hess
69baa45f14
sync, merge: Fail when the current branch has no commits yet, instead of not merging in anything from remotes and appearing to succeed.
At first I wanted to make it go ahead and merge into the newborn branch,
so made it use Git.Branch.currentUnsafe to get the current branch. But that
failed:

fatal: ambiguous argument 'refs/heads/master..refs/heads/synced/master':
unknown revision or path not in the working tree.

A whole nother code path to handle merging into newborn branches seemed
excessive, so went with displaying a warning and propigating failure
status.

This commit was sponsored by Brock Spratlen on Patreon.
2017-02-14 16:09:55 -04:00
Edward Betts
0750913136
correct spelling mistakes 2017-02-12 17:30:23 -04:00
Joey Hess
c1ece47ea0
import --reinject-duplicates
This is the same as running git annex reinject --known, followed by
git-annex import. The advantage to having it in one command is that it
only has to hash each file once; the two commands have to
hash the imported files a second time.

This commit was sponsored by Shane-o on Patreon.
2017-02-09 15:41:00 -04:00
Joey Hess
f617988a29
Make import --deduplicate and --skip-duplicates only hash once, not twice
import: --deduplicate and --skip-duplicates were implemented inneficiently;
they unncessarily hashed each file twice. They have been improved to only
hash once.

The new approach is to lock down (minimally) and hash files, and then
reuse that information when importing them.

This was rather tricky, especially in detecting changes to files while
they are being imported.

The output of import changed slightly. While before it silently skipped
over files with eg --skip-duplicates, now it shows each file as it starts
to act on it. Since every file is hashed first thing, it would otherwise
not be clear what file import is chewing on. (Actually, it wasn't clear
before when any of the duplicates switches were used.)

This commit was sponsored by Alexander Thompson on Patreon.
2017-02-09 15:32:22 -04:00
Joey Hess
e7e36b6e72
import: Changed how --deduplicate, --skip-duplicates, and --clean-duplicates determine if a file is a duplicate
Before, only content known to be present somewhere was considered a
duplicate. Now, any content that has been annexed before will be considered
a duplicate, even if all annexed copies of the data have been lost.

Note that --clean-duplicates and --deduplicate still check numcopies,
so won't delete duplicate files unless there's an annexed copy.

This makes import use the same method as reinject --known.

The man page already said that duplicate meant "its content is either
present in the local repository already, or git-annex knows of another
repository that contains it, or it was present in the annex before but has
been removed now". So, this is really only bringing the implementation into
line with the man page.

This commit was sponsored by Jochen Bartl on Patreon.
2017-02-07 17:41:58 -04:00
Joey Hess
27e89aeffc
initremote: When a uuid= parameter is passed, use the specified UUID for the new special remote, instead of generating a UUID.
This can be useful in some situations, eg when the same data can be
accessed via two different special remote backends.
2017-02-07 15:10:41 -04:00
Joey Hess
5c804cf42e
add SetupStage parameter to RemoteType.setup
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which, and whether
a UUID was provided to setup was used.

This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.

Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.

This commit was sponsored by Thom May on Patreon.
2017-02-07 14:55:58 -04:00
Joey Hess
3439f3cc87
assistant: Make --autostart --foreground wait for the children it starts.
Before, the --foreground was ignored when autostarting.

This commit was sponsored by Denis Dzyubenko on Patreon.
2017-02-07 13:31:45 -04:00
Joey Hess
3fe9d99f24
wormhole pairing appid flag day 2021-12-31
Wormhole pairing will start to provide an appid to wormhole on 2021-12-31.
An appid can't be provided now because Debian stable is going to ship a
older version of git-annex that does not provide an appid. Assumption is
that by 2021-12-31, this version of git-annex will be shipped in a Debian
stable release. If that turns out to not be the case, this change will need
to be cherry-picked into the git-annex in Debian stable, or its wormhole
pairing will break.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-02-03 15:06:40 -04:00
Joey Hess
c545701224
make sync --no-commit override annex.annex.autocommit 2017-02-03 14:36:14 -04:00
Joey Hess
b77903af48
New annex.synccontent config setting
.. which can be set to true to make git annex sync default to --content.

This may become the default at some point in the future.

As well as being configuable by git config, it can be configured by
git-annex config to control the default behavior in all clones of a
repository.

Had to add a separate --no-content switch to we can tell if it's been
explicitly set, and should override annex.synccontent. If --content was the
default, this complication would not be necessary.

This commit was sponsored by Jake Vosloo on Patreon.
2017-02-03 14:31:17 -04:00
Joey Hess
ed56dba868
annex.autocommit can be configured via git-annex config
... to control the default behavior in all clones of a repository.

This includes a new Configurable data type, so the GitConfig type indicates
which values can be configured this way.

The implementation should be quite efficient; the config log is only read
once, and only when a Configurable value has not already been set by
git-config.

Indeed, it would be nice in the future to extend this, so that git-config
is itself only read on demand. Some commands may not need to look at the
git configuration at all.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-02-03 13:58:53 -04:00
Joey Hess
062286135c
unused: When large files are checked right into git, avoid buffering their contents in memory.
This makes it a little bit slower since it has to check file size,
but worth it to fix a potential memory use problem.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-01-31 19:09:37 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
183f3f7a9c
make git annex config settings editable in vicfg
This commit was sponsored by Shane-o on Patreon.
2017-01-30 17:08:05 -04:00