Commit graph

30844 commits

Author SHA1 Message Date
ilovezfs
5cda73c529 2017-09-09 17:57:34 +00:00
Joey Hess
425a3a10b0
close 2017-09-09 13:08:42 -04:00
ilovezfs
aa608ab831 2017-09-09 16:30:28 +00:00
yarikoptic
b4e40c5477 very minor typo 2017-09-08 21:23:49 +00:00
Joey Hess
2bb96e9c32
very delayed response now that feature is added 2017-09-08 16:47:42 -04:00
Joey Hess
afdff226fb
don't show key urls in whereis for S3 with public=yes and exporttree=yes 2017-09-08 16:44:00 -04:00
Joey Hess
0228714406
consistency 2017-09-08 16:41:50 -04:00
Joey Hess
e6f2af3b63
devblog 2017-09-08 16:29:18 -04:00
Joey Hess
9c78bbb6b0
Merge branch 'master' of ssh://git-annex.branchable.com 2017-09-08 16:28:46 -04:00
Joey Hess
650d0955a0
S3 export finalization
Fixed ACL issue, and updated some documentation.
2017-09-08 16:28:28 -04:00
Joey Hess
44cd5ae313
S3 export (untested)
It opens a http connection per file exported, but then so does git
annex copy --to s3.

Decided not to munge exported filenames for IA. Too large a chance of
the munging having confusing results. Instead, export of files not
supported by IA, eg with spaces in their name, will fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 15:46:24 -04:00
Joey Hess
a1b195d84c
External special remote protocol extended to support export.
Also updated example.sh to support export.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 14:24:05 -04:00
karel-de-macil
72c4968014 Added a comment 2017-09-08 08:31:36 +00:00
Joey Hess
3b885d7914
devblog 2017-09-07 16:42:24 -04:00
Joey Hess
34ad1c15e8
mention git-annex export 2017-09-07 16:17:46 -04:00
Joey Hess
165725b9df
update 2017-09-07 16:07:28 -04:00
Joey Hess
a55b2045ad
correction 2017-09-07 16:00:03 -04:00
Joey Hess
a50d061570
comment 2017-09-07 15:55:07 -04:00
Joey Hess
9379f4174e
Merge branch 'master' of ssh://git-annex.branchable.com 2017-09-07 15:54:04 -04:00
Joey Hess
2823c6bd06
Merge branch 'export' 2017-09-07 15:53:34 -04:00
Joey Hess
cd5f405623
interrupted export recovery bugfixes
When an export is interrupted, the sqlite database won't necessarily
have been committed. Also, the interrupted export might have been
run in an entirely different repository. There's not a significant speed
benefit in checking getExportLocation in this case anyway, so avoid it.

Also, remove the old filename from the export database.

Recovery from interrupted exports is now tested working.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 15:51:31 -04:00
Joey Hess
a48b52c056
avoid renaming to temp files before deleting
Only rename when actually necessary.

The diff gets buffered in memory. Probably git has to buffer a diff in
memory when generating it as well, so this memory usage should not be a
problem, even when the diff is very large. I hope.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 14:32:47 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
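
The check described in the commit above can be illustrated with a minimal sketch. This is not git-annex's code (git-annex is written in Haskell); `validate_remote_config`, `ExportUnsupported`, the `config` dict and the `supports_export` flag are all hypothetical names, used only to show the idea of rejecting the setting at setup time rather than letting a later `git annex export` fail.

```python
class ExportUnsupported(Exception):
    """Raised when exporttree=yes is requested for a remote that cannot export."""


def validate_remote_config(config, supports_export):
    """Reject exporttree=yes up front when the remote type cannot export.

    Both parameters are hypothetical: `config` is a dict of key=value
    settings, `supports_export` says whether the remote type implements
    the export interface.
    """
    if config.get("exporttree") == "yes" and not supports_export:
        raise ExportUnsupported(
            "exporttree=yes is not supported by this special remote"
        )
    return config
```

Failing at configuration time means the user learns about the limitation when setting up the remote, not after committing to an export workflow.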
Joey Hess
45d30820ac
document new stuff for external special remotes
Got rid of RENAMEEXPORT-UNSUPPORTED, no reason not to use
RENAMEEXPORT-FAILURE for that.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 12:59:35 -04:00
Horus
b7dbee0607 2017-09-07 09:38:29 +00:00
Horus
1260756563 Added a comment 2017-09-07 09:30:53 +00:00
anthony@ad39673d230d75cbfd19d2757d754030049c7673
bb72640042 Added a comment 2017-09-06 22:01:57 +00:00
Joey Hess
084fbee8c8
devblog 2017-09-06 17:22:22 -04:00
Joey Hess
6ab14710fc
fix consistency bug reading from export database
The export database has writes made to it and then expects to read back
the same data immediately. But, the way that Database.Handle does
writes, in order to support multiple writers, makes that not work, due
to caching issues. This resulted in export re-uploading files it had
already successfully renamed into place.

Fixed by allowing databases to be opened in MultiWriter or SingleWriter
mode. The export database only needs to support a single writer; it does
not make sense for multiple exports to run at the same time to the same
special remote.

All other databases still use MultiWriter mode. And by inspection,
nothing else in git-annex seems to be relying on being able to
immediately query for changes that were just written to the database.

This commit was supported by the NSF-funded DataLad project.
2017-09-06 17:19:07 -04:00
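
The symptom described in the commit above (a write followed by an immediate read that does not see it) can be reproduced with a toy model. This is only a sketch of the read-after-write problem, not a description of how Database.Handle works internally; `QueuedStore` and its `single_writer` flag are hypothetical stand-ins for the MultiWriter and SingleWriter modes.

```python
class QueuedStore:
    """Toy key/value store whose writes are queued until flush().

    Hypothetical model: with single_writer=False, reads only see flushed
    data, so a read right after a write can be stale. With
    single_writer=True, reads also consult the pending queue, which is
    safe because no other writer can exist.
    """

    def __init__(self, single_writer):
        self.single_writer = single_writer
        self.committed = {}
        self.pending = {}

    def write(self, key, value):
        # Writes are only queued here; nothing is committed yet.
        self.pending[key] = value

    def flush(self):
        # Make all queued writes visible to every reader.
        self.committed.update(self.pending)
        self.pending.clear()

    def read(self, key):
        if self.single_writer and key in self.pending:
            return self.pending[key]
        return self.committed.get(key)
```

In the multi-writer configuration of this model, an export that records a rename and then immediately asks where the file lives would get no answer and could re-upload the file, which matches the bug the commit describes.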
Joey Hess
4f657ba918
bugfix 2017-09-06 15:59:02 -04:00
Joey Hess
35cd329bd8
Merge branch 'master' into export 2017-09-06 15:49:30 -04:00
Joey Hess
5cd340ce27
rename bug fix 2017-09-06 15:48:14 -04:00
Joey Hess
3ccf661d7c
todo 2017-09-06 15:46:35 -04:00
Joey Hess
cae3704a44
export file renaming
This is seriously super hairy. It has to handle interrupted exports,
which may be resumed with the same or a different tree. It also has to
recover from export conflicts, which could cause the wrong content
to be renamed to a file.

I think this works, or is close to working. See the update to the design
for how it works.

This is definitely not optimal, in that it does more renames than are
necessary. It would probably be worth finding the keys that are really
renamed and only renaming those. But let's get the "simple" approach to
work first..

This commit was supported by the NSF-funded DataLad project.
2017-09-06 15:44:10 -04:00
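
The commit above refers to the design document for the actual algorithm, which is not reproduced here. As a rough illustration of why renames on an export are tricky, the following sketch plans renames through temporary names so that two files swapping places do not overwrite each other. Everything in it is an assumption made for illustration: `plan_renames`, the `.tmp-` prefix, and the representation of a tree as a dict from path to key. It does not handle interrupted exports or export conflicts.

```python
def plan_renames(old, new):
    """Plan the steps that turn exported tree `old` into `new`.

    `old` and `new` map file paths to content keys. Returns a list of
    ("rename", src, dst), ("delete", path) and ("upload", key, dst) steps.
    """
    steps = []
    temps = {}  # key -> temporary name currently holding that content

    # Phase 1: park every file whose content changes or disappears.
    for path, key in sorted(old.items()):
        if new.get(path) != key:
            if key in temps:
                # A copy of this content is already parked; drop the extra.
                steps.append(("delete", path))
            else:
                tmp = ".tmp-" + key
                steps.append(("rename", path, tmp))
                temps[key] = tmp

    # Phase 2: fill each changed path, reusing parked content when possible.
    for path, key in sorted(new.items()):
        if old.get(path) != key:
            if key in temps:
                steps.append(("rename", temps.pop(key), path))
            else:
                steps.append(("upload", key, path))

    # Phase 3: anything still parked is no longer wanted.
    for key, tmp in sorted(temps.items()):
        steps.append(("delete", tmp))
    return steps
```

Like the "simple" approach the commit mentions, this sketch does more renames than strictly necessary: a file that is merely deleted is first parked and then removed.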
Joey Hess
0fa948b402
record incomplete exports in export.log
Not yet used, but essential for resuming cleanly.

Note that, in normal operation, only one commit is made to export.log
during an export; the incomplete version only gets to the journal and
is then overwritten.

This commit was supported by the NSF-funded DataLad project.
2017-09-06 13:45:03 -04:00
Joey Hess
1ec3a9eb05
thoughts on handling renames efficiently
This gets complicated, but I think this design will work!

This commit was supported by the NSF-funded DataLad project.
2017-09-06 13:04:09 -04:00
Joey Hess
8918b7ab09
Merge branch 'master' of ssh://git-annex.branchable.com 2017-09-06 12:26:18 -04:00
Edward Betts
c1b9f718bc
move line break to fix broken link 2017-09-06 11:25:06 -04:00
Joey Hess
fd8392b669
update 2017-09-06 11:23:04 -04:00
karel-de-macil
9a2e687b0d 2017-09-06 09:20:26 +00:00
yarikoptic
3e7d0e0de7 Added datalad "super-dataset". 2017-09-05 17:00:38 +00:00
EskildHustvedt
8755f320f5 removed 2017-09-05 09:17:44 +00:00
EskildHustvedt
70ecf52888 Added a comment: Partial exports 2017-09-05 09:16:59 +00:00
EskildHustvedt
5e15956225 Added a comment: Partial exports 2017-09-05 09:16:26 +00:00
eacousineau
b8b7a9a902 2017-09-05 01:22:19 +00:00
Joey Hess
c7af16eb3a
Merge branch 'master' of ssh://git-annex.branchable.com 2017-09-04 17:03:20 -04:00
Joey Hess
fa4defc9d7
devblog 2017-09-04 17:02:30 -04:00
Joey Hess
a1cc9ec0fd
add export indication to git-annex info 2017-09-04 17:01:38 -04:00
Joey Hess
662f2a5ee7
git annex get from exports
Straightforward enough, except for the needed belt-and-suspenders sanity
checks to avoid foot shooting due to exports not being key/value stores.

* Even when annex.verify=false, always verify from exports.
* Only get files from exports that use a backend that supports
  checksum verification.
* Never trust exports, even if the user says to, because then
  `git annex drop` would drop content if the export seemed to contain
  a copy.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 16:39:56 -04:00
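
The three rules listed in the commit above can be written down as small predicates. This is a sketch under assumptions, not git-annex's implementation: the `Remote` dataclass and the function names are hypothetical, and `annex_verify` stands in for the annex.verify setting.

```python
from dataclasses import dataclass


@dataclass
class Remote:
    """Hypothetical, minimal description of a remote."""
    name: str
    is_export: bool
    user_trusted: bool = False


def should_verify(remote, annex_verify):
    """Content from an export is always verified, even if annex.verify=false."""
    return remote.is_export or annex_verify


def can_get_from(remote, backend_has_checksum):
    """Exports are only usable for keys whose backend can verify a checksum."""
    return backend_has_checksum or not remote.is_export


def effectively_trusted(remote):
    """An export is never trusted, whatever the user configured."""
    return remote.user_trusted and not remote.is_export
```

All three rules follow from the same observation in the commit message: an export is not a key/value store, so a file found on it may not hold the content its name suggests.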
Joey Hess
4da763439b
use export db to correctly handle duplicate files
Removed incorrect UniqueKey key in db schema; a key can appear multiple
times with different files.

The database has to be flushed after each removal. But when adding files
to the export, lots of changes are able to be queued up w/o flushing.
So it's still fairly efficient.

If large removals of files from exports are too slow, an alternative
would be to make two passes over the diff, one pass queueing deletions
from the database, then a flush and then a second pass updating the
location log. But that would use more memory, and need to look up
exportKey twice per removed file, so I've avoided that optimisation so far.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 14:39:32 -04:00
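
To illustrate the schema point in the commit above, here is a small sketch using Python's sqlite3 module. The table and column names are assumptions and do not reflect git-annex's real schema. It shows a table with no uniqueness constraint on the key, so one key can be exported under several file names, with additions left uncommitted while each removal is committed immediately.

```python
import sqlite3


def open_export_db():
    """Create an in-memory database with a hypothetical export table."""
    db = sqlite3.connect(":memory:")
    # No UNIQUE constraint on key: the same content may be exported
    # under more than one file name.
    db.execute("CREATE TABLE exported (key TEXT NOT NULL, file TEXT NOT NULL)")
    db.execute("CREATE INDEX exported_key ON exported (key)")
    return db


def add_exported(db, key, file):
    """Queue an addition; the caller commits later, in bulk."""
    db.execute("INSERT INTO exported (key, file) VALUES (?, ?)", (key, file))


def remove_exported(db, key, file):
    """Remove one (key, file) pair and commit right away."""
    db.execute("DELETE FROM exported WHERE key = ? AND file = ?", (key, file))
    db.commit()


def exported_files(db, key):
    """List every file name under which `key` is currently exported."""
    rows = db.execute(
        "SELECT file FROM exported WHERE key = ? ORDER BY file", (key,)
    )
    return [row[0] for row in rows]
```

With a unique constraint on the key column, the second insertion of the same key under a different file name would be rejected, which is the situation the commit fixes.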