Commit graph

39 commits

Author SHA1 Message Date
Joey Hess
abdff127f2
split out todo for webapp export config UI; close export todo! 2017-09-20 15:32:05 -04:00
Joey Hess
d71c65ca0a
add exporter thread to assistant
This is similar to the pusher thread, but a separate thread because git
pushes can be done in parallel with exports, and updating a big export
should not prevent other git pushes going out in the meantime.

The exportThread only runs at most every 30 seconds, since updating an
export is more expensive than pushing. This may need to be tuned.

Added a separate channel for export commits; the committer records a
commit in that channel.

Also, reconnectRemotes records a dummy commit, to make the exporter
thread wake up and make sure all exports are up-to-date. So,
connecting a drive with a directory special remote export will
immediately update it, and getting online will automatically
update S3 and WebDAV exports.

The transfer queue is not involved in exports. Instead, failed
exports are retried much like failed pushes.

This commit was sponsored by Ewen McNeill.
2017-09-20 15:29:13 -04:00
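
The throttling described in the commit above (an exporter that runs at most every 30 seconds, with commits that arrive in between picked up by the next run) can be illustrated with a small Python sketch. This is not git-annex's Haskell code: `schedule_exports` and its parameters are hypothetical names, and the coalescing behaviour is an assumption about how such a throttle would typically behave.

```python
def schedule_exports(commit_times, min_interval=30.0):
    """Return the times at which a throttled exporter would run.

    commit_times: times (in seconds) at which export commits were recorded.
    min_interval: minimum gap between two export runs. The 30 second value
    comes from the commit message; this helper itself is hypothetical.
    """
    runs = []
    last_run = None
    for t in sorted(commit_times):
        if last_run is not None and t <= last_run:
            # A run at or after this commit is already scheduled,
            # so that run will pick the commit up.
            continue
        # The first commit triggers a run at once; later ones wait
        # until min_interval has passed since the previous run.
        run_at = t if last_run is None else max(t, last_run + min_interval)
        runs.append(run_at)
        last_run = run_at
    return runs
```

In this sketch, commits recorded at 0, 5, 10 and 100 seconds lead to three export runs rather than four, because the commits at 5 and 10 seconds share the run at 30 seconds.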
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
a6268b79b2
break out separate todo for later 2017-09-19 12:38:07 -04:00
Joey Hess
5f9eff3f32
fix bug that prevented db being written to disk in SingleWriter mode
The bug occurred when closeDb was not called, and garbage collection of
the DbHandle didn't give the workerThread time to shut down. Fixed by
exiting the runSqlite action when a commit is made.

(MultiWriter mode already forked off a runSqlite action, so avoided the
problem.)

This commit was sponsored by Brock Spratlen on Patreon.
2017-09-18 19:42:20 -04:00
Joey Hess
f4be3c3f89
merge changes made on other repos into ExportTree
Now when one repository has exported a tree, another repository can get
files from the export, after syncing.

There's a bug: While the database update works, somehow the database on
disk does not get updated, and so the database update is run the next
time, etc. Wasn't able to figure out why yet.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-09-18 19:21:41 -04:00
Joey Hess
b03d77c211
add ExportTree table to export db
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.

Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.

And some code refactoring.

This commit was sponsored by Francois Marier on Patreon.
2017-09-18 13:59:59 -04:00
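
A smart constructor of the kind mentioned in the commit above can be sketched in Python as follows. The names `ExportLocation` and `mk_export_location` mirror the commit message, but the code is only an illustration: it assumes that "the right direction" means forward slashes, and it is not the Haskell implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportLocation:
    """A path inside an export, always stored with forward slashes (assumed)."""
    path: str


def mk_export_location(path: str) -> ExportLocation:
    # Normalise backslashes so that every ExportLocation built through
    # this constructor uses one slash direction.
    return ExportLocation(path.replace("\\", "/"))
```

Routing all construction through one function means the rest of the code can compare and store locations without re-checking the separator.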
Joey Hess
486902389d
lock to avoid more than one export to a remote at a time
This commit was sponsored by Jack Hill on Patreon.
2017-09-18 12:38:07 -04:00
Joey Hess
af0958dd70
move tracking exports to design 2017-09-18 12:06:01 -04:00
Joey Hess
4a45f34fe1
don't support removing content from export with removeKey
There does not seem to be a use case for supporting that, and it would
need a lot of complication to support it in a way that allows eventual
consistency when two repositories are updating the same export.

This commit was sponsored by Henrik Riomar on Patreon.
2017-09-17 17:56:33 -04:00
Joey Hess
494b4066db
clarification 2017-09-16 16:44:27 -04:00
Joey Hess
18ba1be26f
design for next steps on exports 2017-09-16 16:41:04 -04:00
Joey Hess
1223960294
empty directory removal working 2017-09-15 15:24:45 -04:00
Joey Hess
5fe803e14e
update 2017-09-15 12:22:11 -04:00
Joey Hess
268a0cc664
update 2017-09-13 15:52:19 -04:00
Joey Hess
bf48ba4ef7
work around box.com webdav rename bug
Apparently box.com renaming is just buggy. I tried a couple of fixes:

* In case the http Manager was opening multiple connections and reaching
  different backend servers, I tried limiting the number of connections
  to 1. Didn't help.
* To make sure it was not a http connection reuse problem, I tried
  rewriting how exportAction works, so that the same http connection
  is clearly open. Didn't help.

So, disable renaming of exports for box.com. It would be good to test it
with some other webdav server.

This commit was sponsored by John Peloquin on Patreon.
2017-09-13 15:26:56 -04:00
Joey Hess
f8fd66d3f8
fix compaction of export.log
It was not getting old lines removed, because the tree graft confused
the updater, so it union merged from the previous git-annex branch,
which still contained the old lines. Fixed by carefully using setIndexSha.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 18:30:36 -04:00
Joey Hess
c8ed941a26
change export.log format to support multiple export remotes
This breaks backwards compatibility, but only with unreleased versions of
git-annex, which I think is acceptable.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 17:45:52 -04:00
Joey Hess
63ba764923
bug 2017-09-12 17:00:15 -04:00
Joey Hess
9c3622882b
export: cache connections for S3 and webdav 2017-09-12 16:59:04 -04:00
Joey Hess
7ad8e8b889
more box.com strangeness 2017-09-12 15:45:43 -04:00
Joey Hess
7f8892f2d2
document box.com rename problem 2017-09-12 15:16:17 -04:00
Joey Hess
267f47c473
S3: Allow removing files from IA, but warn about derived versions potentially still existing there.
Removal works, only derives are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent with IA remotes whether or not exporttree=yes.

Also tested exporting filenames containing unicode, spaces, underscores.
All worked, despite the IA's FAQ saying they don't.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-09-12 12:35:58 -04:00
Joey Hess
650d0955a0
S3 export finalization
Fixed ACL issue, and updated some documentation.
2017-09-08 16:28:28 -04:00
Joey Hess
44cd5ae313
S3 export (untested)
It opens a http connection per file exported, but then so does git
annex copy --to s3.

Decided not to munge exported filenames for IA. Too large a chance of
the munging having confusing results. Instead, export of files not
supported by IA, eg with spaces in their name, will fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 15:46:24 -04:00
Joey Hess
165725b9df
update 2017-09-07 16:07:28 -04:00
Joey Hess
a48b52c056
avoid renaming to temp files before deleting
Only rename when actually necessary.

The diff gets buffered in memory. Probably git has to buffer a diff in
memory when generating it as well, so this memory usage should not be a
problem, even when the diff is very large. I hope.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 14:32:47 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
6ab14710fc
fix consistency bug reading from export database
The export database has writes made to it and then expects to read back
the same data immediately. But, the way that Database.Handle does
writes, in order to support multiple writers, makes that not work, due
to caching issues. This resulted in export re-uploading files it had
already successfully renamed into place.

Fixed by allowing databases to be opened in MultiWriter or SingleWriter
mode. The export database only needs to support a single writer; it does
not make sense for multiple exports to run at the same time to the same
special remote.

All other databases still use MultiWriter mode. And by inspection,
nothing else in git-annex seems to be relying on being able to
immediately query for changes that were just written to the database.

This commit was supported by the NSF-funded DataLad project.
2017-09-06 17:19:07 -04:00
Joey Hess
3ccf661d7c
todo 2017-09-06 15:46:35 -04:00
Joey Hess
1ec3a9eb05
thoughts on handling renames efficiently
This gets complicated, but I think this design will work!

This commit was supported by the NSF-funded DataLad project.
2017-09-06 13:04:09 -04:00
Joey Hess
662f2a5ee7
git annex get from exports
Straightforward enough, except for the needed belt-and-suspenders sanity
checks to avoid foot shooting due to exports not being key/value stores.

* Even when annex.verify=false, always verify from exports.
* Only get files from exports that use a backend that supports
  checksum verification.
* Never trust exports, even if the user says to, because then
  `git annex drop` would drop content if the export seemed to contain
  a copy.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 16:39:56 -04:00
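
The three sanity checks listed in the commit above can be expressed as small predicates. The Python below is a hedged sketch: the `Remote` record and the function names are hypothetical and only model the rules as stated in the commit message, not git-annex's actual data types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remote:
    # Hypothetical record, for illustration only.
    name: str
    is_export: bool
    user_trusted: bool = False


def must_verify(remote: Remote, annex_verify: bool) -> bool:
    # Content from an export is always verified, even when
    # annex.verify=false.
    return remote.is_export or annex_verify


def can_get(remote: Remote, backend_has_checksum: bool) -> bool:
    # Only fetch from an export when the key's backend supports
    # checksum verification.
    return backend_has_checksum or not remote.is_export


def effectively_trusted(remote: Remote) -> bool:
    # An export is never trusted, whatever the user configured.
    return remote.user_trusted and not remote.is_export
```

Keeping each rule in its own predicate makes it easy to see that an export only ever tightens the checks applied to an ordinary remote.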
Joey Hess
4da763439b
use export db to correctly handle duplicate files
Removed incorrect UniqueKey key in db schema; a key can appear multiple
times with different files.

The database has to be flushed after each removal. But when adding files
to the export, lots of changes are able to be queued up w/o flushing.
So it's still fairly efficient.

If large removals of files from exports are too slow, an alternative
would be to make two passes over the diff, one pass queueing deletions
from the database, then a flush and then a second pass updating the
location log. But that would use more memory, and need to look up
exportKey twice per removed file, so I've avoided that optimisation so far.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 14:39:32 -04:00
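
The flushing behaviour described in the commit above (additions can be queued, while each removal forces a flush) can be modelled with an in-memory stand-in for the export database. Everything here is hypothetical: the class `ExportDbQueue` and its methods are illustrative names, and the real database is SQLite accessed from Haskell.

```python
class ExportDbQueue:
    """In-memory model of a database with a queue of pending writes."""

    def __init__(self):
        self.pending = []        # queued (operation, key, filename) tuples
        self.committed = set()   # (key, filename) pairs; a key may repeat
        self.flushes = 0

    def add(self, key, filename):
        # Additions are only queued, so many can share a single flush.
        self.pending.append(("add", key, filename))

    def remove(self, key, filename):
        # A removal is flushed immediately, as the commit message requires.
        self.pending.append(("remove", key, filename))
        self.flush()

    def flush(self):
        for op, key, filename in self.pending:
            if op == "add":
                self.committed.add((key, filename))
            else:
                self.committed.discard((key, filename))
        self.pending.clear()
        self.flushes += 1

    def files_for(self, key):
        # One key can map to several exported files with the same content.
        return sorted(f for k, f in self.committed if k == key)
```

Because the table is keyed on the (key, filename) pair rather than on the key alone, removing one duplicate leaves the other files with the same content in place.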
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
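
The configuration rules in the commit above can be summarised as a validation function. This Python sketch is an assumption-laden illustration: `check_exporttree` is a hypothetical name, both arguments are taken to be complete key/value configurations of the remote, and a missing `encryption` setting is treated as `none`.

```python
def check_exporttree(new_config, old_config=None):
    """Return an error message, or None if the configuration is acceptable.

    new_config: the full configuration the remote would have (assumed).
    old_config: the previous configuration, or None at initial setup.
    """
    wants_export = new_config.get("exporttree") == "yes"
    if wants_export and new_config.get("encryption", "none") != "none":
        return "exporttree=yes cannot be combined with encryption"
    if old_config is not None:
        had_export = old_config.get("exporttree") == "yes"
        if wants_export != had_export:
            # Content may already be stored in the remote, so the
            # setting is fixed once the remote has been set up.
            return "exporttree can only be set when the remote is first set up"
    return None
```

Passing the old configuration into the check is what makes the "only at initial setup" rule enforceable, which matches the reason the commit gives for changing SetupStage Enable.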
Joey Hess
978885247e
implement export.log and resolve export conflicts
Incremental export updates work now too.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-08-31 15:47:23 -04:00
Joey Hess
7c7af82578
resuming exports
Make a pass over the whole exported tree, and upload anything that has
not yet reached the export. Update location log when exporting.

Note that the synthesized keys for non-annexed files are stored in the
location log too.

Some cases involving files in the tree with the same content are not
handled correctly yet.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-08-31 13:33:50 -04:00
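
The resume pass described in the commit above (walk the whole tree, upload whatever has not reached the export, and update the location log) can be sketched as a simple loop. The function `resume_export` and its arguments are hypothetical, and the sketch tracks files by name only, so it does not model the duplicate-content cases that the commit message says are not yet handled.

```python
def resume_export(tree, exported, upload):
    """Upload every file of the tree that is not yet in the export.

    tree: dict mapping filename to key.
    exported: set of filenames already present in the export (mutated).
    upload: callable(filename, key) returning True on success.
    Returns the set of keys to record in the location log.
    """
    newly_logged = set()
    for filename, key in sorted(tree.items()):
        if filename in exported:
            continue  # already reached the export in an earlier run
        if upload(filename, key):
            exported.add(filename)
            newly_logged.add(key)
    return newly_logged
```

A failed upload leaves the file out of `exported`, so running the same pass again retries it.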
Joey Hess
9f3630f4e0
initial export command
Very basic operation works, but of course this is only the beginning.

This commit was sponsored by Nick Daly on Patreon.
2017-08-29 15:10:01 -04:00
Joey Hess
a2017e944f
expand 2017-03-27 18:12:46 -04:00
Joey Hess
9fcd3987f2
idea 2017-03-27 18:10:36 -04:00