2181 commits

Author SHA1 Message Date
Joey Hess
cd3f231d21
retitle 2018-09-12 14:21:39 -04:00
Joey Hess
e01903efc5
response 2018-09-11 13:13:27 -04:00
https://me.yahoo.com/a/iOGTltEpmOTQ.xZ99NFP5c7Zdcc-#6a7ba
cf2cc6f7fc Added a comment 2018-09-07 18:04:51 +00:00
https://me.yahoo.com/a/iOGTltEpmOTQ.xZ99NFP5c7Zdcc-#6a7ba
1f0ab538fc 2018-09-07 17:01:04 +00:00
Joey Hess
19e91d5ee3
Merge branch 'master' of ssh://git-annex.branchable.com 2018-09-06 14:37:42 -04:00
Joey Hess
b7daf2685f
support public versioned S3 access
Makes git annex whereis display the versionId urls.

And, when a s3 remote is enabled without creds, git-annex will use the
versionId urls to access its contents.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-09-06 14:31:41 -04:00
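
To illustrate what a "versionId url" in the commit above might look like, here is a minimal sketch that builds a public URL for one specific version of an S3 object. The `versionId` query parameter is standard S3; the function name, the virtual-hosted-style host and the example bucket are assumptions for illustration, not git-annex's actual code.

```python
from urllib.parse import quote, urlencode


def versioned_s3_url(bucket: str, key: str, version_id: str) -> str:
    """Build a public URL for one specific version of an S3 object.

    Assumption: the bucket is publicly readable and is addressed in
    virtual-hosted style without a region-specific endpoint.
    """
    # Percent-encode the object key, but keep "/" as a path separator.
    path = quote(key, safe="/")
    # S3 selects an object version with the versionId query parameter.
    query = urlencode({"versionId": version_id})
    return f"https://{bucket}.s3.amazonaws.com/{path}?{query}"


if __name__ == "__main__":
    # Hypothetical bucket, key and version ID.
    print(versioned_s3_url("mybucket", "dir/file name.txt", "abc123"))
```

Because such a URL names an exact version, it keeps pointing at the same content even after the object is overwritten in the bucket.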
Joey Hess
0630ef166b
thought 2018-09-06 13:21:46 -04:00
Joey Hess
256669a85d
close as I don't want to do this 2018-09-06 13:16:13 -04:00
Joey Hess
1c86ba8ee8
close, these seem done already 2018-09-06 13:15:21 -04:00
Joey Hess
50fb6a86f9
thoughts 2018-09-06 13:09:18 -04:00
anarcat
694f612fba Added a comment: added as a special remote 2018-09-06 00:58:52 +00:00
Joey Hess
0a7c5a9982
dropdead per-remote metadata
Had to refactor pure code into separate modules so it is accessible
inside Annex.Branch.Transitions.

This commit was sponsored by Peter on Patreon.
2018-09-05 13:52:46 -04:00
Joey Hess
f1e5dfb7c7
close 2018-09-05 12:21:52 -04:00
Joey Hess
8eb944ea11
close todo, open todo 2018-08-31 14:01:24 -04:00
Joey Hess
308f49e9ae
update 2018-08-31 13:56:32 -04:00
Joey Hess
b3d42283ad
use per-remote metadata storage for S3 version ID
Since the same key can be stored in a versioned S3 bucket multiple times
with different version IDs, this allows tracking them all. Not currently
needed, but if we ever want to drop from a versioned S3 bucket, we'll
need to know them all.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 13:27:29 -04:00
Joey Hess
5c99f6247e
per-remote metadata storage
Actually very straightforward reuse of the metadata log file code.
Although I had to add a todo item as git-annex forget won't clean up
dead remote's metadata yet.

This would be worth adding to the external special remote interface
sometime. Have not opened a todo though, guess I'll wait until something
needs it.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 12:23:22 -04:00
Joey Hess
9d78a4387f
update 2018-08-31 12:23:04 -04:00
Joey Hess
3a5d0402ba
update 2018-08-30 15:49:21 -04:00
Joey Hess
19dcff2b71
use S3 version ID for retrieval
Have to store the S3 object along with the version ID, so retrieval can
use the same object.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 15:37:08 -04:00
Joey Hess
794e9a7a44
store S3 version IDs
Only done when versioning=yes is configured. It could always do it when
S3 sends back a version id, but there may be buckets that have
versioning enabled by accident, so it seemed better to honor the
configuration.

S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice gets two different ones, not the same one.
So I considered storing a list of version IDs for a key. That would
allow removing the key completely. But the way Logs.RemoteState works,
when there are multiple writers, the last writer wins. So storing a list
would need a different log format that merges, which seemed overkill to support
removing a key from an append-only remote.

Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed to do it.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 14:30:56 -04:00
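
The trade-off described in the commit above can be sketched as follows: with a last-writer-wins log, two writers that each stored the same key end up with only one version ID recorded, while a log that merges by union keeps both. This is a simplified illustration; `LogLine`, `merge_last_writer_wins` and `merge_union` are hypothetical names and do not reflect the actual Logs.RemoteState format.

```python
from typing import NamedTuple, Set


class LogLine(NamedTuple):
    """One writer's state for a key (hypothetical, simplified log entry)."""
    timestamp: float
    value: str


def merge_last_writer_wins(a: LogLine, b: LogLine) -> LogLine:
    """Keep only the newest entry; the older writer's value is discarded."""
    return a if a.timestamp >= b.timestamp else b


def merge_union(a: Set[str], b: Set[str]) -> Set[str]:
    """A merging log format would instead keep every value ever recorded."""
    return a | b


if __name__ == "__main__":
    # Two repositories store the same key and each gets its own version ID.
    first = LogLine(timestamp=1.0, value="version-A")
    second = LogLine(timestamp=2.0, value="version-B")

    # Last writer wins: version-A is no longer tracked.
    print(merge_last_writer_wins(first, second).value)

    # Union merge: both version IDs survive.
    print(sorted(merge_union({first.value}, {second.value})))
```

Dropping a key from a versioned bucket would need the union behaviour, which is why the commit treats a list of version IDs as requiring a different log format.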
Joey Hess
0ff5a41311
S3 versioning=yes config
Not yet used.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 13:45:28 -04:00
Joey Hess
358178fbfb
don't untrust appendonly exports
Make exporttree=yes remotes that are appendonly not be untrusted, and not force
verification of content, since the usual concerns about losing data when an
export is updated by someone else don't apply.

Note that all the remote operations on keys are left as usual for
appendonly export remotes, except for storing content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:48:04 -04:00
Joey Hess
8b39db20b5
export appendonly support
Make `git annex export` check appendonly when removing a file from an
export, and not update the location log, since the remote still contains
the content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:18:20 -04:00
Joey Hess
dad627fa9e
remove false starts, simplify 2018-08-29 14:12:18 -04:00
Joey Hess
5b78952f78
misunderstood some code; simplify 2018-08-29 14:09:18 -04:00
Joey Hess
e216c18318
new much improved plan 2018-08-29 13:59:52 -04:00
Joey Hess
3874c5c88d
further thoughts 2018-08-29 10:56:02 -04:00
Joey Hess
b1280eb252
new todo (requested by yoh) 2018-08-28 12:14:06 -04:00
Joey Hess
6adc0d2b3f
bug triage 2018-08-27 15:10:05 -04:00
Joey Hess
2c9f21e987
todo 2018-08-26 20:59:20 -04:00
anarcat
af727108b0 update status to mention tor 2018-08-24 21:35:11 +00:00
anarcat
bfab1da5a7 mention that dat thing 2018-08-24 21:30:20 +00:00
Joey Hess
98fd7ec6c9
recover from race between git mv+commit and git-annex get
Last of the known v6 races.

This also makes git add of a pointer file populate it when its content
is present in the annex. Which makes sense to do, I think.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 16:01:50 -04:00
Joey Hess
50fa17aee6
v6: recover from race between git mv and git-annex get/drop
Update pointer file next time reconcileStaged is run to recover from the
race.

Note that restagePointerFile causes git to run the clean filter,
and that will run reconcileStaged. So, normally by the time the git
annex get/drop command finishes, the race has already been dealt with.
It may be that, in some case, that won't happen and the race will be
dealt with at a later point. git-annex could run reconcileStaged at
shutdown if that becomes a problem.

This does not handle the situation where the git mv is committed before
git-annex gets a chance to run again. git commit does run the clean
filter, and that happens to re-inject the content if it was supposed to
be dropped but is still populated. But, the case where the file was
supposed to be gotten but is not populated is not handled yet.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 15:56:43 -04:00
Joey Hess
e9b2674281
plan 2018-08-22 13:58:32 -04:00
Joey Hess
38a934cf07
correction 2018-08-22 13:34:15 -04:00
Joey Hess
18ecf41917
avoid running reconcileStaged when the index has not changed
This commit was supported by the NSF-funded DataLad project.
2018-08-22 13:04:12 -04:00
Joey Hess
5e56d9b620
v6: Update associated files database when git has staged changes to pointer files
This commit was supported by the NSF-funded DataLad project.
2018-08-21 17:02:20 -04:00
Joey Hess
b8cd5fde17
idea 2018-08-20 16:13:46 -04:00
Joey Hess
54d49eeac8
avoid update-index race
This commit was supported by the NSF-funded DataLad project.
2018-08-17 16:03:40 -04:00
Joey Hess
ec91b6e4b2
plan to fix race 2018-08-17 11:18:53 -04:00
Joey Hess
5799d325f0
update todo categories 2018-08-16 16:36:47 -04:00
Joey Hess
82a239675f
narrow the race where a file gets modified before update-index
Check just before running update-index if the worktree file's content is
still the same, and don't update it when it's been modified. This narrows
the race window a lot, from possibly minutes or hours, to seconds or
less.

(Use replaceFile so that the worktree update happens atomically,
allowing the InodeCache of the new worktree file to itself be gathered
w/o any other race.)

This doesn't eliminate the race; it can still occur in the window before
update-index runs. When annex.queue is large, a lot of files will be
statted by the checks, and so the window may still be large enough to be a
problem.

When only a few files are being processed, the window is as small as it
is in the race where a modification gets overwritten by git-annex when
it updates the worktree. Or maybe as small as whatever race git
checkout/pull/merge may have when the worktree gets modified during it.
Still, I've kept a todo about this race.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 15:56:43 -04:00
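
The check described in the commit above can be sketched roughly as follows: record a file's stat information when the worktree file is written, then compare it again just before the index update and skip the file if anything changed. The commit names InodeCache; the fields compared here (inode, size, mtime) and the function names are assumptions made for illustration only.

```python
import os
from typing import NamedTuple


class InodeSnapshot(NamedTuple):
    """Cheap fingerprint of a worktree file (assumed fields)."""
    inode: int
    size: int
    mtime_ns: int


def snapshot(path: str) -> InodeSnapshot:
    """Record the file's stat information right after writing it."""
    st = os.stat(path)
    return InodeSnapshot(st.st_ino, st.st_size, st.st_mtime_ns)


def safe_to_refresh(path: str, recorded: InodeSnapshot) -> bool:
    """Return True only if the file still matches the recorded snapshot.

    A missing file counts as modified, so it is skipped as well.
    """
    try:
        return snapshot(path) == recorded
    except FileNotFoundError:
        return False
```

As the commit notes, a check like this only narrows the window: the file can still be modified between the comparison and the moment the index is actually updated.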
Joey Hess
82cfcfc838
better index file refresh method
Use git update-index --refresh, since it's a little bit more
efficient and the user can be told to run it if a locked index prevents
git-annex from running it.

This also fixes the problem where an annexed file was deleted in the index
and a get of another file that uses the same key caused the index update to
add back the deleted file. update-index will not add back the deleted file.

Documented in tips/unlocked_files.mdwn the gotcha that the index update
may conflict with other operations. I can't see any way to possibly avoid
that conflict.

One new todo about a race that causes a modification to be accidentally
staged.

Note that the assistant only flushes the git command queue when it
commits a modification. I have not tested the assistant with v6 unlocked
files, but assume most users of the assistant won't care if the index
shows a file as modified for a while.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:16:24 -04:00
Joey Hess
4c5a9965c1
remove invalid todo item
I tested it, and it's ok. I think I was adding it under a filename that
produced a different key.
2018-08-15 13:34:48 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
66a4483dfa
response 2018-08-14 11:02:55 -04:00
Joey Hess
d8a8f2df70
full plan 2018-08-13 17:51:02 -04:00
Joey Hess
86df0d6e1b
even better idea 2018-08-13 17:43:16 -04:00