From 15601f2b66f9ef7fdb504e6a4b6e4bccf8f162a1 Mon Sep 17 00:00:00 2001 From: supernaught Date: Mon, 28 Aug 2017 22:01:23 +0000 Subject: [PATCH 1/5] Added a comment --- ...mment_2_9ad4c9b2217f739e67198d16d14d32e7._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment diff --git a/doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment b/doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment new file mode 100644 index 0000000000..5acf39f956 --- /dev/null +++ b/doc/todo/Invert_remote_selection/comment_2_9ad4c9b2217f739e67198d16d14d32e7._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="supernaught" + avatar="http://cdn.libravatar.org/avatar/55f92a50f2617099e2dc7509130ce158" + subject="comment 2" + date="2017-08-28T22:01:23Z" + content=""" +It's not very ergonomic to type out so much for each sync, but I suppose it technically accomplishes the idea. + +Still -- wouldn't making '\!x' alias to '-c remote.x.annex-sync=false' have minimal impact and provide a bit more symmetry with the matching-options? + +I'm not familiar with Haskell, but could probably fumble my way through this one. Would you accept a patch? 
+"""]] From 74aa4c503b196464272233a2dd7d0e476dd7c3a3 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Tue, 29 Aug 2017 17:26:42 -0400 Subject: [PATCH 2/5] devblog --- .../exporting_trees_to_special_remotes.mdwn | 10 +++++----- doc/devblog/day_466__export_prototype.mdwn | 6 ++++++ doc/internals.mdwn | 15 ++++++++++++++- 3 files changed, 25 insertions(+), 6 deletions(-) create mode 100644 doc/devblog/day_466__export_prototype.mdwn diff --git a/doc/design/exporting_trees_to_special_remotes.mdwn b/doc/design/exporting_trees_to_special_remotes.mdwn index 9327b475f0..39c291c540 100644 --- a/doc/design/exporting_trees_to_special_remotes.mdwn +++ b/doc/design/exporting_trees_to_special_remotes.mdwn @@ -206,15 +206,15 @@ there would be a merge conflict. Union merging would *scramble* the exported tree, so even if a smart merge is added, old versions of git-annex would corrupt the exported tree. -To avoid that problem, add a log file `exported/uuid.log` that lists -the sha1 of the exported tree and the uuid of the repository that exported it. +To avoid that problem, add a log file `export.log` that contains the uuid +of the remote that was exported to, and the sha1 of the exported tree. To avoid the exported tree being GCed, do graft it in to the git-annex branch, but follow that with a commit that removes the tree again, and only update `refs/heads/git-annex` after making both commits. -If `exported/uuid.log` contains multiple active exports, there was an -export conflict. Short of downloading the whole export to checksum it, -or deleting the whole export, what can be done to resolve it? +If `export.log` contains multiple active exports of different trees, +there was an export conflict. Short of downloading the whole export to +checksum it, or deleting the whole export, what can be done to resolve it? In this case, git-annex knows both exported trees. 
Have the user provide a tree that resolves the conflict as they desire (it could be the same as diff --git a/doc/devblog/day_466__export_prototype.mdwn b/doc/devblog/day_466__export_prototype.mdwn new file mode 100644 index 0000000000..cdc1926f83 --- /dev/null +++ b/doc/devblog/day_466__export_prototype.mdwn @@ -0,0 +1,6 @@ +Put together a prototype of `git annex export` in the "export" branch. +Exporting to a directory special remote is basically working, but this is +only the beginning. + +Today's work was sponsored by Jake Vosloo on +[Patreon](https://patreon.com/joeyh/) diff --git a/doc/internals.mdwn b/doc/internals.mdwn index 4ed8001d48..7d39b10681 100644 --- a/doc/internals.mdwn +++ b/doc/internals.mdwn @@ -176,10 +176,23 @@ File format is identical to preferred-content.log. Contains standard preferred content settings for groups. (Overriding or supplementing the ones built into git-annex.) -The file format is one line per group, staring with a timestamp, then a +The file format is one line per group, starting with a timestamp, then a space, then the group name followed by a space and then the preferred content expression. +## `export.log` + +Tracks what trees have been exported to special remotes by +[[git-annex-export]](1). + +Each line starts with a timestamp, then the uuid of the special remote, +followed by the sha1 of the tree that was exported to that special remote. + +(The exported tree is also grafted into the git-annex branch, at +`export.tree`, to prevent git from garbage collecting it. However, the head +of the git-annex branch should never contain such a grafted in tree; +the grafted tree is removed in the same commit that updates `export.log`.) 
+ ## `aaa/bbb/*.log` These log files record [[location_tracking]] information From 71682954f8470312331ff37d09e381cb4d273ef4 Mon Sep 17 00:00:00 2001 From: vgp Date: Wed, 30 Aug 2017 12:42:23 +0000 Subject: [PATCH 3/5] Added a comment --- .../comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment diff --git a/doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment b/doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment new file mode 100644 index 0000000000..a1bee43df0 --- /dev/null +++ b/doc/forum/huge_text_files___40__not_binary__41___-_compress/comment_2_44eb5f14dbf3c5061afeb847958f27fe._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="vgp" + avatar="http://cdn.libravatar.org/avatar/b332bfc1d3f49c196e1bff84b53d0f8b" + subject="comment 2" + date="2017-08-30T12:42:22Z" + content=""" +Thanks for your comments joey! +In fact, compressing files in the working tree is not mandatory. The main question is compressing them on the git server (for quota reasons). When we were using only git, it was slow (caused by huge files) but the files were compressed. Now, using git-annex, the operations are faster but the size of the repository increases a lot (due to lack of compression), and that is a problem since we've reached the disk quota on the git server. 
+"""]] From b14c4776d69c996dc78ece1c82f7c8a7ef05276e Mon Sep 17 00:00:00 2001 From: yarikoptic Date: Wed, 30 Aug 2017 14:15:45 +0000 Subject: [PATCH 4/5] initial bug report --- ...__34___to_get_files_with_the_same_key.mdwn | 48 +++++++++++++++++++ 1 file changed, 48 insertions(+) create mode 100644 doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn diff --git a/doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn b/doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn new file mode 100644 index 0000000000..fade3b3318 --- /dev/null +++ b/doc/bugs/get_-J___34__fails__34___to_get_files_with_the_same_key.mdwn @@ -0,0 +1,48 @@ +### What steps will reproduce the problem? + +ask annex get in parallel files which point to the same key + +### What version of git-annex are you using? On what operating system? + +6.20170815+gitg22da64d0f-1~ndall+1 + +### Please provide any additional information below. + +[[!format sh """ +# works in serial mode + +$> git annex get rh.white{,_avg} +get rh.white (from web...) +/mnt/btrfs/scrap/tmp/ds0001 100%[===========================================>] 360.31K --.-KB/s in 0.1s +2017-08-30 10:08:02 URL:https://dl.dropboxusercontent.com/s/0lww4tomnwfanwd/rh.white_avg?dl=0 [368962/368962] -> "/mnt/btrfs/scrap/tmp/ds000114/derivatives/freesurfer/.git/annex/tmp/MD5E-s368962--99a4db61cedffee686aef99b2d197794" [1] +(checksum...) ok +(recording state in git...) +(dev)2 10016.....................................:Wed 30 Aug 2017 10:08:02 AM EDT:. +(git)smaug:…/btrfs/scrap/tmp/ds000114/derivatives/freesurfer[master]fsaverage5/surf +$> git annex drop --fast rh.white{,_avg} +drop rh.white (checking https://dl.dropbox.com/s/0lww4tomnwfanwd/rh.white_avg?dl=0...) ok +(recording state in git...) + +# "fails" in parallel +$> git annex get -J2 rh.white{,_avg} +get rh.white get rh.white_avg (transfer already in progress, or unable to take transfer lock) + Unable to access these remotes: web +(from web...) 
+ + Try making some of these repositories available: + 00000000-0000-0000-0000-000000000001 -- web + 5e47b3f3-f09c-4969-8885-920a49ff8a45 -- yoh@smaug:/mnt/btrfs/datasets/datalad/crawl/workshops/nih-workshop-2017/ds000114/derivatives/freesurfer +failed +/mnt/btrfs/scrap/tmp/ds0001 100%[===========================================>] 360.31K 1.63MB/s in 0.2s +2017-08-30 10:08:21 URL:https://dl.dropboxusercontent.com/s/0lww4tomnwfanwd/rh.white_avg?dl=0 [368962/368962] -> "/mnt/btrfs/scrap/tmp/ds000114/derivatives/freesurfer/.git/annex/tmp/MD5E-s368962--99a4db61cedffee686aef99b2d197794" [1] +(checksum...) ok +(recording state in git...) +git-annex: get: 1 failed +(dev)2 10018 ->1.....................................:Wed 30 Aug 2017 10:08:21 AM EDT:. + +"""]] + +so at the end we get a run of git-annex which exits with error 1... and in json mode also the error(s) reported etc. +I wondered if annex should first analyze passed paths to get actual keys to be fetched? + +[[!meta author=yoh]] From bdec46ac13dba6c02a61b3c7087cfc2eac08e792 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Wed, 30 Aug 2017 13:14:05 -0400 Subject: [PATCH 5/5] a few tweaks to the design --- .../exporting_trees_to_special_remotes.mdwn | 20 ++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/doc/design/exporting_trees_to_special_remotes.mdwn b/doc/design/exporting_trees_to_special_remotes.mdwn index 39c291c540..ce7431141f 100644 --- a/doc/design/exporting_trees_to_special_remotes.mdwn +++ b/doc/design/exporting_trees_to_special_remotes.mdwn @@ -35,6 +35,11 @@ To export a treeish, the user can run: That does all necessary uploads etc to make the special remote contain the tree of files. The treeish can be a tag, a branch, or a tree. +If a file's content is not present, it won't be exported. Re-running the +same export later should export files whose content has become present. 
+(This likely means a second pass, and needs location tracking to track +which files are in the export.) + Users may sometimes want to export multiple treeishes to a single special remote. For example, exporting several tags. This interface could be complicated to support that, putting the treeishes in subdirectories on the @@ -144,9 +149,13 @@ when using any of the above. ## location tracking +Since not all the files in an exported treeish may have content +present when the export is done, location tracking will be needed so that +getting the files and exporting again transfers their content. + Does a copy of a file exported to a special remote count as a copy of a file as far as [[numcopies]] goes? Should git-annex get download -a file from an export? Or should exporting not update location tracking? +a file from an export? The problem is that special remotes with exports are not key/value stores. The content of a file can change, and if multiple @@ -218,10 +227,11 @@ checksum it, or deleting the whole export, what can be done to resolve it? In this case, git-annex knows both exported trees. Have the user provide a tree that resolves the conflict as they desire (it could be the same as -one of the exported trees, or some merge of them). Then diff each exported -tree in turn against the resolving tree. If a file differs, re-export that -file. In some cases this will do unncessary re-uploads, but it's reasonably -efficient. +one of the exported trees, or some merge of them, or an entirely new tree). +The UI to do this can just be another `git annex export $tree --to remote`. +To resolve, diff each exported tree in turn against the resolving tree. If a +file differs, re-export that file. In some cases this will do unnecessary +re-uploads, but it's reasonably efficient. The documentation should suggest strongly only exporting to a given special remote from a single repository, or having some other rule that avoids