From 4bd139c5b6480d3615ba8da581b62baeacc92715 Mon Sep 17 00:00:00 2001
From: crest
Date: Wed, 20 Feb 2019 10:49:21 +0000
Subject: [PATCH 01/11] poll vote (My phone (or MP3 player))

---
 doc/design/assistant/polls/prioritizing_special_remotes.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design/assistant/polls/prioritizing_special_remotes.mdwn b/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
index c6c2244b25..b5aa0841aa 100644
--- a/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
+++ b/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
@@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
 Help me prioritize my work: What special remote would you most like to use
 with the git-annex assistant?
 
-[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 77 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
+[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 78 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
 
 This poll is ordered with the options I consider easiest to build
 listed first. Mostly because git-annex already supports them and they

From c605478e410b3916cdd361069220772cd3ac14a8 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 20 Feb 2019 17:32:05 -0400
Subject: [PATCH 02/11] devblog

---
 ...day_573__starting_import_tree_implementation.mdwn | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 doc/devblog/day_573__starting_import_tree_implementation.mdwn

diff --git a/doc/devblog/day_573__starting_import_tree_implementation.mdwn b/doc/devblog/day_573__starting_import_tree_implementation.mdwn
new file mode 100644
index 0000000000..88b95220be
--- /dev/null
+++ b/doc/devblog/day_573__starting_import_tree_implementation.mdwn
@@ -0,0 +1,12 @@
+Started building [[todo/import_tree]] (in the `importtree` branch). So far
+the content identifier storage in the git-annex branch is done. Since the
+API tells me it will need to both map from a key to content identifiers,
+and from content identifier to the key, I also added a sqlite database to
+handle the latter.
+
+While implementing that, I happened to notice a bug in storage of metadata
+that contains newlines; [[internals]] said that would be base64'd, but it
+was not. That bug turns out to have been introduced by the ByteString
+conversion in January, and it's the second bug caused by that conversion.
+The other one broke git-annex on Windows, which was fixed by a release
+yesterday.

From 8428a74a0e3054472c60284d99270fb4abb567e1 Mon Sep 17 00:00:00 2001
From: "62.226.58.176" <62.226.58.176@web>
Date: Thu, 21 Feb 2019 18:36:04 +0000
Subject: [PATCH 03/11] poll vote (My phone (or MP3 player))

---
 doc/design/assistant/polls/prioritizing_special_remotes.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/design/assistant/polls/prioritizing_special_remotes.mdwn b/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
index b5aa0841aa..aeef0a490c 100644
--- a/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
+++ b/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
@@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
 Help me prioritize my work: What special remote would you most like to use
 with the git-annex assistant?
 
-[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 78 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
+[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 79 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
 
 This poll is ordered with the options I consider easiest to build
 listed first. Mostly because git-annex already supports them and they

From 433fef865fb5bb8ea54e0885148a5b69b2d11bc8 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Thu, 21 Feb 2019 17:45:47 -0400
Subject: [PATCH 04/11] devblog

---
 doc/devblog/day_574__weeds.mdwn | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 doc/devblog/day_574__weeds.mdwn

diff --git a/doc/devblog/day_574__weeds.mdwn b/doc/devblog/day_574__weeds.mdwn
new file mode 100644
index 0000000000..fe7e0f589f
--- /dev/null
+++ b/doc/devblog/day_574__weeds.mdwn
@@ -0,0 +1,19 @@
+Not a lot of progress on [[todo/import_tree]] today I feel..
+
+Started off by adding a QuickCheck test of the content
+identifier log, which did find one bug in that code.
+
+Then started roughing out the core of the importing operation, which involves
+building up git trees for the files that are imported. But that needs a
+way to graft an imported tree into a subdirectory of another tree,
+and the only way I had available to do it needed to read in the entire
+recursive tree of the current branch, which would be slower and use
+more memory than I like.
+
+So, got sidetracked building a git tree grafter. It turns out that
+the export tree code also needs to graft a tree (into the git-annex
+branch), and did so using the same innefficient method that I want to
+avoid, so it will also be able to be improved using the grafter.
+
+Unfortunately, I had to stop for the day with the grafter not quite working
+properly.

From baa1699570b9eab28168b93e187b0f750b59f012 Mon Sep 17 00:00:00 2001
From: gan
Date: Fri, 22 Feb 2019 18:01:26 +0000
Subject: [PATCH 05/11] Added a comment: Provide flags to youtube-dl?

---
 .../comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment

diff --git a/doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment b/doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment
new file mode 100644
index 0000000000..db9c1150a4
--- /dev/null
+++ b/doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="gan"
+ avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
+ subject="Provide flags to youtube-dl?"
+ date="2019-02-22T18:01:25Z"
+ content="""
+Is there already a way to specify flags to youtube-dl on a per-file basis. I think it would be OK to do it during either during addurl (modifying the resulting reference that is stored in the annex somehow), or during git-annex get. This is so that the preferred format can be specified. Primarily this would enable to download audio-only formats for some files. ) Apologies if I missed some documentation on how to achieve this)
+
+"""]]

From 9e38dfd700cc0794417589b7520ea3197e6ca689 Mon Sep 17 00:00:00 2001
From: gan
Date: Fri, 22 Feb 2019 18:03:03 +0000
Subject: [PATCH 06/11] Added a comment: Clarification

---
 .../comment_6_d10797c491185004eab51b6c613e3c66._comment | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment

diff --git a/doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment b/doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment
new file mode 100644
index 0000000000..97ea1723b8
--- /dev/null
+++ b/doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="gan"
+ avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
+ subject="Clarification"
+ date="2019-02-22T18:03:03Z"
+ content="""
+So, to clarify - I read your first answer. But if this coulud be done during \"get\" perhaps then it's OK because it is an explicit request for the potentially unsafe operation?
+"""]]

From 776916d5fea1e55c8b39a55c1a5d9fc956f93f60 Mon Sep 17 00:00:00 2001
From: gan
Date: Fri, 22 Feb 2019 18:03:16 +0000
Subject: [PATCH 07/11] Added a comment: Clarification

---
 .../comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment

diff --git a/doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment b/doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment
new file mode 100644
index 0000000000..cbc29bf551
--- /dev/null
+++ b/doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="gan"
+ avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
+ subject="Clarification"
+ date="2019-02-22T18:03:16Z"
+ content="""
+So, to clarify - I read your first answer. But if this coulud be done during get perhaps then it's OK because it is an explicit request for the potentially unsafe operation?
+"""]]

From 5b7daecbd3506bcf36f1e7ddbf1494d4f7fcc43b Mon Sep 17 00:00:00 2001
From: gan
Date: Fri, 22 Feb 2019 18:03:32 +0000
Subject: [PATCH 08/11] removed

---
 .../comment_6_d10797c491185004eab51b6c613e3c66._comment | 8 --------
 1 file changed, 8 deletions(-)
 delete mode 100644 doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment

diff --git a/doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment b/doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment
deleted file mode 100644
index 97ea1723b8..0000000000
--- a/doc/git-annex-addurl/comment_6_d10797c491185004eab51b6c613e3c66._comment
+++ /dev/null
@@ -1,8 +0,0 @@
-[[!comment format=mdwn
- username="gan"
- avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
- subject="Clarification"
- date="2019-02-22T18:03:03Z"
- content="""
-So, to clarify - I read your first answer. But if this coulud be done during \"get\" perhaps then it's OK because it is an explicit request for the potentially unsafe operation?
-"""]]

From d7e5a884f7394aa80854c610b541adf4dbcbcb59 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 22 Feb 2019 16:03:19 -0400
Subject: [PATCH 09/11] response

---
 .../comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment

diff --git a/doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment b/doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment
new file mode 100644
index 0000000000..fce6e1d997
--- /dev/null
+++ b/doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: provide flags to youtube-dl"""
+ date="2019-02-22T20:01:37Z"
+ content="""
+@gan, there's not much point in providing flags that are only used in the
+initial download; the main point in adding the url to git-annex is so you
+can download the same content from it again later.
+"""]]

From 8c836623b7241f3305d5fc24d30c24b3aa4de77a Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 22 Feb 2019 16:18:09 -0400
Subject: [PATCH 10/11] design work

---
 .../importing_trees_from_special_remotes.mdwn | 25 +++++++++++++++++++
 doc/todo/import_tree.mdwn | 2 ++
 2 files changed, 27 insertions(+)

diff --git a/doc/design/importing_trees_from_special_remotes.mdwn b/doc/design/importing_trees_from_special_remotes.mdwn
index 8a4e0d9371..b275bdcb25 100644
--- a/doc/design/importing_trees_from_special_remotes.mdwn
+++ b/doc/design/importing_trees_from_special_remotes.mdwn
@@ -16,6 +16,31 @@ that has the modifications in it.
 Updating the local working copy is then done by merging the import treeish.
 This way, conflicts will be detected and handled as normal by git.
 
+## command line interface
+
+`git annex import --from remote` would import files from the remote to the
+top of the working tree. Sometimes users will want to import into a
+subdirectory, so there should be a way to do that.
+
+`git annex export` has its own way to specify a subdirectory to export,
+eg "master:subdir" (which is one way of referring to a git tree in git).
+So it seems it would make sense to make importing use a similar syntax.
+When importing, "master:subdir" would mean to import into a tree at subdir,
+and merge it into master. So any branch ref not containing a colon, eg
+"master" naturally means import not in a subdir, and merge it into the
+branch.
+
+Note that while export can have a particular commit or tree sha specified,
+it does not makes sense to import *to* a particular sha.
+
+Also, there should be a way to configure it so `git annex sync --content`
+first imports from a remote and then exports to it. Currently `git annex
+export` has `--tracking` to configure the latter. It seems to only make
+sense to import and export the same tracking branch. So, should `git annex
+export --tracking` set the same thing, or perhaps it would be better to
+move the tracking branch configuration out of `git annex export` and into
+an interface that explicitly configures both import and export?
+
 ## content identifiers
 
 The remote is responsible for collecting a list of
diff --git a/doc/todo/import_tree.mdwn b/doc/todo/import_tree.mdwn
index d53f0e214a..eaae6914a7 100644
--- a/doc/todo/import_tree.mdwn
+++ b/doc/todo/import_tree.mdwn
@@ -3,6 +3,8 @@ and the remote allows files to somehow be edited on it,
 then there ought to be a way to import the changes back from the remote
 into the git repository. The command could be `git annex import --from remote`
 
+There also ought to be a way to make `git annex sync` automatically import.
+
 See [[design/importing_trees_from_special_remotes]] for current design for
 this.
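The `master:subdir` import syntax proposed in patch 10 can be illustrated with a small parser. This is a minimal Python sketch under stated assumptions, not git-annex's code (git-annex is written in Haskell): `parse_import_target` and `ImportTarget` are hypothetical names, the spec is split at the first colon only, and a missing or empty part after the colon is taken to mean "import at the top of the tree".

```python
from typing import NamedTuple, Optional


class ImportTarget(NamedTuple):
    """Hypothetical result type: where an import should land."""
    branch: str            # branch the imported tree is merged into
    subdir: Optional[str]  # subdirectory to import into, or None for the top


def parse_import_target(spec: str) -> ImportTarget:
    """Split a "branch:subdir" spec (hypothetical helper, not git-annex code)."""
    # Split at the first colon only; no colon leaves subdir empty.
    branch, _sep, subdir = spec.partition(":")
    if not branch:
        raise ValueError(f"no branch given in {spec!r}")
    # Normalise "a/b/" to "a/b"; an empty subdir means the top of the tree.
    subdir = subdir.strip("/")
    return ImportTarget(branch, subdir or None)
```

For example, `parse_import_target("master:subdir")` yields branch `master` with subdir `subdir`, while a plain `"master"` yields no subdir, matching the rule that a ref without a colon imports at the top of the tree and merges into that branch.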
From 200dc632f5858925d19c0ca53d53b1045eb06954 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 22 Feb 2019 21:18:01 -0400
Subject: [PATCH 11/11] more design

---
 .../importing_trees_from_special_remotes.mdwn | 30 +++++++++++++++----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/doc/design/importing_trees_from_special_remotes.mdwn b/doc/design/importing_trees_from_special_remotes.mdwn
index b275bdcb25..0eb549c959 100644
--- a/doc/design/importing_trees_from_special_remotes.mdwn
+++ b/doc/design/importing_trees_from_special_remotes.mdwn
@@ -9,12 +9,32 @@ their changes into the local repository's version control.
 
 The basic idea is to have a `git annex import --from remote` command.
 It would find changed/new/deleted files on the remote.
-Download the changed/new files and inject into the annex. 
-Generate a new treeish, with parent the treeish that was exported earlier,
-that has the modifications in it.
+Download the changed/new files and inject into the annex.
+And then generate a commit that can be merged (by the command or later by
+the user) to make their branch reflect changes made on the remote.
 
-Updating the local working copy is then done by merging the import treeish.
-This way, conflicts will be detected and handled as normal by git.
+## generating commits and merging
+
+For the merge to work correctly, the parent of the generated commit
+needs to be, when possible, a commit whose tree corresponds to the last
+tree that was exported to the remote. This way, git merge will treat the
+remote the same as a normal git remote where changes were made.
+
+The export log does not record the last exported commit though, only the
+tree. And the exported tree may not be the tree of any commit in the
+history; it's often a subtree.
+
+So, the export log needs to get a commit sha added to it. And it's possible
+that commit will get garbage collected or not pushed, and so not be
+available. It could be linked into the git-annex branch as is done for the
+exported tree, but doing that for a commit is pretty strange. It's also
+possible for the user to export a tree by sha, so there's no commit.
+And of course, if no export has been done yet, there would be no commit.
+
+If the last exported commit is not accessible, or not recorded, seems it
+would be ok to make a commit with no parent. git merge would then need
+--allow-unrelated-histories, and it would be more likely for the merge to
+have conflicts.
 
 ## command line interface
 
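The parent-selection rule in patch 11 (use the last exported commit when it is recorded and still available, otherwise make a parentless commit and merge with `--allow-unrelated-histories`) can be sketched as below. This is a hypothetical Python illustration, not the git-annex implementation: the commit recorded in the export log is simply passed in as an argument, and all helper names are invented. The git plumbing it builds on is real: `git cat-file -e` to test that an object exists, `git commit-tree` to create a commit from a tree, and `git merge`.

```python
import subprocess
from typing import Callable, List, Optional


def git_commit_exists(sha: str) -> bool:
    """True if `sha` names a commit present in the local object store."""
    # `git cat-file -e` exits 0 when the object exists; the ^{commit}
    # suffix additionally requires it to resolve to a commit.
    result = subprocess.run(
        ["git", "cat-file", "-e", f"{sha}^{{commit}}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def import_commit_parents(
    last_exported_commit: Optional[str],
    commit_exists: Callable[[str], bool] = git_commit_exists,
) -> List[str]:
    """Parents for the generated import commit (hypothetical helper).

    The last exported commit is used only if it was recorded and is still
    available (it may have been garbage collected or never pushed).
    Otherwise the import commit gets no parent at all.
    """
    if last_exported_commit and commit_exists(last_exported_commit):
        return [last_exported_commit]
    return []


def commit_tree_args(tree: str, parents: List[str], message: str) -> List[str]:
    """Build a `git commit-tree` command line for the imported tree."""
    args = ["git", "commit-tree"]
    for parent in parents:
        args += ["-p", parent]
    return args + ["-m", message, tree]


def merge_args(parents: List[str], import_commit: str) -> List[str]:
    """Build the `git merge` command line for the import commit."""
    args = ["git", "merge"]
    if not parents:
        # A parentless commit shares no history with the branch, and git
        # refuses such a merge unless this flag is given.
        args.append("--allow-unrelated-histories")
    return args + [import_commit]
```

With no recorded export, `import_commit_parents(None)` returns `[]` without calling git, so `merge_args` adds `--allow-unrelated-histories`; as the design notes, that fallback merge is also more likely to conflict.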