diff --git a/doc/design/assistant/polls/prioritizing_special_remotes.mdwn b/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
index c6c2244b25..aeef0a490c 100644
--- a/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
+++ b/doc/design/assistant/polls/prioritizing_special_remotes.mdwn
@@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
 Help me prioritize my work: What special remote would you most like
 to use with the git-annex assistant?
 
-[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 77 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
+[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 79 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
 
 This poll is ordered with the options I consider easiest to build
 listed first. Mostly because git-annex already supports them and they
diff --git a/doc/design/importing_trees_from_special_remotes.mdwn b/doc/design/importing_trees_from_special_remotes.mdwn
index 33b7ec49a8..4be865d069 100644
--- a/doc/design/importing_trees_from_special_remotes.mdwn
+++ b/doc/design/importing_trees_from_special_remotes.mdwn
@@ -9,12 +9,57 @@ their changes into the local repository's version control.
 
 The basic idea is to have a `git annex import --from remote` command.
 It would find changed/new/deleted files on the remote.
-Download the changed/new files and inject into the annex.
-Generate a new treeish, with parent the treeish that was exported earlier,
-that has the modifications in it.
+Download the changed/new files and inject into the annex.
+And then generate a commit that can be merged (by the command or later by
+the user) to make their branch reflect changes made on the remote.
 
-Updating the local working copy is then done by merging the import treeish.
-This way, conflicts will be detected and handled as normal by git.
+## generating commits and merging
+
+For the merge to work correctly, the parent of the generated commit
+needs to be, when possible, a commit whose tree corresponds to the last
+tree that was exported to the remote. This way, git merge will treat the
+remote the same as a normal git remote where changes were made.
+
+The export log does not record the last exported commit though, only the
+tree. And the exported tree may not be the tree of any commit in the
+history; it's often a subtree.
+
+So, the export log needs to get a commit sha added to it. And it's possible
+that commit will get garbage collected or not pushed, and so not be
+available. It could be linked into the git-annex branch as is done for the
+exported tree, but doing that for a commit is pretty strange. It's also
+possible for the user to export a tree by sha, so there's no commit.
+And of course, if no export has been done yet, there would be no commit.
+
+If the last exported commit is not accessible, or not recorded, seems it
+would be ok to make a commit with no parent. git merge would then need
+--allow-unrelated-histories, and it would be more likely for the merge to
+have conflicts.
+
+## command line interface
+
+`git annex import --from remote` would import files from the remote to the
+top of the working tree. Sometimes users will want to import into a
+subdirectory, so there should be a way to do that.
+
+`git annex export` has its own way to specify a subdirectory to export,
+eg "master:subdir" (which is one way of referring to a git tree in git).
+So it seems it would make sense to make importing use a similar syntax.
+When importing, "master:subdir" would mean to import into a tree at subdir,
+and merge it into master. So any branch ref not containing a colon, eg
+"master" naturally means import not in a subdir, and merge it into the
+branch.
+
+Note that while export can have a particular commit or tree sha specified,
+it does not make sense to import *to* a particular sha.
+
+Also, there should be a way to configure it so `git annex sync --content`
+first imports from a remote and then exports to it. Currently `git annex
+export` has `--tracking` to configure the latter. It seems to only make
+sense to import and export the same tracking branch. So, should `git annex
+export --tracking` set the same thing, or perhaps it would be better to
+move the tracking branch configuration out of `git annex export` and into
+an interface that explicitly configures both import and export?
 
 ## content identifiers
 
diff --git a/doc/devblog/day_573__starting_import_tree_implementation.mdwn b/doc/devblog/day_573__starting_import_tree_implementation.mdwn
new file mode 100644
index 0000000000..88b95220be
--- /dev/null
+++ b/doc/devblog/day_573__starting_import_tree_implementation.mdwn
@@ -0,0 +1,12 @@
+Started building [[todo/import_tree]] (in the `importtree` branch). So far
+the content identifier storage in the git-annex branch is done. Since the
+API tells me it will need to both map from a key to content identifiers,
+and from content identifier to the key, I also added a sqlite database to
+handle the latter.
+
+While implementing that, I happened to notice a bug in storage of metadata
+that contains newlines; [[internals]] said that would be base64'd, but it
+was not. That bug turns out to have been introduced by the ByteString
+conversion in January, and it's the second bug caused by that conversion.
+The other one broke git-annex on Windows, which was fixed by a release
+yesterday.
diff --git a/doc/devblog/day_574__weeds.mdwn b/doc/devblog/day_574__weeds.mdwn
new file mode 100644
index 0000000000..fe7e0f589f
--- /dev/null
+++ b/doc/devblog/day_574__weeds.mdwn
@@ -0,0 +1,19 @@
+Not a lot of progress on [[todo/import_tree]] today I feel..
+
+Started off by adding a QuickCheck test of the content
+identifier log, which did find one bug in that code.
+
+Then started roughing out the core of the importing operation, which involves
+building up git trees for the files that are imported. But that needs a
+way to graft an imported tree into a subdirectory of another tree,
+and the only way I had available to do it needed to read in the entire
+recursive tree of the current branch, which would be slower and use
+more memory than I like.
+
+So, got sidetracked building a git tree grafter. It turns out that
+the export tree code also needs to graft a tree (into the git-annex
+branch), and did so using the same inefficient method that I want to
+avoid, so it will also be able to be improved using the grafter.
+
+Unfortunately, I had to stop for the day with the grafter not quite working
+properly.
diff --git a/doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment b/doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment
new file mode 100644
index 0000000000..db9c1150a4
--- /dev/null
+++ b/doc/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="gan"
+ avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
+ subject="Provide flags to youtube-dl?"
+ date="2019-02-22T18:01:25Z"
+ content="""
+Is there already a way to specify flags to youtube-dl on a per-file basis? I think it would be OK to do it either during addurl (modifying the resulting reference that is stored in the annex somehow), or during git-annex get. This is so that the preferred format can be specified. Primarily this would enable downloading audio-only formats for some files.
) Apologies if I missed some documentation on how to achieve this)
+
+"""]]
diff --git a/doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment b/doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment
new file mode 100644
index 0000000000..cbc29bf551
--- /dev/null
+++ b/doc/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="gan"
+ avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
+ subject="Clarification"
+ date="2019-02-22T18:03:16Z"
+ content="""
+So, to clarify - I read your first answer. But if this could be done during get perhaps then it's OK because it is an explicit request for the potentially unsafe operation?
+"""]]
diff --git a/doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment b/doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment
new file mode 100644
index 0000000000..fce6e1d997
--- /dev/null
+++ b/doc/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: provide flags to youtube-dl"""
+ date="2019-02-22T20:01:37Z"
+ content="""
+@gan, there's not much point in providing flags that are only used in the
+initial download; the main point in adding the url to git-annex is so you
+can download the same content from it again later.
+"""]]
diff --git a/doc/todo/import_tree.mdwn b/doc/todo/import_tree.mdwn
index 16a4808d1a..1ddbe2c125 100644
--- a/doc/todo/import_tree.mdwn
+++ b/doc/todo/import_tree.mdwn
@@ -3,6 +3,8 @@ and the remote allows files to somehow be edited on it, then there ought
 to be a way to import the changes back from the remote into the git repository.
 The command could be `git annex import --from remote`
 
+There also ought to be a way to make `git annex sync` automatically import.
+
 See [[design/importing_trees_from_special_remotes]] for current design for
 this.