Merge branch 'master' into importtree

Joey Hess 2019-02-22 21:18:13 -04:00
commit 4e0d08b66b
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
8 changed files with 110 additions and 6 deletions

@@ -6,7 +6,7 @@ locally paired systems, and remote servers with rsync.
Help me prioritize my work: What special remote would you most like
to use with the git-annex assistant?
-[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 77 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
+[[!poll open=yes 18 "Amazon S3 (done)" 13 "Amazon Glacier (done)" 10 "Box.com (done)" 79 "My phone (or MP3 player)" 29 "Tahoe-LAFS" 17 "OpenStack SWIFT" 37 "Google Drive"]]
This poll is ordered with the options I consider easiest to build
listed first. Mostly because git-annex already supports them and they

@@ -9,12 +9,57 @@ their changes into the local repository's version control.
The basic idea is to have a `git annex import --from remote` command.
It would find changed/new/deleted files on the remote.
Download the changed/new files and inject into the annex.
-Generate a new treeish, with parent the treeish that was exported earlier,
-that has the modifications in it.
-Updating the local working copy is then done by merging the import treeish.
-This way, conflicts will be detected and handled as normal by git.
+And then generate a commit that can be merged (by the command or later by
+the user) to make their branch reflect changes made on the remote.
+## generating commits and merging
For the merge to work correctly, the parent of the generated commit
needs to be, when possible, a commit whose tree corresponds to the last
tree that was exported to the remote. This way, git merge will treat the
remote the same as a normal git remote where changes were made.
The export log does not record the last exported commit though, only the
tree. And the exported tree may not be the tree of any commit in the
history; it's often a subtree.
So, the export log needs to get a commit sha added to it. And it's possible
that commit will get garbage collected or not pushed, and so not be
available. It could be linked into the git-annex branch as is done for the
exported tree, but doing that for a commit is pretty strange. It's also
possible for the user to export a tree by sha, so there's no commit.
And of course, if no export has been done yet, there would be no commit.
If the last exported commit is not accessible, or not recorded, it seems it
would be ok to make a commit with no parent. git merge would then need
--allow-unrelated-histories, and it would be more likely for the merge to
have conflicts.
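
The parent selection described above could be sketched roughly as follows. This is only an illustration in Python, not git-annex code: `choose_import_parent`, `merge_command`, and the `commit_available` callback are hypothetical names, and the only real interface relied on is `git merge --allow-unrelated-histories`.

```python
from typing import Callable, List, Optional


def choose_import_parent(
    recorded_commit: Optional[str],
    commit_available: Callable[[str], bool],
) -> Optional[str]:
    """Pick the parent for the import commit (hypothetical helper).

    recorded_commit is the commit sha stored with the last export, if any.
    commit_available reports whether that commit still exists locally
    (it may have been garbage collected or never pushed).
    """
    if recorded_commit is not None and commit_available(recorded_commit):
        return recorded_commit
    # No usable commit: the import commit will have no parent.
    return None


def merge_command(import_commit: str, parent: Optional[str]) -> List[str]:
    """Build the git merge invocation for the import commit."""
    cmd = ["git", "merge"]
    if parent is None:
        # A parentless import commit shares no history with the branch.
        cmd.append("--allow-unrelated-histories")
    cmd.append(import_commit)
    return cmd
```

With a recorded and available commit the merge is an ordinary one; in every other case the sketch falls back to the unrelated-histories merge, which is where conflicts become more likely.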
## command line interface
`git annex import --from remote` would import files from the remote to the
top of the working tree. Sometimes users will want to import into a
subdirectory, so there should be a way to do that.
`git annex export` has its own way to specify a subdirectory to export,
eg "master:subdir" (which is one way of referring to a git tree in git).
So it seems it would make sense to make importing use a similar syntax.
When importing, "master:subdir" would mean to import into a tree at subdir,
and merge it into master. So any branch ref not containing a colon, eg
"master" naturally means import not in a subdir, and merge it into the
branch.
Note that while export can have a particular commit or tree sha specified,
it does not make sense to import *to* a particular sha.
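
As a rough illustration of the proposed syntax, the split into branch and subdirectory might look like this in Python. `parse_import_target` is a hypothetical helper, not part of git-annex, and it makes no attempt to reject a sha given in place of a branch.

```python
from typing import Tuple


def parse_import_target(spec: str) -> Tuple[str, str]:
    """Split "branch" or "branch:subdir" into (branch, subdir).

    An empty subdir means the import goes to the top of the tree.
    """
    branch, _, subdir = spec.partition(":")
    if not branch:
        raise ValueError(f"no branch given in {spec!r}")
    # Normalize stray slashes so "master:subdir/" and "master:subdir" agree.
    return branch, subdir.strip("/")
```

For example, `parse_import_target("master")` gives `("master", "")` and `parse_import_target("master:subdir")` gives `("master", "subdir")`.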
Also, there should be a way to configure it so `git annex sync --content`
first imports from a remote and then exports to it. Currently `git annex
export` has `--tracking` to configure the latter. It seems to only make
sense to import and export the same tracking branch. So, should `git annex
export --tracking` set the same thing, or perhaps it would be better to
move the tracking branch configuration out of `git annex export` and into
an interface that explicitly configures both import and export?
## content identifiers

@@ -0,0 +1,12 @@
Started building [[todo/import_tree]] (in the `importtree` branch). So far
the content identifier storage in the git-annex branch is done. Since the
API tells me it will need to both map from a key to content identifiers,
and from content identifier to the key, I also added a sqlite database to
handle the latter.
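
To make the reverse lookup concrete, here is a minimal Python sketch of a sqlite table that maps a content identifier back to its keys. The schema, the function names, and the example key strings are all assumptions made for illustration; the actual git-annex database layout may differ.

```python
import sqlite3
from typing import List


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    # The UNIQUE constraint also provides an index that serves lookups by cid.
    db.execute(
        "CREATE TABLE content_ids ("
        " cid TEXT NOT NULL,"
        " key TEXT NOT NULL,"
        " UNIQUE (cid, key))"
    )
    return db


def record(db: sqlite3.Connection, cid: str, key: str) -> None:
    """Remember that a content identifier corresponds to a key."""
    db.execute(
        "INSERT OR IGNORE INTO content_ids (cid, key) VALUES (?, ?)",
        (cid, key),
    )


def keys_for_cid(db: sqlite3.Connection, cid: str) -> List[str]:
    """Map from a content identifier to the keys recorded for it."""
    rows = db.execute(
        "SELECT key FROM content_ids WHERE cid = ? ORDER BY key", (cid,)
    )
    return [row[0] for row in rows]
```

The forward direction (key to content identifiers) is left out here, since the post says that part lives in the git-annex branch.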
While implementing that, I happened to notice a bug in storage of metadata
that contains newlines; [[internals]] said that would be base64'd, but it
was not. That bug turns out to have been introduced by the ByteString
conversion in January, and it's the second bug caused by that conversion.
The other one broke git-annex on Windows, which was fixed by a release
yesterday.

@@ -0,0 +1,19 @@
Not a lot of progress on [[todo/import_tree]] today, I feel.
Started off by adding a QuickCheck test of the content
identifier log, which did find one bug in that code.
Then started roughing out the core of the importing operation, which involves
building up git trees for the files that are imported. But that needs a
way to graft an imported tree into a subdirectory of another tree,
and the only way I had available to do it needed to read in the entire
recursive tree of the current branch, which would be slower and use
more memory than I like.
So, got sidetracked building a git tree grafter. It turns out that
the export tree code also needs to graft a tree (into the git-annex
branch), and did so using the same inefficient method that I want to
avoid, so it will also be able to be improved using the grafter.
Unfortunately, I had to stop for the day with the grafter not quite working
properly.
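
The idea behind grafting without reading the whole recursive tree can be illustrated with a small in-memory model. This is only a Python sketch under simplifying assumptions: trees are nested dicts rather than git tree objects, and `graft` is a hypothetical function, not the grafter in git-annex. Only the trees along the graft path are rebuilt; everything else is reused as-is.

```python
from typing import Dict, List, Union

# A tree maps names to blob ids (str) or to subtrees (simplified model).
Tree = Dict[str, Union[str, "Tree"]]


def graft(tree: Tree, path: List[str], subtree: Tree) -> Tree:
    """Return a new tree with subtree placed at path.

    Only the trees on the way down to path are copied; sibling
    subtrees are shared with the original and never traversed.
    """
    if not path:
        return subtree
    head, rest = path[0], path[1:]
    child = tree.get(head)
    if not isinstance(child, dict):
        # Missing entry, or a blob in the way: start from an empty tree.
        child = {}
    new_tree = dict(tree)  # shallow copy of this level only
    new_tree[head] = graft(child, rest, subtree)
    return new_tree
```

In this model the cost is proportional to the depth of the graft point rather than to the size of the whole tree, which is the property the paragraph above is after.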

@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="gan"
avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
subject="Provide flags to youtube-dl?"
date="2019-02-22T18:01:25Z"
content="""
Is there already a way to specify flags to youtube-dl on a per-file basis? I think it would be OK to do it either during addurl (modifying the resulting reference that is stored in the annex somehow), or during git-annex get. This is so that the preferred format can be specified. Primarily this would make it possible to download audio-only formats for some files. (Apologies if I missed some documentation on how to achieve this.)
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="gan"
avatar="http://cdn.libravatar.org/avatar/564f55f9fc3773e521bafdbb6f23efc9"
subject="Clarification"
date="2019-02-22T18:03:16Z"
content="""
So, to clarify - I read your first answer. But if this could be done during get, perhaps then it's OK because it is an explicit request for the potentially unsafe operation?
"""]]

@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""Re: provide flags to youtube-dl"""
date="2019-02-22T20:01:37Z"
content="""
@gan, there's not much point in providing flags that are only used in the
initial download; the main point in adding the url to git-annex is so you
can download the same content from it again later.
"""]]

@@ -3,6 +3,8 @@ and the remote allows files to somehow be edited on it, then there ought
to be a way to import the changes back from the remote into the git repository.
The command could be `git annex import --from remote`
+There also ought to be a way to make `git annex sync` automatically import.
See [[design/importing_trees_from_special_remotes]] for current design for
this.