diff --git a/doc/devblog/day_587__import_preferred_content.mdwn b/doc/devblog/day_587__import_preferred_content.mdwn
new file mode 100644
index 0000000000..a1f779b056
--- /dev/null
+++ b/doc/devblog/day_587__import_preferred_content.mdwn
@@ -0,0 +1,20 @@
+I've developed a plan for how to handle [[todo/export_preferred_content]].
+And today I'm working on making `git annex import --from remote` honor
+the preferred content of the remote. It doesn't make sense to support it
+for one and not the other, so this is on the `preferred` git branch for now.
+
+One use case for this is to configure an import to exclude certain file
+extensions or directories. Such unwanted content will be left as-is
+in the remote's data store, but won't be imported, so from git-annex's
+POV, it won't be present on the remote.
+
+The tricky thing is, when importing, the key is not known until the file
+is downloaded, but you don't want git-annex downloading content that is not
+preferred. I'm finessing that problem by checking the subset of preferred
+content expressions that are not dependent on the file's content, which will
+avoid downloads of unwanted content in probably most cases.
+
+What should it do when the preferred content expression is dependent on
+the file's content? I'm undecided if it's better to warn and not import,
+or to download the content once in order to check the preferred content
+expression, and then throw unwanted content away.
diff --git a/doc/todo/export_preferred_content.mdwn b/doc/todo/export_preferred_content.mdwn
index ae37555216..b576dcacd1 100644
--- a/doc/todo/export_preferred_content.mdwn
+++ b/doc/todo/export_preferred_content.mdwn
@@ -123,11 +123,21 @@ a subtree.
 > remote in that group. This seems surprising!
 >
 > Maybe better than guessing would be to limit preferred content
-> expression matching for importing to terms that don't require guessing.
-> If an expression is found to require guessing, display a warning and
- make the whole expression match. OR download the content
-> from the remote, generate a key from it, and match the preferred
-> content expression at that point. That avoids any surprises at
-> the expense of an unnessary download. As long as the ContentIdentifier to
+> expression matching for importing to terms that don't require the key.
+> If an expression is found to require the key, display a warning and
+> don't import.
+>
+> OR download the content
+> from the remote, generate a key from it, and re-match the preferred
+> content expression. That avoids any surprises and supports all
+> expressions at the expense of an unnessary download. As long as the ContentIdentifier to
 > Key mapping gets updated, it will only download a given file unncessarily
- one time.
+> one time.
+>
+> Which approach is better? Note that almost all of the standard groups
+> do depend on the key. But it seems very likely that most actual
+> uses of this feature would involve the name or size of a file that's
+> being imported, and nothing else.
+>
+> > started work on this in the `preferred` branch. --[[Joey]]
+
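
The idea in the devblog entry, deciding as much as possible from a file's name and size before any download, can be illustrated with a small three-valued matcher. This is only a sketch under stated assumptions: the classes `Include`, `LargerThan`, `NeedsKey`, `Not`, `And`, `Or` and the functions `match_without_key` and `import_decision` are hypothetical names invented for illustration, and the code is not taken from git-annex or its `preferred` branch. A result of `None` means "cannot tell without the key", which corresponds to the open question of whether to warn and skip or to download once and re-match.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Union


# Hypothetical mini-AST for illustration; not git-annex's real data types.
@dataclass(frozen=True)
class Include:
    glob: str  # decided by the file name alone


@dataclass(frozen=True)
class LargerThan:
    size: int  # decided by the file size alone


@dataclass(frozen=True)
class NeedsKey:
    name: str  # stands in for any term that needs the key


@dataclass(frozen=True)
class Not:
    term: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Include, LargerThan, NeedsKey, Not, And, Or]


def match_without_key(expr: Expr, filename: str, size: int) -> Optional[bool]:
    """Return True or False when name and size settle the match, else None."""
    if isinstance(expr, Include):
        return fnmatchcase(filename, expr.glob)
    if isinstance(expr, LargerThan):
        return size > expr.size
    if isinstance(expr, NeedsKey):
        return None  # unknown until the content has been downloaded
    if isinstance(expr, Not):
        inner = match_without_key(expr.term, filename, size)
        return None if inner is None else not inner
    left = match_without_key(expr.left, filename, size)
    right = match_without_key(expr.right, filename, size)
    if isinstance(expr, And):
        if left is False or right is False:
            return False  # one failing side settles it, key or no key
        return True if (left and right) else None
    if isinstance(expr, Or):
        if left is True or right is True:
            return True  # one matching side settles it, key or no key
        return False if (left is False and right is False) else None
    raise TypeError(f"unknown term: {expr!r}")


def import_decision(expr: Expr, filename: str, size: int) -> str:
    """Map the three-valued result onto an action for the importer."""
    verdict = match_without_key(expr, filename, size)
    if verdict is None:
        return "undecided"  # warn and skip, or download once and re-match
    return "import" if verdict else "skip"
```

With an expression such as `And(Not(Include("*.tmp")), NeedsKey("copies=2"))`, a file named `cache.tmp` is rejected without a download because the name alone makes the whole expression false, while `photo.jpg` stays undecided until its key is known.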