2018-03-21 07:19:47 +00:00

`git annex export` normally exports all files in the specified tree,
which is generally what the user wants.
But in some situations, the user may want to export only a subset of files,
in a way that can be well expressed by a preferred content expression.

For example, they may want to export .mp3 files but not the .wav
files used to produce them.

Or export podcasts, but not ones in an "old" directory that have already
been listened to.

It seems doable to make `git annex export` honor whatever
preferred content settings have been configured for the remote.
(And `git annex sync --content` too.)

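
As a rough illustration of the idea, the sketch below filters a list of tree paths with include/exclude style globs. It is plain Python, not git-annex code: `wanted_for_export` and the sample paths are hypothetical, and `fnmatch` globbing is only assumed to approximate how git-annex matches globs.

```python
from fnmatch import fnmatchcase

def wanted_for_export(path, include=("*",), exclude=()):
    """Hypothetical stand-in for matching a preferred content expression
    made only of include=/exclude= globs against a path in the tree."""
    included = any(fnmatchcase(path, g) for g in include)
    excluded = any(fnmatchcase(path, g) for g in exclude)
    return included and not excluded

# Hypothetical contents of the tree being exported.
tree = ["album/song.mp3", "album/song.wav",
        "podcasts/new.mp3", "podcasts/old/heard.mp3"]

# Export .mp3 files but not the .wav files used to produce them.
mp3_only = [p for p in tree if wanted_for_export(p, include=("*.mp3",))]

# Export podcasts, but not ones in an "old" directory.
fresh_podcasts = [p for p in tree
                  if wanted_for_export(p, include=("podcasts/*",),
                                       exclude=("podcasts/old/*",))]
```
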

2019-04-10 16:01:52 +00:00

2019-05-14 14:52:00 +00:00

Problem: A preferred content expression such as include=subdir/foo or
exclude=subdir/bar matches relative to the top of the repository.
But `git annex export` may be exporting a subtree, and it has no way
of knowing where a provided subtree sha is rooted within the larger tree.
What it could do, when provided "master:subdir", is know that it's operating
within subdir and prefix that to filenames when matching preferred content.
But that would be inconsistent behavior and could violate least surprise.
It may be better to add a note that preferred content expressions include=,
exclude=, etc. match relative to the top of the exported tree when exporting
a subtree.

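
The difference between the two matching behaviors can be shown with a small sketch. This is illustrative Python only: `matches` and its `subtree_prefix` parameter are hypothetical names, and `fnmatch` is assumed to be close enough to git-annex's glob matching for the purpose of the example.

```python
from fnmatch import fnmatchcase

def matches(glob, path, subtree_prefix=""):
    """Match a glob against a path from an exported tree.

    subtree_prefix is a hypothetical parameter: the directory the exported
    subtree is rooted at (e.g. "subdir" for "master:subdir"), or "" when
    paths are matched relative to the top of the exported tree.
    """
    full = f"{subtree_prefix}/{path}" if subtree_prefix else path
    return fnmatchcase(full, glob)

# Path of a file as seen inside the exported subtree "master:subdir".
path_in_subtree = "foo"

# Relative to the top of the exported tree, include=subdir/foo does not match.
relative_to_export = matches("subdir/foo", path_in_subtree)

# With the prefix added back, the same expression does match.
relative_to_repo = matches("subdir/foo", path_in_subtree,
                           subtree_prefix="subdir")
```
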
----

> `git annex import` of a tree from a special remote would also be
> influenced by this.
>
> It would make sense for the ImportableContents to have files
> that are not preferred content filtered out of it. Eg, if a .wav file
> is added to the remote, it shouldn't be downloaded. Or a better example,
> if directory Music is excluded from an android remote, importing from
> it should exclude that directory.

2019-05-14 15:49:23 +00:00

----

> Problem: If a tree is exported with eg, no .wav files, and then an import
> is made from the remote, and necessarily lacks .wav files, the remote
> tracking branch will have a tree with no .wav files.
> Merging that into master will delete all the .wav files.
>
> If the remote tracking branch has a disconnected history from master,
> then git wouldn't delete files on merge.
> But: This would prevent actual deletions made on the special
> remote from happening in master too. So not a good idea.
>
> So it seems that, when updating the remote tracking branch for an import,
> the files that were excluded from being exported to it need to be added
> back in. So that tree of excluded files needs to somehow be kept track of
> when exporting, or generated from records.
>
> To generate the excluded tree, it would need the whole tree that was
> exported, and the remote's preferred content expression at export time.
> But expressions like inallgroup would also need to look at location
> tracking info at that time. So it would need to remember the
> head of the git-annex branch at export time and query against that
> version of the branch for preferred content and location tracking.
> (And use of `git-annex forget` could break it.)
>
> It seems easier to instead record the tree of excluded files somewhere.
> Logs.Export already records the whole exported tree in the git-annex
> branch, so extend it to also record the tree of excluded files.
> Complication: Export conflicts.
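
A minimal sketch of adding the excluded files back in when updating the remote tracking branch. Trees are modeled as plain path-to-key dicts, and `tracking_branch_tree` and all the sample data are hypothetical. It does not model how Logs.Export stores anything, and it ignores export conflicts.

```python
def tracking_branch_tree(imported, excluded):
    """Combine the tree imported from the remote with the tree of files
    that were excluded at export time (both modeled as path -> key dicts)."""
    combined = dict(excluded)
    combined.update(imported)  # what is actually on the remote wins
    return combined

# Recorded at export time (hypothetical data).
exported_tree = {"song.mp3": "KEY-mp3", "song.wav": "KEY-wav", "b.mp3": "KEY-b"}
excluded_tree = {"song.wav": "KEY-wav"}  # not preferred content of the remote

# Later the remote is imported; b.mp3 has been deleted there.
imported_tree = {"song.mp3": "KEY-mp3"}

new_tree = tracking_branch_tree(imported_tree, excluded_tree)

# Files that merging the tracking branch would delete from master:
# the real deletion made on the remote, but not the excluded .wav file.
deleted = sorted(set(exported_tree) - set(new_tree))
```
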
---

> Matching a preferred content expression at import time before the content
> is downloaded means that the imported key may not yet be known. (Only
> when the ContentIdentifier is known can it be mapped back to an
> already known key.) This is a problem for every preferred content term
> that relates to a key.
>
> Maybe the problem expressions can be guessed:
>
> * For copies, lackingcopies, approxlackingcopies, and inallgroup,
>   the number of copies could be assumed to be 1 (the remote being
>   imported from). But if it turns out to hash to a known key,
>   they would have matched wrong.
>
> * For inbackend and securehash, the backend that will be used for the
>   import is probably known. But if annex.largefiles becomes
>   supported for imports, it would not be any longer.
>
> * For smallerthan and largerthan, the file size of an import is known.
>
> * For metadata, if we assume the imported file is new content,
>   it has no metadata attached. But if it turns out to hash
>   to a known key, this would have matched wrong.
>
> * For present, the content is in the remote, so it's definitely present.
>
> * For unused, the file is going to be added to the tree, so its key
>   will definitely not be unused.
>
> So in some cases the guess is wrong and a problem expression
> matches when it should not. This either results in a file being imported
> that should not be, or a file not being imported that should be.
> In the former case, when the file reaches the master branch and
> a later export is done, the file may or may not be preferred content
> for the special remote then, and when it's not it will get removed from
> the special remote.
>
> So for example: The user sets a preferred content expression of
> "metadata=notforexport=true" and has some files with that set.
> Then they import from a remote, and it downloads a new file that happens
> to have the same content as one of those files. The new file gets
> added to their master branch, and they export to the remote and the
> new file is then removed from the remote. Seems fairly ok?
>
> Another example: The user sets a preferred content expression of "not
> inallgroup=backup". The import/export remote is not in that group.
> They import from it, and find that no new files that are added to the
> remote ever get imported. That seems to be what they asked for.
>
> Another example: The user sets a preferred content expression of "not
> inallgroup=exports". The import/export remote *is* in that group,
> and so are several other import/export remotes.
> They import from it, and find that no new files that are added to the
> remote ever get imported. Even if the same file got added to all the other
> remotes in that group. This seems surprising!
>
> Maybe better than guessing would be to limit preferred content
> expression matching for importing to terms that don't require guessing.
> If an expression is found to require guessing, display a warning and
> make the whole expression match. OR download the content
> from the remote, generate a key from it, and match the preferred
> content expression at that point. That avoids any surprises at
> the expense of an unnecessary download. As long as the ContentIdentifier to
> Key mapping gets updated, it will only download a given file unnecessarily
> one time.
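
Both alternatives can be sketched roughly as follows. This is illustrative Python, not git-annex code: `GUESSED_TERMS`, `requires_guessing`, `key_for`, and the fake download function are hypothetical. The set of terms is an assumption drawn from the list above (inbackend and securehash would join it if annex.largefiles became supported for imports), and the key string only imitates the shape of a SHA256 key.

```python
import hashlib

# Terms whose guess can turn out wrong, per the list above (an assumption).
GUESSED_TERMS = {"copies", "lackingcopies", "approxlackingcopies",
                 "inallgroup", "metadata"}

def requires_guessing(terms):
    """True if any term name of a pre-parsed expression needs a guess."""
    return any(t in GUESSED_TERMS for t in terms)

def key_for(cid, download, cid_to_key):
    """Return the key for a ContentIdentifier, downloading the content
    at most once per identifier and remembering the mapping."""
    if cid not in cid_to_key:
        content = download(cid)
        digest = hashlib.sha256(content).hexdigest()
        cid_to_key[cid] = f"SHA256-s{len(content)}--{digest}"
    return cid_to_key[cid]

# Hypothetical remote that records which identifiers were downloaded.
downloads = []

def fake_download(cid):
    downloads.append(cid)
    return b"content of " + cid.encode()

cache = {}
first = key_for("cid-1", fake_download, cache)
second = key_for("cid-1", fake_download, cache)  # served from the mapping
```
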