7d177b78e4
This includes a note about how include= and exclude= match when exporting a subtree. I don't know if the note is prominent enough, but the behavior seems unsurprising enough.
172 lines
8 KiB
Markdown
172 lines
8 KiB
Markdown
`git annex export` normally exports all files in the specified tree,
|
|
which is generally what the user wants.
|
|
But, in some situations, the user may want to export a subset of files,
|
|
in a way that can be well expressed by a preferred content expression.
|
|
|
|
For example, they may want to export .mp3 files but not the .wav
|
|
files used to produce those.
|
|
|
|
Or, export podcasts, but not ones in a "old" directory that have already
|
|
been listened to.
|
|
|
|
It seems doable to make `git annex export` honor whatever
|
|
preferred content settings have been configured for the remote.
|
|
(And `git annex sync --content` too.)
|
|
|
|
> done
|
|
|
|
Problem: A preferred content expression include=subdir/foo or
|
|
exclude=subdir/bar matches relative to the top of the repository.
|
|
But `git annex export` may be exporting a sub-tree, and it has no way
|
|
of knowing where a provided sub-tree sha is rooted within the larger tree.
|
|
What it could do is when provided "master:subdir" know that it's operating
|
|
within subdir and prefix that to filenames when matching preferred content.
|
|
But that would be inconsistent behavior and could violate least surprise.
|
|
It may be better to add a note that preferred content expressions include=
|
|
exclude= etc match relative to the top of the exported tree when exporting
|
|
a subtree.
|
|
|
|
> done
|
|
|
|
----
|
|
|
|
> `git annex import` of a tree from a special remote would also be
|
|
> influenced by this.
|
|
>
|
|
> It would make sense for the ImportableContents to have files
|
|
> that are not preferred content filtered out of it. Eg, if a .wav file
|
|
> is added to the remote, it shouldn't be downloaded. Or a better example,
|
|
> if directory Music is excluded from an android remote, importing from
|
|
> it should exclude that directory.
|
|
|
|
----
|
|
|
|
> Problem: If a tree is exported with eg, no .wav files, and then an import
|
|
> is made from the remote, and necessarily lacks .wav files, the remote
|
|
> tracking branch will have a tree with no .wav
|
|
> files. Merging that into master will delete all the .wav files.
|
|
>
|
|
> If the remote tracking branch has a disconnected history from master,
|
|
> then git wouldn't delete files on
|
|
> merge. But: This would prevent actual deletions made on the special
|
|
> remote from happening in master too. So not a good idea.
|
|
>
|
|
> So it seems that, when updating the remote tracking branch for an import,
|
|
> the files that were excluded from being exported to it need to be added
|
|
> back in. So that tree of excluded files needs to somehow be kept track of
|
|
> when exporting, or generated from records.
|
|
>
|
|
> To generated the excluded tree, would need the whole tree that was
|
|
> exported, and the remote's preferred content expression at export time.
|
|
> But expressions like inallgroup would also need to look at location
|
|
> tracking info at that time. So it would need to remember the
|
|
> head of the git-annex branch at export time and query against that
|
|
> version of the branch for preferred content and location tracking.
|
|
> (And use of `git-annex forget` could break it.)
|
|
>
|
|
> Logs.Export already records the tree that the user chose to export
|
|
> into the git-annex branch. Should excluded files be present in that
|
|
> tree or not? If not, the diff from that tree to the remote tracking
|
|
> branch will be the excluded files. Another good reason to do that
|
|
> is that if the preferred content settings change, the next export
|
|
> will pick up on the change, since the exported tree differs from the tree
|
|
> to be exported.
|
|
>
|
|
> So, plan: Make export of a tree filter that tree through the preferred
|
|
> content of the remote, and use the new tree as the tree that really
|
|
> gets exported, recording it in the git-annex branch. But the remote
|
|
> tracking branch will point to the commit that the user chose to export.
|
|
> > (done)
|
|
>
|
|
> The import then just needs to diff between the last exported tree and
|
|
> and the remote tracking branch to get the files that were excluded,
|
|
> and add them back into the imported tree.
|
|
|
|
---
|
|
|
|
> Matching a preferred content expression at import time before the content
|
|
> is downloaded means that the imported key may not yet be known. (Only
|
|
> when the ContentIdentifier is known can it can be mapped back to an
|
|
> already known key.) This is a problem for every preferred content term
|
|
> that relates to a key.
|
|
>
|
|
> Maybe the problem expressions can be guessed:
|
|
>
|
|
> * For copies, lackingcopies, and approxlackingcopies, inallgroup,
|
|
> the number of copies could be assumed to be 1 (the remote being
|
|
> imported from). But if it turns out to hash to a known key,
|
|
> they would have matched wrong.
|
|
>
|
|
> * For inbackend and securehash, the backend that will be used for the
|
|
> import is probably known. But if annex.largefiles becomes
|
|
> supported for imports, it would not be any longer.
|
|
>
|
|
> * For metadata, if we assume the imported file is new content,
|
|
> is has no metadata attached. But if it turns out to hash
|
|
> to a known key, this would have matched wrong.
|
|
>
|
|
> * For present, the content is in the remote, so it's definitely present.
|
|
>
|
|
> * For unused, the file is going to be added to the tree, its key
|
|
> will definitely not be unused.
|
|
>
|
|
> So in some cases the guess is wrong and a problem expression
|
|
> matches when it should not. This either results in a file being imported
|
|
> that should not, or a file not being imported that should be.
|
|
> In the former case, when the file reaches the master branch and
|
|
> a later export is done, the file may or may not be preferred content
|
|
> for the special remote then, and when it's not it will get removed from
|
|
> the special remote.
|
|
>
|
|
> So for example: The user sets a preferred content expression of
|
|
> "metadata=notforexport=true" and has some files with that set.
|
|
> Then they import from a remote, and it downloads a new file that happens
|
|
> to have the same content as one of those files. The new file gets
|
|
> added to their master branch, and they export to the remote and the
|
|
> new file is then removed from the remote. Seems fairly ok?
|
|
>
|
|
> Another example: The user sets a preferred content expression of "not
|
|
> inallgroup=backup". The import/export remote is not in that group.
|
|
> They import from it, and find that no new files that are added to the
|
|
> remote ever get imported. That seems to be what they asked for.
|
|
>
|
|
> Another example: The user sets a preferred content expression of "not
|
|
> inallgroup=exports". The import/export remote *is* in that group,
|
|
> and so are several other import/export remotes.
|
|
> They import from it, and find that no new files that are added to the
|
|
> remote ever get imported. Even if the same file got added to all other
|
|
> remote in that group. This seems surprising!
|
|
>
|
|
> Maybe better than guessing would be to limit preferred content
|
|
> expression matching for importing to terms that don't require the key.
|
|
> If an expression is found to require the key, display a warning and
|
|
> don't import.
|
|
>
|
|
> OR download the content
|
|
> from the remote, generate a key from it, and re-match the preferred
|
|
> content expression. That avoids any surprises and supports all
|
|
> expressions at the expense of an unnessary download. As long as the ContentIdentifier to
|
|
> Key mapping gets updated, it will only download a given file unncessarily
|
|
> one time.
|
|
>
|
|
> Which approach is better? Note that almost all of the standard groups
|
|
> do depend on the key. But it seems very likely that most actual
|
|
> uses of this feature would involve the name or size of a file that's
|
|
> being imported, and nothing else.
|
|
>
|
|
> > started work on this in the `preferred` branch. --[[Joey]]
|
|
|
|
## different preferred content for export and import?
|
|
|
|
May be cases where this makes sense. For example, I might make my phone
|
|
prefer all content that has some metadata set, but want to import all files
|
|
from my phone (or all files except those in the music directory).
|
|
|
|
OTOH, that config would cause files imported from the phone to be removed
|
|
from it on the next export, unless the necessary metadata got set; git
|
|
annex sync --content would not work well.
|
|
|
|
Better example: Make the phone want all content that is in the laptop
|
|
group, so all files on my laptop export to the phone but not others that I
|
|
have archived. But want to import all files from the phone, which is not in
|
|
the laptop group, so need a separate expression for import.
|