2018-03-21 07:19:47 +00:00
|
|
|
`git annex export` normally exports all files in the specified tree,
|
|
|
|
which is generally what the user wants.
|
|
|
|
But, in some situations, the user may want to export a subset of files,
|
|
|
|
in a way that can be well expressed by a preferred content expression.
|
|
|
|
|
|
|
|
For example, they may want to export .mp3 files but not the .wav
|
|
|
|
files used to produce those.
|
|
|
|
|
|
|
|
Or, export podcasts, but not ones in an "old" directory that have already
|
|
|
|
been listened to.
|
|
|
|
|
|
|
|
It seems doable to make `git annex export` honor whatever
|
|
|
|
preferred content settings have been configured for the remote.
|
2019-05-20 16:01:37 +00:00
|
|
|
(And `git annex sync --content` too.)
|
2019-05-20 20:37:04 +00:00
|
|
|
> done
|
2019-05-20 16:01:37 +00:00
|
|
|
|
2019-05-20 20:37:04 +00:00
|
|
|
Logs.Export already records the tree that the user chose to export
|
|
|
|
into the git-annex branch. Should excluded files be present in that
|
|
|
|
tree or not? A good reason to leave them out is that if the preferred content
|
|
|
|
settings change, the next export will pick up on the change, since
|
|
|
|
the exported tree differs from the tree to be exported.
|
|
|
|
So: Make export of a tree filter that tree through the preferred
|
|
|
|
content of the remote, and use the new tree as the tree that really
|
|
|
|
gets exported, recording it in the git-annex branch. But the remote
|
|
|
|
tracking branch will point to the tree that the user chose to export.
|
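The filtering described above can be sketched roughly as follows. This is a minimal Python model, not git-annex's implementation: trees are plain dicts from path to key, and `filter_tree`, `plan_export`, `needs_export` and the `*.wav` matcher are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatch

def filter_tree(tree, wanted):
    """Keep only the entries of `tree` (path -> key) that `wanted` accepts."""
    return {path: key for path, key in tree.items() if wanted(path)}

def plan_export(chosen_tree, wanted):
    """Split an export into the tree that is sent and the tree that is tracked."""
    return {
        # Filtered tree: what really gets exported and recorded as exported.
        "exported_tree": filter_tree(chosen_tree, wanted),
        # Unfiltered tree: what the remote tracking branch points to.
        "tracking_tree": chosen_tree,
    }

def needs_export(recorded_tree, chosen_tree, wanted):
    """A changed preferred content setting shows up as a tree difference."""
    return filter_tree(chosen_tree, wanted) != recorded_tree

# Hypothetical preferred content: everything except .wav files.
no_wav = lambda path: not fnmatch(path, "*.wav")
everything = lambda path: True

chosen = {"song.mp3": "KEY1", "song.wav": "KEY2", "talk/ep1.mp3": "KEY3"}
plan = plan_export(chosen, no_wav)
```

Because the recorded tree is the filtered one, `needs_export(plan["exported_tree"], chosen, everything)` is true after the setting is relaxed, even though the chosen tree itself did not change.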
2019-05-20 16:01:37 +00:00
|
|
|
> done
|
2019-04-10 16:01:52 +00:00
|
|
|
|
2019-05-14 14:52:00 +00:00
|
|
|
Problem: A preferred content expression include=subdir/foo or
|
|
|
|
exclude=subdir/bar matches relative to the top of the repository.
|
|
|
|
But `git annex export` may be exporting a sub-tree, and it has no way
|
|
|
|
of knowing where a provided sub-tree sha is rooted within the larger tree.
|
|
|
|
What it could do is when provided "master:subdir" know that it's operating
|
|
|
|
within subdir and prefix that to filenames when matching preferred content.
|
|
|
|
But that would be inconsistent behavior and could violate least surprise.
|
|
|
|
It may be better to add a note that preferred content expressions include=
|
|
|
|
exclude= etc match relative to the top of the exported tree when exporting
|
|
|
|
a subtree.
|
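A small Python sketch of why the same expression behaves differently for a subtree export. It assumes paths are matched with `fnmatch`, which only approximates git-annex's glob matching; `subtree` and `included` are hypothetical helpers.

```python
from fnmatch import fnmatch

def subtree(tree, prefix):
    """Entries under `prefix`, with paths relative to it (like master:subdir)."""
    prefix = prefix.rstrip("/") + "/"
    return {p[len(prefix):]: k for p, k in tree.items() if p.startswith(prefix)}

def included(tree, glob):
    """Paths in `tree` that an include=<glob> term would match."""
    return sorted(p for p in tree if fnmatch(p, glob))

repo = {"subdir/foo": "K1", "subdir/bar": "K2", "other/foo": "K3"}

# Exporting the whole tree: the expression matches from the repository top.
whole = included(repo, "subdir/foo")

# Exporting only subdir: paths are relative to the exported tree, so the
# same expression matches nothing, while include=foo does match.
sub_same_expr = included(subtree(repo, "subdir"), "subdir/foo")
sub_relative = included(subtree(repo, "subdir"), "foo")
```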
2019-05-20 16:01:37 +00:00
|
|
|
> done
|
|
|
|
|
2019-05-20 20:37:04 +00:00
|
|
|
Problem: Each `git-annex sync --content` re-filters the exported tree.
|
|
|
|
That is unnecessary work. If there were a way to look up the original tree that
|
|
|
|
corresponds with the filtered exported tree, that could be avoided.
|
|
|
|
TODO
|
|
|
|
|
2019-05-14 14:52:00 +00:00
|
|
|
----
|
|
|
|
|
2019-04-10 16:01:52 +00:00
|
|
|
> `git annex import` of a tree from a special remote would also be
|
|
|
|
> influenced by this.
|
|
|
|
>
|
|
|
|
> It would make sense for the ImportableContents to have files
|
|
|
|
> that are not preferred content filtered out of it. Eg, if a .wav file
|
|
|
|
> is added to the remote, it shouldn't be downloaded. Or a better example,
|
|
|
|
> if directory Music is excluded from an android remote, importing from
|
|
|
|
> it should exclude that directory.
|
|
|
|
|
2019-05-14 15:49:23 +00:00
|
|
|
----
|
|
|
|
|
2019-04-10 16:01:52 +00:00
|
|
|
> Problem: If a tree is exported with eg, no .wav files, and then an import
|
|
|
|
> is made from the remote, and necessarily lacks .wav files, the remote
|
2019-05-20 20:37:04 +00:00
|
|
|
> tracking branch will be updated with a tree with no .wav
|
2019-04-10 16:01:52 +00:00
|
|
|
> files. Merging that into master will delete all the .wav files.
|
|
|
|
>
|
2019-05-14 14:52:00 +00:00
|
|
|
> So it seems that, when updating the remote tracking branch for an import,
|
|
|
|
> the files that were excluded from being exported to it need to be added
|
|
|
|
> back in. So that tree of excluded files needs to somehow be kept track of
|
2019-05-20 20:37:04 +00:00
|
|
|
> when exporting.
|
|
|
|
>
|
|
|
|
> Complication: The export might happen from one clone and then another
|
|
|
|
> clone imports. The clones might not sync in between. Seems all that
|
|
|
|
> the importing clone can rely on is its local state.
|
2019-05-14 14:52:00 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> If importing with no remote tracking branch existing yet, the import will
|
|
|
|
> create one with a disconnected history, and so it's ok to import a tree
|
|
|
|
> missing excluded files; merging a disconnected history won't delete
|
|
|
|
> those files from master.
|
2019-05-14 14:52:00 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> In the multiple clone case, the importing clone can't rely on information
|
|
|
|
> from the exporting clone, but if the importing clone only ever imports
|
|
|
|
> it's fine; if it exports it needs to take that into account for
|
|
|
|
> subsequent imports.
|
2019-05-20 16:01:37 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> So, the only case where the excluded files
|
|
|
|
> need to be added back is when there was a previous export done from
|
|
|
|
> the current repo. The list of excluded files in the export can
|
|
|
|
> be recorded locally and added back to the import.
|
2019-05-20 16:01:37 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> > done
|
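The add-back step can be modelled like this. It is only a sketch under stated assumptions: trees are dicts from path to key, `local_state` stands in for whatever per-clone storage is used, and letting the remote's version win on a path conflict is an assumption, not something the discussion above settles.

```python
from fnmatch import fnmatch

def export(chosen_tree, wanted, local_state):
    """Export the wanted files and remember, locally, what was left out."""
    exported = {p: k for p, k in chosen_tree.items() if wanted(p)}
    local_state["excluded"] = {
        p: k for p, k in chosen_tree.items() if not wanted(p)
    }
    return exported

def tracking_tree_for_import(imported_tree, local_state):
    """Tree for the remote tracking branch after an import."""
    # Start from files a previous export from this clone excluded, so that
    # merging the tracking branch does not delete them from master.
    merged = dict(local_state.get("excluded", {}))
    # Assumption: on a path conflict, what is on the remote wins.
    merged.update(imported_tree)
    return merged

no_wav = lambda path: not fnmatch(path, "*.wav")
master = {"a.mp3": "K1", "a.wav": "K2"}

exporting_clone = {}
on_remote = export(master, no_wav, exporting_clone)
on_remote["new.mp3"] = "K3"  # a file added on the remote side

with_addback = tracking_tree_for_import(on_remote, exporting_clone)
# A clone that never exported has nothing to add back.
import_only = tracking_tree_for_import(on_remote, {})
```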
2019-05-14 15:49:23 +00:00
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
> Matching a preferred content expression at import time before the content
|
|
|
|
> is downloaded means that the imported key may not yet be known. (Only
|
|
|
|
> when the ContentIdentifier is known can it be mapped back to an
|
|
|
|
> already known key.) This is a problem for every preferred content term
|
|
|
|
> that relates to a key.
|
|
|
|
>
|
|
|
|
> Maybe the problem expressions can be guessed:
|
|
|
|
>
|
|
|
|
> * For copies, lackingcopies, approxlackingcopies, and inallgroup,
|
|
|
|
> the number of copies could be assumed to be 1 (the remote being
|
|
|
|
> imported from). But if it turns out to hash to a known key,
|
|
|
|
> they would have matched wrong.
|
|
|
|
>
|
|
|
|
> * For inbackend and securehash, the backend that will be used for the
|
|
|
|
> import is probably known. But if annex.largefiles becomes
|
|
|
|
> supported for imports, it would not be any longer.
|
|
|
|
>
|
|
|
|
> * For metadata, if we assume the imported file is new content,
|
|
|
|
> it has no metadata attached. But if it turns out to hash
|
|
|
|
> to a known key, this would have matched wrong.
|
|
|
|
>
|
|
|
|
> * For present, the content is in the remote, so it's definitely present.
|
|
|
|
>
|
|
|
|
> * For unused, the file is going to be added to the tree, so its key
|
|
|
|
> will definitely not be unused.
|
|
|
|
>
|
|
|
|
> So in some cases the guess is wrong and a problem expression
|
|
|
|
> matches when it should not. This either results in a file being imported
|
|
|
|
> that should not be, or a file not being imported that should be.
|
|
|
|
> In the former case, when the file reaches the master branch and
|
|
|
|
> a later export is done, the file may or may not be preferred content
|
|
|
|
> for the special remote then, and when it's not it will get removed from
|
|
|
|
> the special remote.
|
|
|
|
>
|
|
|
|
> So for example: The user sets a preferred content expression of
|
|
|
|
> "metadata=notforexport=true" and has some files with that set.
|
|
|
|
> Then they import from a remote, and it downloads a new file that happens
|
|
|
|
> to have the same content as one of those files. The new file gets
|
|
|
|
> added to their master branch, and they export to the remote and the
|
|
|
|
> new file is then removed from the remote. Seems fairly ok?
|
|
|
|
>
|
|
|
|
> Another example: The user sets a preferred content expression of "not
|
|
|
|
> inallgroup=backup". The import/export remote is not in that group.
|
|
|
|
> They import from it, and find that no new files that are added to the
|
|
|
|
> remote ever get imported. That seems to be what they asked for.
|
|
|
|
>
|
|
|
|
> Another example: The user sets a preferred content expression of "not
|
|
|
|
> inallgroup=exports". The import/export remote *is* in that group,
|
|
|
|
> and so are several other import/export remotes.
|
|
|
|
> They import from it, and find that no new files that are added to the
|
|
|
|
> remote ever get imported. Even if the same file got added to all other
|
|
|
|
> remotes in that group. This seems surprising!
|
|
|
|
>
|
|
|
|
> Maybe better than guessing would be to limit preferred content
|
2019-05-14 19:25:09 +00:00
|
|
|
> expression matching for importing to terms that don't require the key.
|
|
|
|
> If an expression is found to require the key, display a warning and
|
|
|
|
> don't import.
|
|
|
|
>
|
|
|
|
> OR download the content
|
|
|
|
> from the remote, generate a key from it, and re-match the preferred
|
|
|
|
> content expression. That avoids any surprises and supports all
|
|
|
|
> expressions at the expense of an unnecessary download. As long as the ContentIdentifier to
|
2019-05-14 15:49:23 +00:00
|
|
|
> Key mapping gets updated, it will only download a given file unnecessarily
|
2019-05-14 19:25:09 +00:00
|
|
|
> one time.
|
|
|
|
>
|
|
|
|
> Which approach is better? Note that almost all of the standard groups
|
|
|
|
> do depend on the key. But it seems very likely that most actual
|
|
|
|
> uses of this feature would involve the name or size of a file that's
|
|
|
|
> being imported, and nothing else.
|
|
|
|
>
|
|
|
|
> > started work on this in the `preferred` branch. --[[Joey]]
|
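The two approaches discussed above can be compared in a rough Python sketch. Everything here is illustrative: the tokenizer handles only a simplified expression syntax, `KEY_TERMS` is just the list of terms named in the discussion, and `check_import_expression`, `import_file` and the SHA-256 stand-in for key generation are hypothetical, not git-annex's code.

```python
import hashlib
import re

# Terms the discussion above identifies as depending on the key.
KEY_TERMS = {
    "copies", "lackingcopies", "approxlackingcopies", "inallgroup",
    "inbackend", "securehash", "metadata", "present", "unused",
}

def key_terms_used(expression):
    """Key-dependent term names found in a (simplified) expression."""
    names = set()
    for token in re.findall(r"[^\s()]+", expression):
        if token not in ("and", "or", "not"):
            names.add(token.split("=", 1)[0])
    return sorted(names & KEY_TERMS)

def check_import_expression(expression):
    """First approach: warn and refuse to import if the key is needed."""
    used = key_terms_used(expression)
    if used:
        return "not importing: expression needs the key (" + ", ".join(used) + ")"
    return None

def import_file(cid, download, cid_to_key, matches):
    """Second approach: download once, learn the key, then match on it."""
    if cid not in cid_to_key:
        # Stand-in for generating a key from the downloaded content.
        cid_to_key[cid] = hashlib.sha256(download(cid)).hexdigest()
    return matches(cid_to_key[cid])

downloads = []
def fake_download(cid):
    downloads.append(cid)
    return b"content of " + cid.encode()

mapping = {}
first = import_file("cid1", fake_download, mapping, lambda key: False)
second = import_file("cid1", fake_download, mapping, lambda key: False)
```

With the mapping kept up to date, the second call for the same ContentIdentifier does not download again, which is the "only one unnecessary download" property described above.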
|
|
|
|
2019-05-17 00:41:17 +00:00
|
|
|
## different preferred content for export and import?
|
|
|
|
|
|
|
|
There may be cases where this makes sense. For example, I might make my phone
|
|
|
|
prefer all content that has some metadata set, but want to import all files
|
|
|
|
from my phone (or all files except those in the music directory).
|
|
|
|
|
|
|
|
OTOH, that config would cause files imported from the phone to be removed
|
|
|
|
from it on the next export, unless the necessary metadata got set; git
|
|
|
|
annex sync --content would not work well.
|
|
|
|
|
|
|
|
Better example: Make the phone want all content that is in the laptop
|
|
|
|
group, so all files on my laptop export to the phone but not others that I
|
|
|
|
have archived. But want to import all files from the phone, which is not in
|
|
|
|
the laptop group, so need a separate expression for import.
|