2018-03-21 07:19:47 +00:00
|
|
|
`git annex export` normally exports all files in the specified tree,
|
|
|
|
which is generally what the user wants.
|
|
|
|
But, in some situations, the user may want to export a subset of files,
|
|
|
|
in a way that can be well expressed by a preferred content expression.
|
2019-05-21 16:54:57 +00:00
|
|
|
|
|
|
|
> started work on this in the `preferred` branch. --[[Joey]]
|
2019-05-21 18:38:00 +00:00
|
|
|
>
|
|
|
|
> > And [[done]]! --[[Joey]]
|
2018-03-21 07:19:47 +00:00
|
|
|
|
|
|
|
For example, they may want to export .mp3 files but not the .wav
|
|
|
|
files used to produce those.
|
|
|
|
|
|
|
|
Or, export podcasts, but not ones in a "old" directory that have already
|
|
|
|
been listened to.
|
|
|
|
|
|
|
|
It seems doable to make `git annex export` honor whatever
|
|
|
|
preferred content settings have been configured for the remote.
|
2019-05-20 16:01:37 +00:00
|
|
|
(And `git annex sync --content` too.)
|
2019-05-20 20:37:04 +00:00
|
|
|
> done
|
2019-05-20 16:01:37 +00:00
|
|
|
|
2019-05-20 20:37:04 +00:00
|
|
|
Logs.Export already records the tree that the user chose to export
|
|
|
|
into the git-annex branch. Should excluded files be present in that
|
|
|
|
tree or not? A good reason to do that is that if the preferred content
|
|
|
|
settings change, the next export will pick up on the change, since
|
|
|
|
the exported tree differs from the tree to be exported.
|
|
|
|
So: Make export of a tree filter that tree through the preferred
|
|
|
|
content of the remote, and use the new tree as the tree that really
|
|
|
|
gets exported, recording it in the git-annex branch. But the remote
|
|
|
|
tracking branch will point to the tree that the user chose to export.
|
2019-05-20 16:01:37 +00:00
|
|
|
> done
|
2019-04-10 16:01:52 +00:00
|
|
|
|
2019-05-14 14:52:00 +00:00
|
|
|
Problem: A preferred content expression include=subdir/foo or
|
|
|
|
exclude=subdir/bar matches relative to the top of the repository.
|
|
|
|
But `git annex export` may be exporting a sub-tree, and it has no way
|
|
|
|
of knowing where a provided sub-tree sha is rooted within the larger tree.
|
|
|
|
What it could do is when provided "master:subdir" know that it's operating
|
|
|
|
within subdir and prefix that to filenames when matching preferred content.
|
|
|
|
But that would be inconsistent behavior and could violate least surprise.
|
|
|
|
It may be better to add a note that preferred content expressions include=
|
|
|
|
exclude= etc match relative to the top of the exported tree when exporting
|
|
|
|
a subtree.
|
2019-05-20 16:01:37 +00:00
|
|
|
> done
|
|
|
|
|
2019-05-21 18:38:00 +00:00
|
|
|
Note: Each `git-annex sync --content` re-filters the exported tree.
|
2019-05-20 20:37:04 +00:00
|
|
|
Unnecessary work. If there were a way to look up the original tree that
|
|
|
|
corresponds with the filtered exported tree, that could be avoided.
|
|
|
|
|
2019-05-14 14:52:00 +00:00
|
|
|
----
|
|
|
|
|
2019-04-10 16:01:52 +00:00
|
|
|
> `git annex import` of a tree from a special remote would also be
|
|
|
|
> influenced by this.
|
|
|
|
>
|
|
|
|
> It would make sense for the ImportableContents to have files
|
|
|
|
> that are not preferred content filtered out of it. Eg, if a .wav file
|
|
|
|
> is added to the remote, it shouldn't be downloaded. Or a better example,
|
|
|
|
> if directory Music is excluded from an android remote, importing from
|
|
|
|
> it should exclude that directory.
|
2019-05-21 18:38:00 +00:00
|
|
|
> > done
|
2019-04-10 16:01:52 +00:00
|
|
|
|
2019-05-21 16:54:57 +00:00
|
|
|
## import after limited export
|
2019-05-14 15:49:23 +00:00
|
|
|
|
2019-04-10 16:01:52 +00:00
|
|
|
> Problem: If a tree is exported with eg, no .wav files, and then an import
|
|
|
|
> is made from the remote, and necessarily lacks .wav files, the remote
|
2019-05-20 20:37:04 +00:00
|
|
|
> tracking branch will be updated with a tree with no .wav
|
2019-04-10 16:01:52 +00:00
|
|
|
> files. Merging that into master will delete all the .wav files.
|
|
|
|
>
|
2019-05-14 14:52:00 +00:00
|
|
|
> So it seems that, when updating the remote tracking branch for an import,
|
|
|
|
> the files that were excluded from being exported to it need to be added
|
|
|
|
> back in. So that tree of excluded files needs to somehow be kept track of
|
2019-05-20 20:37:04 +00:00
|
|
|
> when exporting.
|
|
|
|
>
|
|
|
|
> Complication: The export might happen from one clone and then another
|
|
|
|
> clone imports. The clones might not sync in between. Seems all that
|
|
|
|
> the importing clone can rely on is its local state.
|
2019-05-14 14:52:00 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> If importing with no remote tracking branch existing yet, the import will
|
|
|
|
> create one with a disconnected history, and so it's ok to import a tree
|
|
|
|
> missing excluded files; merging a disconnected history won't delete
|
|
|
|
> those files from master.
|
2019-05-14 14:52:00 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> In the multiple clone case, the importing clone can't rely on information
|
|
|
|
> from the exporting clone, but if the importing clone only ever imports
|
|
|
|
> it's fine; if it exports it needs to take that into account for
|
|
|
|
> subsequent imports.
|
2019-05-20 16:01:37 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> So, the only case where the excluded files
|
|
|
|
> need to be added back is when there was a previous export done from
|
|
|
|
> the current repo. The list of excluded files in the export can
|
|
|
|
> be recorded locally and added back to the import.
|
2019-05-20 16:01:37 +00:00
|
|
|
>
|
2019-05-20 20:37:04 +00:00
|
|
|
> > done
|
2019-05-14 15:49:23 +00:00
|
|
|
|
2019-05-21 16:54:57 +00:00
|
|
|
## matching preferred content expressions on import
|
2019-05-14 15:49:23 +00:00
|
|
|
|
|
|
|
> Matching a preferred content expression at import time before the content
|
|
|
|
> is downloaded means that the imported key may not yet be known. (Only
|
|
|
|
> when the ContentIdentifier is known can it can be mapped back to an
|
|
|
|
> already known key.) This is a problem for every preferred content term
|
|
|
|
> that relates to a key.
|
|
|
|
>
|
|
|
|
> Maybe the problem expressions can be guessed:
|
|
|
|
>
|
|
|
|
> * For copies, lackingcopies, and approxlackingcopies, inallgroup,
|
|
|
|
> the number of copies could be assumed to be 1 (the remote being
|
|
|
|
> imported from). But if it turns out to hash to a known key,
|
|
|
|
> they would have matched wrong.
|
|
|
|
>
|
|
|
|
> * For inbackend and securehash, the backend that will be used for the
|
|
|
|
> import is probably known. But if annex.largefiles becomes
|
|
|
|
> supported for imports, it would not be any longer.
|
|
|
|
>
|
|
|
|
> * For metadata, if we assume the imported file is new content,
|
|
|
|
> is has no metadata attached. But if it turns out to hash
|
|
|
|
> to a known key, this would have matched wrong.
|
|
|
|
>
|
|
|
|
> * For present, the content is in the remote, so it's definitely present.
|
|
|
|
>
|
|
|
|
> * For unused, the file is going to be added to the tree, its key
|
|
|
|
> will definitely not be unused.
|
|
|
|
>
|
|
|
|
> So in some cases the guess is wrong and a problem expression
|
|
|
|
> matches when it should not. This either results in a file being imported
|
|
|
|
> that should not, or a file not being imported that should be.
|
|
|
|
> In the former case, when the file reaches the master branch and
|
|
|
|
> a later export is done, the file may or may not be preferred content
|
|
|
|
> for the special remote then, and when it's not it will get removed from
|
|
|
|
> the special remote.
|
|
|
|
>
|
|
|
|
> So for example: The user sets a preferred content expression of
|
|
|
|
> "metadata=notforexport=true" and has some files with that set.
|
|
|
|
> Then they import from a remote, and it downloads a new file that happens
|
|
|
|
> to have the same content as one of those files. The new file gets
|
|
|
|
> added to their master branch, and they export to the remote and the
|
|
|
|
> new file is then removed from the remote. Seems fairly ok?
|
|
|
|
>
|
|
|
|
> Another example: The user sets a preferred content expression of "not
|
|
|
|
> inallgroup=backup". The import/export remote is not in that group.
|
|
|
|
> They import from it, and find that no new files that are added to the
|
|
|
|
> remote ever get imported. That seems to be what they asked for.
|
|
|
|
>
|
|
|
|
> Another example: The user sets a preferred content expression of "not
|
|
|
|
> inallgroup=exports". The import/export remote *is* in that group,
|
|
|
|
> and so are several other import/export remotes.
|
|
|
|
> They import from it, and find that no new files that are added to the
|
|
|
|
> remote ever get imported. Even if the same file got added to all other
|
|
|
|
> remote in that group. This seems surprising!
|
|
|
|
>
|
|
|
|
> Maybe better than guessing would be to limit preferred content
|
2019-05-14 19:25:09 +00:00
|
|
|
> expression matching for importing to terms that don't require the key.
|
|
|
|
> If an expression is found to require the key, display a warning and
|
|
|
|
> don't import.
|
|
|
|
>
|
|
|
|
> OR download the content
|
|
|
|
> from the remote, generate a key from it, and re-match the preferred
|
|
|
|
> content expression. That avoids any surprises and supports all
|
2019-05-21 16:54:57 +00:00
|
|
|
> expressions at the expense of an unnessary download. As long as the
|
|
|
|
> ContentIdentifier to Key mapping gets updated, it will only download
|
|
|
|
> a given file unncessarily one time.
|
2019-05-14 19:25:09 +00:00
|
|
|
>
|
|
|
|
> Which approach is better? Note that almost all of the standard groups
|
|
|
|
> do depend on the key. But it seems very likely that most actual
|
|
|
|
> uses of this feature would involve the name or size of a file that's
|
|
|
|
> being imported, and nothing else.
|
|
|
|
>
|
2019-05-21 16:54:57 +00:00
|
|
|
> Imagine an import of all non-mp3s from the phone, and the phone has
|
|
|
|
> a 20 gb mp3 collection. Downloading them all just to check the preferred
|
|
|
|
> content expression would be an enormous amount of unnecessary work.
|
|
|
|
> If the user started with `exclude=*.mp3`, they'd expect it to be fast,
|
|
|
|
> and if they changed to `exclude=*.mp3 or metadata=tag=podcast`
|
|
|
|
> and it did all that extra work, that would be surprising.
|
2019-05-14 19:25:09 +00:00
|
|
|
|
2019-05-21 18:38:00 +00:00
|
|
|
> > done; it seemed to make sense at least at first to make import
|
|
|
|
> > fail when the preferred content dependened on a key.
|
|
|
|
|
2019-05-17 00:41:17 +00:00
|
|
|
## different preferred content for export and import?
|
|
|
|
|
|
|
|
May be cases where this makes sense. For example, I might make my phone
|
|
|
|
prefer all content that has some metadata set, but want to import all files
|
|
|
|
from my phone (or all files except those in the music directory).
|
|
|
|
|
|
|
|
OTOH, that config would cause files imported from the phone to be removed
|
|
|
|
from it on the next export, unless the necessary metadata got set; git
|
|
|
|
annex sync --content would not work well.
|
|
|
|
|
|
|
|
Better example: Make the phone want all content that is in the laptop
|
|
|
|
group, so all files on my laptop export to the phone but not others that I
|
2019-05-21 16:54:57 +00:00
|
|
|
have archived. But want to import all files from the phone except for mp3s.
|
|
|
|
|
|
|
|
One way would be new preferred content terminals that match when importing
|
|
|
|
and exporting:
|
|
|
|
|
|
|
|
(exporting and copies=laptop:1) or (importing and exclude=*.mp3)
|
|
|
|
|
|
|
|
This needs preferred content expressions for import to be able to
|
|
|
|
support things like copies= that need to know the key, as discussed above.
|
|
|
|
|
|
|
|
If the import preferred content expression is limited to not include those
|
|
|
|
terms, the above example can't be used at import time. Unless it can be
|
|
|
|
simplified before being checked for those terms:
|
|
|
|
|
|
|
|
(false and copies=laptop:1) or (true and exclude=*.mp3)
|
|
|
|
|
|
|
|
false or (true and exclude=*.mp3)
|
|
|
|
|
|
|
|
Or there could be separate expressions for import and export.
|
|
|
|
And this kind of makes sense; the normal preferred content expression
|
|
|
|
controls what is stored on a remote, so affects export, and the import
|
|
|
|
preferred content expression is only used to fine-tune which files get
|
|
|
|
imported from it.
|
|
|
|
|
|
|
|
Problem: Suppose a tree is exported to a special remote, and the tree includes some
|
|
|
|
mp3 files. And then an import is done, excluding mp3 files. That will
|
|
|
|
create a remote tracking branch that lacks the mp3 files; merging it will
|
|
|
|
delete them from master. That could be surprising! This is the inverse of
|
|
|
|
the "import after limited export" problem, but it seems it
|
|
|
|
can't be solved in a similar way.
|
|
|
|
|
|
|
|
That problem might be a fatal blow to this idea of separate expressions
|
|
|
|
for import and export. But it's worse than that! --
|
|
|
|
|
|
|
|
Problem: Suppose a tree is exported to a special remote and the tree
|
|
|
|
includes mp3 files. Then the remote's preferred content is set to exclude
|
|
|
|
mp3 files. Then on import from the remote, a tree will be constructed that
|
|
|
|
lacks the mp3 files that were exported before; merging it will delete them
|
|
|
|
from master.
|
|
|
|
|
|
|
|
That could be avoided by making the import notice that the preferred
|
|
|
|
content expression has changed, and so throw away the old remote tracking
|
|
|
|
branch history and import a commit with no parent, avoiding the deletion.
|
|
|
|
But, if some other file got deleted from the special remote after the
|
|
|
|
export, the import would then not delete it.
|
|
|
|
|
|
|
|
Alternatively, when a preferred content expression doesn't match a file at
|
2019-05-21 18:38:00 +00:00
|
|
|
import, could check if the same file is known to be present on the remote
|
|
|
|
as of the last import or export. (With same or different content.) If so,
|
|
|
|
assume the preferred content has changed and that the user does not want to
|
|
|
|
delete this file, so keep it in the import anyway. This way the import does
|
|
|
|
not delete files from master, and when the next export removes it from
|
|
|
|
the remote it will still not get deleted from master.
|
|
|
|
> done
|