117 lines
5.7 KiB
Markdown
117 lines
5.7 KiB
Markdown
Hi, I am assuming to use git-annex-assistant for two usecases, but I would like to ask about the options or planed roadmap for dropped/removed files from the repository.
|
|
|
|
Usecases:
|
|
|
|
1. sync working directory between laptop, home computer, work komputer
|
|
2. archive functionality for my photograps
|
|
|
|
Both usecases have one common factor. Some files might become obsolate and
|
|
in long time frame nobody is interested to keep their revisions. Let's
|
|
assume photographs. Usuall workflow I take is to import all photograps to
|
|
filesystem, then assess (select) the good ones I want to keep and then
|
|
process them what ever way.
|
|
|
|
Problem with git-annex(-assistant) I have is that it start to revision all
|
|
of the files at the time they are added to directory. This is welcome at
|
|
first but might be an issue if you are used to put 80% of the size of your
|
|
imported files to trash.
|
|
|
|
I am aware of what git-annex is not. I have been reading documentation for
|
|
"git-annex drop" and "unused" options including forums. I do understand
|
|
that I am actually able to delete all revisions of the file if I will drop
|
|
it, remove it and if I will run git annex unused 1..###. (on all synced
|
|
repositories).
|
|
|
|
I actually miss the option to have above process automated/replicated to the other synced repositories.
|
|
|
|
I would formulate the 'use case' requirements for git-annex as:
|
|
|
|
* command to drop an file including revisions from all annex repositories?
|
|
(for example like moving a file to /trash folder) that will schedulle
|
|
it's deletition)
|
|
* option to keep like max. 10 last revisions of the file?
|
|
* option to keep only previous revisions if younger than 6 months from now?
|
|
|
|
Finally, how to specify a feature request for git-annex?
|
|
|
|
> By moving it here ;-) --[[Joey]]
|
|
|
|
> So, let's spec out a design.
|
|
>
|
|
> * Add preferred content terminal to configure whether a repository wants
|
|
> to hang on to unused content. Simply `unused`.
|
|
> (It cannot include a timestamp, because there's
|
|
> no way repos can agree on about when a key became unused.) **done**
|
|
> * In order to quickly match that terminal, the Annex monad will need
|
|
> to keep a Set of unused Keys. This should only be loaded on demand.
|
|
> **done**
|
|
> NB: There is some potential for a great many unused Keys to cause
|
|
> memory usage to balloon.
|
|
> * Client repositories will end their preferred content with
|
|
> `and (not unused)`. Transfer repositories too, because typically
|
|
> only client repos connect to them, and so otherwise unused files
|
|
> would build up there. Backup repos would want unused files. I
|
|
> think that archive repos would too. **done**
|
|
> * Make the assistant check for unused files periodically. Exactly
|
|
> how often may need to be tuned, but once per day seems reasonable
|
|
> for most repos. Note that the assistant could also notice on the
|
|
> fly when files are removed and mark their keys as unused if that was
|
|
> the last associated file. (Only currently possible in direct mode.)
|
|
> **done**
|
|
> * After scanning for unused files, it makes sense for the
|
|
> assistant to queue transfers of unused files to any remotes that
|
|
> do want them (eg, backup remotes). If the files can successfully be
|
|
> sent to a remote, that will lead to them being dropped locally as
|
|
> they're not wanted.
|
|
> * Add a git config setting like annex.expireunused=7d. This causes
|
|
> *deletion* of unused files after the specified time period if they are
|
|
> not able to be moved to a repo that wants them.
|
|
> (The default should be annex.expireunused=false.)
|
|
> * How to detect how long a file has been unused? We can't look at the
|
|
> time stamp of the object; we could use the mtime of the .map file,
|
|
> that that's direct mode only and may be replaced with a database
|
|
> later. Seems best to just keep a unused log file with timestamps.
|
|
> **done**
|
|
> * After the assistant scans for unused files, if annex.expireunused
|
|
> is not set, and there is some significant quantity of unused files
|
|
> (eg, more than 1000, or more than 1 gb, or more than the amount of
|
|
> remaining free disk space),
|
|
> it can pop up a webapp alert asking to configure it. **done**
|
|
> * Webapp interface to configure annex.expireunused. Reasonable values
|
|
> are no expiring, or any number of days. **done**
|
|
>
|
|
> [[done]] This does not cover every use case that was requested.
|
|
> But I don't see a cheap way to ensure it keeps eg the past 10 versions of
|
|
> a file. I guess that if you care about that, you leave
|
|
> annex.expireunused=false, and set up a backup repository where the unused
|
|
> files will be moved to.
|
|
>
|
|
> Note that since the assistant uses direct mode by default, old versions
|
|
> of modififed files are not guaranteed to be retained. But they very well
|
|
> might be. For example, if a file is replicated to 2 clients, and one
|
|
> client directly edits it, or deletes it, it loses the old version,
|
|
> but the other client will still be storing that old version.
|
|
>
|
|
> ## Stability analysis for unused in preferred content expressions
|
|
>
|
|
> This is tricky, because two repos that are otherwise entirely
|
|
> in sync may have differing opinons about whether a key is unused,
|
|
> depending on when each last scanned for unused keys.
|
|
>
|
|
> So, this preferred content terminal is *not stable*.
|
|
> It may be possible to write preferred content expressions
|
|
> that constantly moved such keys around without reaching a steady state.
|
|
>
|
|
> Example:
|
|
>
|
|
> A and B are clients directly connected, and both also connected
|
|
> to BACKUP.
|
|
>
|
|
> A deletes F. B syncs with A, and runs unused check; decides F
|
|
> is unused. B sends F to BACKUP. B will then think A doesn't want F,
|
|
> and will drop F from A. Next time A runs a full transfer scan, it will
|
|
> *not* find F (because the file was deleted!). So it won't get F back from
|
|
> BACKUP.
|
|
>
|
|
> So, it looks like the fact that unused files are not going to be
|
|
> looked for on the full transfer scan seems to make this work out ok.
|