Merge remote branch 'branchable/master'
commit 7b586f0833
9 changed files with 70 additions and 20 deletions
doc/forum/chrysn.mdwn (Normal file, 5 lines added)
@@ -0,0 +1,5 @@
+* **name**: chrysn
+* **website**: <http://christian.amsuess.com/>
+* **uses git-annex for**: managing the family's photos (and possibly videos and music in the future)
+* **likes git-annex because**: it adds a layer of commit semantics over a regular file system without keeping everything in duplicate locally
+* **would like git-annex to**: not be required any more as git itself learns to use cow filesystems to avoid abundant disk usage and gets better with sparser checkouts (git-annex might then still be a simpler tool that watches over what can be safely dropped for a sparser checkout)

doc/forum/relying_on_git_for_numcopies.mdwn (Normal file, 45 lines added)
@@ -0,0 +1,45 @@
+This is a rough sketch of a modification of git-annex to rely more on git commit semantics. It might be flawed due to my lack of understanding of git-annex internals. --[[chrysn]]
+
+Summary
+=========
+
+Currently, the location tracking is only used for informational purposes unless a repository is [[trust]]ed, in which case there is no checking at all. It is proposed to use the location tracking information as a commitment to keep track of a file without any promise that it might not be dropped if another repository takes over responsibility.
+
+git's semantics for atomic commits are proposed to be used, which makes sure that before files are actually deleted, another repository has accepted the deletion.
+
+Modified git-annex-drop behavior
+==========================
+
+The most important (if not only) git-annex command that is affected by this is `git annex drop`. Currently, for dropping a large number of files, every file is checked with another (or multiple, if so configured) host if it's safe to delete.
+
+The new behavior would be to
+
+* decrement the location tracking counter for all files to be dropped,
+* commit that change,
+* try to push it to at least as many repositories that the numcopies constraints are met,
+* revert if that fails,
+* otherwise really drop the files from the backend.
+
+Unlike explicit checking, this never looks at the remote backend if the file is really present -- otoh, git-annex already relies on the files in the backend to not be touched by anyone but git-annex, and git-annex would only drop them if they were derefed and committed, in which case git would not accept the push. (git by itself would accept a merged push, but even if the reverting step failed due to a power outage or similar, git-annex would, before really deleting files from the backend, check again if the numcopies restraint is still met, and revert its own delete commit as the files are still present anyway.)
+
+Implications for trust
+==============
+
+The proposed change also changes the semantics of trust. Trust can now be controlled in a finer-grained way between untrusted and semi-trusted, as best illustrated by a use case:
+
+> Alice takes her netbook with her on a trip through Spain, and will fill most of its disk up with pictures she takes. As she expects to meet some old friends during the first days, she wants to take older pictures with her, which are safely backed up at home.
+>
+> She tells her netbook's repository to dereference the old images (but not other parts of the repository she has not copied anywhere yet) and pushes to the server before leaving. When she adds pictures from her camera to the repository, git-annex can now free up space as needed.
+
+Dereferencing could be implemented as `git annex drop --not-yet`, freeing space is similar to `dropunused`.
+
+A trusted repository with the new semantics would mean that the repository would not accept dropping anything, just as before.
+
+Advantages / Disadvantages
+=====================
+
+The advantage of this proposal is that the round trips required for dropping something could be greatly reduced.
+
+There should also be simplifications in the `git annex drop` command as it doesn't need to take care of locking any more (git should already do that between checking if HEAD is a parent of the pushed commit and replacing HEAD).
+
+Besides being a major change in git-annex (with the requirement to track hosts' git-annex versions for migration, as the new trust system is incompatible with the old one), no disadvantages of that stragegy are known to the author (hoping for discussion below).
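The five-step drop procedure proposed in doc/forum/relying_on_git_for_numcopies.mdwn can be modelled in a few lines. What follows is a minimal Python sketch under stated assumptions. It is not git-annex code, and `Remote`, `accept_push` and `drop` are hypothetical names. Commits are modelled as integer ids, and a push is accepted only as a fast-forward, meaning the remote's branch must still point at the parent commit. This mimics git's default refusal of non-fast-forward pushes, which the proposal relies on to reject a drop after another repository has committed its own. "Enough copies" is also an assumption: for every dropped key, at least `numcopies` of the repositories still recorded as holding it must have accepted the commit.

```python
from dataclasses import dataclass


@dataclass
class Remote:
    """Hypothetical model of a remote repository's location-log branch.

    `head` is the id of the commit the branch points at. A push is
    accepted only as a fast-forward from that commit, mimicking git's
    default refusal of non-fast-forward pushes.
    """
    name: str
    head: int
    reachable: bool = True

    def accept_push(self, parent: int, commit: int) -> bool:
        if not self.reachable or self.head != parent:
            return False  # offline, or its branch moved on: push rejected
        self.head = commit
        return True


def drop(here, keys, location_log, content, remotes, numcopies, head):
    """Proposed drop: deref, commit, push, then revert or really drop.

    location_log maps key -> set of repository names holding the key;
    content is the set of keys whose data is stored in `here`;
    head is the id of the local branch's current commit.
    Returns (new local head, whether the keys were dropped).
    """
    # 1. Decrement the location tracking counter for every key to drop.
    new_log = {k: set(v) for k, v in location_log.items()}
    for key in keys:
        new_log[key].discard(here)

    # 2. Commit that change on top of the current head.
    commit = head + 1

    # 3. Push it; only remotes whose branch is still at `head` accept.
    accepted = {r.name for r in remotes if r.accept_push(head, commit)}

    # numcopies holds if, for every key, enough holders took the commit.
    if any(len(new_log[k] & accepted) < numcopies for k in keys):
        # 4. Revert: push a revert commit to every remote that already
        #    accepted the deref commit; local log and content are untouched.
        revert = commit + 1
        for r in remotes:
            if r.name in accepted:
                r.accept_push(commit, revert)
        return revert, False

    # 5. Otherwise really drop the content and adopt the new log.
    location_log.clear()
    location_log.update(new_log)
    content.difference_update(keys)
    return commit, True


if __name__ == "__main__":
    log = {"photo.jpg": {"netbook", "home", "usb"}}
    content = {"photo.jpg"}
    remotes = [Remote("home", head=0), Remote("usb", head=0)]
    print(drop("netbook", ["photo.jpg"], log, content, remotes, 2, 0))
    # prints (1, True)
```

With `numcopies = 2` and both holders accepting, the netbook drops its copy. If one holder's branch has already moved on, git-style fast-forward checking rejects the push there. A revert commit then goes to the holder that did accept, and the local content is kept. The sketch folds the proposal's final "check again before really deleting" into step 5 rather than modelling it separately.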
|
@@ -219,7 +219,7 @@ Many git-annex commands will stage changes for later `git commit` by you.
 
 * fromkey file
 
-This can be used to maually set up a file to link to a specified key
+This can be used to manually set up a file to link to a specified key
 in the key-value backend. How you determine an existing key in the backend
 varies. For the URL backend, the key is just a URL to the content.
 
@@ -244,7 +244,7 @@ Many git-annex commands will stage changes for later `git commit` by you.
 
 * setkey file
 
-This plumbing-level command sets the annxed data for a key to the content of
+This plumbing-level command sets the annexed data for a key to the content of
 the specified file, and then removes the file.
 
 A backend will typically need to be specified with --backend. If none
@@ -380,7 +380,7 @@ These files are used by git-annex, in your git repository:
 available. Annexed files in your git repository symlink to that content.
 
 `.git-annex/uuid.log` is used to map between repository UUID and
-decscriptions.
+descriptions.
 
 `.git-annex/trust.log` is used to indicate which repositories are trusted
 and untrusted.
|
@@ -51,7 +51,7 @@ files with git.
 * [[git-annex man page|git-annex]]
 * [[key-value backends|backends]] for data storage
 * [[location_tracking]] reminds you where git-annex has seen files
-* git-annex prevents accidential data loss by [[tracking copies|copies]]
+* git-annex prevents accidental data loss by [[tracking copies|copies]]
 of your files
 * [[what git annex is not|not]]
 * git-annex is Free Software, licensed under the [[GPL]].
|
@@ -27,5 +27,5 @@ descriptions to help you with finding them:
 c0a28e06-d7ef-11df-885c-775af44f8882 -- USB archive drive 1
 e1938fee-d95b-11df-96cc-002170d25c55
 
-In certian cases you may want to configure git-annex to [[trust]]
+In certain cases you may want to configure git-annex to [[trust]]
 that location tracking information is always correct for a repository.
|
@@ -26,7 +26,7 @@
 I only learned of git-media after writing git-annex, but I probably
 would have still written git-annex instead of using it. Currently,
 git-media has the advantage of using git smudge filters rather than
-git-annex's pile of symlinks, and it may be a tighter fit for certian
+git-annex's pile of symlinks, and it may be a tighter fit for certain
 situations. It lacks git-annex's support for widely distributed storage,
 using only a single backend data store. It also does not support
 partial checkouts of file contents, like git-annex does.
|
@@ -11,12 +11,12 @@ information. When removing content, it will directly check
 that other repositories have enough [[copies]].
 
 Generally that explicit checking is a good idea. Consider that the current
-[[location_tracking]] information for a remote may not yet have propigated
+[[location_tracking]] information for a remote may not yet have propagated
 out. Or, a remote may have suffered a catastrophic loss of data, or itself
 been lost.
 
 There is still some trust involved here. A semitrusted repository is
-dependended on to retain a copy of the file content; possibly the only
+depended on to retain a copy of the file content; possibly the only
 [[copy|copies]].
 
 (Being semitrusted is the default. The `git annex semitrust` command
|
@@ -6,13 +6,13 @@ safe place.
 With git-annex, Bob has a single directory tree that includes all
 his files, even if their content is being stored offline. He can
 reorganize his files using that tree, committing new versions to git,
-without worry about accidentially deleting anything.
+without worry about accidentally deleting anything.
 
 When Bob needs access to some files, git-annex can tell him which drive(s)
 they're on, and easily make them available. Indeed, every drive knows what
 is on every other drive.
 
-Run in a cron job, git-annex adds new files to achival drives at night. It
+Run in a cron job, git-annex adds new files to archival drives at night. It
 also helps Bob keep track of intentional, and unintentional copies of
 files, and logs information he can use to decide when it's time to duplicate
 the content of old drives.
|
@@ -84,7 +84,7 @@ can get them.
 
 ## transferring files: When things go wrong
 
-After a while, you'll have serveral annexes, with different file contents.
+After a while, you'll have several annexes, with different file contents.
 You don't have to try to keep all that straight; git-annex does
 [[location_tracking]] for you. If you ask it to get a file and the drive
 or file server is not accessible, it will let you know what it needs to get
@@ -146,7 +146,7 @@ That's a good thing, because it might be the only copy, you wouldn't
 want to lose it in a fumblefingered mistake.
 
 # echo oops > my_cool_big_file
-bash: my_cool_big_file: Permission deined
+bash: my_cool_big_file: Permission denied
 
 In order to modify a file, it should first be unlocked.
 
@@ -176,7 +176,7 @@ There is one problem with using `git commit` like this: Git wants to first
 stage the entire contents of the file in its index. That can be slow for
 big files (sorta why git-annex exists in the first place). So, the
 automatic handling on commit is a nice safety feature, since it prevents
-the file content being accidentially commited into git. But when working with
+the file content being accidentally committed into git. But when working with
 big files, it's faster to explicitly add them to the annex yourself
 before committing.
 
@@ -267,12 +267,12 @@ that the URL is stable; no local backup is kept.
 
 Another handy alternative to the default [[backend|backends]] is the
 SHA1 backend. This backend provides more git-style assurance that your data
-has not been damanged. And the checksum means that when you add the same
+has not been damaged. And the checksum means that when you add the same
 content to the annex twice, only one copy need be stored in the backend.
 
 The only reason it's not the default is that it needs to checksum
 files when they're added to the annex, and this can slow things down
-significantly for really big files. To make SHA1 the detault, just
+significantly for really big files. To make SHA1 the default, just
 add something like this to `.gitattributes`:
 
 * annex.backend=SHA1
@@ -292,7 +292,7 @@ files will be skipped.
 
 After migrating a file to a new backend, the old content in the old backend
 will still be present. That is necessary because multiple files
-can point to the same content. The `git annex unused` sucommand can be
+can point to the same content. The `git annex unused` subcommand can be
 used to clear up that detritus later. Note that hard links are used,
 to avoid wasting disk space.
 
@@ -342,7 +342,7 @@ setting is satisfied for all files.
 fsck my_cool_big_file (checksum...) ok
 ...
 
-You can also specifiy the files to check. This is particularly useful if
+You can also specify the files to check. This is particularly useful if
 you're using sha1 and don't want to spend a long time checksumming everything.
 
 # git annex fsck my_cool_big_file
@@ -367,7 +367,7 @@ might say about a badly messed up annex:
 ## backups
 
 git-annex can be configured to require more than one copy of a file exists,
-as a simple backup for your data. This is controled by the "annex.numcopies"
+as a simple backup for your data. This is controlled by the "annex.numcopies"
 setting, which defaults to 1 copy. Let's change that to require 2 copies,
 and send a copy of every file to a USB drive.
 
@@ -394,9 +394,9 @@ For more details about the numcopies setting, see [[copies]].
 
 ## untrusted repositories
 
-Suppose you have a USB thunb drive and are using it as a git annex
+Suppose you have a USB thumb drive and are using it as a git annex
 repository. You don't trust the drive, because you could lose it, or
-accidentially run it through the laundry. Or, maybe you have a drive that
+accidentally run it through the laundry. Or, maybe you have a drive that
 you know is dying, and you'd like to be warned if there are any files
 on it not backed up somewhere else. Maybe the drive has already died
 or been lost.
|