Merge branch 'master' into smudge

Joey Hess 2015-12-11 13:50:31 -04:00
commit c608a752a5
13 changed files with 163 additions and 26 deletions

View file

@@ -397,7 +397,7 @@ withTmp key action = do
- when doing concurrent downloads.
-}
checkDiskSpace :: Maybe FilePath -> Key -> Integer -> Bool -> Annex Bool
checkDiskSpace destination key alreadythere samefilesystem = ifM (Annex.getState Annex.force)
checkDiskSpace destdir key alreadythere samefilesystem = ifM (Annex.getState Annex.force)
( return True
, do
-- We can't get inprogress and free at the same
@@ -421,7 +421,7 @@ checkDiskSpace destination key alreadythere samefilesystem = ifM (Annex.getState
_ -> return True
)
where
dir = maybe (fromRepo gitAnnexDir) return destination
dir = maybe (fromRepo gitAnnexDir) return destdir
needmorespace n =
warning $ "not enough free space, need " ++
roughSize storageUnits True n ++

View file

@@ -162,7 +162,7 @@ performRemote key file backend numcopies remote =
let cleanup = liftIO $ catchIO (removeFile tmp) (const noop)
cleanup
cleanup `after` a tmp
getfile tmp = ifM (checkDiskSpace (Just tmp) key 0 True)
getfile tmp = ifM (checkDiskSpace (Just (takeDirectory tmp)) key 0 True)
( ifM (Remote.retrieveKeyFileCheap remote key (Just file) tmp)
( return (Just True)
, ifM (Annex.getState Annex.fast)

View file

@@ -191,7 +191,7 @@ testDav url (Just (u, p)) = do
makeParentDirs
void $ mkColRecursive tmpDir
inLocation (tmpLocation "git-annex-test") $ do
putContentM (Nothing, L.empty)
putContentM (Nothing, L8.fromString "test")
delContentM
where
test a = liftIO $

debian/changelog
View file

@@ -17,6 +17,9 @@ git-annex (6.20151225) unstable; urgency=medium
git-annex (5.20151209) UNRELEASED; urgency=medium
* Add S3 features to git-annex version output.
* webdav: When testing the WebDAV server, send a file with content.
The empty file it was sending tickled bugs in some php WebDAV server.
* fsck: Failed to honor annex.diskreserve when checking a remote.
-- Joey Hess <id@joeyh.name> Thu, 10 Dec 2015 11:39:34 -0400

View file

@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-12-11T15:24:44Z"
content="""
It seems likely that you didn't stop the assistant. There's no
caching of urls to git remotes; git-annex just uses whatever's there in
.git/config.
"""]]

View file

@@ -0,0 +1,39 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2015-12-11T15:08:30Z"
content="""
The first failure is git-annex sending MKCOL (make directory basically).
The server fails with "Unauthorized". You say it also made the directory.
That's got to be a bug in the server, no? It can't sanely have an
authorization problem and also go on and do the unauthorized action.
(Sounds rather like a security hole..)
As to the PUT failure, the chunked transfer encoding mentioned in that
comment is a regular part of the HTTP protocol (this is not connected
to git-annex's own chunking).
<https://en.wikipedia.org/wiki/Chunked_transfer_encoding>
Looks like this PHP webdav server might be delegating the actual HTTP
to whatever web server it's running on somehow. Since chunked transfer
encoding might not be supported by some web servers, they are left trying to
detect that. I don't know if their check for that is accurate.
As to the implementation in git-annex,
Network.HTTP.Client.RequestBodyStreamChunked is documented to be the only
thing that causes a chunked request body to be sent, and git-annex is using
RequestBodyLBS instead. Unless the documentation is wrong (and I also
looked at the http-client source code and the documentation seems accurate),
I am doubtful that the chunked transfer encoding is actually being used by
git-annex. If eg a protocol dump shows that it is in fact using chunked
transfer encoding (ie, contains "Transfer-Encoding: chunked"),
that would be grounds to file a bug on the http-client library.
Aah, but.. git-annex is sending an empty file. And the webdav server's
check consists of reading 1 byte.
Of course there's not a byte to read if an empty file is being sent!
So that code you showed is certainly buggy.
I've changed git-annex to send a non-empty file when testing the webdav
server to work around this.
"""]]

View file

@@ -0,0 +1,50 @@
[[!comment format=mdwn
username="yminus"
subject="comment 7"
date="2015-12-10T22:25:26Z"
content="""
I have the same problem as the initial reporter.
USB drive is FAT32 in direct mode
laptop is ext4 in indirect mode
nas is ext4 in indirect mode
Syncing nas with laptop and vice versa works with no problems.
But as soon as I sync with USB drive it behaves like all commits on laptop and nas that happened since the last sync are reverted.
I can recover the files on laptop and nas by ```git reset --hard origin/master``` and ```git reset --hard origin/synced/master``` on laptop or nas.
However, I cannot reset master and synced/master on the USB drive (error is \"fatal: This operation must be run in a work tree\").
This is the tree as seen from the laptop after syncing and resetting as described above:
* 9bdc037 (n900/synced/master, n900/master) merge refs/heads/synced/master ### <--- THIS IS THE STATE WHEN SYNCING WITH USB DRIVE all added files are deleted
|\
| * 1236008 (HEAD -> master, origin/synced/master, origin/master, nas/synced/master, nas/master, synced/master) ADDED FILES ### <--- THIS IS THE LAST GOOD STATE
| * 17c4f54 ADDED FILES
| * 364d525 Merge remote-tracking branch 'refs/remotes/origin/master'
| |\
| | * c18f170 ADDED FILES
| | * 9dd5668 ADDED FILES
| * | c3280fc ADDED FILES
| * | 2babe80 ADDED FILES
| * | b964e29 ADDED FILES
| * | 03f3bd1 ADDED FILES
| * | 010a469 ADDED FILES
| * | 8acf199 ADDED FILES
| * | f2477bc Merge remote-tracking branch 'refs/remotes/origin/master'
| |\ \
| | |/
| | * 121ffd1 ADDED FILES
* | | dc88b8a (n900/annex/direct/master) git-annex in lars@lars-laptop:/run/media/lars/Nokia N900/.sounds/Musik ### <--- THIS IS THE CURRENT STATE ON THE USB DRIVE
|/ /
*
n900 is the USB drive
nas and origin are both the same
How can I sync my USB drive without losing my last commits?
"""]]

View file

@@ -4,18 +4,6 @@ see [[tips/using_Amazon_S3]].
[[!toc]]
## encryption backends
It makes sense to support multiple encryption backends. So, there
should be a way to tell what backend is responsible for a given filename
in an encrypted remote. (And since special remotes can also store files
unencrypted, differentiate from those as well.)
The rest of this page will describe a single encryption backend using GPG.
Probably only one will be needed, but who knows? Maybe that backend will
turn out badly designed, or some other encryptor needed. Designing
with more than one encryption backend in mind helps future-proofing.
## encryption key management
[[!template id=note text="""
@@ -35,18 +23,22 @@ already been stored in the remote.
Different encrypted remotes need to be able to each use different ciphers.
Allowing multiple ciphers to be used within a single remote would add a lot
of complexity, so is not planned to be supported.
of complexity, so is not supported.
Instead, if you want a new cipher, create a new S3 bucket, or whatever.
There does not seem to be much benefit to using the same cipher for
two different encrypted remotes.
So, the encrypted cipher could just be stored with the rest of a remote's
So, the encrypted cipher is just stored with the rest of a remote's
configuration in `remotes.log` (see [[internals]]). When `git
annex initremote` makes a remote, it can generate a random symmetric
annex initremote` makes a remote, it generates a random symmetric
cipher, and encrypt it with the specified gpg key. To allow another gpg
public key access, update the encrypted cipher to be encrypted to both gpg
keys.
Note that there's a shared encryption mode where the cipher is not
encrypted. When this mode is used, any clone of the git repository
can decrypt files stored in its special remote.
## filename enumeration
If the names of files are encrypted or securely hashed, or whatever is
@@ -73,7 +65,8 @@ can be chosen for new remotes.
It was suggested that it might not be wise to use the same cipher for both
gpg and HMAC. Being paranoid, it's best not to tie the security of one
to the security of the other. So, the encrypted cipher described above is
actually split in two; half is used for HMAC, and half for gpg.
actually split in two; the first half is used for HMAC, and the second
half for gpg.
----
@@ -101,6 +94,9 @@ in remotes.log. This way anyone whose gpg key has been given access to
the cipher can get access to whatever other credentials are needed to
use the special remote.
For example, the S3 special remote does this if configured with
embedcreds=yes.
## risks
A risk of this scheme is that, once the symmetric cipher has been
@@ -118,9 +114,5 @@ that an attacker with local machine access can tell at least all the
filenames and metadata of files stored in the encrypted remote anyway,
and can access whatever content is stored locally.
This design does not support obfuscating the size of files by chunking
them, as that would have added a lot of complexity, for dubious benefits.
If the untrusted party running the encrypted remote wants to know file sizes,
they could correlate chunks that are accessed together. Encrypting data
changes the original file size enough to avoid it being used as a direct
fingerprint at least.
This design does not address obfuscating the size of files by chunking
them. However, chunking was later added; see [[design/assistant/chunks]].
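
To make the HMAC/gpg split above concrete, here is a rough sketch, assuming the cryptonite package and HMAC-SHA1; it is not git-annex's actual Crypto module. The first half of the shared cipher keys the HMAC used to obscure key names stored on the remote, and the second half would be handed to gpg as the symmetric passphrase.

```haskell
-- Illustrative only, under the assumptions stated above.
import Crypto.Hash.Algorithms (SHA1)
import Crypto.MAC.HMAC (HMAC, hmac, hmacGetDigest)
import qualified Data.ByteString.Char8 as B

-- Split the decrypted cipher in two: HMAC key first, gpg passphrase second.
splitCipher :: B.ByteString -> (B.ByteString, B.ByteString)
splitCipher c = B.splitAt (B.length c `div` 2) c

-- The hex HMAC-SHA1 of a key name, keyed with the first half of the cipher,
-- becomes the obscured name used on the special remote.
encryptedKeyName :: B.ByteString -> B.ByteString -> String
encryptedKeyName cipher keyname =
    show (hmacGetDigest (hmac hmackey keyname :: HMAC SHA1))
  where
    (hmackey, _gpgPassphrase) = splitCipher cipher
```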

View file

@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2015-12-11T15:20:55Z"
content="""
It's not plumbing level, but `git-annex import --deduplicate`
or `git-annex import --skip-duplicates` are meant to handle this sort
of thing.
"""]]

View file

@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2015-12-11T14:42:32Z"
content="""
That's what's "special" about special remotes vs regular git remotes: They
only store the content of annexed files and not the git repository. Back up
the git repository separately (and your gpg key if it's being used, and the
credentials if you didn't use embedcreds=yes).
To use your backup, you can make a clone of the backed up git repository and
use `git annex enableremote` to enable it to use the special remote.
See [[design/encryption]] for details of how the encryption is implemented.
I've seen people follow that and manually use the data from the git repo to
decrypt files, but I don't have a pointer to an example at the moment.
"""]]

View file

@@ -44,6 +44,7 @@ for using git-annex with various services:
* [pCloud](https://github.com/tochev/git-annex-remote-pcloud)
* [[ipfs]]
* [Ceph](https://github.com/mhameed/git-annex-remote-ceph)
* [Backblaze's B2](https://github.com/encryptio/git-annex-remote-b2)
Want to add support for something else? [[Write your own!|external]]

View file

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="fbicknel@01ede624a1a56b3998b823e9b60da0ff81cccb16"
nickname="fbicknel"
subject="Complete removal"
date="2015-12-10T16:16:43Z"
content="""
So, and I hope this isn't too Captain Obvious, if we drop the file at each repo, we essentially remove it from existence as far as this git-annex cluster is concerned?
"""]]

View file

@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 4"
date="2015-12-10T18:58:46Z"
content="""
Correct, dropping a file from everywhere will lose its content entirely.
But, git-annex has a [[copies]] tracking feature that prevents such foot-shooting. If you ask it to drop the last copy, it will refuse, although there is a way to override this if you really want to.
"""]]