Merge branch 'master' into assistant

Joey Hess 2012-07-05 16:27:09 -06:00
commit 880b55f277
5 changed files with 63 additions and 38 deletions


@ -97,16 +97,17 @@ keyValueE :: SHASize -> KeySource -> Annex (Maybe Key)
 keyValueE size source = keyValue size source >>= maybe (return Nothing) addE
     where
         addE k = return $ Just $ k
-            { keyName = keyName k ++ extension
+            { keyName = keyName k ++ selectExtension (keyFilename source)
             , keyBackendName = shaNameE size
             }
-        naiveextension = takeExtension $ keyFilename source
-        extension
-            -- long or newline containing extensions are
-            -- probably not really an extension
-            | length naiveextension > 6 ||
-                '\n' `elem` naiveextension = ""
-            | otherwise = naiveextension
+
+selectExtension :: FilePath -> String
+selectExtension = join "." . reverse . take 2 . takeWhile shortenough .
+    reverse . split "." . takeExtensions
+    where
+        shortenough e
+            | '\n' `elem` e = False -- newline in extension?!
+            | otherwise = length e <= 4 -- long enough for "jpeg"

 {- A key's checksum is checked during fsck. -}
 checkKeyChecksum :: SHASize -> Key -> FilePath -> Annex Bool
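
The extension rule introduced in this hunk (keep at most the last two extensions, each no longer than four characters and free of newlines) can be tried out in isolation. The following is only a sketch using nothing but the Prelude; `selectExtensionSketch` and `splitOn` are hypothetical names, not git-annex functions, and the sketch is not a transcription of the hunk: it always emits a leading dot and treats empty components as non-extensions, which are choices made here for illustration.

```haskell
-- Split a string on a delimiter character, keeping empty pieces.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOn d rest

-- Hypothetical stand-in illustrating the rule: keep up to two trailing
-- extensions, stopping at the first one that is empty, longer than four
-- characters, or contains a newline.
selectExtensionSketch :: FilePath -> String
selectExtensionSketch path
  | null exts = ""
  | otherwise = concatMap ('.' :) exts
  where
    -- Only the final path component is considered.
    fileName = reverse . takeWhile (/= '/') . reverse $ path
    -- Everything after the first dot of the file name, split on dots.
    candidates = drop 1 (splitOn '.' fileName)
    -- Walk from the last extension backwards, keeping at most two.
    exts = reverse . take 2 . takeWhile shortEnough . reverse $ candidates
    shortEnough e = not (null e) && '\n' `notElem` e && length e <= 4
```

Under these assumptions, `selectExtensionSketch "foo.tar.gz"` keeps both extensions, while a name such as `archive.backup.gz` keeps only `.gz`, because `backup` is longer than four characters.
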

debian/changelog vendored

@ -9,6 +9,8 @@ git-annex (3.20120630) UNRELEASED; urgency=low
but avoids portability problems.
* Use SHA library for files less than 50 kb in size, at which point it's
faster than forking the more optimised external program.
* SHAnE backends are now smarter about composite extensions, such as
.tar.gz Closes: #680450
-- Joey Hess <joeyh@debian.org> Sun, 01 Jul 2012 15:04:37 -0400


@ -0,0 +1,41 @@
So as not to bury the lead, I've been hard at work on my first day in
Nicaragua, and **the git-annex assistant fully syncs files (including
their contents) between remotes now!!**

Details follow..

Made the committer thread queue Upload Transfers when new files
are added to the annex. Currently it tries to transfer the new content
to *every* remote; this inefficiency needs to be addressed later.

Made the watcher thread queue Download Transfers when new symlinks
appear that point to content we don't have. Typically, that will happen
after an automatic merge from a remote. This needs to be improved as it
currently adds Transfers from every remote, not just those that have the
content.

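
The queueing behaviour described in the two paragraphs above can be sketched as pure functions. Every name here (`Remote`, `Direction`, `Transfer`, `uploadsToAll`, `downloadsFromAll`, `downloadsFromHolders`) is a hypothetical illustration rather than the assistant's actual types, and the `hasContent` predicate stands in for whatever location information would be consulted; this is a sketch of the idea, not the implementation.

```haskell
-- Hypothetical types for illustration only.
newtype Remote = Remote String deriving (Show, Eq)
data Direction = Upload | Download deriving (Show, Eq)
data Transfer = Transfer Direction Remote FilePath deriving (Show, Eq)

-- Behaviour as described for new files: one Upload per known remote.
uploadsToAll :: [Remote] -> FilePath -> [Transfer]
uploadsToAll remotes file = [Transfer Upload r file | r <- remotes]

-- Behaviour as described for new symlinks: one Download per known remote.
downloadsFromAll :: [Remote] -> FilePath -> [Transfer]
downloadsFromAll remotes file = [Transfer Download r file | r <- remotes]

-- One possible refinement: only queue Downloads from remotes that are
-- believed to have the content, according to the supplied predicate.
downloadsFromHolders :: (Remote -> Bool) -> [Remote] -> FilePath -> [Transfer]
downloadsFromHolders hasContent remotes file =
  downloadsFromAll (filter hasContent remotes) file
```

Keeping the remote list as an ordered argument means the queued transfers come out in the same order as the remotes, which is why an ordered list of remotes is useful to both callers.
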
This was the second place that needed an ordered list of remotes
to talk to. So I cached such a list in the DaemonStatus state info.
This will also be handy later on, when the webapp is used to add new
remotes, so the assistant can know about them immediately.

Added YAT (Yet Another Thread), number 15 or so, the transferrer thread
that waits for transfers to be queued and runs them. Currently a naive
implementation, it runs one transfer at a time, and does not do anything
to recover when a transfer fails.

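
A naive transferrer of the kind described above can be sketched with a channel and a single worker thread. This is a minimal illustration using only `base`; the types and the names `newTransferQueue`, `queueTransfer` and `startTransferrer` are assumptions made for the example, and the real thread runs in the Annex monad rather than plain `IO`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forever, void)

-- Hypothetical types for illustration only.
newtype Remote = Remote String deriving (Show, Eq)
data Direction = Upload | Download deriving (Show, Eq)
data Transfer = Transfer Direction Remote FilePath deriving (Show, Eq)

type TransferQueue = Chan Transfer

newTransferQueue :: IO TransferQueue
newTransferQueue = newChan

-- Other threads (committer, watcher) would call this to add work.
queueTransfer :: TransferQueue -> Transfer -> IO ()
queueTransfer = writeChan

-- Fork a worker that blocks until a transfer is queued, runs it to
-- completion, then waits for the next one: one transfer at a time.
-- An exception in runTransfer ends the worker thread; nothing is retried.
startTransferrer :: TransferQueue -> (Transfer -> IO ()) -> IO ()
startTransferrer queue runTransfer =
  void . forkIO . forever $ readChan queue >>= runTransfer
```

Because a single worker reads from the channel, transfers run strictly in the order they were queued; running several at once would mean forking more than one worker, each with its own state.
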
Actually transferring content requires YAT, so that the transfer
action can run in a copy of the Annex monad, without blocking
all the assistant's other threads from entering that monad while a transfer
is running. This is also necessary to allow multiple concurrent transfers
to run in the future.

This is a very tricky piece of code, because that thread will modify the
git-annex branch, and its parent thread has to invalidate its cache in
order to see any changes the child thread made. Hopefully that's the extent
of the complication of doing this. The only reason this was possible at all
is that git-annex already supports multiple concurrent processes running
and all making independent changes to the git-annex branch, etc.

After all my groundwork this week, file content transferring is now
fully working!


@ -21,8 +21,11 @@ all the other git clones, at both the git level and the key/value level.
 Watcher. **done**
 * Write basic Transfer handling thread. Multiple such threads need to be
   able to be run at once. Each will need its own independent copy of the
-  Annex state monad.
+  Annex state monad. **done**
 * Write transfer control thread, which decides when to launch transfers.
+  **done**
+* Check that download transfer triggering code works (when a symlink appears
+  and the remote does *not* upload to us).
 * At startup, and possibly periodically, look for files we have that
   location tracking indicates remotes do not, and enqueue Uploads for
   them. Also, enqueue Downloads for any files we're missing.
@ -86,35 +89,6 @@ reachable remote. This is worth doing first, since it's the simplest way to
get the basic functionality of the assistant to work. And we'll need this
anyway.
-### transfer tracking
-Transfer threads started/stopped as necessary to move data.
-(May sometimes want multiple threads downloading, or uploading, or even both.)
-    startTransfer :: TransferQueue -> Transfer -> Annex ()
-    startTransfer q transfer = error "TODO"
-    stopTransfer :: TransferQueue -> TransferID -> Annex ()
-    stopTransfer q transfer = error "TODO"
-The assistant needs to find out when `git-annex-shell` is receiving or
-sending (triggered by another remote), so it can add data for those too.
-This is important to avoid uploading content to a remote that is already
-downloading it from us, or vice versa, as well as to in future let the web
-app manage transfers as user desires.
-For files being received, it can see the temp file, but other than lsof
-there's no good way to find the pid (and I'd rather not kill blindly).
-For files being sent, there's no filesystem indication. So git-annex-shell
-(and other git-annex transfer processes) should write a status file to disk.
-Can use file locking on these status files to claim upload/download rights,
-which will avoid races.
-This status file can also be updated periodically to show amount of transfer
-complete (necessary for tracking uploads).
## other considerations
This assumes the network is connected. It's often not, so the


@ -0,0 +1,7 @@
[[!comment format=mdwn
username="http://joeyh.name/"
subject="comment 1"
date="2012-07-05T17:04:34Z"
content="""
I haven't tried it either, but I think it should work ok, as long as you bear in mind that to git-annex, each submodule will be treated as a separate git repository.
"""]]