Merge branch 'master' into assistant
Conflicts: doc/design/assistant/syncing.mdwn
Commit d6f65aed16: 8 changed files with 140 additions and 46 deletions
@@ -102,9 +102,13 @@ keyValueE size source = keyValue size source >>= maybe (return Nothing) addE
 selectExtension :: FilePath -> String
-selectExtension = join "." . reverse . take 2 . takeWhile shortenough .
-	reverse . split "." . takeExtensions
+selectExtension f
+	| null es = ""
+	| otherwise = join "." ("":es)
 	where
+		es = filter (not . null) $ reverse $
+			take 2 $ takeWhile shortenough $
+			reverse $ split "." $ takeExtensions f
 		shortenough e
 			| '\n' `elem` e = False -- newline in extension?!
 			| otherwise = length e <= 4 -- long enough for "jpeg"
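To see what the rewritten `selectExtension` does, here is a self-contained sketch that runs outside the git-annex tree. git-annex uses `split` and `join` from MissingH's `Data.String.Utils`; the stdlib stand-ins below (`intercalate` and a hand-rolled `splitDots`) are assumptions made so the example needs nothing beyond GHC's bundled libraries.

```haskell
import Data.List (intercalate)
import System.FilePath (takeExtensions)

-- Standalone sketch of the rewritten selectExtension above.
selectExtension :: FilePath -> String
selectExtension f
    | null es = ""
    | otherwise = intercalate "." ("" : es)
  where
    es = filter (not . null) $ reverse $
        take 2 $ takeWhile shortenough $
        reverse $ splitDots $ takeExtensions f
    shortenough e
        | '\n' `elem` e = False -- newline in extension?!
        | otherwise = length e <= 4 -- long enough for "jpeg"
    -- split on '.', keeping empty fields, like MissingH's split "."
    splitDots s = case break (== '.') s of
        (a, "") -> [a]
        (a, _ : rest) -> a : splitDots rest

main :: IO ()
main = mapM_ (print . selectExtension)
    [ "archive.tar.gz"     -- two short extensions, both kept
    , "photo.jpeg"         -- exactly 4 characters, kept
    , "file.longextension" -- over 4 characters, dropped
    , "noext"              -- no extension at all
    ]
```

At most two extensions are kept, and anything longer than four characters is treated as not really an extension, so the examples print `".tar.gz"`, `".jpeg"`, `""`, and `""`.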
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawldKnauegZulM7X6JoHJs7Gd5PnDjcgx-E"
+ nickname="Matt"
+ subject="Source code"
+ date="2012-07-06T00:12:15Z"
+ content="""
+Hi Joey,
+
+Is the source code for git-annex assistant available somewhere?
+"""]]
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ subject="comment 2"
+ date="2012-07-06T00:21:43Z"
+ content="""
+It's in the `assistant` branch of git://git-annex.branchable.com/
+"""]]
doc/design/assistant/blog/day_26__dying_drives.mdwn (new file, 28 lines)
@@ -0,0 +1,28 @@
+My laptop's SSD died this morning. I had some work from yesterday
+committed to the git repo on it, but not pushed as it didn't build.
+Luckily I was able to get that off the SSD, which is now a read-only
+drive -- even mounting it fails with fsck write errors.
+
+Wish I'd realized the SSD was dying before the day before my trip to
+Nicaragua..
+Getting back to a useful laptop used most of my time and energy today.
+
+I did manage to fix transfers to not block the rest of the assistant's
+threads. The problem was that, without Haskell's threaded runtime, waiting
+on something like an rsync command blocks all threads. To fix this,
+transfers are now run in separate processes.
+
+Also added code to allow multiple transfers to run at once. Each transfer
+takes up a slot, with the number of free slots tracked by a `QSemN`.
+This allows the transfer starting thread to block until a slot frees up,
+and then run the transfer.
+
+This needs to be extended to be aware of transfers initiated by remotes.
+The transfer watcher thread should detect those starting and stopping
+and update the `QSemN` accordingly. It would also be nice if transfers
+initiated by remotes would be delayed when there are no free slots for them
+... but I have not thought of a good way to do that.
+
+There's a bug somewhere in the new transfer code: when two transfers are
+queued close together, the second one is lost and doesn't happen.
+I would debug this, but I'm spent for the day.
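The slot scheme from the blog post can be sketched with `QSemN` from base. This is a minimal illustration, not the assistant's actual code: the slot count of 2 is an arbitrary assumption, and `runTransfer` merely prints and sleeps where git-annex forks off a separate transfer process (such as rsync) so a blocking wait cannot stall the other threads.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.QSemN
import Control.Exception (finally)
import Control.Monad (forM_, void)

main :: IO ()
main = do
    slots <- newQSemN 2   -- number of free transfer slots (assumed 2)
    done  <- newQSemN 0   -- counts finished transfers
    forM_ [1 .. 5 :: Int] $ \n -> do
        waitQSemN slots 1 -- block until a slot frees up
        void $ forkIO $ do
            runTransfer n `finally` signalQSemN slots 1
            signalQSemN done 1
    waitQSemN done 5      -- wait for all five transfers to finish

-- Stand-in for the real work; the assistant would instead fork off a
-- separate process per transfer and wait on it here.
runTransfer :: Int -> IO ()
runTransfer n = do
    putStrLn ("transfer " ++ show n ++ " running")
    threadDelay 100000 -- pretend to move some data
```

At most two `runTransfer` calls are in flight at any moment; the `finally` returns the slot even if a transfer fails, so a crashed transfer cannot permanently eat a slot.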
@@ -3,27 +3,6 @@ all the other git clones, at both the git level and the key/value level.
 
 ## action items
 
-* on-disk transfers in progress information files (read/write/enumerate)
-  **done**
-* locking for the files, so redundant transfer races can be detected,
-  and failed transfers noticed **done**
-* transfer info for git-annex-shell **done**
-* update files as transfers proceed. See [[progressbars]]
-  (updating for downloads is easy; for uploads is hard)
-* add Transfer queue TChan **done**
-* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
-  **done**
-* Poll transfer in progress info files for changes (use inotify again!
-  wow! hammer, meet nail..), and update the TransferInfo Map **done**
-* enqueue Transfers (Uploads) as new files are added to the annex by
-  Watcher. **done**
-* enqueue Transfers (Downloads) as new dangling symlinks are noticed by
-  Watcher. **done**
-* Write basic Transfer handling thread. Multiple such threads need to be
-  able to be run at once. Each will need its own independent copy of the
-  Annex state monad. **done**
-* Write transfer control thread, which decides when to launch transfers.
-  **done**
 * Check that download transfer triggering code works (when a symlink appears
   and the remote does *not* upload to us.
 * Investigate why transfers seem to block other git-annex assistant work.
@@ -35,31 +14,23 @@ all the other git clones, at both the git level and the key/value level.
 * git-annex needs a simple speed control knob, which can be plumbed
   through to, at least, rsync. A good job for an hour in an
   airport somewhere.
+* file transfer processes are not waited for, contain the zombies.
 
-## git syncing
+## longer-term TODO
 
-1. Can use `git annex sync`, which already handles bidirectional syncing.
-   When a change is committed, launch the part of `git annex sync` that pushes
-   out changes. **done**; changes are pushed out to all remotes in parallel
-1. Watch `.git/refs/remotes/` for changes (which would be pushed in from
-   another node via `git annex sync`), and run the part of `git annex sync`
-   that merges in received changes, and follow it by the part that pushes out
-   changes (sending them to any other remotes).
-   [The watching can be done with the existing inotify code! This avoids needing
-   any special mechanism to notify a remote that it's been synced to.]
-   **done**
-1. Periodically retry pushes that failed. **done** (every half an hour)
-1. Also, detect if a push failed due to not being up-to-date, pull,
-   and repush. **done**
-2. Use a git merge driver that adds both conflicting files,
-   so conflicts never break a sync. **done**
-3. Investigate the XMPP approach like dvcs-autosync does, or other ways of
+* Investigate the XMPP approach like dvcs-autosync does, or other ways of
   signaling a change out of band.
-4. Add a hook, so when there's a change to sync, a program can be run
+* Add a hook, so when there's a change to sync, a program can be run
   and do its own signaling.
-5. --debug will show often unnecessary work being done. Optimise.
-6. It would be nice if, when a USB drive is connected,
+* --debug will show often unnecessary work being done. Optimise.
+* It would be nice if, when a USB drive is connected,
   syncing starts automatically. Use dbus on Linux?
+* This assumes the network is connected. It's often not, so the
+  [[cloud]] needs to be used to bridge between LANs.
+* Configurability, including only enabling git syncing but not data transfer;
+  only uploading new files but not downloading, and only downloading
+  files in some directories and not others. See for use cases:
+  [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
 
 ## misc todo
@@ -90,7 +61,42 @@ reachable remote. This is worth doing first, since it's the simplest way to
 get the basic functionality of the assistant to work. And we'll need this
 anyway.
 
-## other considerations
+## done
 
-This assumes the network is connected. It's often not, so the
-[[cloud]] needs to be used to bridge between LANs.
+1. Can use `git annex sync`, which already handles bidirectional syncing.
+   When a change is committed, launch the part of `git annex sync` that pushes
+   out changes. **done**; changes are pushed out to all remotes in parallel
+1. Watch `.git/refs/remotes/` for changes (which would be pushed in from
+   another node via `git annex sync`), and run the part of `git annex sync`
+   that merges in received changes, and follow it by the part that pushes out
+   changes (sending them to any other remotes).
+   [The watching can be done with the existing inotify code! This avoids needing
+   any special mechanism to notify a remote that it's been synced to.]
+   **done**
+1. Periodically retry pushes that failed. **done** (every half an hour)
+1. Also, detect if a push failed due to not being up-to-date, pull,
+   and repush. **done**
+2. Use a git merge driver that adds both conflicting files,
+   so conflicts never break a sync. **done**
+
+* on-disk transfers in progress information files (read/write/enumerate)
+  **done**
+* locking for the files, so redundant transfer races can be detected,
+  and failed transfers noticed **done**
+* transfer info for git-annex-shell **done**
+* update files as transfers proceed. See [[progressbars]]
+  (updating for downloads is easy; for uploads is hard)
+* add Transfer queue TChan **done**
+* add TransferInfo Map to DaemonStatus for tracking transfers in progress.
+  **done**
+* Poll transfer in progress info files for changes (use inotify again!
+  wow! hammer, meet nail..), and update the TransferInfo Map **done**
+* enqueue Transfers (Uploads) as new files are added to the annex by
+  Watcher. **done**
+* enqueue Transfers (Downloads) as new dangling symlinks are noticed by
+  Watcher. **done**
+* Write basic Transfer handling thread. Multiple such threads need to be
+  able to be run at once. Each will need its own independent copy of the
+  Annex state monad. **done**
+* Write transfer control thread, which decides when to launch transfers.
+  **done**
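Several of the done items above fit together: the Watcher enqueues transfers on a `TChan`, and the transfer control thread drains the queue and decides when to launch each one. A toy illustration of that queue follows; the `Transfer` type here is a simplified stand-in for git-annex's actual record, and both roles run in one thread purely for the demo.

```haskell
import Control.Concurrent.STM

data Direction = Upload | Download deriving Show
data Transfer = Transfer Direction FilePath deriving Show

main :: IO ()
main = do
    q <- newTChanIO :: IO (TChan Transfer)
    -- the Watcher thread would enqueue these as it notices new files
    -- (uploads) or dangling symlinks (downloads)
    atomically $ writeTChan q (Transfer Upload "foo.jpeg")
    atomically $ writeTChan q (Transfer Download "bar.tar.gz")
    -- the transfer control thread drains the queue; readTChan blocks
    -- when the queue is empty, so the thread simply sleeps until the
    -- Watcher hands it work
    atomically (readTChan q) >>= print
    atomically (readTChan q) >>= print
```

Because `readTChan` retries inside STM, no explicit condition variable is needed to wake the control thread.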
doc/forum/Wishlist:_mark_remotes_offline.mdwn (new file, 12 lines)
@@ -0,0 +1,12 @@
+I have several remotes which are not always accessible. For example, they can
+be on hosts only accessible by LAN, or on a portable hard drive which is not
+plugged in. When running sync, these remotes are checked as well, leading to
+unnecessary error messages and possibly to git-annex waiting a few minutes
+on each remote for a timeout.
+
+In this situation it would be useful to mark some remotes as offline
+(`git annex offline <remotename>`), so that git-annex would not even attempt
+to contact them. Then I could configure my system to automatically, for example,
+mark a portable hard disk remote online when plugging it in and offline when
+unplugging it, and similarly mark remotes offline and online depending on
+whether I have an internet connection or a connection to a specific network.
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ subject="comment 1"
+ date="2012-07-06T13:04:07Z"
+ content="""
+You can already do this:
+
+	git config remote.foo.annex-ignore true
+
+There's no need to do anything for portable drives that are sometimes mounted and sometimes not -- git-annex will automatically avoid using repositories in directories that do not currently exist.
+
+I thought git-annex also had a way to run a command and use its exit status to control whether a repo was
+ignored or not, but it seems I never actually implemented that. It might be worth adding, although the command would necessarily run whenever git-annex is transferring data around.
+"""]]
@@ -0,0 +1,13 @@
+Since _transfer queueing_ and syncing of data now work in the assistant branch (I've been playing with it), there are times when I really don't want to sync the data. I would like to just sync meta-data and manually do a _get_ on files that I want, or selectively sync data in a subtree.
+
+It would be nice for the syncing/watch feature to have the option of syncing only *meta-data* or *meta-data and data*; I think this sort of option was already planned? It would also be nice to be able to automatically sync data for only a subtree.
+
+My use case is: I have a big stash of files somewhere at home or work, and I want to keep what I am actually using on my laptop, and be able to selectively take just a subtree or a set of subtrees of files. I would not always want to suck down all the data, but still have the functionality to add files, push them upstream, and sync meta-data.
+
+That is...
+
+> * Site A: big master annex in a server room with lots of disk (or machines), watches a directory and syncs both data and meta-data; it should always try to pull data from all its child repos. That way I will always have a master copy of my data somewhere. It would be even nicer if I could have clones of the annex, where each annex is on a different machine configured to only sync a subtree of files, so I can distribute my annex across different systems and disks.
+> * Site A: machine A: syncs Folder A
+> * Site A: machine B: syncs Folder B
+> * and so on, with selectively syncing sites and directories
+> * Laptop: has a clone of the annex and watches a directory; syncs meta-data as usual and only uploads files to a remote (all, or a designated one), but never downloads files automatically, or only does so inside a selected subtree.