From 625330ae783719420b8cc6f27026abb6e9362bec Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/speredenn#aaf38" Date: Tue, 21 Aug 2012 18:15:48 +0000 Subject: [PATCH 01/22] Added a comment --- .../comment_3_48c3a80c14a85f27d742482b2ccbe628._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment diff --git a/doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment b/doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment new file mode 100644 index 0000000000..7a0054c493 --- /dev/null +++ b/doc/forum/How_to_define_an_alternative_remote_url_for_a_git_remote_repository__63__/comment_3_48c3a80c14a85f27d742482b2ccbe628._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://me.yahoo.com/speredenn#aaf38" + nickname="Jean-Baptiste Carré" + subject="comment 3" + date="2012-08-21T18:15:48Z" + content=""" +You're totally right: The UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state it. Thanks a lot! 
+"""]] From 6873ca0c1b1c50d2208c5074a5e7fb7fc267f990 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Tue, 21 Aug 2012 20:25:20 -0400 Subject: [PATCH 02/22] blog for the day --- .../day_61__network_connection_detection.mdwn | 36 +++++++++++++++++++ doc/design/assistant/syncing.mdwn | 17 +++++++-- 2 files changed, 50 insertions(+), 3 deletions(-) create mode 100644 doc/design/assistant/blog/day_61__network_connection_detection.mdwn diff --git a/doc/design/assistant/blog/day_61__network_connection_detection.mdwn b/doc/design/assistant/blog/day_61__network_connection_detection.mdwn new file mode 100644 index 0000000000..8ab40f5162 --- /dev/null +++ b/doc/design/assistant/blog/day_61__network_connection_detection.mdwn @@ -0,0 +1,36 @@ +Today, added a thread that deals with recovering when there's been a loss +of network connectivity. When the network's down, the normal immediate +syncing of changes of course doesn't work. So this thread detects when the +network comes back up, and does a pull+push to network remotes, and +triggers scanning for file content that needs to be transferred. + +I used dbus again, to detect events generated by both network-manager and +wicd when they've sucessfully brought an interface up. Or, if they're not +available, it polls every 30 minutes. + +When the network comes up, in addition to the git pull+push, it also +currently does a full scan of the repo to find files whose contents +need to be transferred to get fully back into sync. + +I think it'll be ok for some git pulls and pushes to happen when +moving to a new network, or resuming a laptop (or every 30 minutes when +resorting to polling). But the transfer scan is currently really too heavy +to be appropriate to do every time in those situations. I have an idea for +avoiding that scan when the remote's git-annex branch has not changed. But +I need to refine it, to handle cases like this: + +1. a new remote is added +2. file contents start being transferred to (or from it) +3. 
the network is taken down +4. all the queued transfers fail +5. the network comes back up +6. the transfer scan needs to know the remote was not all in sync + before #3, and so should do a full scan despite the git-annex branch + not having changed + +--- + +Doubled the ram in my netbook, which I use for all development. Yesod needs +rather a lot of ram to compile and link, and this should make me quite a +lot more productive. I was struggling with OOM killing bits of chromium +during my last week of development. diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 3aeb76afc1..898081574e 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -3,9 +3,15 @@ all the other git clones, at both the git level and the key/value level. ## immediate action items -* At startup, and possibly periodically, or when the network connection - changes, or some heuristic suggests that a remote was disconnected from - us for a while, queue remotes for processing by the TransferScanner. +* Sync with all available remotes on startup. +* TransferScanner should avoid unnecessary scanning of remotes. + This is paricilarly important for scans queued by the NetWatcher, + which can be polling, or could be after a momentary blip in network + connectivity. The TransferScanner could check the remote's git-annex + branch; if it is not ahead of the local git-annex branch, then + there's nothing to transfer. **Except** if the tree was not already + up-to-date before the loss of connectivity. So doing this needs + tracking of when the tree is not yet fully up-to-date. * Ensure that when a remote receives content, and updates its location log, it syncs that update back out. Prerequisite for: * After git sync, identify new content that we don't have that is now available @@ -157,3 +163,8 @@ redone to check it. finishes. **done** * Test MountWatcher on KDE, and add whatever dbus events KDE emits when drives are mounted. 
**done** +* Possibly periodically, or when the network connection + changes, or some heuristic suggests that a remote was disconnected from + us for a while, queue remotes for processing by the TransferScanner. + **done**; both network-manager and wicd connection events are supported, + and it falls back to polling every 30 minutes when neither is available. From e43feeb5b439d3cc8078b52a18a5a8568ceee2ae Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Wed, 22 Aug 2012 15:45:20 -0400 Subject: [PATCH 03/22] update --- doc/design/assistant/syncing.mdwn | 56 +++++++++++++++++++++++-------- 1 file changed, 42 insertions(+), 14 deletions(-) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 898081574e..83c5e9d223 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -3,15 +3,42 @@ all the other git clones, at both the git level and the key/value level. ## immediate action items -* Sync with all available remotes on startup. -* TransferScanner should avoid unnecessary scanning of remotes. - This is paricilarly important for scans queued by the NetWatcher, - which can be polling, or could be after a momentary blip in network - connectivity. The TransferScanner could check the remote's git-annex - branch; if it is not ahead of the local git-annex branch, then - there's nothing to transfer. **Except** if the tree was not already - up-to-date before the loss of connectivity. So doing this needs - tracking of when the tree is not yet fully up-to-date. +* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily + broke content syncing in some situations, which need to be added back. + + Now syncing a disconnected remote only starts a transfer scan if the + remote's git-annex branch has diverged, which indicates it probably has + new files. 
But that leaves open the cases where the local repo has + new files; and where the two repos git branches are in sync, but the + content transfers are lagging behind; and where the transfer scan has + never been run. + + Need to track locally whether we're believed to be in sync with a remote. + This includes: + * All local content has been transferred to it successfully. + * The remote has been scanned once for data to transfer from it, and all + transfers initiated by that scan succeeded. + + Note the complication that, if it's initiated a transfer, our queued + transfer will be thrown out as unnecessary. But if its transfer then + fails, that needs to be noticed. + + If we're going to track failed transfers, we could just set a flag, + and use that flag later to initiate a new transfer scan. We need a flag + in any case, to ensure that a transfer scan is run for each new remote. + The flag could be `.git/annex/transfer/scanned/uuid`. + + But, if failed transfers are tracked, we could also record them, in + order to retry them later, without the scan. I'm thinking about a + directory like `.git/annex/transfer/failed/{upload,download}/uuid/`, + which failed transfer log files could be moved to. + + Note that a remote may lose content it had before, so when requeuing + a failed download, should check the location log to see if it still has + the content, and if not, queue a download from elsewhere. (And, a remote + may get content we were uploading from elsewhere, so check the location + log when queuing a failed Upload too.) + * Ensure that when a remote receives content, and updates its location log, it syncs that update back out. Prerequisite for: * After git sync, identify new content that we don't have that is now available @@ -49,6 +76,10 @@ all the other git clones, at both the git level and the key/value level. that need to be done to sync with a remote. Currently it walks the git working copy and checks each file. 
+## misc todo + +* --debug will show often unnecessary work being done. Optimise. + ## data syncing There are two parts to data syncing. First, map the network and second, @@ -163,8 +194,5 @@ redone to check it. finishes. **done** * Test MountWatcher on KDE, and add whatever dbus events KDE emits when drives are mounted. **done** -* Possibly periodically, or when the network connection - changes, or some heuristic suggests that a remote was disconnected from - us for a while, queue remotes for processing by the TransferScanner. - **done**; both network-manager and wicd connection events are supported, - and it falls back to polling every 30 minutes when neither is available. +* It would be nice if, when a USB drive is connected, + syncing starts automatically. Use dbus on Linux? **done** From dfb67090644af0024abcefe885744df9a46d094d Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Wed, 22 Aug 2012 15:47:08 -0400 Subject: [PATCH 04/22] blog for the day --- .../blog/day_62__smarter_syncing.mdwn | 21 +++++++++++++++++++ 1 file changed, 21 insertions(+) create mode 100644 doc/design/assistant/blog/day_62__smarter_syncing.mdwn diff --git a/doc/design/assistant/blog/day_62__smarter_syncing.mdwn b/doc/design/assistant/blog/day_62__smarter_syncing.mdwn new file mode 100644 index 0000000000..28fa892d38 --- /dev/null +++ b/doc/design/assistant/blog/day_62__smarter_syncing.mdwn @@ -0,0 +1,21 @@ +Woke up this morning with most of the design for a smarter approach to +[[syncing]] in my head. (This is why I sometimes slip up and tell people I +work on this project 12 hours a day..) + +To keep the current `assistant` branch working while I make changes +that break use cases that are working, I've started +developing in a new branch, `assistant-wip`. + +In it, I've started getting rid of unnecessary expensive transfer scans. + +First optimisation I've done is to detect when a remote that was +disconnected has diverged its `git-annex` branch from the local branch. 
+Only when that's the case does a new transfer scan need to be done, to find +out what new stuff might be available on that remote, to have caused the +change to its branch, while it was disconnected. + +That broke a lot of stuff. I have a plan to fix it written down in +[[syncing]]. It'll involve keeping track of whether a transfer scan has +ever been done (if not, one should be run), and recording logs when +transfers failed, so those failed transfers can be retried when the +remote gets reconnected. From 199cedf978c999201d3541003d3cffd4b0a30b77 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI" Date: Thu, 23 Aug 2012 06:32:24 +0000 Subject: [PATCH 05/22] Added a comment: Amazon Glacier --- .../comment_1_09b58f41a8d48f218619711ee19511ac._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment diff --git a/doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment b/doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment new file mode 100644 index 0000000000..029aec7832 --- /dev/null +++ b/doc/design/assistant/blog/day_61__network_connection_detection/comment_1_09b58f41a8d48f218619711ee19511ac._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI" + nickname="Paul" + subject="Amazon Glacier" + date="2012-08-23T06:32:24Z" + content=""" +Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend? 
+"""]] From d5d4b8db345b3e4a81b2ad2d7a4f2e5f1e30d519 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 23 Aug 2012 16:24:22 -0400 Subject: [PATCH 06/22] update --- doc/design/assistant/syncing.mdwn | 79 +++++++++++++++---------------- 1 file changed, 39 insertions(+), 40 deletions(-) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 83c5e9d223..071ea2730a 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -3,42 +3,12 @@ all the other git clones, at both the git level and the key/value level. ## immediate action items -* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily - broke content syncing in some situations, which need to be added back. - - Now syncing a disconnected remote only starts a transfer scan if the - remote's git-annex branch has diverged, which indicates it probably has - new files. But that leaves open the cases where the local repo has - new files; and where the two repos git branches are in sync, but the - content transfers are lagging behind; and where the transfer scan has - never been run. - - Need to track locally whether we're believed to be in sync with a remote. - This includes: - * All local content has been transferred to it successfully. - * The remote has been scanned once for data to transfer from it, and all - transfers initiated by that scan succeeded. - - Note the complication that, if it's initiated a transfer, our queued - transfer will be thrown out as unnecessary. But if its transfer then - fails, that needs to be noticed. - - If we're going to track failed transfers, we could just set a flag, - and use that flag later to initiate a new transfer scan. We need a flag - in any case, to ensure that a transfer scan is run for each new remote. - The flag could be `.git/annex/transfer/scanned/uuid`. - - But, if failed transfers are tracked, we could also record them, in - order to retry them later, without the scan. 
I'm thinking about a - directory like `.git/annex/transfer/failed/{upload,download}/uuid/`, - which failed transfer log files could be moved to. - - Note that a remote may lose content it had before, so when requeuing - a failed download, should check the location log to see if it still has +* Fix MountWatcher to notice umounts and remounts of drives. +* A remote may lose content it had before, so when requeuing + a failed download, check the location log to see if the remote still has the content, and if not, queue a download from elsewhere. (And, a remote may get content we were uploading from elsewhere, so check the location log when queuing a failed Upload too.) - * Ensure that when a remote receives content, and updates its location log, it syncs that update back out. Prerequisite for: * After git sync, identify new content that we don't have that is now available @@ -67,18 +37,17 @@ all the other git clones, at both the git level and the key/value level. files in some directories and not others. See for use cases: [[forum/Wishlist:_options_for_syncing_meta-data_and_data]] * speed up git syncing by using the cached ssh connection for it too - (will need to use `GIT_SSH`, which needs to point to a command to run, - not a shell command line) + Will need to use `GIT_SSH`, which needs to point to a command to run, + not a shell command line. Beware that the network connection may have + bounced and the cached ssh connection not be usable. * Map the network of git repos, and use that map to calculate optimal transfers to keep the data in sync. Currently a naive flood fill is done instead. * Find a more efficient way for the TransferScanner to find the transfers that need to be done to sync with a remote. Currently it walks the git - working copy and checks each file. - -## misc todo - -* --debug will show often unnecessary work being done. Optimise. + working copy and checks each file. 
That probably needs to be done once, + but further calls to the TransferScanner could eg, look at the delta + between the last scan and the current one in the git-annex branch. ## data syncing @@ -196,3 +165,33 @@ redone to check it. drives are mounted. **done** * It would be nice if, when a USB drive is connected, syncing starts automatically. Use dbus on Linux? **done** +* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily + broke content syncing in some situations, which need to be added back. + **done** + + Now syncing a disconnected remote only starts a transfer scan if the + remote's git-annex branch has diverged, which indicates it probably has + new files. But that leaves open the cases where the local repo has + new files; and where the two repos git branches are in sync, but the + content transfers are lagging behind; and where the transfer scan has + never been run. + + Need to track locally whether we're believed to be in sync with a remote. + This includes: + * All local content has been transferred to it successfully. + * The remote has been scanned once for data to transfer from it, and all + transfers initiated by that scan succeeded. + + Note the complication that, if it's initiated a transfer, our queued + transfer will be thrown out as unnecessary. But if its transfer then + fails, that needs to be noticed. + + If we're going to track failed transfers, we could just set a flag, + and use that flag later to initiate a new transfer scan. We need a flag + in any case, to ensure that a transfer scan is run for each new remote. + The flag could be `.git/annex/transfer/scanned/uuid`. + + But, if failed transfers are tracked, we could also record them, in + order to retry them later, without the scan. I'm thinking about a + directory like `.git/annex/transfer/failed/{upload,download}/uuid/`, + which failed transfer log files could be moved to. 
From 73c24e05d86e0d8ef5c312fde92e7898154a01e2 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 23 Aug 2012 16:27:21 -0400 Subject: [PATCH 07/22] blog for the day --- .../blog/day_63__transfer_retries.mdwn | 26 +++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 doc/design/assistant/blog/day_63__transfer_retries.mdwn diff --git a/doc/design/assistant/blog/day_63__transfer_retries.mdwn b/doc/design/assistant/blog/day_63__transfer_retries.mdwn new file mode 100644 index 0000000000..d668f507ba --- /dev/null +++ b/doc/design/assistant/blog/day_63__transfer_retries.mdwn @@ -0,0 +1,26 @@ +Implemented everything I planned out yesterday: Expensive scans are only +done once per remote (unless the remote changed while it was disconnected), +and failed transfers are logged so they can be retried later. + +Changed the TransferScanner to prefer to scan low cost remotes first, +as a crude form of scheduling lower-cost transfers first. + +A whole bunch of interesting syncing scenarios should work now. I have not +tested them all in detail, but to the best of my knowledge, all these +should work: + +* Connect to the network. It starts syncing with a networked remote. + Disconnect the network. Reconnect, and it resumes where it left off. +* Migrate between networks (ie, home to cafe to work). Any transfers + that can only happen on one LAN are retried on each new network you + visit, until they succeed. + +One that is not working, but is soooo close: + +* Plug in a removable drive. Some transfers start. Yank the plug. + Plug it back in. All necessary transfers resume, and it ends up + fully in sync, no matter how many times you yank that cable. + +That's not working because of an infelicity in the MountWatcher. +It doesn't notice when the drive gets unmounted, so it ignores +the new mount event. 
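The cost-based ordering mentioned above (scan low cost remotes first) can be sketched as a simple sort. This is only an illustration: the `Remote` record, its field names, and the cost numbers are made up for the example and are not git-annex's real types or values.

```haskell
import Data.List (sortOn)

-- Hypothetical remote record, for illustration only.
data Remote = Remote
  { remoteName :: String
  , remoteCost :: Int -- lower means cheaper to transfer to or from
  } deriving (Show)

-- Order remotes so the cheapest ones are scanned first.
-- sortOn is stable, so remotes of equal cost keep their relative order.
scanOrder :: [Remote] -> [Remote]
scanOrder = sortOn remoteCost

main :: IO ()
main = mapM_ (putStrLn . remoteName) (scanOrder remotes)
  where
    -- Made-up example remotes and costs.
    remotes =
      [ Remote "cloud" 200
      , Remote "usbdrive" 100
      , Remote "lanserver" 150
      ]
-- prints usbdrive, lanserver, cloud (one per line)
```

As the post says, this is a crude form of scheduling: it orders the scans, not the individual transfers they queue.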
From 476c60ce1fd1b5298d639c77103958b929d87a42 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA" Date: Thu, 23 Aug 2012 21:00:12 +0000 Subject: [PATCH 08/22] Added a comment: Rackspace Cloud Files support? --- ...comment_6_78da9e233882ec0908962882ea8c4056._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment diff --git a/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment b/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment new file mode 100644 index 0000000000..742dbedc2f --- /dev/null +++ b/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA" + nickname="alan" + subject="Rackspace Cloud Files support?" + date="2012-08-23T21:00:11Z" + content=""" +Any chance I could bribe you to setup Rackspace Cloud Files support? We are using them and would hate to have a S3 bucket only for this. 
+ +https://github.com/rackspace/python-cloudfiles +"""]] From bb949ed2d278baaa4cc8542ff7c477dc535aeb3e Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo" Date: Thu, 23 Aug 2012 21:25:48 +0000 Subject: [PATCH 09/22] Added a comment --- ...comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment diff --git a/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment b/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment new file mode 100644 index 0000000000..119aee2c91 --- /dev/null +++ b/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo" + nickname="Justin" + subject="comment 1" + date="2012-08-23T21:25:48Z" + content=""" +Do encrypted rsync remotes resume quickly as well? + +One thing I noticed was that if a copy --to an encrypted rsync remote gets interrupted it will remove the tmp file and re-encrypt the whole file before resuming rsync. 
+"""]] From d25f407e6767c8ce9214fcc7c503178cfa3fa9f5 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawla3gLc6_rHuggFfy7o7eGMPvPztFZTrUQ" Date: Fri, 24 Aug 2012 08:44:56 +0000 Subject: [PATCH 10/22] --- .../fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn index e15529c645..883c53d36f 100644 --- a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn +++ b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn @@ -22,3 +22,9 @@ The original file also has sha512 ead9db1f34739014a216239d9624bce74d92fe723de065 >> And what sha512 does the file in .git/annex/bad have **now**? (fsck >> preserves the original filename; this says nothing about what the >> current checksum is, if the file has been corrupted). --[[Joey]] + +The same, as it's the file I was trying to inject: + +ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi + +That's what puzzles me, it is the same file, but for some weird reason git annex thinks it's not. From b985e0b7ec0e5ba23e0d36c4bcab0d1760e25676 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 24 Aug 2012 12:16:17 -0400 Subject: [PATCH 11/22] Bugfix: Fix fsck in SHA*E backends, when the key contains composite extensions, as added in 3.20120721. 
--- Backend/SHA.hs | 2 +- debian/changelog | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/Backend/SHA.hs b/Backend/SHA.hs index cf61139e00..3ac1463ad0 100644 --- a/Backend/SHA.hs +++ b/Backend/SHA.hs @@ -125,5 +125,5 @@ checkKeyChecksum size key file = do _ -> return True where check s - | s == dropExtension (keyName key) = True + | s == dropExtensions (keyName key) = True | otherwise = False diff --git a/debian/changelog b/debian/changelog index d81d1661d1..2a3ecc6044 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,6 +2,8 @@ git-annex (3.20120808) UNRELEASED; urgency=low * S3: Add fileprefix setting. * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt. + * Bugfix: Fix fsck in SHA*E backends, when the key contains composite + extensions, as added in 3.20120721. -- Joey Hess Thu, 09 Aug 2012 13:51:47 -0400 From 748bd1e1e1479c1d3efa566370d35be349f7652e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 24 Aug 2012 12:18:51 -0400 Subject: [PATCH 12/22] bug fixed --- .../fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn index 883c53d36f..e9051f9f34 100644 --- a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn +++ b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn @@ -28,3 +28,8 @@ The same, as it's the file I was trying to inject: ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi That's what puzzles me, it is the same file, but for some weird reason git annex thinks it's not. + +> Ok, reproduced and fixed the bug. 
The "E" backends recently got support +> for 2 levels of filename extensions, but were not made to drop them both +> when fscking. [[done]] (I'll release a fixed version probably tomorrow; +> fix is in git now.) --[[Joey]] From 47875e9b94ed84d4f6ac358e929d7e541c8fcb38 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 24 Aug 2012 13:13:17 -0400 Subject: [PATCH 13/22] update --- doc/design/assistant/syncing.mdwn | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 071ea2730a..0225a5d395 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -3,12 +3,11 @@ all the other git clones, at both the git level and the key/value level. ## immediate action items -* Fix MountWatcher to notice umounts and remounts of drives. -* A remote may lose content it had before, so when requeuing - a failed download, check the location log to see if the remote still has - the content, and if not, queue a download from elsewhere. (And, a remote - may get content we were uploading from elsewhere, so check the location - log when queuing a failed Upload too.) +* Run transfer scan on startup. +* The syncing code currently doesn't run for special remotes. While + transfering the git info about special remotes could be a complication, + if we assume that's synced between existing git remotes, it should be + possible for them to do file transfers to/from special remotes. * Ensure that when a remote receives content, and updates its location log, it syncs that update back out. Prerequisite for: * After git sync, identify new content that we don't have that is now available @@ -195,3 +194,9 @@ redone to check it. order to retry them later, without the scan. I'm thinking about a directory like `.git/annex/transfer/failed/{upload,download}/uuid/`, which failed transfer log files could be moved to. 
+* A remote may lose content it had before, so when requeuing + a failed download, check the location log to see if the remote still has + the content, and if not, queue a download from elsewhere. (And, a remote + may get content we were uploading from elsewhere, so check the location + log when queuing a failed Upload too.) **done** +* Fix MountWatcher to notice umounts and remounts of drives. **done** From c387f98bc99c6276b29fba07a1ce6746e294259f Mon Sep 17 00:00:00 2001 From: "https://me.yahoo.com/speredenn#aaf38" Date: Fri, 24 Aug 2012 18:22:07 +0000 Subject: [PATCH 14/22] Added PATH variable part --- doc/install/OSX.mdwn | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/doc/install/OSX.mdwn b/doc/install/OSX.mdwn index 3c24609684..261959c7b1 100644 --- a/doc/install/OSX.mdwn +++ b/doc/install/OSX.mdwn @@ -11,6 +11,11 @@ sudo cabal update cabal install git-annex --bindir=$HOME/bin +Do not forget to add to your PATH variable your ~/bin folder. In your .bashrc, for example: +
+PATH=~/bin:/usr/local/bin:$PATH
+
+ See also: * [[forum/OSX__39__s_haskell-platform_statically_links_things]] From dda61b1b80c42ab91a846537c0cab93a497562a8 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 24 Aug 2012 17:40:38 -0400 Subject: [PATCH 15/22] blog for the day --- .../blog/day_64__syncing_robustly.mdwn | 33 +++++++++++++++++++ doc/design/assistant/syncing.mdwn | 8 ++++- 2 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 doc/design/assistant/blog/day_64__syncing_robustly.mdwn diff --git a/doc/design/assistant/blog/day_64__syncing_robustly.mdwn b/doc/design/assistant/blog/day_64__syncing_robustly.mdwn new file mode 100644 index 0000000000..ab0090b92d --- /dev/null +++ b/doc/design/assistant/blog/day_64__syncing_robustly.mdwn @@ -0,0 +1,33 @@ +Working toward getting the data syncing to happen robustly, +so a bunch of improvements. + +* Got unmount events to be noticed, so unplugging and replugging + a removable drive will resume the syncing to it. There's really no + good unmount event available on dbus in kde, so it uses a heuristic + there. +* Avoid requeuing a download from a remote that no longer has a key. +* Run a full scan on startup, for multiple reasons, including dealing with + crashes. + +Ran into a strange issue: Occasionally the assistant will run `git-annex +copy` and it will not transfer the requested file. It seems that +when the copy command runs `git ls-files`, it does not see the file +it's supposed to act on in its output. + +Eventually I figured out what's going on: When updating the git-annex +branch, it sets `GIT_INDEX_FILE`, and of course environment settings are +not thread-safe! So there's a race between threads that access +the git-annex branch, and the Transferrer thread, or any other thread +that might expect to look at the normal git index. + +Unfortunately, I don't have a fix for this yet.. Git's only interface for +using a different index file is `GIT_INDEX_FILE`. 
It seems I have a lot of +code to tear apart, to push back the setenv until after forking every git +command. :( + +Before I figured out the root problem, I developed a workaround for the +symptom I was seeing. I added a `git-annex transferkey`, which is +optimised to be run by the assistant, and avoids running `git ls-files`, so +avoids the problem. While I plan to fix this environment variable problem +properly, `transferkey` turns out to be so much faster than how it was +using `copy` that I'm going to keep it. diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 0225a5d395..c3bf3823b5 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -3,11 +3,16 @@ all the other git clones, at both the git level and the key/value level. ## immediate action items -* Run transfer scan on startup. * The syncing code currently doesn't run for special remotes. While transfering the git info about special remotes could be a complication, if we assume that's synced between existing git remotes, it should be possible for them to do file transfers to/from special remotes. +* Often several remotes will be queued for full TransferScanner scans, + and the scan does the same thing for each .. so it would be better to + combine them into one scan in such a case. +* Sometimes a Download gets queued from a slow remote, and then a fast + remote becomes available, and a Download is queued from it. Would be + good to sort the transfer queue to run fast Downloads (and Uploads) first. * Ensure that when a remote receives content, and updates its location log, it syncs that update back out. Prerequisite for: * After git sync, identify new content that we don't have that is now available @@ -200,3 +205,4 @@ redone to check it. may get content we were uploading from elsewhere, so check the location log when queuing a failed Upload too.) **done** * Fix MountWatcher to notice umounts and remounts of drives. 
**done** +* Run transfer scan on startup. **done** From 83dd81f4af9120b2096514ea1d756b960d536496 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmU_2tE75oyG0h2ZPN4lcroIKEMC8G-otE" Date: Sat, 25 Aug 2012 08:22:21 +0000 Subject: [PATCH 16/22] --- doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn | 1 + 1 file changed, 1 insertion(+) create mode 100644 doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn diff --git a/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn b/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn new file mode 100644 index 0000000000..71085c8d26 --- /dev/null +++ b/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn @@ -0,0 +1 @@ +I tried to compile the assitant branch on Ubuntu 12.04. But i depends on the DBus libraryw hich does not compile with some glibberish errors. Is there a way to solve this? From ab06112b39f1ac7ff7510aaca87e25bef4b113c7 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmU_2tE75oyG0h2ZPN4lcroIKEMC8G-otE" Date: Sat, 25 Aug 2012 08:22:36 +0000 Subject: [PATCH 17/22] --- doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn b/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn index 71085c8d26..0bf4e45c17 100644 --- a/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn +++ b/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn @@ -1 +1,5 @@ I tried to compile the assitant branch on Ubuntu 12.04. But i depends on the DBus libraryw hich does not compile with some glibberish errors. Is there a way to solve this? + + + +I tried to compile the assitant branch on Ubuntu 12.04. But i depends on the DBus libraryw hich does not compile with some glibberish errors. Is there a way to solve this? 
From 50f84b438323e5d1ba34c7d97edcce9369318461 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmU_2tE75oyG0h2ZPN4lcroIKEMC8G-otE" Date: Sat, 25 Aug 2012 08:23:01 +0000 Subject: [PATCH 18/22] --- doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn b/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn index 0bf4e45c17..b4271d172c 100644 --- a/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn +++ b/doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn @@ -1,5 +1,3 @@ I tried to compile the assitant branch on Ubuntu 12.04. But i depends on the DBus libraryw hich does not compile with some glibberish errors. Is there a way to solve this? - -I tried to compile the assitant branch on Ubuntu 12.04. But i depends on the DBus libraryw hich does not compile with some glibberish errors. Is there a way to solve this? From 11f95de7e43f404ac6582ea9959268ccd8080d32 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sat, 25 Aug 2012 13:06:31 +0000 Subject: [PATCH 19/22] Added a comment --- ..._dc14a40b64b7eda94d1a3fd766cd39cc._comment | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 doc/forum/DBus_on_Ubuntu_12.04__63__/comment_1_dc14a40b64b7eda94d1a3fd766cd39cc._comment diff --git a/doc/forum/DBus_on_Ubuntu_12.04__63__/comment_1_dc14a40b64b7eda94d1a3fd766cd39cc._comment b/doc/forum/DBus_on_Ubuntu_12.04__63__/comment_1_dc14a40b64b7eda94d1a3fd766cd39cc._comment new file mode 100644 index 0000000000..0ef2469d3c --- /dev/null +++ b/doc/forum/DBus_on_Ubuntu_12.04__63__/comment_1_dc14a40b64b7eda94d1a3fd766cd39cc._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.152.246.119" + subject="comment 1" + date="2012-08-25T13:06:31Z" + content=""" +Hmm, let's see... + +If the gibberish error is ouyay orgotfay otay otay elltay emay utwhay ethay +roreay asway, then we can figure it out, surely.. 
+ +If the gibberish error looks something like Ḩ̶̞̗̓ͯ̅͒ͪͫe̢ͦ̊ͭͭͤͣ̂͏̢̳̦͔̬ͅ ̣̘̹̄̕͢Ç̛͈͔̹̮̗͈͓̞ͨ͂͑ͅo̿ͥͮ̿͢͏̧̹̗̪͇̫m̷̢̞̙͑̊̔ͧ̍ͩ̇̚ę̜͑̀͝s̖̱̝̩̞̻͐͂̐́̂̇̆͂ + +.. your use of cabal +has accidentally summoned Cthulhu! Back slowly away from the monitor! + +Otherwise, you might try installing the `libdbus-1-dev` package with apt, +which might make cabal install the haskell dbus bindings successfully. Or +you could just install the `libghc-dbus-dev` package, which contains the +necessary haskell library pre-built. But I don't know if it's in Ubuntu +12.04; it only seems to be available in quantal + + +Or you could even build it with the Makefile, rather than using cabal. +The Makefile has a `-DWITH_DBUS` setting in it that can be removed to build +the fallback mode that doesn't use dbus. + +"""]] From 8473cdb07d4ec374b95625fa49e677b1868e570f Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sat, 25 Aug 2012 13:11:37 +0000 Subject: [PATCH 20/22] Added a comment --- ...comment_2_608a30e274e6a691a39f69503720e320._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/forum/DBus_on_Ubuntu_12.04__63__/comment_2_608a30e274e6a691a39f69503720e320._comment diff --git a/doc/forum/DBus_on_Ubuntu_12.04__63__/comment_2_608a30e274e6a691a39f69503720e320._comment b/doc/forum/DBus_on_Ubuntu_12.04__63__/comment_2_608a30e274e6a691a39f69503720e320._comment new file mode 100644 index 0000000000..02bda9eaa6 --- /dev/null +++ b/doc/forum/DBus_on_Ubuntu_12.04__63__/comment_2_608a30e274e6a691a39f69503720e320._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.152.246.119" + subject="comment 2" + date="2012-08-25T13:11:37Z" + content=""" +I fnordgot to mention, cabal can be configured to not build with dbus too. 
The relevant incantation is: + +cabal install git-annex --flags=\"-Dbus\" +"""]] From 0ef70280777f29a363d4d3ec64729d3b00121163 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 25 Aug 2012 10:27:59 -0400 Subject: [PATCH 21/22] releasing version 3.20120825 --- debian/changelog | 4 ++-- git-annex.cabal | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/debian/changelog b/debian/changelog index 2a3ecc6044..f91887b0d3 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,11 +1,11 @@ -git-annex (3.20120808) UNRELEASED; urgency=low +git-annex (3.20120825) unstable; urgency=low * S3: Add fileprefix setting. * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt. * Bugfix: Fix fsck in SHA*E backends, when the key contains composite extensions, as added in 3.20120721. - -- Joey Hess Thu, 09 Aug 2012 13:51:47 -0400 + -- Joey Hess Sat, 25 Aug 2012 10:00:10 -0400 git-annex (3.20120807) unstable; urgency=low diff --git a/git-annex.cabal b/git-annex.cabal index 8ab775939b..ab3dd4fae5 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -1,5 +1,5 @@ Name: git-annex -Version: 3.20120807 +Version: 3.20120825 Cabal-Version: >= 1.8 License: GPL Maintainer: Joey Hess From d228e4ca8c5b9ed88fe6b30ada12e822f847f58d Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 25 Aug 2012 10:28:24 -0400 Subject: [PATCH 22/22] add news item for git-annex 3.20120825 --- doc/news/version_3.20120825.mdwn | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 doc/news/version_3.20120825.mdwn diff --git a/doc/news/version_3.20120825.mdwn b/doc/news/version_3.20120825.mdwn new file mode 100644 index 0000000000..8cc45e1b14 --- /dev/null +++ b/doc/news/version_3.20120825.mdwn @@ -0,0 +1,6 @@ +git-annex 3.20120825 released with [[!toggle text="these changes"]] +[[!toggleable text=""" + * S3: Add fileprefix setting. + * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt. 
+ * Bugfix: Fix fsck in SHA*E backends, when the key contains composite + extensions, as added in 3.20120721."""]] \ No newline at end of file