From 30ca7805d6f9b4856e2e09a49cf7dc6b3eeb2f47 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Fri, 18 Oct 2013 15:11:17 -0400
Subject: [PATCH] update

---
 doc/design/assistant/disaster_recovery.mdwn | 42 ++++++++++++++-------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/doc/design/assistant/disaster_recovery.mdwn b/doc/design/assistant/disaster_recovery.mdwn
index f3f1a2a05e..f96cf520b8 100644
--- a/doc/design/assistant/disaster_recovery.mdwn
+++ b/doc/design/assistant/disaster_recovery.mdwn
@@ -86,15 +86,29 @@ Run `git fsck` and parse output to find bad objects, and determine
 from its output if they are a commit, a tree, or a blob.
 
 Check if there's a remote. If so, and if the bad objects are all
-present on it, can simply get all bad objects from it, and inject them
-back into .git/objects to recover.
-How to best re-get bad objects from a remote? May need to re-clone from
-the remote, and rsync .git/objects from the clone.
+present on it, can simply get all bad objects from the remote,
+and inject them back into .git/objects to recover:
 
-Otherwise, find commits in each local branch that are broken by
-all found bad objects. Some of this can be parsed from git fsck
-output, but for eg bad blobs, the commits need to be walked to
-walk the trees, to find trees that refer to the blobs.
+1. If the local repository contains packs, the packs may be corrupt.
+   So, start by using `git unpack-objects` to unpack all
+   packs it can handle (which may include parts of corrupt packs)
+   back to loose objects. And delete all packs.
+2. Delete all loose corrupt objects.
+3. Make a new (bare) clone from the remote. Use `--reference` pointing
+   at the broken repository, to avoid re-downloading objects that
+   are present in it. (git does not seem to provide an easy way to just
+   fetch specific missing objects from a remote; `git fetch-pack` only
+   operates on refs... but this clone method should be pretty efficient)
+4. Unpack any packs in the clone, so we can operate on loose objects.
+5. Copy each missing object from the new clone's .git/objects to the
+   repository.
+6. If each bad object was able to be repaired this way, we're done!
+   (If not, can reuse the clone for getting objects from the next remote.)
+
+If some missing objects cannot be recovered from remotes, find commits in each
+local branch that are broken by all remaining missing objects. Some of this can
+be parsed from git fsck output, but for eg blobs, the commits need to
+be walked to walk the trees, to find trees that refer to the blobs.
 
 For each branch that is affected, look in the reflog and/or `git log
 $branch` to find the last good change that predates all broken commits. (If
@@ -103,17 +117,17 @@ anything useful, but the reflog can be used to find past refs for the
 branch -- have to first delete the .git/HEAD file if it points to
 the broken ref.)
 
-Reset the branch to the last good change. This will leave git showing any
-changes made since then as staged in the index and uncommitted. Or if
-the index is missing/corrupt, any files in the tree will show as modified
-and uncommitted. User (or git-annex assistant) can then commit as
-appropriate.
+Reset the branch to the last good change. For the head branch, this will
+leave git showing any changes made since then as staged in the index and
+uncommitted. Or if the index is missing/corrupt, any files in the tree will
+show as modified and uncommitted. User (or git-annex assistant) can then
+commit as appropriate.
 
 (Special handling for git-annex branch: Commit .git/annex/index over top
 of the reset git-annex branch, and then run a `git annex fsck --fast`
 to fix up any object location info.)
 
 Also should check all remote tracking branches. If such a branch refers
-to a bad object, it is sufficient to remove the tracking
+to a missing object, it is sufficient to remove the tracking
 branch and then `git fetch` from the remote, which will re-download
 missing objects from it and reinstate the tracking branch.
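
As a rough illustration of steps 5 and 6 of the recovery procedure this patch adds, here is a minimal Python sketch; it is not taken from git-annex, and every function name in it (`parse_missing`, `loose_object_path`, `copy_missing`) is hypothetical. It assumes SHA-1 object ids (40 hex digits), that the corrupt objects have already been deleted so `git fsck` reports them with lines of the form `missing <type> <id>`, and that packs in the clone have already been unpacked to loose objects. Other fsck message shapes are ignored.

```python
import re
import shutil
from pathlib import Path

# Assumption: only "missing <type> <40-hex id>" lines are of interest.
# Other git fsck messages (dangling, broken link, ...) are skipped here.
_MISSING = re.compile(r"^missing (commit|tree|blob|tag) ([0-9a-f]{40})$")


def parse_missing(fsck_output):
    """Return {object id: object type} for each 'missing' line."""
    found = {}
    for line in fsck_output.splitlines():
        match = _MISSING.match(line.strip())
        if match:
            found[match.group(2)] = match.group(1)
    return found


def loose_object_path(git_dir, sha):
    """Path of a loose object: objects/<first 2 hex>/<remaining 38 hex>."""
    return Path(git_dir) / "objects" / sha[:2] / sha[2:]


def copy_missing(clone_git_dir, broken_git_dir, shas):
    """Copy loose objects from the clone into the broken repository.

    Returns the ids that the clone did not have, so the caller can
    try the next remote for those.
    """
    still_missing = []
    for sha in shas:
        src = loose_object_path(clone_git_dir, sha)
        dst = loose_object_path(broken_git_dir, sha)
        if src.is_file():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        else:
            still_missing.append(sha)
    return still_missing
```

If `copy_missing` returns an empty list, every missing object was repaired from this remote; otherwise the returned ids are the ones to look for in the next remote, or to treat as unrecoverable when walking branches.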