Before, if a direct mode repo had one or more associated files that
were modifed, moving the object into it would overwrite the associated
files with the pristine object.
Now, modified associated files are left unchanged. To ensure that,
when an object is moved into a direct mode repo, it's not thrown away,
it gets stored in indirect mode.
This fixes a bug with git annex add in direct mode. If some files already
existed in the tree pointing at the same key as a file that was just added,
and their content was not present, add neglected to copy the content to
those files.
I also changed the behavior of moveAnnex slightly: When content is moved
into the annex in direct mode, it does not overwrite any content already
present in direct mode files. That content may be modified after all.
On Windows with Cygwin, checking out a git-annex repo will create symlinks
on disk, so we need to always try to read the symlink, even when
core.symlinks says they're not supported.
Seems that Windows doesn't allow deleting a file that the same process has open.
Here the inode cache file was read and a the value from it gets used later.
But due to laziness, the old file is still open when it gets deleted. Adding
strictness avoids this problem. Of course, the file is small, so it's no
problem to read it all strictly, so this is probably an improvement even
outside of Windows.
That's needed in files used to build the configure program.
For the other files, I'm keeping my __WINDOWS__ define, as I find that much easier to type.
I may search and replace it to use the mingw32_HOST_OS thing later.
It's possible for files in indirect mode to have a direct mode mapping
file. Probably from when they were in direct mode. In this case,
toDirectGen tried to copy the content from the direct mode file that the
mapping said had it. But, being in indirect mode, it didn't really have the
content. So it did nothing. This fix makes it always move the content from
.git/annex/objects/ when it's there.
I don't know why, but I can't seem to set the environment variables inside
git-annex to work around the git error caused by android's crappy username
and hostname settings. This workaround works, and that's all that's good
about it.
Git fails on Android, because it gets some weird domain for local host like
"localhost.(none)". This works around that. I made it always set EMAIL when
GECOS workaround was needed (unless EMAIL is already set). It might be
nicer to try to get the hostname.domain as git does, and only set it if
that fails. But I don't want to be stuck trying to exactly duplicate
whatever git is doing.
A content directory can be frozen in direct mode. One way this can happen
is if the content is transferred before direct mode has a mapping for it,
so it's stored in the content directory.
So, we need to thaw the content directory before doing things with it.
The root of the problem is that toInodeCache sees a non-symlink, and so
goes on and generates a new inode cache for the dummy symlink.
Any place that toInodeCache, or sameFileStatus, or genInodeCache are called
may need to deal with this case. Although many of them are ok. For example,
prepSendAnnex calls sameInodeCache, which calls genInodeCache.. but if
the file content is not present, the InodeCache generated for its standin
file is appropriately not the same, and so it returns Nothing.
I've audited some, but have to say I'm not happy with this; it should be
handled at the type level somehow, or a toInodeCache wrapper be used that
is aware of dummy symlinks.
(The Watcher already dealt with it, via the guardSymlinkStandin function.)
Introduced a new per-remote option 'annex-rsync-transport' to specify
the remote shell that it to be used with rsync. In case the value is
'ssh', connections are cached unless 'sshcaching' is unset.
This looks at the string one char at a time, which is hardly efficient..
but more than good enough for expanding variables in
relatively short command lines.
Fixed by storing a list of cached inodes for a key, instead of just one.
Backwards compatability note: An old git-annex version will fail to parse
an inode cache file that has been written by a new version, and has
multiple items. It will succees if just one. So old git-annexes will have
even worse behavior when there are duplicated files, if that is possible.
I don't think it will be a problem. (Famous last words.)
Also, note that it doesn't expire old and unused inode caches for a key.
It would be possible to add this if needed; just look through the
associated files for a key and if there are more cached inodes, throw out
any not corresponding to associated files. Unless a file is being copied
repeatedly and the old copy deleted, this lack of expiry should not be a
problem.
* since this is a crippled filesystem anyway, git-annex doesn't use
symlinks on it
* so there's no reason to use the mixed case hash directories that we're
stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
test case will recover and it will all just work.
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.
But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.
(It also probably avoids confusing displays in file managers.)
I would have sort of liked to put this in .gitattributes, but it seems
it does not support multi-word attribute values. Also, making this a single
config setting makes it easy to only parse the expression once.
A natural next step would be to make the assistant `git add` files that
are not annex.largefiles. OTOH, I don't think `git annex add` should
`git add` such files, because git-annex command line tools are
not in the business of wrapping git command line tools.
There are two types of equality here, and which one is right varies,
so this forces me to consider and choose between them.
Based on this, I learned that the commit in git anex sync was
always doing a strong comparison, even when in a repository where
the inodes had changed. Fixed that.
This fixes the issue mentioned in the last commit.
Turns out just collecting UUID of clients behind a XMPP remote is
insufficient (although I should probably still do it for other reasons),
because a single remote repo might be connected via both XMPP and local
pairing. So a way is needed to know when a push was received from any
client using a given XMPP remote over XMPP, as opposed to via ssh.
Make manualPull send push requests over XMPP.
When reconnecting with remotes, those that are XMPP remotes cannot
immediately be pulled from and scanned, so instead maintain a set of
(probably) desynced remotes, and put XMPP remotes on it. (This set could be
used in other ways later, if we can detect we're out of sync with other
types of remotes.)
The merger handles detecting when a XMPP push is received from a desynced
remote, and triggers a scan then, if they have in fact diverged.
This has one known bug: A single XMPP remote can have multiple clients
behind it. When this happens, only the UUID of one client is recorded
as the UUID of the XMPP remote. Pushes from the other XMPP clients will not
trigger a scan. If the client whose UUID is expected responds to the push
request, it'll work, but when that client is offline, we're SOL.
Pass subcommand as a regular param, which allows passing git parameters
like -c before it. This was already done in the pipeing set of functions,
but not the command running set.
The only thing lost is ./ghci
Speed: make fast used to take 20 seconds here, when rebuilding from
touching Command/Unused.hs. With cabal, it's 29 seconds.
This resulted in a lot of user complains that git annex init had git
telling them they needed to run git config --global user.email .. which
didn't work because even HOME was not passed into git.
Refactored annex link code into nice clean new library.
Audited and dealt with calls to createSymbolicLink.
Remaining calls are all safe, because:
Annex/Link.hs: ( liftIO $ createSymbolicLink linktarget file
only when core.symlinks=true
Assistant/WebApp/Configurators/Local.hs: createSymbolicLink link link
test if symlinks can be made
Command/Fix.hs: liftIO $ createSymbolicLink link file
command only works in indirect mode
Command/FromKey.hs: liftIO $ createSymbolicLink link file
command only works in indirect mode
Command/Indirect.hs: liftIO $ createSymbolicLink l f
refuses to run if core.symlinks=false
Init.hs: createSymbolicLink f f2
test if symlinks can be made
Remote/Directory.hs: go [file] = catchBoolIO $ createSymbolicLink file f >> return True
fast key linking; catches failure to make symlink and falls back to copy
Remote/Git.hs: liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True
ditto
Upgrade/V1.hs: liftIO $ createSymbolicLink link f
v1 repos could not be on a filesystem w/o symlinks
Audited and dealt with calls to readSymbolicLink.
Remaining calls are all safe, because:
Annex/Link.hs: ( liftIO $ catchMaybeIO $ readSymbolicLink file
only when core.symlinks=true
Assistant/Threads/Watcher.hs: ifM ((==) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file))
code that fixes real symlinks when inotify sees them
It's ok to not fix psdueo-symlinks.
Assistant/Threads/Watcher.hs: mlink <- liftIO (catchMaybeIO $ readSymbolicLink file)
ditto
Command/Fix.hs: stopUnless ((/=) (Just link) <$> liftIO (catchMaybeIO $ readSymbolicLink file)) $ do
command only works in indirect mode
Upgrade/V1.hs: getsymlink = takeFileName <$> readSymbolicLink file
v1 repos could not be on a filesystem w/o symlinks
Audited and dealt with calls to isSymbolicLink.
(Typically used with getSymbolicLinkStatus, but that is just used because
getFileStatus is not as robust; it also works on pseudolinks.)
Remaining calls are all safe, because:
Assistant/Threads/SanityChecker.hs: | isSymbolicLink s -> addsymlink file ms
only handles staging of symlinks that were somehow not staged
(might need to be updated to support pseudolinks, but this is
only a belt-and-suspenders check anyway, and I've never seen the code run)
Command/Add.hs: if isSymbolicLink s || not (isRegularFile s)
avoids adding symlinks to the annex, so not relevant
Command/Indirect.hs: | isSymbolicLink s -> void $ flip whenAnnexed f $
only allowed on systems that support symlinks
Command/Indirect.hs: whenM (liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f) $ do
ditto
Seek.hs:notSymlink f = liftIO $ not . isSymbolicLink <$> getSymbolicLinkStatus f
used to find unlocked files, only relevant in indirect mode
Utility/FSEvents.hs: | Files.isSymbolicLink s = runhook addSymlinkHook $ Just s
Utility/FSEvents.hs: | Files.isSymbolicLink s ->
Utility/INotify.hs: | Files.isSymbolicLink s ->
Utility/INotify.hs: checkfiletype Files.isSymbolicLink addSymlinkHook f
Utility/Kqueue.hs: | Files.isSymbolicLink s = callhook addSymlinkHook (Just s) change
all above are lower-level, not relevant
Audited and dealt with calls to isSymLink.
Remaining calls are all safe, because:
Annex/Direct.hs: | isSymLink (getmode item) =
This is looking at git diff-tree objects, not files on disk
Command/Unused.hs: | isSymLink (LsTree.mode l) = do
This is looking at git ls-tree, not file on disk
Utility/FileMode.hs:isSymLink :: FileMode -> Bool
Utility/FileMode.hs:isSymLink = checkMode symbolicLinkMode
low-level
Done!!
Now getKeysPresent checks that the key's content, not only its directory,
exists. In direct mode, the inode cache file is used as a standin for the
content.
removeAnnex always removes the inode cache file, and drop and move --from
always call removeAnnex, even if the object does not seem to be inAnnex,
to ensure it's always deleted.
This reverts commit 57780cb3a4.
This was buggy, it caused the direct mode cache to be lost when dropping
keys, so when the file is gotten back, it's stored in indirect mode.
Note to self: Do not attempt bug fixes at 6 am!
In indirect mode, now checks the inode cache to detect changes to a file.
Note that a file can still be changed if a process has it open for write,
after landing in the annex.
In direct mode, some checking of the inode cache was done before, but
from a much later point, so fewer modifications could be detected. Now it's
as good as indirect mode.
On crippled filesystems, no lock down is done before starting to add a
file, so checking the inode cache is the only protection we have.
git annex init probes for crippled filesystems, and sets direct mode, as
well as `annex.crippledfilesystem`.
Avoid manipulating permissions of files on crippled filesystems.
That would likely cause an exception to be thrown.
Very basic support in Command.Add for cripped filesystems; avoids the lock
down entirely since doing it needs both permissions and hard links.
Will make this better soon.
Been meaning to do this for some time; Android port was last straw.
Note that newer versions of the uuid library have a Data.UUID.V4 that
generates random UUIDs slightly more cleanly, but Debian has an old version
of the library, so I do it slightly round-about.
These files were left behind, and made getKeysPresent find keys that were
not present. It would be expensive to make getKeysPresent check that the
actual key files are present (it just lists the directories). But that's not
needed if we just clean up the stale cache and mapping files.
To handle systems that were in direct mode and got switched back with stale
direct mode files, made cleanObjectLoc remove all files in the key's directory.
git annex unused will still list keys that are gone but for which the stale
direct mode files exists. To deal with that, made dropunused remove the key's
directory even if the key does not seem to be present.
The most common way for a mapping to be stale is when a file was deleted,
or renamed. Nothing updates the mappings for deletions yet.
But they can also become stale in other ways. For example a file can
be modified.
So, the mapping is not trusted to be consistent. When we get a key,
only replace symlinks that still point to that key with its content.
When we drop a key, only put back symlinks for files that still have
the direct mode content.
Avoid a crash if a mapping contains files that no longer exist.
This could happen because eg, one was deleted and a commit has not yet been
done to update the mapping.
Fix path calculation.
Sometimes it seems that git-cat-file --batch stops getting info for
files in the current repo, when ":file" is fed to it. I have not reproduced
this at the command line, but only when using git annex whereis and git
annex move inside a direct mode repo. Those failed, because cat-file
returned "file missing". OTOH, git annex find works fine, despite passing
the same file to cat-file. It seems that the failing commands first asked
cat-file to show a file on the git-annex branch. Perhaps it got "stuck" on
that branch? But I cannot repoduce it running cat-file by hand. Most
strange. HEAD is a workaround for this extreme weirdness, since I spent a
good 2 hours struggling with it already.
The expensive scan uses lookupFile, but in direct mode, that doesn't work
for files that are present. So the scan was not finding things that are
present that need to be uploaded. (It did find things not present that
needed to be downloaded.)
Now lookupFile also works in direct mode. Note that it still prefers
symlinks on disk to info committed to git, in direct mode. This is
necessary to make things like Assistant.Threads.Watcher.onAddSymlink
work correctly, when given a new symlink not yet checked into git (or
replacing a file checked into git).
Now there's a Config type, that's extracted from the git config at startup.
Note that laziness means that individual config values are only looked up
and parsed on demand, and so we get implicit memoization for all of them.
So this is not only prettier and more type safe, it optimises several
places that didn't have explicit memoization before. As well as getting rid
of the ugly explicit memoization code.
Not yet done for annex.<remote>.* configuration settings.