Commit graph

259 commits

Author SHA1 Message Date
Joey Hess
f5f059e288
relocate gpg test framework temp dir to outside repo
The gitAnnexTmpOtherDir cleanup made it be deleted too early sometimes,
and so the test suite failed. Also there was a report of a similar
failure which likely had a similar cause and hopwfully this fixes that
too.
2019-01-21 14:16:00 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
591e4b145f
convert old uuid-based log parsers to attoparsec
This preserves the workaround for the old bug that caused NoUUID items
to be stored in the log, prefixing log lines with " ". It's now handled
implicitly, by using takeWhile1 (/= ' ') to get the uuid.

There is a behavior change from the old parser, which split the value
into words and then recombined it. That meant that "foo  bar" and "foo\tbar"
came out as "foo bar". That behavior was not documented, and seems
surprising; it meant that after a git-annex describe here "foo  bar",
you wouldn't get that same string back out when git-annex displayed repo
descriptions.

Otoh, some other parsers relied on the old behavior, and the attoparsec
rewrites had to deal with the issue themselves...

For group.log, there are some edge cases around the user providing a
group name with a leading or trailing space. The old parser would ignore
such excess whitespace. The new parser does too, because the alternative
is to refuse to parse something like " group1  group2 " due to excess
whitespace, which would be even more confusing behavior.

The only git-annex branch log file that is not converted to attoparsec
and bytestring-builder now is transitions.log.
2019-01-10 16:34:20 -04:00
Joey Hess
bfc9039ead
convert git-annex branch access to ByteStrings and Builders
Most of the individual logs are not converted yet, only presense logs
have an efficient ByteString Builder implemented so far. The rest
convert to and from String.
2019-01-03 13:21:48 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
f9ec330cbf
avoid a FAT fail 2018-12-19 13:45:28 -04:00
Joey Hess
5759e93444
honor init --version=5 on crippled filesystem
init: When --version=5 is passed on a crippled filesystem, use a v5 direct
mode repo as requested, rather than upgrading to v7 adjusted unlocked.

Fixed test suite on crippled filesystems, making it request --version=5
to test direct mode.
2018-12-19 13:17:04 -04:00
Joey Hess
14971414dc
Make test suite work better when the temp directory is on NFS.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.

Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
2018-12-19 12:44:56 -04:00
Joey Hess
83109affd1
remove leftovers from removed TestSuite build flag
Test suite is always built, so this can be simplified.
2018-11-19 12:39:16 -04:00
Joey Hess
76a25fdcf0
Fix test suite failure when git-annex test is not run inside a git repository
Not the first time this kind of test suite breakage has happened..
It would be good to avoid somehow it looking up from .t and finding a git
repo. But just running the test suite from time to time outside of
git-annex would also let me notice these before the distribution packagers
do.

This commit was sponsored by mo on Patreon.
2018-11-05 13:31:49 -04:00
Joey Hess
89ac32616f
don't move keys db out from under sqlite
Instead remove enough data from it that this regression test tests what
it needs to.

Moving the database was the last thing that made the test suite
unstable, including sometimes hanging completely. It seems that, despite
closeDb being called before the move, sqlite was not quite done with it,
and somehow this causes other sqlite handles to become unstable. Not
good.

With this change, the test suite has successfully run 100+ times without
any issues.
2018-10-31 08:04:16 -04:00
Joey Hess
cc1087de42
test: display error messages from git-annex on unexpected failures
.. but not on expected failures
2018-10-30 10:49:39 -04:00
Joey Hess
595fb98473
add small delay to avoid problems on systems with low-resolution mtime
I've seen intermittent failures of the test suite with v6 for a long time,
it seems to have possibly gotten worse with the changes around v7. Or just
being unlucky; all tests failed today.

Seen on amd64 and i386 builders, repeatedly but intermittently:

	unused: FAIL (4.86s)
	Test.hs:928:
	git diff did not show changes to unlocked file

And I think other such failures, all involving v7/v6 mode tests.

I managed to reproduce the unused failure with --keep-failures,
and inside the repo, git diff was indeed not showing any changes for
the modified unlocked file.

The two stats will be the same other than mtime; the old and new files have
the same size and inode, since the test case writes to the file and then
overwrites it.

Indeed, notice the identical timestamps:

	builder@orca:~/gitbuilder/build/.t/tmprepo335$ echo 1 > foo; stat foo; echo 2 > foo; stat foo
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.894942036 +0000
	Change: 2018-10-29 22:14:10.894942036 +0000
	 Birth: -
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.898942036 +0000
	Change: 2018-10-29 22:14:10.898942036 +0000
	 Birth: -

I'm seeing this in Linux VMs; it doesn't happen on my laptop. I've also
not experienced the intermittent test suite failures on my laptop.

So, I hope that this small delay will avoid the problem.

Update: I didn't, indeed I then reproduced the same failure on my
laptop, so it must be due to something else. But keeping this change anyway
since not needing to worry about lowish-resolution mtime in the test suite seems
worthwhile.
2018-10-29 19:31:26 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
94aa0e2f64
fix strange test failure
It was trying to git annex adjust when in a direct mode repo, and that
of course fails. What I don't understand though, is how the test suite
managed to work before, when it was clearly checking the wrong thing.
Since the right way to fix it was obvious, I have not bisected.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 16:51:09 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
6063b3df3f
Dial back optimisation when building on arm
Prevent ghc and llc from running out of memory when optimising some
files.

Sean Whitton reported that doing this only in Test.hs was insufficient,
the build still OOMed by the time it got to Test.hs. He had earlier found
the build worked when these options are applied globally.

See https://ghc.haskell.org/trac/ghc/ticket/14821 for why it needs -O1;
once that's fixed it may suffice to use "GHC-Options: -O2 -optlo-O2",
although it may also be that the -O1 prevents ghc from using/leaking
as much memory.

os(arm) should match armel, armhf, armeb, and arm.
It probably also matches arm64, somewhat unfortunately since arm64
systems probably tend to have more memory. See list of arches in
https://hackage.haskell.org/package/Cabal-1.22.2.0/docs/src/Distribution-System.html

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-04 19:48:07 -04:00
Joey Hess
8ccfbd14d0
Split Test.hs and avoid optimising it much, to need less memory to compile.
The ghc options were found by Sean Whitton; the debian arm autobuilders
need those to build w/o OOM, and it seems to involve llvm using too much
memory to optimize Test.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-02-18 11:48:48 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
1f5bf73af0
Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works."
This reverts commit 51228c2306.

No, still doesn't work when built with cabal. It did with stack; stack
must somehow make the unix package implicitly available.

With cabal, System.Posix.Process and System.Posix.Env are both missing.
2017-12-31 14:09:41 -04:00
Joey Hess
51228c2306
git-annex.cabal: Add back custom-setup stanza, so cabal new-build works.
Seems I had all the work in past commits to make this build, at least on
linux. I'm actually surprised it does, without a unix dep, Utility.Env
still builds ok somehow despite using System.Posix.Env.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-12-31 13:54:41 -04:00
Joey Hess
79857d7e9f
Removed the testsuite build flag
Test suite is always included.

Building with this flag disabled has actually been broken for some time,
since Command.TestRemote uses tasty. Fewer build flags are better, so good
time to drop it.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-12-20 12:25:03 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
3cc94c1667
.noannex file
A top-level .noannex file will prevent git-annex init from being used in a
repository. This is useful for repositories that have a policy reason not
to use git-annex. The content of the file will be displayed to the user who
tries to run git-annex init.

This also affects git annex reinit and initialization via the webapp.
It does not affect automatic inits, when there's a sibling git-annex branch
already.

This commit was supported by the NSF-funded DataLad project.
2017-12-13 14:34:32 -04:00
Joey Hess
429132f496
try again to avoid directory removal issues on NFS
af6068525a seems to not have worked;
though the keys database should not have any files open after closeDb,
NFS seems to be creating some files where while the directory is being
removed, which causes the removal to fail.

So instead, try renaming the directory out of the way.

This commit was supported by the NSF-funded DataLad project.
2017-12-05 14:25:51 -04:00
Joey Hess
1b6cbb63e9
still can't express custom-setup deps
They need unix on non-windows, for Utility.Env, which Build.Configure uses,
but cabal can't express that in a custom-setup stanza.

To avoid this problem, Utility.Env would need to be moved into
unix-compat..
2017-11-14 14:59:51 -04:00
Joey Hess
8d68112be5
split out setEnv to avoid adding dep
Windows needs the setenv package in custom-setup, but I don't want to
pull it in on unix, which would probably break some builds and need more
work. Instead, split out setEnv to a separate module.

Quite likely, unix-compat will get a portable environment layer, and
then both modules can be removed from here.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-14 14:28:49 -04:00
Joey Hess
e696b086dc
avoid build warning on windows 2017-10-24 12:24:06 -04:00
Joey Hess
5c32196a37
fix process and FD leak
Fix process and file descriptor leak that was exposed when git-annex was
built with ghc 8.2.1. Apparently ghc has changed its behavior of GC
of open file handles that are pipes to running processes. That
broke git-annex test on OSX due to running out of FDs.

Audited for all uses of Annex.new and made stopCoProcesses be called
once it's done with the state. Fixed several places that might have
leaked in other situations than running the test suite.

This commit was sponsored by Ewen McNeill.
2017-09-29 22:36:08 -04:00
Joey Hess
f84e34883c
test: Fix reversion that made it only run inside a git repository.
Using annexeval to run probeCrippledFileSystem' caused Git.CurrentRepo.get
to be run. Fixed easily since probeCrippledFileSystem' had no need to use
the Annex monad.

This commit was sponsored by Ethan Aubin.
2017-09-29 15:08:18 -04:00
Joey Hess
db2a06b66f
init: Display an additional message when it detects a filesystem that allows writing to files whose write bit is not set. 2017-08-28 13:21:18 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
8526cd7c92
test: Avoid most situations involving failure to delete test directories
By forking a worker process and only deleting the test directory once it exits.

This way, if a test leaves files open, they'll get closed when the worker
exits, so avoiding failure to delete open files on Windows, and failure to
delete directories due to NFS lock files.

If a test leaves a git worker process running, the closed pipes should
cause the worker to exit too, also avoiding the problem there. The 10
second sleep ought to give plenty of time for such worker processes to
exit, although this is of course a race.

Finally, even if test directory fails to be deleted still,
it won't appear as if the last test in the test suite failed; the error
will be displayed at the very end.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 16:29:47 -04:00
Joey Hess
af6068525a
Fix a git-annex test failure when run on NFS due to NFS lock files preventing directory removal.
Should fix this:

    lock (v6 --force):                                    FAIL
      Exception: .git/annex/keys: removeDirectoryRecursive: unsatisfied constraints (Directory not empty)

Verified that the test case still catches the regression it's meant to.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 15:11:42 -04:00
Joey Hess
2cecc8d2a3
Added GIT_ANNEX_VECTOR_CLOCK environment variable
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.

Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).

There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.

A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)

So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.

Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 14:19:58 -04:00
Joey Hess
da8e84efe9
fix failing quickcheck properties
QuickCheck 2.10 found a counterexample eg "\929184" broke the property.

As far as I can tell, Git.Filename is matching how git handles encoding
of strange high unicode characters in filenames for display. Git does
not display high unicode characters, and instead displays the C-style
escaped form of each byte. This is ambiguous, but since git is not
unicode aware, it doesn't need to roundtrip parse it.

So, making Git.FileName's roundtrip test only chars < 256 seems fine.

Utility.Format.format uses encode_c, in order to mimic git, so that's
ok.

Utility.Format.gen uses decode_c, but only so that stuff like "\n"
in the format string is handled. If the format string contains C-style
octal escapes, they will be converted to ascii characters, and not
combined into unicode characters, but that should not be a problem.
If the user wants unicode characters, they can include them in the
format string, without escaping them.

Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg
--with-colons hex-escapes some characters in particular ':' and '\\'.
gpg passes unicode through, so this use of decode_c is not a problem.

This commit was sponsored by Henrik Riomar on Patreon.
2017-06-17 16:48:00 -04:00
Joey Hess
61b7af97ec
fix test suite breakage caused by GIT_ANNEX_USE_GIT_SSH 2017-04-07 16:56:04 -04:00
Joey Hess
df2639218a
improve git-annex-shell exit status propigation 2017-03-20 12:45:10 -04:00
Joey Hess
1484422e19
test move with ssh remote 2017-03-17 19:18:45 -04:00
Joey Hess
002513e194
test suite infra for testing mocked ssh remotes
This commit was supported by the NSF-funded DataLad project.
2017-03-17 19:14:41 -04:00
Joey Hess
9c4650358c
add KeyVariety type
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.

This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.

Benchmarks ran in my big repo:

old git-annex info:

real	0m3.338s
user	0m3.124s
sys	0m0.244s

new git-annex info:

real	0m3.216s
user	0m3.024s
sys	0m0.220s

new git-annex find:

real	0m7.138s
user	0m6.924s
sys	0m0.252s

old git-annex find:

real	0m7.433s
user	0m7.240s
sys	0m0.232s

Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.

This commit was supported by the NSF-funded DataLad project.
2017-02-24 15:16:56 -04:00
Joey Hess
ca0daa8bb8
factor non-type stuff out of Key 2017-02-24 13:42:30 -04:00
Edward Betts
0750913136
correct spelling mistakes 2017-02-12 17:30:23 -04:00
Joey Hess
e7e36b6e72
import: Changed how --deduplicate, --skip-duplicates, and --clean-duplicates determine if a file is a duplicate
Before, only content known to be present somewhere was considered a
duplicate. Now, any content that has been annexed before will be considered
a duplicate, even if all annexed copies of the data have been lost.

Note that --clean-duplicates and --deduplicate still check numcopies,
so won't delete duplicate files unless there's an annexed copy.

This makes import use the same method as reinject --known.

The man page already said that duplicate meant "its content is either
present in the local repository already, or git-annex knows of another
repository that contains it, or it was present in the annex before but has
been removed now". So, this is really only bringing the implementation into
line with the man page.

This commit was sponsored by Jochen Bartl on Patreon.
2017-02-07 17:41:58 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
ee309d6941
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's  performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020

The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.

Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.

Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.

Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.

In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 13:58:43 -04:00