merge in windows loststamp branch

Joey Hess 2014-06-12 15:20:48 -04:00
commit b291951180
12 changed files with 181 additions and 12 deletions

5
debian/changelog vendored

@ -11,7 +11,10 @@ git-annex (5.20140607) UNRELEASED; urgency=medium
Linux's caching of higher res timestamps while a FAT is mounted, caused
direct mode repositories on FAT to seem to have modified files after
they were unmounted and remounted.
* Deal with Windows's horrible handling of time zone changes.
* Detect when Windows has lost its mind in a timezone change, and
automatically apply a delta to the timestamps it returns, to get back to
sane values. Note that this may cause a one-time re-checksumming of all
files on Windows, on upgrade to this version.
-- Joey Hess <joeyh@debian.org> Mon, 09 Jun 2014 14:44:09 -0400

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 14"
date="2014-06-11T18:59:49Z"
content="""
Opened a separate bug for [[Windows_file_timestamp_timezone_madness]].
Fixed FAT issue on Linux as discussed in my previous comment.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 4"
date="2014-06-11T20:09:14Z"
content="""
What happens if you run `git-annex assistant --autostart` ?
"""]]

@ -5,15 +5,27 @@ appear to change.
This means that after such a change, git-annex will see new mtimes, and
want to re-checksum every file in the repo.
The best way to fix this seems to be to normalize the timestamp returned by
getFileStatus, which is relative to the current time zone, to be relative
to UTC. (As is always the case on Unix, of course).
However, to do that, I need to know the current timezone.
[[!tag confirmed]]
Unfortunately, Data.Time.LocalTime.getCurrentTimeZone doesn't seem to really
work on windows. It always returns a time zone 60 minutes from UTC in my tests,
no matter what the zone really is. I need to test this more widely and file
a GHC bug if appropriate.
> [[fixed|done]], avoiding using getCurrentTime for now, although I have a
> patch to fix it too. --[[Joey]]
> Update: Actually, I seem to have been getting confused by behavior of
> cygwin terminal setting TZ. That indeed led to timestamp changes when the
> time zone changed. I have made git-annex unset TZ to avoid this.
>
> Without TZ set, time stamps are actually stable across time zone changes.
> Ie, a simple program to read the time stamp of a file and print it
> always shows the same thing, before and after a timezone change.
>
> However, and here's where it gets truly ghastly: A program that stats a
> file in a loop will see its timestamp change when the timezone changes.
> I suspect this might be a bug in the Haskell RTS caching something it
> should not. Stopping and re-running the program gets back to the
> original timestamp.
>
> I have not tested DST changes, but it's hard to imagine it being any
> worse than the above behavior.
>
> So, that's insane then. We can't trust timestamps to be stable on windows
> when git-annex is running for a long period of time. --[[Joey]]
>
> > [[fixed|done]], using the inode sentinel file to detect when windows
> > has lost its mind, and calculating its delta from insanity. --[[Joey]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 1"
date="2014-06-11T19:09:17Z"
content="""
Rather than getting the timezone, another approach might be to look at the inode sentinel file. Its timestamp will also appear to have changed. If the delta is exactly some number of hours, and the inode sentinel's other data is unchanged, a Windows-specific hack could apply that same delta to all inode cache timestamps.
Except, time zones are not all actually on hour boundaries. Some are half hours, some may be 15 minutes, and next week some crazy country might legislate a 3 minute delta for all I know.
Well, could just say if the inode sentinel's mtime has changed at all (delta < 3600 seconds), and it's otherwise unchanged, and we're on windows, assume this is a time zone change. When would that fail? Only if the repository is copied to someplace, and the mtime is not preserved.
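
The heuristic above can be sketched roughly as follows. This is Python for illustration only, not git-annex's Haskell code. The `Sentinel` record, its fields, and the `timezone_delta` function are hypothetical names, and the sign convention (stored minus current) is an assumption.

```python
from typing import NamedTuple, Optional


class Sentinel(NamedTuple):
    # Hypothetical record of what was remembered about the sentinel file.
    inode: int
    size: int
    mtime: int  # seconds, as reported by the OS when the record was taken


def timezone_delta(stored: Sentinel, current: Sentinel,
                   on_windows: bool) -> Optional[int]:
    # Returns the number of seconds to add to currently reported mtimes
    # to get back to the basis used when the sentinel was stored.
    # Returns None when the heuristic does not apply.
    if not on_windows:
        return None
    if (stored.inode, stored.size) != (current.inode, current.size):
        # Something other than the mtime differs, so this is not just a
        # time zone change (for example, the repository was copied).
        return None
    # Any mtime difference is taken to be a zone shift, whatever its
    # size, since zones are not all on hour boundaries.
    return stored.mtime - current.mtime
```

A result of 0 means no shift was seen. As noted above, this would misfire if the repository were copied somewhere without preserving the sentinel's mtime while its other data happened to stay the same.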
"""]]

@ -0,0 +1,26 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 2"
date="2014-06-11T19:51:15Z"
content="""
Note that multiple time zone changes complicate this. I think that means that the delta can't be simply applied when comparing inode caches. Instead, probably it needs to be applied when generating inode caches.
A scenario:
1. Time zone is at +1h when the inode sentinel is written.
2. Time zone changes to +2h
3. File F is added (with a current timestamp of T)
4. Time zone changes to +5h
I am a little confused by which way windows moves the timestamps when the time zone changes. Let's assume I might get the sign wrong.
Let F's timestamp after step 4, F4 = T+-3h.
Let the delta after step 4, D4 = +-4h
And, let the delta after step 2, D2 = +-1h
If step 3 writes the current timestamp to the inode cache, then the cache still has T in it after step 4. F4+D4 /= T (T +-3h +-4h /= T). So comparison doesn't work.
If instead the current delta is applied when generating inode caches (both for storing on disk, and for immediate comparison), then the inode cache will have T+D2 in it. Then after step 4, generating a new inode cache for F will yield F4+D4. So, does F4+D4 == T+D2? T +-3h +-4h == T +-1h YES!
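
The arithmetic can be checked with a small simulation. This is only a sketch in Python under an assumed model: the OS reports `true_mtime + sign * zone_offset`, and the delta is the stored sentinel timestamp minus the currently reported one. The function names and the sample timestamps are made up for illustration. Running it for both signs covers the uncertainty about which way windows moves the timestamps.

```python
HOUR = 3600


def reported_mtime(true_mtime, zone_offset, sign):
    # Assumed model of what the OS reports for a file in a given zone.
    return true_mtime + sign * zone_offset


def delta(sentinel_stored, sentinel_now):
    # Correction that maps currently reported stamps back to the basis
    # in effect when the sentinel was written.
    return sentinel_stored - sentinel_now


def run_scenario(sign):
    sentinel_true, file_true = 1_000_000, 2_000_000  # arbitrary values

    # Step 1: zone is +1h when the sentinel is written.
    sentinel_stored = reported_mtime(sentinel_true, 1 * HOUR, sign)

    # Steps 2 and 3: zone is +2h, file F is added with reported stamp T.
    t = reported_mtime(file_true, 2 * HOUR, sign)
    d2 = delta(sentinel_stored, reported_mtime(sentinel_true, 2 * HOUR, sign))

    # Step 4: zone is +5h.
    f4 = reported_mtime(file_true, 5 * HOUR, sign)
    d4 = delta(sentinel_stored, reported_mtime(sentinel_true, 5 * HOUR, sign))

    naive_matches = (f4 + d4 == t)            # cache stored raw T
    normalized_matches = (f4 + d4 == t + d2)  # cache stored T+D2
    return naive_matches, normalized_matches


if __name__ == "__main__":
    for sign in (1, -1):
        print(sign, run_scenario(sign))
```

With either sign, the comparison against the raw T fails and the comparison against T+D2 succeeds, which is the conclusion reached above.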
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 3"
date="2014-06-11T22:11:38Z"
content="""
I have implemented this on the loststamp git branch. It seems to work!
However, it has a big problem:
If the timezone changes while the assistant (or a long-running command)
runs, it won't notice, since it only checks the inode cache once, and
so will use the old delta for all new inode caches it generates for new
files it's added. Which will result in them seeming changed the next time
it runs.
So, it would be really nice to be able to check the actual timezone instead.
But I suppose I can make the assistant poll the inode cache file, or check
it when adding a new file, or something like that. Bleagh.
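
One way the "check it when adding a new file" option might look is sketched below, in Python for illustration only. `DeltaTracker` and its methods are hypothetical names, not anything in git-annex. The idea is to recompute the delta from the sentinel every time an inode cache is generated, instead of once at startup.

```python
import os


class DeltaTracker:
    def __init__(self, sentinel_path, stored_sentinel_mtime, get_mtime=None):
        self.sentinel_path = sentinel_path
        self.stored_sentinel_mtime = stored_sentinel_mtime
        # get_mtime can be replaced in tests; by default it stats the file.
        self.get_mtime = get_mtime or (lambda p: int(os.stat(p).st_mtime))

    def current_delta(self):
        # Re-stat the sentinel each time, so a zone change that happens
        # while the process is running is picked up.
        return self.stored_sentinel_mtime - self.get_mtime(self.sentinel_path)

    def normalized_mtime(self, path):
        # The timestamp that would go into a newly generated inode cache.
        return self.get_mtime(path) + self.current_delta()
```

This costs one extra stat per file, and a zone change that lands between the two stat calls would still produce a wrong value, so it narrows the window rather than closing it.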
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 4"
date="2014-06-11T22:54:48Z"
content="""
Getting the actual time zone on windows works better if you unset TZ first.
But, a haskell program that polls the time zone fails to notice when it's changed. It only notices after being restarted. I have contacted the time library maintainer about this.
"""]]

@ -0,0 +1,23 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 5"
date="2014-06-11T23:02:28Z"
content="""
I've developed a fix for the time library. This patch has been sent to the author; hopefully it will get applied and then I can use getCurrentTimeZone. Note that git-annex would need to unset TZ first, which might be hard on windows.
<pre>
diff --git a/cbits/HsTime.c b/cbits/HsTime.c
index cfafb27..86ca92a 100644
--- a/cbits/HsTime.c
+++ b/cbits/HsTime.c
@@ -8,6 +8,7 @@ long int get_current_timezone_seconds (time_t t,int* pdst,char const* * pname)
tzset();
struct tm* ptm = localtime_r(&t,&tmd);
#else
+ tzset();
struct tm* ptm = localtime(&t);
#endif
if (ptm)
</pre>
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawk7iPiqWr3BVPLWEDvJhSSvcOqheLEbLNo"
nickname="Dirk"
subject="comment 4"
date="2014-06-11T19:47:35Z"
content="""
For me the autobuilt tarball works. Go to this page: http://git-annex.branchable.com/install/Linux_standalone/. Right before the comments start is a list of the different autobuilt tarballs.
I assume that this fix will appear in the regular tarballs with the next release.
Dirk
"""]]

@ -0,0 +1,23 @@
Spent all day on some horrible timestamp issues on legacy systems.
On FAT, timestamps have a 2s granularity, which is ok, but then Linux adds
a temporary higher resolution cache, which is lost on unmount. This
confused git-annex since the mtimes seemed to change and it had to
re-checksum half the files to get unconfused, which was not good.
I found a way to use the inode sentinel file to detect when on FAT
and put in a workaround, without degrading git-annex everywhere else.
On Windows, time zones are an utter disaster; it changes the mtime it reports
for files after the time zone has changed. Also there's a bug in the
haskell time library which makes it return old time zone data after a time
zone change. (I just finished developing a fix for that bug..)
Left with nothing but a few sticks, I rubbed them together, and
actually found a way to deal with this problem too. Scary details in
[[bugs/Windows_file_timestamp_timezone_madness]]. While I've implemented
it, it's stuck on a branch until I find a way to make git-annex notice when
the timezone changes while it's running.
----
Today's work was sponsored by Svenne Krap.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 4"
date="2014-06-11T19:56:49Z"
content="""
Running git-annex will also register it on OSX. The registration just consists of making a ~/.ssh/git-annex-shell that runs the real git-annex-shell. The assistant detects when it needs to use that wrapper when setting up a repository.
"""]]