From f78da404e594f04c9b148180e3c77749452f7fbc Mon Sep 17 00:00:00 2001 From: zardoz Date: Mon, 18 Aug 2014 20:54:11 +0000 Subject: [PATCH] Added a comment --- ...t_5_8378cd8c03fabdaa300194b66c1ea53c._comment | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 doc/bugs/Possible_data-loss_if_WORM_keys_do_not_encode_relative_paths/comment_5_8378cd8c03fabdaa300194b66c1ea53c._comment diff --git a/doc/bugs/Possible_data-loss_if_WORM_keys_do_not_encode_relative_paths/comment_5_8378cd8c03fabdaa300194b66c1ea53c._comment b/doc/bugs/Possible_data-loss_if_WORM_keys_do_not_encode_relative_paths/comment_5_8378cd8c03fabdaa300194b66c1ea53c._comment new file mode 100644 index 0000000000..72bc4f8f4c --- /dev/null +++ b/doc/bugs/Possible_data-loss_if_WORM_keys_do_not_encode_relative_paths/comment_5_8378cd8c03fabdaa300194b66c1ea53c._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="zardoz" + ip="78.48.163.229" + subject="comment 5" + date="2014-08-18T20:54:10Z" + content=""" +> True of a single filesystem, but not of a set of connected git repositories. + +That’s a good point. Might depend on the use case, though. + +> The probabilities of these seem low, but perhaps not as low as the probability that there will be two differing files with the same name+size+mtime in the first place. + +This one I’m not completely sure about. E.g., I have an annex with web pages mirrored from the web. Due to the crawler implementation, there are lots of «index.html» or «favicon.ico» with the same mtime (in particular when mtime is read with a 1 sec. precision). Files like favicon are often bitmaps of the same resolution and often have the same size due to this. Because there are file-formats where both size and basename are semantically pre-determined, there is zero entropy from these sources alone (also cf. «readme.txt»). The entropy of mtime alone is not really large, I suppose, and in some use-cases will also approach zero (think «initializing a repo by cp -r on a fast disk without preserving mtime). The relative path could make a huge difference there. I believe this argument is actually what worried me the most. Does it seem valid? + +Apart from entropy, there’s the non-probabilistic advantage we discussed (granted, with some limiting constraints which one has to assure for oneself). Granted, one might argue a hash would be the better way, but this is not always practical in every setup. +"""]]