From 195c403ae3c112b6c1792b2620e784ce2bcfcef4 Mon Sep 17 00:00:00 2001
From: "http://joeyh.name/"
Date: Fri, 1 Aug 2014 22:16:29 +0000
Subject: [PATCH 01/21] removed

---
 ...t_21_5c11e69c28b9ed4cbe238a36c0839a47._comment | 15 ---------------
 1 file changed, 15 deletions(-)
 delete mode 100644 doc/special_remotes/comment_21_5c11e69c28b9ed4cbe238a36c0839a47._comment

diff --git a/doc/special_remotes/comment_21_5c11e69c28b9ed4cbe238a36c0839a47._comment b/doc/special_remotes/comment_21_5c11e69c28b9ed4cbe238a36c0839a47._comment
deleted file mode 100644
index 1645e03e68..0000000000
--- a/doc/special_remotes/comment_21_5c11e69c28b9ed4cbe238a36c0839a47._comment
+++ /dev/null
@@ -1,15 +0,0 @@
-[[!comment format=mdwn
- username="http://joeyh.name/"
- ip="209.250.56.64"
- subject="comment 21"
- date="2013-11-24T15:58:30Z"
- content="""
-@Bence the closest I have is some tests of particular special remotes inside Test.hs. The shell equivilant of that code is:
-
-[[!format sh \"\"\"
-set -e
-git annex copy file --to remote # tests store
-git annex drop file # tests checkpresent when remote has file
-git annex move file --from remote # tests retrieve and remove
-\"\"\"]]
-"""]]

From 25ac3a4c4ad72ca5acca02dab84ad98de0578b98 Mon Sep 17 00:00:00 2001
From: "https://id.koumbit.net/anarcat"
Date: Sat, 2 Aug 2014 02:46:07 +0000
Subject: [PATCH 02/21] some progress. maybe.

---
 ...usability:_what_are_those_arrow_things__63__.mdwn | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn b/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn
index ac83430408..cde7d8d505 100644
--- a/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn
+++ b/doc/forum/usability:_what_are_those_arrow_things__63__.mdwn
@@ -1,6 +1,8 @@
+# Introduction
+
 i want to relate a usability story that happens fairly regularly when I show git-annex to people. the story goes like this.
----- +# The story Antoine sat down at his computer saying, "i have this great movie collection I want to share with you, my friend, because the fair use provisions allow for that, and I use this great git-annex tool that allows me to sync my movie collection between different places". His friend Charlie, a Linux user only vaguely familiar with the internals of how his operating system or legal system actually works, reads this as "yay free movies" and wholeheartedly agrees to lend himself to the experiment. @@ -10,7 +12,7 @@ Charlie logs into Antoine's computer, named `marcos`. Antoine shows Charlie wher Antoine then has no solution but to convert the git-annex repository into direct mode, something which takes a significant amount of time and is actually [[designated as "untrusted"|direct_mode]] in the documentation. In fact, so much so that he actually did [[screw up his repository magnificently|bugs/direct_command_leaves_repository_inconsistent_if_interrupted]] because he freaked out when `git-annex direct` started and interrupted it because he tought it would take too long. ----- +# Technical analysis Now I understand it is not necessarily `git-annex`'s responsability if Thunar (or Nautilus, for that matter), doesn't know how to properly deal with symlinks (hint: just dereference the damn thing already). Maybe I should file a bug about this against thunar? I also understand that symlinks are useful to ensure the security of the data hosted in `git-annex`, and that I could have used direct mode in the first place. But I like to track changes in git to those files, and direct mode makes that really difficult. @@ -19,3 +21,9 @@ I didn't file this as a bug because I want to start the conversation, but maybe (The other being "how do i actually use git annex to sync those files instead of just copying them by hand", but that's for another story!) 
-- [[anarcat]] + +# Followup + +Here is a bug report filed against Thunar, with a patch to fix this behavior: https://bugzilla.xfce.org/show_bug.cgi?id=11065 + +Similar bugs would need to be filed against Nautilus, at the very least, but probably other file managers, which makes this task a little daunting, to say the least. -- [[anarcat]] From 670f4d43e3e76bd42fea4d5e725997c97a945f4e Mon Sep 17 00:00:00 2001 From: zardoz Date: Sat, 2 Aug 2014 12:21:38 +0000 Subject: [PATCH 03/21] --- ...r_depending_on_working_dir_during_add.mdwn | 57 +++++++++++++++++++ 1 file changed, 57 insertions(+) create mode 100644 doc/bugs/WORM_keys_differ_depending_on_working_dir_during_add.mdwn diff --git a/doc/bugs/WORM_keys_differ_depending_on_working_dir_during_add.mdwn b/doc/bugs/WORM_keys_differ_depending_on_working_dir_during_add.mdwn new file mode 100644 index 0000000000..e412201141 --- /dev/null +++ b/doc/bugs/WORM_keys_differ_depending_on_working_dir_during_add.mdwn @@ -0,0 +1,57 @@ +### Please describe the problem. + +While the docs say that WORM keys are a function of a files basename, +when doing «git annex add .», the generated keys will actually contain +the relative path (with slashes escaped). Not sure whether this is by +design or a bug in its own right. I suppose that to minimize the chance +of collisions on WORM, having the path within the key is preferable. + +A problem about this, however, is that the path in the key is not +stable, but varies with the working dir when doing the «git annex +add». So, when a file is added from one working dir (say, the repo +base), later unlocked, and readded from another working dir (say, +somewhere below the repo base), this will generate a different key +even when the file has not been touched. + +Is there a rationale for this variability, or should «add» canonicalize +the encoded paths to the repo root? + + +### What steps will reproduce the problem? 
+
+
+[[!format sh """
+
+# Init
+$ git init /tmp/foo
+$ cd /tmp/foo && git annex init
+
+$ mkdir baz
+$ touch baz/quux
+
+# Add file with working dir at repo root.
+$ git annex add --backend=WORM baz
+$ git commit -m "first"
+
+# Key includes relative path.
+$ readlink baz/quux
+../.git/annex/objects/8x/8V/WORM-s0-m1406981486--baz%quux/WORM-s0-m1406981486--baz%quux
+
+# Unlock and readd with working dir at path below repo root.
+$ cd baz
+$ git annex unlock quux
+
+$ git annex add quux
+$ git com -m "second"
+
+# Relative path is anchored to working dir instead of repo root.
+$ readlink quux
+../.git/annex/objects/9G/72/WORM-s0-m1406981486--quux/WORM-s0-m1406981486--quux
+
+# End of transcript or log.
+"""]]
+
+### What version of git-annex are you using? On what operating system?
+Linux 3.15.8
+
+git-annex 5.20140716

From 5e2dd2ea865562fae4eb99e1820a8c632e5bb723 Mon Sep 17 00:00:00 2001
From: zardoz
Date: Sat, 2 Aug 2014 14:29:26 +0000
Subject: [PATCH 04/21] Added a comment

---
 .../comment_2_975edca7ec87158216d9e106903dfb48._comment | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 doc/todo/wishlist:_annex.largefiles_support_for_mimetypes/comment_2_975edca7ec87158216d9e106903dfb48._comment

diff --git a/doc/todo/wishlist:_annex.largefiles_support_for_mimetypes/comment_2_975edca7ec87158216d9e106903dfb48._comment b/doc/todo/wishlist:_annex.largefiles_support_for_mimetypes/comment_2_975edca7ec87158216d9e106903dfb48._comment
new file mode 100644
index 0000000000..ba88bfb7c6
--- /dev/null
+++ b/doc/todo/wishlist:_annex.largefiles_support_for_mimetypes/comment_2_975edca7ec87158216d9e106903dfb48._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="zardoz"
+ ip="78.48.163.229"
+ subject="comment 2"
+ date="2014-08-02T14:29:26Z"
+ content="""
+This could be achieved in a generic way by allowing filter binaries in expressions, which are run on the filename and return 0 or 1.
+"""]] From 49db204da4f72bc2355b330d9d2da262f0569532 Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 20:10:12 +0000 Subject: [PATCH 05/21] --- ...to_repair_hangs__44___broken_symlinks.mdwn | 41 +++++++++++++++++++ 1 file changed, 41 insertions(+) create mode 100644 doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn new file mode 100644 index 0000000000..8688820cf9 --- /dev/null +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -0,0 +1,41 @@ +Sorry that I put all this in the same thread but I don't know what happened and how it is related. + +I have just a simple setup: git-annex client with assistant (Windows 7) and on a server (Debian). + +Suddenly weird things started to happen + +1.) On Windows, when I start the assistant, it writes "Attempting to repair THINKTANK:c:\data\annex [here]" but it runs forever and never stops + +2.) On Windows, when I get "Pusher crashed: failed to read sha from git write-tree [Restart Thread]". When I click "Restart Thread" nothing happens but the message from (1) persists. + +3.) When I run "git annex fsck" on the client I get thousands of messages like + fsck Fotos/2014/DSC_0303.JPG + ** No known copies exist of Fotos/2014/DSC_0303.JPG + failed + +Here the same: + + $ git annex whereis "Fotos/2014/DSC_0303.JPG" + whereis Fotos/2014/DSC_0303.JPG (0 copies) failed + git-annex: whereis: 1 failed + +4.) On the server, files that should ALWAYS be on the server (configured as "full backup") suddenly wiped data that was also made available on the client. The symlinks are dangling symlinks and contain just binary data: + + ls -l + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0011.JPG -> ???? + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0012.JPG -> ???? + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0013.JPG -> ???? 
+ lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0014.JPG -> ???? + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0015.JPG -> ???? + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0018.JPG -> ???? + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0019.JPG -> ???? + lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0020.JPG -> ???? + +5.) "git annex fsck" on the server is still successful, returning no errors! + +6.) Manually executing "git annex sync --content" on both sides does not change anything and does not output any error messages. + +Oh no, so desparate :-( Any ideas? + +(I am happy to share all log files privately but I do not want to publish them here because they contain sensitive data) + From 7d015c8eef404ccae30269ad8911dc09da9aba56 Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 20:12:42 +0000 Subject: [PATCH 06/21] --- ...4___attempt_to_repair_hangs__44___broken_symlinks.mdwn | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn index 8688820cf9..56666a49a9 100644 --- a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -19,7 +19,9 @@ Here the same: whereis Fotos/2014/DSC_0303.JPG (0 copies) failed git-annex: whereis: 1 failed -4.) On the server, files that should ALWAYS be on the server (configured as "full backup") suddenly wiped data that was also made available on the client. The symlinks are dangling symlinks and contain just binary data: +4.) When I do "git annex status" a whole bunch of files are displayed with "M" (modified) although they are not, they are not even checked out and should be only at the server ... + +5.) On the server, files that should ALWAYS be on the server (configured as "full backup") suddenly wiped data that was also made available on the client. 
The symlinks are dangling symlinks and contain just binary data: ls -l lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0011.JPG -> ???? @@ -31,9 +33,9 @@ Here the same: lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0019.JPG -> ???? lrwxrwxrwx 1 4 Aug 2 08:55 DSC_0020.JPG -> ???? -5.) "git annex fsck" on the server is still successful, returning no errors! +6.) "git annex fsck" on the server is still successful, returning no errors! -6.) Manually executing "git annex sync --content" on both sides does not change anything and does not output any error messages. +7.) Manually executing "git annex sync --content" on both sides does not change anything and does not output any error messages. Oh no, so desparate :-( Any ideas? From 84b5d1241c77fdf6d96d299e7daf41668719f6e3 Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 20:17:05 +0000 Subject: [PATCH 07/21] --- ...t_to_repair_hangs__44___broken_symlinks.mdwn | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn index 56666a49a9..f25041165d 100644 --- a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -37,6 +37,23 @@ Here the same: 7.) Manually executing "git annex sync --content" on both sides does not change anything and does not output any error messages. +8.) On the client: + + $ git annex group here + error: invalid object 100644 3b3767ae65e5c6d2e3835af3d55fbf2f9e145c8b for '000/0e6/SHA256Es193806--b6d4689fba8e15acd6497f9a7e584c93ea0c8c2199ad32eadac79d59b9f49814.JPG.log' + fatal: git-write-tree: error building trees + manual + (Recording state in git...) 
+ git-annex: failed to read sha from git write-tree + + $ git annex wanted here + error: invalid object 100644 3b3767ae65e5c6d2e3835af3d55fbf2f9e145c8b for '000/0e6/SHA256Es193806--b6d4689fba8e15acd6497f9a7e584c93ea0c8c2199ad32eadac79d59b9f49814.JPG.log' + fatal: git-write-tree: error building trees + exclude="*" and present + git-annex: failed to read sha from git write-tree + + + Oh no, so desparate :-( Any ideas? (I am happy to share all log files privately but I do not want to publish them here because they contain sensitive data) From a8b0311d8af4301fa24c212214125d820efb92c2 Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 20:23:10 +0000 Subject: [PATCH 08/21] --- ...attempt_to_repair_hangs__44___broken_symlinks.mdwn | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn index f25041165d..91fbfbc7bb 100644 --- a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -52,6 +52,17 @@ Here the same: exclude="*" and present git-annex: failed to read sha from git write-tree +9.) Ok I don't know what happened I did nothing special but it seems that the repository is broken :( :( + + git annex repair + Running git fsck ... + git-annex: DeleteFile "C:\\Data\\annex\\.git\\objects\\2a\\54bb281c80c91ea7a732c0d48db0c5acc0ca2c": permission denied (Access is denied.) + failed + git-annex: repair: 1 failed + +But that's not true, I can read and delete that file manually + + Oh no, so desparate :-( Any ideas? 
From 9973dc85e620788a9372de89bd41d5385ac6b5f6 Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 20:24:15 +0000 Subject: [PATCH 09/21] --- ...shed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn index 91fbfbc7bb..f83a85ff43 100644 --- a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -9,6 +9,7 @@ Suddenly weird things started to happen 2.) On Windows, when I get "Pusher crashed: failed to read sha from git write-tree [Restart Thread]". When I click "Restart Thread" nothing happens but the message from (1) persists. 3.) When I run "git annex fsck" on the client I get thousands of messages like + fsck Fotos/2014/DSC_0303.JPG ** No known copies exist of Fotos/2014/DSC_0303.JPG failed From 668e67719a1f4ada1016cd378cf123c438611828 Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 20:29:06 +0000 Subject: [PATCH 10/21] --- ...4___attempt_to_repair_hangs__44___broken_symlinks.mdwn | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn index f83a85ff43..10fd5d6aaf 100644 --- a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -55,13 +55,17 @@ Here the same: 9.) Ok I don't know what happened I did nothing special but it seems that the repository is broken :( :( - git annex repair + $ git annex --verbose --debug repair + [...] 
+ [2014-08-02 13:27:38 Pacific Daylight Time] read: git ["--git-dir=C:\\Data\\annex\\.git","--work-tree=C:\\Data\\annex","-c","core.bare=false","show","ef3fe549f457783dbbd877b467b4e54b0ebc813c"] Running git fsck ... + git-annex: DeleteFile "C:\\Data\\annex\\.git\\objects\\2a\\54bb281c80c91ea7a732c0d48db0c5acc0ca2c": permission denied (Access is denied.) failed git-annex: repair: 1 failed -But that's not true, I can read and delete that file manually +But this file exists, I can read, write and delete to this file manually, there is definitely no permission denied ... + From 8fbdcd7d9f030e821b1981056947ee72e6b9c95d Mon Sep 17 00:00:00 2001 From: divB Date: Sat, 2 Aug 2014 22:57:48 +0000 Subject: [PATCH 11/21] --- ...44___attempt_to_repair_hangs__44___broken_symlinks.mdwn | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn index 10fd5d6aaf..4c1bcc129b 100644 --- a/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn +++ b/doc/forum/Pusher_crashed__44___attempt_to_repair_hangs__44___broken_symlinks.mdwn @@ -1,6 +1,6 @@ Sorry that I put all this in the same thread but I don't know what happened and how it is related. -I have just a simple setup: git-annex client with assistant (Windows 7) and on a server (Debian). +I have just a simple setup: git-annex client with assistant (Windows 7) and on a server (Debian, no assistant). Suddenly weird things started to happen @@ -68,9 +68,10 @@ But this file exists, I can read, write and delete to this file manually, there - - Oh no, so desparate :-( Any ideas? +As it seems the client repository is broken but how can it be then that also files on the server repository get deleted which shouldn't be deleted? 
+And how can it be that there are not only broken symlinks but symlinks that have just binary garbage as target and "fsck" returns success? + (I am happy to share all log files privately but I do not want to publish them here because they contain sensitive data) From 3f8605870e8d46c81595d6fb3fb40208fbe45caa Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sat, 2 Aug 2014 19:05:16 -0400 Subject: [PATCH 12/21] devblog --- doc/devblog/day_209__mass_conversion.mdwn | 28 +++++++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 doc/devblog/day_209__mass_conversion.mdwn diff --git a/doc/devblog/day_209__mass_conversion.mdwn b/doc/devblog/day_209__mass_conversion.mdwn new file mode 100644 index 0000000000..6278b191a4 --- /dev/null +++ b/doc/devblog/day_209__mass_conversion.mdwn @@ -0,0 +1,28 @@ +Have started converting lots of special remotes to the new API. Today, S3 +and hook got chunking support. I also converted several remotes to the new +API without supporting chunking: bup, ddar, and glacier (which should +support chunking, but there were complications). + +This removed 110 lines of code while adding features! And, +I seem to be able to convert them faster than `testremote` can test them. :) + +Now that S3 supports chunks, they can be used to work around several +problems with S3 remotes, including file size limits, and a memory leak in +the underlying S3 library. + +The S3 conversion included caching of the S3 connection when +storing/retrieving chunks. But the API doesn't yet support caching +when removing or checking if chunks are present. I should probably expand +the API, but got into some type checker messes when using generic enough +data types to support everything. Should probably switch to `ResourceT`. + +Also, I tried, but failed to make `testremote` check that storing a key +is done atomically. 
The best I could come up with was a test that stored a
+key and had another thread repeatedly check if the object was present on
+the remote, logging the results and timestamps. It then becomes a
+statistical problem -- somewhere toward the end of the log it's ok if the key
+has become present -- but too early might indicate that it wasn't stored
+atomically. Perhaps it's my poor knowledge of statistics, but I could not
+find a way to analize the log that reliably detected non-atomic storage.
+If someone would like to try to work on this, see the `atomic-store-test`
+branch.

From 2ca8ddca0f893ebe53e82c7a66d1fe328da584f2 Mon Sep 17 00:00:00 2001
From: "http://joeyh.name/"
Date: Sat, 2 Aug 2014 23:08:44 +0000
Subject: [PATCH 13/21] Added a comment

---
 ...mment_3_5e9cecb0e2ec7602963406779b6e3c1f._comment | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 doc/bugs/S3_memory_leaks/comment_3_5e9cecb0e2ec7602963406779b6e3c1f._comment

diff --git a/doc/bugs/S3_memory_leaks/comment_3_5e9cecb0e2ec7602963406779b6e3c1f._comment b/doc/bugs/S3_memory_leaks/comment_3_5e9cecb0e2ec7602963406779b6e3c1f._comment
new file mode 100644
index 0000000000..a7bb0265fe
--- /dev/null
+++ b/doc/bugs/S3_memory_leaks/comment_3_5e9cecb0e2ec7602963406779b6e3c1f._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.112"
+ subject="comment 3"
+ date="2014-08-02T23:08:44Z"
+ content="""
+hS3's author seems to have abandoned it and it has other problems. I should try to switch to a different S3 library.
+
+There is now a workaround; S3 special remotes can be configured to use [[chunking]]. A max of one chunk will then be buffered in memory at a time.
+
+For example, to reconfigure an existing mys3 remote: `enableremote mys3 chunk=1MiB`
+"""]]

From 51f774f8c8082221ed1548eba76382ce6444a841 Mon Sep 17 00:00:00 2001
From: "http://joeyh.name/"
Date: Sat, 2 Aug 2014 23:13:41 +0000
Subject: [PATCH 14/21] Added a comment

---
 ...ent_3_d878b87a05f4fcd380e6ff309b615aab._comment | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 doc/bugs/S3_upload_not_using_multipart/comment_3_d878b87a05f4fcd380e6ff309b615aab._comment

diff --git a/doc/bugs/S3_upload_not_using_multipart/comment_3_d878b87a05f4fcd380e6ff309b615aab._comment b/doc/bugs/S3_upload_not_using_multipart/comment_3_d878b87a05f4fcd380e6ff309b615aab._comment
new file mode 100644
index 0000000000..46245e657c
--- /dev/null
+++ b/doc/bugs/S3_upload_not_using_multipart/comment_3_d878b87a05f4fcd380e6ff309b615aab._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.112"
+ subject="comment 3"
+ date="2014-08-02T23:13:41Z"
+ content="""
+There is now a workaround; S3 special remotes can be configured to use [[chunking]].
+
+For example, to reconfigure an existing mys3 remote: `enableremote mys3 chunk=1MiB`
+
+I'm leaving this bug open because chunking is not the default (although the assistant does enable it by default), and because this chunking operates at a higher, and less efficient level than S3's own multipart upload API. In particular, AWS will charge a fee for each http request made for a chunk.
+
+Adding proper multipart support will probably require switching to a different S3 library.
+"""]]

From 42655caacc1ebec030e0db36332ce0f12f95050c Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Sat, 2 Aug 2014 19:16:35 -0400
Subject: [PATCH 15/21] correction

---
 doc/devblog/day_209__mass_conversion.mdwn | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/devblog/day_209__mass_conversion.mdwn b/doc/devblog/day_209__mass_conversion.mdwn
index 6278b191a4..23de65b2ac 100644
--- a/doc/devblog/day_209__mass_conversion.mdwn
+++ b/doc/devblog/day_209__mass_conversion.mdwn
@@ -11,7 +11,11 @@ problems with S3 remotes, including file size limits, and a memory leak in
 the underlying S3 library.
 
 The S3 conversion included caching of the S3 connection when
-storing/retrieving chunks. But the API doesn't yet support caching
+storing/retrieving chunks. [Update: Actually, it turns out it didn't;
+the hS3 library doesn't support persistent connections. Another reason I
+need to switch to a better S3 library!]
+
+But the API doesn't yet support caching
 when removing or checking if chunks are present. I should probably expand
 the API, but got into some type checker messes when using generic enough
 data types to support everything. Should probably switch to `ResourceT`.

From 116f70555e35df440bea6c91460ef8bfcfa9c63f Mon Sep 17 00:00:00 2001
From: "https://www.google.com/accounts/o8/id?id=AItOawmkuFJVGp6WVvJtIV5JYb8IqN8mRvSGQdI"
Date: Sun, 3 Aug 2014 01:18:54 +0000
Subject: [PATCH 16/21] Added a comment: Would you accept a patch?
---
 ...ment_5_a4ab4173620b72ac0a24d575fa9c810c._comment | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100644 doc/forum/Using_git-annex_as_a_library/comment_5_a4ab4173620b72ac0a24d575fa9c810c._comment

diff --git a/doc/forum/Using_git-annex_as_a_library/comment_5_a4ab4173620b72ac0a24d575fa9c810c._comment b/doc/forum/Using_git-annex_as_a_library/comment_5_a4ab4173620b72ac0a24d575fa9c810c._comment
new file mode 100644
index 0000000000..70f2ad9353
--- /dev/null
+++ b/doc/forum/Using_git-annex_as_a_library/comment_5_a4ab4173620b72ac0a24d575fa9c810c._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmkuFJVGp6WVvJtIV5JYb8IqN8mRvSGQdI"
+ nickname="Emilio Jesús"
+ subject="Would you accept a patch?"
+ date="2014-08-03T01:18:54Z"
+ content="""
+Dear Joey,
+
+I am also interested in using git-annex as a Haskell library, would you accept a patch to the .cabal file then?
+
+Thanks,
+Emilio
+"""]]

From ef0404f5d4a5d1ca9e678d9b8cdc16fbfea4f74b Mon Sep 17 00:00:00 2001
From: zardoz
Date: Sun, 3 Aug 2014 15:25:47 +0000
Subject: [PATCH 17/21]

---
 ...ate_entries_in_location_tracking_logs.mdwn | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn

diff --git a/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn b/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn
new file mode 100644
index 0000000000..ba7526ed2e
--- /dev/null
+++ b/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn
@@ -0,0 +1,21 @@
+I’ve noticed something odd when inspecting the history of the
+git-annex branch today. Apparently, the branch had some merge
+conflicts during sync that involved two alternative location tracking
+entries that both were for one and the same remote.
Both entries only
+differed in their timestamps, and the union merge kept both, so that I
+now have .log files in the annex branch that contain duplicate parts
+like this.
+
+1404838274.151066s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad
+1406978406.24838s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad
+
+The UUID here is my local repository.
+
+The duplication also occurred in the uuid.log:
+
+4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404839228.113473s
+4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404847241.863051s
+
+Is this something to be concerned about? The situation somehow arose
+in relation to unannexing a bunch of files and rebasing the master
+branch.

From 699a0a3bf8051549ebb4b0962a3ea402ea5e9020 Mon Sep 17 00:00:00 2001
From: "http://joeyh.name/"
Date: Sun, 3 Aug 2014 18:22:58 +0000
Subject: [PATCH 18/21] Added a comment

---
 .../comment_4_09a3372fd13734cbb05e79d0ba76d052._comment | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 doc/bugs/S3_upload_not_using_multipart/comment_4_09a3372fd13734cbb05e79d0ba76d052._comment

diff --git a/doc/bugs/S3_upload_not_using_multipart/comment_4_09a3372fd13734cbb05e79d0ba76d052._comment b/doc/bugs/S3_upload_not_using_multipart/comment_4_09a3372fd13734cbb05e79d0ba76d052._comment
new file mode 100644
index 0000000000..d366281637
--- /dev/null
+++ b/doc/bugs/S3_upload_not_using_multipart/comment_4_09a3372fd13734cbb05e79d0ba76d052._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.112"
+ subject="comment 4"
+ date="2014-08-03T18:22:58Z"
+ content="""
+The aws library does not support multipart yet either; here's the bug report requesting it:
+"""]]

From ac166a898ece3ab2aabcbcf826a9dcff07d2d6e6 Mon Sep 17 00:00:00 2001
From: "http://joeyh.name/"
Date: Sun, 3 Aug 2014 18:27:32 +0000
Subject: [PATCH 19/21] Added a comment

---
 .../comment_5_5add65b5b284f79ec09ee4d0326e7132._comment | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644
doc/bugs/S3_upload_not_using_multipart/comment_5_5add65b5b284f79ec09ee4d0326e7132._comment

diff --git a/doc/bugs/S3_upload_not_using_multipart/comment_5_5add65b5b284f79ec09ee4d0326e7132._comment b/doc/bugs/S3_upload_not_using_multipart/comment_5_5add65b5b284f79ec09ee4d0326e7132._comment
new file mode 100644
index 0000000000..0c77423649
--- /dev/null
+++ b/doc/bugs/S3_upload_not_using_multipart/comment_5_5add65b5b284f79ec09ee4d0326e7132._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.112"
+ subject="comment 5"
+ date="2014-08-03T18:27:32Z"
+ content="""
+However, I don't think that multipart upload actually allows exceeding the S3 limit of 5 GB per object. Configuring the remote with `chunk=100MiB` *does* allow bypassing whatever S3's maximum object size happens to be.
+"""]]

From 63c00daa080230aab25c34e91fed89e1d25d677a Mon Sep 17 00:00:00 2001
From: "http://joeyh.name/"
Date: Sun, 3 Aug 2014 18:40:26 +0000
Subject: [PATCH 20/21] Added a comment

---
 .../comment_4_37e41b518813bd7c349017abf4a0ca0f._comment | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 doc/bugs/S3_memory_leaks/comment_4_37e41b518813bd7c349017abf4a0ca0f._comment

diff --git a/doc/bugs/S3_memory_leaks/comment_4_37e41b518813bd7c349017abf4a0ca0f._comment b/doc/bugs/S3_memory_leaks/comment_4_37e41b518813bd7c349017abf4a0ca0f._comment
new file mode 100644
index 0000000000..464588a793
--- /dev/null
+++ b/doc/bugs/S3_memory_leaks/comment_4_37e41b518813bd7c349017abf4a0ca0f._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="209.250.56.112"
+ subject="comment 4"
+ date="2014-08-03T18:40:26Z"
+ content="""
+Beginning work on a `s3-aws` branch using the aws library instead of hS3.
+"""]]

From c648548e1fa9dd2f972accee5db5d2353c088384 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Sun, 3 Aug 2014 14:56:40 -0400
Subject: [PATCH 21/21] formatting

---
 doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn b/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn
index ba7526ed2e..f5f1354440 100644
--- a/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn
+++ b/doc/forum/Duplicate_entries_in_location_tracking_logs.mdwn
@@ -6,15 +6,19 @@ differed in their timestamps, and the union merge kept both, so that I
 now have .log files in the annex branch that contain duplicate parts
 like this.
 
+
 1404838274.151066s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad
 1406978406.24838s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad
+
 The UUID here is my local repository.
 
 The duplication also occurred in the uuid.log:
 
+
 4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404839228.113473s
 4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404847241.863051s
+
 Is this something to be concerned about? The situation somehow arose
 in relation to unannexing a bunch of files and rebasing the master