diff --git a/doc/bugs/Assistant_doesn__39__t_sync_directory_remotes.mdwn b/doc/bugs/Assistant_doesn__39__t_sync_directory_remotes.mdwn new file mode 100644 index 0000000000..3be92bdb03 --- /dev/null +++ b/doc/bugs/Assistant_doesn__39__t_sync_directory_remotes.mdwn @@ -0,0 +1,34 @@ +### Please describe the problem. + +I have set up a repository with assistant. Then, within it, I ran: + +``` +git annex initremote source type=directory directory=... importtree=yes encryption=none +git annex enableremote source type=directory directory=... +git config remote.source.annex-readonly true +git config remote.source.annex-tracking-branch main:data +git annex import main:data --from source +``` + +At this point, git annex sync will (usually) sync this. + +### What steps will reproduce the problem? + +There are two problems. + +1. The assistant will never sync this, no matter what I do. I can request a manual sync of either the repo or the remote, and neither does anything. +2. It appears that the assistant is creating a locking race with the CLI. For instance, I got `fatal: Unable to create '.git/index.lock': .git/index.lock: openFd: already exists (File exists)` with one run of `git annex sync`, but the run of it before and after worked fine. + +When there isn't a race with `git annex sync`, it behaves as desired. + +### What version of git-annex are you using? On what operating system? + +8.20210223-2 on Debian bullseye. + +### Please provide any additional information below. + +I will probably disable assistant + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I'm pretty excited about using this approach to help archive some photos and such! 
diff --git a/doc/bugs/Broken_symlinks_in_directory_remote_causes_crash.mdwn b/doc/bugs/Broken_symlinks_in_directory_remote_causes_crash.mdwn new file mode 100644 index 0000000000..57489b4ad5 --- /dev/null +++ b/doc/bugs/Broken_symlinks_in_directory_remote_causes_crash.mdwn @@ -0,0 +1,33 @@ +### Please describe the problem. + +I have a directory remote with importtree=yes. In that remote, I have some symlinks that are broken. (Long story; this is a file server and they work on the system that has mounted them, but are broken here.) + +### What steps will reproduce the problem? + +I've added it with `git config remote.source.annex-tracking-branch main:$REPO`. When I run `git annex sync`, I get: + +``` +commit +On branch adjusted/main(unlocked) +nothing to commit, working tree clean +ok +list source +git-annex: Unable to list contents of source: [redacted]: getFileStatus: does not exist (No such file or directory) +failed +git-annex: sync: 1 failed +``` + +### What version of git-annex are you using? On what operating system? + +8.20210223-2 on Debian + +### Please provide any additional information below. + +I would like git-annex to either: + +1. Store the symlink as a symlink, or +2. Ignore bad symlinks + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Loading in other parts of my photo collection as we speak! diff --git a/doc/bugs/Files_recorded_with_other_file__39__s_checksums.mdwn b/doc/bugs/Files_recorded_with_other_file__39__s_checksums.mdwn new file mode 100644 index 0000000000..e74f155d8e --- /dev/null +++ b/doc/bugs/Files_recorded_with_other_file__39__s_checksums.mdwn @@ -0,0 +1,100 @@ +### Please describe the problem. + +I have a special remote "source" set up as type=directory importtree=yes. + +I pulled it into one repo in which none of the files were wanted. So far so good. I cloned that repo to a second, archive, repo. 
git annex sync worked (but took 2 hours). Then I ran `git annex get --auto`, and most of the files came through OK. But some said things like this: + +``` +get Pictures/.dtrash/info/IMG_2979_v1-e9dced7b.dtrashinfo (from source...) (checksum...) + verification of content failed + + Unable to access these remotes: source + + No other repository is known to contain the file. +failed +``` + +(Both repos are unlocked via adjust --unlock) + +Upon looking at the file in the repo where it wasn't wanted, I saw this: + +``` +$ cat IMG_2979_v1-e9dced7b.dtrashinfo +/annex/objects/SHA256E-s144--cec5c7b6a9d97344e374e8395e02b74350678147ff65d6df091f5115cf18bf72 +``` + +Interesting. So, in the source directory: + +``` +$ sha256sum IMG_2979_v1-e9dced7b.dtrashinfo +aca3ed7243def7a0bd5fcad542c66841b8e7d2a670b4cafe749eb27e032d8975 IMG_2979_v1-e9dced7b.dtrashinfo +``` + +That's not a match at all. Well, OK then: + +``` +$ sha256sum * | grep cec5c7 +cec5c7b6a9d97344e374e8395e02b74350678147ff65d6df091f5115cf18bf72 IMG_2981_v1-5fc99c7a.dtrashinfo +``` + +Yikes. So for IMG_2979_v1-e9dced7b.dtrashinfo, git-annex recorded a checksum that belonged to IMG_2981_v1-5fc99c7a.dtrashinfo. Well then, what is this other file recorded as, back in the git-annex repo? + +``` +$ cat IMG_2981_v1-5fc99c7a.dtrashinfo +/annex/objects/SHA256E-s144--cec5c7b6a9d97344e374e8395e02b74350678147ff65d6df091f5115cf18bf72 +``` + +OK, so two files that were not identical in the source directory got recorded with an identical checksum in git-annex somehow. And when I attempted to retrieve them via `git annex get --auto`, the mismatch was at least detected there. + +In this .dtrash/info directory, 436 files out of 719 were not loaded by `git annex get`, presumably due to this issue. + +In this directory, the source files ranged in size from 140 to 227 bytes. + +In a companion directory, .dtrash/files, 24 out of 719 files exhibited this issue. These files tended to be larger, but one that was 495MB triggered it also. 
+ +I have not yet seen it outside .dtrash, but it will be many hours until this get process completes fully, as it needs to copy about 1TB of data. + +In case you are wondering if there is a race condition with .dtrash: no. The only application that writes to it isn't running, and the last time a file was modified in there was over a year ago. Also the content of the .info files is just JSON and a corresponding filename embedded in them, so it is very clear that the files on the filesystem are correct and the calculated checksums at issue here were never correct. + +### What steps will reproduce the problem? + +I have laid that out as best I can above. + +### What version of git-annex are you using? On what operating system? + +8.20210223 on Debian + +### Please provide any additional information below. + +Assistant is not being used. + +Setup: + +``` +REPO=Pictures +cd /acrypt/git-annex/repos +mkdir $REPO +cd $REPO +git init +git config annex.thin true +git annex init 'local hub' +git annex wanted . "include=* and exclude=$REPO/*" + +# Now initialize things. +touch mtree +git annex add mtree +git annex sync +git annex adjust --unlock + +git annex initremote source type=directory directory=/acrypt/git-annex/bind-ro/$REPO importtree=yes encryption=none + +git annex enableremote source directory=/acrypt/git-annex/bind-ro/$REPO +git config remote.source.annex-readonly true +git config remote.source.annex-tracking-branch main:$REPO +git config annex.securehashesonly true +git config annex.genmetadata true +git config annex.diskreserve 100M +``` + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + diff --git a/doc/bugs/Files_recorded_with_other_file__39__s_checksums/comment_1_05e3e33e1ca2361546dbe08c6bd476d6._comment b/doc/bugs/Files_recorded_with_other_file__39__s_checksums/comment_1_05e3e33e1ca2361546dbe08c6bd476d6._comment new file mode 100644 index 0000000000..68ab41bb32 --- /dev/null +++ b/doc/bugs/Files_recorded_with_other_file__39__s_checksums/comment_1_05e3e33e1ca2361546dbe08c6bd476d6._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jgoerzen" + avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4" + subject="comment 1" + date="2022-09-05T01:14:53Z" + content=""" +Update: This also occurred in other directories, with some video files from 2018. One directory contains 1945 files, and another 21 files. I'm not finding an obvious pattern to the issue. +"""]] diff --git a/doc/bugs/Files_recorded_with_other_file__39__s_checksums/comment_2_0735ed4187e23ddea6bd0aa408451942._comment b/doc/bugs/Files_recorded_with_other_file__39__s_checksums/comment_2_0735ed4187e23ddea6bd0aa408451942._comment new file mode 100644 index 0000000000..b5693f342d --- /dev/null +++ b/doc/bugs/Files_recorded_with_other_file__39__s_checksums/comment_2_0735ed4187e23ddea6bd0aa408451942._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 2" + date="2022-09-05T09:35:18Z" + content=""" +You really should upgrade to the latest version. +"""]] diff --git a/doc/forum/Use_on_large_media_collection_without_modifying_it.mdwn b/doc/forum/Use_on_large_media_collection_without_modifying_it.mdwn new file mode 100644 index 0000000000..4c2d10f6f9 --- /dev/null +++ b/doc/forum/Use_on_large_media_collection_without_modifying_it.mdwn @@ -0,0 +1,34 @@ +Hi everyone, + +I want to lay out a couple of use cases here. + +I have several large (1 TB +) media collections. 
Some are often mounted read-only. Others are very sensitive to changes -- I definitely don't want to risk anything that might munge timestamps, etc. So my requirements are: + +1. Must not modify the files in the existing collection in any way. No changing timestamps, no converting them to hard or sym links, etc. +2. Must not store an additional copy of the data locally (I don't have space for that) +3. Must be able to handle the data store being read-only mounted (.git can be read-write) + +I want to use this for, in order of importance: + +1. Archival to external USB drives. Currently I do this with rsync and it's a real mess figuring out what's where and what to do when a drive fills up. +2. Being able to easily selectively copy some of the files to a laptop or Linux-using tablet for offline viewing +3. Being able to queue up files to add from a laptop/tablet + +I'm not worried about the .git directory itself; I can bind-mount the existing store to be a subdirectory under a git-annex repo, so that would be fine. + +So here's what I've looked into so far. All of these are run with `git annex adjust --unlock` (or the assistant, which does the same thing): + +- A directory remote with importtree=yes would work well for use case #1. However, since the rsync backend doesn't support importtree, it would be challenging for #2 (I guess I could make it work via sshfs, but that gets a bit nasty) +- I tried bind-mounting the existing data under a git-annex repo to use that as the source. This does work; however, presumably because it can't hard link the files into .git/annex, it results in doubling the storage space requirements for the data. That's not usable for me. +- I thought maybe a transport repo would help. So I could have, basically, `source->transport<->laptop` and `source->transport->archive`. The problem here is that git-annex can't copy directly from source to laptop or archive in this scenario without duplicating the data in transport. 
So I still can't just use get from the laptop to get things unless I use 2x the space, which again, I don't want to do. +- I thought about maybe adding git-annex directly to an existing directory. That risks changing things about it (since it is necessarily read-write to git-annex). I'm not really comfortable with that yet. + +Incidentally, I mentioned timestamps and didn't say how I'll preserve them for the archive drives. I can use mtree from Debian's mtree-netbsd package and do something like this on the source directory: + +`mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec` + +And on the destination, restore the timestamps with: + +`mtree -t -U -e < /tmp/spec` + +I imagine some clever hooks would let me do this automatically, but I don't really feel the need for that. I think this is easier, for me, than the discussion at [[todo/does_not_preserve_timestamps]]. diff --git a/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_1_76307d95cf46992fbc5f084f9c056edc._comment b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_1_76307d95cf46992fbc5f084f9c056edc._comment new file mode 100644 index 0000000000..c7b1535780 --- /dev/null +++ b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_1_76307d95cf46992fbc5f084f9c056edc._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 1" + date="2022-09-03T21:32:29Z" + content=""" +First of all you really want to look into/migrate to reflink-capable filesystems like XFS or btrfs. + +I don't know why you'd need to use the rsync special-remote for case #2. You create git-annex repos on your usb drive, +add the existing collection as a directory special-remote with `importtree=yes` and import everything. Then you clone the +repo to your laptop and can `git annex sync/get/copy` from the usb drive however you like. 
I think you can even `git annex enableremote` the +import special-remote on your laptop, and then git-annex will get files directly from it. Heck, you could even `git annex import --no-content` +and only have the file metadata imported, with none of the content actually stored in git-annex, and then you can selectively `git annex get` files directly from the special-remote. + +Also, you may want to set `git annex config --set annex.dotfiles true` on each of your repos. All of these options are documented in the [[git-annex]] manpage (also look at the [[git-annex-config]] manpage). +"""]] diff --git a/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_2_b89b598844b0709a5b6709d0fb2ef60c._comment b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_2_b89b598844b0709a5b6709d0fb2ef60c._comment new file mode 100644 index 0000000000..57dff1e673 --- /dev/null +++ b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_2_b89b598844b0709a5b6709d0fb2ef60c._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="jgoerzen" + avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4" + subject="comment 2" + date="2022-09-03T22:45:42Z" + content=""" +Thank you for these thoughts! + +I should have mentioned that I intend the USB drives to often live offsite, so they would be disconnected. You are quite correct, though, that if they are onsite I could think of them as the sort of \"hub\" repository and do everything from them like that. + +Doing the enableremote for the special directory remote on the laptop does require it to be mounted as a filesystem there, hence my mention of sshfs. That can work but is a bit clunky. 
+"""]] diff --git a/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_3_2d707dc516dad666fb2a647f65fcedcf._comment b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_3_2d707dc516dad666fb2a647f65fcedcf._comment new file mode 100644 index 0000000000..3f9d251c58 --- /dev/null +++ b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_3_2d707dc516dad666fb2a647f65fcedcf._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jgoerzen" + avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4" + subject="comment 3" + date="2022-09-03T23:26:27Z" + content=""" +Forgot to mention - I'm on ZFS, which, while it is a CoW filesystem, doesn't support `cp --reflink`. For various reasons, a migration to XFS or btrfs isn't very practical for me. +"""]] diff --git a/doc/forum/Web_interface_to_git-annex__63__.mdwn b/doc/forum/Web_interface_to_git-annex__63__.mdwn new file mode 100644 index 0000000000..cd4f838713 --- /dev/null +++ b/doc/forum/Web_interface_to_git-annex__63__.mdwn @@ -0,0 +1,5 @@ +Is there currently such a thing as a Web interface for viewing files in a git-annex repository, and being able to carry out git-annex operations like get/drop? + +There are several different Web-based file browsers available, which work fine on a git-annex repo but don't let you do annex operations. I could also run a generic WebDAV server and client, but that has the same problem. + +The eventual use case is to expose the interface behind an HTTPS reverse proxy that will handle authentication, so no authentication functionality is needed. 
diff --git a/doc/tips/Repositories_with_large_number_of_files/comment_11_49585db4fbf9f22bf806ee08d73b7db1._comment b/doc/tips/Repositories_with_large_number_of_files/comment_11_49585db4fbf9f22bf806ee08d73b7db1._comment new file mode 100644 index 0000000000..a6f7af3695 --- /dev/null +++ b/doc/tips/Repositories_with_large_number_of_files/comment_11_49585db4fbf9f22bf806ee08d73b7db1._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="jgoerzen" + avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4" + subject="comment 11" + date="2022-09-04T22:31:14Z" + content=""" +I'm trying to use a repo consisting of about 150,000 photos/videos. I tried all the tips here as well as the ones at [[/forum/__34__git_annex_sync__34___synced_after_8_hours]] and the time is still quite poor. I don't know if using the special remote directory with importtree=yes hurts; I don't think it should. The problem seems to be largely CPU-bound and RAM-bound; syncs can use many GB of RAM and a large amount of CPU time (even when there is no evident hashing of source files). --jobs=10 hasn't caused much evident parallelization. Changing the git index type, repacking, etc. rocketed through almost instantly and made no evident change. I'd be very interested in ideas here, because at this rate, a sync that is a no-op has been running for 15 minutes just sitting there after \"list source ok\". I'll let it run and see what it does. + +If it makes a difference, this is an unlocked repo (via git annex adjust --unlock), not running assistant. There are no directories with excessive numbers of photos. The underlying filesystem is ZFS. 
+"""]] diff --git a/doc/tips/Repositories_with_large_number_of_files/comment_12_e1f3b19d1dd62b1c9314e1e826fec54f._comment b/doc/tips/Repositories_with_large_number_of_files/comment_12_e1f3b19d1dd62b1c9314e1e826fec54f._comment new file mode 100644 index 0000000000..8a85021b7f --- /dev/null +++ b/doc/tips/Repositories_with_large_number_of_files/comment_12_e1f3b19d1dd62b1c9314e1e826fec54f._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jgoerzen" + avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4" + subject="comment 12" + date="2022-09-05T01:01:59Z" + content=""" +To expand: this took about 2 hours to run. A git annex sync from another git annex repo is a lot faster (a minute or so). It is the directory remote that's so slow. Examining with strace and lsof, I don't believe this is checksumming. In fact, it seems to be mostly continuous reading from cidsdb/db, which is only 49MB in size and therefore certainly cached. The process is entirely CPU-bound. +"""]] diff --git a/doc/tips/Repositories_with_large_number_of_files/comment_13_6b4cd0667d700c0c5ec6e0054b43f2e7._comment b/doc/tips/Repositories_with_large_number_of_files/comment_13_6b4cd0667d700c0c5ec6e0054b43f2e7._comment new file mode 100644 index 0000000000..ddbc49020f --- /dev/null +++ b/doc/tips/Repositories_with_large_number_of_files/comment_13_6b4cd0667d700c0c5ec6e0054b43f2e7._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 13" + date="2022-09-05T09:07:05Z" + content=""" +You may try the following: Set the preferred-content expression for the repo to just `present` or `anything` and then run `git annex sync --content --all`. This allows git-annex to use an optimization and should run faster. 
Don't use `--jobs` and unset `annex.jobs` git config, since these slow the optimization down a bit in my experience (note that just specifying `--jobs=1` is not the same AFAIK). + +See also [[todo/Incremental\_git\_annex\_sync\_--content\_--all]] + +Using unlocked files will slow down things in general, but from your description it doesn't sound like that's the issue here. +"""]] diff --git a/doc/todo/does_not_preserve_timestamps/comment_13_e37ea5188bd48817ccac82141906ee83._comment b/doc/todo/does_not_preserve_timestamps/comment_13_e37ea5188bd48817ccac82141906ee83._comment new file mode 100644 index 0000000000..834513db0e --- /dev/null +++ b/doc/todo/does_not_preserve_timestamps/comment_13_e37ea5188bd48817ccac82141906ee83._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jgoerzen" + avatar="http://cdn.libravatar.org/avatar/090740822c9dcdb39ffe506b890981b4" + subject="mtree can help" + date="2022-09-03T17:30:57Z" + content=""" +You can see my workaround for this using mtree at [[/forum/Use_on_large_media_collection_without_modifying_it]]. +"""]] diff --git a/doc/users/jgoerzen.mdwn b/doc/users/jgoerzen.mdwn new file mode 100644 index 0000000000..1310545d18 --- /dev/null +++ b/doc/users/jgoerzen.mdwn @@ -0,0 +1 @@ +Hi. I'm John Goerzen. I have a [homepage](https://www.complete.org/jgoerzen). I [write a lot](https://www.complete.org/interesting-topics/) about [asynchronous networks](https://www.complete.org/asynchronous-communication/) and [offline communication](https://www.complete.org/tools-for-communicating-offline-and-in-difficult-circumstances/), especially with [NNCP](https://www.complete.org/nncp/). I'm particularly interested in git-annex's NNCP special remote.