Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-03-09 12:07:03 -04:00
commit 6eb52cf5f1
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
13 changed files with 311 additions and 0 deletions

@ -0,0 +1,66 @@
I've been thinking a bit more about `annex.dotfiles` in the context of
[this forum post][0]. It seems to me that annexed dotfiles can jump
to git in a way that's surprising and worth raising as a possible bug.
Say that I have a repo with `annex.dotfiles=true` in the .git/config,
and I add some dotfiles to the annex. Then, someone clones that repo
and goes into an adjusted state (either by running `git annex adjust
--unlock` or by being on a crippled file system). In that clone,
calling `annex get` on any of the annexed dotfiles will lead to those
files being added to the index as regular files. (Demo included
below.)
The above issue could be resolved by the user storing
`annex.dotfiles=true` in `git-annex:config.log`, but perhaps it'd be
feasible for git-annex to guard against already annexed dotfiles
migrating to git?
Thanks in advance.
[[!format sh """
git annex version | head -1
cd "$(mktemp -d --tmpdir gx-XXXXXXX)"
git init a
(
cd a
git annex init a
git config annex.dotfiles true
mkdir .reallybig
echo "a" >.reallybig/foo
git annex add .reallybig/foo
git commit -m"add foo"
)
git clone a b
(
cd b
git annex init b
git annex adjust --unlock
git annex get .reallybig
git status
git diff --cached
)
"""]]
```
git-annex version: 8.20200226
[...]
On branch adjusted/master(unlocked)
Changes to be committed:
modified: .reallybig/foo
diff --git a/.reallybig/foo b/.reallybig/foo
index 3de500c..7898192 100644
--- a/.reallybig/foo
+++ b/.reallybig/foo
@@ -1 +1 @@
-/annex/objects/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
+a
```
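The jump in the diff above can be spotted mechanically: the committed blob is a pointer starting with `/annex/objects/`, while the staged blob is the literal content. Below is a minimal sketch of such a check. The function name `looks_like_annex_pointer` is hypothetical, and the sketch assumes pointers always begin with `/annex/objects/`, as in this demo.

```sh
# Hypothetical helper: succeeds when the first line on stdin looks like a
# git-annex pointer (assumed to start with /annex/objects/, as in the diff).
looks_like_annex_pointer() {
    first=
    IFS= read -r first || true   # tolerate a missing trailing newline
    case "$first" in
        /annex/objects/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage in clone b from the demo: inspect the staged blob of the dotfile.
#   git cat-file -p :.reallybig/foo | looks_like_annex_pointer ||
#       echo ".reallybig/foo is staged as a regular file"
```

Run against the staged blob after `git annex get`, a check like this would fail for `.reallybig/foo`, which is the surprising state described above.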
[0]: https://git-annex.branchable.com/forum/Get_annex.dotfiles__61__true_behavior_without_persistent_configuration__63__/
[[!meta author=kyle]]
[[!tag projects/datalad]]

@ -0,0 +1,20 @@
### Please describe the problem.
A special remote implementation that needs to look up further config based on the remote name no longer works, because a recent change prevents `GETCONFIG name` from returning the remote name while `git annex initremote` is driving the special remote implementation.
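For context, the exchange that breaks can be sketched as a toy external special remote. This is a minimal sketch, not the DataLad implementation: the function name and the failure message are assumptions, and only the protocol messages themselves (`VERSION`, `INITREMOTE`, `GETCONFIG`, `VALUE`, `INITREMOTE-SUCCESS`, `INITREMOTE-FAILURE`, `UNSUPPORTED-REQUEST`) come from the external special remote protocol.

```sh
# Toy external special remote loop (hypothetical): speaks the protocol on
# stdin/stdout and asks git-annex for the remote name during INITREMOTE.
special_remote_loop() {
    echo "VERSION 1"
    while read -r cmd rest; do
        case "$cmd" in
            INITREMOTE)
                # Ask git-annex for the remote's name; it answers "VALUE <name>",
                # with an empty value when the setting is not available.
                echo "GETCONFIG name"
                reply= name=
                read -r reply name || true
                if [ "$reply" = "VALUE" ] && [ -n "$name" ]; then
                    # A real remote would look up further config by name here.
                    echo "INITREMOTE-SUCCESS"
                else
                    echo "INITREMOTE-FAILURE remote name not available"
                fi
                ;;
            *)
                # Everything else is out of scope for this sketch.
                echo "UNSUPPORTED-REQUEST"
                ;;
        esac
    done
}
```

With the behavior reported here, the reply to `GETCONFIG name` carries an empty value, so a remote written along these lines would take the failure branch.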
### What version of git-annex are you using? On what operating system?
It used to work with 7.20191230 and no longer does with 8.20200226, tested on Debian and Ubuntu.
### Please provide any additional information below.
Originally reported at https://github.com/datalad/datalad/issues/4259
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
There is no day that ends without me being grateful for git-annex ;-)
[[!meta author=mih]]
[[!tag projects/datalad]]

@ -0,0 +1,67 @@
### Please describe the problem.
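If `git annex init` has not been run on a repo, running git-annex commands in its linked worktrees should not change them, but it seems to: in the log below, `git annex get` fails with "First run: git-annex init", yet the worktree's `.git` file is replaced by a symlink.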
### What steps will reproduce the problem?
See log below
### What version of git-annex are you using? On what operating system?
8.20200226 on Amazon Linux 2
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
(master_env_v178_py36) 17:07 [a] $ ls -alt
total 4
drwxrwxr-x 8 ilya ilya 166 Mar 6 17:07 .git
drwxrwxr-x 3 ilya ilya 32 Mar 6 17:07 .
-rw-rw-r-- 1 ilya ilya 0 Mar 6 17:07 myfile
drwxrwxrwt 15 root root 4096 Mar 6 17:07 ..
(master_env_v178_py36) 17:07 [a] $ git worktree add ../b
Preparing worktree (new branch 'b')
HEAD is now at 9a5d353 first commit
(master_env_v178_py36) 17:08 [a] $ pushd ../b
/tmp/b /tmp/a /data/branches/is-add-asm-improvability-metrics
(master_env_v178_py36) 17:08 [b] $ ls -alt
total 8
drwxrwxr-x 2 ilya ilya 32 Mar 6 17:08 .
-rw-rw-r-- 1 ilya ilya 0 Mar 6 17:08 myfile
drwxrwxrwt 16 root root 4096 Mar 6 17:08 ..
-rw-rw-r-- 1 ilya ilya 32 Mar 6 17:08 .git
(master_env_v178_py36) 17:08 [b] $ git annex get
git-annex: First run: git-annex init
(master_env_v178_py36) 17:08 [b] $ ls -alt
total 4
drwxrwxr-x 2 ilya ilya 32 Mar 6 17:08 .
lrwxrwxrwx 1 ilya ilya 21 Mar 6 17:08 .git -> ../a/.git/worktrees/b
-rw-rw-r-- 1 ilya ilya 0 Mar 6 17:08 myfile
drwxrwxrwt 16 root root 4096 Mar 6 17:08 ..
(master_env_v178_py36) 17:08 [b] $ ls -alt /tmp/b/../a/.git/worktrees/b/annex
lrwxrwxrwx 1 ilya ilya 11 Mar 6 17:08 /tmp/b/../a/.git/worktrees/b/annex -> ../../annex
(master_env_v178_py36) 17:12 [b] $ git annex version
git-annex version: 8.20200226-g2d3ef2c07
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlit\
e-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_2\
24 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE\
2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224\
BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
(master_env_v178_py36) 17:14 [b] $ uname -a
Linux ip-172-31-80-211.ec2.internal 4.14.171-136.231.amzn2.x86_64 #1 SMP Thu Feb 27 20:22:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# End of transcript or log.
"""]]
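The change visible in the transcript is that `.git` in the worktree goes from a regular file to a symlink. A small check like the following could be run before and after a git-annex command to confirm that; it is only a sketch, and the function name `git_link_kind` is hypothetical.

```sh
# Hypothetical helper: report what kind of entry DIR/.git is.
# The symlink test comes first because -f and -d follow symlinks.
git_link_kind() {
    dir=${1:-.}
    if [ -L "$dir/.git" ]; then
        echo symlink
    elif [ -f "$dir/.git" ]; then
        echo file
    elif [ -d "$dir/.git" ]; then
        echo directory
    else
        echo missing
    fi
}

# Usage, mirroring the transcript:
#   git_link_kind /tmp/b     # after `git worktree add ../b`
#   (cd /tmp/b && git annex get)
#   git_link_kind /tmp/b     # after the failed git-annex command
```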
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

@ -0,0 +1,69 @@
### Please describe the problem.
I have a directory special remote with exporttree=yes (encryption=none) on a USB hard drive. Both `git annex sync --content` and `git annex export` write at only around 400 KiB/s, so an export of a 9GB DVD iso takes a whole night.
The drive is not blazing fast, but:
- `sync; dd if=/dev/zero of=tempfile bs=1M count=10; sync` gives something around 10MB/s (don't recall the exact number)
- rsync (with --progress turned on) copies files with 2.35MB/s
`mount` for this drive shows:
> /dev/sdc1 on /media/thk/thk-sg1 type ext4 (rw,nosuid,nodev,relatime,sync,stripe=8191,uhelper=udisks2)
I tried to mount the drive without sync but failed. Even with the udisks2 service stopped I could not manually mount the drive without sync (or with async). It always ended up being mounted with sync.
### What steps will reproduce the problem?
TODO(thk): try other drive and other laptop once the current transfer finishes...
Update 2020-03-07:
- export to a different USB drive (both seagate, same size, similar age) from the same machine with the exact same setup (but NTFS filesystem) runs with ~80 MiB/s. So this is perfect. This time there is also no problem with a lost exporttree=yes config.
### What version of git-annex are you using? On what operating system?
- git-annex version: 8.20200227-gf56dfe791
- Debian testing with Kernel 5.2.17
### Please provide any additional information below.
I now learned that there is no Linux kernel primitive to copy a file but that this is actually a high art:
<http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/copy.c>
I was surprised to see the implementation of `meteredWrite` in *Utility/Metered.hs*. I had hoped that there would be some Haskell standard library for efficient file copying. I wonder how rsync implements its progress meter, and whether the progress meter is the reason why rsync had a slower write speed than dd.
Maybe it would make sense to call out to the *cp* command and just issue a *stat()* every few seconds for the progress meter? This is what I do to monitor cp progress manually.
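The `cp`-plus-`stat` idea could look roughly like the sketch below. It is not how git-annex copies files; the function name and the default 2-second interval are made up for illustration, and `stat -c %s` assumes GNU coreutils.

```sh
# Hypothetical sketch: copy SRC to DST with cp in the background and report
# progress on stderr by polling the destination size every INTERVAL seconds.
copy_with_progress() {
    src=$1 dst=$2 interval=${3:-2}
    total=$(stat -c %s "$src") || return 1
    cp -- "$src" "$dst" &
    pid=$!
    # Poll until the background cp has exited.
    while kill -0 "$pid" 2>/dev/null; do
        copied=$(stat -c %s "$dst" 2>/dev/null || echo 0)
        printf '%s / %s bytes\n' "$copied" "$total" >&2
        sleep "$interval"
    done
    # Propagate cp's exit status to the caller.
    wait "$pid"
}

# Usage (paths are placeholders):
#   copy_with_progress big.iso /media/usb/big.iso 5
```

On a drive mounted with `sync`, the polled size should track what has actually been written, but that is an expectation rather than something measured here.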
I have no clue, but maybe these could help for fast file copying in Haskell?
- <https://github.com/snoyberg/conduit>
- <https://wiki.haskell.org/Pipes>
- reddit: [What is your take on conduits, pipes, and streams?](https://www.reddit.com/r/haskell/comments/7w79q1/what_is_your_take_on_conduits_pipes_and_streams/)
### Have you had any luck using git-annex before?
Well, I'm coming back to git-annex after several years. So far it is better than I remembered:
- tor support is great and solves the need for a central server
- I hope that the sqlite integration will now make large collections of files manageable
- Finally we have exporttree, yeah!
## 2020-03-07 update
Turns out, the problem is more complex. I wanted to be clever. When I set up the two synced annex repos I made the mistake of not specifying exporttree=yes at the beginning. But I wanted to re-use the initial name. So I tried hard to remove all evidence of the previous existence of a special remote with that name from git-annex.
I checked out the git-annex branch in a separate worktree (see **man git worktree**) in both repositories, deleted the lines for that remote from remote.log and pushed to the other repo (not git annex sync). I even made the changes in parallel in both repos before pushing in both directions so that the special merge does not bring the lines back. I actually was sure there was nothing left of the old remotes. Of course I also deleted them from .git/config.
Somehow, there is again a line in remote.log for that remote without exporttree=yes. So now, after the last git annex sync --content, I have a mixture of an exported tree and an exported annex object store in the same special remote dir.
I also noticed that the repo that was so slow did not have the `remote.$REMOTE.annex-tracking-branch` config. But I could still run `sync --content` somehow. After adding this config, the last sync actually ran with 2 MB/s but it still wrote in object store format, not as an exported tree.
Some questions:
- Is there any other place where git-annex stores information about remotes than remote.log?
- The object store files in the remote were stored in format AAA/BBB/$HASH with three character directory names. While in .git/annex/objects the folders have two characters. What are those characters? I believe the 3 characters format is for remotes that potentially do not distinguish letter case?
- Is there a command to get the full path of a file in the object store (two or three letters) from the hash?
- Maybe there is still a bug. Is there a possibility that git-annex could forget that a remote is configured with exporttree=yes? Especially if I export to the same directory on the same usb drive from two synced repos?

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="sam.nastase@2b4a9b3e5094dab41e0a4de0b808a2697a3e9860"
nickname="sam.nastase"
avatar="http://cdn.libravatar.org/avatar/55c74b521bcb7322069f35bf655f81e0"
subject="Invalid option `--include-dotfiles'"
date="2020-03-06T22:17:32Z"
content="""
I just reinstalled DataLad (v0.12.2) via conda today and tried to do a `datalad save` on a preexisting datalad dataset and got a previously unseen error with \"Invalid option `--include-dotfiles'\". Is this related to ongoing development? Or is there some easy fix? Thanks! (apologies if this is a poor place to post)
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="re: Invalid option `--include-dotfiles'"
date="2020-03-06T22:56:35Z"
content="""
DataLad has been updated for the removal of `--include-dotfiles` in
the latest git-annex release (8.20200226), but there hasn't been a
DataLad release yet that includes that fix. So I'd say the easiest
fix for now would be installing a developmental version of DataLad
(both `master` and `maint` have the fix). I think downgrading your
git-annex version would be problematic because your repo has probably
already been auto-upgraded to v8.
"""]]

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="sam.nastase@2b4a9b3e5094dab41e0a4de0b808a2697a3e9860"
nickname="sam.nastase"
avatar="http://cdn.libravatar.org/avatar/55c74b521bcb7322069f35bf655f81e0"
subject="comment 5"
date="2020-03-06T23:34:12Z"
content="""
Thanks! This brings me to a new error due to our old version of git (v1.8.3), which apparently doesn't have the `--no-patch` flag for `git show`, but that's a separate issue.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 6"
date="2020-03-07T01:05:30Z"
content="""
Strange that a newer git didn't get installed by conda as a runtime dependency of git-annex... Can you post the output of `conda list`?
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="erewhon"
avatar="http://cdn.libravatar.org/avatar/b9bd5ad7176ebe149d0f051dcfe0a63e"
subject="comment 2"
date="2020-03-07T23:14:56Z"
content="""
Thanks, I missed that in the man page.
Is there a rationale for not having the option for preferred contents?
"""]]

@ -0,0 +1 @@
Currently [[`git-annex-fsck`|git-annex-fsck]] gives a warning for all my files stored with MD5 keys that they can be upgraded to the more secure SHA256: `Can be upgraded to an improved key format. You can do so by running: git annex migrate`. In my case the key choice is deliberate, so it would be good if this warning could be disabled, to prevent it from drowning out more serious ones.

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="thk"
avatar="http://cdn.libravatar.org/avatar/bfef10a428769701aeee1db978951461"
subject="Also no clear error for permission problems"
date="2020-03-06T17:51:20Z"
content="""
I was exporting (with exporttree) to a directory remote on an external ext4 formatted USB drive.
As is usually the case, there was a permission problem. My current user did not have write permission for one directory I was exporting to.
git-annex just printed \"failed\" after it actually completed the file transfer with 100%.
Even with --verbose and --debug I could not figure out the problem until I discovered the permission problem.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="thk"
avatar="http://cdn.libravatar.org/avatar/bfef10a428769701aeee1db978951461"
subject="No clear error message for failing names on NTFS"
date="2020-03-07T12:50:19Z"
content="""
I tried to export a tree to NTFS with a filename that contained spaces, single quotes and dots. Only after removing all of them did the export succeed. The error message was just \"failed\".
"""]]

19
doc/users/thk.mdwn Normal file
@ -0,0 +1,19 @@
I'm thk at debian org
My TODO items
- write a tip on using git worktree to inspect the git-annex branch
- Is there a way to filter out the directories?
- write a tip on how to deal with permission issues on ext formatted USB drives
- works of course only on Debian and derivatives
- use a common group defined in /usr/share/base-passwd/group.master, e.g. "floppy"
- use setgid bit: https://en.wikipedia.org/wiki/Setuid#SGID
- make sure all users on all machines are part of the common group
- Collect problems with NTFS tree exports, e.g.
- Spaces at the end in filenames
- single quotes in filenames
- dots?
- more experiences with ext4 encryption feature:
<https://www.techort.com/encryption-in-ext4-how-it-works-habrahabr/>
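The common-group and setgid items in the list above could be sketched as follows. This is only a starting point for the planned tip: the function name is hypothetical, the path in the usage line is a placeholder, and "floppy" is just the example group named in the list.

```sh
# Hypothetical sketch: hand a directory on the drive to a shared group and
# set the setgid bit so files created inside inherit that group.
share_dir_with_group() {
    dir=$1 group=$2
    chgrp "$group" "$dir" &&
    chmod g+rwxs "$dir"
}

# Usage (placeholder path; every user must be a member of the group):
#   share_dir_with_group /media/usb/export floppy
```

The sketch changes only the top-level directory; existing subdirectories would need the same treatment.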