Merge branch 'master' of ssh://git-annex.branchable.com
commit 5332cf8a80
23 changed files with 548 additions and 0 deletions
doc/bugs/__34__rename__58___permission_denied__34__.mdwn
@ -0,0 +1,69 @@
## I'm considering this a "false alarm", but leaving it around for others who may run into it

It took a long time to add the files (50 minutes). When the add finished and I ran `git status`, the files that had failed with "permission denied" simply showed up as not added. I added them again, and that worked fine. I have no reason to believe that my folder has been corrupted.

So I don't personally think this needs fixing. But if anyone else runs into this issue, at least this page is here.

### Please describe the problem.

When adding 400k files to a new annex, I get the error "rename: permission denied". It doesn't seem to be about file permissions (I have `chown`ed the files), and it's inconsistent from run to run: each time I try the import, different files may show the error.

One thing I'm concerned about is how to confirm whether these files have made it into the annex, or whether I now have a corrupted folder structure.

I do intend to do smaller imports, or try using `-J1`.
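
This is a minimal sketch for checking which files are still missing after a failed run. It assumes, as described at the top of this page, that files hit by the error are simply left untracked. `list_unadded` is a hypothetical helper that only wraps plain `git ls-files`, so anything it prints still needs to be added.

```shell
# Hypothetical helper: print files under a directory (default: .) that git
# neither tracks nor ignores, i.e. files a previous `git annex add` left out.
list_unadded() {
  git ls-files --others --exclude-standard -- "${1:-.}"
}

# Example use inside the annex: count the leftovers, then retry with one job.
# list_unadded | wc -l
# git annex add --jobs=1 .
```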

### What steps will reproduce the problem?

1. `git config annex.jobs cpus`
2. `git annex add .`

### What version of git-annex are you using? On what operating system?

macOS 10.15.7

```
git-annex version: 8.20210310
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.28 DAV-1.3.4 feed-1.3.0.1 ghc-8.10.4 http-client-0.7.6 persistent-sqlite-2.11.1.0 torrent-10000.1.1 uuid-1.3.14 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
```

### Please provide any additional information below.

iMac 10-core i9 (maybe 20 threads?)

```
git-annex: .git/annex/othertmp/ingest-A23998-2216: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-Ad23998-21291: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-P23998-30359: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-Audio23998-182890: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-wasd_clap_sys100_cra23998-206554: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-wasd_clap_sys100_f23998-206560: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-wasd_clap_sys100_f23998-206561: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-Fairligh23998-248968: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-ly23998-268165: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-123998-269213: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-46223998-278087.wav: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ing23998-290478: rename: permission denied (Permission denied)
git-annex: .git/annex/othertmp/ingest-H23998-292758: rename: permission denied (Permission denied)
```

[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log


# End of transcript or log.
"""]]

### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

Absolutely! :)
doc/bugs/case_where_using_pathspec_with_git-commit_leaves_s.mdwn
@ -0,0 +1,117 @@
424bef6b6 (smudge: check for known annexed inodes before checking
annex.largefiles, 2021-05-03) fixed a case where an unlocked annexed
file that annex.largefiles does not match could get its unchanged
content checked into git. This was in response to
<https://git-annex.branchable.com/forum/one-off_unlocked_annex_files_that_go_against_large/>.

In a comment there, Joey said:

> I've made a change that seems to work, and will probably not break
> other cases, although this is a complex and subtle area.

I'm following up with a change in behavior flagged by a DataLad test.
As with most things in this area, I have a hard time reasoning about
what the expected behavior should be and whether it should be
considered a bug. Here's the reproducer:

[[!format sh """
set -eu

cd "$(mktemp -d "${TMPDIR:-/tmp}"/ga-XXXXXXX)"

git version
git annex version | head -1

git init -q
git annex init

echo a >foo
git annex add foo
git commit --quiet -m 'add foo'

git annex unlock foo
printf '* annex.largefiles=nothing\n' >.gitattributes

sleep 1

git annex add foo
git commit -q -m 'commit unlocked' -- foo

set -x
export PS4='> '
git diff HEAD^- -- foo
git diff --cached
"""]]

Here's the output with 8.20210428:

```
git version 2.31.1.659.g12c5fe8677
git-annex version: 8.20210428
[...]
> git diff HEAD^- -- foo
diff --git a/foo b/foo
deleted file mode 120000
index 8a2a0c9..0000000
--- a/foo
+++ /dev/null
@@ -1 +0,0 @@
-.git/annex/objects/3z/F8/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
\ No newline at end of file
diff --git a/foo b/foo
new file mode 100644
index 0000000..7898192
--- /dev/null
+++ b/foo
@@ -0,0 +1 @@
+a
> git diff --cached
```

And here's the output with a recent commit on master following
424bef6b6:

```
git version 2.31.1.659.g12c5fe8677
git-annex version: 8.20210429-ge811a50e2
[...]
> git diff HEAD^- -- foo
diff --git a/foo b/foo
deleted file mode 120000
index 8a2a0c9..0000000
--- a/foo
+++ /dev/null
@@ -1 +0,0 @@
-.git/annex/objects/3z/F8/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
\ No newline at end of file
diff --git a/foo b/foo
new file mode 100644
index 0000000..3de500c
--- /dev/null
+++ b/foo
@@ -0,0 +1 @@
+/annex/objects/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
> git diff --cached
diff --git a/foo b/foo
index 3de500c..7898192 100644
--- a/foo
+++ b/foo
@@ -1 +1 @@
-/annex/objects/SHA256E-s2--87428fc522803d31065e7bce3cf03fe475096631e5e07bbd7a0fde60c4cf25c7
+a
```

Before 424bef6b6, `git annex add foo` followed by `git commit ... -- foo`
results in a commit that has foo's content tracked in git. After
424bef6b6, the commit still records foo as an unlocked annexed file,
and the switch to being tracked by git ends up staged in the index.

The new behavior isn't seen if the pathspec is dropped from
`git commit`. Also, without the sleep, it isn't triggered reliably
(presumably because the index and foo have the same mtime, bypassing
the clean filter).
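
To use the reproducer as a pass/fail check (for example when bisecting
between 8.20210428 and 424bef6b6), its final `git diff --cached` can be
turned into an exit status. The sketch below relies only on
`git diff --cached --quiet` exiting non-zero when something is staged.
`check_index_clean` is a hypothetical name.

```shell
# Hypothetical check: succeed when nothing is staged relative to HEAD,
# fail (and show what is staged) otherwise, i.e. the symptom reported here.
check_index_clean() {
  if git diff --cached --quiet; then
    echo "ok: nothing staged after commit"
  else
    echo "unexpected staged changes:" >&2
    git diff --cached --stat >&2
    return 1
  fi
}
```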

Thanks for taking a look.

[[!meta author=kyle]]
[[!tag projects/datalad]]
@ -0,0 +1,93 @@
Thanks for extending `fromkey` to support unlocked files. When
updating some DataLad code to make use of this, a test flagged a
difference between how links and pointer files are handled: the
necessary leading directories are created for links but not for
pointer files.

[[!format sh """
cd "$(mktemp -d "${TMPDIR:-/tmp}"/ga-XXXXXXX)" || exit 1

git version
git annex version | head -1

git init -q
git annex init

set -x

git annex fromkey --force \
SHA256E-s4--b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c \
foo/a
git cat-file -p :foo/a

git config annex.addunlocked true
git annex fromkey --force \
SHA256E-s4--7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730 \
bar/a
"""]]

```
git version 2.31.1.705.g1ce651569c
git-annex version: 8.20210429-gdab203070
init (scanning for unlocked files...)
ok
(recording state in git...)
+ git annex fromkey --force SHA256E-s4--b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c foo/a
fromkey foo/a ok
(recording state in git...)
+ git cat-file -p :foo/a
../.git/annex/objects/91/9x/SHA256E-s4--b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/SHA256E-s4--b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c+ git config annex.addunlocked true
+ git annex fromkey --force SHA256E-s4--7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730 bar/a
fromkey bar/a
git-annex: bar/a: openBinaryFile: does not exist (No such file or directory)
failed
(recording state in git...)
git-annex: fromkey: 1 failed
```

The caller can of course make sure that leading directories exist, but
I think it makes sense for the locked and unlocked variants to behave
the same here. What do you think about the patch below?

[[!format patch """
From f6c97b8d01c7e9b8069638e9827062aa2462d429 Mon Sep 17 00:00:00 2001
From: Kyle Meyer <kyle@kyleam.com>
Date: Thu, 6 May 2021 11:11:14 -0400
Subject: [PATCH] fromkey: create directory for pointer files too

fromkey creates leading directories for symbolic links. Do the same
for pointer files.
---
Command/FromKey.hs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Command/FromKey.hs b/Command/FromKey.hs
index eadb89fd1..16ff1693f 100644
--- a/Command/FromKey.hs
+++ b/Command/FromKey.hs
@@ -106,6 +106,7 @@ perform matcher key file = lookupKeyNotHidden file >>= \case
, matchKey = Just key
}
else keyMatchInfoWithoutContent key file
+ createWorkTreeDirectory (parentDir file)
ifM (addUnlocked matcher mi contentpresent)
( do
stagePointerFile file Nothing =<< hashPointerFile key
@@ -115,7 +116,6 @@ perform matcher key file = lookupKeyNotHidden file >>= \case
else writepointer
, do
link <- calcRepo $ gitAnnexLink file key
- createWorkTreeDirectory (parentDir file)
addAnnexLink link file
)
next $ return True

base-commit: dab2030702200bc9abea4bff9ce83ba63aeca41c
--
2.31.1.705.g1ce651569c

"""]]
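
Until something like the patch above is applied, the caller-side
workaround mentioned earlier can be sketched as a small wrapper.
`fromkey_with_dirs` is a hypothetical name. It only adds `mkdir -p`
in front of the same `git annex fromkey --force` call used in the
script above.

```shell
# Hypothetical wrapper: create FILE's leading directories first, since
# (per this report) fromkey only does that itself when writing a symlink,
# not when writing a pointer file (annex.addunlocked=true).
fromkey_with_dirs() {
  key=$1
  file=$2
  mkdir -p -- "$(dirname -- "$file")" || return 1
  git annex fromkey --force "$key" "$file"
}

# e.g. fromkey_with_dirs SHA256E-s4--7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730 bar/a
```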

[[!meta author=kyle]]
[[!tag projects/datalad]]
@ -0,0 +1,41 @@
When using `reinject <src> <dest>` and `dest` is an absolute path to a
pointer file, the operation exits with status 0 but silently fails to
reinject the content: `foo` is still a pointer file afterwards.

[[!format sh """
cd "$(mktemp -d "${TMPDIR:-/tmp}"/ga-XXXXXXX)" || exit 1

git version
git annex version | head -1

git init -q
git annex init
git config annex.addunlocked true

git annex fromkey --force \
SHA256E-s3--2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae \
foo

printf foo >.git/tmp-to-copy
git annex reinject .git/tmp-to-copy "$PWD"/foo
echo $?
cat foo
"""]]

```
git version 2.31.1.705.g1ce651569c
git-annex version: 8.20210429-g06e996efa
init (scanning for unlocked files...)
ok
(recording state in git...)
fromkey foo ok
(recording state in git...)
0
/annex/objects/SHA256E-s3--2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
```

If a link destination is used (i.e. drop the `addunlocked`
configuration in the script above) or a relative path is used
(i.e. drop the `"$PWD"/`), the content is reinjected.
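
Since the relative form works, one possible workaround is to strip the
current directory from the destination before calling reinject. This is
a sketch. `rel_to_pwd` is a hypothetical helper, and it only handles
paths that literally start with `$PWD/`; it does no symlink resolution.

```shell
# Hypothetical helper: rewrite an absolute path under the current directory
# as a relative one. Paths outside $PWD are printed unchanged.
rel_to_pwd() {
  case $1 in
    "$PWD"/*) printf '%s\n' "${1#"$PWD"/}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Workaround for the script above:
# git annex reinject .git/tmp-to-copy "$(rel_to_pwd "$PWD"/foo)"
```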

[[!meta author=kyle]]
[[!tag projects/datalad]]
@ -0,0 +1,15 @@
[[!comment format=mdwn
username="cecile.madjar@d95f9e618c3dff4829e7fedba1a71e1499542f3f"
nickname="cecile.madjar"
avatar="http://cdn.libravatar.org/avatar/a32ab97180285c0e5095bad4616a4d87"
subject="comment 3"
date="2021-05-04T20:24:50Z"
content="""
Hello,

Are there any updates on this? I am using an Apple M1 (Apple Silicon) computer, and I am blocked in all my projects because I cannot install git-annex on it. Do you have an approximate idea of when this will be available for Apple M1 users?

Thank you,

"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 4"
date="2021-05-04T20:38:01Z"
content="""
Hmm, shouldn't it work just fine with Rosetta?
"""]]
@ -0,0 +1,9 @@
[[!comment format=mdwn
username="cecile.madjar@d95f9e618c3dff4829e7fedba1a71e1499542f3f"
nickname="cecile.madjar"
avatar="http://cdn.libravatar.org/avatar/a32ab97180285c0e5095bad4616a4d87"
subject="comment 5"
date="2021-05-04T21:12:52Z"
content="""
Thank you, Lukey. Indeed, after installing Rosetta 2 it worked. Thank you!
"""]]
@ -0,0 +1,23 @@
[[!comment format=mdwn
username="fortran"
avatar="http://cdn.libravatar.org/avatar/ee27e12e945c0af698d58f0d8dde2457"
subject="comment 2"
date="2021-05-04T19:10:35Z"
content="""
Oh. Wow. That's a big man page...

Okay. So if I run `git config annex.sshcaching false`, then things are happier. Well:

```
❯ git annex get file1.nc4
get file1.nc4 (from origin...)

You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

annex.sshcaching is not set to true
ok
(recording state in git...)
```

Now, reading the man pages, I see that the default concurrency is 1, so I think I'm safe? Or is there perhaps something I should use to tell it \"nope\" for that?
"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="fortran"
avatar="http://cdn.libravatar.org/avatar/ee27e12e945c0af698d58f0d8dde2457"
subject="comment 3"
date="2021-05-04T19:25:47Z"
content="""
Essentially, we are hoping to deploy git-annex, so the fewer messages from git-annex, the better for our end users (or, I guess, for me *supporting* the end users :) ).
"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 4"
date="2021-05-04T20:31:45Z"
content="""
Even with concurrency enabled, this should not be a problem in your case, as you're doing the ControlMaster setup manually.

@joey I guess there needs to be a way to hide such messages, like git's `advice.*` configuration options.
"""]]
@ -0,0 +1,14 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 3"
date="2021-05-04T17:19:57Z"
content="""
I used exactly the same settings for the second special remote as for the first one: `type=directory chunk=50MiB encryption=hybrid mac=HMACSHA256`.

GA was 8.20200810, though, because my server machine is built from the stable Nixpkgs channel; I will test that again with the most recent version tomorrow.

`--sameas` won't help here; the special remotes are accessible via the same FS (the second is just a btrfs snapshot of the first), and they'd still only count as one copy. That's the same situation I have right now.

Counting it as two copies would work, but there is a large delay between moving the files to the special remote and them actually being mirrored (residential internet upload), which means the numcopies of somewhat newly added files wouldn't be correct. It'd be a step up, though.
"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 4"
date="2021-05-05T07:33:07Z"
content="""
It worked! Thank you so much, you two!

The cipher was indeed different for some reason; what could cause that?
"""]]
doc/forum/avoiding_copies_across_mutiple_file_systems.mdwn
@ -0,0 +1,13 @@
I first mentioned this issue in a thread about 4 years ago (https://git-annex.branchable.com/forum/git-annex_across_two_filesystems/), and at the time was encouraged to open a new thread instead. Priorities changed, and I'm only now returning to the issue.

The situation we have is as follows: we have a large collection of boundary condition data used in our weather/climate model. Individual "experiments" are run against specific versions of this data, and we would like to minimize the total storage footprint as well as the time spent copying data at the beginning of an experiment. The clones for the experiments would always be used in a read-only manner. New files would never be added through these repos.

At first glance, using git-annex with `git clone --shared` would seem to be a good solution. Unfortunately, these experiments span a large number (~10) of separate cross-mounted filesystems, which would result in ~90% of the experiments still duplicating data rather than sharing it via hardlinks.

A couple of partial solutions suggest themselves: (1) put all of the clones on the same filesystem as the primary repo, and then create a symlink within each experiment back to the corresponding clone; (2) maintain a secondary (fully populated) clone on each filesystem and ensure that the experiment setup script clones from the proper secondary.

Option (1) is viable, but would require some negotiations with the computing center to ensure that there is a single filesystem that gives appropriate privileges to all of our users. Tedious, but probably not a showstopper.

Option (2) sounds like an improvement over having 90% of the experiments duplicating data locally, except ... because the secondary clones would need to support any recent model configuration, the 10x duplication of "all" data could be much larger than the hundreds of copies of the smaller subsets needed by individual experiments.

Perhaps the ideal solution would be some sort of special "clone" that uses symlinks back to the primary repository. These special clones would be read-only, and could even disable "dangerous" git actions that would allow adding/modifying files. `git-new-workdir` hints that something like this might be possible, but it does not appear to play nicely with git-annex in any event.
@ -0,0 +1,14 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 1"
date="2021-05-05T18:08:49Z"
content="""
Have I understood you correctly: you have a \"primary\" repository (with all data/keys present), accessible by the clients via NFS/CIFS/whatever? And the clients (/\"experiments\") want to check out a specific version/branch from that repo?

I think you have two alternatives to cloning it everywhere including all keys:

a) Every client clones the git repo (and removes the \"origin\" remote to ensure that nothing flows back), creates a symlink from `.git/annex/objects` to `/path/to/primary/.git/annex/objects` and checks out whatever version/branch it wants. Easy.

b) Every client uses the primary repo, but via its own worktree (see `git-worktree`). git-annex supports external worktrees, but I'm not sure what problems could arise in this particular setup.
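
Option a) could look roughly like this in shell. This is a sketch under stated assumptions: `make_experiment_clone` is a hypothetical name, the primary path is absolute, and nothing here enforces read-only use. The checkout happens before `origin` is removed, because removing the remote also deletes its remote-tracking branches.

```shell
# Hypothetical sketch of option a): a clone whose annexed content is served
# from the primary repository through a symlink.
make_experiment_clone() {
  primary=$1   # absolute path to the primary repository
  dest=$2      # where the experiment's clone should go
  rev=$3       # branch, tag or commit to check out
  git clone --quiet --no-checkout -- "$primary" "$dest" || return 1
  # Check out first: removing origin also drops origin/* refs REV may name.
  git -C "$dest" checkout --quiet --detach "$rev" || return 1
  git -C "$dest" remote remove origin || return 1
  # Point the clone's object store at the primary's. Nothing here prevents
  # git-annex in the clone from writing through this link.
  mkdir -p "$dest/.git/annex" || return 1
  ln -s "$primary/.git/annex/objects" "$dest/.git/annex/objects"
}
```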
"""]]
@ -0,0 +1,14 @@
[[!comment format=mdwn
username="pat"
avatar="http://cdn.libravatar.org/avatar/6b552550673a6a6df3b33364076f8ea8"
subject="comment 8"
date="2021-05-05T19:39:11Z"
content="""
Do you have any information on actual times for working with big repos?

As an example, I created one with 400k files. After following the steps here, `git status` takes 8 seconds to complete. I have plenty of resources, so it's just slow. I am curious what sort of times you're getting with your big repos.

I will have to see if submodules help with this at all. This material is all reference information, and isn't going to change very much. So it's possible I'd be better off with an \"active\" repo and a \"reference\" repo (maybe connected by submodule, maybe not).

Joey did suggest storing those sorts of files in a separate branch. I just did a test, and it appears that the limiting factor is in fact the number of files in the working tree. Deleting a lot of the files brought git back up to speed. So from a simplicity standpoint, I may want to have a `reference` branch with those files in it, and perhaps two local clones of the repo, one on `main` and one on `reference`, so I can explore and copy files from `reference` to `main` as needed.
"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="pat"
avatar="http://cdn.libravatar.org/avatar/6b552550673a6a6df3b33364076f8ea8"
subject="comment 9"
date="2021-05-05T22:51:28Z"
content="""
Separate branch is a no-go. `git annex info` takes 3 minutes 30 seconds to report 320k annex keys.

So for my purposes I think I will keep one slow reference repo, and one fast working repo.
"""]]
@ -0,0 +1,13 @@
Currently, SHA256E creates duplicate files for different extensions, e.g.:

```
$ l * && l -Li * && sha256sum *
lrwxrwxrwx 1 atemu users 198 2021-05-04 03:47 random.1 -> .git/annex/objects/F9/Kk/SHA256E-s104857600--2fdbdc9c3b23d1986a743aede593765e57ade9f173f9fd9766057f0efd63197a.1/SHA256E-s104857600--2fdbdc9c3b23d1986a743aede593765e57ade9f173f9fd9766057f0efd63197a.1
lrwxrwxrwx 1 atemu users 198 2021-05-05 10:01 random.2 -> .git/annex/objects/Pm/J1/SHA256E-s104857600--2fdbdc9c3b23d1986a743aede593765e57ade9f173f9fd9766057f0efd63197a.2/SHA256E-s104857600--2fdbdc9c3b23d1986a743aede593765e57ade9f173f9fd9766057f0efd63197a.2
3720 -r--r--r-- 1 atemu users 100M 2021-05-04 03:47 random.1
49696 -r--r--r-- 1 atemu users 100M 2021-05-05 10:01 random.2
2fdbdc9c3b23d1986a743aede593765e57ade9f173f9fd9766057f0efd63197a random.1
2fdbdc9c3b23d1986a743aede593765e57ade9f173f9fd9766057f0efd63197a random.2
```

These have exactly the same content, though; they could be hardlinks of one another and nothing would change.
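
The following illustrates why the two files above end up as separate objects. Judging from the key names shown, a SHA256E key is built as `SHA256E-s<size>--<sha256><extension>`, so identical content with different extensions yields different keys. The sketch approximates that. `sha256e_key` is a hypothetical helper, and its extension handling (everything after the last dot) is a simplification of git-annex's own rules.

```shell
# Hypothetical helper: approximate the SHA256E key for a file, following the
# key names shown above (SHA256E-s<size>--<hash><ext>). Simplification: the
# extension is everything after the last dot in the file name.
sha256e_key() {
  f=$1
  size=$(wc -c <"$f" | tr -d ' ')
  hash=$(sha256sum <"$f" | cut -d ' ' -f 1)
  name=$(basename -- "$f")
  case $name in
    *.*) ext=".${name##*.}" ;;
    *) ext="" ;;
  esac
  printf 'SHA256E-s%s--%s%s\n' "$size" "$hash" "$ext"
}
```

For `random.1` and `random.2` above, the two keys share the size and hash and differ only in the trailing `.1`/`.2`. That suffix is what places them in different object directories.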
doc/todo/Use_editorconfig_for_formatting_rules.mdwn
@ -0,0 +1,3 @@
https://editorconfig.org/ is a cross-editor standard for setting formatting rules such as indentation.

A blurb of elisp probably isn't very useful to vim users, and I had a really strange memory leak with it in my Emacs.
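
For reference, here is a minimal `.editorconfig` sketch. The property names (`root`, `end_of_line`, `insert_final_newline`, `indent_style`) are standard EditorConfig properties. The values, including tab indentation for `*.hs`, are assumptions to be checked against the existing elisp settings, not git-annex's confirmed style. `write_editorconfig` is a hypothetical helper.

```shell
# Hypothetical helper: write a minimal .editorconfig into a directory
# (default: .). Values are placeholders, not git-annex's confirmed rules.
write_editorconfig() {
  cat >"${1:-.}/.editorconfig" <<'EOF'
root = true

[*]
end_of_line = lf
insert_final_newline = true

[*.hs]
indent_style = tab
EOF
}
```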
@ -0,0 +1,20 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-05-04T16:53:18Z"
content="""
FWIW, I verified that `git annex --debug initremote --sameas web datalad externaltype=datalad type=external encryption=none autoenable=true` makes `git-annex` use the `datalad` special remote to handle those URLs. And since we do not have any prioritization handling in datalad, we also grab the first URL (the api. one) returned by git-annex and proceed with it.

So, indeed, if you do not like (or even just feel lukewarm about) the idea of adding costs within the built-in `web` remote, feel welcome to close this, and we will still have a way forward by providing such handling within the datalad external special remote. It would be a bit sub-optimal since it would require people to install datalad, but at least it would enable the desired prioritization in some use cases (e.g. for a QA `annex fsck --fast` run).

And indeed, with a single cost (not even a range of costs) assigned to/returned by a remote, and no cost provisioned to be returned by e.g. CLAIMURL, I guess there is no (easy) way to mix URL-based costs into the overall decision making that orders the remotes.

NB: with the `--sameas` trick above, `git-annex` doesn't even ask `datalad` with CLAIMURL and immediately passes `TRANSFER` of the key to the `datalad` external remote. Without `--sameas`, `git-annex` (8.20210330-g0b03b3d) doesn't even bother asking datalad (within `whereis` at least) whether it could CLAIMURL those... even if I assign `annex-cost = 1.0` to the datalad remote. Not sure yet if that is \"by design\".

> When it gets down to the web remote, it tries the urls in whatever order it happens to have them.

FWIW, I think I tried adding them in different orders, but it always went for the `api` one, so I concluded that the order it has them in is sorted and there is no way to \"tune it up\".

P.S. I still wonder why I have some memory of git-annex supporting some (external) way to prioritize URLs... maybe it was indeed \"craft a special remote to do that\"...
"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2021-05-04T17:50:45Z"
content="""
> It also seems to me that, if you're splitting a repo, you would also want to include things like trust.log and remote.log, or at least parts of them for some remotes?

Yes. Even when not splitting, but just copying a key (or multiple keys), since special remote configuration etc. might be needed.
"""]]
@ -0,0 +1,16 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 3"
date="2021-05-04T17:37:04Z"
content="""
Oh, I've already got all of that implemented; it's just the flag for disabling that behaviour at build time that's missing.

What I did was conditionally set the executable to `/bin/cp` and the reflink parameter to `-c`.

The problem with using it without a fallback is that when you use it on a FS that doesn't support CoW, `/bin/cp` will hard-fail and make unlocking impossible. GNU coreutils actually falls back automatically by itself; AFAICT, GA couldn't handle a failing reflink cp before. I refactored the copy functions a bit to make it fall back properly.
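
The fallback logic described above is sketched here in shell rather than Haskell, for brevity: try a clonefile-based `cp -c` on macOS and fall back to a plain copy if it fails. `copy_maybe_reflink` is a hypothetical name, and GA's actual Haskell copy functions may differ.

```shell
# Hypothetical sketch: copy-on-write clone if possible, plain copy otherwise.
copy_maybe_reflink() {
  src=$1
  dst=$2
  if [ "$(uname -s)" = Darwin ]; then
    # macOS `cp -c` uses clonefile(2); it fails on filesystems without CoW.
    /bin/cp -c -- "$src" "$dst" 2>/dev/null && return 0
    # Clear anything a failed clone attempt may have left behind.
    rm -f -- "$dst"
  fi
  # Fallback: ordinary full copy.
  cp -- "$src" "$dst"
}
```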

The reason I want it to be a configure flag is that some users might use GA exclusively on non-APFS FSs (where trying a reflink copy would be a waste of time), and some might prefer the uutils-coreutils `cp` from their $PATH, which can handle `--reflink` just like the GNU one.

~~I originally wanted to add it as a cabal configure flag but apparently you can't reference those anywhere?~~ Found this: https://stackoverflow.com/questions/48157516/conditional-compilation-in-haskell-submodule; that's probably what I'll end up doing. It will default to true on macOS.
"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 4"
date="2021-05-05T05:43:56Z"
content="""
https://github.com/Atemu/git-annex/tree/feature/macOS-reflinks
"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 5"
date="2021-05-05T06:04:49Z"
content="""
I've also got some small fixes for things that came up during development:

https://github.com/Atemu/git-annex/tree/misc-fixes
"""]]