Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-01-27 15:38:03 -04:00
commit 98b4291ca8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
17 changed files with 380 additions and 0 deletions

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2020-01-23T17:51:58Z"
content="""
Thank you Joey! I can only confirm that the file system was likely a crippled/NFS one... So we would likely need to do some sensing on DataLad side and instruct git-annex. Will continue on our end at https://github.com/datalad/datalad/issues/4075
"""]]

@ -0,0 +1,147 @@
### Please describe the problem.
If there are multiple files with the same key in the repository and they are copied to a bup special remote,
then `git annex fsck --from=bup` with the `--jobs=N` option (N >= 2) can show an error and remove these keys from bup.
Based on the error message (about a locked .git/annex/tmp/ file), this problem is probably not specific to bup,
but I tested it with bup only.
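
To illustrate why duplicate keys matter here: the temp file named in the error message appears to be derived from the key, so two jobs that fsck different files with the same key would contend for one path. The following is only a sketch of one way to avoid that, not git-annex's code; `group_files_by_key` is a hypothetical helper, and the idea of handing each key to a single job is an assumption.

```python
from collections import OrderedDict


def group_files_by_key(pairs):
    # pairs: iterable of (filename, key) tuples, e.g. collected with
    # `git annex lookupkey`. Returns an ordered mapping of
    # key -> list of filenames that share that key.
    groups = OrderedDict()
    for filename, key in pairs:
        groups.setdefault(key, []).append(filename)
    return groups


KEY = "SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76"
work = group_files_by_key([("file1", KEY), ("file2", KEY)])
# Each key appears once, so only one job would ever touch its temp file.
for key, files in work.items():
    print(key, files)
```

With work grouped this way, a parallel runner could dispatch one task per key rather than one per file.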
### What steps will reproduce the problem?
1. Configure a bup special remote.
2. Add files with the same content to annex (and with the same backend).
3. Copy these files to bup.
4. Run `git annex fsck --from=bup -JN` several times, until it removes these keys from bup.
### What version of git-annex are you using? On what operating system?
git-annex 7.20191230-g985373f8e, built from source, on Debian GNU/Linux buster.
bup 0.29.3-2 from Debian sid. Also tried with bup 0.30, built from source.
### Please provide any additional information below.
[[!format txt """
~ $ mkdir testdir
~ $ cd testdir
~/testdir $
~/testdir $ git init
Initialized empty Git repository in /home/test/testdir/.git/
~/testdir $
~/testdir $ git annex init testrepo
init testrepo (scanning for unlocked files...)
ok
(recording state in git...)
~/testdir $
~/testdir $ ls ~/.bup/index-cache/
~/testdir $
~/testdir $ git annex initremote bup type=bup buprepo=~/testdir/.bup encryption=none
initremote bup (bup init...)
Reinitialized existing Git repository in /home/test/.bup/
Initialized empty Git repository in /home/test/testdir/.bup/
ok
(recording state in git...)
~/testdir $
~/testdir $ ls ~/.bup/index-cache/
None__home_test_testdir__bup
~/testdir $
~/testdir $ echo aaa >file1
~/testdir $ echo aaa >file2
~/testdir $
~/testdir $ git annex add .
add file1
ok
add file2
ok
(recording state in git...)
~/testdir $
~/testdir $ git commit -m files
[master (root-commit) 7a03b66] files
2 files changed, 2 insertions(+)
create mode 120000 file1
create mode 120000 file2
~/testdir $
~/testdir $ git -C .bup show-ref
~/testdir $
~/testdir $ git annex whereis
whereis file1 (1 copy)
5d9b0df2-000b-4273-bc4a-fb3b9d8319bd -- testrepo [here]
ok
whereis file2 (1 copy)
5d9b0df2-000b-4273-bc4a-fb3b9d8319bd -- testrepo [here]
ok
~/testdir $
~/testdir $ git annex copy --to=bup .
copy file1 (to bup...)
bloom: creating from 1 file (3 objects).ing: 0 kbytes
Receiving index from server: 1156/1156, done.
bloom: creating from 1 file (3 objects).
ok
copy file2 ok
(recording state in git...)
~/testdir $
~/testdir $ git annex lookupkey file1 file2
SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76
SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76
~/testdir $
~/testdir $ git -C .bup show-ref
2076647ee23ad632c8cf96caf51febbd0604452c refs/heads/SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76
~/testdir $
~/testdir $ git annex fsck --from=bup
fsck file1
(checksum...) ok
fsck file2
(checksum...) ok
(recording state in git...)
~/testdir $
~/testdir $ git -C .bup show-ref
2076647ee23ad632c8cf96caf51febbd0604452c refs/heads/SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76
"""]]
Now run `git annex fsck --from=bup -J2` multiple times, until it drops the key from bup...
[[!format txt """
~/testdir $ git annex fsck --from=bup -J2
fsck file1 fsck file2
100% 4 B 5 B/s 0s
content cannot be completely removed from bup remote
file2: Bad file size (4 B smaller); dropped from bup
(checksum...)
git-annex: .git/annex/tmp/fsck14654.SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76: openBinaryFile: resource busy (file is locked)
failed
(fixing location log) (checksum...) ok
(recording state in git...)
git-annex: fsck: 1 failed
~/testdir $
~/testdir $ git -C .bup show-ref
~/testdir $
~/testdir $ git annex whereis
whereis file1 (2 copies)
5d9b0df2-000b-4273-bc4a-fb3b9d8319bd -- testrepo [here]
88cc362a-f87a-43c7-b194-e79b2ee91828 -- [bup]
ok
whereis file2 (2 copies)
5d9b0df2-000b-4273-bc4a-fb3b9d8319bd -- testrepo [here]
88cc362a-f87a-43c7-b194-e79b2ee91828 -- [bup]
ok
~/testdir $
~/testdir $ git annex fsck --from=bup
fsck file1 (fixing location log)
** Based on the location log, file1
** was expected to be present, but its content is missing.
failed
fsck file2 ok
(recording state in git...)
git-annex: fsck: 1 failed
~/testdir $
~/testdir $ git annex whereis
whereis file1 (1 copy)
5d9b0df2-000b-4273-bc4a-fb3b9d8319bd -- testrepo [here]
ok
whereis file2 (1 copy)
5d9b0df2-000b-4273-bc4a-fb3b9d8319bd -- testrepo [here]
ok
"""]]

@ -0,0 +1,20 @@
### Please describe the problem.
Full build logs are at http://neuro.debian.net/_files/_buildlogs/git-annex/7.20191230+git152-gefb981388
[[!format sh """
...
prop_read_write_transferinfo: FAIL
*** Failed! Exception: 'recoverEncode: invalid argument (invalid character)' (after 1 test):
Exception thrown while showing test case: 'recoverEncode: invalid argument (invalid character)'
Use --quickcheck-replay=507010 to reproduce.
"""]]
[[!meta author=yoh]]
[[!tag projects/datalad]]

@ -0,0 +1,89 @@
### Please describe the problem.
git annex commands with the `--all` option in a tuned repository (with `annex.tune.branchhash1=true`) do not do anything.
### What steps will reproduce the problem?
1. Initialize a tuned annex repository with `git annex init -c annex.tune.branchhash1=true`.
2. Add some files to annex.
3. Now `git annex whereis --all` and `git annex fsck --all` (and maybe other commands) don't show/do anything.
### What version of git-annex are you using? On what operating system?
Version 7.20191230-g985373f8e, compiled from sources, on Debian buster 10.2.
### Please provide any additional information below.
[[!format txt """
~ $ mkdir testdir
~ $ cd testdir
~/testdir $
~/testdir $ git init
Initialized empty Git repository in /home/test/testdir/.git/
~/testdir $
~/testdir $ git annex init -c annex.tune.branchhash1=true testrepo
init testrepo (scanning for unlocked files...)
ok
(recording state in git...)
~/testdir $
~/testdir $ echo abcabc >file
~/testdir $
~/testdir $ git annex add file
add file
ok
(recording state in git...)
~/testdir $
~/testdir $ git commit -m file
[master (root-commit) b910684] file
1 file changed, 1 insertion(+)
create mode 120000 file
~/testdir $
~/testdir $ git annex whereis
whereis file (1 copy)
67d9c35f-e206-404f-a9da-6c94894a4f9f -- testrepo [here]
ok
~/testdir $
~/testdir $ git annex whereis --all
~/testdir $
~/testdir $ git annex fsck
fsck file (checksum...) ok
(recording state in git...)
~/testdir $
~/testdir $ git annex fsck --all
(recording state in git...)
"""]]
But the `--key` option works:
[[!format txt """
~/testdir $ git annex lookupkey file
SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470
~/testdir $
~/testdir $ git annex whereis --key SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470
whereis SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470 (1 copy)
67d9c35f-e206-404f-a9da-6c94894a4f9f -- testrepo [here]
ok
~/testdir $
~/testdir $ git annex fsck --key SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470
fsck SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470 (checksum...) ok
(recording state in git...)
"""]]
Repository status:
[[!format txt """
~/testdir $ find .git/annex/objects/ -type f
.git/annex/objects/J3/3f/SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470/SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470
~/testdir $
~/testdir $ git ls-tree -r git-annex
100644 blob 20f9faf7ca569d23da5f106a445609d018fa221d activity.log
100644 blob 71f3551b7119daa3c4679d2b790d72b6bc06cbb8 c34/SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470.log
100644 blob d475e423f6fb4863559e8cca981ae8a433f68516 difference.log
100644 blob bf91bd54df30e28f40b49670cf9c9c26ff600a22 uuid.log
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Of course, I love it! Great project, thanks, Joey!
However, /me always wants more features from it. It's great that git-annex continues to develop.
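
For reference, the repository status listing in the report shows the key's log at `c34/KEY.log`, i.e. under one hash directory level, whereas an untuned git-annex branch uses two levels (`xxx/yyy/KEY.log`). The sketch below shows the kind of path handling that enumerating all keys from the branch would need in order to cope with both layouts. It is an illustration only: `keys_in_branch` is a hypothetical function, and whether this is actually where `--all` goes wrong is not established by the report.

```python
import posixpath


def keys_in_branch(paths):
    # Collect keys from location-log paths of the git-annex branch.
    # Accepts both the two-level layout (xxx/yyy/KEY.log) and the
    # one-level layout seen with annex.tune.branchhash1 (xxx/KEY.log).
    keys = []
    for path in paths:
        directory, name = posixpath.split(path)
        if not directory or not name.endswith(".log"):
            continue  # top-level files such as uuid.log are not key logs
        depth = len(directory.split("/"))
        if depth in (1, 2):
            keys.append(name[: -len(".log")])
    return keys


listing = [
    "activity.log",
    "c34/SHA256E-s7--2ed91d820157c0530ffbae54122d998e0de6d958f266b682f7c528942f770470.log",
    "difference.log",
    "uuid.log",
]
print(keys_in_branch(listing))
```

The input would be the path column of `git ls-tree -r git-annex`.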

@ -0,0 +1,4 @@
I installed git-annex on Windows using the file git-annex-installer.exe, and now each time I start my computer I get a message telling me that "C:\Program Files\Git\cmd\git-annex-autostart.vbs" cannot be found.
This is very annoying, and I don't need git-annex to be started at startup. I looked in msconfig.exe and didn't find any entry for git-annex. Is there a way to disable this?

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="thanks"
date="2020-01-23T16:51:44Z"
content="""
\"the only user-visible improvement is these error messages\" -- FWIW, I've been bitten by the lack of config param checking in the past (thought I had set a chunk size but didn't due to misspelled param name, had to re-create the remote.)
"""]]

@ -0,0 +1,7 @@
I use git-annex to manage my Sansa Clip Zip running Rockbox as a directory special remote since it has a FAT filesystem and I don't want to waste half the storage. I'd like to avoid having a copy of all my music and podcasts on my laptop as well, but git-annex only seems to be able to export files that are locally present. Would it be possible to have git-annex try to copy non-present files directly from remotes where it believes the files are present, starting with the lowest cost remote?
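
To make the request concrete, here is a minimal sketch of the selection I have in mind, assuming each remote has a numeric cost and that the location tracking data can be read as a mapping from key to remote names. `pick_source_remote`, the remote names, and the cost values are all made up for illustration; this is not how git-annex is implemented.

```python
def pick_source_remote(key, remotes, locations):
    # remotes: mapping of remote name -> cost (lower is preferred).
    # locations: mapping of key -> set of remote names believed to hold it.
    candidates = [name for name in locations.get(key, ()) if name in remotes]
    if not candidates:
        return None
    # Sort by cost, then by name, so ties resolve deterministically.
    return min(candidates, key=lambda name: (remotes[name], name))


remotes = {"nas": 150, "cloud": 250}
locations = {"KEY1": {"nas", "cloud"}, "KEY2": {"cloud"}}
print(pick_source_remote("KEY1", remotes, locations))
print(pick_source_remote("KEY2", remotes, locations))
```

An export could then stream each non-present file from the chosen remote instead of requiring a local copy first.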
It would also be cool to be able to convert subdirectory information on the remote into metadata in the repository. For example, I delete podcasts after listening to them, so when git-annex detects that a file has been deleted, it could either move it into the archive dir or add a "listened" tag in the repo.
Speaking of metadata, even though special remotes don't support it, I think it would be reasonable to treat files that have never been imported as having no metadata or some configurable default metadata per directory (like tag=listened or status=new) and, for files that have been imported previously, to use the metadata in the repo when evaluating the wanted expression.
Thoughts?

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 1"
date="2020-01-26T18:43:55Z"
content="""
It's best to post enhancement suggestions under [[todo]]. In this case there's already a similar item at [[todo/git-annex-export_--from_option]], so let's move this thread there.
"""]]

@ -0,0 +1,17 @@
Hey folks.
Repository B is an external 4 TB HDD kept in cold storage in an offsite location. It was a full copy of everything in git-annex about a year ago and serves as an offsite, offline backup.
I'd like to update it.
Repository A is my laptop, with about a 500 GB HDD. It probably has enough free space to `git annex get` a copy of all the files created in the year since Repository B was last updated. I'd like to:
1) Get those files to my laptop that need updating in repository B
2) Head to the offsite location.
3) Mount the Repository B HDD on my laptop
4) From the Repository B, add A (the laptop) as a remote, run a "get" and a "sync", effectively updating Repository B with a "delta" of new files
5) From Repository A, sync with B, getting an updated index of what exists on Repository B, for updating all the other online repositories once I am back home.
The question is, how do I structure the command in Step 1 to get the "delta" of files to update Repository B with?
Thanks!
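
To be clear about what I mean by "delta": it is the set of files whose content is not recorded as present in Repository B. A small sketch of that set logic follows, assuming the location data can be read as a mapping from file to repository names; `files_missing_from` and the sample data are hypothetical. On the command line I would guess this corresponds to combining the `--in=repository` matching option with `--not`, but I have not verified that.

```python
def files_missing_from(repo, locations):
    # locations: mapping of filename -> set of repository names that are
    # believed to hold the file's content.
    return sorted(name for name, repos in locations.items() if repo not in repos)


locations = {
    "old-photo.jpg": {"A", "B", "server"},
    "new-photo.jpg": {"server"},
    "new-notes.txt": {"A", "server"},
}
# Files that Repository B still needs:
print(files_missing_from("B", locations))
```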

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 1"
date="2020-01-24T00:26:35Z"
content="""
[[git-annex-get]] takes [[git-annex-matching-options]]; see `--in=repository` .
"""]]

@ -0,0 +1,12 @@
I want to create a repo with the contents of an old repo, but with a fresh commit history, etc. I do not care about preserving the old repo.
However, I just want to make sure that the following steps will not result in the new repo being broken in some way:
1. Create a new git annex repo.
2. Copy all of the symlinks from the old repo to the new one.
3. Move .git/annex/objects from the old repo to the new one.
4. Then "git annex add" everything in the new repo and commit.
Please let me know also if there is a better way to achieve the same results.
Thanks for your help.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="starting over with a new commit history"
date="2020-01-25T01:07:19Z"
content="""
You could [squash all commits](https://stackoverflow.com/questions/25356810/git-how-to-squash-all-commits-on-branch) on all your branches to one commit. See also [[git-annex-forget]] .
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 2"
date="2020-01-25T01:11:42Z"
content="""
Create the new repository and then add the \"source\" repository as a [local cache](/tips/local_caching_of_annexed_files).
This will allow you to copy the symlinks to the new repository and `git annex get` the content, in any order you like, with all the safety precautions of git-annex. The fact that it came from the cache isn't stored either, so it is added cleanly!
I use this method very heavily and it works really well.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="mario"
avatar="http://cdn.libravatar.org/avatar/4c63b0935789d29210d0bd8cad8d7ac7"
subject="Thank you"
date="2020-01-23T19:52:47Z"
content="""
This is a great feature, especially `--hide-missing`! I really missed this in the past. (Strangely it took me until now to notice that you implemented it.) Thank you.
"""]]

@ -0,0 +1,18 @@
[[!comment format=mdwn
username="Chel"
avatar="http://cdn.libravatar.org/avatar/a42feb5169f70b3edf7f7611f7e3640c"
subject="comment 4"
date="2020-01-26T22:48:07Z"
content="""
Another theoretical use case (not available for now, but maybe for the future):
verify parts of the file with checksums and re-download only those parts/chunks that are bad.
For this you need a checksum for each chunk and a \"global\" checksum in key, that somehow incorporates all these chunk checksums.
An example of this is Tiger Tree Hash in file sharing.
When I used the SHA256 backend in my downloads, I often thought that the long process of checksumming a movie
or an OS installation .iso is not ideal: if the file download is not finished, I get the wrong checksum,
and the whole process needs to be repeated.
And in the future git-annex can integrate a FUSE filesystem and literally store just chunks of files,
but represent files as a whole in this virtual filesystem view.
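
Returning to the per-chunk checksum idea, here is a rough sketch of it. This is an illustration under assumptions, not anything git-annex implements: it uses SHA-256 and a flat hash over the chunk digests rather than a real Tiger Tree Hash, and the function names are made up.

```python
import hashlib


def chunk_digests(data, chunk_size):
    # SHA-256 hex digest of each fixed-size chunk of data.
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def root_digest(digests):
    # A global checksum that incorporates every chunk checksum.
    return hashlib.sha256(''.join(digests).encode('ascii')).hexdigest()


def bad_chunks(data, chunk_size, expected):
    # Indexes of chunks that are missing or whose digest differs.
    actual = chunk_digests(data, chunk_size)
    return [
        i for i, want in enumerate(expected)
        if i >= len(actual) or actual[i] != want
    ]


good = b'abcdefgh'
expected = chunk_digests(good, 4)
print(bad_chunks(b'abcdXfgh', 4, expected))
```

Only the chunks reported by `bad_chunks` would need to be downloaded again, and `root_digest` is the value that could go into the key.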
"""]]

@ -0,0 +1,3 @@
In the [[design/external_special_remote_protocol]], the `File` parameter of various requests is specified to be a regular file. If it could be a named pipe, this would open up useful possibilities: [[todo/git-annex-cat]], [[todo/transitive_transfers]], [[todo/git-annex-export_--from_option]], [[todo/OPT__58_____34__bundle__34___get_+_check___40__of_checksum__41___in_a_single_operation/]], [[todo/to_and_from_multiple_remotes]], faster [[`git-annex-fsck --from`|git-annex-fsck]], passing named pipes on `git-annex` command line (for streaming the outputs of a running command directly to a remote, or using `git-annex` as a building block of larger workflows), and maybe others.
An optional protocol request `NAMEDPIPESSUPPORTED`, similar to [[`EXPORTSUPPORTED`|design/external_special_remote_protocol/export_and_import_appendix#index1h2]], could tell `git-annex` that the remote supports named pipes. For remotes that don't declare such support, it could be emulated: before sending e.g. `TRANSFER STORE Key File`, if `File` is a pipe and the remote hasn't said it supports pipes, `git-annex` would drain the pipe to a `TempFile` and then send `TRANSFER STORE Key TempFile` instead. Then the rest of `git-annex` can presume pipes support.
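
The emulation step could look roughly like the following sketch. It assumes a POSIX system, and `file_for_remote` and its `remote_supports_pipes` flag are hypothetical names standing in for whatever git-annex would track after a `NAMEDPIPESSUPPORTED` exchange; it is not based on existing git-annex code.

```python
import os
import shutil
import stat
import tempfile


def file_for_remote(path, remote_supports_pipes):
    # Return a path that is safe to hand to the remote.
    # A regular file, or any file when the remote supports pipes, is passed
    # through unchanged; otherwise the named pipe is drained to a temp file.
    is_pipe = stat.S_ISFIFO(os.stat(path).st_mode)
    if remote_supports_pipes or not is_pipe:
        return path
    with open(path, "rb") as pipe, tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(pipe, tmp)  # blocks until the writer closes the pipe
        return tmp.name
```

The caller would send `TRANSFER STORE Key` with the returned path and remove the temp file afterwards if one was created.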

@ -0,0 +1,3 @@
Add an option to give git-annex a path to a RAM disk, and an option to set the maximum space to be used there. git-annex often knows the size of the files it is downloading, since it's part of the key, so it can determine in advance whether a tempfile of that size would fit on the RAM disk. One could instead symlink `.git/annex/tmp/` to a RAM disk, but this could exhaust memory if a large file is transferred.
Related: [[todo/keep_git-annex_branch_checked_out__63__]], [[todo/transitive_transfers]]
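
A minimal sketch of the size check, assuming keys of the form `BACKEND-sSIZE--NAME` with optional extra fields before the `--` separator. `key_size`, `choose_tmpdir`, and all of their parameters are hypothetical and only illustrate the decision, not an existing git-annex option.

```python
def key_size(key):
    # Return the size recorded in the key, or None if the key has none.
    fields = key.split("--", 1)[0].split("-")[1:]  # drop the backend name
    for field in fields:
        if field.startswith("s") and field[1:].isdigit():
            return int(field[1:])
    return None


def choose_tmpdir(key, ramdisk_dir, ramdisk_free, ramdisk_max, disk_dir):
    # Use the RAM disk only when the size is known and fits in both the
    # configured budget and the space currently free there.
    size = key_size(key)
    if size is not None and size <= min(ramdisk_free, ramdisk_max):
        return ramdisk_dir
    return disk_dir


key = "SHA256E-s4--17e682f060b5f8e47ea04c5c4855908b0a5ad612022260fe50e11ecb0cc0ab76"
print(key_size(key))
print(choose_tmpdir(key, "/mnt/ramdisk", 1024, 512, ".git/annex/tmp"))
```

Keys without a size field fall back to the on-disk directory, which avoids the memory exhaustion risk of an unconditional symlink.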