Merge branch 'master' into message-serialization

Joey Hess 2020-12-07 13:33:14 -04:00
commit a0e1650a15
9 changed files with 215 additions and 9 deletions


@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="falsifian"
avatar="http://cdn.libravatar.org/avatar/59c3c23c500d20d83ecb9d1f149be9ae"
subject="comment 2"
date="2020-12-02T16:52:10Z"
content="""
Thanks for the info. So I guess some combination of `git credential fill` and `git credential approve` is causing this. If I have more time I'll fiddle with those commands.
"""]]


@@ -0,0 +1,66 @@
In DataLad, there's a spot where we set `alwayscommit=false` when
doing bulk setting of metadata. When debugging a test failure related
to this, Adina, who was testing on Windows, reported that the script
below shows two commits rather than the one commit I see on my end and
would expect. Here's the good output on my Debian system:
[[!format sh """
set -eu
cd "$(mktemp -d "${TMPDIR:-/tmp}"/ga-XXXXXXX)"
alwayscommit=false
git init && git annex init
echo foo >foo && git annex add foo
git -c annex.alwayscommit=$alwayscommit annex metadata --set a=b foo
git -c annex.alwayscommit=$alwayscommit annex metadata --set c=d foo
git commit -mc
git annex version
echo "-----"
git log --oneline --stat git-annex -- '*.met'
"""]]
```
[... 23 lines ...]
git-annex version: 8.20201128-g2878ab456
build flags: Assistant Webapp Pairing Inotify TorrentParser Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso hook external
operating system: linux x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
local repository version: 8
-----
9436e93 update
...9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 2 ++
1 file changed, 2 insertions(+)
```
And here's the output Adina reported on Windows:
```
c44294d (git-annex) update
...b9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 1 +
1 file changed, 1 insertion(+)
2395118 update
...b9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c.log.met | 1 +
1 file changed, 1 insertion(+)
```
Adina saw this behavior on two Windows machines, one with
8.20201008-g65c1687 and the other with 8.20201008-g7e24b2587.
I realize this issue probably isn't something you can debug on your
end, but perhaps you can think of an obvious culprit or another
Windows user has ideas.
Thanks.
Related discussion on DataLad issue tracker:
https://github.com/datalad/datalad/pull/5202#discussion_r536178860
[[!meta author=kyle]]
[[!tag projects/datalad]]


@@ -0,0 +1,59 @@
Update on 3 new features. Appropriate to the season, there's a past,
a present, and a future one.
---
Past: The last release added `git annex adjust --unlock-present`
which might be just what you were looking for, if you used to use direct
mode. It unlocks files whose content is present, but files whose content is
missing are dangling symlinks. Currently, the branch is only refreshed after
git-annex finishes all requested transfers. There is an
annex.adjustedbranchrefresh config that can make it refresh more
frequently, but doing it after every file may be too slow in a large repo.
I hope to speed it up enough eventually to perhaps make this the default
later in places where `--unlock` is currently used.
(That work was sponsored by Gioele Barabucci ENK)
----
Present: This week, I've been working on an internal protocol to
communicate about all console IO
that git-annex does, so it can start some child processes to perform
long-running tasks, like downloads. The goal is to
[[detect stalled transfers and cancel or retry them|todo/more_extensive_retries_to_mask_transient_failures]].
This comes after previous attempts at doing it using threads failed.
I finished the IO serialization part today, but may put off the rest until
a bit later.
(This work was sponsored by Jake Vosloo, Mark Reidenbach, and Graham Spencer
[on Patreon](https://patreon.com/joeyh/))
----
Future: We've been thinking about a [[todo/borg_special_remote]] for a
while, and last night I realized that something I implemented this summer
for [[todo/importing_from_special_remote_without_downloading]] might be
just what's needed for this new kind of remote. That was surprising!
At the time, I had been doubtful about the new feature, since it seemed
only the directory special remote would benefit from it at all.
The idea is that you run a backup program, like borg, to store a copy of
your git-annex repo, and then point git-annex at it, to learn what annexed
content is stored in it. This is particularly exciting to me, because it's
a whole new kind of special remote, and could be used for lots of backup
programs beyond borg, and probably other stuff.
Imagine something like this:

    borg init user@host:/annex.borg
    borg create user@host:/annex.borg::{now} .git
    git annex initremote borg type=borg repolocation=user@host:/annex.borg
    git annex import --known --from borg
    git annex drop --unused

And now all your old unused annex objects have been moved into the borg
repo, where they're efficiently stored with its data deduplication.
And of course, you can use `git-annex get` to get them from there.
I have a feeling I'll be haunted by this idea until I implement it..


@@ -0,0 +1,18 @@
I currently use mergerfs, which gives me a simple union mount of a bunch of drives, for example:

    /media/*/datahoard/

(`*` meaning several standalone drives)

I wanted to use git-annex instead, so I went to one of those drives:

    cd /media/strange/

and created a git-annex repo. Then, if I go to another drive

    cd /media/charm/

and attempt to clone the repo, it complains because /datahoard/ already exists (and isn't empty).

I want to import all of my single drives into git-annex and have the resulting directory structure be the same as the original union mount. Is this possible, and how would I go about doing it?

Thanks


@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 1"
date="2020-12-05T13:07:31Z"
content="""
I recommend that you first init a repo on each drive separately and add its files with `git annex add .; git annex sync`, then add the repos to each other as remotes and run `git annex sync`, which will merge the file trees together. <br>
I guess you don't want each drive to contain the content of each file, so you should add all drives to a group and set a [[preferred-content|git-annex-preferred-content]] expression such as `git annex groupwanted drives \"(not copies=drives:1) or present\"`, and then apply it by running `git annex group here drives; git annex wanted here groupwanted` on each drive. This way, `git annex sync --content` won't copy the content of each file to each drive. <br>
You might also want to run `git annex config --set annex.dotfiles true` before adding any files or else dotfiles will be added to git directly. <br>
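
The steps above can be sketched as a few shell functions. This is only a sketch: the function names, the `DRY_RUN` switch, and the drive paths in the usage note are made up for illustration; the git-annex commands themselves are the ones named above.

```shell
# Sketch only. With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Initialize a repository in directory $1 and add the files already there.
init_drive() (
    set -eu
    run cd "$1"
    run git init
    run git annex init
    # Set before adding files, so dotfiles are annexed rather than added to git.
    run git annex config --set annex.dotfiles true
    run git annex add .
    run git annex sync
)

# In the repository at $1, add the repository at $3 as a remote named $2
# and sync, which merges the two file trees.
link_drive() (
    set -eu
    run cd "$1"
    run git remote add "$2" "$3"
    run git annex sync
)

# Put the repository at $1 into the "drives" group and make it follow the
# group's preferred content expression.
limit_copies() (
    set -eu
    run cd "$1"
    run git annex groupwanted drives "(not copies=drives:1) or present"
    run git annex group here drives
    run git annex wanted here groupwanted
)
```

With two hypothetical drives, that would be `init_drive /media/strange` and `init_drive /media/charm`, then `link_drive /media/strange charm /media/charm` (and the reverse), then `limit_copies` on each. Running with `DRY_RUN=1` first shows what would be executed.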
"""]]


@@ -35,15 +35,20 @@ contributed good bug reports and great ideas.
 <img alt="McGill logo" src="https://mcgill.ca/hbhl/sites/all/themes/moriarty/images/logo-red.svg" width=100>
 <img alt="Neurohub" src="https://joeyh.name/blog/pics/neurohub.png" width=100>
+<img alt="NSF logo" src="https://www.nsf.gov/images/logos/nsf1.gif">
-git-annex development is supported in large part by
-[NeuroHub](https://www.mcgill.ca/hbhl/research/platforms), funded
-by the Canada First Research Excellence Fund, awarded to McGill
-University for the Healthy Brains for Healthy Lives initiative.
 <img alt="Powderhouse logo" src="https://d33wubrfki0l68.cloudfront.net/e9285c9a8db37874efdadd61f2231774ce1d86cb/5de87/assets/phs-leftaligned-logo.svg" width=100>
-With additional support by [Powderhouse Studios](https://powderhouse.org/).
+git-annex development was supported in large part by:
+* [NeuroHub](https://www.mcgill.ca/hbhl/research/platforms), funded
+by the Canada First Research Excellence Fund, awarded to McGill
+University for the Healthy Brains for Healthy Lives initiative.
+* [Datalad](http://datalad.org/), funded by the NSF, awarded to Dartmouth
+College.
+* [DANDI](https://www.dandiarchive.org/), funded by the NIH, awarded to
+Dartmouth College.
+* [Powderhouse Studios](https://powderhouse.org/)
+* Gioele Barabucci ENK
 Thanks also to these folks for their support:
 [[!inline raw=yes pages="thanks/list"]] and anonymous supporters.
@@ -53,8 +58,8 @@ Thanks also to these folks for their support:
 <img alt="NSF logo" src="https://www.nsf.gov/images/logos/nsf1.gif">
 git-annex development was partially supported by the
-[NSF](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1429999) as a part of the
-[DataLad project](http://datalad.org/) and NICEMAN (ReproNim TR&D3).
+NSF funded [Datalad project](http://datalad.org/) and niceman
+(repronim tr&d3).
 Thanks also to these folks for their support:
 [[!inline raw=yes pages="thanks/list.2018"]] and anonymous supporters.


@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="rshalaev@3e2130a1e3cb0aaff7dd80aba7548ad9be0ea2d4"
nickname="rshalaev"
avatar="http://cdn.libravatar.org/avatar/d7f181d338cbcef7418faa01f0441e86"
subject="comment 15"
date="2020-12-03T01:43:00Z"
content="""
Is there any way to make Git Annex use Windows 10 NTFS hardlinks in the working tree?
I'm looking to conserve disk space while still being able to browse and view file contents. Currently Git Annex is doubling the amount of disk space used.
"""]]


@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 16"
date="2020-12-03T07:38:30Z"
content="""
Yes, as the page above explains, `git config annex.thin true` and then `git annex fix`
"""]]


@@ -0,0 +1,21 @@
[[!comment format=mdwn
username="rshalaev@3e2130a1e3cb0aaff7dd80aba7548ad9be0ea2d4"
nickname="rshalaev"
avatar="http://cdn.libravatar.org/avatar/d7f181d338cbcef7418faa01f0441e86"
subject="Windows 10 NTFS hardlinks not working"
date="2020-12-03T11:43:07Z"
content="""
Lukey - I tried `git config annex.thin true` and then `git annex fix`.
Doing it on a Windows NTFS drive did not create hard links. I followed the instructions but could not get it to work; I always got copies of files instead of hard links.
From: -- Joey Hess <id@joeyh.name> Mon, 18 Apr 2016 18:33:52 -0400
git-annex (6.20160412) Changelog
* annex.thin and annex.hardlink are now supported on Windows.
Based on Joey's changelog, hard links should work on NTFS. According to my observation and a report from colin.brosseau above (titled \"NTFS Make it clear that it'll not work with annex.thin\"), it does not work.
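
One way to check is to look at a file's hard link count: with `annex.thin` in effect, an unlocked file should share its storage with the object under `.git/annex/objects`, so its count should be greater than 1. A minimal sketch, assuming a POSIX-like shell with `stat` (for example Git Bash or Cygwin on Windows; that `stat` there reports NTFS link counts correctly is an assumption). The function names are made up for illustration.

```shell
# Print the number of hard links to file $1.
# Tries the GNU stat syntax first, then falls back to the BSD/macOS one.
link_count() {
    stat -c %h -- "$1" 2>/dev/null || stat -f %l -- "$1"
}

# Succeed if file $1 has more than one hard link, i.e. it shares its
# storage with at least one other name.
is_hardlinked() {
    [ "$(link_count "$1")" -gt 1 ]
}
```

For example, `is_hardlinked somefile && echo "hard link" || echo "separate copy"` on an unlocked file in the working tree. On Windows itself, `fsutil hardlink list <file>` is the native way to list the names that point at a file.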
Can anyone confirm whether git annex can create NTFS hardlinks? I can file a bug report if needed.
Thanks!
"""]]