Merge branch 'master' into v8
commit 2cea674d1e
44 changed files with 665 additions and 140 deletions
|
@ -75,3 +75,5 @@ Thanks for having a look.
|
|||
|
||||
[[!meta author=kyle]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> [[fixed|done]] --[[Joey]]
|
||||
|
|
|
@ -0,0 +1,42 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2019-12-27T06:22:23Z"
|
||||
content="""
|
||||
On second thought, making the clean filter check for non-annexed files
|
||||
would prevent use cases like annex.largefiles=largerthan(100kb)
|
||||
from working as the user intended and letting a small file start out
|
||||
non-annexed and get annexed once it gets too large. Users certainly rely on
|
||||
that and this bug that only affects an edge case does not justify breaking
|
||||
that.
|
||||
|
||||
What would work is to make the clean filter detect when a file's content
|
||||
has not changed, though its mtime (or inode) has changed. In that case,
|
||||
it's reasonable for the clean filter to ignore annex.largefiles and keep
|
||||
the content represented in git however it already was (non-annexed or
|
||||
annexed).
|
||||
|
||||
To detect that, in the case where the file in the index is not annexed:
|
||||
First check if the file size is the same as the
|
||||
size in the index. If it is, run git hash-object on the file, and see if
|
||||
the sha1 is the same as in the index. This avoids hashing any unusually
|
||||
large files, so the clean filter only gets a bit slower.
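
The size-then-hash check described above can be sketched as follows. This is only an illustration, not git-annex's implementation: `content_unchanged`, `index_size` and `index_sha1` are hypothetical names, and the sketch assumes the index stores the plain blob hash of the file (no other filters or line-ending conversion in play).

```python
import hashlib
import os


def git_blob_sha1(data: bytes) -> str:
    # Same value that `git hash-object` prints for a blob with this content:
    # sha1 over the header "blob <size>\0" followed by the content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def content_unchanged(path: str, index_size: int, index_sha1: str) -> bool:
    # Hypothetical helper. Cheap check first: when the size differs, the
    # content has changed and the file is never read or hashed.
    if os.path.getsize(path) != index_size:
        return False
    with open(path, "rb") as f:
        return git_blob_sha1(f.read()) == index_sha1
```

Only files whose size matches the index entry are hashed, which is what keeps the extra cost of the check small.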
|
||||
|
||||
And when the file in the index is annexed, check if the file size is the
|
||||
same as the size of the annexed key. If it is, verify if the file content
|
||||
matches the key (typically by hashing). Cases where keys lack size or
|
||||
don't use a checksum could lead to false positives or negatives though.
|
||||
Although, I've not managed to find a version of this bug that makes an
|
||||
annexed file get converted to git unintentionally, so maybe this part does
|
||||
not need to be done?
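
A rough sketch of the annexed case, to show where the check becomes inconclusive. It assumes keys look like `BACKEND-sSIZE--NAME` with an optional size field, and it only knows how to verify the SHA256 and SHA256E backends; `key_size` and `matches_key` are hypothetical names, not git-annex functions.

```python
import hashlib
import re

# Assumed key layout: BACKEND, optional -<letter><number> fields, "--", NAME.
KEY_RE = re.compile(
    r"^(?P<backend>[A-Z0-9_]+)(?P<fields>(?:-[a-zA-Z][0-9]+)*)--(?P<name>.+)$"
)


def key_size(key):
    # Returns the size recorded in the key, or None when the key has none.
    m = KEY_RE.match(key)
    if m is None:
        return None
    size = re.search(r"-s([0-9]+)", m.group("fields"))
    return int(size.group(1)) if size else None


def matches_key(data, key):
    # True or False when the check is conclusive, None when it is not
    # (unparseable key, or a backend without a checksum we can verify).
    m = KEY_RE.match(key)
    if m is None:
        return None
    size = key_size(key)
    if size is not None and size != len(data):
        return False
    if m.group("backend") in ("SHA256", "SHA256E"):
        # SHA256E keys carry a file extension after the digest.
        digest = m.group("name").split(".", 1)[0]
        return hashlib.sha256(data).hexdigest() == digest
    return None
```

The `None` result is the case mentioned above: a key without a size or without a checksum gives nothing reliable to compare against.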
|
||||
|
||||
----
|
||||
|
||||
Or.. Since the root of the problem is temporarily overriding annex.largefiles,
|
||||
it could just be documented that it's not a good idea to use
|
||||
-c annex.largefiles=anything/nothing, because such broad overrides
|
||||
can affect other files than the ones you intended.
|
||||
(And since the documented methods of converting files from annexed to git and
|
||||
git to annexed use such overrides, that documentation would need to be
|
||||
changed.)
|
||||
"""]]
|
|
@ -0,0 +1,16 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2019-12-27T17:11:42Z"
|
||||
content="""
|
||||
A variant of this where an annexed unlocked file is added first,
|
||||
then the file is touched, and then some other file is added
|
||||
with -c annex.largefiles=nothing does result in the clean filter sending
|
||||
the whole annexed file content back to git, rather than keeping it annexed.
|
||||
For whatever reason, git does not store that content in .git/objects or
|
||||
update the index for that file though, so it doesn't show up as a change.
|
||||
|
||||
So *apparently* that variant is only potentially an expensive cat of a
|
||||
large annexed file, and does not need to be dealt with. Unless git
|
||||
sometimes behaves otherwise.
|
||||
"""]]
|
|
@ -0,0 +1,45 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 4"""
|
||||
date="2019-12-27T18:41:12Z"
|
||||
content="""
|
||||
It's almost possible to get the same unwanted conversion without any git
|
||||
races:
|
||||
|
||||
echo content-git > file-git
|
||||
sleep 2
|
||||
git add file-git
|
||||
git commit -m add
|
||||
|
||||
echo foo > file-git
|
||||
echo content-annex > file-annex
|
||||
git -c annex.largefiles=anything annex add file-annex
|
||||
|
||||
In this case, git currently does not run the modified file-git through the
|
||||
clean filter in the last line, so the annex.largefiles=anything doesn't
|
||||
affect it.
|
||||
|
||||
But, as far as I can see, there's nothing preventing a future version
|
||||
of git from deciding it does want to run file-git through the clean filter
|
||||
in this case.
|
||||
|
||||
I am not going to try to guard against such a thing happening.
|
||||
As far as I can see, anything that the clean filter can possibly do to
|
||||
avoid such a situation will cripple existing use cases of
|
||||
annex.largefiles, like largerthan() as mentioned above.
|
||||
The user has told git-annex to annex "anything", and if git
|
||||
decides to run the clean filter while that is in effect, caveat emptor.
|
||||
|
||||
Which is not to say I'm not going to fix the specific case this bug was
|
||||
filed about. I actually have a fix developed now. But just to say that
|
||||
setting annex.largefiles=anything/nothing temporarily is a blunt instrument,
|
||||
and you risk accidental conversion when using it, and so it would be a good
|
||||
idea to not do that.
|
||||
|
||||
One idea: Make `git-annex add --annex` and `git-annex add --git`
|
||||
add a specific file to annex or git, bypassing annex.largefiles and all
|
||||
other configuration and state. This could also be used to easily switch
|
||||
a file from one storage to the other. I'd hope the existence of that
|
||||
would prevent one-off setting of annex.largefiles=anything/nothing.
|
||||
[[todo/git_annex_add_option_to_control_to_where]]
|
||||
"""]]
|
|
@ -0,0 +1,58 @@
|
|||
[[!comment format=mdwn
|
||||
username="kyle"
|
||||
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
|
||||
subject="comment 5"
|
||||
date="2019-12-28T21:06:46Z"
|
||||
content="""
|
||||
Thanks for the explanation and the fix.
|
||||
|
||||
> For whatever reason, git becomes confused about whether this file is
|
||||
> modified. I seem to recall that git distrusts information it recorded in
|
||||
> its own index if the mtime of the index file is too close to the
|
||||
> mtime recorded inside it, or something like that.
|
||||
|
||||
I see. I think the problem and associated workaround you're referring
|
||||
to is described in git's Documentation/technical/racy-git.txt.
|
||||
|
||||
> Note that, you can accomplish the same thing without setting
|
||||
> annex.largefiles, assuming a current version of git-annex:
|
||||
>
|
||||
> git add file-git
|
||||
> git annex add file-annex
|
||||
>
|
||||
> I think the only reason for setting annex.largefiles in either of the two
|
||||
> places you did is if there's a default value that you want to
|
||||
> temporarily override?
|
||||
|
||||
Right. DataLad's methods that are responsible for calling out to `git
|
||||
annex add` have a `git={None,False,True}` parameter. By default
|
||||
(`None`), DataLad just calls `git annex add ...` and lets any
|
||||
configuration in the repo control whether the file goes to git or is
|
||||
annexed. But with `git=True` or `git=False`, the `annex add` call
|
||||
includes a `-c annex.largefiles=` argument with a value of `nothing`
|
||||
or `anything`, respectively.
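
To make the mapping concrete, here is a minimal sketch. `annex_add_command` is a hypothetical helper written for this comment, not DataLad's actual code, and the placement of `-c` right after `git` simply follows the command form used earlier in this thread.

```python
def annex_add_command(paths, git=None):
    # Hypothetical helper: build the argument list for `git annex add`.
    cmd = ["git"]
    if git is True:
        # Force the files into git: no file counts as large.
        cmd += ["-c", "annex.largefiles=nothing"]
    elif git is False:
        # Force the files into the annex: every file counts as large.
        cmd += ["-c", "annex.largefiles=anything"]
    # git=None adds no override, so the repository configuration decides.
    return cmd + ["annex", "add"] + list(paths)
```

With `git=None` the override is absent entirely, which is the default behavior described above.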
|
||||
|
||||
> But just to say that setting annex.largefiles=anything/nothing
|
||||
> temporarily is a blunt instrument, and you risk accidental
|
||||
> conversion when using it, and so it would be a good idea to not do
|
||||
> that.
|
||||
|
||||
Noted. As mentioned above, DataLad's default behavior is to honor the
|
||||
repo's `annex.largefiles` configuration. And the documentation for
|
||||
`datalad save`, DataLad's main user-facing entry point for `annex
|
||||
add`, recommends that the user configure .gitattributes rather than
|
||||
using the option that leads to calling `annex add` with `-c
|
||||
annex.largefiles=nothing`.
|
||||
|
||||
> One idea: Make `git-annex add --annex` and `git-annex add --git`
|
||||
> add a specific file to annex or git, bypassing annex.largefiles and all
|
||||
> other configuration and state. This could also be used to easily switch
|
||||
> a file from one storage to the other. I'd hope the existence of that
|
||||
> would prevent one-off setting of annex.largefiles=anything/nothing.
|
||||
|
||||
As far as I can see, those flags would completely cover DataLad's
|
||||
one-off setting of `annex.largefiles=anything/nothing`. They map
|
||||
directly to DataLad's `git=False/True` option described above. So,
|
||||
from DataLad's perspective, they'd be very useful and welcome.
|
||||
|
||||
"""]]
|
|
@ -0,0 +1,34 @@
|
|||
[[!comment format=mdwn
|
||||
username="sirio@84e81889437b3f6208201a26e428197c6045c337"
|
||||
nickname="sirio"
|
||||
avatar="http://cdn.libravatar.org/avatar/9f3a0cfaf4825081710b652cc0b438a4"
|
||||
subject="Duplicate 'gcrypt-id' may be the issue?"
|
||||
date="2019-12-29T22:10:26Z"
|
||||
content="""
|
||||
Had a repo exhibit this behavior just now:
|
||||
|
||||
- commit graph `XX -> YY`
|
||||
- host `A` @ commit `YY`
|
||||
- host `B` @ commit `XX` (1 behind)
|
||||
- remotes `hub` and `lab` both @ commit `XX`
|
||||
- `B` pushes and pulls from both `hub` and `lab`: OK
|
||||
- `A` pushes to `hub` (updates to commit `YY`): OK
|
||||
- `B` pulls from `hub`: FAIL with *Packfile does not match digest*
|
||||
- `B` pulls from `lab`: OK
|
||||
- `B` pushes to `hub`: FAIL with *Packfile does not match digest*
|
||||
- `A` pulls from `hub`: OK
|
||||
- `A` pulls from `lab`: OK
|
||||
|
||||
When looking in `.git/config` I noticed that `remote.hub.gcrypt-id` and `remote.lab.gcrypt-id` were identical.
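
For anyone wanting to check their own repo for the same condition, a small sketch that scans the text of `.git/config` for remotes sharing a `gcrypt-id`. The function name is made up for this comment and the parser is deliberately simplistic (it assumes one `key = value` pair per line and ignores includes).

```python
import re
from collections import defaultdict

SECTION_RE = re.compile(r'^\[([^\s\]"]+)(?:\s+"(.*)")?\]$')
GCRYPT_ID_RE = re.compile(r"^gcrypt-id\s*=\s*(.+)$")


def duplicate_gcrypt_ids(config_text):
    # Hypothetical helper: map each gcrypt-id used by more than one
    # remote to the list of remote names that share it.
    remote = None
    by_id = defaultdict(list)
    for raw in config_text.splitlines():
        line = raw.strip()
        section = SECTION_RE.match(line)
        if section:
            # Only remember the name when inside a [remote "..."] section.
            remote = section.group(2) if section.group(1) == "remote" else None
            continue
        found = GCRYPT_ID_RE.match(line)
        if found and remote is not None:
            by_id[found.group(1).strip()].append(remote)
    return {gid: names for gid, names in by_id.items() if len(names) > 1}
```

Calling it on the contents of `.git/config` returns an empty dict when every remote has its own id.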
|
||||
|
||||
To fix, I:
|
||||
|
||||
- removed `remote.hub.gcrypt-id` from `.git/config` on both `A` and `B`
|
||||
- deleted and re-created a blank repo on `hub`
|
||||
- `git push hub` on `B`
|
||||
- `git pull hub master` on `A`
|
||||
|
||||
This resulted in a new and unique value for `remote.hub.gcrypt-id`, which is the same on both `A` and `B`.
|
||||
|
||||
Have not had time to dig into why but this is the only thread I can find about this problem so I figured I would log this somewhere for posterity.
|
||||
"""]]
|
97
doc/bugs/assistant_not_synching_with_content.mdwn
Normal file
|
@ -0,0 +1,97 @@
|
|||
### Please describe the problem.
|
||||
|
||||
I have the following repos
|
||||
|
||||
a - group manual - all content currently originates on this repo (OS X 10.14.4)
|
||||
b - group backup - this is a rclone special backed by google drive
|
||||
c - this is the underlying git repo on gitlab.com
|
||||
d - group backup - a server that is supposed to backup everything (OS X 10.14.4)
|
||||
|
||||
Assistant is running on a and d
|
||||
|
||||
It is not guaranteed that a and d will be able to directly connect, however, they both have very good connectivity to b and c
|
||||
|
||||
When I add a set of files into a (using git-annex add) the non-annex files get checked into the git repo and pushed to c. Similarly, the content (annex files) gets pushed to b. This is confirmed by git-annex list --allrepos
|
||||
|
||||
Within an hour or so, d will know about the files that were added (git-annex list) and the git log shows that it is on the same commit as a and c.
|
||||
|
||||
However, the assistant on d never does the git-annex sync --content
|
||||
|
||||
If I manually run git-annex sync --content on d, all is updated as expected.
|
||||
|
||||
I've made no changes to the groupwants, group, etc. settings
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
Create a repo with a central git upstream and a special remote via rclone on gdrive. Clone that repo on another machine that can also see the upstream and the special remote, but isn't directly connected to the originator of the repo.
|
||||
|
||||
Add annex-handled files to the original repo.
|
||||
|
||||
Check the status of the git upstream, special, and the clone.
|
||||
|
||||
After failure is acknowledged, run git annex sync --content to confirm that the mechanics still work
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
Both hosts are OSX 10.14.4 and are running 7.20191218
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
This is from the assistant on the clone. It is running in debug mode.
|
||||
|
||||
[[!format sh """
|
||||
|
||||
[2019-12-30 17:44:09.362492] main: starting assistant version 7.20191114
|
||||
[2019-12-30 17:44:14.532638] TransferScanner: Syncing with origin
|
||||
(scanning...) [2019-12-30 17:44:14.590159] Watcher: Performing startup scan
|
||||
ControlSocket .git/annex/ssh/git@gitlab already exists, disabling multiplexing
|
||||
Disallowed command
|
||||
Everything up-to-date
|
||||
Disallowed command
|
||||
Disallowed command
|
||||
Disallowed command
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
fatal: Pathspec 'workflow/cc-archive-exif/LICENSE' is in submodule 'workflow/cc-archive-exif'
|
||||
|
||||
git cat-file EOF: user error
|
||||
|
||||
fd:38: hFlush: resource vanished (Broken pipe)
|
||||
|
||||
fd:38: hFlush: resource vanished (Broken pipe)
|
||||
Disallowed command
|
||||
(started...)
|
||||
[2019-12-30 17:44:33.097035] Committer: Committing changes to git
|
||||
(recording state in git...)
|
||||
[2019-12-30 17:44:33.176213] Pusher: Syncing with origin
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Everything up-to-date
|
||||
Disallowed command
|
||||
|
||||
<<A bunch of white space lines removed for brevity>>
|
||||
|
||||
Disallowed command
|
||||
Disallowed command
|
||||
Disallowed command
|
||||
Disallowed command
|
||||
Disallowed command
|
||||
# End of transcript or log.
|
||||
"""]]
|
||||
|
||||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
Yes - I can run this manually, and overall this is great - I would just love to get this automated....
|
||||
|
||||
|
|
@ -20,3 +20,5 @@ If strict matching (not sure yet about a use case where it would really be neede
|
|||
|
||||
[[!meta author=yoh]]
|
||||
[[!tag projects/datalad]]
|
||||
|
||||
> Looks like we're agreed this is not necessary, so [[done]] --[[Joey]]
|
||||
|
|
30
doc/bugs/enable-tor_unsupported_on_osx.mdwn
Normal file
|
@ -0,0 +1,30 @@
|
|||
### Please describe the problem.
|
||||
|
||||
enable-tor on an OSX box (with magic-wormhole and tor installed via brew) fails miserably.
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
run git-annex enable-tor - multiple fails, see details.
|
||||
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
7.20191106
|
||||
|
||||
OSX 10.14.5
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
The first failure is that enable-tor can't run as root. Instead, I call it with sudo git-annex enable-tor <UID>
|
||||
|
||||
The second failure is that you try and write into /etc/tor/torrc - which is not where torrc is located on a brew installed tor - it's in /usr/local/etc/tor/torrc. I made a symlink to get around that problem.
|
||||
|
||||
The third failure is a complaint about systemctl not being present. I looked in Utilities/Tor.hc and saw you were trying to call for a reload of tor. To hack around that, I wrote a script called systemctl that simply called 'brew services' with the args passed in ( brew services $1 $2 ).
|
||||
|
||||
After that, I still get the error: git-annex: tor failed to create hidden service, perhaps the tor service is not running
|
||||
|
||||
I have restarted tor manually, and it is indeed running. It looks like something is failing in setting up the Onion socket, but I can't see what
|
||||
|
||||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
I love it - using it to protect my photo archive now using a central special repo (rclone) for the data, and a gitlab repo for the base.
|