remove old closed bugs and todo items to speed up wiki updates and reduce size

Remove closed bugs and todos that were last edited or commented before 2021.

Except for ones tagged projects/*, since projects like datalad want to keep
around records of old deleted bugs longer.

Command line used:

    for f in $(grep -l '|done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2021 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2021 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
    for f in $(grep -l '\[\[done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2021 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2021 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
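
The two loops differ only in the grep pattern (`|done]]` vs `[[done]]`); an untested sketch of an equivalent single pass:

    for f in $(grep -E -l '([|]|\[\[)done\]\]' -- ./*.mdwn); do
        if ! grep -q "projects/" "$f"; then
            d="${f%.mdwn}"
            if [ -z "$(git log --since=01-01-2021 --pretty=oneline -- "$f")" ] &&
               [ -z "$(git log --since=01-01-2021 --pretty=oneline -- "$d")" ]; then
                git rm -- "./$f"
                git rm -rf "./$d"
            fi
        fi
    done
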
Joey Hess 2022-08-22 12:27:10 -04:00
parent a9db0a5055
commit 28921af543
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
566 changed files with 0 additions and 15810 deletions


@ -1,20 +0,0 @@
This suggestion has come from being surprised at the behaviour of "import --skip-duplicates" which copies files instead of moving them and leaves the source directory untouched (description implies it will just leave duplicates alone).
Apologies for the brevity, I've already typed this out once..
"import" has several behaviours which can be controlled through some options, but they don't cover all wanted behaviours. This suggestion is for an alternative interface to control these behaviours, totally stolen from rsync :P

    # create symlinks (s), inject content (i) and delete from source (d)
    # duplicate (D) and new (N) files
    git annex import --mode=Dsid,Nsid $src # (default behaviour)
    git annex import --mode=Dsi,Nsi $src # --duplicate
    git annex import --mode=Dd,Nsid $src # --deduplicate
    git annex import --mode=Nsi $src # --skip-duplicates
    git annex import --mode=Dd $src # --clean-duplicates
    git annex import --mode=Did,Nsid $src # (import new, reinject duplicate.. really want this!)
    git annex import --mode=Ns $src # (just creates symlinks for new)
    git annex import --mode=Nsd $src # (invalid mode due to data loss)
    git annex import --mode=Nid $src # (invalid or require --force)

> Current thinking is in [[remove_legacy_import_directory_interface]].
> This old todo is redundant, so [[wontfix|done]] --[[Joey]]


@ -1,11 +0,0 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 1"
date="2017-01-16T10:30:55Z"
content="""
This [[TODO|todo/import_--reinject/]] (and \"reinject --known\") would then be:

    git annex import --mode=Did
"""]]


@ -1,33 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2017-02-07T20:24:29Z"
content="""
Bearing in mind that I would have to *support* all of the resulting
combinatorial explosion, and that several combinations don't make sense,
or are unsafe, or seem useless, I think I'd rather keep it limited to
well-selected points from the space.
I've fixed the description of --skip-duplicates to match its behavior.
I don't know if there's a good motivation for it not deleting the files it
does import. I'd almost rather have thought that was a bug in the
implementation, but the implementation explicitly copies rather than moves
files for --skip-duplicates, so that does seem to have been done
intentionally. In any case, `--clean-duplicates` can be run after it to
delete dups, I suppose.
An implementation of --mode=Did,Nsid seemed worth adding at first, perhaps
as --reinject-duplicates. But thinking about it some more,
that would be the same as:

    git annex reinject --known /path/*
    git annex import /path/*

The first command moves all known files into the annex, which leaves
only non-duplicate files for the second command to import.
The only time I can think of that this might not be suitable is if `/path` is
getting new files added to it while the commands run... But in that case
you can `mkdir /path/toimport; mv /path/* /path/toimport` and then
run the 2 commands on `/path/toimport/*`
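
Spelled out, that workaround would be (a sketch, with /path as a placeholder):

    mkdir /path/toimport
    mv /path/* /path/toimport
    git annex reinject --known /path/toimport/*
    git annex import /path/toimport/*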
"""]]


@ -1,16 +0,0 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 3"
date="2017-02-07T22:51:15Z"
content="""
An implementation of --mode=Did,Nsid seemed worth adding at first, perhaps as --reinject-duplicates. But thinking about it some more, that would be the same as

    git annex reinject --known /path/*
    git annex import /path/*

--mode=Did,Nsid would be quite a bit faster because it wouldn't hash the files twice, which is an advantage this suggestion has over any multiple command alternative.
If you want to keep it to certain points in space rather than deal with all combinations, you could whitelist which ones are acceptable and people can request more to be whitelisted as they discover use cases for those modes. The current commands would alias to the modes (which would also make their behaviour obvious if this alias is mentioned in the documentation).
"""]]


@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2017-02-09T19:33:46Z"
content="""
Actually, import --deduplicate, --skip-duplicates, --clean-duplicates
are implemented naively and do hash files twice. So it's
the same efficiency.
But, I just finished a more complicated implementation that avoids
the second hashing.
That does make the combined action worth adding, I suppose. Done so as
--reinject-duplicates.
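
For example, following the usage from earlier in this thread (a sketch):

    git annex import --reinject-duplicates /path/*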
"""]]


@ -1,27 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2017-02-09T19:45:26Z"
content="""
I feel that the problem with this idea is that the suggested
actions "create symlinks (s), inject content (i) and delete from source (d)"
are only an approximation of how import is implemented. If they perfectly
matched the implementation, then import could treat them as a DSL and
simply evaluate the expression to do its work. But it's not that simple.
For one thing, --deduplicate and --clean-duplicates don't simply "delete from source" the
duplicates; they first check that numcopies can be satisfied. The default
import behavior doesn't "sid", in fact it moves from source to the work tree
(thus implicitly deleting from source first), then injects, and then creates
the symlink. Everything has dependencies and interrelationships, and the best
way I've found to express that so far is as the Haskell code in
Command/Import.hs.
Even exposing that interface and using the current implementation for
particular canned expressions seems risky; exposing imperfect abstractions
can shoot you in the foot later when something under the abstraction needs
to change.
So I'd rather improve the documentation for git-annex import if it is
unclear. Not opposed to finding a way to work in these "Dsid,Nsid"
summaries into the documentation.
"""]]


@ -1,5 +0,0 @@
My original use case was for using git-annex find from scripts, where I didn't want to depend on the branch
checked out at the time, but rather write something like "git annex find --branch=master $searchterms"
> this was [[done]] some years later and this todo forgotten about until I
> noticed it now, so closing belatedly. --[[Joey]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.154"
subject="comment 1"
date="2014-03-17T19:48:57Z"
content="""
The difficulty with adding a --branch is that if it causes git-annex to operate on a list of (file, key) from the branch, then commands that actually modify the working tree would modify it, instead of the branch. So the options seem to be only generating a list of keys, and so only letting commands that operate on keys work (which rules out the `git annex find` example), or carefully arranging for commands that actually affect the work tree to not be usable with this option.
I'm not sure how many commands are affected. The ones I can immediately think of are sync, lock, unlock. (Commands like get obviously affect the work tree in direct mode, but it's fine to have getting a file from a branch also update files in the work tree, if they pointed at the same key.)
"""]]


@ -1,7 +0,0 @@
Can the `annex.addunlocked` be extended to have the same syntax as `annex.largefiles`? Also, can there be separate settings affecting `git add` and `git annex add`, e.g. `annex.git-add.addunlocked` and `annex.git-annex-add.addunlocked`, with both defaulting to the value of `annex.addunlocked` if not set?
Basically, I want a reliable way to prevent inadvertently adding files as annexed unlocked files.
Related: [[forum/lets_discuss_git_add_behavior]]
> [[done]] --[[Joey]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-10-08T18:35:06Z"
content="""
It is not possible for `git add` to add files in locked form. git's
interface simply does not allow that.
"""]]


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="preventing inadvertently adding annexed files in unlocked form"
date="2019-10-11T16:38:07Z"
content="""
> It is not possible for git add to add files in locked form. git's interface simply does not allow that.
Makes sense. Then, [[separate annex.largefiles.git-add and annex.largefiles.git-annex-add settings]] seems like the way to prevent inadvertently adding files to annex in unlocked form.
Related: [[todo/auto-lock_files_after_one_edit]]
"""]]


@ -1,32 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-12-19T15:29:40Z"
content="""
Retargeting this todo at something useful post-git-add-kerfluffle,
annex.addunlocked could usefully be a pagespec to allow adding some files
unlocked and others locked (by git-annex add only, not git add).
"true" would be the same as "anything" and false as "nothing".
---
It may also then make sense to let it be configured in .gitattributes.
Although, the ugliness of setting a pagespec in .gitattributes,
as was done for annex.largefiles, coupled with the overhead of needing to
query that from git-check-attr for every file, makes me wary.
(Surprising amount of `git-annex add` time is in querying the
annex.largefiles and annex.backend attributes. Setting the former in
gitconfig avoids the attribute query and speeds up add of smaller files by
2%. Granted I've sped up add (except hashing) by probably 20% this month,
and with large files the hashing dominates.)
The query overhead could maybe be finessed: Since adding a file
already queries gitattributes for two other things, a single query could be
done for a file and the result cached.
Letting it be globally configured via `git-annex config` is an alternative
that I'm leaning toward.
(That would also need some caching, easier to implement and faster
since it is not a per-file value as the gitattribute would be.)
"""]]


@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2019-12-20T19:45:21Z"
content="""
Made annex.addunlocked support expressions like annex.largefiles.
And both of them can be set globally with `git annex config`. I did not
make annex.addunlocked be settable by git attribute, because my sense is
that `git annex config` covers that use case, or mostly so.
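
A sketch of what that looks like in practice (the expressions are only illustrative):

    # add audio files unlocked, everything else locked, recorded repo-wide:
    git annex config --set annex.addunlocked 'include=*.mp3 or include=*.ogg'
    # or per-repository in .git/config:
    git config annex.addunlocked 'largerthan=100mb'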
"""]]


@ -1,83 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="what am I doing wrong?"
date="2020-01-13T20:05:38Z"
content="""
I have tried to use this but I do not see it in effect:
[[!format sh \"\"\"
$> mkdir repo && cd repo && git init && git annex init && git annex config --set addunlocked anything && git show git-annex:config.log && touch 1 2 && git add 1 && git annex add 2 && git commit -m 'committing' && ls -l && git show
Initialized empty Git repository in /tmp/repo/.git/
init (scanning for unlocked files...)
ok
(recording state in git...)
addunlocked anything ok
(recording state in git...)
1578945668.466039639s addunlocked anything
add 2
ok
(recording state in git...)
[master (root-commit) e428211] committing
2 files changed, 1 insertion(+)
create mode 100644 1
create mode 120000 2
total 4
-rw------- 1 yoh yoh 0 Jan 13 15:01 1
lrwxrwxrwx 1 yoh yoh 178 Jan 13 15:01 2 -> .git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
commit e428211fe0c64e67cf45d8c92165c866db5ba75f (HEAD -> master)
Author: Yaroslav Halchenko <debian@onerussian.com>
Date: Mon Jan 13 15:01:08 2020 -0500
committing
diff --git a/1 b/1
new file mode 100644
index 0000000..e69de29
diff --git a/2 b/2
new file mode 120000
index 0000000..ea46194
--- /dev/null
+++ b/2
@@ -0,0 +1 @@
+.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
\"\"\"]]
so I have tried to say that \"anything\" (all files) should be added unlocked. But it seems that neither file (`1` added via `git add` and `2` added via `git annex add`) were added unlocked.
<details>
<summary>Here is some info on version/config: (click to expand)</summary>
[[!format sh \"\"\"
(git-annex)lena:/tmp/repo[master]
$> cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[annex]
uuid = f220cc03-1510-4e23-acb5-b95723ecf9fc
version = 7
[filter \"annex\"]
smudge = git-annex smudge -- %f
clean = git-annex smudge --clean -- %f
(dev3) 1 17256.....................................:Mon 13 Jan 2020 03:03:30 PM EST:.
(git-annex)lena:/tmp/repo[master]
$> git annex version
git-annex version: 7.20191230+git2-g2b9172e98-1~ndall+1
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
\"\"\"]]
</details>
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="re: what am I doing wrong?"
date="2020-01-14T03:19:19Z"
content="""
I believe that should be `git annex config --set annex.addunlocked anything` (i.e. an \"annex.\" in front of the name).
"""]]


@ -1,37 +0,0 @@
When an external special remote tells git-annex a fuller URL for a given file, git-annex-addurl does not use that information:

    [2018-10-28 16:12:39.933464] git-annex-remote-dnanexus[1] <-- CLAIMURL dx://file-FJZjVx001pB2BQPVKY4zX8kk/
    [2018-10-28 16:12:39.933515] git-annex-remote-dnanexus[1] --> CLAIMURL-SUCCESS
    [2018-10-28 16:12:39.933568] git-annex-remote-dnanexus[1] <-- CHECKURL dx://file-FJZjVx001pB2BQPVKY4zX8kk/
    [2018-10-28 16:12:40.469292] git-annex-remote-dnanexus[1] --> CHECKURL-MULTI dx://file-FJZjVx001pB2BQPVKY4zX8kk/A4.assembly1-trinity.fasta 11086 A4.assembly1-trinity.fasta
    addurl dx://file-FJZjVx001pB2BQPVKY4zX8kk/ (from mydx) (to A4.assembly1_trinity.fasta) [2018-10-28 16:12:40.469503] read: git ["--version"]

It would be better if, in the above log, the URL key was based on dx://file-FJZjVx001pB2BQPVKY4zX8kk/A4.assembly1-trinity.fasta , which would preserve the .fasta extension in the key and therefore in the symlink target.
> [[fixed|done]] --[[Joey]]
Also, it would be good if the external special remote could return an etag
for the URL, which would be a value guaranteed to change if the URL's
contents changes; and if git-annex would then compute the URL key based on
the combination of URL and etag.
> This might be a good idea if sufficiently elaborated on, but I am a one
> idea, one bug, one page kind of guy. I dislike reading over a long detailed
> discussion of something, like the problem above and my analysis of it,
> only to find a second, unrelated discussion of something else.
> Suddenly the mental state is polluted with
> different distinct things, some fixed, other still open. The bug tracking
> system has then failed because it's not tracking state in any useful way.
> Which is why I've closed this todo item with my fix of
> a single item from it. --[[Joey]]
It'd also be good if there was an option to automatically migrate URL keys
to the default backend whenever a file from a URL key is downloaded. Also,
to record the checksummed key (e.g. MD5E) as metadata of the URL key (in a
field named e.g. alternateKeys), and if addurl --fast is later done on a
URL key for which a checksummed key is recorded in the metadata, to add the
checksummed key instead of the URL key.
> Again, mixing discussion of several things in one place is a good way to
> muddy the waters. I think this idea has several problems, but don't want
> to discuss them here. --[[Joey]]


@ -1,45 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2018-10-29T17:51:27Z"
content="""
Looking at the code, addurl clearly *does* use the urls returned by
CHECKURL. In Command/AddUrl.hs:

    go deffile (Right (UrlMulti l))
        | isNothing (fileOption (downloadOptions o)) =
            forM_ l $ \(u', sz, f) -> do
                let f' = adjustFile o (deffile </> fromSafeFilePath f)
                void $ commandAction $
                    startRemote r o f' u' sz

`l` is the list of values it returns, and `u'` is individual urls from that list,
as opposed to `u` which is the url the user provided.
`u'` is passed to `startRemote`, and `u` is not.
Hmm, but in Remote/External.hs there is a special case:

    -- Treat a single item multi response specially to
    -- simplify the external remote implementation.
    CHECKURL_MULTI ((_, sz, f):[]) ->
        result $ UrlContents sz $ Just $ mkSafeFilePath f
    CHECKURL_MULTI l -> result $ UrlMulti $ map mkmulti l

That does not have any kind of rationale in [[!commit 8a17bcb0be91c345a52d78c08009285b0fcd6e3a]],
but the next commit added `doc/special_remotes/external/git-annex-remote-torrent`
and I think I can see why I felt it simplified things. That script always
replies with CHECKURL-MULTI, but a torrent often contains a single file, and
it would be perhaps better to use the original url provided by the user for such a
file from a torrent, rather than an url that asks for file "#1" from the torrent.
Although AFAICS either would work, and Remote/BitTorrent.hs contains just the kind
of special case for a single item torrent that I was wanting to avoid external
special remotes needing to worry about.
The other benefit to special casing UrlContents is that it lets addurl --file specify
where to put the file, which the fileOption check in the first code block
above prevents for UrlMulti. But, that could just as well be handled by
adding a single file special case to the code in AddUrl.
I suppose changing this won't break anything, or if it does it was relying
on this undocumented behavior.
"""]]


@ -1,4 +0,0 @@
Is there already a way to addurl a video from a Twitter post? Question came up while proposing git annex as a tech for archival in https://github.com/2020PB/police-brutality/issues/315#issuecomment-640163911
> I don't think there's anything for git-annex to do here, so
> [[closing|done]] --[[Joey]]


@ -1,19 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-06-22T18:38:33Z"
content="""
If youtube-dl supports the web site, `git annex addurl` will automatically
use it to download the video.
Looks like youtube-dl does support twitter, so it should just work.
If it didn't though, I'd punt it over to youtube-dl.
(If you also wanted to archive the twitter
page itself, you could use `git annex addurl --raw` to archive the html.
Although there's a good chance the html alone is not enough, and so you
might want to use other tools to archive javascript and other assets;
this is beyond the scope of git-annex, although of course you can `git
annex add` whatever files you end up downloading.)
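
For instance (the URL is just a placeholder):

    # let youtube-dl handle the embedded video
    git annex addurl https://twitter.com/someuser/status/1234567890
    # or archive the raw html of the page itself
    git annex addurl --raw https://twitter.com/someuser/status/1234567890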
"""]]


@ -1,28 +0,0 @@
importtree=yes remotes are untrusted, because something is modifying that
remote other than git-annex, and it could change a file at any time, so
git-annex can't rely on the file being there. However, it's possible the user
has a policy of not letting files on the remote be modified. It may even be
that some remotes use storage that avoids such problems. So, there should be
some way to override the default trust level for such remotes.
Currently:

    joey@darkstar:/tmp/y8>git annex semitrust borg
    semitrust borg
    This remote's trust level is overridden to untrusted.

The borg special remote is one example of one where it's easy for the user to
decide they're going to not delete old archives from it, and so want git-annex
to trust it.
Below is some docs I wrote for the borg special remote page, should be
moved there when this gets fixed. --[[Joey]]
> There is Remote.appendonly, which prevents making import remotes
> untrusted. So if there were a way to set that for borg, it could
> be configured at initremote/enableremote time. But,
> Remote.Helper.ExportImport also assumes appendonly means that content can
> be accessed by Key, rather than by ImportLocation, which does not work
> for borg.
>> [[done]] via Remote.untrustworthy --[[Joey]]


@ -1,16 +0,0 @@
Add an arm64 autobuilder (linux).
This is needed to run in termux on some android devices.
And of course there are arm64 servers, although the armel build probably
also works on them.
Status: Builds fine on arm64, but needs an autobuilder. Building under
emulation could be done, or a scaleway arm64 server, which would be a
$5/month expense. Or, perhaps someone has an arm64 that could host the
autobuilder? --[[Joey]]
Currently running release builds for arm64 on my phone, but it's not
practical to run an autobuilder there. --[[Joey]]
>> [[done]]; the current qemu based autobuilder is not ideal, often gets
>> stuck, but there's no point leaving this todo open. --[[Joey]]


@ -1,20 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="arm64 possible CIs etc"
date="2018-10-11T22:22:50Z"
content="""
According to the great details @mmarmm on github provided in request to support [arm64 for neurodebian](https://github.com/neurodebian/dockerfiles/issues/10#issuecomment-406644418):
Shippable supports free Arm64 CI/CD and I believe Codefresh does too (both 64-bit and 32-bit for both providers):
https://blog.shippable.com/shippable-arm-packet-deliver-native-ci-cd-for-arm-architecture
http://docs.shippable.com/platform/tutorial/workflow/run-ci-builds-on-arm/
CodeFresh Arm Beta signup: https://goo.gl/forms/aDhlk56jZcblYokj1
If you need raw infrastructure the WorksOnArm project will supply full servers if you want to deal with metal: https://github.com/worksonarm/cluster/
I personally haven't looked into any of them yet
"""]]


@ -1,28 +0,0 @@
Somehow one of my usb removable drives got a different annex uuid assigned to a
repo in it than the one it had before. Since the drive is now frequently
falling off the USB bus with lots of IO errors, I hypothesize what might
have happened is that when git-annex read the git config, it somehow got
a corrupted version where annex.uuid was not set. So, it autoinitialized with
a new uuid.
(Arguing against this theory is that when git config then wrote to the
file, it would normally use the same cached value so would have written the
corrupted version. Which did not happen.)
I have checked, and if git config exits nonzero, git-annex does not
continue with autoinitialization. So it seems it was not as simple as a
read failure.
To avoid any kind of problem like this leading to a new uuid being
generated, which can be pretty annoying to recover from especially if you
don't notice it for a long time, maybe git-annex should avoid autoinit when
there's a git-annex branch already, or if .git/annex/index already exists.
After all, that implies the repo should have already been initialized, and
now it isn't, so something unusual is going on.
A bare repo that was just cloned will have a git-annex branch
before it gets initialized. So for bare repos, it would need to not consider
that, but checking whether annex/index exists would still do. Or it may be better
not to special case it, and only look for the annex/index file? --[[Joey]]
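
A rough shell illustration of the proposed check (git-annex itself would do this internally, not via shell):

    # refuse to auto-initialize when annex state already exists but annex.uuid is unset
    if git show-ref --quiet refs/heads/git-annex || [ -e .git/annex/index ]; then
        echo "git-annex branch or index exists but repository is not initialized; not auto-initializing" >&2
        exit 1
    fi
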
> [[done]] --[[Joey]]


@ -1,19 +0,0 @@
borg backup is pretty cool, and could be a great special remote backend.
In particular it does delta compression and stuff.
There seem to be two ways it could work. Probably there are borg commands
that allow storing a given blob in it, and retrieving a given blob. And
that could be used for a traditional special remote.
But also, if a whole git-annex repository has been backed up with borg,
then git-annex could look inside such a backup, and see if
.git/annex/objects/ contains an object. It could then mark it as
present in the borg special remote. This way you'd use borg to take
backups, and git-annex would then be aware of what was backed up in borg,
and could do things like count that as a copy.
--[[Joey]]
[[!tag needsthought]]
> [[done]]! --[[Joey]]
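
For reference, a sketch of how the eventual borg special remote gets used; the borgrepo= parameter name is an assumption here, and subdir= is covered by a later todo:

    git annex initremote borg type=borg borgrepo=/path/to/borgrepo
    git annex sync borg   # scans the backups and records which annexed objects they contain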


@ -1,20 +0,0 @@
[[!comment format=mdwn
username="RonnyPfannschmidt"
avatar="http://cdn.libravatar.org/avatar/c5379a3fe2188b7571858c49f9db63c6"
subject="the remote im working on"
date="2018-06-04T07:51:57Z"
content="""
Hi Joey,
i am currently working on a remote to use borg as a tree import source and a content source
the work is started in https://github.com/RonnyPfannschmidt/git-annex-borg
note that borg does **not** do delta storage - it does content informed dynamic chunk sizes (which helps deduplication)
freestanding borg will not be a good remote for putting things out,
so i will be pulling things out mostly (but i hope to hit a point where its viable to generate a borg archive from the tree of expected contents thats viable for putting things in)
-- Ronny
"""]]


@ -1,65 +0,0 @@
[[!comment format=mdwn
username="anarcat"
avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
subject="progress?"
date="2018-11-27T06:47:26Z"
content="""
How's that remote going, RonnyPfannschmidt? :) I can't tell from the [homepage](https://github.com/RonnyPfannschmidt/git-annex-borg/) but from the source code, it looks like initremote is supported so far, but not much else...
From what I remember, borg supports storing arbitrary blobs with the `borg debug-put-obj` function, and retrieve one with `borg debug-get-obj`. Here's an example of how this could work:

    [1145]anarcat@angela:test$ sha256sum /etc/motd
    a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 /etc/motd
    [1146]anarcat@angela:test$ borg init -e none repo
    [1147]anarcat@angela:test$ borg debug-put-obj repo /etc/motd
    object a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 put.
    [1148]anarcat@angela:test$ borg debug-get-obj repo a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 tmp
    object a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 fetched.
    [1149]anarcat@angela:test$ sha256sum tmp
    a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 tmp

This assumes the underlying blob ID in borg is a SHA256 hash, but that
seems like a fair assumption to make. Naturally, this could cause
problems with git-annex, which supports multiple hashing algorithms
thanks to the multiple [[backends]] support. But maybe this can just
be worked around by refusing to store non-matching backends.
That is, if borg actually worked that way. Unfortunately, while the
above actually works, the resulting repository is not quite right:

    $ borg debug dump-repo-objs .
    Dumping 000000_0000000000000000000000000000000000000000000000000000000000000000.obj
    Data integrity error: Chunk a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921: Invalid encryption envelope

So borg does not like the repository at all... I'm not sure why, but
it sure looks like borg \"objects\" are not as transparent as I
hoped and that this low-level interface will not be suitable for
git-annex.
The higher level interface is \"archives\", which have (more or less) a
CRUD interface (without the U, really) through the
\"create/list/extract/prune\" interface. It's far from what we need:
items are deduplicated across archives, so it is impossible to
reliably delete a key unless we walk (and modify!) the entire archive list, which is
slow and impractical. But it *could* definitely be used to add keys to
a repository, using:

    $ time borg create --stdin-name SHA256-a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 .::'{utcnow}' - < /etc/motd
    1.30user 0.10system 0:01.62elapsed 86%CPU (0avgtext+0avgdata 81464maxresident)k
    72inputs+1496outputs (0major+31135minor)pagefaults 0swaps

As you can see, however, that is *slow* (although arguably not slower
than `debug-put-obj` which is surprising).
But even worse, that blob is now hidden behind that archive - you'd
need to list all archives (which is also expensive) to find it.
So I hit a dead end, and I'm curious to hear how you were planning to
implement this, Ronny. :) Presumably there should be a way to generate
an object compatible with `debug-put-obj`, but that interface seems
very brittle and has all sorts of warnings all around it... And on the
other hand, the archive interface is clunky and slow... I wish there
was a better way, and suspect it might be worth talking with upstream
(which I'm not anymore) to see if there's a better way to work this
problem. -- [[anarcat]]
"""]]


@ -1,55 +0,0 @@
[[!comment format=mdwn
username="anarcat"
avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
subject="restic"
date="2018-11-27T07:13:29Z"
content="""
and for what it's worth, borg's main rival, restic, handles this much better and faster:

    [1331]anarcat@angela:test$ RESTIC_PASSWORD=test restic init -r repo4
    created restic repository 2c75411732 at repo4
    Please note that knowledge of your password is required to access
    the repository. Losing your password means that your data is
    irrecoverably lost.
    [1334]anarcat@angela:test1$ RESTIC_PASSWORD=test time restic -r repo4 backup --stdin --stdin-filename SHA256-a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 < /etc/motd
    repository 2c754117 opened successfully, password is correct
    created new cache in /home/anarcat/.cache/restic
    Files: 1 new, 0 changed, 0 unmodified
    Dirs: 0 new, 0 changed, 0 unmodified
    Added to the repo: 656 B
    processed 1 files, 0 B in 0:00
    snapshot 87c0db00 saved
    0.55user 0.04system 0:00.80elapsed 73%CPU (0avgtext+0avgdata 48384maxresident)k
    0inputs+88outputs (0major+9665minor)pagefaults 0swaps
    [1337]anarcat@angela:test$ RESTIC_PASSWORD=test time restic -r repo4 backup --stdin --stdin-filename SHA256-a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 < /etc/motd
    repository 2c754117 opened successfully, password is correct
    Files: 0 new, 1 changed, 0 unmodified
    Dirs: 0 new, 0 changed, 0 unmodified
    Added to the repo: 370 B
    processed 1 files, 0 B in 0:00
    snapshot 5b3af830 saved
    0.55user 0.04system 0:00.80elapsed 73%CPU (0avgtext+0avgdata 48568maxresident)k
    0inputs+64outputs (0major+9691minor)pagefaults 0swaps
    [1348]anarcat@angela:test$ RESTIC_PASSWORD=test time restic -r repo4 backup --stdin --stdin-filename SHA256-533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc < ~/folipon.jpg
    repository 2c754117 opened successfully, password is correct
    Files: 1 new, 0 changed, 0 unmodified
    Dirs: 0 new, 0 changed, 0 unmodified
    Added to the repo: 372 B
    processed 1 files, 0 B in 0:00
    snapshot 18879aa4 saved
    0.54user 0.03system 0:00.78elapsed 73%CPU (0avgtext+0avgdata 48504maxresident)k
    0inputs+64outputs (0major+9700minor)pagefaults 0swaps
    [1349]anarcat@angela:test$ RESTIC_PASSWORD=test time restic -r repo4 dump latest SHA256-533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc | sha256sum -
    0.50user 0.02system 0:00.73elapsed 72%CPU (0avgtext+0avgdata 47848maxresident)k
    0inputs+8outputs (0major+9513minor)pagefaults 0swaps
    533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc -

Of course it doesn't validate those checksums, and might freak out with the number of snapshots we would create, but it's a much better start than borg. ;)
"""]]


@ -1,13 +0,0 @@
[[!comment format=mdwn
username="michael@ff03af62c7fd492c75066bda2fbf02370f5431f4"
nickname="michael"
avatar="http://cdn.libravatar.org/avatar/125bdfa8a2b91432c072615364bc3fa1"
subject="Borg vs. restic, some design considerations"
date="2018-12-05T14:36:45Z"
content="""
As I have been looking for a new, de-duplicating, reliable backup system I read through the design documentations of [borg](https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#archives) and [restic](https://restic.readthedocs.io/en/latest/100_references.html#design). While the design of restic seems to be much simpler and actually quite straightforward, I decided for borg in the end due to its support for compression and the more efficient removal of single backups. Further, it [seems](https://blog.stickleback.dk/borg-or-restic/) the RAM usage is lower for borg.
Here are some comments on both concerning the usability as git annex storage backend. Note that they are all based on my understanding of the design documents that describe how the data is stored in restic and borg. It is well possible that I have misunderstood something or some parts are just impossible due to implementation details. Further, I am quite sure that what I propose is not possible with the current external APIs of git annex and borg.
For none of them, it seems to be a good idea to store individual archives (borg) or snapshots (restic) per file as both of them assume that the list of archives/snapshots is reasonably small, can be presented to the user as a single list and can be pruned based on certain rules about how many to keep per timespan (though that is per group of archives/snapshots). borg stores descriptions of all archives in a single item, the manifest (which means that when an archive is added, the whole list needs to be rewritten), restic stores each archive as a json document in a directory which might scale better but is probably still not a good idea.

I think instead of storing individual files, git annex should store the whole set of exported files in a single archive/snapshot, i.e., store some kind of (virtual) directory structure in borg or restic that represents all items that shall be stored. Then, whenever git annex syncs with the borg/restic remote, a new archive/snapshot would be added. The user could then use the time-based pruning rules to remove old snapshots. This would also integrate well with using the same borg/restic repository for other backups, too.

It might seem this would make the retrieval of a single file quite inefficient. Both borg and restic split a file into a list of chunks and store information where these chunks can be found. Therefore, it should be possible for a borg/restic special remote to just store this list of chunks for every annexed file. Then, to get a file, git annex would only need to ask for these chunks if it wants to get a single file. For restoring a lot of files, in particular with a non-local restic repository, this might be very inefficient though as restic might need to download a lot of data just to get these chunks - there just getting the whole last archive/snapshot might be more efficient (as far as I understood, then restic downloads each pack of chunks only once and directly writes all of them to the files that want them).

Restic stores separate objects for every directory and this directory contains a list of subdirectories and files, where files contain a list of chunks. To add or remove files from a snapshot in restic, git annex would just need to execute the chunker for files not already present in the previous snapshot and could use the already stored chunk ids for the already present files. However, each snapshot would create a completely new directory. Without subdirectories, this would basically mean that the list of all files needs to be re-written for every snapshot. Subdirectories would help with that, but only if few subdirectories are modified. Due to the nature of hashing, this seems unlikely in the case of a git annex special remote (but of course this makes backups of unchanged directories very efficient).

Borg doesn't have this directory structure but instead just stores the metadata of every file in one large stream. This stream is chunked in parts consisting of around 128KiB and therefore, only parts where changes occurred need to be stored again. The list of these metadata chunks needs to be stored, nevertheless, but is much smaller. Again, everything that is needed for storing a file could be generated without having the actual source file if the chunk ids are present. In fact, this is what borg does with a file cache that stores for every file of the previous backup both properties like size, timestamp and inode id to identify modifications and a list of chunks. If borg finds the same file again, it just uses the stored chunk list. If the git annex borg special remote could also keep the order of all previously present files the same, this would result in re-using basically all metadata chunks - however, I don't know if borg assumes any order on the files.

Note that borg needs to know which chunks are referenced in an archive as borg stores reference counts for all chunks to determine if a chunk is still needed, so just re-using the metadata chunks without reading their content is definitely not possible. Restic has no such reference counts, it needs to iterate over all trees to determine if a chunk can be deleted (which [seems](https://blog.stickleback.dk/borg-or-restic/) to be terribly slow). Nevertheless, both implementations of cleaning up chunks require that chunks are referenced in some file that is contained in some archive/snapshot.
"""]]


@ -1,38 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2019-08-01T16:02:06Z"
content="""
Half a second to store a single annex object with restic is pretty slow,
and that's before the snapshots directory gets bloated with a hundred
thousand files.
I wonder if my original idea up top was not a better approach: Let these
backup tools back up a whole annex repo (or at least .git/annex/objects),
and then make git-annex interoperate with the backups by peering inside
them and learning what has been backed up.
In the meantime, git-annex has gotten tree import facilities,
which is a similar concept, of listing content in a data store
and so learning what's stored in there, and then being able to
retrieve objects out of that data store on demand.
Importing annex objects from a backup is not quite the same as a tree
import, because it wouldn't result in any kind of file tree that
you'd want to merge back into your git repo. Also tree importing has
to download files in order to hash them, while in this case the
object's annex key can be seen in the backup.
But from a user perspective it could be quite similar, something like:

    git annex initremote restic type=restic repolocation=...
    git annex import --from restic
    git annex get

That would use `restic list snapshots` and then `restic ls` each
snapshot and find filenames that look like annex keys
(perhaps looking for part of the annex directory structure to avoid
false positives). Keys it found would be marked as present in
the remote, and the snapshot(s) that contain them recorded in
the git-annex branch for use by git-annex get.
"""]]


@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2019-08-01T16:34:22Z"
content="""
I made a restic repo with 2000 single-file snapshots.
Adding the first snapshot took 0.55s. Adding the 2000th
snapshot took 1.10s.
So that's a very big scalability problem with using restic with single-file
snapshots.
2000 files in a directory is not going to cause that kind of slowdown;
my guess is restic needs to load all past snapshots, or something like
that.
"""]]


@ -1,22 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2020-12-04T01:17:01Z"
content="""
The Remote interface recently got importKey, which gets us
unexpectedly a *lot* closer to making `git-annex import --from borg` a reality!
The Remote would need a listImportableContents that finds all annex objects
in all (new) snapshots, and generates a ContentIdentifier that is just the
snapshot plus object path. Then importKey can simply generate a Key from
that ContentIdentifier without doing any more work. (And, so getting an
object from the remote will also work, because it will have the
ContentIdentifier recorded and so will know what snapshot and path in the
borg repo.)
Seems that all that would be needed is a way to skip generating the git tree
for the imported files, since it would be useless.
And a way to force --no-content, since importing from a borg backup should not
get all the backed up annex objects. It may be best to make this a new
command, that just happens to use the ImportActions interface.
"""]]


@ -1,5 +0,0 @@
Sometimes a borg backup contains several git-annex repos. Then pointing
git-annex at the whole thing will find objects not belonging to the current
repo. To avoid this, add subdir= config.
[[done]] --[[Joey]]


@ -1,10 +0,0 @@
The tree generated by git-annex sync with a borg remote
does not seem to get grafted into the git-annex branch, so
would be subject to being lost to GC.
Is this a general problem affecting importtree too?
> Yes, it was. It would have only caused a problem if the user
> kept doing imports from a remote, but never exporting to it.
> Then, in a clone of the repo that was importing, they would not be able
> to get files. [[fixed|done]] --[[Joey]]


@ -1,46 +0,0 @@
I am not sure this is the case, but from first-hand experience, it
sure looks like you can't turn on v7 (or really v6, actually) on a
single git worktree. For example, if I have my `pictures` repository
on `curie` and turn on v7, `angela` will *also* need to run `git annex
upgrade` on their worktree otherwise git-annex
(e.g. 6.20180913-1~bpo9+1 on Debian stretch) will be really confused:

    anarcat@angela:calendes$ less calendrier/calendes.pdf
    /annex/objects/SHA256E-s117451415--8d7d8366094a63c54bef99b5cd2e2b5187092f834d8bf7002e1d5fdceb38a710.pdf
    anarcat@angela:calendes$ git annex get calendrier/calendes.pdf
    anarcat@angela:calendes$ git annex whereis calendrier/calendes.pdf
    anarcat@angela:calendes$ # OMG WHERE ARE MY FILES! /me flails wildly

:)
It seems to me there should be a warning in the [[upgrades]] page
about this. I would have done so myself, but I'm not sure (like in my
last bug report) if I am doing things right.
In this case, this repository was already present (v5, indirect mode)
on both machines. I upgraded (using `git annex upgrade`) the
repository on curie (7.20181121 Debian buster) which went well.
(Then I messed around with that thumb drive, which led to
[[bugs/v7_fails_to_fetch_files_on_FAT_filesystem]], but probably
unrelated here.)
Then I powered on my laptop (`angela`) and saw the above. I would have
expected it to either upgrade automatically or warn me about the
repository inconsistency. Failing that, the upgrades page should at
least warn us this is a "system-wide" (how do we call that?) change...
The workaround is to run `git annex upgrade` on that other repo, of
course, but if the source repo was also upgraded, it might be
difficult to sync files, as you will see that warning:

    $ git annex get
    get calendrier/calendes.pdf (from sneakernet...)
    Repository version 7 is not supported. Upgrade git-annex.

Considering there's no backport of 7.x in Debian stretch, it makes the
upgrade path rather delicate... Is there a way to "downgrade" that
sneakernet repo? :) (Thankfully, the main server still runs v5 so the
files are still accessible from stretch....) -- [[anarcat]]
Updated the [[upgrades]] page, [[done]].


@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 10"""
date="2020-10-05T18:40:32Z"
content="""
Unless it entered an adjusted unlocked branch, this upgrade cannot have
changed locked files to unlocked files itself. So if you were not using
unlocked files in this repo before, and didn't make any changes after the
upgrade that would add any, you don't need to worry about them.
The only risk if it was downgraded to v5 with unlocked files
is that a command like `git commit -a` would commit the
large content to git. Easy enough to notice that with `git status` after
the downgrade too.
(But do checkout master if the currently checked out branch is
"adjusted/master(unlocked)")
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 11"
date="2020-10-05T23:17:07Z"
content="""
> .. after the upgrade that would add any, you don't need to worry about them.
With the datalad pixie dust on top of git-annex, I am never 100% sure ;) I would better worry and do some basic check before proceeding... -- will do later today/tomorrow, God bless BTRFS and its snapshots - I can get a \"sandbox\" clone of the entire filesystem to play with safely.
"""]]


@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2018-12-04T20:53:49Z"
content="""
You only need to upgrade to v7 when the repository has unlocked files
committed to it. If a file contains a pointer to an annex object, it won't
work with v5. There is not a good way for git-annex to detect when that is
the case; such a file could be committed any time. Committing unlocked
files and upgrading has to be coordinated among the users of the repository.
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2020-10-01T14:06:32Z"
content="""
Is there a sensible way (could be a helper script) to safely downgrade from version 8 to 5, with checks for git links being used, etc.?
Rationale: On the original host (smaug) of the monstrous http://datasets.datalad.org (on falkor) I have managed to invoke our cron update script while using a newer annex and I had no `annex.autoupgraderepository` set, so annex upgraded a number of clones (originally version 5) locally to version 8. As I still use an older annex by default, I would like to downgrade those clones on smaug back to 5.
"""]]


@ -1,18 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-10-01T16:54:21Z"
content="""
If the repository did not get switched from direct mode to adjusted
unlocked branch, and does not use any unlocked files, you can:
* remove the filter.annex.smudge and filter.annex.clean from .git/config
* remove .git/info/attributes (or at least the filter=annex line)
* remove .git/hooks/post-checkout and .git/hooks/post-merge
* remove sqlite databases (all of .git/annex/keysdb* .git/annex/fsck/ .git/annex/export/ .git/annex/cidsdb*)
* change annex.version
To get back from adjusted unlocked branch to direct mode, you'd first want
to check out the master branch, and then do all of the above, then `git
annex direct` to get back into direct mode.
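
A rough shell sketch of those steps (again assuming no unlocked files and no adjusted branch):

    git config --unset filter.annex.smudge
    git config --unset filter.annex.clean
    sed -i '/filter=annex/d' .git/info/attributes
    rm -f .git/hooks/post-checkout .git/hooks/post-merge
    rm -rf .git/annex/keysdb* .git/annex/fsck .git/annex/export .git/annex/cidsdb*
    git config annex.version 5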
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 4"
date="2020-10-01T17:26:09Z"
content="""
THANK YOU! What is the most efficient way to identify if there are unlocked files in the tree (or full repository)? I know that annex scans for unlocked files after a clone, so I guess you might have considered different options and already chosen the most efficient ;)
"""]]


@ -1,24 +0,0 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 5"
date="2020-10-01T19:07:45Z"
content="""
> What is the most efficient way to identify if there are unlocked
> files in the tree (or full repository)?
I can't say anything about efficiency, but FWIW with git-annex
7.20191009 or later there's an `--unlocked` matching item, so you can
say `git annex find --unlocked`. Since you're working in the context
of repos that have already been upgraded, I think you could use that
to find unlocked files in the working tree.
As for outside of the working tree, `find` takes a `--branch`
argument, but, as far as I can tell, that doesn't match anything when
combined with `--unlocked` (tried with 8.20200908). However, I'm not
sure you'd need to consider anything other than the working tree. If
all of these repos were v5 before, then an unlocked file could have
only been in an uncommitted state, so I don't see how it'd end up on
another ref without committing/switching branches afterwards.
"""]]


@ -1,14 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2020-10-05T15:26:08Z"
content="""
THANK YOU Kyle! `find --unlocked` works!
But the tricky part is that I wanted to use some \"single\" instance of git-annex which would support `find --unlocked` and also v5 so I could fsck and do some other tests after I do the evil downgrade. But older versions, such as 7.20191114, which support v5 do not support v8, so they cannot do `find --unlocked` on v8. So I need to either
- find another later version which would support both v5 and v8
- make script use multiple versions of git-annex from different locations (one for initial `find --unlocked` and then another one for subsequent checks etc)
- find a way for `find --unlocked` without invoking `git-annex`.
"""]]


@ -1,37 +0,0 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 7"
date="2020-10-05T17:42:23Z"
content="""
> - find a way for `find --unlocked` without invoking `git-annex`.
Assuming you're interested in finding just the v6+ pointer files,
instead of also finding the uncommitted type changes for v5 unlocked
files, perhaps you could use something like this
[[!format python \"\"\"
import subprocess as sp

p_ls = sp.Popen([\"git\", \"ls-files\", \"--stage\"], stdout=sp.PIPE)
p_cat = sp.Popen([\"git\", \"cat-file\", \"--batch\"], stdin=sp.PIPE, stdout=sp.PIPE)

with p_ls:
    with p_cat:
        for line in p_ls.stdout:
            info, fname = line.strip().split(b\"\t\")
            mode, objid = info.split(b\" \")[:2]
            if mode != b\"100644\":
                continue
            p_cat.stdin.write(objid + b\"\n\")
            p_cat.stdin.flush()
            out = p_cat.stdout.readline()
            _, objtype, size = out.split()
            size = int(size)
            if size > 0:
                content = p_cat.stdout.read(size)
                if content.startswith(b\"/annex/objects/\"):
                    print(fname.decode())
            p_cat.stdout.readline()
\"\"\"]]
"""]]


@ -1,20 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 8"
date="2020-10-05T18:02:48Z"
content="""
Thank you Kyle! I came up with
```shell
unlocked=( `git grep -l -a --no-textconv --cached '^/annex/objects/' || :` )
if [ \"${#unlocked[*]}\" -ge 1 ]; then
error \"Found ${#unlocked[*]} unlocked files. Cannot do: ${unlocked[*]}\" 2
fi
```
do you think it would miss something?
Here is my complete script ATM (didn't try in \"production\" yet, switched to other tasks for now but it is ready, also does some testing of operation at the end, so must not be applied as is to existing repos without commenting that out): http://www.onerussian.com/tmp/downgrade-annex
"""]]


@ -1,9 +0,0 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 9"
date="2020-10-05T18:09:03Z"
content="""
Ah, I didn't think of using `git grep` for this. I think that's much
better than my suggestion.
"""]]


@ -1,38 +0,0 @@
I recently discovered (thanks to Paul Wise) the [Meow hash][]. The
TL;DR: is that it's a fast non-crypto hash which might be useful for
git-annex. Here's their intro, quoted from the website:
[Meow hash]: https://mollyrocket.com/meowhash
> The Meow hash is a high-speed hash function named after the character
> Meow in [Meow the Infinite][]. We developed the hash function at
> [Molly Rocket][] for use in the asset pipeline of [1935][].
>
> Because we have to process hundreds of gigabytes of art assets to build
> game packages, we wanted a fast, non-cryptographic hash for use in
> change detection and deduplication. We had been using a cryptographic
> hash ([SHA-1][]), but it was
> unnecessarily slowing things down.
>
> To our surprise, we found a lack of published, well-optimized,
> large-data hash functions. Most hash work seems to focus on small input
> sizes (for things like dictionary lookup) or on cryptographic quality.
> We wanted the fastest possible hash that would be collision-free in
> practice (like SHA-1 was), and we didn't need any cryptographic security.
>
> We ended up creating Meow to fill this niche.
[1935]: https://molly1935.com/
[Molly Rocket]: https://mollyrocket.com/
[Meow the Infinite]: https://meowtheinfinite.com/
[SHA-1]: https://en.m.wikipedia.org/wiki/SHA-1
I don't have an immediate use case for this right now, but I think it could
be useful to speed up checks on larger files. The license is a
*little* weird but seems close enough to a BSD to be acceptable.
I know it might sound like a conflict of interest, but I *swear* I am
not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]
> Let's concentrate on [[xxhash|todo/add_xxHash_backend]] or other new hashes that are getting general
> adoption, not niche hashes like meow. [[done]] --[[Joey]]


@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-06T19:36:32Z"
content="""
xxhash seems to fill a similar niche and is getting a lot more use from
what I can see.
Meow seems to claim a faster gb/s rate than xxhash does, but
it's hard to tell if the benchmarks are really equivalent.
"""]]


@ -1,9 +0,0 @@
Sometimes I start off a large file transfer to a new remote (a la "git-annex copy . --to glacier").
I believe all of the special remotes transfer the files one at a time, which is good, and provides a sensible place to interrupt a copy/move operation.
Wish: When I press ctrl+c in the terminal, git-annex will catch that and finish its current transfer and then exit cleanly (ie: no odd backtraces in the special remote code). For the case where the file currently being transferred also needs to be killed (ie: it's a big .iso) then subsequent ctrl+c's can do that.
> I'm going to close this, because 6 years later, I just don't think it's a
> good idea. I think that blocking ctrl-c from interrupting the program
> violates least surprise. [[done]] --[[Joey]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.172"
subject="comment 1"
date="2014-02-21T21:36:14Z"
content="""
This really depends on the remote, some can resume where they were interrupted, such as rsync, and some cannot, such as glacier (and, er, encrypted rsync).
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://grossmeier.net/"
nickname="greg"
subject="very remote specific"
date="2014-02-21T22:11:16Z"
content="""
Yeah, this is very remote specific and probably means adding the functionality there as well (eg: in the glacier.py code, not only in git-annex haskell). Maybe I should file bugs there accordingly :)
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.172"
subject="comment 3"
date="2014-02-21T22:34:14Z"
content="""
Hmm, I forget if it's possible for git-annex to mask SIGINT when it runs glacier or rsync, so that the child process does not receive it, but the parent git-annex does.
"""]]

View file

@ -1,28 +0,0 @@
When annex.stalldetection is set, and git-annex transferrer is used,
a ctrl-c does not propagate to the transferrer process.
The result is that the next time the process sends a message to its output
handle (eg a progress update), it gets a SIGPIPE, and so an ugly message is
output to the console after the user was returned to the prompt.
The SIGINT is not propagated because a child process group is used for
git-annex transferrer, in order to let child processes of it be killed
along with it when a stall is detected.
Maybe what's needed is a SIGINT handler in the main git-annex that
signals all the transferrer processes with SIGINT and waits on them
exiting. And other signals, eg SIGTSTP for ctrl-z.
> Implemented this, but not for windows (yet). But not gonna leave open
> for something that on windows in my experience does not work very
> reliably in general. (I've many times hit ctrl-c in a windows terminal and
> had the whole terminal lock up.) So, [[done]] --[[Joey]]
Or, note that it would suffice to remove the child process group stuff,
if we assume that all child processes started by git-annex transferrer are
talking to a pipe, and will output something, eg a progress update,
and so receive a SIGPIPE once the transferrer process has caught the
SIGINT and exited.
[[todo/stalldetection_does_not_work_for_rsync_and_gcrypt]] would be a
prereq for this approach. But, might there be long-running child processes
that are not on a pipe, and that need to be shutdown on a stall, too?

View file

@ -1,15 +0,0 @@
As part of the work in [[precache_logs_for_speed_with_cat-file_--buffer]],
key lookups are now done twice as fast as before.
But, limits that look up keys still do a key lookup, before the key
is looked up efficiently. Avoiding that would probably give another
1.5x-2x speedup for --in etc when such limits are used. What that optimisation
needs is a way to tell if the current limit needs the key or not. If it
does, then match on it after getting the key (and precaching the location
log for limits that need that), otherwise before getting the key.
> So this needs a way to introspect a limit to see if the terms used in it
> match some criteria. Another todo that also needs that is
> [[sync_fast_import]] --[[Joey]]
[[done]] --[[Joey]]

View file

@ -1,19 +0,0 @@
Many special remotes can potentially end up exposed via public http. There
is currently no way to access them over http without adding per-remote
support (like S3 has).
But generally the filenames used are the same, eg rsync and directory and
webdav and S3. Or if there are differences, they are generally small and
trying a couple of different urls is doable.
And sameas allows for
<https://git-annex.branchable.com/tips/multiple_remotes_accessing_the_same_data_store/>
now.
So, there could be a new special remote type, that allows generic readonly
access of other special remotes whose data stores are exposed via http.
Call it "http" maybe. (There may be some confusion between this and the web
special remote by users looking for such a thing.) --[[Joey]]
> httpalso special remote implemented, [[done]] --[[Joey]]
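
For reference, a hedged sketch of setting it up on top of an existing special remote's publically exposed data store (the remote name and url are examples):

    git annex initremote --sameas=mys3 mys3-http type=httpalso url=https://mybucket.s3.amazonaws.com/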

View file

@ -1,3 +0,0 @@
`git diff` for annexed files, especially unlocked annexed files, is currently uninformative. It would help if [[`git-annex-init`|git-annex-init]] configured a [git diff driver](https://git-scm.com/docs/gitattributes#_generating_diff_text) to diff the contents of the annexed files, rather than the pointer files.
> [[wontfix|done]], see comment
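
As the comment notes, a domain-specific differ can be hooked up via [[git-annex-diffdriver]]; a hedged sketch, where `imgdiff` stands in for whatever domain-specific diff tool is available:

    # the shim resolves annexed files to their content before running the
    # real external diff command
    git config diff.external 'git-annex diffdriver -- imgdiff'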

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-06T18:39:27Z"
content="""
Normally annexed files are huge binary files. Line-by-line diff of such
files is unlikely to be useful.
So you would need some domain-specific diff for the kind of binary files
you are storing in git-annex. If you have one, you can use
[[git-annex-diffdriver]] to make git use it when diffing annexed files.
Not seeing anything more I can do here, so I'm going to close this todo.
"""]]

View file

@ -1,9 +0,0 @@
I noticed that with the default SHA256E backend, `git annex reinject --known FILE` will fail if FILE has a different extension than it has in the annex. Presumably this is because `git annex calckey FILE` does not generate the same key, even though the file has the same checksum.
I think it would be better if `git annex reinject --known` would ignore the file extension when deciding whether a file is known. A case where that would be much better is caused by the fact that git-annex has changed how it determines a file's extension over time. E.g. if foo.bar.baz was added to the annex a long time ago, it might have a key like `SHA256E-s12--37833383383.baz`. Modern git-annex would calculate a key like `SHA256E-s12--37833383383.bar.baz` and so the reinject of the file using modern git-annex would fail.
This problem does not affect `git annex reinject` without `--known`.
--spwhitton
> mentioned this on the git-annex reinject man page; [[done]] --[[Joey]]
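
A hedged sketch of the workaround implied above, using reinject without --known (paths are examples):

    # the source content is verified against the named annexed file's key,
    # so the source filename's extension does not matter
    git annex reinject /mnt/backup/foo.bar.baz foo.bar.baz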

View file

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-06T17:11:58Z"
content="""
I can't think of a reasonable way to implement this.
It would need to hash and then look for a known SHA256E key that uses the
hash. But the layout of the git-annex branch doesn't provide any way to do
that, except for iterating over every filename in the branch. Which
would be prohibitively slow when reinjecting many files. (N times git
ls-tree -r) So it would need to build a data structure to map from SHA256
to known SHA256E key. That can't be stored in memory; git-annex doesn't
let the content of the repo cause it to use arbitrary amounts of memory
(hopefully).
All I can think of is to traverse the git-annex branch and build a sqlite
database and then query that, but that would add quite a lot of setup
overhead to the command.
"""]]

View file

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 2"
date="2020-01-07T12:29:47Z"
content="""
Thank you for your reply. Makes sense. If that's the only way to do it then it might as well be a helper script rather than part of git-annex.
Leaving this bug open because it would be good to have the limitation documented in git-annex-reinject(1).
"""]]

View file

@ -1,30 +0,0 @@
`git annex reinject --known` doesn't work in a bare repo.
    spwhitton@iris:~/tmp>echo foo >bar
    spwhitton@iris:~/tmp>mkdir baz
    spwhitton@iris:~/tmp>cd baz
    spwhitton@iris:~/tmp/baz>git init --bare
    Initialized empty Git repository in /home/spwhitton/tmp/baz/
    spwhitton@iris:~/tmp/baz>git annex init
    init (scanning for unlocked files...)
    ok
    (recording state in git...)
    spwhitton@iris:~/tmp/baz>git annex reinject --known ../bar
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    git-annex: fd:15: hGetLine: end of file
Obviously this wasn't actually a file known to git-annex. But I get the same error in a non-dummy bare repo I am trying to reinject.
A workaround is to use `git worktree add` and run `git annex reinject` from there.
> [[fixed|done]] --[[Joey]]

View file

@ -1,7 +0,0 @@
`git annex find --batch` will not accept absolute paths to files in the repo, but `git annex find /abs/path` works.
I tested `git annex lookupkey --batch` which does not have this problem.
--spwhitton
> [[fixed|done]] --[[Joey]]
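
For illustration, the usage that used to fail was along these lines (the path is an example):

    # --batch reads one path per line from stdin
    echo "$PWD/path/to/unlocked-file" | git annex find --unlocked --batch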

View file

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-03-16T18:06:47Z"
content="""
Hmm, I am not reproducing this problem here.
Were you passing other options besides --batch, to eg match some files?
And what version?
"""]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 2"
date="2020-03-25T16:31:13Z"
content="""
Hello Joey,
I was passing `--unlocked` only.
Version 8.20200226 installed from buster-backports.
Thanks!
"""]]

View file

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-04-15T19:05:59Z"
content="""
Reproduced it; the problem only happens when the files are unlocked,
not with the locked files I was trying. The --unlocked option is not the
problem.
"""]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-04-15T19:13:39Z"
content="""
Other commands like whereis --batch also behave the same.
Looks like what's going on is, when an absolute path is passed
as a parameter, it feeds thru git ls-files, producing a relative file.
But with --batch, it stays absolute. This causes things that try to eg,
look up the file in the tree to not find it.
So, --batch needs to make filepaths relative too..
"""]]

View file

@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-04-15T19:22:12Z"
content="""
Most of it can be fixed by making batchStart make
files relative.
Other affected commands do custom parsing of
batch input, so they will need to make the file from it
relative themselves: fromkey metadata rekey rmurl
Also, `git annex info /path/to/file` fails for unlocked
files and works for locked files, because it does not pass
filenames through git ls-files. I think it's the only
command that does not, when not in batch mode.
(I suppose alternatively, lookupKey could make the filename relative,
but I don't know if that is the only thing that fails on absolute
filenames, so prefer to make them all relative on input.)
Ok, all done..
"""]]

View file

@ -1,12 +0,0 @@
When `git annex -c foo.bar` runs git-annex transferrer,
it does not pass along the settings from -c.
(Note that, `git -c foo.bar annex` does propagate the -c. Git does it by
setting an environment variable, which causes git config to reflect the
override. The environment variable propagates to child processes.)
There are a lot of config settings that impact transfers,
and some of them might be commonly used at the command line, so something
needs to be done about this. --[[Joey]]
> [[done]]
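
To illustrate the difference (the config setting and value are only examples):

    # the -c here is propagated via an environment variable, so the
    # transferrer process sees it
    git -c annex.stalldetection=10mb/1m annex get .

    # before this change, the -c here was not passed on to git-annex transferrer
    git annex -c annex.stalldetection=10mb/1m get .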

View file

@ -1,16 +0,0 @@
Make `git-annex add --force-large` and `git-annex add --force-small`
add a specific file to annex or git, bypassing annex.largefiles
and all other configuration and state.
One reason to want this is that it avoids users doing stuff like this:
    git -c annex.largefiles=anything annex add foo.c
Such a temporary setting of annex.largefiles can be problematic, as explored in
<https://git-annex.branchable.com/bugs/A_case_where_file_tracked_by_git_unexpectedly_becomes_annex_pointer_file/>
Also, this could also be used to easily switch a file from one storage to
the other. I suppose the file would have to be touched first to make git-annex
add process it?
> [[done]] --[[Joey]]
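
A quick usage sketch of the added options:

    git annex add --force-small foo.c    # store foo.c in git, bypassing annex.largefiles
    git annex add --force-large foo.iso  # store foo.iso in the annex, bypassing annex.largefiles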

View file

@ -1,22 +0,0 @@
I wanted to share some thoughts for an idea I had.
There are times when I want to stream data from a remote -- I want to start processing it immediately, and do not want to keep it in my annex when I am done with it.
I can give some examples:
* I have several projects which have a large number of similar text files, and they compress really well with borg or bup. For example, I have a repo with many [ncdu](https://dev.yorhel.nl/ncdu) json index files. They total 60G, but in a bup special remote, they are ~3G. In another repo, I have large highly differential tsv files.
* I have an annex with 5-10G video files that are stored in a variety of network special remotes. Most of them are in my Google Drive. I would like to be able to immediately start playing them with VLC rather than downloading and verifying them in their entirety.
It would look like this:
```
git annex cat "someindex.ncdu" | ncdu -f -
diff <(git annex cat "huge-data-dump1.tsv" -f mybupremote ) <(git annex cat "huge-data-dump2.tsv" -f mybupremote )
git annex cat "myvideo.mp4" -f googledrive | vlc -
```
I imagine that there might be issues with verification. But I really am ok with not verifying a video file I am streaming.
> [[dup|done]] --[[Joey]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="git-annex-cat"
date="2020-07-09T00:21:02Z"
content="""
Related: [[todo/git-annex-cat]]
"""]]

View file

@ -1,35 +0,0 @@
git-annex import --no-content means annex.largefiles is not checked, so
non-large files get added as annexed files. That's done because
annex.largefiles can contain expressions that need to examine the content
of the file. In particular for mimetype and mimeencoding.
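
For example, a largefiles expression like the following (just an illustration) can only be evaluated by examining the file data, in `.gitattributes`:

    * annex.largefiles=(mimeencoding=binary)
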
So, if someone uses import --no-content in one repo, and in another clone
it's used with --content, importing the same files both times, a merge
conflict can result.
May be worth removing support for matching annex.largefiles when the
expression needs the file content, when importing from a special remote.
Or could detect when those are used, and only allow
importing with --content in that case.
> So this needs a way to introspect a preferred content expression
> to see if the terms used in it
> match some criteria. Another todo that also needs that is
> [[faster_key_lookup_for_limits]] --[[Joey]]
> > That introspection is implemented now.
Which is better? The repo may have annex.largefiles set in gitattributes
for good workflow reasons, so it would be very annoying to have importing
error out. And if importing ignores the configuration, the user is likely
to see that as a bug. If importing with --no-content looks at the config
and say "sorry, I can't, need the file content", the user can then choose
between changing largefiles or using --content, and it's clear how they're
asking for contradictory things.
Hmm, if largefiles does not match, it would have to download the file
content to add it to git, even though --no-content is used. A little weird,
but it's a small file, presumably.
[[done]] --[[Joey]]

View file

@ -1,211 +0,0 @@
This todo is about `git-annex import branch --from remote`, which is
implemented now.
> [[done]] --[[Joey]]
## race conditions
(Some thoughts about races that the design should cover now, but kept here
for reference.)
A file could be modified on the remote while
it's being exported, and if the remote then uses the mtime of the modified
file in the content identifier, the modification would never be noticed by
imports.
To fix this race, we need an atomic move operation on the remote. Upload
the file to a temp file, then get its content identifier, and then move it
from the temp file to its final location. Alternatively, upload a file and
get the content identifier atomically, which eg S3 with versioning enabled
provides. It would make sense to have the storeExport operation always return
a content identifier and document that it needs to get it atomically by
either using a temp file or something specific to the remote.
----
There's also a race where a file gets changed on the remote after an
import tree, and an export then overwrites it with something else.
One solution would be to only allow one of importtree or exporttree
to a given remote. This reduces the use cases a lot though, and perhaps
so far that the import tree feature is not worth building. The adb
special remote needs both. Also, such a limitation seems like one that
users might try to work around by initializing two remotes using the same
data and trying to use one for import and the other for export.
Really fixing this race needs locking or an atomic operation. Locking seems
unlikely to be a portable enough solution.
An atomic rename operation could at least narrow the race significantly, eg:
1. get content identifier of $file, check if it's what was expected else
abort (optional but would catch most problems)
2. upload new version of $file to $tmp1
3. rename current $file to $tmp2
4. Get content identifier of $tmp2, check if it's what was expected to
be. If not, $file was modified after the last import tree, and that
conflict has to be resolved. Otherwise, delete $tmp2
5. rename $tmp1 to $file
That leaves a race if the file gets overwritten after it's moved out
of the way. If the rename refuses to overwrite existing files, that race
would be detected by it failing. renameat(2) with `RENAME_NOREPLACE` can do that,
but probably many special remote interfaces don't provide a way to do that.
S3 lacks a rename operation, can only copy and then delete. Which is not
good enough; it risks the file being replaced with new content before
the delete and the new content being deleted.
Is this race really a significant problem? One way to look at it is
analogous to a git merge overwriting a locally modified file.
Git can certainly use similar techniques to entirely detect and recover
from such races (but not the similar race described in the next section).
But, git does not actually do that! I modified git's
merge.c to sleep for 10 seconds after `refresh_index()`, and verified
that changes made to the work tree in that window were silently overwritten
by git merge. In git's case, the race window is normally quite narrow
and this is very unlikely to happen (the similar race described in the next
section is more likely).
If git-annex could get the race window similarly small, it would perhaps be
ok. Eg:
1. upload new version of $file to $tmp
2. get content identifier of $file, check if it's what was expected else
abort
3. rename (or copy and delete) $tmp to $file
The race window between #2 and #3 could be quite narrow for some remotes.
But S3, lacking a rename, does a copy that can be very slow for large files.
S3, with versioning, could detect the race after the fact, by listing
the versions of the file, and checking if any of the versions is one
that git-annex did not know the file already had.
[Using this api](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGETVersion.html),
with version-id-marker set to the previous version of the file,
should list only the previous and current versions; if there's an
intermediate version then the race occurred and it could roll the change
back, or otherwise recover the overwritten version. This could be done at
import time, to detect a previous race, and recover from it; importing
a tree with the file(s) that were overwritten due to the race, leading to a
tree import conflict that the user can resolve. This likely generalizes
to importing a sequence of trees, so each version written to S3 gets
imported.
----
A remaining race is that, if the file is open for write at the same
time it's renamed, the write might happen after the content identifier
is checked, and then whatever is written to it will be lost.
But: Git worktree update has the same race condition. Verified with
this perl oneliner, run in a worktree and a second later
followed by a git pull. The lines that it appended to the
file got lost:
    perl -e 'open (OUT, ">>foo") || die "$!"; sleep(10); while (<>) { print OUT $_ }'
Since this is acceptable in git, I suppose we can accept it here too..
## S3 versioning and import
Listing a versioned S3 bucket with past versions results in S3 sending
a list that's effectively:
    foo current-version
    foo past-version
    bar deleted
    bar past-version
    bar even-older-version
Each item on the list also has a LastModified date, and IsLatest
is set for the current version of each file.
This needs to be converted into a ImportableContents tree of file trees.
Getting the current file tree is easy, just filter on IsLatest.
Getting the past file trees seems hard. Two things are in tension:
* Want to generate the same file tree in this import that was used in past
imports. Since the file tree is converted to a git tree, this avoids
a proliferation of git trees.
* Want the past file trees to reflect what was actually in the
S3 bucket at different past points in time.
So while it would work fine to just make one past file tree for each
file, that contains only that single file, the user would not like
the resulting history when they explored it with git.
With the example above, the user expects something like this:
    ImportableContents [(foo, current-version)]
      [ ImportableContents [(foo, past-version), (bar, past-version)]
          [ ImportableContents [(bar, even-older-version)]
              []
          ]
      ]
And the user would like for the inner-most list to also include
(foo, past-version) if it were in the S3 bucket at the same time
(bar, even-older-version) was added. So depending on the past
modification times of foo vs bar, they may really expect:
    let l = ImportableContents [(foo, current-version)]
              [ ImportableContents [(foo, past-version), (bar, past-version)]
                  [ ImportableContents [(foo, past-version), (bar, even-older-version)]
                      [ ImportableContents [(foo, past-version)]
                          []
                      ]
                  ]
              ]
Now, suppose that foo is deleted and subsequently bar is added back,
so S3 now sends this list:
    bar new-version
    bar deleted
    bar past-version
    bar even-older-version
    foo deleted
    foo current-version
    foo past-version
The user would expect this to result in:
    ImportableContents [(bar, new-version)]
      [ ImportableContents []
          l
      ]
But l needs to be the same as the l above to avoid git trees proliferation.
What is the algorithm here?
1. Build a list of files with historical versions ([[a]]).
2. Extract a snapshot from the list
3. Remove too new versions from the list
4. Recurse with the new list.
Extracting a snapshot:
Map over the list, taking the head version of each item and tracking
the most recent modification time. Add the filenames to a snapshot list
(unless the item is a deletion).
Removing too new versions:
Map over the list, and when the head version of a file matches the most
recent modification time, pop it off.
This results in a list that contains only versions before the snapshot.
Overall this is perhaps a bit better than O(n^2) because the size of the list
decreases as it goes?
---
See also, [[adb_special_remote]]
[[!tag confirmed]]

View file

@ -1,73 +0,0 @@
Need to support annex.largefiles when importing a tree from a special
remote.
Note that the legacy `git annex import` from a directory does honor
annex.largefiles.
> annex.largefiles will need to be matched either by downloadImport
> (changing it to return `Either Sha Key`), or by buildImportTrees.
>
> If it's done in downloadImport, to avoid re-download of non-large files,
> the content identifier will
> need to be recorded as using the git sha1. This needs a way to encode
> a git sha as a key, that's a bijective mapping (so distinct from annex
> sha1 keys).
>
> Problem: In downloadImport, startdownload checks getcidkey
> to see if the ContentIdentifier is already known, and if so, returns the
> key used for it before. But, with annex.largefiles, the same content
> might be annexed given one filename, and not annexed with another.
> So, the key from getcidkey might not be the right one (or there could be
> more than one, an annex key and a translated git key).
>
> That argues against making downloadImport match annex.largefiles.
> But, if instead buildImportTrees matches annex.largefiles,
> then downloadImport has already run moveAnnex on the download,
> so the content is in the annex. Moving it back out of the annex is
> difficult (there may be other files in the repo using the same key).
> So, downloadImport would then need to not moveAnnex, but move it to
> somewhere temporary. Like the gitAnnexTmpObjectLocation, but using
> that would be a problem if there was a file in the repo
> and git-annex get was run on it at the same time. So an equivalent
> but separate location.
>
> Further problem: downloadImport might skip a download of a CID
> that's already been seen. That CID might have generated a key
> before. The key's content may not still be present in the local
> repo. Then, if buildImportTrees checks annex.largefiles and wants
> to add it directly to git, it won't have the content available to add to
> git. (Conversely, the CID may have been added to git before, but
> annex.largefiles matches now, and so it would need to extract
> the content from git only to store it in the annex, which is doable but
> seems pointless as it's not going to save any space.)
>
> Would it be acceptable for annex.largefiles to be ignored if the same
> content was already imported from a remote earlier? I think maybe so.
>
> Then all these problems are not a concern, and back to downloadImport
> checking annex.largefiles being the simplest approach, since it avoids
> needing the separate temp file location.
>
> From the user's perspective, the special remote contained a file,
> it was already imported in the past, and the file has been renamed.
> It makes no more sense for importing it again to change how it's
> stored between git and annex than it makes sense for git mv of a file
> to change how it's stored.
>
> However... If two people can access the special remote, and import
> from it at different times, and get different trees as a result,
> that might break some assumptions, and might also lead to merge
> conflicts. --[[Joey]]
>
> > Importing updates export.log, to indicate the state of the remote
> > (the log file could have been named better). So an annex.largefiles
> > change would result in an export/import conflict. Such a conflict
> > can be resolved by using git-annex export, but this could be a
> > surprising situation for users to encounter, since there is no real
> > conflict.
> >
> > Still, this doesn't feel like a reason not to implement the feature,
> > necessarily.
[[done]]

View file

@ -1,9 +0,0 @@
It would be great to be able to use the pubDate of the entries with the --template option of importfeed.
Text.Feed.Query has a getItemPublishDate (and a getFeedPubDate, if we want some kind of ${feeddate}).
The best would be to allow reformatting of the date(s) with (for example) %Y-%m-%D
> itempubdate was added years ago and I forgot to close this,
> but I've now also added itempubmonth, itempubday, etc. [[done]]
> --[[Joey]]
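
For example, a hedged usage sketch with the added template variables (the feed url is a placeholder):

    git annex importfeed --template='${feedtitle}/${itempubdate}-${itemtitle}${extension}' https://example.com/podcast.rss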

View file

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="gueux"
ip="2a01:240:fe6d:0:7986:3659:a8bd:64f1"
subject="syntax"
date="2013-09-12T14:05:16Z"
content="""
use \"itemdate\" and \"feeddate\" as names?
use ${itemdate=%Y-%m-%D} syntax option?
"""]]

View file

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.2.134"
subject="comment 2"
date="2013-09-13T19:53:52Z"
content="""
getItemPublishDate returns a String, which can contain any of several date formats. Deferred until the feed library has something more sane.
Upstream bug: <https://github.com/sof/feed/issues/6>
As for how to format the date in the feed, I would be ok with having itemdate (YYYYMMDD), itemyear (YYYY), itemmonth (MM) and itemday (DD). Full date formatting seems like overkill here.
"""]]

View file

@ -1,5 +0,0 @@
The documentation for the new import remote command says, "Importing from a special remote first downloads all new content from it". For many special remotes -- such as Google Cloud Storage or DNAnexus -- checksums and sizes of files can be determined without downloading the files. For other special remotes, data files might have associated checksum files (e.g. md5) stored next to them in the remote. In such cases, it would help to be able to import the files without downloading (which can be costly, especially due to cloud provider egress charges), similar to `addurl --fast`.
[[!tag confirmed]]
> [[done]] (only implemented for directory for now) --[[Joey]]
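
A hedged sketch of how such an import looks (remote and branch names are examples):

    # import the tree without downloading content; keys are generated from
    # information the remote can provide (currently only the directory remote)
    git annex import master --from mydirremote --no-content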

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 10"
date="2020-07-03T19:55:36Z"
content="""
\"the key generated by import --fast is probably not be the same one generated by a regular import\" -- but that happens already with addurl; is the problem worse here?
"""]]

View file

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2020-07-24T17:50:03Z"
content="""
Yes, it can also happen with addurl, but I think it's less likely that two
users add the same url with and without --fast or --relaxed than that two
users sync with the same remote with and without --content.
Anyway, I opened [[sync_fast_import]].
"""]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-03-19T17:46:09Z"
content="""
It would also be possible for listImportableContents to
return an url that can be used to publically download the content,
which git-annex could derive a URL key from (as well as recording the url).
If the ContentIdentifier is something globally unique or using some kind
of proprietary hashing (like an S3 version ID), it could be used to
construct a key. (Note that it would be possible for a remote to include its
UUID in the ContentIdentifier if it's not otherwise globally unique.)
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="annex.thin for importing from directory special remote"
date="2020-07-01T22:23:58Z"
content="""
As a special case, when importing from a directory special remote, could there be an option to hardlink the files into the repo instead of copying them?
"""]]

View file

@ -1,41 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-07-02T18:18:57Z"
content="""
Yeah, a directory special remote special case would be good.
It's kind of needed for [[remove_legacy_import_directory_interface]].
It could just as well hash the file in place in the directory,
and leave it there, not "downloading" it into the annex. Which avoids
me having to think about whether hard linking to files in a
special remote makes any kind of sense. (My gut feeling is it's not
the same as hard linking inside a git-annex repo.)
This approach needs this interface to be added.
    importKey :: Maybe (ExportLocation -> ContentIdentifier -> ByteSize -> Annex Key)
Then just use that, when it's available, rather than
retrieveExportWithContentIdentifier. Easy enough.
And other remotes could use this interface too.
If some other remote has public urls, it could generate a URL key
and return that. And if a remote has server-side checksums, it can generate
a key from the checksum, as long as it's a checksum git-annex supports.
So this interface seems sufficiently general.
This would be easy to add to the special remote protocol too, although
some new plumbing command might be needed to help generate a key
from information like the md5 and size. Eg,
`git annex genkey --type=MD5 --size=100 --value=3939393` and `git annex genkey
--type=URL value=http://example.com/foo`
----
User interface changes: `git-annex import --from remote --fast` and
`git annex sync` without --content could import from a remote that
way, if it supports importKey. (Currently sync only imports with
--content so this is kind of a behavior change, but I think an ok one to
make.)
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 4"
date="2020-07-02T20:22:25Z"
content="""
Thanks -- this would solve (among other things) [[bugs/removeLink_failed_when_initializing_a_repo_in_a_VirtualBox_shared_folder]]: I could put the git-annex repo on the normal filesystem inside the VM, and only the directory special remote would then deal with the broken vboxsf filesystem. import-tree *with* copying isn't possible as the files are too big.
"""]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-07-03T01:46:46Z"
content="""
Hmm, it would also be possible for a remote to generate a WORM key,
as long as there was a way for it to get a timestamp for the file being
imported.
That might let it be implemented for several other special remotes.
Although I'm wary about making git-annex ever use WORM without being
explicitly asked to. annex.eatworms? ;)
"""]]

View file

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2020-07-03T15:52:26Z"
content="""
This has merge conflict potential, because the key generated by import
--fast is probably not the same one generated by a regular import. So, if
two repositories are both importing from the same special remote, there will be
a need to resolve the resulting merge conflicts.
Since git-annex sync is often run with and without --content, it's probably
the most likely problem point for this. Perhaps there should be another
config that controls whether sync does a fast import or not, and not
control it with --content?
"""]]

View file

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2020-07-03T16:27:10Z"
content="""
Hmm, --fast is not very descriptive for this when it's used with a
directory special remote, because hashing is almost as slow as copying.
Probably better to use --no-content and --content, same as sync.
(Though unfortunately with an opposite default; iirc there are plans
somewhere to transition sync to default to --content.)
"""]]

View file

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2020-07-03T17:39:19Z"
content="""
Note that, since exporttree remotes are always untrusted, after importing
--no-content from one, fsck is going to complain about it being the only
location with the content.
Which seems right.. That content could be overwritten at any time and the
only copy lost. But still worth keeping in mind.
"""]]

View file

@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2020-07-03T18:29:05Z"
content="""
implemented, directory remote only, but it could be added to adb easily,
and possibly to S3. Also added it to the proposed import extension to the
external special remote protocol.
Still unsure what to do about git-annex sync without --content importing.
For now, sync doesn't do content-less imports still, but that could be
changed if the concerns in comment #6 are dealt with.
"""]]

View file

@ -1,83 +0,0 @@
When a `git annex move` is interrupted at a point where the content has
been transferred, but not yet dropped from the remote, resuming the move
will often refuse to drop the content, because it would violate numcopies.
Eg, if numcopies is 2, and there is only 1 extant copy, on a remote,
git-annex move --from remote will normally ignore numcopies (since it's not
getting any worse) and remove the content from the remote after
transferring it. But, on resume, git-annex sees there are 2 copies and
numcopies is 2, so it can't drop the copy from the remote.
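
Spelled out as commands (the remote and file names are illustrative):

    # numcopies is 2 and the only copy is on the remote
    git annex move --from usbdrive bigfile   # interrupted after the transfer, before the drop
    git annex move --from usbdrive bigfile   # resumed: now 2 copies exist, so the drop is refused
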
This happens to me often enough to be annoying. Note that being interrupted
during checksum verification makes it happen, so the window is relatively
wide.
I think it can also happen with move --to, although I can't remember seeing
that.
Perhaps some local state could avoid this problem?
--[[Joey]]
> One simple way would be to drop the content from the remote before moving
> it to annex/objects/. Then if the move were interrupted before the drop,
> it could resume the interrupted transfer, and numcopies would work the
> same as it did when the move started.
>
> > After an interrupted move, whereis would say the content is present,
> > but eg an annex link to it would be broken. That seems surprising,
> > and if the user doesn't think to resume the move, fsck would have to be
> > made to deal with it. I don't much like this approach; it seems to
> > change an invariant that usually the existence of a copy on disk is the ground
> > truth, and location tracking tries to reflect it. With this, location
> > tracking would be correct, but only because the content is in an
> > unusual place on disk that it can be recovered from.
>
> Or: Move to annex/objects/ w/o updating local location log.
> Then do the drop, updating the remote's location log as now.
> Then update local location log.
> >
> > If interrupted, and then the move is resumed, it will see
> > there's a local copy, and drop again from the remote. Either that
> > finishes the interrupted drop, or the drop already happened and it's a
> > noop. Either way, the local location log then gets updated.
> > That should clean things up.
> >
> > But, if a sync is done with the remote first, and then the move
> > is resumed, it will no longer think the remote has a copy. This is
> > where the only copy can appear missing (in whereis). So a fsck
> > will be needed to recover. Or, move could be made to recover from
> > this too, noticing the local copy and updating the location log to
> > reflect it.
> >
> > Still, if the move is interrupted and never resumed, after a sync
> > with the remote, the only copy appears missing, which does seem
> > potentially confusing.
> Local state could be a file listing keys that have had a move started
> but not finished. When doing the same move, it should be allowed to
> succeed even if numcopies would prevent it. More accurately, it
> should disregard the local copy when checking numcopies for a move
> --from. And for a move --to, it should disregard the remote copy.
> May need 2 separate lists for the two kinds of moves.
>
> > This is complex to implement, but it avoids the gotchas in the earlier
> > ideas, so I think is best. --[[Joey]]
> > > Implementation will involve willDropMakeItWorse,
> > > which is passed a deststartedwithcopy that currently comes from
> > > inAnnex/checkPresent. Check the log, and if
> > > the interrupted move started with the move destination
> > > not having a copy, pass False.
Are there any situations where this would be surprising? Eg, if git-annex
move were interrupted, and then a year later, run again, and proceeded
to apparently violate numcopies?
Maybe, OTOH I've run into this problem probably weeks after the first move
got interrupted. Eg, if files are always moved from repo A to repo B,
leaving repo A empty, this problem can cause stuff to build up on repo A
unexpectedly. And in such a case, the timing of the resumed move does not
matter, the user expected files to always get eventually moved from A.
[[fixed|done]] --[[Joey]]

View file

@ -1,21 +0,0 @@
Several todos need to examine preferred content expressions to see if
any of the terms in them match some criteria.
That includes:
* [[todo/sync_fast_import]]
* [[todo/faster_key_lookup_for_limits]]
* [[todo/skip_first_pass_in_git_annex_sync]]
Internally, preferred content expressions are compiled
into a `Matcher (AssumeNotPresent -> MatchInfo -> Annex Bool)`
The presence of the function there is a problem, because haskell does not
allow comparing functions for equality. So probably what is needed is
something that contains that function but also indicates which preferred
content term it's for.
Or, perhaps, not the term, but the specific criteria needed by each such
todo.
> [[done]] --[[Joey]]

View file

@ -1,14 +0,0 @@
I don't want files that I dropped to immediately disappear from my local or all of my remote repos on the next sync. Especially in situations where changes to the git-annex repo get automatically and immediately replicated to remote repos, I want a configurable "grace" period before files in .git/annex/objects get really deleted.
This has similarities to the "trash" on a desktop. It might also be nice to
* configure a maximum amount of space of the "trash"
* have a way to see the contents of the trash to easily recover deleted files
Maybe it would make sense to just move dropped files to the desktops trash? "git annex trash" as an alternative to drop?
> This seems likely to have been a misunderstanding of what drop does,
> since dropping from the local repo would not remove the content from a
> remote.
>
> closing as there's no clear todo here. [[done]] --[[Joey]]

View file

@ -1,29 +0,0 @@
The forwardRetry RetryDecider keeps retrying a transfer as long as at least
one more byte got transferred than in the previous, failed try.
Suppose that a transfer was restarting from the beginning each time, and it
just so happened that each try got a tiny little bit further before
failing. Then transferring an `N` byte object could result in `sum [1..N]`
bytes being sent. Worst case. (Real world it involves the size of chunks
sent in a failing operation, so probably `sum [1..N/1024]` or so.)
So I think forwardRetry should cap after some amount of automatic retrying.
Ie, it could give up after 5 retries. --[[Joey]]
Of course, the real use case for forwardRetry is remotes that use eg, rsync
and can really resume at the last byte. But, forwardRetry can't tell
if a remote is doing that (unless some timing heuristics were used). Around
5 retries seems fairly reasonable for that case too; it would be unlikely
for an rsync transfer to keep failing so many times while still making
forward progress. --[[Joey]]
> Or could add data to remotes about this, but it would need to be added
> for external special remotes too, and this does not really seem worth the
> complication.
>
> I think, even if a remote does not support resuming like
> rsync, it makes sense to retry a few failed transfers if it's getting
> closer to success each time, because forward progress suggests whatever
> made it fail is becoming less of a problem.
[[done]] --[[Joey]]

View file

@ -1,19 +0,0 @@
Add --maximum-cost=N which prevents trying to access any remotes with a
larger cost. May as well add --minimum-cost too for completeness.
My use case: Want to git annex get --auto and pull from any of 3 usb
drives, but not from the network. --[[Joey]]
> Hmm, [[todo/to_and_from_multiple_remotes]] might be another way to do
> that. Put the 3 drives in a git remote group, or list the remotes on the
> fly.
>
> There could still be benefit in avoiding high cost remotes. But, the cost
> numbers are only intended to create a local ordering, so making them part of a
> user interface is kind of weird. While 50 might be a high cost in one
> repository, in another repository it could be a fairly low cost. The user
> would need to examine all the costs to pick the cost they want; using
> remote names seems better UI. --[[Joey]]
> > that seems a convincing reason not to implement this and instead
> > implement remote groups. [[wontfix|done]] --[[Joey]]

View file

@ -1,33 +0,0 @@
ATM upon `get` of a file for which no remote in .git/config provides its content, git-annex spits out a message like
[[!format sh """
/tmp/najafi-2018-nwb > git annex get data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb
Remote origin not usable by git-annex; setting annex-ignore
(not available)
Try making some of these repositories available:
2cca1320-6f51-4acf-a778-efdc79f87ab3 -- smaug:/mnt/btrfs/datasets/datalad/crawl/labs/churchland/najafi-2018-nwb
e513795e-1311-431d-8106-917d9528cfbd -- datasets.datalad.org
(Note that these git remotes have annex-ignore set: origin)
failed
(recording state in git...)
git-annex: get: 1 failed
"""]]
although those remote descriptions/names give an idea for an informed user, they do not even differentiate between regular and special remotes. Special remotes could just be "enabled"; some of them might even have `autoenable` set. Maybe it could separate them and provide a message like
[[!format sh """
...
Try making some of these repositories available:
2cca1320-6f51-4acf-a778-efdc79f87ab3 -- smaug:/mnt/btrfs/datasets/datalad/crawl/labs/churchland/najafi-2018-nwb
or enable (using git annex enableremote <name>) one of:
e513795e-1311-431d-8106-917d9528cfbd -- datasets.datalad.org
"""]]
[[!meta author=yoh]]
> implemented as shown. [[done]] --[[Joey]]

View file

@ -1,38 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-09-30T17:55:22Z"
content="""
I'm not sure that the distinction between regular and special remotes is
likely to matter in general?
If I intuit correctly, in your use case, you may have special remotes that
are extremely easy to enable. (Auto-enabling seems a red herring since it
didn't get autoenabled.) Conversely, some random repository
might be on a LAN/device the user doesn't have access to.
But it seems just as likely that a user might have a special remote that
needs installing extra software to access, or needs a password or other
authentication method that's a pain, but it be easy enough to add a ssh
remote pointing at another repository on the LAN, or to mount a drive.
Or in my personal setup, some repositories are on offline drives and a pain
to access, others are on network attached storage and easy, and special
remotes are a distant third choice. (I use repo descriptions to
differentiate.)
I also feel that this message is already really too verbose, and adding
lots more instructions to it will overall hurt usability. Bear in mind
there can be many such messages displayed by a single command.
Also, the proposed output suggesting to run git-annex enableremote doesn't
make sense if the special remote is actually already enabled, but was still
not able to be accessed for whatever reason. The existing message is
intentionally worded so it works in either case, disambiguated by
displaying the names of the remotes that are enabled.
It might be that more metadata about repositories would help, like it
already separates out untrusted repositories into a separate list.
But it would have to be metadata that applies to all users of a repository,
or is somehow probed at runtime.
"""]]

View file

@ -1,34 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-09-30T20:35:59Z"
content="""
> I'm not sure that the distinction between regular and special remotes is likely to matter in general?
Those (regular git repositories, and special remotes) are technically completely different beasts, and \"made available\" using different mechanisms (`git remote add` vs `git annex enableremote`). Listing them in one list makes it hard-to-impossible for a user to choose a correct command without background knowledge. Indeed some of them (regardless of the type) would be harder to \"make available\" than the others, but that is different type of information which annex unlikely to ever contain and thus to express in the message. `autoenabled` ones though are more likely to be the \"easy ones\".
> (Auto-enabling seems a red herring since it didn't get autoenabled)
`datalad install` autoenables by default since we call `git annex init` on a fresh clone (IIRC if we see `git-annex` branch on remote). With pure `git annex`, I believe it is only if you run `git annex init` explicitly after cloning, you would get it autoenabled. So `git clone https://github.com/dandi/najafi-2018-nwb && cd najafi-2018-nwb && git annex get data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb` wouldn't work, while the one with `git annex init` before `git annex get` would.
So I wouldn't say it is a `red herring` per se - I (a user) can end up in a situation where a special remote was not enabled since I did not explicitly run `git annex init` locally.
> ... Bear in mind there can be many such messages displayed by a single command.
yeah, that is what I (as a user) dislike as well. I even thought that in `datalad` (e.g. [#3078](https://github.com/datalad/datalad/issues/3078)) we could parse those and provide a single summary statement... I think that splitting here into two wouldn't be the straw to break the camel's back. Some more generic (re)solution is needed.
> Also, the proposed output suggesting to run git-annex enableremote doesn't make sense if the special remote is actually already enabled, but was still not able to be accessed for whatever reason.
Indeed. But `git annex` \"knows\" either any given special remote was or was not available/tried, correct? To a user (if we forget about the verbosity for a moment) most informative message then could be
1. a list of remotes which tried but failed (thus might need to be \"made available) - may be even with some reason for each (e.g. \"connection time out\", \"file is missing\", ...)
2. a list of regular remotes (to be added via `git remote add`)
3. a list of special remotes (to be enabled via `git annex enableremote`)
From `1.` I would see if I should do something about what I had already connected to; from 2. and 3. I would immediately see what to enable and how (if I see that I potentially have access to it).
> It might be that more metadata about repositories would help, like it already separates out untrusted repositories into a separate list.
Besides considering untrusted repos last (could be placed last in any corresponding list) I personally do not see such separation as useful.
"""]]

View file

@ -1,25 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-09-22T16:15:49Z"
content="""
Yes, it knows which remotes are configured, and every configured remote
that it's going to list will have been tried and not been accessible
when there's such a message. So, the list can be split into repos
that have a remote and those without one. Eg:
    Try making some of these remotes accessible:
        2370e576-fcef-11ea-a46e-7fce4739e70f -- joey@localhost:/media/usb [usbdrive]
        346cad24-fcef-11ea-a275-d3951b734346 -- joey@server:repo [origin]
        9808c3da-fcf0-11ea-b47f-cfa6e90a9d4a -- amazon S3
    Maybe enable some of these special remotes (git annex enableremote):
        e513795e-1311-431d-8106-917d9528cfbd -- datasets.datalad.org
    Maybe add some of these git remotes (git remote add):
        2cca1320-6f51-4acf-a778-efdc79f87ab3 -- smaug:/mnt/btrfs/datasets/datalad/crawl/labs/churchland/najafi-2018-nwb
So only 2 lines longer at most.
(The "Maybe" wording is because "And/or" is so ugly, and yet
the user may need to only do one, or more than one, depending on what
they're doing.)
"""]]

View file

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-09-22T16:45:19Z"
content="""
This risks changing the --json output. Eg currently it has:
{"command":"get","wanted":[{"here":false,"uuid":"7f03b57d-5923-489a-be26-1ab254d0620d","description":"archive-13 [house]"}],"note":"from house...\nrsync failed -- run git annex again to resume file transfer\nUnable to access these remotes: house\nTry making some of these repositories available:\n\t7f03b57d-5923-489a-be26-1ab254d0620d -- archive-13 [house]\n","skipped":[]
The "wanted" list comes from the display of the list of
uuids, but now there would be up to 3 lists displayed.
I doubt anything uses that, but I don't want to change the json,
so I suppose it would need to keep the current behavior when json is
enabled, ugh.
"""]]

View file

@ -1,6 +0,0 @@
The http special remote doesn't currently support being used with a
--sameas remote that uses exporttree=yes.
It seems like this should be fairly easy to implement. --[[Joey]]
> [[done]] --[[Joey]]

View file

@ -1,7 +0,0 @@
I want to add some dotfiles in the root of my repository to git-annex as unlocked annexed files. So I edited `.git/info/attributes` to remove the line `.* !filter`, such that it only contains the line `* filter=annex`. This seems to be working fine.
I was thinking that it might make sense to have a `git annex config` option to tell git-annex not to add the `.* !filter` line to `.git/info/attributes` when initialising other clones of this repo. In the meantime, I've worked around it using a `post_checkout` hook in my `~/.mrconfig` which edits `.git/info/attributes`.
--spwhitton
> annex.dotfiles added, [[done]] --[[Joey]]
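
A hedged sketch of using the added setting (the filename is an example):

    # recorded in the git-annex branch, so clones pick it up too
    git annex config --set annex.dotfiles true
    git annex add .bigdotfile   # now annexed when annex.largefiles matches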

Some files were not shown because too many files have changed in this diff