remove old closed bugs and todo items to speed up wiki updates and reduce size

Remove closed bugs and todos that were last edited or commented on before
2021, except for ones tagged projects/*, since projects like datalad want to
keep records of old deleted bugs around longer. Command line used:

    for f in $(grep -l '|done\]\]' -- ./*.mdwn); do
      if ! grep -q "projects/" "$f"; then
        d="$(echo "$f" | sed 's/.mdwn$//')"
        if [ -z "$(git log --since=01-01-2021 --pretty=oneline -- "$f")" -a \
             -z "$(git log --since=01-01-2021 --pretty=oneline -- "$d")" ]; then
          git rm -- "./$f"
          git rm -rf "./$d"
        fi
      fi
    done
    # and the same loop again for pages using the other done-link form:
    for f in $(grep -l '\[\[done\]\]' -- ./*.mdwn); do
      if ! grep -q "projects/" "$f"; then
        d="$(echo "$f" | sed 's/.mdwn$//')"
        if [ -z "$(git log --since=01-01-2021 --pretty=oneline -- "$f")" -a \
             -z "$(git log --since=01-01-2021 --pretty=oneline -- "$d")" ]; then
          git rm -- "./$f"
          git rm -rf "./$d"
        fi
      fi
    done
This commit is contained in:
parent a9db0a5055 / commit 28921af543
566 changed files with 0 additions and 15810 deletions
@@ -1,20 +0,0 @@

This suggestion has come from being surprised at the behaviour of "import --skip-duplicates" which copies files instead of moving them and leaves the source directory untouched (description implies it will just leave duplicates alone).

Apologies for the brevity, I've already typed this out once..

"import" has several behaviours which can be controlled through some options, but they don't cover all wanted behaviours. This suggestion is for an alternative interface to control these behaviours, totally stolen from rsync :P

    # create symlinks (s), inject content (i) and delete from source (d)
    # duplicate (D) and new (N) files
    git annex import --mode=Dsid,Nsid $src  # (default behaviour)
    git annex import --mode=Dsi,Nsi $src    # --duplicate
    git annex import --mode=Dd,Nsid $src    # --deduplicate
    git annex import --mode=Nsi $src        # --skip-duplicates
    git annex import --mode=Dd $src         # --clean-duplicates
    git annex import --mode=Did,Nsid $src   # (import new, reinject duplicate.. really want this!)
    git annex import --mode=Ns $src         # (just creates symlinks for new)
    git annex import --mode=Nsd $src        # (invalid mode due to data loss)
    git annex import --mode=Nid $src        # (invalid or require --force)

> Current thinking is in [[remove_legacy_import_directory_interface]].

> This old todo is redundant, so [[wontfix|done]] --[[Joey]]
@@ -1,11 +0,0 @@

[[!comment format=mdwn
 username="CandyAngel"
 avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
 subject="comment 1"
 date="2017-01-16T10:30:55Z"
 content="""
This [[TODO|todo/import_--reinject/]] (and \"reinject --known\") would then be:

    git annex import --mode=Did
"""]]
@@ -1,33 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2017-02-07T20:24:29Z"
 content="""
Bearing in mind that I would have to *support* all of the resulting
combinatorial explosion, and that several combinations don't make sense,
or are unsafe, or seem useless, I think I'd rather keep it limited to
well-selected points from the space.

I've fixed the description of --skip-duplicates to match its behavior.
I don't know if there's a good motivation for it not deleting the files it
does import. I'd almost rather have thought that was a bug in the
implementation, but the implementation explicitly copies rather than moves
files for --skip-duplicates, so that does seem to have been done
intentionally. In any case, `--clean-duplicates` can be run after it to
delete dups, I suppose.

An implementation of --mode=Did,Nsid seemed worth adding at first, perhaps
as --reinject-duplicates. But thinking about it some more,
that would be the same as:

    git annex reinject --known /path/*
    git annex import /path/*

The first command moves all known files into the annex, which leaves
only non-duplicate files for the second command to import.

The only time I can think of that this might not be suitable is if `/path` is
getting new files added to it while the commands run... But in that case
you can `mkdir /path/toimport; mv /path/* /path/toimport` and then
run the 2 commands on `/path/toimport/*`
"""]]
@@ -1,16 +0,0 @@

[[!comment format=mdwn
 username="CandyAngel"
 avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
 subject="comment 3"
 date="2017-02-07T22:51:15Z"
 content="""
An implementation of --mode=Did,Nsid seemed worth adding at first, perhaps as --reinject-duplicates. But thinking about it some more, that would be the same as

    git annex reinject --known /path/*
    git annex import /path/*

--mode=Did,Nsid would be quite a bit faster because it wouldn't hash the files twice, which is an advantage this suggestion has over any multiple command alternative.

If you want to keep it to certain points in space rather than deal with all combinations, you could whitelist which ones are acceptable and people can request more to be whitelisted as they discover use cases for those modes. The current commands would alias to the modes (which would also make their behaviour obvious if this alias is mentioned in the documentation).
"""]]
@@ -1,15 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 4"""
 date="2017-02-09T19:33:46Z"
 content="""
Actually, import --deduplicate, --skip-duplicates, --clean-duplicates
are implemented naively and do hash files twice. So it's
the same efficiency..

But, I just finished a more complicated implementation that avoids
the second hashing.

That does make the combined action worth adding, I suppose. Done so as
--reinject-duplicates.
"""]]
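For reference, a minimal usage sketch of the flag added above (the source path is illustrative):

    # import new files, and reinject files whose content is already known
    git annex import --reinject-duplicates /media/usbdrive/photos/*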
@@ -1,27 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 5"""
 date="2017-02-09T19:45:26Z"
 content="""
I feel that the problem with this idea is that the suggested
actions "create symlinks (s), inject content (i) and delete from source (d)"
are only an approximation of how import is implemented. If they perfectly
matched the implementation, then import could treat them as a DSL and
simply evaluate the expression to do its work. But it's not that simple.
For one thing, --deduplicate and --clean-duplicates don't simply "delete from source" the
duplicates; they first check that numcopies can be satisfied. The default
import behavior doesn't "sid", in fact it moves from source to the work tree
(thus implicitly deleting from source first), then injects, and then creates
the symlink. Everything has dependencies and interrelationships, and the best
way I've found to express that so far is as the Haskell code in
Command/Import.hs.

Even exposing that interface and using the current implementation for
particular canned expressions seems risky; exposing imperfect abstractions
can shoot you in the foot later when something under the abstraction needs
to change.

So I'd rather improve the documentation for git-annex import if it is
unclear. Not opposed to finding a way to work in these "Dsid,Nsid"
summaries to the documentation.
"""]]
@@ -1,5 +0,0 @@

My original use case was for using git-annex find from scripts, where I didn't want to depend on the branch
checked out at the time, but rather write something like "git annex find --branch=master $searchterms"

> this was [[done]] some years later and this todo forgotten about until I
> noticed it now, so closing belatedly. --[[Joey]]
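A sketch of the usage this todo asked for, with the --branch option that was eventually added (the search term here is illustrative):

    # list annexed files on master larger than 100mb, regardless of the
    # currently checked out branch
    git annex find --branch=master --largerthan=100mb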
@@ -1,10 +0,0 @@

[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="209.250.56.154"
 subject="comment 1"
 date="2014-03-17T19:48:57Z"
 content="""
The difficulty with adding a --branch is that if it causes git-annex to operate on a list of (file, key) from the branch, then commands that actually modify the working tree would modify it, instead of the branch. So the options seem to be only generating a list of keys, and so only letting commands that operate on keys work (which rules out the `git annex find` example), or carefully arranging for commands that actually affect the work tree to not be usable with this option.

I'm not sure how many commands are affected. The ones I can immediately think of are sync, lock, unlock. (Commands like get obviously affect the work tree in direct mode, but it's fine to have getting a file from a branch also update files in the work tree, if they pointed at the same key.)
"""]]
@@ -1,7 +0,0 @@

Can the `annex.addunlocked` be extended to have the same syntax as `annex.largefiles`? Also, can there be separate settings affecting `git add` and `git annex add`, e.g. `annex.git-add.addunlocked` and `annex.git-annex-add.addunlocked`, with both defaulting to the value of `annex.addunlocked` if not set?

Basically, I want a reliable way to prevent inadvertently adding files as annexed unlocked files.

Related: [[forum/lets_discuss_git_add_behavior]]

> [[done]] --[[Joey]]
@@ -1,8 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2019-10-08T18:35:06Z"
 content="""
It is not possible for `git add` to add files in locked form. git's
interface simply does not allow that.
"""]]
@@ -1,12 +0,0 @@

[[!comment format=mdwn
 username="Ilya_Shlyakhter"
 avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
 subject="preventing inadvertently adding annexed files in unlocked form"
 date="2019-10-11T16:38:07Z"
 content="""
> It is not possible for git add to add files in locked form. git's interface simply does not allow that.

Makes sense. Then, [[separate annex.largefiles.git-add and annex.largefiles.git-annex-add settings]] seems like the way to prevent inadvertently adding files to annex in unlocked form.

Related: [[todo/auto-lock_files_after_one_edit]]
"""]]
@@ -1,32 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 3"""
 date="2019-12-19T15:29:40Z"
 content="""
Retargeting this todo at something useful post-git-add-kerfluffle,
annex.addunlocked could usefully be a pagespec to allow adding some files
unlocked and others locked (by git-annex add only, not git add).
"true" would be the same as "anything" and false as "nothing".

---

It may also then make sense to let it be configured in .gitattributes.
Although, the ugliness of setting a pagespec in .gitattributes,
as was done for annex.largefiles, coupled with the overhead of needing to
query that from git-check-attr for every file, makes me wary.

(A surprising amount of `git-annex add` time is in querying the
annex.largefiles and annex.backend attributes. Setting the former in
gitconfig avoids the attribute query and speeds up add of smaller files by
2%. Granted I've sped up add (except hashing) by probably 20% this month,
and with large files the hashing dominates.)

The query overhead could maybe be finessed: Since adding a file
already queries gitattributes for two other things, a single query could be
done for a file and the result cached.

Letting it be globally configured via `git-annex config` is an alternative
that I'm leaning toward.
(That would also need some caching, easier to implement and faster
since it is not a per-file value as the gitattribute would be.)
"""]]
@@ -1,11 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 4"""
 date="2019-12-20T19:45:21Z"
 content="""
Made annex.addunlocked support expressions like annex.largefiles.

And both of them can be set globally with `git annex config`. I did not
make annex.addunlocked be settable by git attribute, because my sense is
that `git annex config` covers that use case, or mostly so.
"""]]
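A sketch of the configuration this comment describes (expression syntax as for annex.largefiles; the patterns are illustrative):

    # add only small text files unlocked; everything else is added locked,
    # repo-wide via git-annex config
    git annex config --set annex.addunlocked "include=*.txt and smallerthan=1mb"

    # or locally for one clone, via git config
    git config annex.addunlocked "include=*.txt and smallerthan=1mb"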
@@ -1,83 +0,0 @@

[[!comment format=mdwn
 username="yarikoptic"
 avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
 subject="what am I doing wrong?"
 date="2020-01-13T20:05:38Z"
 content="""
I have tried to use this but I do not see it in effect:

[[!format sh \"\"\"
$> mkdir repo && cd repo && git init && git annex init && git annex config --set addunlocked anything && git show git-annex:config.log && touch 1 2 && git add 1 && git annex add 2 && git commit -m 'committing' && ls -l && git show
Initialized empty Git repository in /tmp/repo/.git/
init  (scanning for unlocked files...)
ok
(recording state in git...)
addunlocked anything ok
(recording state in git...)
1578945668.466039639s addunlocked anything
add 2
ok
(recording state in git...)
[master (root-commit) e428211] committing
 2 files changed, 1 insertion(+)
 create mode 100644 1
 create mode 120000 2
total 4
-rw------- 1 yoh yoh   0 Jan 13 15:01 1
lrwxrwxrwx 1 yoh yoh 178 Jan 13 15:01 2 -> .git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
commit e428211fe0c64e67cf45d8c92165c866db5ba75f (HEAD -> master)
Author: Yaroslav Halchenko <debian@onerussian.com>
Date:   Mon Jan 13 15:01:08 2020 -0500

    committing

diff --git a/1 b/1
new file mode 100644
index 0000000..e69de29
diff --git a/2 b/2
new file mode 120000
index 0000000..ea46194
--- /dev/null
+++ b/2
@@ -0,0 +1 @@
+.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
\"\"\"]]

so I have tried to say that \"anything\" (all files) should be added unlocked. But it seems that neither file (`1` added via `git add` and `2` added via `git annex add`) were added unlocked.

<details>
<summary>Here is some info on version/config: (click to expand)</summary>

[[!format sh \"\"\"
(git-annex)lena:/tmp/repo[master]
$> cat .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[annex]
        uuid = f220cc03-1510-4e23-acb5-b95723ecf9fc
        version = 7
[filter \"annex\"]
        smudge = git-annex smudge -- %f
        clean = git-annex smudge --clean -- %f
(dev3) 1 17256.....................................:Mon 13 Jan 2020 03:03:30 PM EST:.
(git-annex)lena:/tmp/repo[master]
$> git annex version
git-annex version: 7.20191230+git2-g2b9172e98-1~ndall+1
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 7
\"\"\"]]

</details>
"""]]
@@ -1,8 +0,0 @@

[[!comment format=mdwn
 username="kyle"
 avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
 subject="re: what am I doing wrong?"
 date="2020-01-14T03:19:19Z"
 content="""
I believe that should be `git annex config --set annex.addunlocked anything` (i.e. an \"annex.\" in front of the name).
"""]]
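With the annex. prefix, the same test behaves as intended (a sketch under the setup in the transcript above; output abbreviated):

    git annex config --set annex.addunlocked anything
    git annex add 2    # 2 is now added unlocked: a regular file, not a symlink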
@@ -1,37 +0,0 @@

When an external special remote tells git-annex a fuller URL for a given file, git-annex-addurl does not use that information:

    [2018-10-28 16:12:39.933464] git-annex-remote-dnanexus[1] <-- CLAIMURL dx://file-FJZjVx001pB2BQPVKY4zX8kk/
    [2018-10-28 16:12:39.933515] git-annex-remote-dnanexus[1] --> CLAIMURL-SUCCESS
    [2018-10-28 16:12:39.933568] git-annex-remote-dnanexus[1] <-- CHECKURL dx://file-FJZjVx001pB2BQPVKY4zX8kk/
    [2018-10-28 16:12:40.469292] git-annex-remote-dnanexus[1] --> CHECKURL-MULTI dx://file-FJZjVx001pB2BQPVKY4zX8kk/A4.assembly1-trinity.fasta 11086 A4.assembly1-trinity.fasta
    addurl dx://file-FJZjVx001pB2BQPVKY4zX8kk/ (from mydx) (to A4.assembly1_trinity.fasta) [2018-10-28 16:12:40.469503] read: git ["--version"]

It would be better if, in the above log, the URL key was based on dx://file-FJZjVx001pB2BQPVKY4zX8kk/A4.assembly1-trinity.fasta, which would preserve the .fasta extension in the key and therefore in the symlink target.

> [[fixed|done]] --[[Joey]]

Also, it would be good if the external special remote could return an etag
for the URL, which would be a value guaranteed to change if the URL's
contents changes; and if git-annex would then compute the URL key based on
the combination of URL and etag.

> This might be a good idea if sufficiently elaborated on, but I am a one
> idea, one bug, one page kind of guy. I dislike reading over a long detailed
> discussion of something, like the problem above and my analysis of it,
> only to find a second, unrelated discussion of something else.
> Suddenly the mental state is polluted with
> different distinct things, some fixed, others still open. The bug tracking
> system has then failed because it's not tracking state in any useful way.
> Which is why I've closed this todo item with my fix of
> a single item from it. --[[Joey]]

It'd also be good if there was an option to automatically migrate URL keys
to the default backend whenever a file from a URL key is downloaded. Also,
to record the checksummed key (e.g. MD5E) as metadata of the URL key (in a
field named e.g. alternateKeys), and if addurl --fast is later done on a
URL key for which a checksummed key is recorded in the metadata, to add the
checksummed key instead of the URL key.

> Again, mixing discussion of several things in one place is a good way to
> muddy the waters. I think this idea has several problems, but don't want
> to discuss them here. --[[Joey]]
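For context, a sketch of the external special remote protocol exchange shown in the log above: how a remote can claim a short dx:// url and answer CHECKURL with a fuller url carrying the extension. This is a simplified fragment using only messages from the protocol (CLAIMURL, CHECKURL, CHECKURL-MULTI, UNSUPPORTED-REQUEST); a real remote must also handle INITREMOTE, PREPARE, TRANSFER, and so on, and the url/size/filename values are the ones from the log:

    #!/bin/sh
    # Fragment of an external special remote's main loop, url handling only.
    echo VERSION 1
    while read -r cmd arg rest; do
        case "$cmd" in
            CLAIMURL)
                # claim dx:// urls
                case "$arg" in
                    dx://*) echo CLAIMURL-SUCCESS ;;
                    *) echo CLAIMURL-FAILURE ;;
                esac
                ;;
            CHECKURL)
                # reply with the fuller url, its size in bytes, and a filename,
                # so the key (and symlink target) can keep the .fasta extension
                echo "CHECKURL-MULTI ${arg}A4.assembly1-trinity.fasta 11086 A4.assembly1-trinity.fasta"
                ;;
            *)
                echo UNSUPPORTED-REQUEST
                ;;
        esac
    done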
@@ -1,45 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2018-10-29T17:51:27Z"
 content="""
Looking at the code, addurl clearly *does* use the urls returned by
CHECKURL. In Command/AddUrl.hs:

    go deffile (Right (UrlMulti l))
        | isNothing (fileOption (downloadOptions o)) =
            forM_ l $ \(u', sz, f) -> do
                let f' = adjustFile o (deffile </> fromSafeFilePath f)
                void $ commandAction $
                    startRemote r o f' u' sz

`l` is the list of values it returns, and `u'` is individual urls from that list,
as opposed to `u` which is the url the user provided.
`u'` is passed to `startRemote`, and `u` is not.

Hmm, but in Remote/External.hs there is a special case:

    -- Treat a single item multi response specially to
    -- simplify the external remote implementation.
    CHECKURL_MULTI ((_, sz, f):[]) ->
        result $ UrlContents sz $ Just $ mkSafeFilePath f
    CHECKURL_MULTI l -> result $ UrlMulti $ map mkmulti l

That does not have any kind of rationale in [[!commit 8a17bcb0be91c345a52d78c08009285b0fcd6e3a]],
but the next commit added `doc/special_remotes/external/git-annex-remote-torrent`
and I think I can see why I felt it simplified things. That script always
replies with CHECKURL-MULTI, but a torrent often contains a single file, and
it would be perhaps better to use the original url provided by the user for such a
file from a torrent, rather than an url that asks for file "#1" from the torrent.
Although AFAICS either would work, and Remote/BitTorrent.hs contains just the kind
of special case for a single item torrent that I was wanting to avoid external
special remotes needing to worry about.

The other benefit to special casing UrlContents is that it lets addurl --file specify
where to put the file, which the fileOption check in the first code block
above prevents for UrlMulti. But, that could just as well be handled by
adding a single file special case to the code in AddUrl.

I suppose changing this won't break anything, or if it does it was relying
on this undocumented behavior.
"""]]
@@ -1,4 +0,0 @@

Is there already a way to addurl video from a Twitter post? The question came up while proposing git-annex as a tech for archival in https://github.com/2020PB/police-brutality/issues/315#issuecomment-640163911

> I don't think there's anything for git-annex to do here, so
> [[closing|done]] --[[Joey]]
@@ -1,19 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2020-06-22T18:38:33Z"
 content="""
If youtube-dl supports the web site, `git annex addurl` will automatically
use it to download the video.

Looks like youtube-dl does support twitter, so it should just work.

If it didn't though, I'd punt it over to youtube-dl.

(If you also wanted to archive the twitter
page itself, you could use `git annex addurl --raw` to archive the html.
Although there's a good chance the html alone is not enough, and so you
might want to use other tools to archive javascript and other assets;
this is beyond the scope of git-annex, although of course you can `git
annex add` whatever files you end up downloading.)
"""]]
@@ -1,28 +0,0 @@

importtree=yes remotes are untrusted, because something is modifying that
remote other than git-annex, and it could change a file at any time, so
git-annex can't rely on the file being there. However, it's possible the user
has a policy of not letting files on the remote be modified. It may even be
that some remotes use storage that avoids such problems. So, there should be
some way to override the default trust level for such remotes.

Currently:

    joey@darkstar:/tmp/y8>git annex semitrust borg
    semitrust borg
      This remote's trust level is overridden to untrusted.

The borg special remote is one example of one where it's easy for the user to
decide they're going to not delete old archives from it, and so want git-annex
to trust it.

Below is some docs I wrote for the borg special remote page, should be
moved there when this gets fixed. --[[Joey]]

> There is Remote.appendonly, which prevents making import remotes
> untrusted. So if there were a way to set that for borg, it could
> be configured at initremote/enableremote time. But,
> Remote.Helper.ExportImport also assumes appendonly means that content can
> be accessed by Key, rather than by ImportLocation, which does not work
> for borg.

>> [[done]] via Remote.untrustworthy --[[Joey]]
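For reference, a sketch of how this surfaces for borg (hedged: the appendonly parameter is as I understand the borg special remote documentation, an initremote-time knob for repositories the user promises not to rewrite):

    # declare the borg repo append-only, so git-annex can rely on its
    # contents and the remote is not forced to untrusted
    git annex initremote borg type=borg borgrepo=/path/to/borgrepo appendonly=yes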
@@ -1,16 +0,0 @@

Add an arm64 autobuilder (linux).

This is needed to run in termux on some android devices.
And of course there are arm64 servers, although the armel build probably
also works on them.

Status: Builds fine on arm64, but needs an autobuilder. Building under
emulation could be done, or a scaleway arm64 server, which would be a
$5/month expense. Or, perhaps someone has an arm64 that could host the
autobuilder? --[[Joey]]

Currently running release builds for arm64 on my phone, but it's not
practical to run an autobuilder there. --[[Joey]]

>> [[done]]; the current qemu based autobuilder is not ideal, often gets
>> stuck, but there's no point leaving this todo open. --[[Joey]]
@@ -1,20 +0,0 @@

[[!comment format=mdwn
 username="yarikoptic"
 avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
 subject="arm64 possible CIs etc"
 date="2018-10-11T22:22:50Z"
 content="""
According to the great details @mmarmm on github provided in request to support [arm64 for neurodebian](https://github.com/neurodebian/dockerfiles/issues/10#issuecomment-406644418):

> Shippable supports free Arm64 CI/CD and I believe Codefresh does too (both 64-bit and 32-bit for both providers):
>
> https://blog.shippable.com/shippable-arm-packet-deliver-native-ci-cd-for-arm-architecture
> http://docs.shippable.com/platform/tutorial/workflow/run-ci-builds-on-arm/
>
> CodeFresh Arm Beta signup: https://goo.gl/forms/aDhlk56jZcblYokj1
>
> If you need raw infrastructure the WorksOnArm project will supply full servers if you want to deal with metal: https://github.com/worksonarm/cluster/

I personally haven't looked into any of them yet.
"""]]
@@ -1,28 +0,0 @@

Somehow a repo on one of my usb removable drives got a different annex uuid
assigned than the one it had before. Since the drive is now frequently
falling off the USB bus with lots of IO errors, I hypothesize what might
have happened is that when git-annex read the git config, it somehow got
a corrupted version where annex.uuid was not set. So, it autoinitialized with
a new uuid.

(Arguing against this theory is that when git config then wrote to the
file, it would normally use the same cached value so would have written the
corrupted version. Which did not happen.)

I have checked, and if git config exits nonzero, git-annex does not
continue with autoinitialization. So it seems it was not as simple as a
read failure.

To avoid any kind of problem like this leading to a new uuid being
generated, which can be pretty annoying to recover from especially if you
don't notice it for a long time, maybe git-annex should avoid autoinit when
there's a git-annex branch already, or if .git/annex/index already exists.
After all, that implies the repo should have already been initialized, and
now it isn't, so something unusual is going on.

A bare repo that was just cloned will have a git-annex branch
before it gets initialized. So for bare repos, would need to not consider
that, but looking if annex/index exists would still do. Or may be better
not to special case it, and only look for the annex/index file? --[[Joey]]

> [[done]] --[[Joey]]
@@ -1,19 +0,0 @@

borg backup is pretty cool, and could be a great special remote backend.
In particular it does delta compression and stuff.

There seem to be two ways it could work. Probably there are borg commands
that allow storing a given blob in it, and retrieving a given blob. And
that could be used for a traditional special remote.

But also, if a whole git-annex repository has been backed up with borg,
then git-annex could look inside such a backup, and see if
.git/annex/objects/ contains an object. It could then mark it as
present in the borg special remote. This way you'd use borg to take
backups, and git-annex would then be aware of what was backed up in borg,
and could do things like count that as a copy.

--[[Joey]]

[[!tag needsthought]]

> [[done]]! --[[Joey]]
@@ -1,20 +0,0 @@

[[!comment format=mdwn
 username="RonnyPfannschmidt"
 avatar="http://cdn.libravatar.org/avatar/c5379a3fe2188b7571858c49f9db63c6"
 subject="the remote im working on"
 date="2018-06-04T07:51:57Z"
 content="""
Hi Joey,

i am currently working on a remote to use borg as a tree import source and a content source

the work is started in https://github.com/RonnyPfannschmidt/git-annex-borg

note that borg does **not** do delta storage - it does content informed dynamic chunk sizes (which helps deduplication)

freestanding borg will not be a good remote for putting things out,
so i will be pulling things out mostly (but i hope to hit a point where its viable to generate a borg archive from the tree of expected contents thats viable for putting things in)

-- Ronny
"""]]
@@ -1,65 +0,0 @@

[[!comment format=mdwn
 username="anarcat"
 avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
 subject="progress?"
 date="2018-11-27T06:47:26Z"
 content="""
How's that remote going, RonnyPfannschmidt? :) I can't tell from the [homepage](https://github.com/RonnyPfannschmidt/git-annex-borg/) but from the source code, it looks like initremote is supported so far, but not much else...

From what I remember, borg supports storing arbitrary blobs with the `borg debug-put-obj` function, and retrieving one with `borg debug-get-obj`. Here's an example of how this could work:

    [1145]anarcat@angela:test$ sha256sum /etc/motd
    a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921  /etc/motd
    [1146]anarcat@angela:test$ borg init -e none repo
    [1147]anarcat@angela:test$ borg debug-put-obj repo /etc/motd
    object a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 put.
    [1148]anarcat@angela:test$ borg debug-get-obj repo a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 tmp
    object a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 fetched.
    [1149]anarcat@angela:test$ sha256sum tmp
    a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921  tmp

This assumes the underlying blob ID in borg is a SHA256 hash, but that
seems like a fair assumption to make. Naturally, this could cause
problems with git-annex, which supports multiple hashing algorithms
thanks to the multiple [[backends]] support. But maybe it can just
work this out by refusing to store non-matching backends.

That is, if borg actually worked that way. Unfortunately, while the
above actually works, the resulting repository is not quite right:

    $ borg debug dump-repo-objs .
    Dumping 000000_0000000000000000000000000000000000000000000000000000000000000000.obj
    Data integrity error: Chunk a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921: Invalid encryption envelope

So borg does not like the repository at all... I'm not sure why, but
it sure looks like borg \"objects\" are not as transparent as I
hoped and that this low-level interface will not be suitable for
git-annex.

The higher level interface is \"archives\", which have (more or less) a
CRUD interface (without the U, really) through the
\"create/list/extract/prune\" interface. It's far from what we need:
items are deduplicated across archives so it means it is impossible to
reliably delete a key unless we walk (and modify!) the entire archive list, which is
slow and impractical. But it *could* definitely be used to add keys to
a repository, using:

    $ time borg create --stdin-name SHA256-a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 .::'{utcnow}' - < /etc/motd
    1.30user 0.10system 0:01.62elapsed 86%CPU (0avgtext+0avgdata 81464maxresident)k
    72inputs+1496outputs (0major+31135minor)pagefaults 0swaps

As you can see, however, that is *slow* (although arguably not slower
than `debug-put-obj`, which is surprising).

But even worse, that blob is now hidden behind that archive - you'd
need to list all archives (which is also expensive) to find it.

So I hit a dead end and I'm curious to hear how you were planning to
implement this, Ronny. :) Presumably there should be a way to generate
an object compatible with `debug-put-obj`, but that interface seems
very brittle and has all sorts of warnings all around it... And on the
other hand, the archive interface is clunky and slow... I wish there
was a better way, and suspect it might be worth talking with upstream
(which I'm not anymore) to see if there's a better way to work this
problem. -- [[anarcat]]
"""]]
@@ -1,55 +0,0 @@

[[!comment format=mdwn
 username="anarcat"
 avatar="http://cdn.libravatar.org/avatar/4ad594c1e13211c1ad9edb81ce5110b7"
 subject="restic"
 date="2018-11-27T07:13:29Z"
 content="""
and for what it's worth, borg's main rival, restic, handles this much better and faster:

    [1331]anarcat@angela:test$ RESTIC_PASSWORD=test restic init -r repo4
    created restic repository 2c75411732 at repo4

    Please note that knowledge of your password is required to access
    the repository. Losing your password means that your data is
    irrecoverably lost.
    [1334]anarcat@angela:test1$ RESTIC_PASSWORD=test time restic -r repo4 backup --stdin --stdin-filename SHA256-a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 < /etc/motd
    repository 2c754117 opened successfully, password is correct
    created new cache in /home/anarcat/.cache/restic

    Files: 1 new, 0 changed, 0 unmodified
    Dirs: 0 new, 0 changed, 0 unmodified
    Added to the repo: 656 B

    processed 1 files, 0 B in 0:00
    snapshot 87c0db00 saved
    0.55user 0.04system 0:00.80elapsed 73%CPU (0avgtext+0avgdata 48384maxresident)k
    0inputs+88outputs (0major+9665minor)pagefaults 0swaps
    [1337]anarcat@angela:test$ RESTIC_PASSWORD=test time restic -r repo4 backup --stdin --stdin-filename SHA256-a378977155fb42bb006496321cbe31f74cbda803c3f6ca590f30e76d1afad921 < /etc/motd
    repository 2c754117 opened successfully, password is correct

    Files: 0 new, 1 changed, 0 unmodified
    Dirs: 0 new, 0 changed, 0 unmodified
    Added to the repo: 370 B

    processed 1 files, 0 B in 0:00
    snapshot 5b3af830 saved
    0.55user 0.04system 0:00.80elapsed 73%CPU (0avgtext+0avgdata 48568maxresident)k
    0inputs+64outputs (0major+9691minor)pagefaults 0swaps
    [1348]anarcat@angela:test$ RESTIC_PASSWORD=test time restic -r repo4 backup --stdin --stdin-filename SHA256-533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc < ~/folipon.jpg
    repository 2c754117 opened successfully, password is correct

    Files: 1 new, 0 changed, 0 unmodified
    Dirs: 0 new, 0 changed, 0 unmodified
    Added to the repo: 372 B

    processed 1 files, 0 B in 0:00
    snapshot 18879aa4 saved
    0.54user 0.03system 0:00.78elapsed 73%CPU (0avgtext+0avgdata 48504maxresident)k
    0inputs+64outputs (0major+9700minor)pagefaults 0swaps
    [1349]anarcat@angela:test$ RESTIC_PASSWORD=test time restic -r repo4 dump latest SHA256-533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc | sha256sum -
    0.50user 0.02system 0:00.73elapsed 72%CPU (0avgtext+0avgdata 47848maxresident)k
    0inputs+8outputs (0major+9513minor)pagefaults 0swaps
    533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc  -

Of course it doesn't validate those checksums, and might freak out with the number of snapshots we would create, but it's a much better start than borg. ;)
"""]]
@@ -1,13 +0,0 @@

[[!comment format=mdwn
 username="michael@ff03af62c7fd492c75066bda2fbf02370f5431f4"
 nickname="michael"
 avatar="http://cdn.libravatar.org/avatar/125bdfa8a2b91432c072615364bc3fa1"
 subject="Borg vs. restic, some design considerations"
 date="2018-12-05T14:36:45Z"
 content="""
As I have been looking for a new, de-duplicating, reliable backup system, I read through the design documentation of [borg](https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#archives) and [restic](https://restic.readthedocs.io/en/latest/100_references.html#design). While the design of restic seems to be much simpler and actually quite straightforward, I decided for borg in the end due to its support for compression and the more efficient removal of single backups. Further, it [seems](https://blog.stickleback.dk/borg-or-restic/) the RAM usage is lower for borg.

Here are some comments on both concerning the usability as a git annex storage backend. Note that they are all based on my understanding of the design documents that describe how the data is stored in restic and borg. It is well possible that I have misunderstood something or some parts are just impossible due to implementation details. Further, I am quite sure that what I propose is not possible with the current external APIs of git annex and borg.

For neither of them does it seem to be a good idea to store individual archives (borg) or snapshots (restic) per file, as both of them assume that the list of archives/snapshots is reasonably small, can be presented to the user as a single list, and can be pruned based on certain rules about how many to keep per timespan (though that is per group of archives/snapshots). borg stores descriptions of all archives in a single item, the manifest (which means that when an archive is added, the whole list needs to be rewritten); restic stores each archive as a json document in a directory, which might scale better but is probably still not a good idea.

I think instead of storing individual files, git annex should store the whole set of exported files in a single archive/snapshot, i.e., store some kind of (virtual) directory structure in borg or restic that represents all items that shall be stored. Then, whenever git annex syncs with the borg/restic remote, a new archive/snapshot would be added. The user could then use the time-based pruning rules to remove old snapshots. This would also integrate well with using the same borg/restic repository for other backups, too.

It might seem this would make the retrieval of a single file quite inefficient. Both borg and restic split a file into a list of chunks and store information where these chunks can be found. Therefore, it should be possible for a borg/restic special remote to just store this list of chunks for every annexed file. Then, to get a single file, git annex would only need to ask for these chunks. For restoring a lot of files, in particular with a non-local restic repository, this might be very inefficient though, as restic might need to download a lot of data just to get these chunks - there, just getting the whole last archive/snapshot might be more efficient (as far as I understood, restic then downloads each pack of chunks only once and directly writes all of them to the files that want them).

Restic stores separate objects for every directory, and this directory contains a list of subdirectories and files, where files contain a list of chunks. To add or remove files from a snapshot in restic, git annex would just need to execute the chunker for files not already present in the previous snapshot and could use the already stored chunk ids for the already present files. However, each snapshot would create a completely new directory. Without subdirectories, this would basically mean that the list of all files needs to be re-written for every snapshot. Subdirectories would help with that, but only if few subdirectories are modified. Due to the nature of hashing, this seems unlikely in the case of a git annex special remote (but of course this makes backups of unchanged directories very efficient).

Borg doesn't have this directory structure but instead just stores the metadata of every file in one large stream. This stream is chunked in parts consisting of around 128KiB and therefore, only parts where changes occurred need to be stored again. The list of these metadata chunks needs to be stored, nevertheless, but is much smaller. Again, everything that is needed for storing a file could be generated without having the actual source file, if the chunk ids are present. In fact, this is what borg does with a file cache that stores, for every file of the previous backup, both properties like size, timestamp and inode id to identify modifications, and a list of chunks. If borg finds the same file again, it just uses the stored chunk list. If the git annex borg special remote could also keep the order of all previously present files the same, this would result in re-using basically all metadata chunks - however, I don't know if borg assumes any order on the files.

Note that borg needs to know which chunks are referenced in an archive, as borg stores reference counts for all chunks to determine if a chunk is still needed, so just re-using the metadata chunks without reading their content is definitely not possible. Restic has no such reference counts; it needs to iterate over all trees to determine if a chunk can be deleted (which [seems](https://blog.stickleback.dk/borg-or-restic/) to be terribly slow). Nevertheless, both implementations of cleaning up chunks require that chunks are referenced in some file that is contained in some archive/snapshot.
"""]]
@@ -1,38 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 5"""
 date="2019-08-01T16:02:06Z"
 content="""
Half a second to store a single annex object with restic is pretty slow,
and that's before the snapshots directory gets bloated with a hundred
thousand files.

I wonder if my original idea up top was not a better approach: Let these
backup tools back up a whole annex repo (or at least .git/annex/objects),
and then make git-annex interoperate with the backups by peering inside
them and learning what has been backed up.

In the meantime, git-annex has gotten tree import facilities,
which is a similar concept, of listing content in a data store
and so learning what's stored in there, and then being able to
retrieve objects out of that data store on demand.

Importing annex objects from a backup is not quite the same as a tree
import, because it wouldn't result in any kind of file tree that
you'd want to merge back into your git repo. Also tree importing has
to download files in order to hash them, while in this case the
object's annex key can be seen in the backup.

But from a user perspective it could be quite similar, something like:

    git annex initremote restic type=restic repolocation=...
    git annex import --from restic
    git annex get

That would use `restic list snapshots` and then `restic ls` each
snapshot and find filenames that look like annex keys
(perhaps looking for part of the annex directory structure to avoid
false positives). Keys it found would be marked as present in
the remote, and the snapshot(s) that contain them recorded in
the git-annex branch for use by git-annex get.
"""]]
@@ -1,16 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 6"""
 date="2019-08-01T16:34:22Z"
 content="""
I made a restic repo with 2000 single-file snapshots.
Adding the first snapshot took 0.55s. Adding the 2000th
snapshot took 1.10s.

So that's a very big scalability problem with using restic with single-file
snapshots.

2000 files in a directory is not going to cause that kind of slowdown;
my guess is restic needs to load all past snapshots, or something like
that.
"""]]
@@ -1,22 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 7"""
 date="2020-12-04T01:17:01Z"
 content="""
The Remote interface recently got importKey, which gets us
unexpectedly a *lot* closer to making `git-annex import --from borg` a reality!

The Remote would need a listImportableContents that finds all annex objects
in all (new) snapshots, and generates a ContentIdentifier that is just the
snapshot plus object path. Then importKey can simply generate a Key from
that ContentIdentifier without doing any more work. (And, so getting an
object from the remote will also work, because it will have the
ContentIdentifier recorded and so will know what snapshot and path in the
borg repo.)

Seems that all that would be needed is a way to skip generating the git tree
for the imported files, since it would be useless.
And a way to force --no-content, since importing from a borg backup should not
get all the backed up annex objects. It may be best to make this a new
command, that just happens to use the ImportActions interface.
"""]]
@@ -1,5 +0,0 @@

Sometimes a borg backup contains several git-annex repos. Then pointing
git-annex at the whole thing will find objects not belonging to the current
repo. To avoid this, add subdir= config.

[[done]] --[[Joey]]
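A usage sketch of the subdir= configuration this todo added (paths are illustrative):

    # only treat objects under the my-repo subdirectory of each borg
    # archive as belonging to this repository
    git annex initremote borg type=borg borgrepo=/backups/borg subdir=my-repo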
@@ -1,10 +0,0 @@

The tree generated by git-annex sync with a borg remote
does not seem to get grafted into the git-annex branch, so
would be subject to being lost to GC.

Is this a general problem affecting importtree too?

> Yes, it was. It would have only caused a problem if the user
> kept doing imports from a remote, but never exporting to it.
> Then, in a clone of the repo that was importing, they would not be able
> to get files. [[fixed|done]] --[[Joey]]
@@ -1,46 +0,0 @@

I am not sure this is the case, but from first-hand experience, it
sure looks like you can't turn on v7 (or really v6, actually) on a
single git worktree. For example, if I have my `pictures` repository
on `curie` and turn on v7, `angela` will *also* need to run `git annex
upgrade` on their worktree, otherwise git-annex
(e.g. 6.20180913-1~bpo9+1 on Debian stretch) will be really confused:

    anarcat@angela:calendes$ less calendrier/calendes.pdf
    /annex/objects/SHA256E-s117451415--8d7d8366094a63c54bef99b5cd2e2b5187092f834d8bf7002e1d5fdceb38a710.pdf
    anarcat@angela:calendes$ git annex get calendrier/calendes.pdf
    anarcat@angela:calendes$ git annex whereis calendrier/calendes.pdf
    anarcat@angela:calendes$ # OMG WHERE ARE MY FILES! /me flails wildly

:)

It seems to me there should be a warning in the [[upgrades]] page
about this. I would have done so myself, but I'm not sure (like in my
last bug report) if I am doing things right.

In this case, this repository was already present (v5, indirect mode)
on both machines. I upgraded (using `git annex upgrade`) the
repository on curie (7.20181121 Debian buster) which went well.

(Then I messed around with that thumb drive, which led to
[[bugs/v7_fails_to_fetch_files_on_FAT_filesystem]], but probably
unrelated here.)

Then i powered on my laptop (`angela`) and saw the above. I would have
expected it to either upgrade automatically or warn me about the
repository inconsistency. Or failing that, the upgrades page should at
least warn us this is a "system-wide" (how do we call that?) change...

The workaround is to run `git annex upgrade` on that other repo, of
course, but if the source repo was also upgraded, it might be
difficult to sync files, as you will see that warning:

    $ git annex get
    get calendrier/calendes.pdf (from sneakernet...)
      Repository version 7 is not supported. Upgrade git-annex.

Considering there's no backport of 7.x in Debian stretch, it makes the
upgrade path rather delicate... Is there a way to "downgrade" that
sneakernet repo? :) (Thankfully, the main server still runs v5 so the
files are still accessible from stretch....) -- [[anarcat]]

Updated the [[upgrades]] page, [[done]].
@@ -1,18 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 10"""
 date="2020-10-05T18:40:32Z"
 content="""
Unless it entered an adjusted unlocked branch, this upgrade cannot have
changed locked files to unlocked files itself. So if you were not using
unlocked files in this repo before, and didn't make any changes after the
upgrade that would add any, you don't need to worry about them.

The only risk if it was downgraded to v5 with unlocked files
is that a command like `git commit -a` would commit the
large content to git. Easy enough to notice that with `git status` after
the downgrade too.

(But do checkout master if the currently checked out branch is
"adjusted/master(unlocked)")
"""]]
@@ -1,10 +0,0 @@

[[!comment format=mdwn
 username="yarikoptic"
 avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
 subject="comment 11"
 date="2020-10-05T23:17:07Z"
 content="""
> .. after the upgrade that would add any, you don't need to worry about them.

With the datalad pixie dust on top of git-annex, I am never 100% sure ;) I would better worry and do some basic check before proceeding... -- will do later today/tomorrow, God bless BTRFS and its snapshots - I can get a \"sandbox\" clone of the entire filesystem to play with safely.
"""]]
@@ -1,11 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 1"""
 date="2018-12-04T20:53:49Z"
 content="""
You only need to upgrade to v7 when the repository has unlocked files
committed to it. If a file contains a pointer to an annex object, it won't
work with v5. There is not a good way for git-annex to detect when that is
the case; such a file could be committed any time. Committing unlocked
files and upgrading has to be coordinated among the users of the repository.
"""]]
@@ -1,10 +0,0 @@

[[!comment format=mdwn
 username="yarikoptic"
 avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
 subject="comment 2"
 date="2020-10-01T14:06:32Z"
 content="""
Is there a sensible way (could be a helper script) to safely downgrade a repository from version 8 to 5 (with checks that no git pointer files are in use, etc.)?

Rationale: On the original host (smaug) of the monstrous http://datasets.datalad.org (on falkor) I have managed to invoke our cron update script while using a newer annex, and I had no `annex.autoupgraderepository` set, so annex upgraded a number of clones (originally version 5) locally to version 8. As I still use an older annex by default, I would like to downgrade those clones on smaug back to 5.
"""]]
@@ -1,18 +0,0 @@

[[!comment format=mdwn
 username="joey"
 subject="""comment 3"""
 date="2020-10-01T16:54:21Z"
 content="""
If the repository did not get switched from direct mode to adjusted
unlocked branch, and does not use any unlocked files, you can:

* remove the filter.annex.smudge and filter.annex.clean from .git/config
* remove .git/info/attributes (or at least the filter=annex line)
* remove .git/hooks/post-checkout and .git/hooks/post-merge
* remove sqlite databases (all of .git/annex/keysdb* .git/annex/fsck/ .git/annex/export/ .git/annex/cidsdb*)
* change annex.version

To get back from adjusted unlocked branch to direct mode, you'd first want
to check out the master branch, and then do all of the above, then `git
annex direct` to get back into direct mode.
"""]]
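Those steps, collected into a small script (a sketch under the assumptions above: standard .git layout, not on an adjusted branch, and no unlocked files - check first, e.g. with the `git grep` approach in comment 8 below):

    #!/bin/sh
    # Downgrade a v8 git-annex repo back to v5, following the steps above.
    # Run from the top of the repository's working tree.
    git config --unset filter.annex.smudge
    git config --unset filter.annex.clean
    rm -f .git/info/attributes    # or just delete the filter=annex line
    rm -f .git/hooks/post-checkout .git/hooks/post-merge
    rm -rf .git/annex/keysdb* .git/annex/fsck .git/annex/export .git/annex/cidsdb*
    git config annex.version 5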
@@ -1,8 +0,0 @@

[[!comment format=mdwn
 username="yarikoptic"
 avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
 subject="comment 4"
 date="2020-10-01T17:26:09Z"
 content="""
THANK YOU! What is the most efficient way to identify if there are unlocked files in the tree (or full repository)? I know that annex scans for unlocked files after a clone, so I guess you might have considered different options and already chose the most efficient ;)
"""]]
@@ -1,24 +0,0 @@

[[!comment format=mdwn
 username="kyle"
 avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
 subject="comment 5"
 date="2020-10-01T19:07:45Z"
 content="""
> What is the most efficient way to identify if there are unlocked
> files in the tree (or full repository)?

I can't say anything about efficiency, but FWIW with git-annex
7.20191009 or later there's an `--unlocked` matching item, so you can
say `git annex find --unlocked`. Since you're working in the context
of repos that have already been upgraded, I think you could use that
to find unlocked files in the working tree.

As for outside of the working tree, `find` takes a `--branch`
argument, but, as far as I can tell, that doesn't match anything when
combined with `--unlocked` (tried with 8.20200908). However, I'm not
sure you'd need to consider anything other than the working tree. If
all of these repos were v5 before, then an unlocked file could have
only been in an uncommitted state, so I don't see how it'd end up on
another ref without committing/switching branches afterwards.
"""]]
@ -1,14 +0,0 @@
|
|||
[[!comment format=mdwn
|
||||
username="yarikoptic"
|
||||
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
|
||||
subject="comment 6"
|
||||
date="2020-10-05T15:26:08Z"
|
||||
content="""
|
||||
THANK YOU Kyle! `find --unlocked` works!
|
||||
|
||||
But the tricky part is that I wanted to use some \"single\" instance of git-annex which would support `find --unlocked` and also v5 so I could fsck and do some other tests after I do the evil downgrade. But older versions, such as 7.20191114, which support v5 do not support v8, so cannot do `find --unlocked` on v8. So I need to either
|
||||
|
||||
- find another later version which would support both v5 and v8
|
||||
- make script use multiple versions of git-annex from different locations (one for initial `find --unlocked` and then another one for subsequent checks etc)
|
||||
- find a way for `find --unlocked` without invoking `git-annex`.
|
||||
"""]]
@ -1,37 +0,0 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 7"
date="2020-10-05T17:42:23Z"
content="""
> - find a way to do `find --unlocked` without invoking `git-annex`.

Assuming you're interested in finding just the v6+ pointer files,
instead of also finding the uncommitted type changes for v5 unlocked
files, perhaps you could use something like this

[[!format python \"\"\"
import subprocess as sp

# list staged files along with their mode and object id
p_ls = sp.Popen([\"git\", \"ls-files\", \"--stage\"], stdout=sp.PIPE)
# batch-read object contents to check for annex pointer files
p_cat = sp.Popen([\"git\", \"cat-file\", \"--batch\"], stdin=sp.PIPE, stdout=sp.PIPE)
with p_ls:
    with p_cat:
        for line in p_ls.stdout:
            info, fname = line.strip().split(b\"\t\")
            mode, objid = info.split(b\" \")[:2]
            if mode != b\"100644\":
                continue
            p_cat.stdin.write(objid + b\"\n\")
            p_cat.stdin.flush()
            out = p_cat.stdout.readline()
            _, objtype, size = out.split()
            size = int(size)
            if size > 0:
                content = p_cat.stdout.read(size)
                # unlocked files are pointers into .git/annex/objects
                if content.startswith(b\"/annex/objects/\"):
                    print(fname.decode())
            # consume the trailing newline after the object content
            p_cat.stdout.readline()
\"\"\"]]

"""]]
@ -1,20 +0,0 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 8"
date="2020-10-05T18:02:48Z"
content="""
Thank you Kyle! I came up with

```shell
unlocked=( `git grep -l -a --no-textconv --cached '^/annex/objects/' || :` )
if [ \"${#unlocked[*]}\" -ge 1 ]; then
    error \"Found ${#unlocked[*]} unlocked files. Cannot do: ${unlocked[*]}\" 2
fi
```

do you think it would miss something?

Here is my complete script ATM (didn't try in \"production\" yet, switched to other tasks for now but it is ready; it also does some testing of the operation at the end, so it must not be applied as is to existing repos without commenting that out): http://www.onerussian.com/tmp/downgrade-annex

"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="comment 9"
date="2020-10-05T18:09:03Z"
content="""
Ah, I didn't think of using `git grep` for this. I think that's much
better than my suggestion.
"""]]
@ -1,38 +0,0 @@
I recently discovered (thanks to Paul Wise) the [Meow hash][]. The
TL;DR is that it's a fast non-crypto hash which might be useful for
git-annex. Here's their intro, quoted from the website:

[Meow hash]: https://mollyrocket.com/meowhash

> The Meow hash is a high-speed hash function named after the character
> Meow in [Meow the Infinite][]. We developed the hash function at
> [Molly Rocket][] for use in the asset pipeline of [1935][].
>
> Because we have to process hundreds of gigabytes of art assets to build
> game packages, we wanted a fast, non-cryptographic hash for use in
> change detection and deduplication. We had been using a cryptographic
> hash ([SHA-1][]), but it was
> unnecessarily slowing things down.
>
> To our surprise, we found a lack of published, well-optimized,
> large-data hash functions. Most hash work seems to focus on small input
> sizes (for things like dictionary lookup) or on cryptographic quality.
> We wanted the fastest possible hash that would be collision-free in
> practice (like SHA-1 was), and we didn't need any cryptographic security.
>
> We ended up creating Meow to fill this niche.

[1935]: https://molly1935.com/
[Molly Rocket]: https://mollyrocket.com/
[Meow the Infinite]: https://meowtheinfinite.com/
[SHA-1]: https://en.m.wikipedia.org/wiki/SHA-1

I don't have an immediate use case for this right now, but I think it could
be useful to speed up checks on larger files. The license is a
*little* weird but seems close enough to a BSD to be acceptable.

I know it might sound like a conflict of interest, but I *swear* I am
not bringing this up only as an oblique feline reference. ;) -- [[anarcat]]

> Let's concentrate on [[xxhash|todo/add_xxHash_backend]] or other new hashes that are getting general
> adoption, not niche hashes like meow. [[done]] --[[Joey]]
@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-06T19:36:32Z"
content="""
xxhash seems to fill a similar niche and is getting a lot more use from
what I can see.

Meow seems to claim a faster GB/s rate than xxhash does, but
it's hard to tell if the benchmarks are really equivalent.
"""]]
@ -1,9 +0,0 @@
Sometimes I start off a large file transfer to a new remote (a la "git-annex copy . --to glacier").

I believe all of the special remotes transfer the files one at a time, which is good, and provides a sensible place to interrupt a copy/move operation.

Wish: When I press ctrl+c in the terminal, git-annex will catch that, finish its current transfer, and then exit cleanly (ie: no odd backtraces in the special remote code). For the case where the file currently being transferred also needs to be killed (ie: it's a big .iso), subsequent ctrl+c's can do that.

> I'm going to close this, because 6 years later, I just don't think it's a
> good idea. I think that blocking ctrl-c from interrupting the program
> violates least surprise. [[done]] --[[Joey]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.172"
subject="comment 1"
date="2014-02-21T21:36:14Z"
content="""
This really depends on the remote; some can resume where they were interrupted, such as rsync, and some cannot, such as glacier (and, er, encrypted rsync).
"""]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://grossmeier.net/"
nickname="greg"
subject="very remote specific"
date="2014-02-21T22:11:16Z"
content="""
Yeah, this is very remote specific and probably means adding the functionality there as well (eg: in the glacier.py code, not only in git-annex haskell). Maybe I should file bugs there accordingly :)
"""]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.172"
subject="comment 3"
date="2014-02-21T22:34:14Z"
content="""
Hmm, I forget if it's possible for git-annex to mask SIGINT when it runs glacier or rsync, so that the child process does not receive it, but the parent git-annex does.
"""]]
@ -1,28 +0,0 @@
When annex.stalldetection is set, and git-annex transferrer is used,
a ctrl-c does not propagate to the transferrer process.

The result is that the next time the process sends a message to its output
handle (eg a progress update), it gets a SIGPIPE, and so an ugly message is
output to the console after the user was returned to the prompt.

The SIGINT is not propagated because a child process group is used for
git-annex transferrer, in order to let child processes of it be killed
along with it when a stall is detected.

Maybe what's needed is a SIGINT handler in the main git-annex that
signals all the transferrer processes with SIGINT and waits on them
exiting. And other signals, eg SIGTSTP for ctrl-z.

> Implemented this, but not for windows (yet). But not gonna leave this open
> for something that on windows in my experience does not work very
> reliably in general. (I've many times hit ctrl-c in a windows terminal and
> had the whole terminal lock up.) So, [[done]] --[[Joey]]

Or, note that it would suffice to remove the child process group stuff,
if we assume that all child processes started by git-annex transferrer are
talking to a pipe, and will output something, eg a progress update,
and so receive a SIGPIPE once the transferrer process has caught the
SIGINT and exited.
[[todo/stalldetection_does_not_work_for_rsync_and_gcrypt]] would be a
prereq for this approach. But, might there be long-running child processes
that are not on a pipe, and that need to be shut down on a stall, too?
@ -1,15 +0,0 @@
As part of the work in [[precache_logs_for_speed_with_cat-file_--buffer]],
key lookups are now done twice as fast as before.

But, limits that look up keys still do a key lookup before the key
is looked up efficiently. Avoiding that would speed up --in etc, probably
another 1.5x-2x speedup when such limits are used. What that optimisation
needs is a way to tell if the current limit needs the key or not. If it
does, then match on it after getting the key (and precaching the location
log for limits that need that), otherwise before getting the key.

> So this needs a way to introspect a limit to see if the terms used in it
> match some criteria. Another todo that also needs that is
> [[sync_fast_import]] --[[Joey]]

[[done]] --[[Joey]]
@ -1,19 +0,0 @@
Many special remotes can potentially end up exposed in public http. There
is not currently a way to access them over http, without adding per-remote
support (like S3 has).

But generally the filenames used are the same, eg rsync and directory and
webdav and S3. Or if there are differences, they are generally small and
trying a couple of different urls is doable.

And sameas allows for
<https://git-annex.branchable.com/tips/multiple_remotes_accessing_the_same_data_store/>
now.

So, there could be a new special remote type, that allows generic readonly
access of other special remotes whose data stores are exposed via http.

Call it "http" maybe. (There may be some confusion between this and the web
special remote by users looking for such a thing.) --[[Joey]]
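
As a sketch of how the implemented httpalso remote ends up being used
(a hedged example; the remote name and url are placeholders, see its
documentation for the authoritative syntax):

    # assumes an existing special remote "mydirectory" whose data store
    # is also served read-only over http at the given url
    git annex initremote --sameas=mydirectory mydir-http type=httpalso url=https://example.com/annex/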

> httpalso special remote implemented, [[done]] --[[Joey]]
@ -1,3 +0,0 @@
`git diff` for annexed files, especially unlocked annexed files, is currently uninformative. It would help if [[`git-annex-init`|git-annex-init]] configured a [git diff driver](https://git-scm.com/docs/gitattributes#_generating_diff_text) to diff the contents of the annexed files, rather than the pointer files.

> [[wontfix|done]], see comment
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-06T18:39:27Z"
content="""
Normally annexed files are huge binary files. Line-by-line diff of such
files is unlikely to be useful.

So you would need some domain-specific diff for the kind of binary files
you are storing in git-annex. If you have one, you can use
[[git-annex-diffdriver]] to make git use it when diffing annexed files.

Not seeing anything more I can do here, so I'm going to close this todo.
"""]]
@ -1,9 +0,0 @@
I noticed that with the default SHA256E backend, `git annex reinject --known FILE` will fail if FILE has a different extension than it has in the annex. Presumably this is because `git annex calckey FILE` does not generate the same key, even though the file has the same checksum.

I think it would be better if `git annex reinject --known` would ignore the file extension when deciding whether a file is known. A case where that would be much better arises from the fact that git-annex has changed how it determines a file's extension over time. E.g. if foo.bar.baz was added to the annex a long time ago, it might have a key like `SHA256E-s12--37833383383.baz`. Modern git-annex would calculate a key like `SHA256E-s12--37833383383.bar.baz` and so the reinject of the file using modern git-annex would fail.

This problem does not affect `git annex reinject` without `--known`.

--spwhitton

> mentioned this on the git-annex reinject man page; [[done]] --[[Joey]]
@ -1,20 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-01-06T17:11:58Z"
content="""
I can't think of a reasonable way to implement this.

It would need to hash and then look for a known SHA256E key that uses the
hash. But the layout of the git-annex branch doesn't provide any way to do
that, except for iterating over every filename in the branch. Which
would be prohibitively slow when reinjecting many files. (N times git
ls-tree -r) So it would need to build a data structure to map from SHA256
to known SHA256E key. That can't be stored in memory; git-annex doesn't
let the content of the repo cause it to use arbitrary amounts of memory
(hopefully).

All I can think of is to traverse the git-annex branch and build a sqlite
database and then query that, but that would add quite a lot of setup
overhead to the command.
"""]]
@ -1,10 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 2"
date="2020-01-07T12:29:47Z"
content="""
Thank you for your reply. Makes sense. If that's the only way to do it then it might as well be a helper script rather than part of git-annex.

Leaving this bug open because it would be good to have the limitation documented in git-annex-reinject(1).
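
One shape such a helper script could take (an untested sketch; it scans the
git-annex branch once and indexes every known SHA256E key by its embedded
SHA256 hash, which a wrapper around reinject could then consult):

    # list the per-key log files in the git-annex branch, extract the keys,
    # and map the bare hash to the full key
    git ls-tree -r --name-only git-annex \
        | grep -o 'SHA256E-s[0-9]*--[0-9a-f]\{64\}[^/]*' \
        | sed 's/\.log.*$//' | sort -u \
        | awk -F'--' '{ print substr($2, 1, 64) \"\t\" $0 }' > sha256-to-key.tsv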
"""]]
@ -1,30 +0,0 @@
`git annex reinject --known` doesn't work in a bare repo.

    spwhitton@iris:~/tmp>echo foo >bar
    spwhitton@iris:~/tmp>mkdir baz
    spwhitton@iris:~/tmp>cd baz
    spwhitton@iris:~/tmp/baz>git init --bare
    Initialized empty Git repository in /home/spwhitton/tmp/baz/
    spwhitton@iris:~/tmp/baz>git annex init
    init (scanning for unlocked files...)
    ok
    (recording state in git...)
    spwhitton@iris:~/tmp/baz>git annex reinject --known ../bar
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    fatal: relative path syntax can't be used outside working tree.
    git-annex: fd:15: hGetLine: end of file

Obviously this wasn't actually a file known to git-annex. But I get the same error in a non-dummy bare repo I am trying to reinject.

A workaround is to use `git worktree add` and run `git annex reinject` from there.
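
A hedged sketch of that workaround (untested; the branch and paths are
illustrative):

    # give the bare repo a temporary worktree, reinject from it, then clean up
    git worktree add /tmp/baz-wt master
    (cd /tmp/baz-wt && git annex reinject --known /path/to/file)
    git worktree remove /tmp/baz-wt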

> [[fixed|done]] --[[Joey]]
@ -1,7 +0,0 @@
`git annex find --batch` will not accept absolute paths to files in the repo, but `git annex find /abs/path` works.
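
A hedged way to reproduce with an unlocked file (the path is illustrative;
per the comments below, locked files are not affected):

    # works:
    git annex find /abs/path/to/repo/file
    # prints nothing for the same file:
    echo /abs/path/to/repo/file | git annex find --batch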

I tested `git annex lookupkey --batch`, which does not have this problem.

--spwhitton

> [[fixed|done]] --[[Joey]]
@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-03-16T18:06:47Z"
content="""
Hmm, I am not reproducing this problem here.

Were you passing other options besides --batch, to eg match some files?

And what version?
"""]]
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 2"
date="2020-03-25T16:31:13Z"
content="""
Hello Joey,

I was passing `--unlocked` only.

Version 8.20200226 installed from buster-backports.

Thanks!
"""]]
@ -1,9 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-04-15T19:05:59Z"
content="""
Reproduced it; the problem only happens when the files are unlocked,
not the locked files I was trying. The --unlocked option is not the
problem.
"""]]
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-04-15T19:13:39Z"
content="""
Other commands like whereis --batch also behave the same.

Looks like what's going on is, when an absolute path is passed
as a parameter, it feeds thru git ls-files, producing a relative file.
But with --batch, it stays absolute. This causes things that try to, eg,
look up the file in the tree to not find it.

So, --batch needs to make filepaths relative too..
"""]]
@ -1,23 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-04-15T19:22:12Z"
content="""
Most of it can be fixed by making batchStart make
files relative.

Other affected commands do custom parsing of
batch input, so they will need to make the file from it
relative themselves: fromkey metadata rekey rmurl

Also, `git annex info /path/to/file` fails for unlocked
files and works for locked files, because it does not pass
filenames through git ls-files. I think it's the only
command that does not, when not in batch mode.

(I suppose alternatively, lookupKey could make the filename relative,
but I don't know if that is the only thing that fails on absolute
filenames, so prefer to make them all relative on input.)

Ok, all done..
"""]]
@ -1,12 +0,0 @@
When `git annex -c foo.bar` runs git-annex transferrer,
it does not pass along the settings from -c.

(Note that `git -c foo.bar annex` does propagate the -c. Git does it by
setting an environment variable, which causes git config to reflect the
override. The environment variable propagates to child processes.)
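
The environment variable in question is GIT_CONFIG_PARAMETERS. A quick,
hedged way to see the propagation (the exact quoting of the value varies
between git versions):

    # the alias runs a shell command in a child process of git
    git -c foo.bar=baz -c alias.showenv='!env | grep ^GIT_CONFIG' showenv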

There are a lot of config settings that impact transfers,
and some of them might be commonly used at the command line, so something
needs to be done about this. --[[Joey]]

> [[done]]
@ -1,16 +0,0 @@
Make `git-annex add --force-large` and `git-annex add --force-small`
add a specific file to annex or git, bypassing annex.largefiles
and all other configuration and state.

One reason to want this is that it avoids users doing stuff like this:

    git -c annex.largefiles=anything annex add foo.c

Such a temporary setting of annex.largefiles can be problematic, as explored in
<https://git-annex.branchable.com/bugs/A_case_where_file_tracked_by_git_unexpectedly_becomes_annex_pointer_file/>

Also, this could be used to easily switch a file from one storage to
the other. I suppose the file would have to be touched first to make git-annex
add process it?
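
With the options implemented, usage looks like this (a hedged example of
the intended behavior described above):

    git annex add --force-large foo.c    # annex the file despite annex.largefiles
    git annex add --force-small big.iso  # store the file in git despite annex.largefiles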

> [[done]] --[[Joey]]
@ -1,22 +0,0 @@
I wanted to share some thoughts for an idea I had.

There are times when I want to stream data from a remote -- I want to start processing it immediately, and I do not want to keep it in my annex when I am done with it.

I can give some examples:

* I have several projects which have a large number of similar text files, and they compress really well with borg or bup. For example, I have a repo with many [ncdu](https://dev.yorhel.nl/ncdu) json index files. They total 60G, but in a bup special remote, they are ~3G. In another repo, I have large, highly differential tsv files.
* I have an annex with 5-10G video files that are stored in a variety of network special remotes. Most of them are in my Google Drive. I would like to be able to immediately start playing them with VLC rather than downloading and verifying them in their entirety.

It would look like this:

```
git annex cat "someindex.ncdu" | ncdu -f -

diff <(git annex cat "huge-data-dump1.tsv" -f mybupremote ) <(git annex cat "huge-data-dump2.tsv" -f mybupremote )

git annex cat "myvideo.mp4" -f googledrive | vlc -
```

I imagine that there might be issues with verification. But I really am ok with not verifying a video file I am streaming.

> [[dup|done]] --[[Joey]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="git-annex-cat"
date="2020-07-09T00:21:02Z"
content="""
Related: [[todo/git-annex-cat]]
"""]]
@ -1,35 +0,0 @@
git-annex import --no-content means annex.largefiles is not checked, so
non-large files get added as annexed files. That's done because
annex.largefiles can contain expressions that need to examine the content
of the file, in particular for mimetype and mimeencoding.

So, if someone uses import --no-content in one repo, and in another clone
it's used with --content, importing the same files both times, a merge
conflict can result.

May be worth removing support for matching annex.largefiles when the
expression needs the file content, when importing from a special remote.

Or could detect when those are used, and only allow
importing with --content in that case.

> So this needs a way to introspect a preferred content expression
> to see if the terms used in it
> match some criteria. Another todo that also needs that is
> [[faster_key_lookup_for_limits]] --[[Joey]]

> > That introspection is implemented now.

Which is better? The repo may have annex.largefiles set in gitattributes
for good workflow reasons, so it would be very annoying to have importing
error out. And if importing ignores the configuration, the user is likely
to see that as a bug. If importing with --no-content looks at the config
and says "sorry, I can't, need the file content", the user can then choose
between changing largefiles or using --content, and it's clear how they're
asking for contradictory things.

Hmm, if largefiles does not match, it would have to download the file
content to add it to git, even though --no-content is used. A little weird,
but it's a small file, presumably.

[[done]] --[[Joey]]
@ -1,211 +0,0 @@
This todo is about `git-annex import branch --from remote`, which is
implemented now.

> [[done]] --[[Joey]]

## race conditions

(Some thoughts about races that the design should cover now, but kept here
for reference.)

A file could be modified on the remote while
it's being exported, and if the remote then uses the mtime of the modified
file in the content identifier, the modification would never be noticed by
imports.

To fix this race, we need an atomic move operation on the remote. Upload
the file to a temp file, then get its content identifier, and then move it
from the temp file to its final location. Alternatively, upload a file and
get the content identifier atomically, which eg S3 with versioning enabled
provides. It would make sense to have the storeExport operation always return
a content identifier and document that it needs to get it atomically by
either using a temp file or something specific to the remote.

----

There's also a race where a file gets changed on the remote after an
import tree, and an export then overwrites it with something else.

One solution would be to only allow one of importtree or exporttree
to a given remote. This reduces the use cases a lot though, and perhaps
so far that the import tree feature is not worth building. The adb
special remote needs both. Also, such a limitation seems like one that
users might try to work around by initializing two remotes using the same
data and trying to use one for import and the other for export.

Really fixing this race needs locking or an atomic operation. Locking seems
unlikely to be a portable enough solution.

An atomic rename operation could at least narrow the race significantly, eg:

1. get content identifier of $file, check if it's what was expected else
   abort (optional but would catch most problems)
2. upload new version of $file to $tmp1
3. rename current $file to $tmp2
4. Get content identifier of $tmp2, check if it's what was expected to
   be. If not, $file was modified after the last import tree, and that
   conflict has to be resolved. Otherwise, delete $tmp2
5. rename $tmp1 to $file

That leaves a race if the file gets overwritten after it's moved out
of the way. If the rename refuses to overwrite existing files, that race
would be detected by it failing. renameat(2) with `RENAME_NOREPLACE` can do that,
but probably many special remote interfaces don't provide a way to do that.

S3 lacks a rename operation, can only copy and then delete. Which is not
good enough; it risks the file being replaced with new content before
the delete, and the new content being deleted.

Is this race really a significant problem? One way to look at it is as
analogous to a git merge overwriting a locally modified file.
Git can certainly use similar techniques to entirely detect and recover
from such races (but not the similar race described in the next section).
But, git does not actually do that! I modified git's
merge.c to sleep for 10 seconds after `refresh_index()`, and verified
that changes made to the work tree in that window were silently overwritten
by git merge. In git's case, the race window is normally quite narrow
and this is very unlikely to happen (the similar race described in the next
section is more likely).

If git-annex could get the race window similarly small it would perhaps be
ok. Eg:

1. upload new version of $file to $tmp
2. get content identifier of $file, check if it's what was expected else
   abort
3. rename (or copy and delete) $tmp to $file

The race window between #2 and #3 could be quite narrow for some remotes.
But S3, lacking a rename, does a copy that can be very slow for large files.

S3, with versioning, could detect the race after the fact, by listing
the versions of the file, and checking if any of the versions is one
that git-annex did not know the file already had.
[Using this api](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGETVersion.html),
with version-id-marker set to the previous version of the file,
should list only the previous and current versions; if there's an
intermediate version then the race occurred and it could roll the change
back, or otherwise recover the overwritten version. This could be done at
import time, to detect a previous race, and recover from it; importing
a tree with the file(s) that were overwritten due to the race, leading to a
tree import conflict that the user can resolve. This likely generalizes
to importing a sequence of trees, so each version written to S3 gets
imported.

----

A remaining race is that, if the file is open for write at the same
time it's renamed, the write might happen after the content identifier
is checked, and then whatever is written to it will be lost.

But: Git worktree update has the same race condition. Verified with
this perl oneliner, run in a worktree and a second later
followed by a git pull. The lines that it appended to the
file got lost:

    perl -e 'open (OUT, ">>foo") || die "$!"; sleep(10); while (<>) { print OUT $_ }'

Since this is acceptable in git, I suppose we can accept it here too..

## S3 versioning and import

Listing a versioned S3 bucket with past versions results in S3 sending
a list that's effectively:

    foo current-version
    foo past-version
    bar deleted
    bar past-version
    bar even-older-version

Each item on the list also has a LastModified date, and IsLatest
is set for the current version of each file.

This needs to be converted into an ImportableContents tree of file trees.

Getting the current file tree is easy, just filter on IsLatest.

Getting the past file trees seems hard. Two things are in tension:

* Want to generate the same file tree in this import that was used in past
  imports. Since the file tree is converted to a git tree, this avoids
  a proliferation of git trees.

* Want the past file trees to reflect what was actually in the
  S3 bucket at different past points in time.

So while it would work fine to just make one past file tree for each
file, that contains only that single file, the user would not like
the resulting history when they explored it with git.

With the example above, the user expects something like this:

    ImportableContents [(foo, current-version)]
      [ ImportableContents [(foo, past-version), (bar, past-version)]
        [ ImportableContents [(bar, even-older-version)]
          []
        ]
      ]

And the user would like for the inner-most list to also include
(foo, past-version) if it were in the S3 bucket at the same time
(bar, even-older-version) was added. So depending on the past
modification times of foo vs bar, they may really expect:

    let l = ImportableContents [(foo, current-version)]
              [ ImportableContents [(foo, past-version), (bar, past-version)]
                [ ImportableContents [(foo, past-version), (bar, even-older-version)]
                  [ ImportableContents [(foo, past-version)]
                    []
                  ]
                ]
              ]

Now, suppose that foo is deleted and subsequently bar is added back,
so S3 now sends this list:

    bar new-version
    bar deleted
    bar past-version
    bar even-older-version
    foo deleted
    foo current-version
    foo past-version

The user would expect this to result in:

    ImportableContents [(bar, new-version)]
      [ ImportableContents []
          l
      ]

But l needs to be the same as the l above to avoid git tree proliferation.

What is the algorithm here?

1. Build a list of files with historical versions ([[a]]).
2. Extract a snapshot from the list
3. Remove too-new versions from the list
4. Recurse with the new list.

Extracting a snapshot:

Map over the list, taking the head version of each item and tracking
the most recent modification time. Add the filenames to a snapshot list
(unless the item is a deletion).

Removing too-new versions:

Map over the list, and when the head version of a file matches the most
recent modification time, pop it off.

This results in a list that is only versions before the snapshot.

Overall this is perhaps a bit better than O(n^2) because the size of the list
decreases as it goes?

---

See also, [[adb_special_remote]]

[[!tag confirmed]]
@ -1,73 +0,0 @@
Need to support annex.largefiles when importing a tree from a special
remote.

Note that the legacy `git annex import` from a directory does honor
annex.largefiles.

> annex.largefiles will either need to be matched by downloadImport
> (changing it to return `Either Sha Key`), or by buildImportTrees.
>
> If it's done in downloadImport, to avoid re-download of non-large files,
> the content identifier will
> need to be recorded as using the git sha1. This needs a way to encode
> a git sha as a key, that's a bijective mapping (so distinct from annex
> sha1 keys).
>
> Problem: In downloadImport, startdownload checks getcidkey
> to see if the ContentIdentifier is already known, and if so, returns the
> key used for it before. But, with annex.largefiles, the same content
> might be annexed given one filename, and not annexed with another.
> So, the key from getcidkey might not be the right one (or there could be
> more than one, an annex key and a translated git key).
>
> That argues against making downloadImport match annex.largefiles.

> But, if instead buildImportTrees matches annex.largefiles,
> then downloadImport has already run moveAnnex on the download,
> so the content is in the annex. Moving it back out of the annex is
> difficult (there may be other files in the repo using the same key).
> So, downloadImport would then need to not moveAnnex, but move it to
> somewhere temporary. Like the gitAnnexTmpObjectLocation, but using
> that would be a problem if there was a file in the repo
> and git-annex get was run on it at the same time. So an equivalent
> but separate location.
>
> Further problem: downloadImport might skip a download of a CID
> that's already been seen. That CID might have generated a key
> before. The key's content may not still be present in the local
> repo. Then, if buildImportTrees checks annex.largefiles and wants
> to add it directly to git, it won't have the content available to add to
> git. (Conversely, the CID may have been added to git before, but
> annex.largefiles matches now, and so it would need to extract
> the content from git only to store it in the annex, which is doable but
> seems pointless as it's not going to save any space.)
>
> Would it be acceptable for annex.largefiles to be ignored if the same
> content was already imported from a remote earlier? I think maybe so.
>
> Then all these problems are not a concern, and back to downloadImport
> checking annex.largefiles being the simplest approach, since it avoids
> needing the separate temp file location.
>
> From the user's perspective, the special remote contained a file,
> it was already imported in the past, and the file has been renamed.
> It makes no more sense for importing it again to change how it's
> stored between git and annex than it makes sense for git mv of a file
> to change how it's stored.
>
> However... If two people can access the special remote, and import
> from it at different times, and get different trees as a result,
> that might break some assumptions, and might also lead to merge
> conflicts. --[[Joey]]
>
> > Importing updates export.log, to indicate the state of the remote
> > (the log file could have been named better). So an annex.largefiles
> > change would result in an export/import conflict. Such a conflict
> > can be resolved by using git-annex export, but this could be a
> > surprising situation for users to encounter, since there is no real
> > conflict.
> >
> > Still, this doesn't feel like a reason not to implement the feature,
> > necessarily.

[[done]]
@ -1,9 +0,0 @@
It would be great to be able to use the pubDate of the entries with the --template option of importfeed.

Text.Feed.Query has a getItemPublishDate (and a getFeedPubDate, if we want some kind of ${feeddate}).

The best would be to allow a reformatting of the date(s) with (for example) %Y-%m-%D

> itempubdate was added years ago and I forgot to close this,
> but I've now also added itempubmonth, itempubday, etc. [[done]]
> --[[Joey]]
@ -1,10 +0,0 @@
[[!comment format=mdwn
username="gueux"
ip="2a01:240:fe6d:0:7986:3659:a8bd:64f1"
subject="syntax"
date="2013-09-12T14:05:16Z"
content="""
use \"itemdate\" and \"feeddate\" as names?

use ${itemdate=%Y-%m-%D} syntax option?
"""]]
@ -1,11 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.2.134"
subject="comment 2"
date="2013-09-13T19:53:52Z"
content="""
getItemPublishDate returns a String, which can contain any of several date formats. Deferred until the feed library has something more sane.
Upstream bug: <https://github.com/sof/feed/issues/6>

As for how to format the date in the feed, I would be ok with having itemdate (YYYYMMDD), itemyear (YYYY), itemmonth (MM) and itemday (DD). Full date formatting seems like overkill here.
"""]]
@ -1,5 +0,0 @@
The documentation for the new import remote command says, "Importing from a special remote first downloads all new content from it". For many special remotes -- such as Google Cloud Storage or DNAnexus -- checksums and sizes of files can be determined without downloading the files. For other special remotes, data files might have associated checksum files (e.g. md5) stored next to them in the remote. In such cases, it would help to be able to import the files without downloading (which can be costly, especially due to cloud provider egress charges), similar to addurl --fast .
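
Now that this is implemented for the directory special remote, usage looks
roughly like this (a hedged example; see the git-annex-import man page for
the authoritative syntax):

    git annex initremote mydir type=directory directory=/mnt/data \
        exporttree=yes importtree=yes encryption=none
    # import the tree without downloading file contents
    git annex import master --from mydir --no-content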

[[!tag confirmed]]

> [[done]] (only implemented for directory for now) --[[Joey]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 10"
date="2020-07-03T19:55:36Z"
content="""
\"the key generated by import --fast is probably not the same one generated by a regular import\" -- but that happens already with addurl; is the problem worse here?
"""]]
@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 11"""
date="2020-07-24T17:50:03Z"
content="""
Yes, it can also happen with addurl, but I think it's less likely that two
users add the same url with and without --fast or --relaxed than that two
users sync with the same remote with and without --content.

Anyway, I opened [[sync_fast_import]].
"""]]
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-03-19T17:46:09Z"
content="""
It would also be possible for listImportableContents to
return an url that can be used to publicly download the content,
which git-annex could derive a URL key from (as well as recording the url).

If the ContentIdentifier is something globally unique or using some kind
of proprietary hashing (like an S3 version ID), it could be used to
construct a key. (Note that it would be possible for a remote to include its
UUID in the ContentIdentifier if it's not otherwise globally unique.)
"""]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="annex.thin for importing from directory special remote"
date="2020-07-01T22:23:58Z"
content="""
As a special case, when importing from a directory special remote, could there be an option to hardlink the files into the repo instead of copying them?
"""]]
@ -1,41 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-07-02T18:18:57Z"
content="""
Yeah, a directory special remote special case would be good.
It's kind of needed for [[remove_legacy_import_directory_interface]].

It could just as well hash the file in place in the directory,
and leave it there, not \"downloading\" it into the annex. Which avoids
me having to think about whether hard linking to files in a
special remote makes any kind of sense. (My gut feeling is it's not
the same as hard linking inside a git-annex repo.)

This approach needs this interface to be added.

    importKey :: Maybe (ExportLocation -> ContentIdentifier -> ByteSize -> Annex Key)

Then just use that, when it's available, rather than
retrieveExportWithContentIdentifier. Easy enough.

And other remotes could use this interface too.
If some other remote has public urls, it could generate a URL key
and return that. And if a remote has server-side checksums, it can generate
a key from the checksum, as long as it's a checksum git-annex supports.
So this interface seems sufficiently general.

This would be easy to add to the special remote protocol too, although
some new plumbing command might be needed to help generate a key
from information like the md5 and size. Eg,
`git annex genkey --type=MD5 --size=100 --value=3939393` and `git annex genkey
--type=URL --value=http://example.com/foo`

----

User interface changes: `git-annex import --from remote --fast` and
`git annex sync` without --content could import from a remote that
way, if it supports importKey. (Currently sync only imports with
--content so this is kind of a behavior change, but I think an ok one to
make.)
"""]]
@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 4"
date="2020-07-02T20:22:25Z"
content="""
Thanks -- this would solve (among other things) [[bugs/removeLink_failed_when_initializing_a_repo_in_a_VirtualBox_shared_folder]]: I could put the git-annex repo on the normal filesystem inside the VM, and only the directory special remote would then deal with the broken vboxsf filesystem. import-tree *with* copying isn't possible as the files are too big.
"""]]
@ -1,14 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2020-07-03T01:46:46Z"
content="""
Hmm, it would also be possible for a remote to generate a WORM key,
as long as there was a way for it to get a timestamp for the file being
imported.

That might let it be implemented for several other special remotes.

Although I'm wary about making git-annex ever use WORM without being
explicitly asked to. annex.eatworms? ;)
"""]]
@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2020-07-03T15:52:26Z"
content="""
This has merge conflict potential, because the key generated by import
--fast is probably not the same one generated by a regular import. So, if
two repositories are both importing from the same special remote, there will be
a need to resolve the resulting merge conflicts.

Since git-annex sync is often run with and without --content, it's probably
the most likely problem point for this. Perhaps there should be another
config that controls whether sync does a fast import or not, and not
control it with --content?
"""]]
@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 7"""
date="2020-07-03T16:27:10Z"
content="""
Hmm, --fast is not very descriptive for this when it's used with a
directory special remote, because hashing is almost as slow as copying.

Probably better to use --no-content and --content, same as sync.
(Though unfortunately with an opposite default; iirc there are plans
somewhere to transition sync to default to --content.)
"""]]
@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2020-07-03T17:39:19Z"
content="""
Note that, since exporttree remotes are always untrusted, after importing
--no-content from one, fsck is going to complain about it being the only
location with the content.

Which seems right.. That content could be overwritten at any time and the
only copy lost. But still worth keeping in mind.
"""]]
@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 9"""
date="2020-07-03T18:29:05Z"
content="""
implemented, directory remote only, but it could be added to adb easily,
and possibly to S3. Also added it to the proposed import extension to the
external special remote protocol.

Still unsure what to do about git-annex sync without --content importing.
For now, sync doesn't do content-less imports still, but that could be
changed if the concerns in comment #6 are dealt with.
"""]]
@ -1,83 +0,0 @@
When a `git annex move` is interrupted at a point where the content has
been transferred, but not yet dropped from the remote, resuming the move
will often refuse to drop the content, because it would violate numcopies.

Eg, if numcopies is 2, and there is only 1 extant copy, on a remote,
git-annex move --from remote will normally ignore numcopies (since it's not
getting any worse) and remove the content from the remote after
transferring it. But, on resume, git-annex sees there are 2 copies and
numcopies is 2, so it can't drop the copy from the remote.

This happens to me often enough to be annoying. Note that being interrupted
during checksum verification makes it happen, so the window is relatively
wide.

I think it can also happen with move --to, although I can't remember seeing
that.

Perhaps some local state could avoid this problem?

--[[Joey]]

> One simple way would be to drop the content from the remote before moving
> it to annex/objects/. Then if the move were interrupted before the drop,
> it could resume the interrupted transfer, and numcopies would work the
> same as it did when the move started.
>
> > After an interrupted move, whereis would say the content is present,
> > but eg an annex link to it would be broken. That seems surprising,
> > and if the user doesn't think to resume the move, fsck would have to be
> > made to deal with it. I don't much like this approach; it seems to
> > change an invariant that usually the existence of a copy on disk is ground
> > truth, and location tracking tries to reflect it. With this, location
> > tracking would be correct, but only because the content is in an
> > unusual place on disk that it can be recovered from.
>
> Or: Move to annex/objects/ w/o updating local location log.
> Then do the drop, updating the remote's location log as now.
> Then update local location log.
> >
> > If interrupted, and then the move is resumed, it will see
> > there's a local copy, and drop again from the remote. Either that
> > finishes the interrupted drop, or the drop already happened and it's a
> > noop. Either way, the local location log then gets updated.
> > That should clean things up.
> >
> > But, if a sync is done with the remote first, and then the move
> > is resumed, it will no longer think the remote has a copy. This is
> > where the only copy can appear missing (in whereis). So a fsck
> > will be needed to recover. Or, move could be made to recover from
> > this too, noticing the local copy and updating the location log to
> > reflect it.
> >
> > Still, if the move is interrupted and never resumed, after a sync
> > with the remote, the only copy appears missing, which does seem
> > potentially confusing.

> Local state could be a file listing keys that have had a move started
> but not finished. When doing the same move, it should be allowed to
> succeed even if numcopies would prevent it. More accurately, it
> should disregard the local copy when checking numcopies for a move
> --from. And for a move --to, it should disregard the remote copy.
> May need 2 separate lists for the two kinds of moves.
>
> > This is complex to implement, but it avoids the gotchas in the earlier
> > ideas, so I think is best. --[[Joey]]

> > > Implementation will involve willDropMakeItWorse,
> > > which is passed a deststartedwithcopy that currently comes from
> > > inAnnex/checkPresent. Check the log, and if
> > > the interrupted move started with the move destination
> > > not having a copy, pass False.

Are there any situations where this would be surprising? Eg, if git-annex
move were interrupted, and then a year later, run again, and proceeded
to apparently violate numcopies?

Maybe, OTOH I've run into this problem probably weeks after the first move
got interrupted. Eg, if files are always moved from repo A to repo B,
leaving repo A empty, this problem can cause stuff to build up on repo A
unexpectedly. And in such a case, the timing of the resumed move does not
matter; the user expected files to always get eventually moved from A.

[[fixed|done]] --[[Joey]]
@ -1,21 +0,0 @@
Several todos need to examine preferred content expressions to see if
any of the terms in them match some criteria.

That includes:

* [[todo/sync_fast_import]]
* [[todo/faster_key_lookup_for_limits]]
* [[todo/skip_first_pass_in_git_annex_sync]]

Internally, preferred content expressions are compiled
into a `Matcher (AssumeNotPresent -> MatchInfo -> Annex Bool)`

The presence of the function there is a problem, because haskell does not
allow comparing functions for equality. So probably what is needed is
something that contains that function but also indicates which preferred
content term it's for.

Or, perhaps, not the term, but the specific criteria needed by each such
todo.

> [[done]] --[[Joey]]
@ -1,14 +0,0 @@
I don't want files that I dropped to immediately disappear from my local repo or all of my remote repos on the next sync. Especially in situations where changes to the git-annex repo get automatically and immediately replicated to remote repos, I want a configurable "grace" period before files in .git/annex/objects get really deleted.

This has similarities to the "trash" on a desktop. It might also be nice to

* configure a maximum amount of space for the "trash"
* have a way to see the contents of the trash to easily recover deleted files

Maybe it would make sense to just move dropped files to the desktop's trash? "git annex trash" as an alternative to drop?

> This seems likely to have been a misunderstanding of what drop does,
> since dropping from the local repo would not remove the content from a
> remote.
>
> closing as there's no clear todo here. [[done]] --[[Joey]]
@ -1,29 +0,0 @@
The forwardRetry RetryDecider keeps retrying a transfer as long as at least
one more byte got transferred than in the previous, failed try.

Suppose that a transfer was restarting from the beginning each time, and it
just so happened that each try got a tiny little bit further before
failing. Then transferring an `N` byte object could result in `sum [1..N]`
bytes being sent. Worst case. (In the real world it involves the size of chunks
sent in a failing operation, so probably `sum [1..N/1024]` or so.)

So I think forwardRetry should cap after some amount of automatic retrying.
Ie, it could give up after 5 retries. --[[Joey]]

Of course, the real use case for forwardRetry is remotes that use eg, rsync
and can really resume at the last byte. But, forwardRetry can't tell
if a remote is doing that (unless some timing heuristics were used). Around
5 retries seems fairly reasonable for that case too; it would be unlikely
for a rsync transfer to keep failing so many times while still making
forward progress. --[[Joey]]

> Or could add data to remotes about this, but it would need to be added
> for external special remotes too, and this does not really seem worth the
> complication.
>
> I think, even if a remote does not support resuming like
> rsync, it makes sense to retry a few failed transfers if it's getting
> closer to success each time, because forward progress suggests whatever
> made it fail is becoming less of a problem.

[[done]] --[[Joey]]
@ -1,19 +0,0 @@
Add --maximum-cost=N which prevents trying to access any remotes with a
larger cost. May as well add --minimum-cost too for completeness.

My use case: Want to git annex get --auto and pull from any of 3 usb
drives, but not from the network. --[[Joey]]

> Hmm, [[todo/to_and_from_multiple_remotes]] might be another way to do
> that. Put the 3 drives in a git remote group, or list the remotes on the
> fly.
>
> There could still be benefit in avoiding high cost remotes. But, the cost
> numbers are only intended to create a local ordering, so making them part of a
> user interface is kind of weird. While 50 might be a high cost in one
> repository, in another repository it could be a fairly low cost. The user
> would need to examine all the costs to pick the cost they want; using
> remote names seems better UI. --[[Joey]]

> > that seems a convincing reason not to implement this and instead
> > implement remote groups. [[wontfix|done]] --[[Joey]]
@ -1,33 +0,0 @@
ATM upon `get` of a file for which no remote in .git/config provides its content, git-annex spits out a message like

[[!format sh """
/tmp/najafi-2018-nwb > git annex get data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb
Remote origin not usable by git-annex; setting annex-ignore
(not available)
  Try making some of these repositories available:
  	2cca1320-6f51-4acf-a778-efdc79f87ab3 -- smaug:/mnt/btrfs/datasets/datalad/crawl/labs/churchland/najafi-2018-nwb
  	e513795e-1311-431d-8106-917d9528cfbd -- datasets.datalad.org

  (Note that these git remotes have annex-ignore set: origin)
failed
(recording state in git...)
git-annex: get: 1 failed
"""]]

although those remote descriptions/names give an idea for an informed user, they do not even differentiate between regular and special remotes. Special remotes could just be "enabled", and some of them might even have `autoenable` set. Maybe it could separate them and provide a message like

[[!format sh """
...
  Try making some of these repositories available:
  	2cca1320-6f51-4acf-a778-efdc79f87ab3 -- smaug:/mnt/btrfs/datasets/datalad/crawl/labs/churchland/najafi-2018-nwb
  or enable (using git annex enableremote <name>) one of:
  	e513795e-1311-431d-8106-917d9528cfbd -- datasets.datalad.org
"""]]

[[!meta author=yoh]]

> implemented as shown. [[done]] --[[Joey]]
|
|
@ -1,38 +0,0 @@
|
|||

[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-09-30T17:55:22Z"
content="""
I'm not sure that the distinction between regular and special remotes is
likely to matter in general?

If I intuit correctly, in your use case, you may have special remotes that
are extremely easy to enable. (Auto-enabling seems a red herring since it
didn't get autoenabled.) Conversely, some random repository
might be on a LAN/device the user doesn't have access to.

But it seems just as likely that a user might have a special remote that
needs extra software installed to access, or needs a password or other
authentication method that's a pain, while it would be easy enough to add
a ssh remote pointing at another repository on the LAN, or to mount a drive.

Or in my personal setup, some repositories are on offline drives and a pain
to access, others are on network attached storage and easy, and special
remotes are a distant third choice. (I use repo descriptions to
differentiate.)

I also feel that this message is already really too verbose, and adding
lots more instructions to it will overall hurt usability. Bear in mind
there can be many such messages displayed by a single command.

Also, the proposed output suggesting to run git-annex enableremote doesn't
make sense if the special remote is actually already enabled, but was still
not able to be accessed for whatever reason. The existing message is
intentionally worded so it works in either case, disambiguated by
displaying the names of the remotes that are enabled.

It might be that more metadata about repositories would help, like it
already separates out untrusted repositories into a separate list.
But it would have to be metadata that applies to all users of a repository,
or is somehow probed at runtime.
"""]]

@ -1,34 +0,0 @@

[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 2"
date="2019-09-30T20:35:59Z"
content="""
> I'm not sure that the distinction between regular and special remotes is likely to matter in general?

Those (regular git repositories, and special remotes) are technically completely different beasts, and are \"made available\" using different mechanisms (`git remote add` vs `git annex enableremote`). Listing them in one list makes it hard to impossible for a user to choose the correct command without background knowledge. Indeed, some of them (regardless of type) would be harder to \"make available\" than others, but that is a different kind of information, which annex is unlikely ever to contain and thus to express in the message. `autoenabled` ones, though, are more likely to be the \"easy\" ones.

> (Auto-enabling seems a red herring since it didn't get autoenabled)

`datalad install` autoenables by default since we call `git annex init` on a fresh clone (IIRC, if we see a `git-annex` branch on the remote). With pure `git annex`, I believe you only get autoenabling if you run `git annex init` explicitly after cloning. So `git clone https://github.com/dandi/najafi-2018-nwb && cd najafi-2018-nwb && git annex get data/FN_dataSharing/nwb/mouse1_fni16_150817_001_ch2-PnevPanResults-170808-190057.nwb` wouldn't work, while the same with a `git annex init` before the `git annex get` would.
So I wouldn't say it is a \"red herring\" per se - I (a user) can end up in a situation where a special remote was not enabled because I did not explicitly `git annex init` locally.

> ... Bear in mind there can be many such messages displayed by a single command.

Yeah, that is what I (as a user) dislike as well. I even thought that in `datalad` (e.g. [#3078](https://github.com/datalad/datalad/issues/3078)) we could parse those and provide a single summary statement... I think that splitting here into two wouldn't be the straw that breaks the camel's back. Some more generic (re)solution is needed.

> Also, the proposed output suggesting to run git-annex enableremote doesn't make sense if the special remote is actually already enabled, but was still not able to be accessed for whatever reason.

Indeed. But `git annex` \"knows\" whether any given special remote was or was not available/tried, correct? To a user (if we forget about the verbosity for a moment), the most informative message then could be

1. a list of remotes which were tried but failed (and thus might need to be \"made available\") - maybe even with a reason for each (e.g. \"connection timed out\", \"file is missing\", ...)
2. a list of regular remotes (to be added via `git remote add`)
3. a list of special remotes (to be enabled via `git annex enableremote`)

From 1. I would see whether I should do something about what I had already connected to; from 2. and 3. I would immediately see what to enable and how (if I see that I potentially have access to it).

> It might be that more metadata about repositories would help, like it already separates out untrusted repositories into a separate list.

Besides considering untrusted repos last (they could be placed last in any corresponding list), I personally do not see such separation as useful.
"""]]

@ -1,25 +0,0 @@

[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2020-09-22T16:15:49Z"
content="""
Yes, it knows which remotes are configured, and every configured remote
that it's going to list will have been tried and not been accessible
when there's such a message. So, the list can be split into repos
that have a remote and those without one. Eg:

    Try making some of these remotes accessible:
    	2370e576-fcef-11ea-a46e-7fce4739e70f -- joey@localhost:/media/usb [usbdrive]
    	346cad24-fcef-11ea-a275-d3951b734346 -- joey@server:repo [origin]
    	9808c3da-fcf0-11ea-b47f-cfa6e90a9d4a -- amazon S3
    Maybe enable some of these special remotes (git annex enableremote):
    	e513795e-1311-431d-8106-917d9528cfbd -- datasets.datalad.org
    Maybe add some of these git remotes (git remote add):
    	2cca1320-6f51-4acf-a778-efdc79f87ab3 -- smaug:/mnt/btrfs/datasets/datalad/crawl/labs/churchland/najafi-2018-nwb

So only 2 lines longer at most.

(The "Maybe" wording is because "And/or" is so ugly, and yet
the user may need to only do one, or more than one, depending on what
they're doing.)
"""]]

@ -1,16 +0,0 @@

[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2020-09-22T16:45:19Z"
content="""
This risks changing the --json output. Eg currently it has:

    {"command":"get","wanted":[{"here":false,"uuid":"7f03b57d-5923-489a-be26-1ab254d0620d","description":"archive-13 [house]"}],"note":"from house...\nrsync failed -- run git annex again to resume file transfer\nUnable to access these remotes: house\nTry making some of these repositories available:\n\t7f03b57d-5923-489a-be26-1ab254d0620d -- archive-13 [house]\n","skipped":[]

The "wanted" list comes from the display of the list of
uuids, but now there would be up to 3 lists displayed.

I doubt anything uses that, but I don't want to change the json,
so I suppose it would need to keep the current behavior when json is
enabled, ugh.
"""]]

@ -1,6 +0,0 @@

The http special remote doesn't currently support being used with a
--sameas remote that uses exporttree=yes.

It seems like this should be fairly easy to implement. --[[Joey]]
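
(Usage might then look something like the following — a sketch only,
assuming the `httpalso` special remote type with its `url=` parameter, an
existing exporttree=yes special remote named "www", and a hypothetical URL
where its exported tree is published:)

    # set up a read-only http mirror sharing www's content
    git annex initremote www-http --sameas=www type=httpalso url=https://example.com/annex/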

> [[done]] --[[Joey]]

@ -1,7 +0,0 @@

I want to add some dotfiles in the root of my repository to git-annex as unlocked annexed files. So I edited `.git/info/attributes` to remove the line `.* !filter`, such that it only contains the line `* filter=annex`. This seems to be working fine.

I was thinking that it might make sense to have a `git annex config` option to tell git-annex not to add the `.* !filter` line to `.git/info/attributes` when initialising other clones of this repo. In the meantime, I've worked around it using a `post_checkout` hook in my `~/.mrconfig` which edits `.git/info/attributes`.

--spwhitton
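
(A sketch of how this might look with such a setting — assuming an
`annex.dotfiles` config that can be set via `git annex config`, so that it
propagates to other clones:)

    # allow dotfiles to be annexed rather than always stored in git
    git annex config --set annex.dotfiles true
    # then add a (hypothetical) dotfile as an annexed file
    git annex add .mylargedotfile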

> annex.dotfiles added, [[done]] --[[Joey]]