Merge branch 'master' of ssh://git-annex.branchable.com
commit ea95de3656
23 changed files with 301 additions and 4 deletions
17
doc/bugs/git_annex_fix_broken.mdwn
Normal file
@ -0,0 +1,17 @@
### Please describe the problem.

`git annex fix` doesn't fix up broken symlinks after moving a file.

### What steps will reproduce the problem?

    git init
    git annex init
    mkdir dir
    touch dir/a
    git annex add .
    git annex sync
    mv dir/a .
    git annex fix a
    ls -alh
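
The symptom itself does not depend on git-annex. In a locked repository, each annexed file is a relative symlink into `.git/annex/objects`. A link created as `dir/a` therefore starts with one `../`, and moving it to the repository root leaves that prefix pointing outside the repository. The bash sketch below reproduces only this symlink behaviour in a scratch directory. The object path `.git/annex/objects/xx/yy/KEY/KEY` and the helper name `demo_broken_link` are hypothetical stand-ins.

```shell
# Reproduce the dangling-link symptom with plain symlinks (no git-annex).
# .git/annex/objects/xx/yy/KEY/KEY is a made-up stand-in for a real key path.
demo_broken_link() (
  repo=$(mktemp -d)
  cd "$repo" || exit 1
  mkdir -p .git/annex/objects/xx/yy/KEY dir
  echo content > .git/annex/objects/xx/yy/KEY/KEY
  # Created inside dir/, the link needs one "../" to reach the repo root.
  ln -s ../.git/annex/objects/xx/yy/KEY/KEY dir/a
  mv dir/a .
  # mv keeps the link text unchanged, so from the root "../" now escapes
  # the repo: the link exists (-L) but its target does not (! -e).
  if [ -L a ] && [ ! -e a ]; then
    echo "a -> $(readlink a) (broken)"
  fi
  cd / && rm -rf "$repo"
)

demo_broken_link
```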

### What version of git-annex are you using? On what operating system?

8.20201127 (I know, I know... a one-year-old version :)
13
doc/bugs/git_annex_fsck_--time-limit_broken.mdwn
Normal file
@ -0,0 +1,13 @@
### Please describe the problem.
`git annex fsck --time-limit=` is broken. <br>
For one, there is a long delay between the specified time limit expiring and anything actually happening: with a limit of 20 seconds, `git annex fsck` always runs for more than 5 minutes. After that, one of the following happens: <br>
Sometimes it works as intended. <br>
Sometimes it prints "Time limit (20s) reached!" but hangs without exiting. <br>
Sometimes it prints "Time limit (20s) reached!" but continues fscking. <br>

### What steps will reproduce the problem?

In a sufficiently large repo, run `git annex fsck --time-limit=20s`.
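
To put numbers on the delay described above, one option is to wrap the command in a small bash timing helper. `elapsed` is a hypothetical name and not part of git-annex. It only reports wall-clock seconds and the exit status, and it passes that status through unchanged.

```shell
# Run a command, report how long it ran and its exit status on stderr,
# and return the command's own exit status.
elapsed() {
  local start status
  start=$(date +%s)
  "$@"
  status=$?
  echo "elapsed: $(( $(date +%s) - start ))s, exit status: $status" >&2
  return "$status"
}

# Usage, from inside a large annexed repository:
#   elapsed git annex fsck --time-limit=20s
```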

### What version of git-annex are you using? On what operating system?

8.20201127
@ -0,0 +1,35 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="comment 2"
date="2021-01-01T22:30:34Z"
content="""
Hi, thanks for the answer.

What if I want to leave `~/Pictures` as-is, without moving or changing it? I would prefer that. I just want to add its contents to a Git Annex repo and easily sync future changes to the repo as well (e.g. after I add more files, rename some files, or update some files).

Why `git annex sync` and not `git commit`? So far I have only ever used `git commit`.

Why `git annex reinject` and not `git annex import` or `cp|mv` & `git annex add`?
Also, why would I not add files that were not part of the original `~/Pictures`? The original `~/Pictures` does not contain all of the pictures, as they are somewhat scattered, so I want to add unknown files as well.

Why would I import the DVD to a dummy branch? I would want it all in my master/main branch, wouldn't I? (I also don't quite understand why I would want branches at all.)
I also potentially want to `git annex get` such a file at some point.

What are \"too many files\" for a single repo? And why is that a problem?
I am just adding a Google Takeout archive to Git Annex ([via](https://github.com/albertz/chrome-ext-google-takeout-downloader/)), and it will also contain many of the files from `~/Pictures` (although not all of them; sometimes, but not always, in lower quality, though often in original quality), as well as many other files. So it's already pretty mixed up.
Or does it make sense to just share the annex object storage (`.git/annex/objects`) between multiple repos?
Or is that actually what you mean as the intended use case for branches?

Which dotfiles does `annex.dotfiles` include? Just everything matching `.*`?
Why would I not want to add dotfiles? I think I would want to just archive the whole directory as-is.

Also, after reading a bit further and trying it out a bit, there are things I don't quite understand:

Given some file path (e.g. `Picture/BestPics2020/a.jpg`), how can I find other paths of the same file? (E.g. I might also have the file stored under `Picture/2020/01/a.jpg` or so.) Is that done with `git annex list`? I'm not sure it lists all paths; so far I only ever see a single path.

I'm not really sure how to use `git annex import` properly when the file is already annexed under a different path. In any case, I also want to add the new path (new name).

Sorry for the many follow-up questions, but all of this is still somewhat unclear to me.

"""]]
@ -0,0 +1,20 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 3"
date="2021-01-02T15:05:01Z"
content="""
You can of course just use `~/Pictures` directly as a repository: `cd ~/Pictures; git init; git annex init`.

`git annex sync` does a bit more than just `git commit`. For example, it also automatically commits file deletions.

Sorry, I thought the existing copies of your photos were just backups of your `~/Pictures`. Since they aren't, I suggest you `mv` the files into the annex and then just `git annex add` them. For the DVDs, import into a subdirectory of your master branch instead of a dummy branch, and without the `--no-content` option.

\"Too many files\" depends on your preference. The more files, the slower some operations get, like `git annex sync`. I suggest setting something like `git annex config --set annex.largefiles 'largerthan=32kb'`. This way, small files get added to git itself instead of git-annex, which speeds up git-annex operations if there are a lot of small files. Note that these small files will be in every clone of the repo and can't be `git annex drop`ped.
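
Before running `git annex add`, you can get a rough preview of that split by listing the files on each side of the threshold with `find`. This bash sketch is not git-annex's own matcher. It assumes that `32kb` means 32000 bytes and that `largerthan` is a strict comparison. It only looks at regular files (not existing annex symlinks) and skips `.git`. `preview_largefiles` is a hypothetical helper name.

```shell
# Approximate the split that annex.largefiles='largerthan=32kb' would make
# for a directory that has not been added yet. Not git-annex's own matcher:
# assumes 32kb = 32000 bytes and a strict "larger than" comparison.
preview_largefiles() (
  cd "${1:-.}" || exit 1
  echo "annex:"
  # find -size +Nc means "more than N bytes"; the repo's .git is skipped.
  find . -path ./.git -prune -o -type f -size +32000c -print | LC_ALL=C sort
  echo "git:"
  find . -path ./.git -prune -o -type f ! -size +32000c -print | LC_ALL=C sort
)
```

For example, `preview_largefiles ~/Pictures` prints the `annex:` list followed by the `git:` list.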

The various configuration options are documented at the bottom of the main [[git-annex]] manpage. Without the `annex.dotfiles` option, dotfiles (any file starting with \".\" and anything inside directories starting with \".\") will still be added, but to git itself, with the disadvantages mentioned above.
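
To see which files that definition covers in a given directory, a `find` pattern that matches any path component starting with a dot is enough. `list_dotfiles` is a hypothetical bash helper. It follows the definition given above rather than git-annex's code, and it skips the repository's own `.git`.

```shell
# List files that count as dotfiles under the definition above: the name
# starts with "." or some parent directory's name starts with ".".
# In find -path patterns, "*" also matches "/", so '*/.*' catches both cases.
list_dotfiles() (
  cd "${1:-.}" || exit 1
  find . -path ./.git -prune -o -type f -path '*/.*' -print | LC_ALL=C sort
)
```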

You can get the key/hash for that file with `git annex info <file>`, and then search for other files with the same content with `find . -lname '*<key>'`.
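
The `find` approach can be wrapped in a small bash function. `paths_for_key` is a hypothetical name. The key can come from `git annex info <file>` as described, or from `git annex lookupkey <file>`. The function only finds locked files (symlinks) in the currently checked-out tree. Unlocked files are plain pointer files and will not match `-lname`.

```shell
# Print every symlink under DIR (default ".") whose target ends with KEY,
# i.e. every locked path that points at the same annexed content.
paths_for_key() {
  local key=$1 dir=${2:-.}
  find "$dir" -path "$dir/.git" -prune -o -type l -lname "*$key" -print | LC_ALL=C sort
}

# Usage:
#   paths_for_key "$(git annex lookupkey Picture/BestPics2020/a.jpg)"
```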

You can just `cp`/`mv` the files into the annex and `git annex add` them. Note that for duplicate files in the annex, only one copy of the data/file content will be stored.
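
Deduplication follows from keys being derived from file content. The bash sketch that follows previews which files would collapse into one stored copy by grouping them on their SHA-256 hash. It assumes GNU `sha256sum`, and `find_duplicates` is a hypothetical helper. With the default SHA256E backend, the key also includes the file extension. Identical files with different extensions would therefore still be stored separately, and this sketch ignores that.

```shell
# Group files under DIR by the SHA-256 of their content and print each group
# of identical files, with a blank line between groups; .git is skipped.
find_duplicates() {
  local dir=${1:-.}
  find "$dir" -path "$dir/.git" -prune -o -type f -exec sha256sum {} + |
    LC_ALL=C sort |
    awk '{
      hash = $1; file = substr($0, 67)   # 64 hex chars, 2 spaces, then the name
      if (hash == prev) {
        if (!ingroup) { if (groups++) print ""; print prevfile; ingroup = 1 }
        print file
      } else {
        ingroup = 0
      }
      prev = hash; prevfile = file
    }'
}
```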
"""]]
@ -1,9 +1,9 @@
So... I've been flirting with using git annex for literal years now, and if for some reason you want to use it too, here are some tips:

1) keep backups. seriously. just do it. it's possible to lose data: even though git annex is designed to avoid eating your data, it will do so under certain circumstances. you aren't lucky enough to avoid it. trust me.
2) make a big fat git annex with too many files in it, and kick the tires, hard. run all the commands and try to break it; see what it does under certain circumstances before you run those same commands on your beloved data. (the documentation isn't always up to date; sometimes the options (which are complex) operate differently than the website says and differently than you expect, most likely due to code changes that haven't propagated to the website.)
3) git annex bogs down fast when you are dealing with a large number of objects. there are ways to get that under control, but nothing is going to make managing an annex with millions of files "fast" for many operations.
4) now that you are a pro at git annex, STILL *keep* backups. git annex isn't a backup. it just isn't. nothing beats a simple usb hard drive stuffed in your safe with all your files on it, without the complexity of git annex in the way.
* keep backups. seriously. just do it. it's possible to lose data: even though git annex is designed to avoid eating your data, it will do so under certain circumstances. you aren't lucky enough to avoid it. trust me.
* make a big fat git annex with too many files in it, and kick the tires, hard. run all the commands and try to break it; see what it does under certain circumstances before you run those same commands on your beloved data. (the documentation isn't always up to date; sometimes the options (which are complex) operate differently than the website says and differently than you expect, most likely due to code changes that haven't propagated to the website.)
* git annex bogs down fast when you are dealing with a large number of objects. there are ways to get that under control, but nothing is going to make managing an annex with millions of files "fast" for many operations.
* now that you are a pro at git annex, STILL *keep* backups. git annex isn't a backup. it just isn't. nothing beats a simple usb hard drive stuffed in your safe with all your files on it, without the complexity of git annex in the way.
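
For the "kick the tires" tip, a throwaway tree of many small files is easy to generate. `make_test_tree` is a hypothetical bash helper, and its layout of 100 subdirectories is an arbitrary choice. The git-annex commands in the usage comment are the usual setup steps. Run them only in a scratch location.

```shell
# Generate COUNT small text files spread over 100 subdirectories of DIR,
# as raw material for a throwaway test annex.
make_test_tree() {
  local dir=$1 count=${2:-10000} i
  for ((i = 0; i < 100; i++)); do mkdir -p "$dir/d$i"; done
  for ((i = 0; i < count; i++)); do
    echo "file $i" > "$dir/d$((i % 100))/f$i.txt"
  done
}

# Usage (scratch location only, never on real data):
#   make_test_tree /tmp/annex-torture 100000
#   cd /tmp/annex-torture && git init && git annex init && git annex add .
```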
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 10"
date="2020-12-31T22:45:03Z"
content="""
You will be unsurprised to hear that what you suggested worked. I'm not sure what helped, other than cleaning up my working tree and doing a solid `git annex add .; git annex sync`. I also removed `annex.thin`, since it's evidently not helping me.
thanks a ton. what got me here was basically running through the \"splitting a repo\" process: making a new git repo, doing a `cp -rl ./.git/annex/objects` into the new repo and then running various tests on it. I just want to make sure I don't step on my own feet here.

thanks a ton.
"""]]
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 11"
date="2021-01-01T04:08:43Z"
content="""
a couple of final notes:

* `--reflog=always` isn't a cp option, it's `--reflink`, and I am a moron.
* that same option on btrfs is the bomb: all of the advantages of hardlinks without the disadvantages.
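
You can check the difference between the `cp -rl` from comment 10 and a reflink copy directly. The bash sketch below assumes GNU coreutils (`cp --reflink=auto`, `stat -c`). Hard links share one inode and raise its link count. With `--reflink=auto`, the copy gets its own inode and shares data extents copy-on-write where the filesystem supports it (btrfs, for example). Elsewhere it silently falls back to an ordinary copy. `compare_copies` is a hypothetical helper.

```shell
# Copy a one-file "object store" twice and compare the results:
#   cp -rl                hard links: same inode, link count goes up
#   cp -r --reflink=auto  separate inode (data may be shared copy-on-write)
compare_copies() (
  src=$(mktemp -d)
  echo data > "$src/obj"
  cp -rl "$src" "$src.hard"
  cp -r --reflink=auto "$src" "$src.ref"
  echo "hardlink copy shares inode: $([ "$src/obj" -ef "$src.hard/obj" ] && echo yes || echo no)"
  echo "reflink copy shares inode:  $([ "$src/obj" -ef "$src.ref/obj" ] && echo yes || echo no)"
  echo "link count of original: $(stat -c %h "$src/obj")"
  rm -rf "$src" "$src.hard" "$src.ref"
)
```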
"""]]
@ -0,0 +1,14 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 2"
date="2020-12-31T18:55:29Z"
content="""
the most recent example I've run across is the use of
`annex.thin`
in the link here: https://git-annex.branchable.com/tips/unlocked_files/
it didn't result in the content being hardlinked for either `git annex unlock` or `git annex unannex`;
instead I ended up getting the same functionality by using `--fast`.
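
One way to check whether `annex.thin` actually produced a hard link is to compare the worktree file with the object file on disk. The shell test `-ef` is true when two paths refer to the same device and inode. `same_file_on_disk` is a hypothetical bash helper. Locating the object with `git annex contentlocation` and `git annex lookupkey`, as in its comment, is an assumption about how you would find the path.

```shell
# Report whether a worktree file is the same on-disk file as an annex object.
# Locating the object (assumption, not verified here):
#   obj=$(git annex contentlocation "$(git annex lookupkey <file>)")
same_file_on_disk() {
  local file=$1 obj=$2
  if [ -L "$file" ]; then
    # -ef follows symlinks, so a locked file would wrongly look "the same".
    echo "$file is a symlink (locked), not an unlocked copy"
    return 2
  elif [ "$file" -ef "$obj" ]; then
    echo "$file is hardlinked to $obj"
  else
    echo "$file is a separate copy (different inode)"
    return 1
  fi
}
```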

"""]]
@ -0,0 +1,9 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 3"
date="2020-12-31T19:34:56Z"
content="""
right now I am driving myself crazy trying to understand why I have objects that *nothing is pointing to*, yet `git annex unused` fails to report them. these objects report 1 hardlink and they are from a migrated backend. I'll try `git annex forget`, but I really don't understand what is keeping these objects from being reported as unused.
"""]]
@ -0,0 +1,9 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 5"
date="2020-12-31T20:34:55Z"
content="""
I am digging into this further, and it looks like git annex uses `cp --reflog=auto` (confirmed with `filefrag -v`), but even if the object from the old backend isn't taking up space, it's still frustrating that I can't figure out why git annex is keeping old files around and not reporting them via `git annex unused`.
"""]]
@ -0,0 +1,15 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 5"
date="2020-12-31T19:40:22Z"
content="""
https://git-annex.branchable.com/bugs/migrated_files_not_showing_up_in_unused_list/

according to the link above, it should be hardlinked to the new key for the new backend, but this isn't the case. this is on btrfs, btw.
this is a test repo with no remotes, as another data point.
also, I migrated from SHA256E to SHA256.

I tried `git annex forget --force; git annex sync; git annex unused`, but it still isn't showing the objects as unused.
"""]]
@ -0,0 +1,21 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 6"
date="2020-12-31T20:51:12Z"
content="""
after (re)reading the following:

https://git-annex.branchable.com/forum/switching_backends/

https://git-annex.branchable.com/bugs/migrated_files_not_showing_up_in_unused_list/

I confirmed again that `git annex sync` was re-run. There are no remotes, so that isn't a factor here. I checked out each git branch and did a

```find ./???/ -lname '*c0ade___this_is_a_long_hash___566fd3*'```

and nothing in any branch points to this old backend key.

so I am both stymied and befuddled... any tips are appreciated.
"""]]
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 7"
date="2020-12-31T21:46:33Z"
content="""
Hmm, you seem to have mixed a lot of things up here: <br>
1. You are not supposed to use `git annex unannex` to unlock a file. Just pretend this command doesn't exist for now and use `git annex unlock` instead. In general, look at the manpages of the commands, for example `man git-annex-unannex`. <br>
2. Before doing anything further, clean up your repository after the mistake above. First, add all unannexed files back to the annex with `git annex add .` (from the root of your repo), and then commit everything with `git annex sync`. `git status` should now output `nothing to commit, working tree clean`. <br>
3. After setting `git config annex.thin true`, you are supposed to run `git annex fix`. That's exactly what the link you gave says. But as you are using btrfs, I suggest not using hard links, as git annex already makes use of reflinks. <br>
4. Now that you have a clean worktree, try `git annex unused` again. If it still doesn't work, post the full output of `git annex unused` here.
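
The check in step 2 can be scripted. `git status --porcelain` prints nothing when the plain `git status` would report a clean working tree, and untracked files count as changes in both. `worktree_clean` is a hypothetical bash helper.

```shell
# Succeed only if the working tree of repo DIR (default ".") is clean.
# Returns 2 if git status itself fails, e.g. DIR is not a repository.
worktree_clean() {
  local out
  out=$(git -C "${1:-.}" status --porcelain) || return 2
  [ -z "$out" ]
}

# Usage after step 2:
#   git annex add . && git annex sync && worktree_clean && git annex unused
```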
"""]]
@ -0,0 +1,19 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 8"
date="2020-12-31T22:00:07Z"
content="""
thanks for responding...

I used `git annex unannex` because I tried using `git annex uninit` and it DELETED my entire multi-TB ./.git/annex/objects, even though I only had a handful of symlinks in that repo. I wanted to find another way to unannex files that wouldn't delete my technically \"unused\" data.

and `git annex unannex` was what I tried when `git annex unlock` would not hardlink the files via annex.thin=true. it was only by toying with the 2 commands and finally `--fast` that I was able to get it to hardlink the files.

my end goal was to be able to reliably remove my data from git annex entirely without it purging the object store.

and now, as I read about hardlinks=true or whatever, I see that git annex doesn't really like to hardlink multiple files past 2, because then multiple independent files being modified would corrupt the object store.

I just want this thing to be reliable at scale. I put all my data into it, but the speed is killing me, so I want to be able to get it out or split off data types into secondary git annexes, while having some idea of what it's doing under the covers so I don't get surprised.
"""]]
@ -0,0 +1,11 @@
[[!comment format=mdwn
username="eric.w@eee65cd362d995ced72640c7cfae388ae93a4234"
nickname="eric.w"
avatar="http://cdn.libravatar.org/avatar/8d9808c12db3a3f93ff7f9e74c0870fc"
subject="comment 9"
date="2020-12-31T22:05:19Z"
content="""
I'll chew on the rest of your response. I was bent on hardlinks because I haven't messed with the btrfs reflog COW thing much, but it's clearly the way to go here, so all of my consternation over hardlinks is likely getting me nowhere. I am just always at 90% full, so I don't want to do anything that is going to run me out of space in the middle of an expensive operation.

anyways, thanks. I guess I just wish I had only put big files into my annex at first, though I would never have known how badly it fails at scale (on my hardware, etc.)
"""]]
@ -0,0 +1,11 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="annex.largefiles"
date="2021-01-01T21:25:43Z"
content="""
Does annex.largefiles have any documentation? It would be nice to link to it from the git-annex-add documentation.

Especially after reading this, I wonder about the default value of annex.largefiles. (I assume/hope it is disabled?)

"""]]
@ -0,0 +1,11 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="Adding external files"
date="2021-01-01T21:30:38Z"
content="""
Let's assume I have some external files in my `~/Pictures` and I want to import them.

Should I use `git annex import ~/Pictures/BestPics2020` or `cp -r ~/Pictures/BestPics2020 .; git annex add BestPics2020`? Is there a difference? Which way would be recommended or preferred?

"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 3"
date="2021-01-02T15:12:49Z"
content="""
The various configuration options are documented at the bottom of the main [[git-annex]] manpage.

If it is a one-shot, just use `cp`/`mv` and `git annex add`. If you want to import from that location frequently, use a directory special remote with `importtree=yes`.
"""]]
@ -0,0 +1,9 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="comment 4"
date="2021-01-02T16:23:05Z"
content="""
But is there a difference from `git annex import`? What is the difference? Why would you use `git annex add` instead of `git annex import`?

"""]]
@ -0,0 +1,13 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="rename or move files"
date="2021-01-01T21:33:41Z"
content="""
Is this command also for renaming or moving files, like `git mv`?

If not, I think this should be explained more clearly in the documentation.

If not, how would I move/rename files then? As I understand it, annexed files are just symlinks, so if I moved the file to another directory (e.g. via `git mv` or just `mv`), the symlink might break.

"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 5"
date="2021-01-02T15:22:18Z"
content="""
`git annex move` belongs to the same class of commands as `git annex get`, `git annex drop` and `git annex copy`, in that it manages file content (between repositories), not file names. git-annex automatically registers a pre-commit hook to fix up symlinks. `git annex fix` can also be used to fix up symlinks, but it is currently broken.
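
For the renaming question, here is a sketch of what fixing a moved link involves. The target keeps its `.git/annex/objects/...` tail, and only the number of leading `../` has to match how deep the link now sits below the repository root. `relink_moved` is a hypothetical bash helper, not a git-annex command. It assumes a locked file in a regular (non-submodule) repository and is run from the repository root. The pre-commit hook or `git annex fix` remain the intended tools, and the new path still has to be staged with git.

```shell
# Re-point a moved annex symlink at its object, relative to the link's new
# location. Run from the repository root; the argument is relative to it.
relink_moved() {
  local path=${1#./} target tail prefix="" depth i
  target=$(readlink "$path") || return 1
  case $target in
    *.git/annex/objects/*) ;;
    *) echo "not an annex symlink: $path" >&2; return 1 ;;
  esac
  # Keep everything from .git/annex/objects/ on; drop the old ../ prefix.
  tail=.git/annex/objects/${target#*.git/annex/objects/}
  # One "../" per directory component between the root and the link.
  depth=$(tr -cd / <<<"$path" | wc -c)
  for ((i = 0; i < depth; i++)); do prefix+=../; done
  ln -sfn "$prefix$tail" "$path"
}

# Usage:
#   git mv dir/a a && relink_moved a && git add a
```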
"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="Difference to import/add?"
date="2021-01-01T21:49:29Z"
content="""
Considering `git annex reinject /tmp/foo.iso foo.iso`, what is the difference from `git annex import /tmp/foo.iso` or `cp /tmp/foo.iso .; git annex add foo.iso`?
"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="comment 3"
date="2021-01-02T15:16:04Z"
content="""
The difference between `git annex reinject` and (`git annex import` or `cp/mv; git annex add`) is that only known file contents will be reinjected.
"""]]