Merge branch 'master' of ssh://git-annex.branchable.com
This commit is contained in:
commit
f354697648
3 changed files with 52 additions and 0 deletions
16
doc/forum/Backup_of_whole_Linux_system.mdwn
Normal file
@ -0,0 +1,16 @
If I wanted to back up my whole Linux system, here is what is unclear to me, or maybe missing, in Git Annex:

I'm not exactly sure about the best way to import the files. Should I just copy over all the files (e.g. using `cp -ax /* .`, or maybe `rsync -a /* .` or so) to the repo, and then use `git annex add`? (Let's skip `/dev` and maybe other special files for now.)

Let's say I have now added all files to the annex.

I would also want to store the owning user, group, and access attributes, and maybe other extended attributes (ACL, xattr).
This is not yet covered by Git Annex (by default), right?
This could be stored as annex metadata. Or maybe better in some other way, because this would be per file path, and not per file content.
Has anyone already done something like this? It should not be too hard to do, right?
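
To make the per-path idea concrete, here is a minimal sketch of one way such attributes could be recorded next to the repository. It is not a Git Annex feature: the function names (`collect_path_metadata`, `manifest_to_json`) and the JSON layout are assumptions made up for illustration. It would have to run on the copied tree before `git annex add`, since afterwards the paths may be symlinks into the annex and `lstat` would describe the symlink rather than the original file. On Linux, POSIX ACLs are usually exposed as `system.posix_acl_*` extended attributes, so the xattr part may cover them as well.

```python
import json
import os
import stat


def collect_path_metadata(root):
    # Walk `root` and return {relative path: attributes}, never following symlinks.
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the repository's own .git directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat describes a symlink itself, not its target
            entry = {
                "uid": st.st_uid,
                "gid": st.st_gid,
                "mode": oct(stat.S_IMODE(st.st_mode)),
            }
            if stat.S_ISLNK(st.st_mode):
                entry["symlink_target"] = os.readlink(full)
            # Extended attributes are only available on Linux in the stdlib.
            if hasattr(os, "listxattr"):
                try:
                    names = os.listxattr(full, follow_symlinks=False)
                    entry["xattrs"] = {
                        n: os.getxattr(full, n, follow_symlinks=False).hex()
                        for n in names
                    }
                except OSError:
                    pass  # filesystem without xattr support, or no permission
            manifest[os.path.relpath(full, root)] = entry
    return manifest


def manifest_to_json(manifest):
    # Stable, diff-friendly text that could be committed to Git as a normal file.
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Because the manifest is keyed by path and not by content, two paths with identical content can still carry different owners or modes, which is the reason annex metadata (which is per content) may be a poor fit here.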

I'm also not exactly sure how Git Annex handles symlinks. Would it store the original symlink? Or would it not handle them at all, and just add them to Git itself?

There will be some overlap of the files with other Git Annex repos (e.g. this could contain a subset of pictures I have elsewhere).
I would want the annexed data files to be shared with my much bigger Annex repo, which contains all my main data (pictures and lots of other stuff).
This is actually the main reason why I am considering Git Annex for this purpose as well, rather than some other solution: I would not need to store data separately, and I would get other benefits (to simplify my backups).
@ -0,0 +1,21 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="comment 6"
date="2021-01-06T14:59:00Z"
content="""
Thanks for the answer.

Maybe the forum post title here was chosen badly. It's not just about how to import existing files; mainly, I was trying to figure out whether Git Annex fits my needs (for a quite big archive of data). That's why I had all these questions, and also because this was not exactly clear to me after reading the docs.

What's still not exactly clear to me is whether it would be a better idea to keep the Annex repo separate from the checked-out files. I don't like all the symlinks too much, and a couple of applications behave strangely (because they follow the symlinks). I would prefer a solution where the (maybe bare) repo is separate from the checked-out tree.

That is why I asked about Git Worktree. But this is still not clear to me.

I also read about [Git Annex Direct mode](https://git-annex.branchable.com/git-annex-direct/), which sounds like it is exactly that? But apparently this is not supported anymore? Why?

I also read about the [Git Annex Assistant](https://git-annex.branchable.com/assistant/), which also sounds like this? But the docs are somewhat sparse, and it's not totally clear how this is done, or why the main Git Annex cannot do that while the Git Annex Assistant can. Discussions like [this](https://git-annex.branchable.com/design/assistant/desymlink/) sound very relevant (that one describes many of the issues I have with symlinks). However, I would not specifically want to do it all automatically (I think that's the purpose of the assistant) but explicitly (like adding files to the annex, i.e. using commands such as `git annex add`).

I think this should be possible without having to watch live for changes (via inotify or so), where it would be easy to miss changes anyway. E.g. `git status` seems to be very fast at such checks. I'm not exactly sure how it does it, but I assume it does some fast checks for changed mtime or maybe other things. Some filesystems might also provide other means. E.g. if the file was copied with a reflink (`cp --reflink`), which makes sense anyway to avoid storing the data twice and is much more efficient, it could check whether the reflink has changed. Otherwise, it could use hardlinks and lock the files (read-only); unlocking them would make them writable (it's OK if unlocked files are less efficient to handle, as this would be a rare action).
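
The kind of fast check meant here could look roughly like the sketch below. Git keeps cached stat data (mtime, size, inode and similar fields) for each tracked file in its index and uses it to spot candidates for change without reading file contents; the code only imitates that idea and is not Git's or Git Annex's actual implementation. The function names are made up for illustration.

```python
import os


def stat_signature(path):
    # Cheap fingerprint taken from lstat only; the file content is never read.
    st = os.lstat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino, st.st_mode)


def take_snapshot(paths):
    # Remember the current fingerprint of every path.
    return {p: stat_signature(p) for p in paths}


def changed_paths(snapshot):
    # Return the paths whose fingerprint differs from the snapshot, or that vanished.
    changed = []
    for path, old in snapshot.items():
        try:
            new = stat_signature(path)
        except FileNotFoundError:
            changed.append(path)
            continue
        if new != old:
            changed.append(path)
    return sorted(changed)
```

A check like this can miss an edit that keeps the size and happens within the timestamp granularity of the filesystem, so a real tool would probably still fall back to hashing in doubtful cases.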

"""]]
@ -0,0 +1,15 @@
[[!comment format=mdwn
username="AlbertZeyer"
avatar="http://cdn.libravatar.org/avatar/b37d71961a6a5abf9b7184ed77b5a941"
subject="comment 2"
date="2021-01-06T11:15:42Z"
content="""
Thanks for the answer.

How does `git status` check for changes? I feel it is quite fast at that.

So you could update the persistent database from a post-commit hook, and have a temporary virtual overlay at query time which also takes the currently staged changes into account. And maybe you could also add a `--fast` option which would skip this part, because the user probably knows when to expect staged changes.

I think this would be pretty useful. It would also somewhat change the whole way I would use the Annex. I expect to have this case quite often, where some file content is referenced from multiple file paths.
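
A minimal sketch of this proposal, assuming the database maps an annex key (the file content) to the set of file paths that reference it. The class `PathIndex`, its methods, and the `fast` flag are hypothetical names for illustration; nothing here is an existing Git Annex interface, and a real version would persist `committed` on disk.

```python
class PathIndex:
    # Hypothetical reverse index: annex key -> file paths that use that content.

    def __init__(self):
        self.committed = {}  # key -> set of paths, as of the last commit

    def update_from_commit(self, added, removed):
        # Intended to be called from a post-commit hook with (key, path) pairs.
        for key, path in removed:
            self.committed.get(key, set()).discard(path)
        for key, path in added:
            self.committed.setdefault(key, set()).add(path)

    def paths_for(self, key, staged_added=(), staged_removed=(), fast=False):
        # Start from the persistent state recorded at commit time.
        paths = set(self.committed.get(key, ()))
        if fast:
            # Like the proposed --fast option: ignore staged changes entirely.
            return sorted(paths)
        # Temporary overlay: apply staged changes without modifying the database.
        paths -= {p for k, p in staged_removed if k == key}
        paths |= {p for k, p in staged_added if k == key}
        return sorted(paths)
```

For example, after a commit that adds the same key under `a.jpg` and `pics/a.jpg`, `paths_for` returns both paths, and staged renames only show up when `fast` is left off.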

"""]]