Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2019-11-18 16:27:17 -04:00
commit 49cf86cfc0
GPG key ID: DB12DB0FF05F8F38
5 changed files with 35 additions and 40 deletions

@ -0,0 +1,13 @@
Hi,
Thank you very much for this software. I work in a research institute and we are very interested in using git-annex with DataLad to manage our datasets.
We aim to provide a dataset repository accessible through the local network on a single file system. Some of our datasets are multiple TB in size with a few million files. The repository will be managed by a few people, but the primary users, the researchers, will only have read access. We would like to use hardlinks everywhere, both to avoid the infrequent read errors we see with symlinks and to save space when we offer different versions of a dataset with slight changes. The file system will be backed up, so we don't really need multiple copies of the same files on a single file system.
We seem to be able to achieve this using the `direct` mode in git-annex version 5, but it seems that the `unlock` mode in version 7 makes copies instead of hardlinks. I'm wondering how we could get the same behaviour as in version 5. I believe I read in the docs that there is a maximum of 2 hardlinks for a single file, but I can't remember where, nor tell whether that is still the case. If it is, I couldn't find a setting to change or remove this maximum.
We've tested with git-annex local version 5 / build 7.20190819, local version 7 / build 7.20190819 and local version 7 / build 7.20191106. [Here is a gist](https://gist.github.com/satyaog/b08a6e5d1eee75217ba823d38b84fb8b) containing test scripts for each setup. The `.annex-cache` part can be ignored for this topic. I used Miniconda3-4.3.30 on Ubuntu 18.04.2 LTS to set up the environments.
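For anyone who wants to check whether a given setup produced hardlinks or copies, here is a minimal Python sketch. It is not part of the gist, and the helper names `share_storage` and `link_count` are made up for illustration. It assumes a POSIX file system and only compares inode numbers and link counts, so it says nothing about how git-annex itself decides between linking and copying.

```python
import os
import tempfile


def share_storage(path_a, path_b):
    """Return True when both paths resolve to the same inode on the same device.

    os.stat follows symlinks, so a symlink and its target also count as shared.
    """
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


def link_count(path):
    """Return the number of hardlinks pointing at the file behind path."""
    return os.stat(path).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.dat")
        linked = os.path.join(tmp, "linked.dat")
        copied = os.path.join(tmp, "copied.dat")

        with open(original, "wb") as handle:
            handle.write(b"dataset content")

        # A hardlink shares the inode with the original file.
        os.link(original, linked)

        # A copy has the same bytes but its own inode.
        with open(original, "rb") as src, open(copied, "wb") as dst:
            dst.write(src.read())

        print("linked shares storage:", share_storage(original, linked))
        print("copied shares storage:", share_storage(original, copied))
        print("link count of original:", link_count(original))
```

Running the same two helpers on a work tree file and the matching object under `.git/annex/objects` should show whether the two are linked or duplicated.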
Thank you,
Satya

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="using hardlinks"
date="2019-11-18T19:21:18Z"
content="""
I don't have a full answer, but [[tips/local_caching_of_annexed_files]] might have relevant info.
There is also the `annex.thin` setting; but check some [caveats](https://git-annex.branchable.com/bugs/annex.thin_can_cause_corrupt___40__not_just_missing__41___data/) related to it.
"""]]

@ -1,40 +0,0 @@
[[!inline raw=yes pages="summary"]]
[[!sidebar content="""
[[!inline feeds=no template=bare pages=sidebar]]
"""]]
<table>
<tr>
<td width="33%" valign="top">[[!inline feeds=no template=bare pages=links/key_concepts]]</td>
<td width="33%" valign="top">[[!inline feeds=no template=bare pages=links/the_details]]</td>
<td width="33%" valign="top">[[!inline feeds=no template=bare pages=links/other_stuff]]</td>
</tr>
</table>
<table>
<tr>
<td width="50%" valign="top">[[!inline feeds=no template=bare pages=use_case/bob]]</td>
<td width="50%" valign="top">[[!inline feeds=no template=bare pages=use_case/alice]]</td>
</tr>
</table>
If that describes you, or if you're some from column A and some from column
B, then git-annex may be the tool you've been looking for to expand from
keeping all your small important files in git, to managing your large
files with git.
<table>
<tr>
<td width="50%" valign="top">[[!inline feeds=no template=bare pages=footer/column_a]]</td>
<td width="50%" valign="top">[[!inline feeds=no template=bare pages=footer/column_b]]</td>
</tr>
</table>
----
git-annex is [[Free Software|license]], written in [Haskell](http://www.haskell.org/).
You can [[contribute]]!
git-annex's wiki is powered by [Ikiwiki](http://ikiwiki.info/) and
hosted by [Branchable](http://branchable.com/).

@ -0,0 +1,5 @@
[[git-annex-import]] by default deletes the original files. Keeping them by default would be better. "import" in many other tools (e.g. the bioinformatics tool [Geneious](https://www.geneious.com/)) means a non-destructive import. The short description of `git-annex-import` on its man page says it "adds" files to the repo, which does not suggest erasure. When I first used `git-annex-import`, I was surprised by the default behavior, and others may be too. Also, the command has now been "overloaded" for importing from a special remote, and in that mode the originals are not erased; giving the import-from-dir mode the same default would be more consistent. In general, erasing data by default seems dangerous: what if it was being imported into a temporary or untrusted repo?
Changing the default would also let one [[repeatedly re-import a directory while keeping original files in place|bugs/impossible__40____63____41___to_continuously_re-import_a_directory_while_keeping_original_files_in_place]].
I realize this would be a breaking change for some workflows; warning of it [[like git does|todo/warn_of_breaking_changes_same_way_git_does]] would mitigate the breakage.

@ -0,0 +1,7 @@
I just wanted a way to manage copying and syncing data between a fileserver and my Android phone. So I pushed some files on my fileserver into a git remote, added them with the annex subcommands, and then cloned the git tree on my workstation, which is connected to my smartphone.
Then I followed the documentation for the adb special remote and created that remote with the `initremote` command. When I then export, I get `(not available) failed` errors.
This happens because I hadn't checked out the files on my workstation. I don't need the files on this PC, so it would be pointless to check them out there, especially since some of them are huge. I don't understand why the export command does not have a `--from` option telling it where to get the files.
Is there a reason that option does not exist, and if so, how could I send files to the Android device without ssh-ing into my server?