Merge branch 'master' into export

Joey Hess 2017-09-06 15:49:30 -04:00
commit 35cd329bd8
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
9 changed files with 85 additions and 2 deletions

@ -0,0 +1,11 @@
Good progress on `git annex export` today. Changing the exported tree now
works and is done efficiently. Resuming an export is working. Even
detecting and resolving export conflicts should work (have not tested it).
The necessary information about the export is recorded in the git-annex
branch, including grafting in the exported tree there.
There are some known problems when the tree that is exported contains
multiple files with the same content. And git-annex is not yet able
to download exported files from a special remote. Handling both of those
needs a way to get from keys to exported filenames. So, I plan to
populate a sqlite database with that information next.
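
As a rough illustration of the key-to-filename mapping described above, here is a minimal Python sketch. It is not git-annex's actual schema or code: the table name `exported`, the column names, and the helper functions are all hypothetical. It only shows why such a table handles several exported files sharing one key.

```python
import sqlite3


def open_export_db(path=":memory:"):
    """Open (or create) a hypothetical export-tracking database."""
    conn = sqlite3.connect(path)
    # One row per (key, filename) pair, so a single key may map to
    # several exported filenames when files have identical content.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS exported ("
        " key TEXT NOT NULL,"
        " filename TEXT NOT NULL,"
        " PRIMARY KEY (key, filename))"
    )
    return conn


def record_export(conn, key, filename):
    """Remember that the content of `key` was exported as `filename`."""
    conn.execute(
        "INSERT OR IGNORE INTO exported (key, filename) VALUES (?, ?)",
        (key, filename),
    )
    conn.commit()


def filenames_for_key(conn, key):
    """Return every exported filename holding the content of `key`."""
    rows = conn.execute(
        "SELECT filename FROM exported WHERE key = ? ORDER BY filename",
        (key,),
    )
    return [row[0] for row in rows]
```

With a lookup like `filenames_for_key`, a download only needs to pick any one of the filenames recorded for the wanted key.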

@ -0,0 +1,27 @@
More work on `git annex export`. Made `initremote exporttree=yes` be
required to enable exporting to a special remote. Added a sqlite database
to keep track of what files have been exported. That let me fix the known
problems with exporting multiple files that have the same content.
The same database lets `git annex get` (etc) download content from exports.
Since an export is not a key/value store, git-annex has to do more
verification of content downloaded from an export. Some types of keys,
that are not based on checksums (eg WORM and URL),
cannot be downloaded from an export. And, git-annex will never trust
an export to retain the content of a key, since some other tree could
be exported over it at any time.
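
To make the verification point concrete, here is a hedged Python sketch, not git-annex's implementation. It assumes a simplified key layout of `BACKEND-sSIZE--HEXDIGEST` and a small hypothetical table of checksum backends. Keys whose backend is not in the table (such as WORM or URL) cannot be verified, so the sketch refuses them.

```python
import hashlib

# Hypothetical mapping from checksum-based backend names to hash functions.
CHECKSUM_BACKENDS = {
    "SHA256": hashlib.sha256,
    "SHA1": hashlib.sha1,
    "MD5": hashlib.md5,
}


def parse_key(key):
    """Split a simplified key into (backend, name part after '--')."""
    fields, _, name = key.partition("--")
    backend = fields.split("-")[0]
    return backend, name


def can_download_from_export(key):
    """Only keys whose content can be checked by checksum are accepted."""
    backend, _ = parse_key(key)
    return backend in CHECKSUM_BACKENDS


def verify_download(key, data):
    """Check that downloaded bytes really are the content of `key`."""
    backend, expected = parse_key(key)
    hasher = CHECKSUM_BACKENDS.get(backend)
    if hasher is None:
        # Nothing to verify against, so never accept the download.
        return False
    return hasher(data).hexdigest() == expected
```

The check runs after every download because another tree may have been exported over the file since the database entry was written.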
With `git annex get` working from exports, it might be nice to also support
`git annex copy --to export` for exporting specific files to them. However,
that needs information that is not currently stored in the sqlite database
until the export has already completed. One way it could work is for `git
annex export --fast treeish --to export` to put all the filenames in the
database but not export anything, and then `git annex copy --to export` (or
even `git annex sync --content`) to send the contents. I don't know if this
complication is worth it.
Otherwise, the export feature is fairly close to being complete now.
Still need to make renames be handled efficiently, and add support for
exporting to more special remotes.
Today's work was supported by the NSF-funded DataLad project.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="EskildHustvedt"
avatar="http://cdn.libravatar.org/avatar/0be1310904ded29624b9edb4824d451b"
subject="Partial exports"
date="2017-09-05T09:16:26Z"
content="""
For what it's worth, partial exports (being able to copy only certain files to an export) would be very useful for me. My main use case is exporting from my desktop to my Android phone (which has an sshd in termux that I use). I've got some large repos where having it all on my phone isn't possible, but it would be very useful to use git-annex to upload partials (right now I'm just using plain-old-rsync for that).
"""]]

@ -0,0 +1,20 @@
Out of curiosity, is there an equivalent to `git cat-file` with `git annex`?
The motivation is our usage of Bazel as a build system, which enforces hermeticity during tests, and thus is very persnickety about modifying your workspace (e.g., the Git repository) while the test is being run, and usually isolates execution to a chroot'd sandbox of sorts.
Ideally, the workflows I'd like are:
A. Developer
- 1. Clones repository.
- 2. Inits `git annex`, and does `git annex get .` to fetch all required files.
- 3. Runs `bazel test //repo:my_test`, which will symlink the existing large file into the sandbox, and run without a hitch.
B. Tentative Contributor
- 1. Clones repository. Pokes around.
- 2. Runs `bazel test //repo:my_test`. Since the large file does not exist, under the hood `git annex cat-file` is called to directly add the file to the sandbox (possibly caching it somewhere, such that `git annex get` will use the already fetched file).
May I ask if this is doable with simple visible commands?
If not, is there a way to achieve this that is special remote-agnostic?

@ -0,0 +1,3 @@
Having annexed a full repo by mistake, I am looking for a way to unannex some files
(mostly source code files) so that they are managed by the "standard" git process again.
Is there a way to do that?

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="vgp"
avatar="http://cdn.libravatar.org/avatar/b332bfc1d3f49c196e1bff84b53d0f8b"
subject="comment 3"
date="2017-09-01T21:40:11Z"
content="""
I've tried the \"directory\" special remote with encryption=shared. It works well, and I got a total size of 3.5GB while the working tree's .git/annex dir has 21GB :-). The problem is: the git server of my research lab gives me a disk quota of 10GB; however, I cannot access it directly to store these files using the \"directory\" special remote. Is there a way to use compression (probably through encryption) with a normal git remote?
"""]]

@ -37,6 +37,11 @@ the public repositories that you can clone to try out git-annex.
A slightly outdated mirror of http://ifarchive.org. Scripts should probably be written
to update the archive regularly.
* [datasets.datalad.org](http://datasets.datalad.org)
A large (over 10TB of data) collection of DataLad (git-annex) datasets, providing access primarily
to public neural data resources. Organized via the git submodule mechanism. Although the underlying
repositories are pure git/git-annex repositories, use of the datalad tool is advised for more functionality
(search, recursive operation, etc). It is regularly updated and enriched.
This is a wiki -- add your own public repository to the list!
See [[tips/centralized_git_repository_tutorial]].

@ -74,3 +74,4 @@ Lukas Platz,
Sergey Karpukhin,
Silvio Ankermann,
Paul Tötterman,
Erik Bjäreholt,

@ -4,8 +4,8 @@
subject="sounds like the dumb backend, except not dumb"
date="2017-04-08T20:21:41Z"
content="""
This sounds a lot like what i was trying to do in [[todo/dumb, unsafe,
human-readable_backend]], except done properly. :)
This sounds a lot like what i was trying to do in
[[todo/dumb, unsafe, human-readable_backend]], except done properly. :)
I was wondering about that asymmetry recently, and it would seem like
a good idea to fix this. the `--to remote` flag could especially be