Merge branch 'master' into import-from-s3

Joey Hess 2019-05-01 14:30:52 -04:00
commit 700a3f2787
GPG key ID: DB12DB0FF05F8F38
29 changed files with 426 additions and 43 deletions


@@ -1,3 +1,17 @@
If git-tracked files are removed from the remote, they don't get synced back over by a `git annex fsck` followed by a `git annex export`.
Is there some way that they could make it back to the remote? I'm imagining rsync-like behavior that copies over files whose timestamps or file sizes differ. Would such a feature be welcome in git-annex?
> Since git-annex 6.20180626, `git annex fsck --from` an exporttree=yes remote
> will notice if files on it have been deleted, and then
> `git annex sync --content` or `git-annex export` will re-upload them.
>
> But perhaps more interestingly, if the remote is also configured with
> importtree=yes, `git-annex import` from it can now notice deletions
> as well as other changes to the content on the remote, and make a remote
> tracking branch in git reflecting the changes. You can then merge or
> revert the changes and export or sync can be used to put the deleted
> files back on the remote if desired.
>
> Only a subset of remotes support importtree, but the fsck method
> will work for all. So, this is [[done]]. --[[Joey]]
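
A rough sketch of the two workflows described above, assuming a remote named
`myexport` (the exact invocations may differ between git-annex versions):

    # fsck-based: notice files deleted from the export remote, then re-upload
    git annex fsck --from myexport
    git annex sync --content          # or: git annex export master --to myexport

    # importtree-based: pull the remote's changes into a tracking branch first
    git annex import master --from myexport
    git merge myexport/master         # or revert the unwanted changes instead
    git annex export master --to myexport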


@@ -0,0 +1,5 @@
ATM there is no `--json-progress` for `git annex add` (only `--json`), so no feedback on progress or ETA can be provided to the user. It would be nice to have `--json-progress` there, mirroring the one available for `get` and `copy`.
Cheers!
[[!meta author=yoh]]
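
For comparison, a minimal illustration of the current behaviour (`somefile` and
`myremote` are just placeholders): the transfer commands already accept
`--json-progress`, while `add` only offers `--json`.

    # transfers can already stream machine-readable progress:
    git annex get --json --json-progress somefile
    git annex copy --json --json-progress --to myremote somefile

    # add has no equivalent yet; only the final result is emitted as JSON:
    git annex add --json somefile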


@@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-04-26T14:17:58Z"
content="""
First things first: there is no progress display of any kind for adding an
individual item.
This would need changes to the Backend interface so it can report progress
while hashing.
"""]]


@@ -0,0 +1 @@
Would it be hard to add a variation of the checksumming [[backends]] that changes how the checksum is computed? Instead of computing it over the whole file, it would first be computed over file chunks of a given size, and then the final checksum computed over the concatenation of the chunk checksums. You'd add a new [[key field|internals/key_format]], say cNNNNN, specifying the chunk size (the last chunk might be shorter). Then:

1. For large files, checksum computation could be parallelized (there could be a config option specifying the default chunk size for newly added files).
2. I often have large files on a remote for which I have an md5 of each chunk, but not of the full file; this would let me register the location of these files with git-annex without downloading them, while still using a checksum-based key.
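
A quick illustration of the proposed computation using standard shell tools
(purely a sketch; the 1MiB chunk size is arbitrary, and whether the hex or raw
chunk digests get concatenated is one of the details the proposal leaves open):

    # split the file into fixed-size chunks (the last chunk may be shorter)
    split -b 1M bigfile chunk.
    # hash each chunk, then hash the concatenation of the hex chunk digests
    md5sum chunk.* | awk '{print $1}' | tr -d '\n' | md5sum

The resulting key could then carry the chunk size in the proposed cNNNNN field
(something like `MD5E-s<size>-c1048576--<finalsum>.ext`, though the exact key
syntax here is hypothetical).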