Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2018-09-22 11:29:29 -04:00
commit 8873cf0789
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 71 additions and 0 deletions

View file

@ -0,0 +1,5 @@
Current *E key-value backends support file extensions of length <=4. Files with longer extensions (such as .fasta files common in bioinformatics) get linked to extension-less files, potentially causing hard-to-predict problems. Simple fix is to add backends like MD5E5 which keeps extensions of length <=5 . Better fix would be to keep the entire filename:
file myfile.fasta becomes the symlink .git/annex/objects/xx/xx/key/myfile.fasta . If there's anotherfile.fasta with the same key but different filename, it becomes a symlink to
.git/annex/objects/xx/xx/key/anotherfile.fasta which is a hardlink to myfile.fasta . An added plus is that the symlinks checked into git typically becomes shorter. Or, for better backwards compatibility, the symlinks checked into git don't change, but
.git/annex/objects/xx/xx/key/key becomes a symlink to .git/annex/objects/xx/xx/key/myfile.fasta . However, if there is anotherfile.fasta with the same key, its symlink will still end up terminating at myfile.fasta rather than anotherfile.fasta .
It's useful to preserve full filenames, because it's not uncommon to e.g. encode parameter information in filenames (myresult.threshold100.dat); and it's not uncommon to call something like python's os.path.realpath to unwind symlink chains before processing a file.