git-annex/doc/todo/support_longer_file_extensions.mdwn

6 lines
1.3 KiB
Text
Raw Normal View History

Current *E key-value backends support file extensions of length <=4. Files with longer extensions (such as .fasta files common in bioinformatics) get linked to extension-less files, potentially causing hard-to-predict problems. Simple fix is to add backends like MD5E5 which keeps extensions of length <=5 . Better fix would be to keep the entire filename:
file myfile.fasta becomes the symlink .git/annex/objects/xx/xx/key/myfile.fasta . If there's anotherfile.fasta with the same key but different filename, it becomes a symlink to
.git/annex/objects/xx/xx/key/anotherfile.fasta which is a hardlink to myfile.fasta . An added plus is that the symlinks checked into git typically becomes shorter. Or, for better backwards compatibility, the symlinks checked into git don't change, but
.git/annex/objects/xx/xx/key/key becomes a symlink to .git/annex/objects/xx/xx/key/myfile.fasta . However, if there is anotherfile.fasta with the same key, its symlink will still end up terminating at myfile.fasta rather than anotherfile.fasta .
It's useful to preserve full filenames, because it's not uncommon to e.g. encode parameter information in filenames (myresult.threshold100.dat); and it's not uncommon to call something like python's os.path.realpath to unwind symlink chains before processing a file.