From 1cc0de995c23a7c6a76fc37187b24108cbbed169 Mon Sep 17 00:00:00 2001
From: Ilya_Shlyakhter
Date: Fri, 21 Sep 2018 00:36:08 +0000
Subject: [PATCH] added a todo suggestion about supporting longer file extensions and full filenames in symlink targets

---
 doc/todo/support_longer_file_extensions.mdwn | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 doc/todo/support_longer_file_extensions.mdwn

diff --git a/doc/todo/support_longer_file_extensions.mdwn b/doc/todo/support_longer_file_extensions.mdwn
new file mode 100644
index 0000000000..438d4da960
--- /dev/null
+++ b/doc/todo/support_longer_file_extensions.mdwn
@@ -0,0 +1,5 @@
+Current *E key-value backends support file extensions of length <=4. Files with longer extensions (such as .fasta files common in bioinformatics) get linked to extension-less files, potentially causing hard-to-predict problems. A simple fix is to add backends like MD5E5, which keep extensions of length <=5. A better fix would be to keep the entire filename: