Improve SHA*E extension extraction code

Do not treat parts of the filename that contain punctuation or other
non-alphanumeric characters as extensions. Before, such characters were
filtered out.

Note that in 45308ec78b "foo.ba__________r"
was munged to ".bar" and so incorrectly treated as an extension. That was
fixed by changing the filter order, but not allowing punctuation seems a
better fix.

This assumes that extensions containing punctuation are rare. "_" seems the
most likely character; I used it in ikiwiki "._comment" files. But I can't
recall seeing it anywhere else. It certianly seems that no commonly used
extensions contain punctuation. If git-annex doesn't treat "._comment"
as an extension, it's not likely to break software that expects to see that
extension like some software expects to see .epub or .mp3.

This commit was sponsored by Jack Hill on Patreon.
This commit is contained in:
Joey Hess 2018-03-05 11:25:01 -04:00
parent 6d6e3c6c49
commit 07e253b1fb
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
4 changed files with 34 additions and 1 deletions

View file

@ -94,7 +94,7 @@ selectExtension f
| otherwise = intercalate "." ("":es)
where
es = filter (not . null) $ reverse $
take 2 $ map (filter validInExtension) $
take 2 $ filter (all validInExtension) $
takeWhile shortenough $
reverse $ splitc '.' $ takeExtensions f
shortenough e = length e <= 4 -- long enough for "jpeg"