addurl: Always use whole url as destination filename, rather than only its file component.

First, this ensures that git annex addurl, when run repeatedly with the
same url, doesn't create duplicate files, which it did before when it
fell back to the longer filename.

Secondly, the file part of an url is frequently not very descriptive on its
own.

The uri scheme, auth, and port are intentionally left out, as clutter.
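
The naming rule described above can be sketched on its own. This is a minimal illustration, not the code from this commit: `urlToFilename` is a hypothetical name, it returns `Nothing` instead of calling `error` when the url has no authority part, and it escapes each `/` and `?` character individually with `map` rather than using the `replace` call from `Data.String.Utils`. It assumes only the `network-uri` package.

```haskell
import Network.URI (URI, parseURI, uriAuthority, uriPath, uriQuery, uriRegName)

-- Hypothetical helper (not part of git-annex): build a filename from the
-- host name, path, and query of a url. The scheme, user info, and port
-- are left out, as the commit message describes.
urlToFilename :: URI -> Maybe FilePath
urlToFilename url = do
    auth <- uriAuthority url
    return $ map escape $ uriRegName auth ++ uriPath url ++ uriQuery url
  where
    -- Replace each path or query separator with an underscore.
    escape c
        | c `elem` "/?" = '_'
        | otherwise = c
```

Under these assumptions, `urlToFilename =<< parseURI "http://example.com/video.mpeg"` evaluates to `Just "example.com_video.mpeg"`, which matches the filename shown in the updated documentation.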
Joey Hess 2011-09-07 19:04:51 -04:00
parent 7c768c0984
commit 03d6209e1c
3 changed files with 17 additions and 23 deletions

@@ -7,9 +7,10 @@
 
 module Command.AddUrl where
 
-import Control.Monad.State (liftIO, when)
+import Control.Monad.State
 import Network.URI
 import Data.String.Utils
+import Data.Maybe
 import System.Directory
 
 import Command
@@ -24,6 +25,7 @@ import Content
 import PresenceLog
 import Locations
 import Utility.Path
+import Utility.Conditional
 
 command :: [Command]
 command = [repoCommand "addurl" paramPath seek "add urls to annex"]
@@ -75,20 +77,10 @@ nodownload url file = do
 
 url2file :: URI -> IO FilePath
 url2file url = do
-	let parts = filter safe $ split "/" $ uriPath url
-	if null parts
-		then fallback
-		else do
-			let file = last parts
-			e <- doesFileExist file
-			if e then fallback else return file
-	where
-		fallback = do
-			let file = replace "/" "_" $ show url
-			e <- doesFileExist file
-			when e $ error "already have this url"
-			return file
-		safe "" = False
-		safe "." = False
-		safe ".." = False
-		safe _ = True
+	whenM (doesFileExist file) $
+		error $ "already have this url in " ++ file
+	return file
+	where
+		file = escape $ uriRegName auth ++ uriPath url ++ uriQuery url
+		escape = replace "/?" $ repeat '_'
+		auth = fromMaybe (error $ "bad url " ++ show url) $ uriAuthority url

debian/changelog

@@ -3,6 +3,8 @@ git-annex (3.20110907) UNRELEASED; urgency=low
   * whereis: Show untrusted locations separately and do not include in
     location count.
   * Fix build without S3.
+  * addurl: Always use whole url as destination filename, rather than
+    only its file component.
 
  -- Joey Hess <joeyh@debian.org>  Tue, 06 Sep 2011 16:59:15 -0400
 

@@ -1,20 +1,20 @@
 The web can be used as a [[special_remote|special_remotes]] too.
 
 	# git annex addurl http://example.com/video.mpeg
-	addurl video.mpeg (downloading http://example.com/video.mpeg)
+	addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
 	########################################################## 100.0%
 	ok
 
 Now the file is downloaded, and has been added to the annex like any other
-file. So it can be copied to other repositories, and so on.
+file. So it can be renamed, copied to other repositories, and so on.
 
 Note that git-annex assumes that, if the web site does not 404, the file is
 still present on the web, and this counts as one [[copy|copies]] of the
 file. So it will let you remove your last copy, trusting it can be
 downloaded again:
 
-	# git annex drop video.mpeg
-	drop video.mpeg (checking http://example.com/video.mpeg) ok
+	# git annex drop example.com_video.mpeg
+	drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok
 
 If you don't [[trust]] the web to this degree, just let git-annex know:
 
@@ -23,8 +23,8 @@ If you don't [[trust]] the web to this degree, just let git-annex know:
 
 With the result that it will hang onto files:
 
-	# git annex drop video.mpeg
-	drop video.mpeg (unsafe)
+	# git annex drop example.com_video.mpeg
+	drop example.com_video.mpeg (unsafe)
 	Could only verify the existence of 0 out of 1 necessary copies
 	Also these untrusted repositories may contain the file:
 	00000000-0000-0000-0000-000000000001 -- web