improved bwrate limiting implementation

New method is much better. Avoids unrestrained transfer at the beginning (except for the first block). Keeps right at or a few kb/s below the configured limit, with very little variation in the actual reported bandwidth.

Removed the /s part of the config as it's not needed.

Ready to merge.

Sponsored-by: Luke Shumaker on Patreon
2021-09-22 15:14:28 -04:00
commit e8496d62e4
parent 44d3d50785
5 changed files with 46 additions and 55 deletions
--- a/5
+++ b/5
@@ -1,7 +1,8 @@
 git-annex (8.20210904) UNRELEASED; urgency=medium

-  * Added annex.bwlimit and remote.name.annex-bwlimit config that works
-    for git remotes and many but not all special remotes.
+  * Added annex.bwlimit and remote.name.annex-bwlimit config to limit
+    the bandwidth of transfers. It works for git remotes and many
+    but not all special remotes.
  * Bug fix: Git configs such as annex.verify were incorrectly overriding
    per-remote git configs such as remote.name.annex-verify.
    (Reversion in version 4.20130323)
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -407,9 +407,9 @@ extractRemoteGitConfig r remotename = do
 		, remoteAnnexStallDetection =
 			either (const Nothing) Just . parseStallDetection
 				=<< getmaybe "stalldetection"
-		, remoteAnnexBwLimit =
-			either (const Nothing) Just . parseBwRate
-				=<< getmaybe "bwlimit"
+		, remoteAnnexBwLimit = do
+			sz <- readSize dataUnits =<< getmaybe "bwlimit"
+			return (BwRate sz (Duration 1))
 		, remoteAnnexAllowUnverifiedDownloads = (== Just "ACKTHPPT") $
 			getmaybe ("security-allow-unverified-downloads")
 		, remoteAnnexConfigUUID = toUUID <$> getmaybe "config-uuid"
--- a/Utility/Metered.hs
+++ b/Utility/Metered.hs
@@ -389,39 +389,34 @@ rateLimitMeterUpdate delta (Meter totalsizev _ _ _) meterupdate = do
 -- same process and thread as the call to the MeterUpdate.
 --
 -- For example, if the desired bandwidth is 100kb/s, and over the past
-- second, 200kb was sent, then pausing for half a second, and then
-- running for half a second should result in the desired bandwidth.
-- But, if after that pause, only 75kb is sent over the next half a
-- second, then the next pause should be 2/3rds of a second.
+-- 1/10th of a second, 30kb was sent, then the current bandwidth is
+-- 300kb/s, 3x as fast as desired. So, after getting the next chunk,
+-- pause for twice as long as it took to get it.
 bwLimitMeterUpdate :: ByteSize -> Duration -> MeterUpdate -> IO MeterUpdate
-bwLimitMeterUpdate sz duration meterupdate = do
+bwLimitMeterUpdate bwlimit duration meterupdate
+	| bwlimit <= 0 = return meterupdate
+	| otherwise = do
 		nowtime <- getPOSIXTime
-	lastpause <- newMVar (nowtime, toEnum 0 :: POSIXTime, 0)
-	return $ mu lastpause
+		mv <- newMVar (nowtime, 0)
+		return (mu mv)
  where
-	mu lastpause n@(BytesProcessed i) = do
-		nowtime <- getPOSIXTime
+	mu mv n@(BytesProcessed i) = do
+		endtime <- getPOSIXTime
+		(starttime, previ) <- takeMVar mv
+
+		let runtime = endtime - starttime
+		let currbw = fromIntegral (i - previ) / runtime
+		let pausescale = if currbw > bwlimit'
+			then (currbw / bwlimit') - 1
+			else 0
+		unboundDelay (floor (runtime * pausescale * msecs))
 		meterupdate n
-		lastv@(prevtime, prevpauselength, previ) <- takeMVar lastpause
-		let timedelta = nowtime - prevtime
-		if timedelta >= durationsecs
-			then do
-				let sz' = i - previ
-				let runtime = timedelta - prevpauselength
-				let pauselength = calcpauselength sz' runtime
-				if pauselength > 0
-					then do
-						unboundDelay (floor (pauselength * fromIntegral oneSecond))
-						putMVar lastpause (nowtime, pauselength, i)
-					else putMVar lastpause lastv
-			else putMVar lastpause lastv

-	calcpauselength sz' runtime
-		| sz' > sz && sz' > 0 && runtime > 0 =
-			durationsecs - (fromIntegral sz / fromIntegral sz') * runtime
-		| otherwise = 0
+		nowtime <- getPOSIXTime
+		putMVar mv (nowtime, i)

-	durationsecs = fromIntegral (durationSeconds duration)
+	bwlimit' = fromIntegral (bwlimit * durationSeconds duration) 
+	msecs = fromIntegral oneSecond

 data Meter = Meter (MVar (Maybe TotalSize)) (MVar MeterState) (MVar String) DisplayMeter

--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -1389,22 +1389,13 @@ Remotes are configured using these settings in `.git/config`.
  This can be used to limit how much bandwidth is used for a transfer
  from or to a remote.
 
-  For example, to limit transfers to 1 gigabyte per second:
-  `git config annex.bwlimit "1GB/1s"`
+  For example, to limit transfers to 1 mebibyte per second:
+  `git config annex.bwlimit "1MiB"`
  
  This will work with many remotes, including git remotes, but not
  for remotes where the transfer is run by a separate program than
  git-annex. 

-  The bandwidth limiting is implemented by pausing when
-  the transfer is running too fast, so it may use more bandwidth
-  than configured before being slowed down, either at the beginning
-  or if the available bandwidth changes while it is running.
-
-  It is different to use "1GB/1s" than "10GB/10s". git-annex will
-  track how much data was transferred over the time period, and then
-  pausing. So usually 1s is the best time period to use.
-
 * `remote.<name>.annex-stalldetecton`, `annex.stalldetection`

  Configuring this lets stalled or too-slow transfers be detected, and
--- a/doc/todo/bwlimit.mdwn
+++ b/doc/todo/bwlimit.mdwn
@@ -10,17 +10,17 @@ works, it will probably work to put the delay in there. --[[Joey]]

 [[confirmed]]

-> Implmentation in progress in the `bwlimit` branch. Seems to work, but see
-> commit message for what still needs to be done. --[[Joey]]
-
-> The directory special remote, when resuming an interrupted
+> Implemented and works well.
+> 
+> A local git remote, when resuming an interrupted
 > transfer, has to hash the file (with default annex.verify settings),
 > and that hashing updates the progress bar, and so the bwlimit can kick
 > in and slow down that initial hashing, before any data copying begins.
-> This seems perhaps ok; if you've bwlimited a directory special
+> This seems perhaps ok; if you've bwlimited a local git remote,
 > remote you're wanting to limit disk IO. Only reason it might not be ok
-> is if the intent is to limit IO to the disk containing the directory
-> special remote, but not the one containing the annex repo.
+> is if the intent is to limit IO to the disk containing the remote
+> but not the one containing the annex repo. (This also probably
+> holds for the directory special remote.)
 > 
 > Other remotes, including git over ssh, when resuming don't have that
 > problem. Looks like chunked special remotes narrowly avoid it, just
@@ -28,4 +28,8 @@ works, it will probably work to put the delay in there. --[[Joey]]
 > when resuming. It might be worthwhile to differentiate between progress
 > updates for incremental verification setup and for actual transfers, and
 > only rate limit the latter, just to avoid fragility in the code.
+> I have not done so yet though, and am closing this..
 > --[[Joey]]
+
+[[done]]
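
The pause calculation in the new `bwLimitMeterUpdate` (the `Utility/Metered.hs` hunk) can be read as a pure function of the limit, the time a chunk took, and the chunk size. Below is a minimal sketch of that arithmetic only. `pauseSeconds` is a hypothetical helper, not part of git-annex. It uses `Rational` to keep the worked example exact, and it adds a guard against a non-positive runtime that the commit's code does not have.

```haskell
import Data.Ratio ((%))

-- | Hypothetical helper (not in git-annex): how long to pause, in seconds,
-- after a chunk of `bytes` bytes took `runtime` seconds to transfer,
-- given a limit of `limit` bytes per second.
pauseSeconds :: Rational -> Rational -> Rational -> Rational
pauseSeconds limit runtime bytes
    -- No limit configured, or no measurable runtime: never pause.
    -- (The runtime guard is an addition made for this sketch.)
    | limit <= 0 || runtime <= 0 = 0
    -- Running too fast: pause for (currbw / limit - 1) times the runtime,
    -- so that runtime plus pause averages out to the limit.
    | currbw > limit = runtime * (currbw / limit - 1)
    | otherwise = 0
  where
    currbw = bytes / runtime

main :: IO ()
main = do
    -- The example from the code comment: 30 kB in 1/10 s against a
    -- 100 kB/s limit is 3x too fast, so the pause is twice the runtime.
    print (pauseSeconds 100000 (1 % 10) 30000)
    -- 5 kB in 1/10 s is under the limit, so there is no pause.
    print (pauseSeconds 100000 (1 % 10) 5000)
```

Because the pause is computed from each chunk's own runtime rather than from a fixed one-second window, the rate is corrected after every progress update, which fits the commit message's observation that only the first block goes through unrestrained.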
+