Commit graph

35919 commits

Author SHA1 Message Date
Ilya_Shlyakhter
e9ff2381bd Added a comment: same contents with different keys 2019-11-30 16:51:58 +00:00
atrent
d9b0481779 Added a comment: duplicate objects? 2019-11-30 14:04:17 +00:00
yarikoptic
dda81ca26f Added a comment 2019-11-29 18:09:45 +00:00
yarikoptic
dd33f68982 refiled under dandi project - use case is https://gin.g-node.org 2019-11-29 18:06:14 +00:00
yarikoptic
9b5240ce83 Added a comment: reference original bug report 2019-11-29 17:58:29 +00:00
ply
8a3c543b3b Added a comment 2019-11-28 11:18:50 +00:00
ply
822962fa5f 2019-11-28 11:10:51 +00:00
yarikoptic
01ee4995c5 Added a comment: related: shouldn't git annex try external remotes to download config? 2019-11-28 01:22:53 +00:00
yarikoptic
7752e73481 initial report on inability to use remotes with authentication cached by git 2019-11-28 01:01:49 +00:00
Joey Hess
d7833def66
use ByteString for git config
The parser and looking up config keys in the map should both be faster
due to using ByteString.

I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.

Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
2019-11-27 17:40:09 -04:00
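
The ByteString config map described in the commit above can be illustrated with a minimal sketch. It assumes the plain `key=value` line format printed by `git config --list`; the names `ConfigMap`, `parseConfig` and `getConfig` are hypothetical and are not taken from git-annex's Git.Config. The slices produced by `B.lines`, `B.break` and `B.drop` share the input buffer, which is why the map keeps the config as read alive in memory.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M

-- Hypothetical type for this sketch: a key may repeat in git config,
-- so each key maps to all of its values, in input order.
type ConfigMap = M.Map B.ByteString [B.ByteString]

-- Parse `key=value` lines (assumed input format: `git config --list`).
-- Only the first '=' separates key from value; the rest belongs to the value.
parseConfig :: B.ByteString -> ConfigMap
parseConfig = M.fromListWith (flip (++)) . map parseLine . filter (not . B.null) . B.lines
  where
    parseLine l =
      let (k, v) = B.break (== '=') l
      in (k, [B.drop 1 v])

-- Look up a key; when it was set several times, the last value wins.
getConfig :: B.ByteString -> ConfigMap -> Maybe B.ByteString
getConfig k m = last <$> M.lookup k m
```

For example, `getConfig "core.bare" (parseConfig input)` yields the last `core.bare` value found in `input`, if any.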
Joey Hess
37d0f73e66
reword comment 2019-11-27 16:38:18 -04:00
Ilya_Shlyakhter
7d3750682b git-annex-cat 2019-11-27 18:16:35 +00:00
Joey Hess
d830386ab2
update based on profiling
While L.toStrict copies, profiling showed it was only around 0.3% of
git-annex find runtime. Does not seem worth optimising that, which would
probably involve either a major refactoring, or a use of
unsafeInterleaveIO.

Also, it seems to me that the latter would need to read chunks, and
prepend the leftover part to the next chunk. But a strict ByteString
append itself is a copy, so I'm not convinced that would be faster than
L.toStrict.
2019-11-27 14:09:11 -04:00
Joey Hess
c914058bf9
Merge branch 'master' into bs 2019-11-27 13:47:03 -04:00
Ilya_Shlyakhter
9e642fd038 Added a comment: parallelization 2019-11-27 17:30:12 +00:00
Ilya_Shlyakhter
9f4b99a0e7 Added a comment: parallelization 2019-11-27 17:23:15 +00:00
Ilya_Shlyakhter
a27ffd3aec Added a comment: representing paths 2019-11-27 15:08:41 +00:00
Ilya_Shlyakhter
e67d367b63 removed 2019-11-26 23:28:35 +00:00
anarcat
a98efcda3d Added a comment: amazing! 2019-11-26 21:07:32 +00:00
Joey Hess
a2b566be29
Merge branch 'master' of ssh://git-annex.branchable.com 2019-11-26 16:12:53 -04:00
Joey Hess
ac1e481bfa
devblog 2019-11-26 16:12:14 -04:00
Joey Hess
3361edfb61
todo for bs branch 2019-11-26 16:11:55 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agony of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	33%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certainly benefit even more.
2019-11-26 16:01:58 -04:00
linnearight02@915958f850452a19de84ec14a765402d1f7ecdb0
41458cd060 Added a comment: Online Coursework Service 2019-11-26 11:11:07 +00:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Ilya_Shlyakhter
dd58cfd8e1 Added a comment: use named pipes? 2019-11-25 16:45:27 +00:00
yarikoptic
70172712a5 initial idea on joint "get+checksum" 2019-11-25 03:26:23 +00:00
Ilya_Shlyakhter
1f035c0d66 Added a comment: even git mv -f seems to work correctly 2019-11-24 17:25:32 +00:00
Ilya_Shlyakhter
20da59f62f Added a comment: moving unlocked file onto locked file isn't possible 2019-11-24 16:36:24 +00:00
Joey Hess
1ff889e456
explicit export lists
A small amount of dead code removed.

All of Utility/ done now.

This commit was sponsored by Brock Spratlen on Patreon.
2019-11-23 11:24:10 -04:00
Joey Hess
960f62a564
typo 2019-11-22 19:48:34 -04:00
Joey Hess
4cc6985494
todo 2019-11-22 19:47:53 -04:00
Joey Hess
6e3bccd4ac
updated profiling 2019-11-22 19:13:35 -04:00
Joey Hess
ddf6973d22
minor optimisation
avoid repeated scan of the same bytestring
2019-11-22 19:13:05 -04:00
Joey Hess
61af9d8f63
Merge /home/joey/tmp/git-annex 2019-11-22 17:51:40 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
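
A minimal sketch of the idea in the commit above: a key type that carries its cached serialization, an Eq instance that ignores the cache, and smart constructors that keep the cache consistent. The field names, the simplified `backend--name` format and the functions `mkKey`, `deserializeKey` and `serializeKey` are assumptions made for illustration; git-annex's real Key and KeyData types differ.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical, simplified key fields for this sketch.
data KeyData = KeyData
  { keyBackend :: B.ByteString
  , keyName :: B.ByteString
  } deriving (Eq, Show)

-- A Key pairs the parsed fields with a cached serialization.
-- In a real module the MkKey constructor would not be exported.
data Key = MkKey
  { keyData :: KeyData
  , keySerialization :: B.ByteString
  }

-- Equality compares only the parsed fields, never the cache.
instance Eq Key where
  a == b = keyData a == keyData b

-- Assumed serialization format for this sketch: backend--name.
serializeKeyData :: KeyData -> B.ByteString
serializeKeyData d = B.concat [keyBackend d, B.pack "--", keyName d]

-- Smart constructor: computes the cache from the fields, so the two
-- cannot get out of sync.
mkKey :: KeyData -> Key
mkKey d = MkKey d (serializeKeyData d)

-- Deserializing keeps the input as the cache, so a later
-- serializeKey does no work.
deserializeKey :: B.ByteString -> Maybe Key
deserializeKey b = case B.breakSubstring (B.pack "--") b of
  (backend, rest)
    | B.null backend || B.length rest <= 2 -> Nothing
    | otherwise -> Just (MkKey (KeyData backend (B.drop 2 rest)) b)

-- Serializing returns the cached bytes.
serializeKey :: Key -> B.ByteString
serializeKey = keySerialization
```

A key read from disk with `deserializeKey` and then passed to `serializeKey`, for instance to build an object path, reuses the bytes it was parsed from.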
Joey Hess
0e8c8edc90
improve hints about squelching output 2019-11-22 17:26:13 -04:00
Yaroslav Halchenko
e296637737
(Build-)depend on git >= 2.22 to avoid a memory-leaking git being bundled or used
Note from Joey:
  git-annex still supports git 2.1, but operates in a degraded fashion.
  It would be better for backports of the debian package to also
  backport a newer git. This dependency is mostly expressing that,
  also that any users who might upgrade git-annex should also upgrade
  git.

  Also worth noting that the i386ancient autobuilder has git 2.1 on it
  (best I have been able to manage there), but luckily the epoch is
  bumped to 2, so the dependencies will still be satisfied.
2019-11-22 13:56:18 -04:00
Joey Hess
1d0dbdf201
squelch tab warnings 2019-11-22 12:49:41 -04:00
Joey Hess
b82ab21468
missed an export 2019-11-22 12:35:57 -04:00
Joey Hess
93789cbf40
close as dup 2019-11-22 12:11:04 -04:00
Joey Hess
cf2e23d39c
close not viable 2019-11-22 12:10:59 -04:00
Joey Hess
92e1bb250b
simplify the name of the test cases 2019-11-21 17:38:58 -04:00
Joey Hess
a9888f6151
Windows: Fix handling of changes to time zone.
Used to work but was broken in version 7.20181031, specifically commit
5ab0f48ffb.

That this was not noticed over at least one daylight saving time
change makes me wonder if the TSDelta stuff is still needed.
Perhaps the mtime on Windows no longer changes when the time zone is changed?

(cherry picked from commit 09ee6b0ccb)
2019-11-21 17:28:18 -04:00
Joey Hess
25ba8156bc
improve benchmark --databases
* benchmark: Changed --databases to take a parameter specifying the size
  of the database to benchmark.
* benchmark --databases: Display size of the populated database.
* benchmark --databases: Improve the "addAssociatedFile to (new)"
  benchmark to really add new values, not overwriting old values.
2019-11-21 17:25:20 -04:00
Joey Hess
8ea5f3ff99
explicit export lists
Eliminated some dead code. In other cases, exported a currently unused
function, since it was a logical part of the API.

Of course this improves the API documentation. It may also sometimes
let ghc optimize code better, since it can know a function is internal
to a module.

364 modules still to go, according to
git grep -E 'module [A-Za-z.]+ where'
2019-11-21 16:08:37 -04:00
Joey Hess
740e0ddbfe
avoid running scanUnlockedFiles in bare repo
It's not necessary. And if the bare repo somehow has a pointer
file in it with the same name as a file in HEAD, that file would be
populated, which would be surprising since the file is not really under
git's control.
2019-11-21 14:31:12 -04:00
xwvvvvwx
f39e5a4219 Added a comment 2019-11-21 17:32:31 +00:00
Joey Hess
c1d88305d7
Merge branch 'master' of ssh://git-annex.branchable.com 2019-11-21 13:31:47 -04:00
Joey Hess
43f19ef00a
Fix bug that made bare repos be treated as non-bare when --git-dir was used.
Eg:

git clone url --bare r
git --git-dir r annex init

This resulted in worktree = Just "." and so several things that check
worktree to determine when the repo is bare ran code paths intended for
non-bare. One such code path[1] ran git checkout with --work-tree=. which
actually makes it ignore core.bare config, and so the current directory
got populated with a checkout of the master branch in this example. There
was probably also other breakage.

The fix is a bit complicated because whether the repo is bare is not
known until after Git.Config reads the config, but Git.Config handles
setting the RepoLocations's worktree when core.worktree is set. So have
to assume the worktree is the cwd, let core.worktree override that,
and then if the repo turns out to be bare, it's set back to Nothing.
(And then GIT_WORK_TREE can still override all of that.)

[1] switchHEADBack, which runs even when the clone is not from a bare repo.
2019-11-21 13:26:02 -04:00
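
The order of overrides described in the commit above can be written as a small pure function. This is only a sketch of the stated rules: the `Inputs` record and `resolveWorktree` are hypothetical names, and the real logic in Git.Config works on git-annex's own repository types.

```haskell
import Control.Applicative ((<|>))

-- Hypothetical inputs for this sketch.
data Inputs = Inputs
  { cwd :: FilePath                   -- current working directory
  , coreWorktree :: Maybe FilePath    -- core.worktree from git config, if set
  , coreBare :: Bool                  -- core.bare from git config
  , envGitWorkTree :: Maybe FilePath  -- GIT_WORK_TREE from the environment, if set
  }

-- Nothing means the repository has no worktree (it is bare).
resolveWorktree :: Inputs -> Maybe FilePath
resolveWorktree i = envGitWorkTree i <|> fromConfig  -- 4. GIT_WORK_TREE overrides everything
  where
    assumed = Just (cwd i)                   -- 1. assume the worktree is the cwd
    configured = coreWorktree i <|> assumed  -- 2. core.worktree overrides that
    fromConfig
      | coreBare i = Nothing                 -- 3. a bare repo is set back to Nothing
      | otherwise = configured
```

With `coreBare = True` and no GIT_WORK_TREE the result is `Nothing`, which is the case that `git --git-dir r annex init` on a bare clone needs.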