Commit graph

40052 commits

Author SHA1 Message Date
Joey Hess
d2be68907c
drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies
For example, previously, with a .gitattributes like:

*.2 annex.numcopies=2
*.1 annex.numcopies=1

And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2
would succeed, leaving just 1 copy, despite foo.2 needing 2 copies.
It dropped foo.1 first and then skipped foo.2 since its content was gone.

Now that the keys database includes locked files, this longstanding wart
can be fixed.

Sponsored-by: Noam Kremen on Patreon
2021-06-15 11:38:44 -04:00
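
The rule described in this commit message can be sketched as follows. This is a minimal Python illustration and not git-annex's Haskell implementation: `RULES`, `numcopies_for`, `required_copies`, and `can_drop` are hypothetical names, the default of 1 is an assumption, and patterns are matched with `fnmatch` rather than git's real attribute machinery.

```python
from fnmatch import fnmatch

# Hypothetical stand-in for the .gitattributes rules quoted above.
RULES = [("*.2", 2), ("*.1", 1)]
DEFAULT_NUMCOPIES = 1  # assumed default for this sketch


def numcopies_for(path, rules=RULES):
    """Return the numcopies of the last rule matching path, else the default."""
    value = DEFAULT_NUMCOPIES
    for pattern, n in rules:
        if fnmatch(path, pattern):
            value = n
    return value


def required_copies(associated_files, rules=RULES):
    """A key shared by several files needs the largest numcopies among them."""
    return max((numcopies_for(f, rules) for f in associated_files),
               default=DEFAULT_NUMCOPIES)


def can_drop(associated_files, copies_elsewhere):
    """Allow a drop only if enough other copies remain for every file."""
    return copies_elsewhere >= required_copies(associated_files)
```

With one copy elsewhere, `can_drop(["foo.1", "foo.2"], 1)` is false, whereas checking `can_drop(["foo.1"], 1)` alone, which corresponds to the old per-file behavior, is true.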
Joey Hess
0ed1369dcd
remove unused import 2021-06-15 11:31:59 -04:00
Joey Hess
af9fdf5dba
verify associated files when checking numcopies
Most of this is just refactoring. But, handleDropsFrom
did not verify that associated files from the keys db were still
accurate, and has now been fixed to do so.

A minor improvement to this would be to avoid calling catKeyFile
twice on the same file, when getting the numcopies and mincopies value,
in the common case where the same file has the highest value for both.
But, it avoids checking every associated file, so it will scale well to
lots of dups already.

Sponsored-by: Kevin Mueller on Patreon
2021-06-15 11:14:52 -04:00
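
One way to get the property this message describes (verifying associated files without checking every one of them) is to try candidates from the highest value down and stop at the first that still refers to the key. Whether git-annex does it exactly this way is an assumption; `highest_verified` and its parameters are hypothetical names, with `current_key_of` standing in for a lookup such as catKeyFile.

```python
def highest_verified(files, value_of, current_key_of, key):
    """Return the largest value_of(f) among files that still point at key.

    Candidates are tried from the highest value down, and the search stops
    at the first file whose current key matches, so stale entries cost a
    lookup only when they rank above the first valid file.
    """
    for f in sorted(files, key=value_of, reverse=True):
        if current_key_of(f) == key:  # hypothetical stand-in for catKeyFile
            return value_of(f)
    return None  # no associated file refers to this key any more
```

Calling it once with a numcopies lookup and once with a mincopies lookup repeats the key check on the same file when that file ranks highest for both, which is the duplicate call the message mentions as a possible minor improvement.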
Joey Hess
d164434679
fix build 2021-06-15 11:14:43 -04:00
Joey Hess
0b91afb57d
avoid warning 2021-06-15 11:11:55 -04:00
Joey Hess
77517ab506
avoid nub
It's O(N^2) which could matter when there are many dup files using the
same key.
2021-06-15 10:48:11 -04:00
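
Haskell's `nub` compares each element against everything kept so far, which is where the O(N^2) cost comes from. The commit does not say what replaced it, so the following is only a Python sketch of the general trade-off, with hypothetical function names: the first version mirrors the quadratic behavior, the second keeps order while using a set for membership tests.

```python
def dedup_quadratic(items):
    """Order-preserving dedup that scans the result list for every item."""
    out = []
    for x in items:
        if x not in out:  # linear scan of out, so O(N^2) overall
            out.append(x)
    return out


def dedup_hashed(items):
    """Order-preserving dedup using a set for constant-time membership tests."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

Both return the same list for hashable items; only the second stays fast when many duplicate files share one key.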
Joey Hess
b3712b6047
refactor 2021-06-15 10:27:33 -04:00
Joey Hess
effc9bf5dd
close 2021-06-15 10:11:14 -04:00
Joey Hess
4fbfe0082f
response 2021-06-15 10:02:41 -04:00
Joey Hess
895a4750ba
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-15 09:47:30 -04:00
Joey Hess
3af4c9a29a
fix exponential blowup when adding lots of identical files
This was an old problem when the files were being added unlocked,
so the changelog mentions that being fixed. However, recently it's also
affected locked files.

The fix for locked files is kind of stupidly simple. moveAnnex already
handles populating unlocked files, and only does it when the object file
was not already present. So remove the redundant populateUnlockedFiles
call. (That call was added all the way back in
cfaac52b88, and has always been
unnecessary.)

Sponsored-by: Dartmouth College's Datalad project
2021-06-15 09:45:55 -04:00
Joey Hess
e147ae07f4
remove supportUnlocked check that is not worth its overhead
moveAnnex only gets to that check if the object file was not present
before. So in the case where dup files are being added repeatedly,
it will only run the first time, and so there's no significant speedup
from doing it; all it avoids is a single sqlite lookup. Since MVar
accesses do have overhead, it's better to optimise for the common case,
where unlocked files are supported.

removeAnnex is less clear-cut, but I think it is mostly skipped for
keys whose object has already been dropped, so similar reasoning
applies.
2021-06-15 09:28:56 -04:00
9qf@758d7b174d81a134727acab9db0168c8f0782b3a
85b2dbce32 2021-06-15 12:21:19 +00:00
Joey Hess
6099edbf1c
bloom doesn't work, but this should I hope 2021-06-14 17:53:01 -04:00
Joey Hess
2df4c1cf91
plan 2021-06-14 17:13:37 -04:00
Joey Hess
0e3802c7ee
comment 2021-06-14 15:11:09 -04:00
Joey Hess
643dc36e37
going round and round, boredly 2021-06-14 14:37:06 -04:00
Joey Hess
fa6e8fc660
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-14 14:34:35 -04:00
Joey Hess
711252331e
comment 2021-06-14 14:34:22 -04:00
Joey Hess
398f9decd4
comment 2021-06-14 14:32:38 -04:00
Joey Hess
78da00c7a6
Future proof activity log parsing
When the log has an activity that is not known, eg added by a future
version of git-annex, it used to be treated as no activity at all,
which would make git-annex expire think it should expire the repository,
despite it having some kind of recent activity.

Hopefully there will be no reason to add a new activity until enough
time has passed that this commit is in use everywhere.

Sponsored-by: Jake Vosloo on Patreon
2021-06-14 14:18:19 -04:00
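
The difference between the old and new parsing behavior can be illustrated with a small Python sketch. The `(timestamp, name)` entry shape, the `KNOWN_ACTIVITIES` set, the `UNKNOWN` placeholder, and the activity name "Rebalance" are all assumptions made up for this example; they are not git-annex's real log format or types.

```python
KNOWN_ACTIVITIES = {"Fsck"}   # assumed set of names this version understands
UNKNOWN = "UnknownActivity"   # hypothetical placeholder for anything newer


def parse_strict(name):
    """Old behavior: an unrecognised name parses to no activity at all."""
    return name if name in KNOWN_ACTIVITIES else None


def parse_tolerant(name):
    """New behavior: an unrecognised name is kept as a placeholder activity."""
    return name if name in KNOWN_ACTIVITIES else UNKNOWN


def last_activity_time(entries, parse):
    """Return the newest timestamp among entries that parse to an activity."""
    times = [ts for ts, name in entries if parse(name) is not None]
    return max(times, default=None)
```

For entries `[(100, "Fsck"), (200, "Rebalance")]`, the strict parser reports 100 as the last activity and the tolerant one reports 200, so an expiry check based on the tolerant result still sees the recent activity.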
yarikoptic
6043a2c7a0 Added a comment 2021-06-14 17:36:16 +00:00
james@06209b7878fcf3b5c46b8028dacb3cec6609369c
9d34a9d013 2021-06-14 17:19:50 +00:00
Joey Hess
372ace599a
comment 2021-06-14 13:13:46 -04:00
Joey Hess
f0cbaa194c
improve docs based on forum feedback 2021-06-14 13:04:58 -04:00
Joey Hess
fbd2f96b2c
comment 2021-06-14 12:56:29 -04:00
Joey Hess
dcd2c95249
fix windows build 2021-06-14 12:43:26 -04:00
Joey Hess
3ac9363c03
comment 2021-06-14 12:42:11 -04:00
Joey Hess
014dc63a55
avoid sometimes expensive operations when annex.supportunlocked = false
This will mostly just avoid a DB lookup, so things get marginally
faster. But in cases where there are many files using the same key, it
can be a more significant speedup.

Added overhead is one MVar lookup per call, which should be small
enough, since this happens after transferring or ingesting a file,
which is always a lot more work than that. It would be nice, though,
to move getGitConfig to AnnexRead, which there is an open todo about.
2021-06-14 12:40:41 -04:00
Joey Hess
a02b5c2904
response 2021-06-14 12:36:42 -04:00
yarikoptic
51fede57a2 Added a comment 2021-06-14 16:23:41 +00:00
Ilya_Shlyakhter
35afd58a76 Added a comment: git-annex-add slowdown 2021-06-14 16:00:44 +00:00
Joey Hess
c4f1465a81
check symlink before reading file
This is faster because when multiple files are in a directory, it gets
cached.
2021-06-14 11:53:51 -04:00
Joey Hess
4163344ed6
retitle 2021-06-14 11:44:55 -04:00
Joey Hess
0eff5a3f71
reproduced 2021-06-14 11:37:21 -04:00
Joey Hess
26a9ea12d1
handle edge case of symlink to something that is not really a pointer file
That seems very unlikely to happen, but still, it's possible it could.
And with the recent addition of locked files to the keys db, this could
be called by places that did not call it before, so it seems even more
important it's correct.

This adds an extra stat of the file and is potentially racy, but both
problems are fixed by the unix-2.8.0 code path. I have not tested that that
path builds, because the package is not yet released and would be difficult
to install, since it's tightly tied to a ghc version.
2021-06-14 11:35:52 -04:00
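
The extra stat mentioned here can be sketched in Python as an `lstat` check made before the file is opened, so that a symlink is never followed and read as if it were a pointer file. This is an illustration only: `read_pointer_key`, `POINTER_PREFIX`, and `MAX_POINTER_SIZE` are hypothetical names, and the assumption that a pointer file's first line is `/annex/objects/` followed by the key is a simplification.

```python
import os
import stat

POINTER_PREFIX = "/annex/objects/"  # assumed pointer file prefix
MAX_POINTER_SIZE = 32768            # hypothetical read cap for this sketch


def read_pointer_key(path):
    """Return the key named by a pointer file, or None if path is not one."""
    try:
        mode = os.lstat(path).st_mode  # lstat does not follow symlinks
        if not stat.S_ISREG(mode):
            return None  # symlinks, directories, etc. are never pointer files
        # Racy: the path could be replaced between the lstat and the open.
        with open(path, "rb") as f:
            data = f.read(MAX_POINTER_SIZE)
    except OSError:
        return None
    first_line = data.split(b"\n", 1)[0].decode("utf-8", "replace")
    if first_line.startswith(POINTER_PREFIX) and len(first_line) > len(POINTER_PREFIX):
        return first_line[len(POINTER_PREFIX):]
    return None
```

The separate `lstat` and `open` calls show both costs named in the message: one more system call per file, and a window in which the file can change.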
Joey Hess
673b2feaf3
rename for clarity
Associated files are recorded now also for locked files, but this is
only needed to populate unlocked files.
2021-06-14 10:55:24 -04:00
yarikoptic
8f66f73fea Added a comment 2021-06-09 22:28:06 +00:00
yarikoptic
e30f973323 Added a comment: more "mystery resolved" -- identical (empty) keys 2021-06-09 21:00:34 +00:00
Joey Hess
4b09b93a18
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-09 15:38:58 -04:00
Joey Hess
fad281767a
comment 2021-06-09 15:38:55 -04:00
yarikoptic
714d9f1315 Added a comment 2021-06-08 22:02:34 +00:00
yarikoptic
a8fb61329d Added a comment 2021-06-08 21:58:20 +00:00
yarikoptic
3985ae3224 Added a comment: OSX mystery resolved. add --batch is effective mitigation 2021-06-08 21:56:53 +00:00
Joey Hess
6cb9113ff5
comments 2021-06-08 17:38:56 -04:00
yarikoptic
c3993a2655 Added a comment 2021-06-08 20:23:09 +00:00
yarikoptic
437d9366b7 Added a comment: getting closer... 2021-06-08 19:21:59 +00:00
Ilya_Shlyakhter
be4a029e1b Added a comment 2021-06-08 19:08:01 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421
be173f213d 2021-06-08 18:40:09 +00:00
jenkin.schibel@286264d9ceb79998aecff0d5d1a4ffe34f8b8421
e4cf6cc306 removed 2021-06-08 18:26:30 +00:00