Commit graph

3661 commits

Author SHA1 Message Date
Ilya_Shlyakhter
de12aeb1a4 Added a comment: matching include/exclude based on file extension in the key 2021-06-02 17:02:58 +00:00
Ilya_Shlyakhter
6a2bfad192 Added a comment: keys db optimization 2021-06-02 16:53:03 +00:00
Joey Hess
6f3f972355
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-01 11:43:36 -04:00
Joey Hess
3155c0d03e
todo 2021-06-01 10:39:48 -04:00
Ilya_Shlyakhter
a7e8a630fb Added a comment: keys-to-paths db 2021-05-31 23:15:21 +00:00
Joey Hess
f00e365f41
comments 2021-05-31 17:54:17 -04:00
Ilya_Shlyakhter
2dac55978c Added a comment: startup scan for files 2021-05-31 20:50:36 +00:00
Joey Hess
8734f17bc5
comment 2021-05-31 15:15:09 -04:00
Joey Hess
988dbce27a
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-31 15:05:40 -04:00
Joey Hess
eb6f6ff9b8
speed up keys database writes
There seems to be no reason to check the time here. I think it was
inherited from code in Database.Fsck, which does have a reason to commit
every few minutes. Removing that syscall speeds up a git-annex init
in a repo with 100000 annexed files by about 3 seconds.

Sponsored-by: Dartmouth College's Datalad project
2021-05-31 15:01:00 -04:00
Atemu
6da7f26e2a 2021-05-31 18:59:15 +00:00
Atemu
ae129dc317 2021-05-31 18:42:56 +00:00
Joey Hess
0f54e5e0ae
speed up initial scanning for annexed files
Streaming through git this way speeds it up by around 25%. This is
similar to the optimisations of seeking annexed files.

Sponsored-by: Dartmouth College's Datalad project
2021-05-31 14:29:34 -04:00
Joey Hess
759e5a9903
todo 2021-05-31 10:50:22 -04:00
Joey Hess
3b7f28feca
comment 2021-05-31 10:43:59 -04:00
Joey Hess
57a0ef8d90
comment and reject todo 2021-05-27 12:19:35 -04:00
Atemu
0a0889e72e Added a comment 2021-05-26 07:11:20 +00:00
Joey Hess
13a6bfff49
comments 2021-05-25 16:37:32 -04:00
Joey Hess
f5dc06077d
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-25 13:10:34 -04:00
Joey Hess
b5f5475ed6
New matching options --excludesamecontent and --includesamecontent
The normalisation of filenames turns out to be the tricky part here,
because the associated files coming out of the keys db may look like
"./foo/bar" or "../bar". For the former to match a glob like "foo/*",
it needs to be normalised.

Note that, on Windows, normalise "./foo/bar" = "foo\\bar"
which a glob like "foo/*" won't match. So the glob is matched a second
time, on the toInternalGitPath, so allowing the user to provide a glob
with the slashes in either direction. However, this still won't support
some wacky edge cases like the user providing a glob of "foo/bar\\*"

Sponsored-by: Dartmouth College's Datalad project
2021-05-25 13:08:18 -04:00
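
The two-pass glob matching described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not git-annex's Haskell code: `to_internal_git_path` and `matches_glob` are hypothetical names, `ntpath`/`posixpath` stand in for the platform's path normalisation, and `fnmatch` is a simplification (its `*` also crosses directory separators).

```python
import fnmatch
import ntpath
import posixpath


def to_internal_git_path(path: str) -> str:
    """Hypothetical stand-in for toInternalGitPath: forward slashes only."""
    return path.replace("\\", "/")


def matches_glob(glob: str, associated_file: str, windows: bool = False) -> bool:
    """Match a glob against an associated file as it might come out of the keys db."""
    pathmod = ntpath if windows else posixpath
    # First pass: match the normalised path ("./foo/bar" becomes "foo/bar",
    # or "foo\\bar" when Windows rules apply).
    normalised = pathmod.normpath(associated_file)
    if fnmatch.fnmatchcase(normalised, glob):
        return True
    # Second pass: match again on the forward-slash form, so a glob written
    # with "/" still matches a path normalised with "\\".
    return fnmatch.fnmatchcase(to_internal_git_path(normalised), glob)
```

Under these assumptions a glob of "foo/*" matches "./foo/bar" on either platform, while a mixed-separator glob such as "foo/bar\\*" matches in neither pass, which is the edge case the commit message mentions.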
Lukey
2ccf525b7f Added a comment 2021-05-25 16:48:26 +00:00
Joey Hess
cd73fcc92c
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-25 11:45:02 -04:00
Joey Hess
483fc4dc6b
Merge branch 'trackassociated' 2021-05-25 11:43:52 -04:00
Joey Hess
e9c95ef890
comments 2021-05-25 11:43:46 -04:00
Atemu
7ed4c4a35c 2021-05-25 14:51:21 +00:00
parhuzamos
54e1ac849a Added a comment 2021-05-24 09:33:50 +00:00
Joey Hess
428c91606b
include locked files in the keys database associated files
Before only unlocked files were included.

The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.

reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.

On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.

reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticeable overhead. What may turn out to be a noticeable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.

Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.

Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
runs and accesses the keys db, reconcileStaged will run and update it.

There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.

However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
2021-05-21 16:24:37 -04:00
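
The diff-based update that the commit above describes for reconcileStaged can be illustrated with a small model. This is only a sketch under stated assumptions: trees are modelled as plain dictionaries from path to key, and `diff_trees` is a hypothetical name; the real code diffs git trees and updates the keys database.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model of an index tree: path -> annex key.
Tree = Dict[str, str]
Change = Tuple[str, str]  # (key, path)


def diff_trees(previous: Tree, current: Tree) -> Tuple[List[Change], List[Change]]:
    """Return (added, removed) associated files between two index states."""
    # A path whose key changed or vanished loses its old association.
    removed = sorted(
        (key, path) for path, key in previous.items() if current.get(path) != key
    )
    # A path that is new or points at a different key gains an association.
    added = sorted(
        (key, path) for path, key in current.items() if previous.get(path) != key
    )
    return added, removed


# On upgrade there is no recorded previous tree, so the diff starts from an
# empty tree and every file in the current index is reported as added.
added, removed = diff_trees({}, {"a": "K1", "b": "K2"})
```

In this model, a deletion shows up as an entry in `removed`, which is why diffing against the previous tree lets deletions be handled for both locked and unlocked files.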
Joey Hess
1d9bad51d2
plan for these 2021-05-21 13:50:26 -04:00
Joey Hess
b68a40fa88
todo 2021-05-20 11:18:46 -04:00
Joey Hess
64e26287dd
comment 2021-05-19 11:07:02 -04:00
yarikoptic
674f33c139 todo for extra logging when content changed 2021-05-18 14:05:18 +00:00
Joey Hess
c525d18cf7
filter-branch: New command, useful to produce a filtered version of the git-annex branch, eg when splitting a repository 2021-05-17 14:16:46 -04:00
Joey Hess
5004eed27d
branch 2021-05-13 16:18:35 -04:00
Joey Hess
715d3d728c
new name for command 2021-05-13 16:07:30 -04:00
Joey Hess
40ade7a515
add some functions listing log files
Not used yet, will be used by copy-branch to generate the list of files
to copy.
2021-05-13 14:57:38 -04:00
Joey Hess
13a8706cda
almost have a plan 2021-05-13 14:09:06 -04:00
Joey Hess
03f46b95e6
comment 2021-05-13 12:05:24 -04:00
Atemu
d8ed6daeb3 Added a comment 2021-05-13 10:26:53 +00:00
Joey Hess
7500ba7ceb
already implemented 2021-05-12 12:24:55 -04:00
Joey Hess
8dbbbc7250
idea 2021-05-10 19:16:15 -04:00
Joey Hess
b184fc490a
split out common options to its own page and mention it on each subcommand page
Sometimes users would get confused because an option they were looking
for was not mentioned on a subcommand's man page, and they had not
noticed that the main git-annex man page had a list of common options.
This change lets each subcommand mention the common options, similarly
to how the matching options are handled.

This commit was sponsored by Svenne Krap on Patreon.
2021-05-10 15:00:13 -04:00
Joey Hess
56ccc0302e
mention --all on fsck man page, and repurpose todo 2021-05-10 11:11:50 -04:00
Joey Hess
9cc8e24727
comment 2021-05-10 10:48:44 -04:00
Atemu
c09497dfad Added a comment 2021-05-10 14:13:55 +00:00
Lukey
c2b1c730a5 Added a comment 2021-05-10 12:21:37 +00:00
Atemu
689a26d25a 2021-05-10 11:01:43 +00:00
Joey Hess
9e1a693f16
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-07 11:44:58 -04:00
Joey Hess
0caf171c63
patch review 2021-05-07 11:44:30 -04:00
Ilya_Shlyakhter
fd60824979 Added a comment: different keys for the same content 2021-05-07 15:37:46 +00:00
Atemu
e686a878ed 2021-05-05 08:52:28 +00:00
Atemu
78948270e2 2021-05-05 06:11:51 +00:00
Atemu
931a55b9a4 Added a comment 2021-05-05 06:04:49 +00:00
Atemu
857bffe388 Added a comment 2021-05-05 05:43:56 +00:00
yarikoptic
4d6e5bc6ad removed 2021-05-04 19:57:53 +00:00
yarikoptic
eac6b763d3 Added a comment 2021-05-04 17:51:09 +00:00
yarikoptic
66e175fec9 Added a comment 2021-05-04 17:50:45 +00:00
Atemu
6b23eb031f Added a comment 2021-05-04 17:37:04 +00:00
yarikoptic
1f7577839c Added a comment 2021-05-04 16:53:18 +00:00
Joey Hess
32e6d6880f
comment 2021-05-04 11:13:50 -04:00
Joey Hess
2d36fa7e17
comment 2021-05-04 10:56:27 -04:00
Joey Hess
20fee2ef04
response 2021-05-04 10:31:53 -04:00
Joey Hess
084f0e3e89
comment and todo 2021-05-04 10:14:53 -04:00
Atemu
0df9dc617f Added a comment 2021-05-04 12:46:59 +00:00
Atemu
6299fd688b 2021-05-04 02:43:14 +00:00
Joey Hess
4bd22a45e4
update 2021-05-02 15:22:22 -04:00
yarikoptic
e5bbbb5d02 initial todo asking for possibility to assign costs per URL 2021-04-30 13:54:30 +00:00
yarikoptic
349c0668be initial todo for prospective copy-key(file) command(s) 2021-04-29 22:41:56 +00:00
Ilya_Shlyakhter
5efce5078a added suggestion: support tree-ish in command args 2021-04-29 16:53:31 +00:00
Joey Hess
34c2d13dce
add note to git-annex-log man page about when information is not available 2021-04-23 14:53:38 -04:00
Joey Hess
bfa2db9222
done 2021-04-23 14:48:45 -04:00
Joey Hess
32138b8cd8
implement annex.privateremote and remote.name.private configs
The slightly unusual parsing in Types.GitConfig avoids the need to look
at the remote list to get configs of remotes. annexPrivateRepos combines
all the configs, and will only be calculated once, so it's nice and
fast.

privateUUIDsKnown and regardingPrivateUUID now need to read from the
annex mvar, so are not entirely free. But that overhead can be optimised
away, as seen in getJournalFileStale. The other call sites didn't seem
worth optimising to save a single MVar access. The feature should have
imperceptible speed overhead when not being used.
2021-04-23 14:21:57 -04:00
Joey Hess
d5a05655b4
Merge branch 'master' into hiddenannex 2021-04-23 13:06:33 -04:00
Joey Hess
0547884eb2
importfeed: fix bug while also speeding up 12x!
* Fix bug that could make git-annex importfeed not see recently recorded
  state when configured with annex.alwayscommit=false.
* importfeed: Made "checking known urls" phase run 12 times faster.

The massive speedup is because it no longer queries for metadata
accompanying each url. Instead it processes the whole git-annex branch and
checks all metadata files for feed item ids, and uses any it finds.

This could result in a behavior change, in an unlikely situation: If a feed
id is recorded in a key's metadata, but the url gets removed, the old code
would not see that item id and would re-download it if it finds an url for
it in a feed, while the new code will see the item id. I don't think
the old behavior was intentional, and it may be that the new behavior is
better. Not gonna worry about this.
2021-04-23 12:36:56 -04:00
Joey Hess
657d55c401
convert withKnownUrls to use overBranchFileContents
This only partly fixes importfeed to see journalled files, since it
separately cats metadata directly from the branch. Held off on a
changelog for a bug fix until that's dealt with.
2021-04-23 11:32:25 -04:00
Joey Hess
27a29c99fe
update 2021-04-22 12:52:32 -04:00
Joey Hess
dc37a5d1eb
update 2021-04-21 23:42:00 -04:00
Joey Hess
7cb96bc3e3
alternative 2021-04-21 17:18:47 -04:00
Joey Hess
0bb57702e1
Merge branch 'master' into hiddenannex 2021-04-21 15:45:12 -04:00
Joey Hess
653b719472
fix --all to include not yet committed files from the journal
Fix bug caused by recent optimisations that could make git-annex not see
recently recorded status information when configured with
annex.alwayscommit=false.

This does mean that --all can end up processing the same key more than once,
but before the optimisations that introduced this bug, it used to also behave
that way. So I didn't try to fix that; it's an edge case and anyway git-annex
behaves well when run on the same key repeatedly.

I am not too happy with the use of a MVar to buffer the list of files in the
journal. I guess it doesn't defeat lazy streaming of the list, if that
list is actually generated lazily, and anyway the size of the journal is
normally capped and small, so if configs are changed to make it huge and
this code path fires, git-annex using enough memory to buffer it all is not a
large problem.
2021-04-21 15:40:32 -04:00
Joey Hess
9b870e29fd
Merge branch 'master' into hiddenannex 2021-04-21 13:04:40 -04:00
Ilya_Shlyakhter
78f31022e3 Added a comment: auto-expire temp repos 2021-04-21 15:37:38 +00:00
Joey Hess
5dae95f95f
Merge branch 'master' of ssh://git-annex.branchable.com 2021-04-20 15:18:56 -04:00
Joey Hess
154fb46b24
update 2021-04-20 15:18:18 -04:00
Joey Hess
05989556a2
start implementing hidden git-annex repositories
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.

Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.

All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.

Currently, no UUIDs are treated as private yet, a way to configure that
is needed.

The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.

It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That let Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.

And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concatenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
2021-04-20 15:04:53 -04:00
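
The read path described in the commit above (skip the private journal unless private UUIDs are known, otherwise concatenate the public and private versions) might look something like this. It is a minimal sketch with hypothetical names (`read_branch_file`, `union_merge_lines`) and placeholder log lines; it does not reproduce git-annex's actual log formats or merge code.

```python
from typing import List


def read_branch_file(public: str, private: str, private_uuids_known: bool) -> str:
    """Return the content a reader sees for one file on the branch."""
    if not private_uuids_known:
        # Feature unused: never look at the private journal at all.
        return public
    # Feature in use: the public and private versions are concatenated.
    return public + private


def union_merge_lines(a: str, b: str) -> List[str]:
    """Simplified union merge: every distinct line from both sides, kept once."""
    seen = set()
    merged = []
    for line in a.splitlines() + b.splitlines():
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


public = "placeholder line about a public uuid\n"
private = "placeholder line about a private uuid\n"
combined = read_branch_file(public, private, private_uuids_known=True)
```

Under this simplification, the concatenated content carries the same set of lines that a union merge of two diverged versions would, which is the property the commit message relies on.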
Atemu
2f05565db5 Added a comment 2021-04-20 18:05:27 +00:00
Joey Hess
3d9d1d1416
Merge branch 'master' of ssh://git-annex.branchable.com 2021-04-20 11:08:06 -04:00
Joey Hess
2d1cbdaba7
thoughts 2021-04-19 13:58:43 -04:00
Joey Hess
3262d6c0bc
yoh asked me to tag this datalad 2021-04-19 13:20:07 -04:00
Ilya_Shlyakhter
3cfc51343f Added a comment 2021-04-18 23:45:26 +00:00
anatoly.sayenko@880a118acc67f3244b406a2700f0556b2f10672c
4a9bb24d60 Added a comment: migration warning still present after migration 2021-04-18 09:37:10 +00:00
Ilya_Shlyakhter
d21b54b05b Added a comment: drop --not-used-elsewhere 2021-04-17 22:31:52 +00:00
Ilya_Shlyakhter
44aad24f30 added suggestion: let git-annex-matching-options query .gitattributes 2021-04-17 20:38:09 +00:00
Joey Hess
c8e607f226
comment 2021-04-16 14:45:46 -04:00
Joey Hess
e56e40c51a
Merge branch 'master' of ssh://git-annex.branchable.com 2021-04-16 14:41:51 -04:00
Joey Hess
29108a8801
thoughts 2021-04-16 14:41:12 -04:00
Lukey
c4fcfa8d33 Added a comment 2021-04-16 18:29:22 +00:00
Joey Hess
7496c86c7c
comment 2021-04-16 13:49:05 -04:00
Joey Hess
90eb649e73
idea 2021-04-16 13:30:23 -04:00
Ilya_Shlyakhter
3378f74fb0 Added a comment: lockContent for special remotes 2021-04-15 16:32:34 +00:00
yarikoptic
a72b7b8c2f initial todo/report on drop dropping a key "for all paths" 2021-04-15 02:30:33 +00:00
yarikoptic
3aa4cc9d6f Added a comment 2021-04-14 20:46:11 +00:00
Joey Hess
17646b0b31
Merge branch 'master' of ssh://git-annex.branchable.com 2021-04-14 16:20:13 -04:00
Joey Hess
58da9f74b7
directory CoW on export
Completing CoW support for directory.
2021-04-14 16:19:43 -04:00
Joey Hess
b86206b553
directory CoW on import 2021-04-14 16:10:09 -04:00
Joey Hess
a36be49b01
comment 2021-04-14 14:12:32 -04:00
yarikoptic
066c4f1efc Added a comment: comment to joey response on cp --reflink workaround 2021-04-14 18:04:53 +00:00
Joey Hess
34e959f181
tag confirmed 2021-04-14 13:45:59 -04:00
Joey Hess
cac7866bce
note 2021-04-14 13:44:43 -04:00
Joey Hess
d1478e8b40
correction 2021-04-14 13:42:37 -04:00
Joey Hess
42c8f1e5f5
comment 2021-04-14 13:41:24 -04:00
Joey Hess
799e7b3c29
update 2021-04-14 13:32:28 -04:00
Joey Hess
5978b2a35b
comment 2021-04-14 13:31:08 -04:00
Joey Hess
5783a8d081
fsck: avoid redundant checksum when transfer is Verified
When downloading content from a remote, if the content is able to be
verified during the transfer, skip checksumming it a second time.

Note that in this case, the fsck output does not include "(checksum)"
which it does when the checksumming is done separately from the download.

This commit was sponsored by Brock Spratlen on Patreon.
2021-04-14 13:22:54 -04:00
Atemu
46309994a2 2021-04-14 16:14:20 +00:00
yarikoptic
c300675051 is importtree CoW from directory? 2021-04-14 14:24:18 +00:00
Joey Hess
0bcf155e11
thoughts 2021-04-13 14:41:27 -04:00
Joey Hess
6911787042
idea 2021-04-13 13:41:36 -04:00
Joey Hess
67d91c63f7
update 2021-04-12 14:13:44 -04:00
Joey Hess
1e322c329e
update 2021-04-12 13:00:24 -04:00
Joey Hess
4c35d58bfe
comment and analysis 2021-04-12 12:54:46 -04:00
Ilya_Shlyakhter
ad2a6d45db Added a comment 2021-04-12 15:39:31 +00:00
Ilya_Shlyakhter
70991c1d65 Added a comment 2021-04-12 14:42:13 +00:00
Ilya_Shlyakhter
cf60184992 Added a comment: lockContent for special remotes w/o changing the protocol 2021-04-12 01:20:16 +00:00
Joey Hess
7b6ab0ae9a
comment 2021-04-08 13:51:43 -04:00
Atemu
351f5d753f fix url 2021-04-08 11:56:13 +00:00
Atemu
474dd1a3fc 2021-04-08 11:50:59 +00:00
Ilya_Shlyakhter
9041b2b6a4 Added a comment: running untrusted code 2021-04-07 16:52:42 +00:00
Joey Hess
da88863082
comment and close, open related todo 2021-04-06 16:51:38 -04:00
Joey Hess
98b223a71c
Merge branch 'master' of ssh://git-annex.branchable.com 2021-04-05 15:32:08 -04:00
Joey Hess
1b645e1ace
added --debugfilter (and annex.debugfilter) 2021-04-05 15:31:10 -04:00
Atemu
45a93d7129 2021-04-04 09:23:41 +00:00
Joey Hess
3204f0bbaa
comments 2021-04-02 13:41:26 -04:00
Joey Hess
4a30fddc2a
idea 2021-04-01 15:49:30 -04:00
Joey Hess
632ae09e28
comment 2021-04-01 12:24:21 -04:00
Ilya_Shlyakhter
4dde355c79 Added a comment: dockerized special remotes: security 2021-04-01 15:20:05 +00:00
Joey Hess
24c576bfa7
Merge branch 'master' of ssh://git-annex.branchable.com 2021-03-30 12:58:34 -04:00
Lukey
a366e9d0fc Added a comment 2021-03-30 16:21:14 +00:00
Joey Hess
773752b040
comment 2021-03-30 12:06:36 -04:00
Lukey
568f1c421b Added a comment 2021-03-30 16:01:04 +00:00
Ilya_Shlyakhter
4403791c6c Added a comment: autoenabling external special remotes 2021-03-30 15:17:05 +00:00
Ilya_Shlyakhter
9b8661c327 added suggestion to have git-annex-info display the time of last interaction with repos 2021-03-30 14:31:14 +00:00
parhuzamos
72f5088d34 2021-03-26 12:34:01 +00:00
parhuzamos
2187892a81 2021-03-26 12:31:47 +00:00
Ilya_Shlyakhter
a4cc0c95b4 added suggestion for additional git-annex-config settings 2021-03-23 20:11:39 +00:00
Ilya_Shlyakhter
547a5a8ca8 Added a comment: annex.supportunlocked=false 2021-03-23 20:02:19 +00:00
Joey Hess
f19271c5d9
comment 2021-03-23 15:51:21 -04:00
Joey Hess
806b6f77b9
Merge branch 'master' of ssh://git-annex.branchable.com 2021-03-23 15:47:21 -04:00
Ilya_Shlyakhter
3925235805 Added a comment: annex.supportunlocked 2021-03-23 19:30:44 +00:00
Joey Hess
5d78cd9d08
Sped up git-annex init in a clone of an existing repository
Seems that hasOrigin was never finding origin's git-annex branch, so a new
one got created each time. And so then it later needed to merge the two
branches, which is expensive.

Added --no-track to git branch to avoid it displaying a message about
setting up tracking branches. Of course there's no reason to make the
git-annex branch a tracking branch since git-annex auto-merges it.
2021-03-23 15:23:13 -04:00
yarikoptic
ed5fd5b896 Added a comment 2021-03-23 18:43:32 +00:00