Commit graph

41932 commits

Author SHA1 Message Date
Joey Hess
8b4f7af605
Merge branch 'master' of ssh://git-annex.branchable.com 2022-07-20 13:34:37 -04:00
Joey Hess
a0746d2027
fixed 2022-07-20 13:32:26 -04:00
Joey Hess
05b96a1acf
Merge branch 'append' 2022-07-20 13:24:04 -04:00
Joey Hess
4e88137a28
prevent appends except when annex.alwayscompact=false
I would like for a new repo version to enable appends, but to do so
safely would need a v11 followed by a 1 year delay followed by a v12
that does it. Since a similar v9 and v10 transition is currently
happening, and is less than 6 months along in most repos, it does not
feel wise to stack up another year-long transition behind that. What if
I need to hurry up a new repo version for some other change?

Added todo so I remember to make this change at some time when a v11
and probably v12 repo version do make sense.

Sponsored-by: Dartmouth College's DANDI project
2022-07-20 13:23:55 -04:00
Joey Hess
d275874e6c
handling of interrupted appends
An append that is interrupted and writes part of a line is now dealt
with by subsequent reads and appends. This also handles a read that
happens at the same time as an append to the file.

Old versions of git-annex will still see a partially written line,
and could get confused. Since appends are currently done for url logs
and location logs, the confusion is limited to a substring of the actual
url or UUID of the remote being read. This will not affect writes, since
the journal file is locked when reading in preparation for writing.
However, the bad data can be output by git-annex and used by other
things, or could cause surprising behavior by git-annex, including, e.g.,
downloading the content of the wrong url.

So, something needs to be done to prevent old versions of git-annex from
running in a repository where this appending is being done.

Sponsored-by: Dartmouth College's DANDI project
2022-07-20 12:40:49 -04:00
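The recovery rule described in d275874e6c can be illustrated with a small sketch. It is written in Python rather than git-annex's Haskell, and the helper names `discard_incomplete_append` and `append_line` are hypothetical, not git-annex functions. It assumes a log made of newline-terminated lines, where anything after the last newline is what an interrupted append left behind, and it assumes the caller already holds whatever lock serialises writers.

```python
import os
import tempfile


def discard_incomplete_append(data: bytes) -> bytes:
    """Return data up to and including the last newline.

    Anything after the last newline is assumed to be a line that an
    interrupted append only partly wrote, so readers ignore it.
    """
    end = data.rfind(b"\n")
    # rfind returns -1 when there is no newline, which yields b"".
    return data[: end + 1]


def append_line(path: str, line: bytes) -> None:
    """Append one line, first dropping any partly written trailing line.

    Assumption: the caller holds the lock that serialises writers.
    """
    if not line.endswith(b"\n"):
        line += b"\n"
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        good = discard_incomplete_append(f.read())
        f.seek(len(good))
        f.truncate()  # cut off the incomplete line, if there was one
        f.write(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "journal")
        with open(p, "wb") as f:
            f.write(b"line one\nline tw")  # simulated interrupted append
        append_line(p, b"line three")
        with open(p, "rb") as f:
            print(f.read())  # b'line one\nline three\n'
```

A reader that applies `discard_incomplete_append` to whatever it reads never sees the partial line, which is also why a read racing with an append is harmless in this sketch. An older reader without that step would still see the fragment, which is the compatibility problem the commit message raises.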
Joey Hess
6f1fd3abdd
no locking of journal on read after all
Finally have a final design, and it turns out not to need locking on read.
2022-07-20 10:57:28 -04:00
Joey Hess
c933b0074f
comment 2022-07-19 18:10:45 -04:00
yarikoptic
78642ededb Added a comment 2022-07-19 21:29:02 +00:00
Joey Hess
d832a7c211
thoughts 2022-07-19 15:45:08 -04:00
akspecs@fae4f3d58a0c6c9d50f01a850ce53d425e1a90ba
d23c281856 Added a comment: cannot reproduce 2022-07-19 12:53:46 +00:00
arnaud@c1d1cc612a3921dc06a417301be08a3e125478c4
b566ef839e 2022-07-19 11:22:26 +00:00
daven.quinn@d0ed4e0e5e4462d9a74a5d5a8fbd1b17f85db13e
0568fc872d 2022-07-19 07:57:37 +00:00
Joey Hess
fd6e01d9b6
comment 2022-07-18 17:02:41 -04:00
Joey Hess
2cb634c373
comment 2022-07-18 16:56:31 -04:00
Joey Hess
71b4a6ba26
Merge branch 'master' into append 2022-07-18 16:46:01 -04:00
Joey Hess
d0860b7f0e
fix build
After 28b0aaea54
2022-07-18 16:44:32 -04:00
Joey Hess
28b0aaea54
re-add lock journal before reading journal files
This reverts commit 2e6e9876e3.

This is gonna be needed after all. The append will only be atomic if
the journal is locked, because the file being appended will have to be
moved out of the way to avoid an old version of git-annex seeing an
incomplete write to it. When git-annex finds that the file is not in the
journal, and checks the append location, locking will be needed to avoid
a race causing it to miss it in the append location too due to it being
moved back to the journal.
2022-07-18 16:40:25 -04:00
Joey Hess
36f0bdcd57
add annex.alwayscompact
Added annex.alwayscompact setting which can be unset to speed up writes to
the git-annex branch in some cases.

Sponsored-by: Dartmouth College's DANDI project
2022-07-18 16:39:19 -04:00
yarikoptic
cab61b88e0 Added a comment 2022-07-18 19:23:02 +00:00
Joey Hess
b04435ea27
Merge branch 'master' of ssh://git-annex.branchable.com 2022-07-18 14:45:23 -04:00
Joey Hess
efee53f433
comments 2022-07-18 14:45:03 -04:00
Joey Hess
ccff639651
Merge branch 'master' into append 2022-07-18 14:17:15 -04:00
Joey Hess
de18d92de6
efficient but unsafe journal file append
This is only for checking performance, it's not safe.

Sponsored-by: Dartmouth College's DANDI project
2022-07-18 14:17:12 -04:00
Joey Hess
1c40b927aa
minor optimisation
Avoid re-writing the file when the journal directory did not
exist.
2022-07-18 13:50:35 -04:00
Joey Hess
2e6e9876e3
Revert "lock journal before reading journal files"
This reverts commit 47358a6f95.

This added overhead, and will not be needed, because appends are going
to have to be made atomic for other reasons than avoiding incomplete
reads of data being appended.

In particular, when git-annex is interrupted in the middle of an append,
it must not leave the file with a partially written line. So appending
has to somehow be made fully atomic.
2022-07-18 13:38:12 -04:00
Joey Hess
ce455223df
split out appending to journal from writing, high level only
Currently this is not an improvement, but it allows for optimising
appendJournalFile later. With an optimised appendJournalFile, this will
greatly speed up access patterns like git-annex addurl of a lot of urls
to the same key, where the log file can grow rather large. Appending
rather than re-writing the journal file for each line can save a lot of
disk writes.

It still has to read the current journal or branch file, to check
if it can append to it, and so when the journal file does not exist yet,
it can write the old content from the branch to it. Probably the re-reads
are better cached by the filesystem than repeated writes. (If the
re-reads turn out to keep performance bad, they could be eliminated, at
the cost of not being able to compact the log when replacing old
information in it. That could be enabled by a switch.)

While the immediate need is to affect addurl writes, it was implemented
at the level of presence logs, so will also perhaps speed up location logs.
The only added overhead is the call to isNewInfo, which only needs to
compare ByteStrings. Helping to balance that out, it avoids compactLog
when it's able to append.

Sponsored-by: Dartmouth College's DANDI project
2022-07-18 13:22:50 -04:00
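The append-or-rewrite decision described in ce455223df can be sketched roughly as follows. This is Python, not git-annex's Haskell. `is_new_info`, `compact_log`, and `add_log_line` are simplified stand-ins that borrow names from the commit message, and the three-field line format (`timestamp status info`) is an assumption made for illustration only.

```python
from typing import Dict, List, Tuple


def info_of(line: bytes) -> bytes:
    """Return the info field (e.g. a UUID or url): everything after the
    timestamp and status fields of the assumed line format."""
    return line.split(b" ", 2)[2]


def is_new_info(new_line: bytes, old_lines: List[bytes]) -> bool:
    """True when no existing line carries the same info, so nothing in
    the log would be replaced and the new line can simply be appended.
    Only byte strings are compared."""
    target = info_of(new_line)
    return all(info_of(old) != target for old in old_lines)


def compact_log(lines: List[bytes]) -> List[bytes]:
    """Keep only the newest line for each info value."""
    newest: Dict[bytes, Tuple[float, bytes]] = {}
    for line in lines:
        ts = float(line.split(b" ", 1)[0].rstrip(b"s").decode())
        info = info_of(line)
        if info not in newest or ts >= newest[info][0]:
            newest[info] = (ts, line)
    return [line for _, line in newest.values()]


def add_log_line(old_lines: List[bytes], new_line: bytes) -> Tuple[str, List[bytes]]:
    """Return ("append", [new_line]) when appending is enough, otherwise
    ("rewrite", full compacted log)."""
    if is_new_info(new_line, old_lines):
        return ("append", [new_line])
    return ("rewrite", compact_log(old_lines + [new_line]))


if __name__ == "__main__":
    log = [b"1658160000s 1 uuid-a"]
    print(add_log_line(log, b"1658160100s 1 uuid-b")[0])  # append
    print(add_log_line(log, b"1658160200s 0 uuid-a")[0])  # rewrite
```

In this sketch the cheap path writes a single line, while the rewrite path is taken only when an older entry for the same info would need to be replaced, which matches the trade-off the commit message describes between append speed and log compaction.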
Ilya_Shlyakhter
1155bbb3db added suggestion to record ETags in URL- keys 2022-07-18 16:43:28 +00:00
g@aaed65f19d6c3a2a18c33da828e66c7bb915e65a
14e8b42c77 removed duplicated content 2022-07-18 16:42:10 +00:00
Joey Hess
2ce1eaf56a
Merge branch 'master' into append 2022-07-18 12:38:17 -04:00
g@aaed65f19d6c3a2a18c33da828e66c7bb915e65a
f8d454b493 Add new bug "creating failure getting/copying on git-lfs remote (gcrypt)" 2022-07-18 16:18:42 +00:00
g@aaed65f19d6c3a2a18c33da828e66c7bb915e65a
36dd9c124f Add new bug "creating failure getting/copying on git-lfs remote (gcrypt)" 2022-07-18 16:17:33 +00:00
Ilya_Shlyakhter
0b3f1adf32 Added a comment: checksums and addurl --fast 2022-07-18 16:01:41 +00:00
cehteh
aff50de9c5 Added a comment 2022-07-18 13:19:44 +00:00
oliv5
dd8a3f2d21 Added a comment 2022-07-17 20:41:51 +00:00
Atemu
2f09ebaf4d 2022-07-17 13:49:11 +00:00
Atemu
4a78214bde 2022-07-17 13:17:22 +00:00
Lukey
5a38f32207 Added a comment 2022-07-17 09:33:04 +00:00
tomdhunt
aa3ef581c6 Added a comment: Hashes for files added via addurl 2022-07-16 20:12:20 +00:00
Joey Hess
4b520e0683
increase cabal-version to work with recent cabal
It started complaining about custom setup needing too old a version of
cabal, a very confusing error message.

1.12 is the version of Cabal on the i386ancient builder.

Sponsored-by: Jack Hill on Patreon
2022-07-16 14:57:29 -04:00
dwhitman44@5e9829794550d547215bc1a5a197bde8e0c2d741
4f055f2bc6 2022-07-16 17:03:03 +00:00
yarikoptic
eee1169ad5 Added a comment 2022-07-15 19:33:50 +00:00
Joey Hess
ee8acd5b5d
Merge branch 'master' of ssh://git-annex.branchable.com 2022-07-15 15:07:05 -04:00
Joey Hess
8bc9381d8d
design work 2022-07-15 15:06:40 -04:00
Joey Hess
47358a6f95
lock journal before reading journal files
This is not currently necessary; journal files are updated atomically.

However, for faster appends to large journal files, locking on read will
be needed, because appends are not atomic.

Sponsored-by: Dartmouth College's DANDI project
2022-07-15 14:43:29 -04:00
Joey Hess
a2b1f369d1
disable journalIgnorable in enableInteractiveBranchAccess
Fix a regression that prevented --batch commands (and the assistant)
from noticing data written to the journal by other commands.

I have not identified which commit broke this for sure,
but probably it was aeca7c2207

--batch commands that wrote to the journal avoided the problem since
journalIgnorable gets unset on write. It's a little bit surprising that
nobody noticed that query --batch commands did not see data written by
other commands.

Sponsored-by: Dartmouth College's DANDI project
2022-07-15 13:48:41 -04:00
Joey Hess
91abd872d3
complete a comment 2022-07-15 12:59:59 -04:00
nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9
4f66f036e6 2022-07-15 16:26:18 +00:00
nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9
bfcdf8374b Added a comment 2022-07-15 15:55:38 +00:00
Joey Hess
dc2de5784a
comment 2022-07-15 11:14:13 -04:00
Joey Hess
7c8c5ffe8e
Merge branch 'master' of ssh://git-annex.branchable.com 2022-07-15 11:10:44 -04:00