Commit graph

11632 commits

Author SHA1 Message Date
Joey Hess
c29e2ed916
comment 2022-11-29 12:49:09 -04:00
Joey Hess
e7e7007e0c
explanss and close 2022-11-29 12:46:59 -04:00
cnjr2
3edc86ffe1 2022-11-29 12:29:42 +00:00
Joey Hess
2b5e6ff20a
test: Add --test-debug option
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2022-11-28 15:12:53 -04:00
Joey Hess
022c14ec02
comment 2022-11-28 14:07:35 -04:00
beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec
85ea582e2b 2022-11-24 07:11:26 +00:00
yarikoptic
e140b36fd5 initial report on stalled test 2022-11-23 03:15:51 +00:00
Joey Hess
63b33d4181
update 2022-11-18 16:23:20 -04:00
Joey Hess
3e20daccdd
more benchmarking work 2022-11-18 15:54:06 -04:00
Joey Hess
2b014f1a8b
don't frontload reconcileStaged in git-annex init
init: Avoid scanning for annexed files, which can be lengthy in a
large repository. Instead that scan is done on demand. This lets git-annex
init be run and some query commands be used in a repository without
waiting.

Note that autoinit already behaved this way, so while this will mean some
commands like git-annex get/unlock/add will do the scan the first time run,
that is not really a significant behavior change.

And, it's really better to have a consistent behavior. The reason for
the inconsistency was a strange bug discussed in
b3c4579c79. Avoiding reconcileStaged in
init will keep avoiding whatever that was.

Sponsored-by: Dartmouth College's DANDI project
2022-11-18 13:58:47 -04:00
Joey Hess
c834d2025a
queue more changes to keys db
Increasing the size of the queue 10x makes git-annex init 7% faster in a
repository with 86000 annexed files.

The memory use goes up, from 70876 kb to 85376 kb.
2022-11-18 13:29:34 -04:00
Joey Hess
8fcee4ac9d
Sped up the initial scanning for annexed files by 15%
Avoids database querying overhead when the database is newly created.

In the large repository where git-annex init took 24 seconds, this sped it
up to 20.47 seconds, a speedup of around 15%.

Sponsored-by: Dartmouth College's DANDI project
2022-11-18 13:16:57 -04:00
Joey Hess
814bb3a270
profiling and performance 2022-11-17 16:47:51 -04:00
Joey Hess
a91bf72684
Merge branch 'master' of ssh://git-annex.branchable.com 2022-11-09 16:29:16 -04:00
Joey Hess
b2cc63d5bf
export: fix multi-file delete bug
export: Fix a bug that left a file on a special remote when two files with
the same content were both deleted in the exported tree.

Case of the wrong data structure leading to the wrong result.
The DiffMap now contains all the old filenames, and all the new filenames.

Note that, when 2 files with the same content are both renamed,
it only renames the first, but deletes and re-exports the second.
Improving that is possible, but it would need to use a different temporary
filename. Anyway, that is an unusual case, and there are known to be other
unusual cases where export does not rename with maximum efficiency, IIRC.
(Or maybe this is the case that I remember?)

Sponsored-by: Dartmouth College's OpenNeuro project
2022-11-09 16:24:37 -04:00
nell@7201fe78ade251118ef3441f4e509b37cd836503
ca0c62f289 Added a comment 2022-11-09 19:51:20 +00:00
Joey Hess
56d563577f
analysis 2022-11-09 15:46:14 -04:00
Joey Hess
cdbee87a05
comment and retitle 2022-11-09 15:18:28 -04:00
Joey Hess
94ee608b2e
comment 2022-11-09 15:14:28 -04:00
Joey Hess
5409b0c5c4
close 2022-11-09 14:32:14 -04:00
nell@7201fe78ade251118ef3441f4e509b37cd836503
a219d24c8c Adding S3 special remote export bug report 2022-11-08 22:37:14 +00:00
jkniiv
9414eb0745 report on troubles with aws-0.23 and stack builds 2022-11-08 11:36:21 +00:00
Joey Hess
4f6c6114fb
avoid splitting repo tests into too small parts around -J16
The initTests have to be run once per part, and a point of diminishing
returns can be reached where more work is being done to set up for 1 or
2 tests than to run them.

This is better than a hard cap of -J8 or so, because it lets other
things than these particular tests still be parallelized at -J16.

Sponsored-by: Dartmouth College's Datalad project
2022-11-07 14:44:51 -04:00
Joey Hess
5e6cb47bd8
coment 2022-11-07 14:28:55 -04:00
Joey Hess
38e67e169d
followup 2022-11-07 12:26:04 -04:00
yarikoptic
7698e25f6f initial report on stalling test 2022-11-07 14:31:55 +00:00
xloem
8b5b078df3 2022-11-07 10:33:22 +00:00
yarikoptic
99b40cd6a2 Added a comment 2022-11-04 12:41:47 +00:00
Joey Hess
14f7a386f0
Make git-annex enable-tor work when using the linux standalone build
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:

[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found

Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.

Sponsored-by: Kevin Mueller on Patreon
2022-10-26 15:45:08 -04:00
Joey Hess
a8ce8ac75d
comment 2022-10-26 14:54:38 -04:00
Joey Hess
731e806c96
use lookupKeyStaged in --batch code paths
Make --batch mode handle unstaged annexed files consistently whether the
file is unlocked or not. Before this, a unstaged locked file
would have the symlink on disk examined and operated on in --batch mode,
while an unstaged unlocked file would be skipped.

Note that, when not in batch mode, unstaged files are skipped over too.
That is actually somewhat new behavior; as late as 7.20191114 a
command like `git-annex whereis .` would operate on unstaged locked
files and skip over unstaged unlocked files. That changed during
optimisation of CmdLine.Seek with apparently little fanfare or notice.

Turns out that rmurl still behaved that way when given an unstaged file
on the command line. It was changed to use lookupKeyStaged to
handle its --batch mode. That also affected its non-batch mode, but
since that's just catching up to the change earlier made to most
other commands, I have not mentioed that in the changelog.

It may be that other uses of lookupKey should also change to
lookupKeyStaged. But it may also be that would slow down some things,
or lead to unwanted behavior changes, so I've kept the changes minimal
for now.

An example of a place where the use of lookupKey is better than
lookupKeyStaged is in Command.AddUrl, where it looks to see if the file
already exists, and adds the url to the file when so. It does not matter
there whether the file is staged or not (when it's locked). The use of
lookupKey in Command.Unused likewise seems good (and faster).

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-10-26 14:43:06 -04:00
Joey Hess
1944549a38
comment 2022-10-26 12:58:10 -04:00
jwodder
0d853cef01 2022-10-25 16:18:09 +00:00
gyurmo.gyuri@8d622eb91a0312fcb7e63e4f47a6e191c417a0c8
5e52325147 2022-10-23 08:30:57 +00:00
Joey Hess
cde2e61105
improve sqlite retrying behavior
Avoid hanging when a suspended git-annex process is keeping a sqlite
database locked.

Sponsored-by: Dartmouth College's Datalad project
2022-10-18 15:47:20 -04:00
Joey Hess
3149a1e2fe
More robust handling of ErrorBusy when writing to sqlite databases
While ErrorBusy and other exceptions were caught and the write retried for
up to 10 seconds, it was still possible for git-annex to eventually
give up and error out without writing to the database. Now it will retry
as long as necessary.

This does mean that, if one git-annex process is suspended just as sqlite
has locked the database for writing, another git-annex that tries to write
it it might get stuck retrying forever. But, that could already happen when
opening the sqlite database, which retries forever on ErrorBusy. This is an
area where git-annex is known to not behave well, there's a todo about the
general case of it.

Sponsored-by: Dartmouth College's Datalad project
2022-10-17 15:56:19 -04:00
Joey Hess
22f9598dfb
Merge branch 'master' of ssh://git-annex.branchable.com 2022-10-12 15:54:12 -04:00
Joey Hess
d5cd1de280
update and open a todo about something I'm pondering 2022-10-12 15:53:56 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3
5bd79e717f Added a comment 2022-10-12 06:12:17 +00:00
Joey Hess
b312b2a30b
update 2022-10-11 15:07:52 -04:00
Joey Hess
c2ad84b423
all keys are still present on versioned remote after import of a tree
When importing from versioned remotes, fix tracking of the content of
deleted files.

Only S3 supports versioning so far, so only it was affected.

But, the draft import/export interface for external remotes also seemed to
need a change, so that versionedExport could be set.
2022-10-11 13:05:40 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3
e22c3b3d7c 2022-10-11 09:12:00 +00:00
Joey Hess
4a42c69092
take lock in checkLogFile and calcLogFile
move: Fix openFile crash with -J

This does make them a bit slower, although usually the log file is not
very big, so even when it's being rewritten, they will not block for
long taking the lock. Still, little slowdowns may add up when moving a lot
file files.

A less expensive fix would be to use something lower level than openFile
that does not check if the file is already open for write by another
thread. But GHC does not seem to provide anything convenient; even mkFD
checks for a writing thread.

fullLines is no longer necessary since these functions no longer will
read the file while it's being written.

Sponsored-by: Dartmouth College's DANDI project
2022-10-07 13:19:17 -04:00
Joey Hess
85dbc21c1c
fix typo 2022-10-07 12:30:07 -04:00
jkniiv
b7d189d6c0 Added a comment 2022-10-06 16:04:41 +00:00
yarikoptic
91c9a27c5a Added a comment 2022-10-06 12:50:01 +00:00
yarikoptic
77825cadfa removed 2022-10-06 12:47:48 +00:00
yarikoptic
04117f0e52 Added a comment 2022-10-06 12:47:17 +00:00
jkniiv
c9bf143fc8 Added a comment 2022-10-06 06:21:40 +00:00
yarikoptic
5761ea969f 2022-10-06 01:37:24 +00:00