Commit graph

42775 commits

Author SHA1 Message Date
nell@7201fe78ade251118ef3441f4e509b37cd836503
a219d24c8c Adding S3 special remote export bug report 2022-11-08 22:37:14 +00:00
jkniiv
9414eb0745 report on troubles with aws-0.23 and stack builds 2022-11-08 11:36:21 +00:00
Joey Hess
4f6c6114fb
avoid splitting repo tests into too small parts around -J16
The initTests have to be run once per part, and a point of diminishing
returns can be reached where more work is being done to set up for 1 or
2 tests than to run them.

This is better than a hard cap of -J8 or so, because it lets other
things than these particular tests still be parallelized at -J16.

Sponsored-by: Dartmouth College's Datalad project
2022-11-07 14:44:51 -04:00
Joey Hess
5e6cb47bd8
coment 2022-11-07 14:28:55 -04:00
Joey Hess
38e67e169d
followup 2022-11-07 12:26:04 -04:00
yarikoptic
7698e25f6f initial report on stalling test 2022-11-07 14:31:55 +00:00
xloem
8b5b078df3 2022-11-07 10:33:22 +00:00
Joey Hess
ad7dfdb37f
Merge branch 'master' of ssh://git-annex.branchable.com 2022-11-04 16:21:20 -04:00
Joey Hess
e03cf52504
done 2022-11-04 16:20:51 -04:00
Joey Hess
e100993935
complete support for S3 signature=anonymous
aws-0.23 has been released.

When built with an older aws, initremote will error out when run
with signature=anonymous. And when a remote has been initialized with
that by a version of git-annex that does support it, older versions will
fail when the remote is accessed, with a useful error message.

Sponsored-by: Dartmouth College's DANDI project
2022-11-04 16:20:28 -04:00
Joey Hess
f3fbdddb8a
revert demo changes to stack.yaml
done to test aws before releasing it
2022-11-04 15:11:53 -04:00
Joey Hess
de1e8201a6
Merge branch 'master' into anons3 2022-11-04 15:08:29 -04:00
yarikoptic
99b40cd6a2 Added a comment 2022-11-04 12:41:47 +00:00
Joey Hess
bc980815ee
fix build of helper program
broken by ba7ecbc6a9

Sponsored-by: Svenne Krap on Patreon
2022-11-03 14:11:49 -04:00
Joey Hess
95405c067f
add news item for git-annex 10.20221103 2022-11-03 14:08:29 -04:00
Joey Hess
d22bd53310
releasing package git-annex version 10.20221103 2022-11-03 14:07:53 -04:00
Joey Hess
9dc3acc95d
comment, update tip 2022-10-31 12:16:36 -04:00
xloem
bf27a02b07 Added a comment: ipfs 2022-10-31 13:36:51 +00:00
Stefan
813bc50cb3 Added a comment: This guide fails with "fatal: refusing to merge unrelated histories" 2022-10-29 10:28:19 +00:00
Joey Hess
9187a37dec
fix build warning 2022-10-27 10:21:24 -04:00
Joey Hess
14f7a386f0
Make git-annex enable-tor work when using the linux standalone build
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:

[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found

Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.

Sponsored-by: Kevin Mueller on Patreon
2022-10-26 15:45:08 -04:00
Joey Hess
a8ce8ac75d
comment 2022-10-26 14:54:38 -04:00
Joey Hess
731e806c96
use lookupKeyStaged in --batch code paths
Make --batch mode handle unstaged annexed files consistently whether the
file is unlocked or not. Before this, a unstaged locked file
would have the symlink on disk examined and operated on in --batch mode,
while an unstaged unlocked file would be skipped.

Note that, when not in batch mode, unstaged files are skipped over too.
That is actually somewhat new behavior; as late as 7.20191114 a
command like `git-annex whereis .` would operate on unstaged locked
files and skip over unstaged unlocked files. That changed during
optimisation of CmdLine.Seek with apparently little fanfare or notice.

Turns out that rmurl still behaved that way when given an unstaged file
on the command line. It was changed to use lookupKeyStaged to
handle its --batch mode. That also affected its non-batch mode, but
since that's just catching up to the change earlier made to most
other commands, I have not mentioed that in the changelog.

It may be that other uses of lookupKey should also change to
lookupKeyStaged. But it may also be that would slow down some things,
or lead to unwanted behavior changes, so I've kept the changes minimal
for now.

An example of a place where the use of lookupKey is better than
lookupKeyStaged is in Command.AddUrl, where it looks to see if the file
already exists, and adds the url to the file when so. It does not matter
there whether the file is staged or not (when it's locked). The use of
lookupKey in Command.Unused likewise seems good (and faster).

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-10-26 14:43:06 -04:00
Joey Hess
b2ee2496ee
remove whenAnnexed and ifAnnexed
In preparation for adding a new variation on lookupKey.

Sponsored-by: Max Thoursie on Patreon
2022-10-26 14:06:32 -04:00
Joey Hess
1944549a38
comment 2022-10-26 12:58:10 -04:00
jwodder
0d853cef01 2022-10-25 16:18:09 +00:00
gyurmo.gyuri@8d622eb91a0312fcb7e63e4f47a6e191c417a0c8
5e52325147 2022-10-23 08:30:57 +00:00
AlexPraga
e4355a6f33 Added a comment 2022-10-22 15:12:26 +00:00
Joey Hess
cde2e61105
improve sqlite retrying behavior
Avoid hanging when a suspended git-annex process is keeping a sqlite
database locked.

Sponsored-by: Dartmouth College's Datalad project
2022-10-18 15:47:20 -04:00
Joey Hess
3149a1e2fe
More robust handling of ErrorBusy when writing to sqlite databases
While ErrorBusy and other exceptions were caught and the write retried for
up to 10 seconds, it was still possible for git-annex to eventually
give up and error out without writing to the database. Now it will retry
as long as necessary.

This does mean that, if one git-annex process is suspended just as sqlite
has locked the database for writing, another git-annex that tries to write
it it might get stuck retrying forever. But, that could already happen when
opening the sqlite database, which retries forever on ErrorBusy. This is an
area where git-annex is known to not behave well, there's a todo about the
general case of it.

Sponsored-by: Dartmouth College's Datalad project
2022-10-17 15:56:19 -04:00
Joey Hess
0d762acf7e
update comment, probably not a sqlite bug
Sqlite's page documenting WAL mode changed in Oct 2016 to mention ways
that queries could fail with SQLITE_BUSY.

http://web.archive.org/web/20161009044054/http://www.sqlite.org:80/wal.html

Probably not cooincidentally, I emailed sqlite-users about such a
situation in Feb 2015.
https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg90580.html

Noone ever replied to me, but at least now I understand why it does that.
Since it's documented now, it's no longer a bug.
2022-10-17 15:09:47 -04:00
Joey Hess
f8fe393052
comment 2022-10-17 12:37:57 -04:00
Joey Hess
b953bd0e7b
Merge branch 'master' of ssh://git-annex.branchable.com 2022-10-14 15:14:11 -04:00
AlexPraga
7b9e3dc5fb 2022-10-14 09:32:44 +00:00
AlexPraga
2cad4b95f4 2022-10-14 09:28:38 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
dbca1781d9 Update WSL1 tips 2022-10-13 15:29:12 +00:00
Joey Hess
a8a5d444c5
test v10 2022-10-12 16:13:30 -04:00
Joey Hess
22f9598dfb
Merge branch 'master' of ssh://git-annex.branchable.com 2022-10-12 15:54:12 -04:00
Joey Hess
d5cd1de280
update and open a todo about something I'm pondering 2022-10-12 15:53:56 -04:00
Joey Hess
6fbd337e34
avoid uncessary keys db writes; doubled speed!
When running eg git-annex get, for each file it has to read from and
write to the keys database. But it's reading exclusively from one table,
and writing to a different table. So, it is not necessary to flush the
write to the database before reading. This avoids writing the database
once per file, instead it will buffer 1000 changes before writing.

Benchmarking getting 1000 small files from a local origin,
git-annex get now takes 13.62s, down from 22.41s!
git-annex drop now takes 9.07s, down from 18.63s!
Wowowowowowowow!

(It would perhaps have been better if there were separate databases for
the two tables. At least it would have avoided this complexity. Ah well,
this is better than splitting the table in a annex.version upgrade.)

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 15:33:16 -04:00
Joey Hess
ba7ecbc6a9
avoid flushing keys db queue after each Annex action
The flush was only done Annex.run' to make sure that the queue was flushed
before git-annex exits. But, doing it there means that as soon as one
change gets queued, it gets flushed soon after, which contributes to
excessive writes to the database, slowing git-annex down.
(This does not yet speed git-annex up, but it is a stepping stone to
doing so.)

Database queues do not autoflush when garbage collected, so have to
be flushed explicitly. I don't think it's possible to make them
autoflush (except perhaps if git-annex sqitched to using ResourceT..).
The comment in Database.Keys.closeDb used to be accurate, since the
automatic flushing did mean that all writes reached the database even
when closeDb was not called. But now, closeDb or flushDb needs to be
called before stopping using an Annex state. So, removed that comment.

In Remote.Git, change to using quiesce everywhere that it used to use
stopCoProcesses. This means that uses on onLocal in there are just as
slow as before. I considered only calling closeDb on the local git remotes
when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses
in each onLocal is so as not to leave git processes running that have files
open on the remote repo, when it's on removable media. So, it seemed to make
sense to also closeDb after each one, since sqlite may also keep files
open. Although that has not seemed to cause problems with removable
media so far. It was also just easier to quiesce in each onLocal than
once at the end. This does likely leave performance on the floor, so
could be revisited.

In Annex.Content.saveState, there was no reason to close the db,
flushing it is enough.

The rest of the changes are from auditing for Annex.new, and making
sure that quiesce is called, after any action that might possibly need
it.

After that audit, I'm pretty sure that the change to Annex.run' is
safe. The only concern might be that this does let more changes get
queued for write to the db, and if git-annex is interrupted, those will be
lost. But interrupting git-annex can obviously already prevent it from
writing the most recent change to the db, so it must recover from such
lost data... right?

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 14:12:23 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3
5bd79e717f Added a comment 2022-10-12 06:12:17 +00:00
Joey Hess
b312b2a30b
update 2022-10-11 15:07:52 -04:00
Joey Hess
c2ad84b423
all keys are still present on versioned remote after import of a tree
When importing from versioned remotes, fix tracking of the content of
deleted files.

Only S3 supports versioning so far, so only it was affected.

But, the draft import/export interface for external remotes also seemed to
need a change, so that versionedExport could be set.
2022-10-11 13:05:40 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3
e22c3b3d7c 2022-10-11 09:12:00 +00:00
Joey Hess
b4305315b2
S3: pass fileprefix into getBucket calls
S3: Speed up importing from a large bucket when fileprefix= is set by only
asking for files under the prefix.

getBucket still returns the files with the prefix included, so the rest of
the fileprefix stripping still works unchanged.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:37:26 -04:00
Joey Hess
ca91c3ba91
S3: Support signature=anonymous to access a S3 bucket anonymously
This can be used, for example, with importtree=yes to import from a public
bucket.

This needs a patch that has not yet landed in the aws library, and will
need to be adjusted to support compiling with old versions of the library,
so is not yet suitable for merging.
See https://github.com/aristidb/aws/pull/281

The stack.yaml changes are provided to show how to build against the aws
fork and will need to be reverted as well.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:02:45 -04:00
Joey Hess
90f9671e00
future proof AWS.Credentials generation
Avoid breaking when a field is added to the constructor.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 16:33:21 -04:00
Joey Hess
4a42c69092
take lock in checkLogFile and calcLogFile
move: Fix openFile crash with -J

This does make them a bit slower, although usually the log file is not
very big, so even when it's being rewritten, they will not block for
long taking the lock. Still, little slowdowns may add up when moving a lot
file files.

A less expensive fix would be to use something lower level than openFile
that does not check if the file is already open for write by another
thread. But GHC does not seem to provide anything convenient; even mkFD
checks for a writing thread.

fullLines is no longer necessary since these functions no longer will
read the file while it's being written.

Sponsored-by: Dartmouth College's DANDI project
2022-10-07 13:19:17 -04:00
Joey Hess
85dbc21c1c
fix typo 2022-10-07 12:30:07 -04:00