Commit graph

1460 commits

Author SHA1 Message Date
Joey Hess
04ec726d3b
S3 region=
S3: Support a region= configuration useful for some non-Amazon S3
implementations. This feature needs git-annex to be built with aws-0.24.

datacenter= sets both the AWS hostname and region in one setting, which is
easy when using AWS, but not useful for other hosts. So kept datacenter
as-is, but added this additional config.

Sponsored-By: Brett Eisenberg on Patreon
2023-02-06 14:08:45 -04:00
Joey Hess
65167463aa
releasing package git-annex version 10.20230126 2023-01-26 15:27:32 -04:00
Joey Hess
3a08c80dd8
improve wording 2023-01-24 14:11:32 -04:00
Joey Hess
a6c1d9752b
move/copy: option parsing for --from with --to
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!

The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.

Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.

Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.

Sponsored-by: Dartmouth College's DANDI project
2023-01-18 14:42:39 -04:00
Joey Hess
f8bc208e89
findkeys: New command, very similar to git-annex find but operating on keys
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.

Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.

A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.

Also, cleaned up comments on git-annex-find man page asking for --all
option.

Sponsored-by: Dartmouth College's DANDI project
2023-01-17 14:51:57 -04:00
Joey Hess
cfaae7e931
added an optional cost= configuration to all special remotes
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.

In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.

Sponsored-by: Dartmouth College's DANDI project
2023-01-12 13:42:28 -04:00
Joey Hess
6fa166e1fc
web: Add urlinclude and urlexclude configuration settings
Sponsored-by: Dartmouth College's DANDI project
2023-01-09 17:16:53 -04:00
Joey Hess
8d06930c88
web special remote is no longer a singleton
Allow initremote of additional special remotes with type=web, in addition
to the default web special remote.

When --sameas=web is used, these provide additional names for the web
special remote, and may also have their own additional configuration
(once there is any for the web special remote) and cost.

Sponsored-by: Dartmouth College's DANDI project
2023-01-09 15:49:20 -04:00
Joey Hess
f316b7f105
Revert "Removed the vendored git-lfs and the GitLfs build flag"
This reverts commit efda811404.

Turns out that datalad is building git-annex against debian bullseye.
https://github.com/datalad/git-annex/issues/149
2023-01-04 17:33:29 -04:00
Joey Hess
7919b9d57a
changelog for cf892f4256 2023-01-02 12:36:10 -04:00
Joey Hess
efda811404
Removed the vendored git-lfs and the GitLfs build flag
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.

Debian stable does have a too old haskell-git-lfs package to be able to
build git-annex from source, but there is not currently a backport of a
recent git-annex to Debian stable. And if they update the backport at some
point, they should be able to backport the library too.

Sponsored-by: Svenne Krap on Patreon
2022-12-26 12:49:53 -04:00
Joey Hess
d475f82c62
Added libgcc_s.so.1 to the linux standalone build so pthread_cancel will work
In Makefile, listed additional deps of Build/Standalone. Without that,
it does not get updated for the change to Utility/LinuxMkLibs.hs when
compiling incrementally.

Sponsored-by: Dartmouth College's DANDI project
2022-12-22 15:15:25 -04:00
Joey Hess
eb8e0594bb
use status --ignore-submodules in configureSmudgeFilter
Speed up git-annex upgrade (from v5) and init in a repository that has
submodules. Setting the config does not affect the submodules, so avoid
the work of getting status in them, which may involve using the smudge
filter etc.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2022-12-20 16:02:42 -04:00
Joey Hess
0b2dd374d8
--anything and --nothing
Added --anything (and --nothing). Eg, git-annex find --anything will list
all annexed files whether or not the content is present. This is slightly
faster and clearer than --include=* or --exclude=*

While I can't imagine how --nothing will be used, preferred content
expressions already had anything and nothing, so might as well support both
as matching options as well.

Sponsored-by: Dartmouth College's Datalad project
2022-12-20 15:44:09 -04:00
Joey Hess
9d60385001
convert renameFile to moveFile to support cross-device moves
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.

The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.

As well as auditing for renameFile, also audited for createLink, all of
those are ok as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.

Sponsored-by: Dartmouth College's Datalad project
2022-12-20 15:17:50 -04:00
Joey Hess
aa6919737c
--metadata lexicographical comparisons
Change --metadata comparisons < > <= and >= to fall back to lexicographical
comparisons when one or both values being compared are not numbers.

Sponsored-by: Erik Bjäreholt on Patreon
2022-12-12 13:33:24 -04:00
Joey Hess
ab11fd70e2
releasing package git-annex version 10.20221212 2022-12-12 12:51:59 -04:00
Joey Hess
65f9e7a3c7
fix deadlock in restagePointerFiles
Fix a hang that occasionally occurred during commands such as move.
(A bug introduced in 10.20220927, in
commit 6a3bd283b8)

The restage.log was kept locked while running a complex index refresh
action. In an unusual situation, that action could need to write to the
restage log, which caused a deadlock.

The solution is a two-stage process. First the restage.log is moved to a
work file, which is done with the lock held. Then the content of the work
file is read and processed, which happens without the lock being held.
This is all done in a crash-safe manner.

Note that streamRestageLog may not be fully safe to run concurrently
with itself. That's ok, because restagePointerFiles uses it with the
index lock held, so only one can be run at a time.

streamRestageLog does delete the restage.old file at the end without
locking. If a calcRestageLog is run concurrently, it will either see the
file content before it was deleted, or will see it's missing. Either is
ok, because at most this will cause calcRestageLog to report more
work remains to be done than there is.

Sponsored-by: Dartmouth College's Datalad project
2022-12-08 14:36:11 -04:00
Joey Hess
2b5e6ff20a
test: Add --test-debug option
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2022-11-28 15:12:53 -04:00
Joey Hess
43f681d4c1
Support parsing yt-dpl output to display download progress
Before this fix, no progress was displayed when yt-dpl was used.

Sponsored-by: Graham Spencer on Patreon
2022-11-21 15:04:36 -04:00
Joey Hess
5256be61c1
When youtube-dl is not available in PATH, use yt-dlp instead
Debian is going to drop youtube-dl which is not active upstream, and yt-dlp
is the replacement. This will make it be used if youtube-dl gets removed.

If an old version of youtube-dl remains installed, git-annex will still use
it. That might not be desirable, but changing git-annex to use yt-dlp in
preference to youtube-dl when both are installed risks breaking when
the user has annex.youtube-dl-options set to something that is supported
by youtube-dl, but not by yt-dlp.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-11-21 14:40:33 -04:00
Joey Hess
2b014f1a8b
don't frontload reconcileStaged in git-annex init
init: Avoid scanning for annexed files, which can be lengthy in a
large repository. Instead that scan is done on demand. This lets git-annex
init be run and some query commands be used in a repository without
waiting.

Note that autoinit already behaved this way, so while this will mean some
commands like git-annex get/unlock/add will do the scan the first time run,
that is not really a significant behavior change.

And, it's really better to have a consistent behavior. The reason for
the inconsistency was a strange bug discussed in
b3c4579c79. Avoiding reconcileStaged in
init will keep avoiding whatever that was.

Sponsored-by: Dartmouth College's DANDI project
2022-11-18 13:58:47 -04:00
Joey Hess
c834d2025a
queue more changes to keys db
Increasing the size of the queue 10x makes git-annex init 7% faster in a
repository with 86000 annexed files.

The memory use goes up, from 70876 kb to 85376 kb.
2022-11-18 13:29:34 -04:00
Joey Hess
8fcee4ac9d
Sped up the initial scanning for annexed files by 15%
Avoids database querying overhead when the database is newly created.

In the large repository where git-annex init took 24 seconds, this sped it
up to 20.47 seconds, a speedup of around 15%.

Sponsored-by: Dartmouth College's DANDI project
2022-11-18 13:16:57 -04:00
Joey Hess
a3e9a0ae27
update changelog 2022-11-18 12:58:13 -04:00
Joey Hess
def779b250
revert change to use lts-19.32
This reverts commit 15dd7fe84b.

aws 0.23 is not used any longer, so read-only S3 import won't be
supported yet when building with stack.

That commit broke the build on windows, because the new version of Win32 that
was included (because the old one does not work with this lts version)
needs a version of filepath that is newer than the one bundled with the
ghc in that lts version. It is not possible to override that to a newer
filepath.

Seems that the only solution to get aws 0.23 will be to wait for a ghc that
contains filepath 1.4.100.0. No ghc yet contains it. (Backporting the Win32
fix to a point release version that does not include this bleeding edge
filepath would also resolve it, but seems unlikely to happen.)

Sponsored-by: Jarkko Kniivilä on Patreon
2022-11-14 11:56:55 -04:00
Joey Hess
b2cc63d5bf
export: fix multi-file delete bug
export: Fix a bug that left a file on a special remote when two files with
the same content were both deleted in the exported tree.

Case of the wrong data structure leading to the wrong result.
The DiffMap now contains all the old filenames, and all the new filenames.

Note that, when 2 files with the same content are both renamed,
it only renames the first, but deletes and re-exports the second.
Improving that is possible, but it would need to use a different temporary
filename. Anyway, that is an unusual case, and there are known to be other
unusual cases where export does not rename with maximum efficiency, IIRC.
(Or maybe this is the case that I remember?)

Sponsored-by: Dartmouth College's OpenNeuro project
2022-11-09 16:24:37 -04:00
Joey Hess
15dd7fe84b
stack.yaml: Updated to lts-19.32
This allows building with aws-0.23

Win32-2.13.4.0 contains a function that is not in lts-19.32 yet. Adding
it to stack.yaml does not seem to cause problems when building on linux.
2022-11-09 14:29:00 -04:00
Joey Hess
e100993935
complete support for S3 signature=anonymous
aws-0.23 has been released.

When built with an older aws, initremote will error out when run
with signature=anonymous. And when a remote has been initialized with
that by a version of git-annex that does support it, older versions will
fail when the remote is accessed, with a useful error message.

Sponsored-by: Dartmouth College's DANDI project
2022-11-04 16:20:28 -04:00
Joey Hess
de1e8201a6
Merge branch 'master' into anons3 2022-11-04 15:08:29 -04:00
Joey Hess
d22bd53310
releasing package git-annex version 10.20221103 2022-11-03 14:07:53 -04:00
Joey Hess
14f7a386f0
Make git-annex enable-tor work when using the linux standalone build
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:

[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found

Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.

Sponsored-by: Kevin Mueller on Patreon
2022-10-26 15:45:08 -04:00
Joey Hess
731e806c96
use lookupKeyStaged in --batch code paths
Make --batch mode handle unstaged annexed files consistently whether the
file is unlocked or not. Before this, a unstaged locked file
would have the symlink on disk examined and operated on in --batch mode,
while an unstaged unlocked file would be skipped.

Note that, when not in batch mode, unstaged files are skipped over too.
That is actually somewhat new behavior; as late as 7.20191114 a
command like `git-annex whereis .` would operate on unstaged locked
files and skip over unstaged unlocked files. That changed during
optimisation of CmdLine.Seek with apparently little fanfare or notice.

Turns out that rmurl still behaved that way when given an unstaged file
on the command line. It was changed to use lookupKeyStaged to
handle its --batch mode. That also affected its non-batch mode, but
since that's just catching up to the change earlier made to most
other commands, I have not mentioed that in the changelog.

It may be that other uses of lookupKey should also change to
lookupKeyStaged. But it may also be that would slow down some things,
or lead to unwanted behavior changes, so I've kept the changes minimal
for now.

An example of a place where the use of lookupKey is better than
lookupKeyStaged is in Command.AddUrl, where it looks to see if the file
already exists, and adds the url to the file when so. It does not matter
there whether the file is staged or not (when it's locked). The use of
lookupKey in Command.Unused likewise seems good (and faster).

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-10-26 14:43:06 -04:00
Joey Hess
cde2e61105
improve sqlite retrying behavior
Avoid hanging when a suspended git-annex process is keeping a sqlite
database locked.

Sponsored-by: Dartmouth College's Datalad project
2022-10-18 15:47:20 -04:00
Joey Hess
3149a1e2fe
More robust handling of ErrorBusy when writing to sqlite databases
While ErrorBusy and other exceptions were caught and the write retried for
up to 10 seconds, it was still possible for git-annex to eventually
give up and error out without writing to the database. Now it will retry
as long as necessary.

This does mean that, if one git-annex process is suspended just as sqlite
has locked the database for writing, another git-annex that tries to write
it it might get stuck retrying forever. But, that could already happen when
opening the sqlite database, which retries forever on ErrorBusy. This is an
area where git-annex is known to not behave well, there's a todo about the
general case of it.

Sponsored-by: Dartmouth College's Datalad project
2022-10-17 15:56:19 -04:00
Joey Hess
6fbd337e34
avoid uncessary keys db writes; doubled speed!
When running eg git-annex get, for each file it has to read from and
write to the keys database. But it's reading exclusively from one table,
and writing to a different table. So, it is not necessary to flush the
write to the database before reading. This avoids writing the database
once per file, instead it will buffer 1000 changes before writing.

Benchmarking getting 1000 small files from a local origin,
git-annex get now takes 13.62s, down from 22.41s!
git-annex drop now takes 9.07s, down from 18.63s!
Wowowowowowowow!

(It would perhaps have been better if there were separate databases for
the two tables. At least it would have avoided this complexity. Ah well,
this is better than splitting the table in a annex.version upgrade.)

Sponsored-by: Dartmouth College's Datalad project
2022-10-12 15:33:16 -04:00
Joey Hess
c2ad84b423
all keys are still present on versioned remote after import of a tree
When importing from versioned remotes, fix tracking of the content of
deleted files.

Only S3 supports versioning so far, so only it was affected.

But, the draft import/export interface for external remotes also seemed to
need a change, so that versionedExport could be set.
2022-10-11 13:05:40 -04:00
Joey Hess
b4305315b2
S3: pass fileprefix into getBucket calls
S3: Speed up importing from a large bucket when fileprefix= is set by only
asking for files under the prefix.

getBucket still returns the files with the prefix included, so the rest of
the fileprefix stripping still works unchanged.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:37:26 -04:00
Joey Hess
ca91c3ba91
S3: Support signature=anonymous to access a S3 bucket anonymously
This can be used, for example, with importtree=yes to import from a public
bucket.

This needs a patch that has not yet landed in the aws library, and will
need to be adjusted to support compiling with old versions of the library,
so is not yet suitable for merging.
See https://github.com/aristidb/aws/pull/281

The stack.yaml changes are provided to show how to build against the aws
fork and will need to be reverted as well.

Sponsored-by: Dartmouth College's DANDI project
2022-10-10 17:02:45 -04:00
Joey Hess
4a42c69092
take lock in checkLogFile and calcLogFile
move: Fix openFile crash with -J

This does make them a bit slower, although usually the log file is not
very big, so even when it's being rewritten, they will not block for
long taking the lock. Still, little slowdowns may add up when moving a lot
file files.

A less expensive fix would be to use something lower level than openFile
that does not check if the file is already open for write by another
thread. But GHC does not seem to provide anything convenient; even mkFD
checks for a writing thread.

fullLines is no longer necessary since these functions no longer will
read the file while it's being written.

Sponsored-by: Dartmouth College's DANDI project
2022-10-07 13:19:17 -04:00
Joey Hess
15f9fcbcb1
avoid combining multiple words provided to trust/untrust/dead
* trust, untrust, semitrust, dead: Fix behavior when provided with
  multiple repositories to operate on.
* trust, untrust, semitrust, dead: When provided with no parameters,
  do not operate on a repository that has an empty name.

The man page and usage already indicated that multiple repos could be
provided to these commands, but they actually used unwords to combine
everything into string, and found a repo matching that string. This was
especially bad when no parameters resulted in the empty string and some
repo happened to have an empty description.

This does change the behavior, and it's possible someone relied on the
current behavior to eg, trust a repo by name with the name not quoted into
a single parameter. But fixing the empty string bug and matching the
documentation are worth breaking that usage.

Note that git-annex init/reinit do still unwords multiple parameters when
provided to them. That is inconsistent behavior, but it certianly seems
possible that something does run git-annex init with an unquoted
description, and I don't think it's worth breaking that just to make it more
consistent with these other commands.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-10-03 13:48:40 -04:00
Joey Hess
32a44c3813
releasing package git-annex version 10.20221003 2022-10-03 13:24:21 -04:00
Joey Hess
1328be2013
applied a patch 2022-09-30 14:04:10 -04:00
Joey Hess
49ee07f93d
fix flush of a closed file handle
Avoids displaying warning about git-annex restage needing to be run in
situations where it does not.

Closing a handle flushes it anyway, so no need for an explict flush. The
handle does get closed twice, but that's fine, the second one does nothing.

Sponsored-by: Dartmouth College's DANDI project
2022-09-30 14:02:31 -04:00
Joey Hess
02fca53cb2
releasing package git-annex version 10.20220927 2022-09-27 13:31:55 -04:00
Joey Hess
7059322a6c
Support "inbackend" in preferred content expressions
Well, actually, fix a typo that has always been in the implementation of
that. "inbacked" used to work, but let's not tell users about that; they
might try to use it and expect git-annex to keep supporting the typo..

Sponsored-by: Jack Hill on Patreon
2022-09-26 16:06:49 -04:00
Joey Hess
17129fed66
fix wormhole --appid option position
p2p: Pass wormhole the --appid option before the receive/send command, as
it does not accept that option after the command

I'm left wondering, did I get this wrong from the beginning, or did
wormhole change its option parser? I'm reminded of the change in 0.8.2
where it silently changed what FD the pairing code was output to.
But, looking at the wormhole source, it was at least putting --appid before
send in its test suite from the introduction of the option.

So I think probably this has always been broken. On 2021-12-31 the --appid
option was enabled, and it took until now for someone to try
git-annex p2p --pair and notice that flag day broke it..

Sponsored-by: Svenne Krap on Patreon
2022-09-26 15:11:38 -04:00
Joey Hess
ce65f11de0
enable-tor: Fix breakage caused by git's fix for CVE-2022-24765
This relies on bfa451fc4e and is a bit of an
ugly hack.

Sponsored-by: Noam Kremen on Patreon
2022-09-26 14:48:58 -04:00
Joey Hess
bfa451fc4e
pass --git-dir when reading git config when it was specified explicitly
Let GIT_DIR and --git-dir override git's protection against operating in a
repository owned by another user.

This is the same behavior other git commands have.

Sponsored-by: Jarkko Kniivilä on Patreon
2022-09-26 14:38:34 -04:00
Joey Hess
79aadf63d4
changelog and close 2022-09-26 13:11:23 -04:00