Commit graph

1645 commits

Author SHA1 Message Date
Joey Hess
0fdc1a54db
git-annex log --received modifier option
Only counting received and not dropped makes this show the bandwidth of
data coming into the repository, although only in a sense. Since
git-annex branch updates only happen at the end of a command, and we
don't know when a command started, it's only an approximation of the
actual bandwidth. (A previous git-annex branch update made have
happened in a different repository.)

It would be possible to also add a --dropped option, but I don't know
how useful that would be?

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-11-14 14:04:46 -04:00
Joey Hess
574514545c
git-annex log --sizesof
This can take a lot of memory. I decided to violate the usual rule in
git-annex that it operate in constant memory no matter how many annexed
objects. In this case, it would be hard to be fast without using a big
map of the location logs. The main difficulty here is that there can be
many git-annex branches and it needs to display a consistent view at a
point in time, which means merging information from multiple git-annex
branches.

I have not checked if there are any laziness leaks in this code. It
takes 1 gb to run in my big repo, which is around what I estimated
before writing it.

2 options that are documented are not yet implemented.

Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the
next change after 12:59 is then. Then it waits until after 2:10 to
display the next change. It ought to wait until after 2:00.

Sponsored-by: Brock Spratlen on Patreon
2023-11-10 17:26:10 -04:00
Joey Hess
11cc9f1933
info: Added calculation of combined annex size of all repositories
Factored out overLocationLogs from CmdLine.Seek, which can calculate this
pretty fast even in a large repo. In my big repo, the time to run git-annex
info went up from 1.33s to 8.5s.

Note that the "backend usage" stats are for annexed files in the working
tree only, not all annexed files. This new data source would let that be
changed, but that would be a confusing behavior change. And I cannot
retitle it either, out of fear something uses the current title (eg parsing
the json).

Also note that, while time says "402108maxresident" in my big repo now,
up from "54092maxresident", top shows the RES constant at 64mb, and it
was 48mb before. So I don't think there is a memory leak. I tried using
deepseq to force full evaluation of addKeyCopies and memory use didn't
change, which also says no memory leak. And indeed, not even calling
addKeyCopies resulted in the same memory use. Probably the increased memory
usage is buffering the stream of data from git in overLocationLogs.

Sponsored-by: Brett Eisenberg on Patreon
2023-11-08 13:35:11 -04:00
Joey Hess
4e35067325
windows hook scripts newlines without CR
Windows: When git-annex init is installing hook scripts, it will
avoid ending lines with CR for portability.

Existing hook scripts that do have CR line endings will not be changed.
While it would be possible to have git-annex init upgrade them, users would
need to know to use that command to do that, and it would add complexity
that does not seem warranted for the portability benefit alone.

Sponsored-by: Luke T. Shumaker on Patreon
2023-11-02 13:37:04 -04:00
Joey Hess
f8d35d9480
lookupkey: Sped up --batch
When the file is relative, it does not need to be passed
through git lsfiles to normalize it.

Sponsored-by: Kevin Mueller on Patreon
2023-10-30 14:59:09 -04:00
Joey Hess
39ca30e004
Windows: Consistently avoid ending output lines with CR
This matches the behavior of git on Windows, which does not end lines with
CR either.

Previously, git-annex used to always write lines with putStrLn, so would
output CR on Windows. Then parts of it changed to use ByteString.putStrLn,
which does not output CR. That left its output inconsistent, sometimes
within the same command.

The point of this commit is to get back to consistency. Having the same
behavior as git is a nice bonus. It would be much harder to make it
consistently output CR, because every place it uses ByteString.putStrLn or
similar would need to be changed.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-10-30 14:43:43 -04:00
Joey Hess
eb42935e58
Windows: Fix CRLF handling in some log files
In particular, the mergedrefs file was written with CR added to each line,
but read without CRLF handling. This resulted in each update of the file
adding CR to each line in it, growing the number of lines, while also
preventing the optimisation from working, so it remerged unncessarily.

writeFile and readFile do NewlineMode translation on Windows. But the
ByteString conversion prevented that from happening any longer.

I've audited for other cases of this, and found three more
(.git/annex/index.lck, .git/annex/ignoredrefs, and .git/annex/import/). All
of those also only prevent optimisations from working. Some other files are
currently both read and written with ByteString, but old git-annex may have
written them with NewlineMode translation. Other files are at risk for
breakage later if the reader gets converted to ByteString.

This is a minimal fix, but should be enough, as long as I remember to use
fileLines when splitting a ByteString into lines. This leaves files written
using ByteString without CR added, but that's ok because old git-annex has
no difficulty reading such files.

When the mergedrefs file has gotten lines that end with "\r\r\r\n", this
will eventually clean it up. Each update will remove a single trailing CR.

Note that S8.lines is still used in eg Command.Unused, where it is parsing
git show-ref, and similar in Git/*. git commands don't include CR in their
output so that's ok.

Sponsored-by: Joshua Antonishen on Patreon
2023-10-30 14:23:23 -04:00
Joey Hess
0da1d40cd4
Improve memory use of --all when using annex.private
This does not improve Annex.Branch.files at all, since it still uses ++ to
combine the lists, so forcing all but the last one.

But when there are a lot of files in the private journal, it does avoid
--all (or a bare repo) from buffering the filenames in memory.

See commit 653b719472 for prior discussion of
this buffering.

Sponsored-by: Graham Spencer on Patreon
2023-10-24 13:20:55 -04:00
Joey Hess
8bde6101e3
sqlite datbase for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entagle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.

Sponsored-by: unqueued on Patreon
2023-10-23 16:46:22 -04:00
Joey Hess
6a61c7ff45
Fix crash of enableremote when the special remote has embedcreds=yes
The crash occurred because writeCreds got called twice, and writeFileProtected
neglected to close its file handle, so the file was open for write when
written the second time.

It seems unncessary and suboptimal that writeCreds gets called twice.
One call is from getRemoteCredPair and the other from setRemoteCredPair'.
What happens is that in the enableremote case, code that also runs at
initremote does unncessary work. Might be possible to improve that, but
I've gone for the simple fix.

Sponsored-by: k0ld on Patreon
2023-10-20 13:19:12 -04:00
Joey Hess
c268dc5878
only stage regular files from the journal
git-annex only writes regular files there, but other things may drop junk
like empty .DAV directories around the tree. And trying to hash such things
can have weird and hard to understand effects. So it seems best to do a
small amount of work in statting the journal file to make sure it's a
regular file.

Sponsored-by: Jack Hill on Patreon
2023-10-10 13:22:02 -04:00
Joey Hess
b9240d2c5d
releasing package git-annex version 10.20230926 2023-09-26 13:29:49 -04:00
Joey Hess
41f4d0bda9
enableremote: Avoid overwriting existing git remote when passed the uuid of a specialremote that was earlier initialized with the same name 2023-09-22 13:29:48 -04:00
Joey Hess
54da44d42a
Support being built with crypton rather than cryptonite
crypton is a fork of cryptonite, and cryptonite's github repo has been
archived. Some deps are already using cryptonite so it's clearly the way
forward.

Added a build flag without a default, so cabal configure will select on its
own which to use. stack files pin to cryptonite for now.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-09-21 12:43:42 -04:00
Joey Hess
a18e40bdd7
lookupkey: Added --ref option
Sponsored-by: Joshua Antonishen on Patreon
2023-09-12 12:49:11 -04:00
Joey Hess
7be8950138
propigateAdjustedCommits in seekExportContent
push: When on an adjusted branch, propagate changes to parent branch before
updating export remotes.

This is a somewhat redundant call to propigateAdjustedCommits, since it
also gets called at pushLocal time. That other one needs to come after
importing from importtree remotes though, and seekExportContent has to come
earlier, so I don't see a way to avoid doing it twice.

Note that git-annex sync also manages to avoid the problem, it's only
git-annex push that had the bug.

Sponsored-by: Leon Schuermann on Patreon
2023-09-11 14:54:26 -04:00
Joey Hess
29ae536637
adb send to final filename not tmp file
Avoids some problems with unusual character in exporttree filenames
that confuse adb shell commands.

In particular, with a filename that contains \351, adb push sends the file
to the correct filename in /sdcard. And running find on the android device
roundtrips the filename. But, running mv on that filename on the android
device fails with "bad <filename>: No such file or directory".
Interestingly, ls on android works, and rm fails.

adb push to the final name to avoids this problem. But what about
atomicity? Well, I tried an adb push and interrupted it part way through.
The file was present while the push was running, but was removed once the
push got interrupted. I also tried yanking the cable while adb push was
running, and the partially received file was also deleted then. That avoids
most problems.

An import that runs at the same time as an export will see the partially
sent file. But that is unlikely to be done, and if it did happen, it would
notice that the imported file had changed in the meantime and discard it.

Note that, since rm on the android device fails on these filenames,
exporting a tree where the file is deleted is going to fail to remove it. I
don't see what I can do about that, so long as android is using an rm that
has issues with filename encodings.

This was tested on a phone where find, ls, and rm all come from Toybox 0.8.6.

Sponsored-by: unqueued on Patreon
2023-09-11 13:13:05 -04:00
Joey Hess
baf8e4f6ed
Override safe.bareRepository for git remotes
Fix using git remotes that are bare when git is configured
with safe.bareRepository = explicit

Sponsored-by: Dartmouth College's DANDI project
2023-09-07 14:56:26 -04:00
Joey Hess
cbfd214993
set safe.directory when getting config for git-annex-shell or git remotes
Fix more breakage caused by git's fix for CVE-2022-24765, this time
involving a remote (either local or ssh) that is a repository not owned by
the current user.

Sponsored-by: Dartmouth College's DANDI project
2023-09-07 14:40:50 -04:00
Joey Hess
32cb2bd3fa
Fix linker optimisation in linux standalone tarballs
Was only symlinking when there is a usr/ directory, but with usr/ merge,
there are none.

Sponsored-by: Dartmouth College's Datalad project
2023-09-07 12:59:27 -04:00
Joey Hess
50300a47fe
Removed the vendored git-lfs and the GitLfs build flag
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.

Debian stable now includes a new enough haskell-git-lfs package as well.
Last time this was tried it did not.
2023-08-28 13:12:31 -04:00
Joey Hess
a0a42e7ec1
releasing package git-annex version 10.20230828 2023-08-28 13:04:06 -04:00
Joey Hess
43e2a66a31
wording 2023-08-28 12:13:58 -04:00
Joey Hess
cf8b30c914
oldkeys: New command that lists the keys used by old versions of a file
The tricky thing about this turned out to be handling renames and reverts.
For that, it has to make two passes over the git log, and to avoid
buffering a possibly huge amount of logs in memory (ie the whole git log of
an entire repository!), runs git log twice.

(It might be possible to speed this up by asking git log to show a diff,
and so avoid needing to use catKey.)

Sponsored-By: Brock Spratlen on Patreon
2023-08-22 14:51:06 -04:00
Joey Hess
379d58b499
diffdriver: Added --get option
Removed the dontCheck repoExists, because running it in a repo that has not
been initialized yet would update location log with nouuid. And I guess
it's ok for it to only support running in git-annex repos.
2023-08-22 11:58:53 -04:00
Joey Hess
724ceeb1a9
avoid unncessary use of curl when conduit will do
Avoid using curl when annex.security.allowed-ip-addresses is set but
neither annex.web-options nor annex.security.allowed-url-schemes is set to
a value that needs curl.

Bug introduced in 840bd50390

Sponsored-By: Brock Spratlen on Patreon
2023-08-22 10:25:53 -04:00
Joey Hess
7aac60769a
implement Unavilable for gcrypt
Sponsored-by: Brett Eisenberg on Patreon
2023-08-16 15:54:54 -04:00
Joey Hess
977403d338
implement Unavilable for borg bup ddar directory rsync
Only gcrypt remains to add support for. (Well, possibly also adb?)

Sponsored-by: Luke T. Shumaker on Patreon
2023-08-16 15:48:09 -04:00
Joey Hess
67c99a4db7
info: Added available to the info displayed for a remote
Sponsored-by: Kevin Mueller on Patreon
2023-08-16 14:52:58 -04:00
Joey Hess
9286769d2c
let Remote.availability return Unavilable
This is groundwork for making special remotes like borg be skipped by
sync when on an offline drive.

Added AVAILABILITY UNAVAILABLE reponse and the UNAVAILABLERESPONSE extension
to the external special remote protocol. The extension is needed because
old git-annex, if it sees that response, will display a warning
message. (It does continue as if the remote is globally available, which
is acceptable, and the warning is only displayed at initremote due to
remote.name.annex-availability caching, but still it seemed best to make
this a protocol extension.)

The remote.name.annex-availability git config is no longer used any
more, and is documented as such. It was only used by external special
remotes to cache the availability, to avoid needing to start the
external process every time. Now that availability is queried as an
Annex action, the external is only started by sync (and the assistant),
when they actually check availability.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-08-16 14:31:31 -04:00
Joey Hess
75275ed41f
update
Last commit also removed curl from linux standalone tarball.
Which may or may not have been a mistake.. I'm inclined to go ahead and
simplify it.
2023-08-15 14:22:30 -04:00
Joey Hess
571a516ed2
Stop bundling curl in the OSX dmg
New curl binary links to libldap with a @loader_path that prevents using
the binary when the dmg is used elsewhere.
See https://github.com/datalad/git-annex/issues/170

git-annex doesn't use curl by default anyway, so it doesn't really need to
be included in the dmg.
2023-08-15 14:21:53 -04:00
Joey Hess
10b5f79e2d
fix empty tree import when directory does not exist
Fix behavior when importing a tree from a directory remote when the
directory does not exist. An empty tree was imported, rather than the
import failing. Merging that tree would delete every file in the
branch, if those files had been exported to the directory before.

The problem was that dirContentsRecursive returned [] when the directory
did not exist. Better for it to throw an exception. But in commit
74f0d67aa3 back in 2012, I made it never
theow exceptions, because exceptions throw inside unsafeInterleaveIO become
untrappable when the list is being traversed.

So, changed it to list the contents of the directory before entering
unsafeInterleaveIO. So exceptions are thrown for the directory. But still
not if it's unable to list the contents of a subdirectory. That's less of a
problem, because the subdirectory does exist (or if not, it got removed
after being listed, and it's ok to not include it in the list). A
subdirectory that has permissions that don't allow listing it will have its
contents omitted from the list still.

(Might be better to have it return a type that includes indications of
errors listing contents of subdirectories?)

The rest of the changes are making callers of dirContentsRecursive
use emptyWhenDoesNotExist when they relied on the behavior of it not
throwing an exception when the directory does not exist. Note that
it's possible some callers of dirContentsRecursive that used to ignore
permissions problems listing a directory will now start throwing exceptions
on them.

The fix to the directory special remote consisted of not making its
call in listImportableContentsM use emptyWhenDoesNotExist. So it will
throw an exception as desired.

Sponsored-by: Joshua Antonishen on Patreon
2023-08-15 12:57:41 -04:00
Joey Hess
adda6c1088
Add git-annex remote refs that are not newer to the merged refs list
Significant startup speed increase by avoiding repeatedly checking if some
remote git-annex branch refs need to be merged when it is not newer.

One way this could happen is when there are 2 remotes that are themselves
connected. The git-annex branch on the first remote gets updated. Then the
second remote pulls from the first, and merges in its git-annex branch.
Then the local repo pulls from the second remote, and merges its git-annex
branch. At this point, a pull from the first remote will get a git-annex
branch that is not newer, but is not on the merged refs list.

In my big repo, git-annex startup time dropped from 4 seconds to 0.1 seconds.
There were 5 to 10 such remote refs out of 18 remotes.

Sponsored-by: Graham Spencer on Patreon
2023-08-09 13:31:36 -04:00
Joey Hess
3efad7f5f4
info: Added --dead-repositories option
I considered a more wide-ranging config option to make other commands
also show dead repositories. But it would be difficult to implement that
because Remote.keyLocations is used to get locations, filtering out dead
repos, and commands like get then try to use those locations. So a config
setting would make dead repos sometimes be acted on by commands.

Sponsored-by: unqueued on Patreon
2023-08-09 12:43:48 -04:00
Joey Hess
6d83bcff0f
Fix behavior of onlyingroup
Sponsored-by: k0ld on Patreon
2023-08-07 13:05:11 -04:00
Joey Hess
d19139a10d
releasing package git-annex version 10.20230802 2023-08-02 16:09:14 -04:00
Joey Hess
6da6449fff
stack.yaml: Update to build with ghc-9.6.2 and aws-0.24
This enables some new features that need the new aws.

Use http-client-restricted-0.1.0 because it uses the crypton side of the
cryptonite/crypton fork, which seems to be needed for ghc-9.6.2.

Dependency on connection removed because of the cryptonite/crypton fork.
This avoids needing a build flag. It was only used to throw a typed
exception in Utility.Url, which nothing depended on.

Used a fork of bloomfilter because it's not being maintained and no longer
builds as-of this ghc version. (I have been trying to contact its
maintainer about it, and emailed him today suggesting I take over the
package.)

Sponsored-by: Brock Spratlen on Patreon
2023-08-01 18:53:26 -04:00
Joey Hess
68c9b08faf
fix build with unix-2.8.0
Changed the parameters to openFd. So needed to add a small wrapper
library to keep supporting older versions as well.
2023-08-01 18:41:27 -04:00
Joey Hess
fb640bc2f4
support building with unix-compat 0.7
It removed System.PosixCompat.User.
2023-08-01 15:17:43 -04:00
Joey Hess
393275c105
Setup.hs: Stop installing man pages, desktop files, and the git-annex-shell and git-remote-tor-annex symlinks
Anything still relying on that, eg via cabal v1-install will need to
change to using make install-home. Which was added back in 2019 in
6491b62614 because cabal new-build
(now the default) already didn't use Setup in a way that let its
installation of those things work.

Notably this means Setup does not need to depend on unix-compat, which is
useful because in 0.7 it removed System.PosixCompat.User, which Setup
needed to determine where to install the desktop files. See
https://github.com/haskell-pkg-janitors/unix-compat/issues/3
2023-08-01 15:08:56 -04:00
Joey Hess
fa92383993
onlyingroup
* Support "onlyingroup=" in preferred content expressions.
* Support --onlyingroup= matching option.

Sponsored-by: Jack Hill on Patreon
2023-07-31 14:43:58 -04:00
Joey Hess
518a51a8a0
--explain for preferred/required content matching
And annex.largefiles and annex.addunlocked.

Also git-annex matchexpression --explain explains why its input
expression matches or fails to match.

When there is no limit, avoid explaining why the lack of limit
matches. This is also done when no preferred content expression is set,
although in a few cases it defaults to a non-empty matcher, which will
be explained.

Sponsored-by: Dartmouth College's DANDI project
2023-07-26 14:50:04 -04:00
Joey Hess
f25eeedeac
initial implementation of --explain
Currently it only displays explanations of options like --in and --copies.

In the future, it should explain preferred content expression evaluation
and other decisions.

The explanations of a few things could be better. In particular,
"standard" will just appear as-is (or as "!standard" if it doesn't
match), rather than explaining why the standard preferred content expression
for the group matches or not.

Currently as implemented, it goes to stdout, and so commands like
git-annex find that have custom output will not display --explain
information. Perhaps that should change, dunno.

Sponsored-by: Dartmouth College's DANDI project
2023-07-25 16:52:57 -04:00
Joey Hess
2807ab0a09
gcrypt: Remove empty hash directories when dropping content
As was recently done with the directory special remote.

Note that the top directory passed to removeDirGeneric was changed to
avoid deleting .git/annex or .git/annex/objects if they ended up empty.

Sponsored-by: Brett Eisenberg on Patreon
2023-07-21 16:04:11 -04:00
Joey Hess
3b34266e9e
typo 2023-07-21 15:36:01 -04:00
Joey Hess
b15366494a
directory: Remove empty hash directories when dropping content
Failure to remove is not treated as a problem, and no permissions
modifications are done, to avoid unexpected states.

Sponsored-by: Luke Shumaker on Patreon
2023-07-21 14:57:29 -04:00
Joey Hess
7f38355860
dropunused: Support --jobs
Sponsored-by: Kevin Mueller on Patreon
2023-07-21 14:03:34 -04:00
Joey Hess
33ba537728
deal with Amazon S3 breaking change for public=yes
* S3: Amazon S3 buckets created after April 2023 do not support ACLs,
  so public=yes cannot be used with them. Existing buckets configured
  with public=yes will keep working.
* S3: Allow setting publicurl=yes without public=yes, to support
  buckets that are configured with a Bucket Policy that allows public
  access.

Sponsored-by: Joshua Antonishen on Patreon
2023-07-21 13:59:07 -04:00
Joey Hess
7fc6503812
fix waiting for all started feed downloads with -J
importfeed bug fix: When -J was used with multiple feeds, some feeds did
not get their items downloaded.

In my case, I had added a feed to the end of the list, and no items from it
were ever downloaded.

Sponsored-by: Leon Schuermann on Patreon
2023-07-11 22:08:35 -04:00