git-annex log --sizesof

This can take a lot of memory. I decided to violate the usual rule in
git-annex that it operates in constant memory no matter how many annexed
objects there are. In this case, it would be hard to be fast without
using a big map of the location logs. The main difficulty here is that
there can be many git-annex branches and it needs to display a consistent
view at a point in time, which means merging information from multiple
git-annex branches.
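
To illustrate what merging into a consistent view at a point in time could look like, here is a minimal Python sketch. It is not the git-annex implementation: the data layout (each branch's log as a list of `(timestamp, uuid, present)` tuples) and the function name `merged_view` are assumptions made for this example only.

```python
def merged_view(branch_logs, at):
    """Return the set of repository uuids that hold a key at time `at`.

    branch_logs: one log per branch; each log is a list of
                 (timestamp, uuid, present) tuples (hypothetical layout).
    at:          the point in time the view should reflect.
    """
    newest = {}  # uuid -> (timestamp, present), newest entry seen so far
    for log in branch_logs:
        for ts, uuid, present in log:
            if ts > at:
                continue  # ignore changes made after the requested time
            if uuid not in newest or ts > newest[uuid][0]:
                newest[uuid] = (ts, present)
    return {uuid for uuid, (_, present) in newest.items() if present}


branch_a = [(10, "u1", True), (30, "u1", False)]
branch_b = [(20, "u2", True), (5, "u1", False)]
print(merged_view([branch_a, branch_b], at=25))
```

Keeping such a map for every key is what makes memory use grow with the number of annexed objects.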

I have not checked whether there are any laziness leaks in this code. It
takes 1 GB to run in my big repo, which is around what I estimated
before writing it.

Two of the documented options are not yet implemented.

Small bug: With e.g. --when=1h, it will display at 12:00, then at 1:10 if
that is the next change after 12:59. It then waits until after 2:10 to
display the next change. It ought to wait only until after 2:00.
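
The difference between the two behaviours can be sketched in Python. This is only a model of the description above, not the actual code: the function `display_times` and its `aligned` flag are hypothetical, and it assumes the intended deadlines sit on a grid anchored at the first displayed change.

```python
from datetime import datetime, timedelta


def display_times(changes, interval, aligned):
    """Return the change times that would be displayed.

    changes:  sorted list of datetimes at which the size changed.
    interval: timedelta corresponding to the --when value.
    aligned:  False measures the next deadline from the last displayed
              change (the behaviour described as a bug); True keeps
              deadlines on a grid anchored at the first displayed change.
    """
    shown = []
    deadline = None
    for t in changes:
        if deadline is None:
            shown.append(t)
            deadline = t + interval
        elif t >= deadline:
            shown.append(t)
            if aligned:
                # Step along the grid until the deadline is past t.
                while deadline <= t:
                    deadline += interval
            else:
                deadline = t + interval
    return shown


day = datetime(2023, 11, 10)
changes = [day.replace(hour=h, minute=m)
           for h, m in [(12, 0), (12, 30), (13, 10), (13, 40), (14, 5), (14, 20)]]
hour = timedelta(hours=1)
print([t.strftime("%H:%M") for t in display_times(changes, hour, aligned=False)])
print([t.strftime("%H:%M") for t in display_times(changes, hour, aligned=True)])
```

With these sample changes, the unaligned variant skips the 14:05 change because its deadline has drifted to 14:10, while the aligned variant displays it.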

Sponsored-by: Brock Spratlen on Patreon
Joey Hess 2023-11-10 16:17:15 -04:00
parent 561c036664
commit 574514545c
GPG key ID: DB12DB0FF05F8F38
6 changed files with 244 additions and 32 deletions

@ -12,8 +12,8 @@ Displays statistics and other information for the specified item.
When no item is specified, displays overall information. This includes a
list of all known repositories, how much annexed data is present in the
local repository, the total size of all annexed data in the working
tree, and the combined size of all annexed data in all known repositories.
local repository, and the total size of all annexed data in the working
tree.
When a directory is specified, displays information
about the annexed files in that directory (and subdirectories).

@ -1,6 +1,6 @@
# NAME
git-annex log - shows location log
git-annex log - shows location log information
# SYNOPSIS
@ -8,21 +8,68 @@ git annex log `[path ...]`
# DESCRIPTION
Displays the location log for the specified file or files, showing each
repository they were added to ("+") and removed from ("-"). Note that the
location log is for the particular file contents currently at these paths,
not for any different content that was there in earlier commits.
This command displays information from the history of the git-annex branch.
This displays information from the history of the git-annex branch. Several
things can prevent that information being available to display. When
[[git-annex-dead]] and [[git-annex-forget]] are used, old historical
data gets cleared from the branch. When annex.private or
Several things can prevent that information being available to display.
When [[git-annex-dead]] and [[git-annex-forget]] are used, old historical
data gets cleared from the branch. When annex.private or
remote.name.annex-private is configured, git-annex does not write
information to the branch at all. And when annex.alwayscommit is set to
false, information may not have been committed to the branch yet.
# OPTIONS
* `[path ...]`
Displays the location log for the specified file or files, showing each
repository they were added to ("+") and removed from ("-"). Note that
it displays information about the file content currently at these paths,
not for any different content that was there in earlier commits.
* matching options
The [[git-annex-matching-options]](1)
can be used to control what to act on when displaying the location log
for specified files.
* `--all` `-A`
Shows location log changes to all content, with the most recent changes first.
In this mode, the names of files are not available and keys are displayed
instead.
* `--sizesof=repository`
Displays a history of the size of the annexed files in a repository as it
changed over time from the creation of the repository to the present.
The repository can be "here" for the current repository, or the name of a
remote, or a repository description or uuid.
Note that keys that do not have a known size are skipped.
* `--sizes`
This is like --sizesof, but rather than display the size of a single
repository, it displays the sizes of all known repositories in a table.
* `--totalsizes`
This is like `--sizesof`, but it displays the total size of all
known repositories.
Note that dead repositories have their size included in the total
for times before the point they were marked dead. Once marked dead,
their size will no longer be included in the total.
* `--when=time`
When using `--sizesof`, `--sizes`, and `--totalsizes`, this
controls how often to display the size. The default is to
display each change to the size.
The time is of the form "30d" or "1y".
* `--since=date`, `--after=date`, `--until=date`, `--before=date`, `--max-count=N`
These options are passed through to `git log`, and can be used to limit
@ -30,6 +77,13 @@ false, information may not have been committed to the branch yet.
For example: `--since "1 month ago"`
These options do not have an effect when using `--sizesof`, `--sizes`,
and `--totalsizes`.
* `--bytes`
Show sizes in bytes, disabling the default nicer units.
* `--raw-date`
Rather than the normal display of a date in the local time zone,
@ -38,27 +92,25 @@ false, information may not have been committed to the branch yet.
* `--gource`
Generates output suitable for the `gource` visualization program.
This option does not have an effect when using `--sizesof`, `--sizes`,
and `--totalsizes`.
* `--json`
Enable JSON output. This is intended to be parsed by programs that use
git-annex. Each line of output is a JSON object.
This option does not have an effect when using `--sizesof`, `--sizes`,
and `--totalsizes`.
* `--json-error-messages`
Messages that would normally be output to standard error are included in
the JSON instead.
* matching options
The [[git-annex-matching-options]](1)
can be used to control what to act on.
* `--all` `-A`
Shows location log changes to all content, with the most recent changes first.
In this mode, the names of files are not available and keys are displayed
instead.
This option does not have an effect when using `--sizesof`, `--sizes`,
and `--totalsizes`.
* Also the [[git-annex-common-options]](1) can be used.
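
As a rough illustration of the `--sizesof` history described above, the following Python sketch replays a stream of location changes and records the running size of one repository, skipping keys without a known size. The event layout `(timestamp, key, repo, present)`, the `key_sizes` mapping, and the function name `size_history` are all assumptions for this example; they do not reflect git-annex internals.

```python
def size_history(events, key_sizes, repo):
    """Return [(timestamp, total_size)] for each size change in `repo`.

    events:    iterable of (timestamp, key, repo, present), oldest first
               (hypothetical layout).
    key_sizes: dict mapping key -> size in bytes, or None when unknown.
    repo:      identifier of the repository to report on.
    """
    present_keys = set()
    total = 0
    history = []
    for ts, key, repo_id, present in events:
        if repo_id != repo:
            continue
        size = key_sizes.get(key)
        if size is None:
            continue  # keys that do not have a known size are skipped
        if present and key not in present_keys:
            present_keys.add(key)
            total += size
        elif not present and key in present_keys:
            present_keys.remove(key)
            total -= size
        else:
            continue  # repeated add or drop: the size did not change
        history.append((ts, total))
    return history


events = [
    (1, "a", "r1", True),
    (2, "b", "r1", True),
    (3, "a", "r2", True),
    (4, "c", "r1", True),   # size unknown, skipped
    (5, "a", "r1", False),
    (6, "a", "r1", False),  # already dropped, no change
]
sizes = {"a": 100, "b": 50, "c": None}
print(size_history(events, sizes, "r1"))
```

A `--sizes` style table could be built by running the same replay once per repository, at the cost of holding the set of present keys for each one in memory.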