Commit graph

2812 commits

Author SHA1 Message Date
Joey Hess
1e31bf8122
copy/move --from-anywhere --to remote
Implementation was simple because it's equivilant to
--from=foo --to remote for each other remote, followed by
--to remote when there's a local copy.

(Or, in the edge case of --from-anywhere --to=here,
it's the same as --to=here.)

Note that, when the local repo does not have a copy,
fromToPerform gets it from a remote, sends it to the destination,
and drops the local copy. Another call to that for a second remote
will notice that the dest now has a copy, and simply drop from the
second remote, avoiding a second transfer.

Also note that, when numcopies doesn't allow dropping it from
everywhere, it will drop it from the cheapest remotes first
(maybe not ideal) up to more expensive remotes, and finally from the local
repo. So the local repo will generally end up holding a copy. Maybe not
ideal in all cases either, but it seems no worse to do that than to end up
with a copy undropped from a remote.

And I'm not entirely happy with the output, eg:

	copy bigfile (from r3...) ok
	copy bigfile ok

That makes sense if you think of the second line as being
the same as what is output by `git-annex copy bigfile --to bar`,
but it's less clear in this context. Maybe add "(from here...)"?
Also the --json output doesn't have a machine-readable field for
the "from" uuid, and maybe it should?

Sponsored-by: Dartmouth College's DANDI project
2023-11-30 16:34:30 -04:00
Joey Hess
1654572bc1
fix --from overriding annex-ignore
Make git-annex get/copy/move --from foo override configuration of
remote.foo.annex-ignore, as documented.

This already worked for remotes supporting hasKeyCheap. For others though,
git-annex copy --from foo would silently not do anything, while
git-annex copy --to foo would use the annex-ignored remote.

Also improved the annex-ignore docs, to reflect that `git-annex get`
without --from will skip using annex-ignored remotes, for example.

Sponsored-by: Dartmouth College's DANDI project
2023-11-30 15:12:07 -04:00
Joey Hess
6e3bcbf4dd
Make git-annex copy --from --to --fast actually fast
Eg when the destination is logged as containing a file, skip
actively checking that it does contain it.

Note that --fast does not prevent other verifications of content
location that are done in a copy --from --to. Perhaps it could, but this
change will already avoid the real unnecessary work of operating on
files that are already in the remote.

And avoiding other verifications
might cause it to fail if the location log thinks that --to does not
contain the content but does. Such complications with `git-annex copy
--to remote --fast` led to commit d006586cd0
which added a note that gets displayed when that fails, mentioning it
might be due to --fast being enabled.

copy --from --to is already complicated enough without needing to worry
about such edge cases, so continuing to doing some verification of
content location after the initial --fast filtering seems ok.

Sponsored-by: Dartmouth College's DANDI project
2023-11-17 17:37:58 -04:00
Joey Hess
96c0e720c5
small refactoring
Sponsored-by: Dartmouth College's DANDI project
2023-11-17 17:23:24 -04:00
Joey Hess
7a8393ce7d
Fix bug in git-annex copy --from --to
Caused it to skip files that were locally present.

Sponsored-by: Dartmouth College's DANDI project
2023-11-17 16:30:20 -04:00
Joey Hess
7d67229884
git-annex log --gnuplot
The gnuplot output is pretty good, but could still be improved with:

* more colors (repeating colors is confusing with a lot of repos)
* better positioning of the legend, making the plot wider and moving it
  from over top of the graph

Sponsored-by: Kevin Mueller on Patreon
2023-11-14 14:56:58 -04:00
Joey Hess
0fdc1a54db
git-annex log --received modifier option
Only counting received and not dropped makes this show the bandwidth of
data coming into the repository, although only in a sense. Since
git-annex branch updates only happen at the end of a command, and we
don't know when a command started, it's only an approximation of the
actual bandwidth. (A previous git-annex branch update made have
happened in a different repository.)

It would be possible to also add a --dropped option, but I don't know
how useful that would be?

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-11-14 14:04:46 -04:00
Joey Hess
21e66ce209
rename --when to --interval
More accurately describes its behavior.
2023-11-14 11:45:16 -04:00
Joey Hess
cde081a025
guard against wrong timestamps in git log
For example, my sound repo has in the git-annex branch a commit from
2036, which is followed by one from 2034, in amoung commits from 2013.
Clearly there was a problem with the clock.

Since git log --date-order has a behavior of
"Show no parents before all of its children are shown", the data still
gets processed ok. The future timestamp just prevented displaying data
after that commit. It seems better, when the clock was wrong, to display
a wrong date, and then return to right dates.

It would be nice to filter out the wrong dates from display entirely,
but that seems it would need to buffer the whole output. This command is
too slow to buffer it all before displaying anything, and anyway this
kind of problem is probably rare.

Sponsored-by: Joshua Antonishen on Patreon
2023-11-13 15:06:04 -04:00
Joey Hess
2ab11fe06e
treat dead repos as 0 size
With this, git annex log --totalsizes can be compared with
git-annex info's "combined annex size of all repositories"
to double-check it works correctly.

In my sound repo, the two match.

In my big repo, the two report slightly different sizes,
with the former being 1.3 gb smaller than the latter.
I don't know the reason for this disreprency. Given the 30+tb size of
the repo, it's a small difference.

It seems possible that a bug in an old version of git-annex could
explain it. Eg, if an old git-annex lost a line when updating trust.log
or a location log in a merge, git-annex info would see only what it
replaced it with, while git-annex log will see the previous value as
well.

Sponsored-by: Leon Schuermann on Patreon
2023-11-13 15:05:48 -04:00
Joey Hess
38b9ebc5fd
newtype MapLog
Noticed that Semigroup instance of Map is not suitable to use
for MapLog. For example, it behaved like this:

ghci>  parseTrustLog "foo 1 timestamp=10\nfoo 2 timestamp=11" <> parseTrustLog "foo X timestamp=12"
fromList [(UUID "foo",LogEntry {changed = VectorClock 11s, value = SemiTrusted})]

Which was wrong, it lost the newer DeadTrusted value.

Luckily, nothing used that Semigroup when operating on a MapLog. And this
provides a safe instance.

Sponsored-by: Graham Spencer on Patreon
2023-11-13 14:37:22 -04:00
Joey Hess
5d8b8a8ad0
git-annex log --totalsizes
Note that dead repositories are not yet handled so their sizes show as
nonzero after they are marked dead.

Sponsored-By: unqueued on Patreon
2023-11-13 13:15:36 -04:00
Joey Hess
dc02236c85
git-annex log --sizes
CSV format so it can be fed into a program to graph it.

Note that dead repositories are not yet handled so their sizes show as
nonzero after they are marked dead.

Sponsored-By: k0ld on Patreon
2023-11-13 13:07:22 -04:00
Joey Hess
6203b8afba
the last line is for the current time 2023-11-10 17:37:55 -04:00
Joey Hess
574514545c
git-annex log --sizesof
This can take a lot of memory. I decided to violate the usual rule in
git-annex that it operate in constant memory no matter how many annexed
objects. In this case, it would be hard to be fast without using a big
map of the location logs. The main difficulty here is that there can be
many git-annex branches and it needs to display a consistent view at a
point in time, which means merging information from multiple git-annex
branches.

I have not checked if there are any laziness leaks in this code. It
takes 1 gb to run in my big repo, which is around what I estimated
before writing it.

2 options that are documented are not yet implemented.

Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the
next change after 12:59 is then. Then it waits until after 2:10 to
display the next change. It ought to wait until after 2:00.

Sponsored-by: Brock Spratlen on Patreon
2023-11-10 17:26:10 -04:00
Joey Hess
561c036664
split out generic git log parser
Sponsored-By: Jack Hill on Patreon
2023-11-10 15:40:03 -04:00
Joey Hess
11cc9f1933
info: Added calculation of combined annex size of all repositories
Factored out overLocationLogs from CmdLine.Seek, which can calculate this
pretty fast even in a large repo. In my big repo, the time to run git-annex
info went up from 1.33s to 8.5s.

Note that the "backend usage" stats are for annexed files in the working
tree only, not all annexed files. This new data source would let that be
changed, but that would be a confusing behavior change. And I cannot
retitle it either, out of fear something uses the current title (eg parsing
the json).

Also note that, while time says "402108maxresident" in my big repo now,
up from "54092maxresident", top shows the RES constant at 64mb, and it
was 48mb before. So I don't think there is a memory leak. I tried using
deepseq to force full evaluation of addKeyCopies and memory use didn't
change, which also says no memory leak. And indeed, not even calling
addKeyCopies resulted in the same memory use. Probably the increased memory
usage is buffering the stream of data from git in overLocationLogs.

Sponsored-by: Brett Eisenberg on Patreon
2023-11-08 13:35:11 -04:00
Joey Hess
8768966d97
improve comments 2023-11-08 12:06:03 -04:00
Joey Hess
f8d35d9480
lookupkey: Sped up --batch
When the file is relative, it does not need to be passed
through git lsfiles to normalize it.

Sponsored-by: Kevin Mueller on Patreon
2023-10-30 14:59:09 -04:00
Joey Hess
d9fd205cbb
push RawFilePath down into Annex.ReplaceFile
Minor optimisation, but a win in every case, except for a couple where
it's a wash.

Note that replaceFile still takes a FilePath, because it needs to
operate on Chars to truncate unicode filenames properly.
2023-10-26 13:36:49 -04:00
Joey Hess
c873586e14
eliminate s2w8 and w82s
Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode
characters. Luckily, genUUIDInNameSpace is only ever used on ASCII
strings as far as I can determine. In particular, git-remote-gcrypt's
gcrypt-id is an ASCII string.
2023-10-26 13:12:57 -04:00
Joey Hess
8bde6101e3
sqlite datbase for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entagle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.

Sponsored-by: unqueued on Patreon
2023-10-23 16:46:22 -04:00
Joey Hess
41f4d0bda9
enableremote: Avoid overwriting existing git remote when passed the uuid of a specialremote that was earlier initialized with the same name 2023-09-22 13:29:48 -04:00
Joey Hess
ef7c867238
fix some build warnings from ghc 9.4.6
It now notices that a RepoLocation may not be Local, in which case
pattern matching on Local wouldn't do.
2023-09-21 13:40:22 -04:00
Joey Hess
a147a31baa
fix some build warnings from ghc 9.4.6
For some reason it doesn't notice that req must be a Req, because
the toplevel function matched on that.
2023-09-21 13:38:36 -04:00
Joey Hess
a18e40bdd7
lookupkey: Added --ref option
Sponsored-by: Joshua Antonishen on Patreon
2023-09-12 12:49:11 -04:00
Joey Hess
7be8950138
propigateAdjustedCommits in seekExportContent
push: When on an adjusted branch, propagate changes to parent branch before
updating export remotes.

This is a somewhat redundant call to propigateAdjustedCommits, since it
also gets called at pushLocal time. That other one needs to come after
importing from importtree remotes though, and seekExportContent has to come
earlier, so I don't see a way to avoid doing it twice.

Note that git-annex sync also manages to avoid the problem, it's only
git-annex push that had the bug.

Sponsored-by: Leon Schuermann on Patreon
2023-09-11 14:54:26 -04:00
Joey Hess
aeaadb8eb8
improve warning message when unable to update export
A misleading message was displayed in several cases.

If the user has run eg: git config
remote.push-win-remote.annex-tracking-branch 'adjusted/main(unlocked)'
That is not supported, and now it will tell them it's not a valid
configuration. A user reported doing that, but I don't know if it's a
common point of confusion. If it is a common problem, a better message
would be possible, or it could convert back from the adjusted branch to
the actual branch.

Sponsored-by: Graham Spencer on Patreon
2023-09-11 14:21:36 -04:00
Joey Hess
49b97b0675
oldkeys: check associated files by default and add --unchecked
Removed the prior code that checked for keys used by current versions of
the files being acted on. It is redundant with the associated files
check (so long as the associated files database is always up-to-date,
which reconcileStaged should accomplish).

Sponsored-by: Luke T. Shumaker on Patreon
2023-08-23 13:46:41 -04:00
Joey Hess
5489c2cdd6
oldkeys --revision-range
Sponsored-by: Brett Eisenberg on Patreon
2023-08-22 15:00:29 -04:00
Joey Hess
cf8b30c914
oldkeys: New command that lists the keys used by old versions of a file
The tricky thing about this turned out to be handling renames and reverts.
For that, it has to make two passes over the git log, and to avoid
buffering a possibly huge amount of logs in memory (ie the whole git log of
an entire repository!), runs git log twice.

(It might be possible to speed this up by asking git log to show a diff,
and so avoid needing to use catKey.)

Sponsored-By: Brock Spratlen on Patreon
2023-08-22 14:51:06 -04:00
Joey Hess
379d58b499
diffdriver: Added --get option
Removed the dontCheck repoExists, because running it in a repo that has not
been initialized yet would update location log with nouuid. And I guess
it's ok for it to only support running in git-annex repos.
2023-08-22 11:58:53 -04:00
Joey Hess
67c99a4db7
info: Added available to the info displayed for a remote
Sponsored-by: Kevin Mueller on Patreon
2023-08-16 14:52:58 -04:00
Joey Hess
9286769d2c
let Remote.availability return Unavilable
This is groundwork for making special remotes like borg be skipped by
sync when on an offline drive.

Added AVAILABILITY UNAVAILABLE reponse and the UNAVAILABLERESPONSE extension
to the external special remote protocol. The extension is needed because
old git-annex, if it sees that response, will display a warning
message. (It does continue as if the remote is globally available, which
is acceptable, and the warning is only displayed at initremote due to
remote.name.annex-availability caching, but still it seemed best to make
this a protocol extension.)

The remote.name.annex-availability git config is no longer used any
more, and is documented as such. It was only used by external special
remotes to cache the availability, to avoid needing to start the
external process every time. Now that availability is queried as an
Annex action, the external is only started by sync (and the assistant),
when they actually check availability.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-08-16 14:31:31 -04:00
Joey Hess
7f7c95b771
move comment 2023-08-16 13:19:17 -04:00
Joey Hess
10b5f79e2d
fix empty tree import when directory does not exist
Fix behavior when importing a tree from a directory remote when the
directory does not exist. An empty tree was imported, rather than the
import failing. Merging that tree would delete every file in the
branch, if those files had been exported to the directory before.

The problem was that dirContentsRecursive returned [] when the directory
did not exist. Better for it to throw an exception. But in commit
74f0d67aa3 back in 2012, I made it never
theow exceptions, because exceptions throw inside unsafeInterleaveIO become
untrappable when the list is being traversed.

So, changed it to list the contents of the directory before entering
unsafeInterleaveIO. So exceptions are thrown for the directory. But still
not if it's unable to list the contents of a subdirectory. That's less of a
problem, because the subdirectory does exist (or if not, it got removed
after being listed, and it's ok to not include it in the list). A
subdirectory that has permissions that don't allow listing it will have its
contents omitted from the list still.

(Might be better to have it return a type that includes indications of
errors listing contents of subdirectories?)

The rest of the changes are making callers of dirContentsRecursive
use emptyWhenDoesNotExist when they relied on the behavior of it not
throwing an exception when the directory does not exist. Note that
it's possible some callers of dirContentsRecursive that used to ignore
permissions problems listing a directory will now start throwing exceptions
on them.

The fix to the directory special remote consisted of not making its
call in listImportableContentsM use emptyWhenDoesNotExist. So it will
throw an exception as desired.

Sponsored-by: Joshua Antonishen on Patreon
2023-08-15 12:57:41 -04:00
Joey Hess
d467c70ef7
change sync content transition plan and fine tune warning
Only display warning when git-annex sync (without --content or
--no-content) is used with repositories that have preferred content
configured.

Sponsored-by: Leon Schuermann on Patreon
2023-08-14 13:51:35 -04:00
Joey Hess
be028f10e5
split out Utility.Url.Parse
This is mostly for git-repair which can't include all of Utility.Url
without adding many dependencies that are not really necessary.
2023-08-14 12:28:10 -04:00
Joey Hess
3efad7f5f4
info: Added --dead-repositories option
I considered a more wide-ranging config option to make other commands
also show dead repositories. But it would be difficult to implement that
because Remote.keyLocations is used to get locations, filtering out dead
repos, and commands like get then try to use those locations. So a config
setting would make dead repos sometimes be acted on by commands.

Sponsored-by: unqueued on Patreon
2023-08-09 12:43:48 -04:00
Joey Hess
68c9b08faf
fix build with unix-2.8.0
Changed the parameters to openFd. So needed to add a small wrapper
library to keep supporting older versions as well.
2023-08-01 18:41:27 -04:00
Joey Hess
aa5e333cb7
fix whitespace
Thanks to a compile warning from new ghc
2023-08-01 18:36:54 -04:00
Joey Hess
518a51a8a0
--explain for preferred/required content matching
And annex.largefiles and annex.addunlocked.

Also git-annex matchexpression --explain explains why its input
expression matches or fails to match.

When there is no limit, avoid explaining why the lack of limit
matches. This is also done when no preferred content expression is set,
although in a few cases it defaults to a non-empty matcher, which will
be explained.

Sponsored-by: Dartmouth College's DANDI project
2023-07-26 14:50:04 -04:00
Joey Hess
7f38355860
dropunused: Support --jobs
Sponsored-by: Kevin Mueller on Patreon
2023-07-21 14:03:34 -04:00
Joey Hess
7fc6503812
fix waiting for all started feed downloads with -J
importfeed bug fix: When -J was used with multiple feeds, some feeds did
not get their items downloaded.

In my case, I had added a feed to the end of the list, and no items from it
were ever downloaded.

Sponsored-by: Leon Schuermann on Patreon
2023-07-11 22:08:35 -04:00
Joey Hess
240bae38f6
sync: When in an adjusted branch, merge changes from the original branch
This causes changes to the original branch to get merged with a single
sync. Before, it took 2 syncs; the first happened to update the synced/
branch, and the second merged changes from the synced/ branch into the
ajusted branch.

Using mergeToAdjustedBranch when tomerge == origbranch is probably
overkill, but it does work fine.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-07-06 12:42:24 -04:00
Joey Hess
51b24aac91
importfeed: Add feedurl to the metadata
(And allow it to be used in the --template although that seems unlikely to
be very useful.)

My use case for this is that one of the podcast feeds I subscribe to is
sometimes leaking episodes of some other podcast. The other podcast is also
very close to spam, so this may be a form of intentional spamming. I have
not been able to catch the podcast feed containing those episodes, so I
don't know which one is at fault. So putting this in the metadata will let
me eventually catch it.
2023-07-06 00:11:38 -04:00
Joey Hess
3d810726af
diffdriver --text support options for diff
Sponsored-by: KDM on Patreon
2023-07-05 15:43:29 -04:00
Joey Hess
3c1d18cb3b
assist: With --jobs, parallelize transferring content to/from remotes
Command.Add.seek starts concurrency with CommandStages. And for
Command.Sync, it needs TransferStages. So, to get both types of concurrency
for the two different parts, it either needs to change the type of
concurrency in between, or just call startConcurrency once for each.

It seems safe enough to call startConcurrency twice, because it does shut
down concurrency (mostly) at the end, and eg the old Annex.workers get
emptied.

Sponsored-by: unqueued on Patreon
2023-07-05 12:47:30 -04:00
Joey Hess
e1fc9e204e
added git-annex satisfy
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.

As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.

Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.

(Also opened a new bug that also affects sync et all, not only this command.)

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-06-29 15:34:53 -04:00
Joey Hess
d5c6197791
diffdriver: Added --text option for easy diffing of the contents of annexed text files
This was already possible, but it was rather hard to come up with the
complex shell command needed.

Note that the diff output starts with "diff a/... b/...".
I left off the "--git" because it's not a git format diff.
2023-06-28 15:27:16 -04:00
Joey Hess
549d390d03
display drop from remote more consistently
With eg copy --to remote

This is particularly an improvement in sync --content output, which
mixes the two, so it's nice to have consistent display.
2023-06-27 19:01:33 -04:00
Joey Hess
d98aa35b3b
reinject: Added --guesskeys option
Sponsored-by: Noam Kremen on Patreon
2023-06-26 14:05:31 -04:00
Joey Hess
39f3d783fe
consolidate 2023-06-20 15:10:11 -04:00
Joey Hess
72715845a1
display destination file before youtube-dl download
Rather than after it, which can leave one wondering what file it's
downloading.

youtubeDl should not ever return Right Nothing in normal operation,
becaause it's already asked youtube-dl if it supports the url. So it
would have to succeed at that, then not download any file, but also
exit successfully, in order for the new error message to display.

Also display the name of yt-dlp when using it.
2023-06-20 14:55:25 -04:00
Joey Hess
958c2fa6d2
Improve resuming interrupted download when using yt-dlp or youtube-dl
Fixes a failure like this:

curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.

That happens because the whole web page has already been downloaded
previously, and kept, so now addurl tries to download it, and curl asks the
server to resume from the last byte. And youtube.com can't, for whatever
stupid reason.

So, delete the temp file after determining that youtube-dl can be used.
2023-06-19 15:01:47 -04:00
Joey Hess
1f09b709fc
skip sending individual files to export remotes
That will fail, and it already exports whole trees.
f6dd34ca81 made it sync content with
import remotes, and if an import remote is also an export remote, that
caused this new failure mode.

Sponsored-by: Brock Spratlen on Patreon
2023-06-19 11:24:32 -04:00
Joey Hess
64738ea157
config: Added the --show-origin and --for-file options
* config: Added the --show-origin and --for-file options.
* config: Support annex.numcopies and annex.mincopies.

There is a little bit of redundancy here with other code elsewhere that
combines the various configs and selects which to use. But really only
for the special case of annex.numcopies, which is a git config that does
not override the annex branch setting and for annex.mincopies, which does
not have a git config but does have gitattributes settings as well as the
annex branch setting.

That seems small enough, and unlikely enough to grow into a mess that it was
worth supporting annex.numcopies and annex.mincopies in git-annex config
--show-origin. Because these settings are a prime thing that someone might
get confused about and want to know where they were configured.

And, it followed that git-annex config might as well support those two
for --set and --get as well. While this is redundant with the speclialized
commands, it's only a little code and it makes it more consistent.

Note that --set does not have as nice output as numcopies/mincopies
commands in some special cases like setting to 0 or a negative number.
It does avoid setting to a bad value thanks to the smart
constructors (eg configuredNumCopies).

As for other git-annex branch configurations that are not set by git-annex
config, things like trust and wanted that are specific to a repository
don't map to a git config name, so don't really fit into git-annex config.
And they are only configured in the git-annex branch with no local override
(at least so far), so --show-origin would not be useful for them.

Sponsored-by: Dartmouth College's DANDI project
2023-06-12 16:24:31 -04:00
Joey Hess
f6dd34ca81
sync content with import remotes
This didn't used to be needed because importKeys would import all
content and so doing another pass was redundant.

But since 40017089f2 it uses
importChanges, so only new files are imported. If a file that was
already imported before was dropped, that would prevent sync --content
from gettng its content again.

Sponsored-by: Jack Hill on Patreon
2023-06-01 18:52:19 -04:00
Joey Hess
40017089f2
use importChanges optimisation
Large speed up to importing trees from special remotes that contain a lot
of files, by only processing changed files.

Benchmarks:

Importing from a special remote that has 10000 files, that have all been
imported before, and 1 new file sped up from 26.06 to 2.59 seconds.

An import with no change and 10000 unchanged files sped up from 24.3 to
1.99 seconds.

Going up to 20000 files, an import with no changes sped up from
125.95 to 3.84 seconds.

Sponsored-by: k0ld on Patreon
2023-06-01 13:47:00 -04:00
Joey Hess
c6acf574c7
implement importChanges optimisaton (not used yet)
For simplicity, I've not tried to make it handle History yet, so when
there is a history, a full import will still be done. Probably the right
way to handle history is to first diff from the current tree to the last
imported tree. Then, diff from the current tree to each of the
historical trees, and recurse through the history diffing from child tree
to parent tree.

I don't think that will need a record of the previously imported
historical trees, and so Logs.Import doesn't store them. Although I did
leave room for future expansion in that log just in case.

Next step will be to change importTree to importChanges and modify
recordImportTree et all to handle it, by using adjustTree.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-31 16:01:34 -04:00
Joey Hess
f1cdb79ca4
assist: honor gitignore
Sponsored-by: Graham Spencer on Patreon
2023-05-24 14:04:09 -04:00
Joey Hess
0f89d221bd
version: Avoid error message when entire output is not read
Sponsored-by: Dartmouth College's Datalad project
2023-05-19 15:00:57 -04:00
Joey Hess
9ed59dab5b
assist: operate on all files in working tree by default
Consistency with sync and internal consistency is more important than
consistency with the assistant, which is not itself consistent about
what it does when run in a subdirectory.

Note that with -C, it will still commit staged changes to files outside
the directory. Like sync does. Presumably if the user is manually
staging things, then running this command, they intend to build up a
commit.

Sponsored-by: unqueued on Patreon
2023-05-19 14:47:05 -04:00
Joey Hess
908b9687cc
assist: fix bug committing just added file when -J is used
Need to wait for worker threads adding files before flushing the queue.
2023-05-18 15:02:10 -04:00
Joey Hess
fc3df2da9e
assist: fix bug commiting just added file 2023-05-18 14:56:13 -04:00
Joey Hess
9619f562f4
fix assist to commit 2023-05-18 14:50:05 -04:00
Joey Hess
9ec5d7fb05
improve --cleanup desc 2023-05-18 14:41:20 -04:00
Joey Hess
e955912ad0
git-annex assist
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.

(Also this fixes another reversion in git-annex sync,
--commit --no-commit, and --message were not enabled, oops.)

See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.

Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.

Sponsored-by: k0ld on Patreon
2023-05-18 14:37:43 -04:00
Joey Hess
935330aaab
fix inverted logic (fixes test fail)
Sponsored-by: Jack Hill on Patreon
2023-05-18 10:48:08 -04:00
Joey Hess
f93a7fce1d
sync: Started transition to --content being enabled by default
When used without --content or --no-content, warn about the upcoming
transition, and suggest using one of the options, or setting
annex.synccontent.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-17 13:23:42 -04:00
Joey Hess
af6b73a7e6
push: Support --cleanup
This option is not specific to sync, so it seemed it should be in either
pull or push as well as sync. Since it does modify the remote, it seems
better to have it in push; the modification of the local repo pulls in
the direction of pull, but not hard enough.

Maybe it would be better to have it in both?

Sponsored-by: Luke Shumaker on Patreon
2023-05-17 12:51:49 -04:00
Joey Hess
40731ff9fd
sync: Added -g as a short option for --no-content
I anticipate that if sync is transitioned to syncing content by default,
people will want a short option. And in repositories where
annex.synccontent = true, they already would. And pull and push sync
content by default, so a short option is useful with them too.

Mnemonic: -g makes only git data be synced
Also, -a makes only annex data be synced.

Would have preferred -c, which would complement -C, but it
was already taken to set git configs.

Sponsored-by: Noam Kremen on Patreon
2023-05-17 12:34:26 -04:00
Joey Hess
5df89d58c7
git-annex pull and push
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivilant to git-annex sync.

In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.

And it would be hard to split out pull and push from sync, as far as the
implementaton goes. The implementation inside sync was easy, just adjust
SyncOptions so it does the right thing.

Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.

Sponsored-by: Kevin Mueller on Patreon
2023-05-16 16:51:07 -04:00
Joey Hess
b1c396a695
remove unused imports 2023-05-16 16:33:02 -04:00
Joey Hess
2e984c51b6
sync --no-pull and --no-push affect download and upload of content
The man page is somewhat vague about this, but I do think it was a bug
that these options didn't alreay behave that way. The options are
documented to disable imports and exports, which is the same operations
just with a special remote that uses trees.

The real motivation for this is that I'm adding git-annex pull and
git-annex push, and I want these options to turn off the equivilant of
those commands. And git-annex pull will certianly download and push
upload.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-05-16 16:25:23 -04:00
Joey Hess
212442dd9b
pullOption should be pushOption in seekExportContent
sync: Fix bug that made --no-pull, rather than --no-push prevent exporting
trees to special remotes.

Sponsored-by: Joshua Antonishen on Patreon
2023-05-16 15:55:24 -04:00
Joey Hess
07e0d2a35b
clean up uninit output
Don't think including the location of .git/annex/objects in the json is
really useful.
2023-05-11 13:52:22 -04:00
Joey Hess
55dfa929d6
uninit: remove unncessary ExistSuccess
That was added in 2011 to prevent writing to the git-annex branch on
shutdown. But, the use of saveState causes pending git-annex branch
writes to be completed before the branch is deleted. So, an unusual exit
is not needed.
2023-05-11 13:50:20 -04:00
Joey Hess
271f3b1ab4
uninit: Support --json and --json-error-messages
Had to convert uninit to do everything that can error out inside a
CommandStart. This was harder than feels nice.

(Also, in passing, converted CommandCheck to use a data type, not a
weird number that it was not clear how it managed to be unique.)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-11 13:43:02 -04:00
Joey Hess
1904cebbb3
fix typo 2023-05-11 13:26:55 -04:00
Joey Hess
02cfef1f91
uninit: Avoid buffering the names of all annexed files in memory
Oops, using the same list twice does prevent streaming in constant memory.

Sponsored-by: unqueued on Patreon
2023-05-11 13:25:55 -04:00
Joey Hess
b8d9c18e98
remove unused import 2023-05-11 13:24:34 -04:00
Joey Hess
cd5108bb47
uninit: remove undocumented suport for specifying files to act on
I think this was just copied from another command without paying
attention to what it did, because there does not seem to be any valid
reason to want to only unannex some files when running uninit.
2023-05-11 13:23:41 -04:00
Joey Hess
de84abb210
configremote: Support --json and --json-error-messages
Seems unlikely to be too useful, but who knows.

Moved the checkSafeConfig call to happen after an action is started, so
it will be captured by --json-error-messages

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:21:42 -04:00
Joey Hess
a242eabc7a
enableremote: Support --json and --json-error-messages
Seems unlikely to be too useful, but who knows. Was trivial anyway.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:09:27 -04:00
Joey Hess
b3cc8dbacb
initremote: Support --json and --json-error-messages
Including special --whatelse handling.

Otherwise, it seems unlikely to be too useful, but who knows.

Refactored code to call starting before displaying error messages.
This makes the error messages be captured by --json-error-messages

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:03:40 -04:00
Joey Hess
9812d9aaec
support aeson for Map
Make unused --json use it, which is better than the doubly nested lists
it was using.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 13:51:37 -04:00
Joey Hess
8d8e044458
upgrade: Support --json and --json-error-messages and --json-progress
Seems unlikely to be very useful, but trivial.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 12:54:48 -04:00
Joey Hess
c98fb0b637
merge: Support --json and --json-error-messages and --json-progress
Seems unlikely to be very useful, but trivial.
And, this completes the story that git-annex sync does not need json,
since every sub-operation is available in a command that does support json.
(Well, except for committing, but that's not a git-annex command.)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 12:34:19 -04:00
Joey Hess
2fdf0ae38d
include url in json output
The input field is consistently the url of the feed, which makes sense
as that is the user input, but to differentiate multiple urls downloaded
from a feed when using --json-progress -J, need the url that is being
downloaded too.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:59:44 -04:00
Joey Hess
7919349cee
importfeed: Support --json and --json-error-messages and --json-progress
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:51:16 -04:00
Joey Hess
6b54ea69e3
importfeed: Move error to where --json-error-messages can capture it
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:27:23 -04:00
Joey Hess
04ee6c4c6b
importfeed: Support -J (and work toward supporting --json)
Both -J and --json needed importfeed to be refactored to use commandAction.

That was difficult, because of the interrelated nature of downloading feeds
and then downloading files from feeds, both of which needed to use
commandAction. And then checking for problems in feeds has to come after
these actions, which may be run as background jobs.

As for --json support, it's most of the way there, but still has some
warts, so I didn't enable jsonOptions yet. The warts include:

- An initial empty json record is displayed by getCache.
- Input is not populated, should be feed url
- feedProblem at end will not be captured by --json-error-messages
  (see FIXME)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:13:56 -04:00
Joey Hess
a71c831949
renameremote: Support --json and --json-error-messages
Seems unlikely to be useful, but it works so

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 16:25:40 -04:00
Joey Hess
a5d0c85ae1
factor out maybeAddJSONField
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 16:15:41 -04:00
Joey Hess
3d8f93dc0a
reinject: Support --json and --json-error-messages
Also fix support for operating on multiple pairs of files and keys.

Moved notAnnexed to inside starting, so error message will get into the json.

Cannot include the key in the starting as it's not known yet, so instead
add it to the json later.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 15:43:37 -04:00
Joey Hess
91b9915b09
reinit: Support --json and --json-error-messages
Basically same concerns as init..

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 15:07:40 -04:00
Joey Hess
f09a248fe2
init: Support --json and --json-error-messages
Dunno how useful this will be, since about all that's accessible from
the json is whether it succeeded or failed, and the error messages
which were already on stderr.

Note that, when autoenabling a special remote, it would be possible for
one to stop and prompt or output not using Messages and so not output as
part of the json. I don't think that happens, but I'm not 100% sure
something doesn't manage to break it. Of course, the same could be the
case for commands that transfer objects. Using Annex.Init.autoEnableSpecialRemotes
in --json mode would avoid the problem, but I've chosen to wait until I
know it's needed to use it.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 14:58:08 -04:00
Joey Hess
c208442292
unused: Support --json and --json-error-messages
Generalized AddJSONActionItemField to allow it to add several fields. Not entirely
happy with that, since the names of the fields have to be carefully chosen to
not conflict with other json fields. And fields added that way can't be parsed
back in FromJSON, except for the "fields" field that is special cased for metadata.
Still, I couldn't see another way to do it.

Also, omit file:null from the json output. Which does affect other commands,
eg git-annex whereis --all --json. Hopefully that won't break something that expects
a null file. If it did, that could be reverted, but it would be ugly to have
file:null in the unused --json

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 14:39:57 -04:00
Joey Hess
365dbc89dc
expire, trust et al, dead, describe: Support --json and --json-error-messages
For expire, the normal output is unchanged, but the --json output includes the uuid
in machine parseable form. Which could be very useful for this somewhat obscure
command. That needed ActionItemUUID to be implemented, which seemed like a lot
of work, but then ---

I had been going to skip implementing them for trust, untrust, dead, semitrust,
and describe, but putting the uuid in the json is useful information, it tells
what uuid git-annex picked given the input. It was not hard to support
these once ActionItemUUID was implemented.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-05 15:33:30 -04:00