Commit graph

2941 commits

Author SHA1 Message Date
Joey Hess
cc17ac423b
implement isCryptographicallySecureKey for VURL
Considerable difficulty to work around an import cycle. Had to move the
list of backends (except for VURL) to Backend.Variety to VURL could use
it.

Sponsored-by: Kevin Mueller on Patreon
2024-02-29 17:26:35 -04:00
Joey Hess
0f7143d226
support VURL backend
Not yet implemented is recording hashes on download from web and
verifying hashes.

addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.

It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.

Sponsored-by: Joshua Antonishen on Patreon
2024-02-29 13:48:51 -04:00
Joey Hess
3475b09c3e
pre-commit: Avoid committing the git-annex branch
Except when a commit is made in a view, which changes metadata.

Make the assistant commit the git-annex branch after git commit of working
tree changes.

This allows using the annex.commitmessage-command in the assistant to
generate a commit message for the git-annex branch that relies on state
gathered during the commit of the working tree. Eg, it might reuse the
commit message.

Note that, when not using the assistant, a git-annex add still commits
the git-annex branch, so such a annex.commitmessage-command set up would
not work then. But if someone is using the assistant and wants
programmatic control over commit messages, this is useful. Someone not
using the assistant can get the same result by using annex.alwayscommit=false
during the git-annex add, and git-annex merge after they git commit.

pre-commit was never really intended to commit the git-annex branch
(except after recording changed metadata), but the assistant did sort of
rely on it. It does later commit the git-annex branch before pushing to
remotes, but I didn't want to risk building up lots of uncommitted changes
to it if that didn't happen frequently.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-12 14:42:11 -04:00
Joey Hess
fa9197560d
move commitStaged out of Command.Sync which no longer uses it
It's trivial enough that it it's not worth factoring it out to somewhere
in common with Command.Undo and the assistant.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-07 16:19:28 -04:00
Joey Hess
21123ba368
assistant, undo: When committing, let the usual git commit hooks run
Was doing a Git.Branch.commit for historical reasons to do with direct
mode, which no longer apply.

Note that the preCommitAnnexHook is no longer called in commitStaged
because git-annex installs a pre-commit hook that runs the pre-commit-annex
hook. And git commit will run the pre-commit hook.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-07 16:15:35 -04:00
Joey Hess
6b38d0c427
addurl, importfeed: Added --raw-except option
--raw-except=web allows using yt-dlp but not any other special remotes.

Currently this option can only be used once, trying to use it repeatedly
will make option parsing fail. Perhaps it ought to support being used more
than once, but it seemed like an unlikely use case to need that.

Note that getParsed is called repeatedly when the option is used with
several urls. While implementing DeferredParseClass would avoid that
innefficiency, it didn't seem worth the added boilerplate since
getParsed only calls byNameWithUUID which does minimal work.

Sponsored-by: Dartmouth College's DANDI project
2024-02-05 15:16:25 -04:00
Joey Hess
2f3fe4d904
fix importfeed --force skip behavior reversion
importfeed --force: Don't treat it as a failure when an already downloaded
file exists. (Fixes a behavior change introduced in 10.20230626.)

04ee6c4c6b caused the reversion. Inside a CommandPerform, stop causes it
to fail. Before that commit, it was inside a CommandStart, where stop
causes it to skip.
2024-02-02 15:57:07 -04:00
Joey Hess
0c64cd30c2
compare urls irrespective of downloader
importfeed --force: Avoid creating duplicates of existing already
downloaded files when yt-dlp or a special remote was used.
2024-02-02 15:50:56 -04:00
Joey Hess
90db97d9a2
importfeed: Added --scrape option
Which uses yt-dlp to screen scrape the equivilant of an RSS feed.

Note that youtubedlscraped is a speed optimisation. Since yt-dlp found
the urls, we know it can download them. That avoids calling
youtubeDlSupported on each url, which makes --fast a lot faster.

Almost all the same metadata fields and file formatting fields are
populated, when yt-dlp is able to get the data. Note that yt-dlp has some
additional useful metadata that could be exposed. But, much of it is
specific to particular websites, and it would be hard to document on the
git-annex importfeed man page.

Sponsored-by: unqueued on Patreon
2024-01-30 15:37:29 -04:00
Joey Hess
d7949f8202
move Feed and Item out of ToDownload
This is groundwork for producing ToDownload in other ways, that may not
be entirely isomorphic with feeds. Eg by using yt-dlp.
2024-01-30 14:11:26 -04:00
Joey Hess
8e9ee31621
webapp: Added --port option, and annex.port config
The getSocket comment that mentioned using ":port"
in the hostname seems to have been incorrect or be out of date.
After all, the bug report came when the user first tried doing that,
and it didn't work.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-01-25 14:08:36 -04:00
Joey Hess
e765d3e24c
import: --message/-m option 2024-01-18 12:41:44 -04:00
Joey Hess
f6cf2dec4c
disk free checking for unsized keys
Improve disk free space checking when transferring unsized keys to
local git remotes. Since the size of the object file is known, can
check that instead.

Getting unsized keys from local git remotes does not check the actual
object size. It would be harder to handle that direction because the size
check is run locally, before anything involving the remote is done. So it
doesn't know the size of the file on the remote.

Also, transferring unsized keys to other remotes, including ssh remotes and
p2p remotes don't do disk size checking for unsized keys. This would need a
change in protocol.

(It does seem like it would be possible to implement the same thing for
directory special remotes though.)

In some sense, it might be better to not ever do disk free checking for
unsized keys, than to do it only sometimes. A user might notice this
direction working and consider it a bug that the other direction does not.
On the other hand, disk reserve checking is not implemented for most
special remotes at all, and yet it is implemented for a few, which is also
inconsistent, but best effort. And so doing this best effort seems to make
some sense. Fundamentally, if the user wants the size to always be checked,
they should not use unsized keys.

Sponsored-by: Brock Spratlen on Patreon
2024-01-16 14:29:10 -04:00
Joey Hess
1e775ab83b
remove slightly incorrect comments 2023-12-29 13:23:27 -04:00
Joey Hess
a4a5ec6366
info: Added "annex sizes of repositories" table to the overall display
Thanks to previous work in 11cc9f1933,
this is almost entirely free, it only needs to do some additional map
lookups and math.

The strictness annotations keep the memory use from blowing up.

Sponsored-by: unqueued on Patreon
2023-12-29 12:09:30 -04:00
Joey Hess
64db927d73
optimisation 2023-12-29 10:51:05 -04:00
Joey Hess
6d789c9c81
sync, push: Avoid trying to send individual files to special remotes configured with importtree=yes exporttree=no
That will always fail. It already skipped doing this when exporttree=yes.
2023-12-26 15:56:58 -04:00
Joey Hess
86dbe9a825
migrate: support adding size back to URL keys
migrate: Support adding size to URL keys that were added with --relaxed, by
running eg: git-annex migrate --backend=URL foo

Since url keys cannot be generated, that used to fail. Make it notice that
the backend is not changed, and just get the size of the content.

Sponsored-by: Brock Spratlen on Patreon
2023-12-08 16:22:14 -04:00
Joey Hess
257f01729c
distributed migration for pull and sync --content
pull, sync: When operating on content, automatically hard link objects
that have been migrated.

Added annex.syncmigrations config that can be set to false to prevent
pull and sync from migrating object content.

I think that true is a good default for this config, because it avoids
users having to re-download migrated content or learning about migration.
But, some users will surely not like it, whether because it does take some
time (especially for the first git-annex branch scan when there is a long
history), or because they want to deal with it manually, or because their
filesystem doesn't support hard links and they don't want it to copy
objects.

Sponsored-by: k0ld on Patreon
2023-12-08 14:18:18 -04:00
Joey Hess
4ed71b34de
migrate --apply
And avoid migrate --update/--aply migrating when the new key was already
present in the repository, and got dropped. Luckily, the location log
allows distinguishing from the new key never having been present!

That is mostly useful for --apply because otherwise dropped files would
keep coming back until the old objects were reaped as unused. But it
seemed to make sense to also do it for --update. for consistency in edge
cases if nothing else. One case where --update can use it is when one
branch got migrated earlier, and we dropped the file, and now another
branch has migrated the same file.

Sponsored-by: Jack Hill on Patreon
2023-12-08 13:23:46 -04:00
Joey Hess
51b974d9f0
skip distributed migration to insecure key when annex.securehashesonly is set
This only avoids extra work and a warning messsage. It seems likely that
in such a situation, the user does not want migrations to insecure
hashes, and so best to ignore them as much as possible. If
the user merges a branch that switches annexed files to an insecure
hash, they will notice that the file contents are unavailable,
and git-annex get will tell them the problem then. So it does not seem
useful to have migrate --update also complain about it.
2023-12-08 12:41:50 -04:00
Joey Hess
30c2728d65
always verify content in distributed migration
doc/todo/distributed_migration.mdwn discusses security of distributed
migration, and this was identified as necessary to do.
2023-12-07 20:05:42 -04:00
Joey Hess
62ce56c4ea
display filenames in migrate --update
Have to go to a lot of bother to find them, but I think it's worth it
for usability.

Sponsored-by: Luke T. Shumaker on Patreon
2023-12-07 18:00:09 -04:00
Joey Hess
abea01d9e0
migrate --update fully working
Could use some more testing.

When the old key is not present, Command.ReKey.linkKey' will return
False, so this handles that case ok.

But, I do wonder if distributed migration may need to deal with the old
key getting copied into the repository later. In that situation,
re-running migrate --update won't link it to the new key. It may be that
some users will need that. They can delete .git/annex/migrate.log and
run it again, but that is not a good user interface. Maybe either have
a way to re-run all distributed migrations, or record migrations
in a database and scan the db to find migrations to do in a future run?

Sponsored-by: Kevin Mueller on Patreon
2023-12-07 17:27:51 -04:00
Joey Hess
7c7c9912c1
migrate --update gets keys
The git log is outputting the diff, but this only looks at the new
files. When we have a new file, we can get the old filename by just
replacing "new" with "old". And then use branchFileRef to refer to it
allows catting the old key.

While this does have to skip past the old files in the diff, it's still
faster than calling git diff separately.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-12-07 17:25:56 -04:00
Joey Hess
f1ce15036f
started migrate --update
This is most of the way there, but not quite working.

The layout of migrate.tree/ needs to be changed to follow this approach.
git log will list all the files in tree order, so the new layout needs
to alternate old and new keys. Can that be done? git may not document
tree order, or may not preserve it here.

Alternatively, change to using git log --format=raw and extract
the tree header from that, then use
git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new
That will be a little more expensive, but only when there are lots of
migrations.

Sponsored-by: Joshua Antonishen on Patreon
2023-12-07 15:50:52 -04:00
Joey Hess
0bd8b17b59
log migration trees to git-annex branch
This will allow distributed migration: Start a migration in one clone of
a repo, and then update other clones.

commitMigration is a bit of a bear.. There is some inversion of control
that needs some TMVars. Also streamLogFile's finalizer does not handle
recording the trees, so an interrupt at just the wrong time can cause
migration.log to be emptied but the git-annex branch not updated.

Sponsored-by: Graham Spencer on Patreon
2023-12-06 15:40:03 -04:00
Joey Hess
b55efc179a
add startAction parameter for KeySha
I have a use planned for this in Command.Migrate.

Sponsored-by: unqueued on Patreon
2023-12-06 13:28:02 -04:00
Joey Hess
0485dd3161
sync: Fix locking problems during merge when annex.pidlock is set
Presumably git merge sometimes needs to verifiy if a worktree file is
modified, and so will then run git-annex filter-process which would try to
take the pid lock. And for whatever reason, git-annex sync already had the
pidlock held. I have not replicated that, but it does make enough sense to
deploy the workaround.

Like I said back in commit 7bdb0cdc0d,

   Arguably, it would be better to have a way to make any process git-annex
   runs have the env var set. But then it would need to take the pid lock
   when running any and all processes, and that would be a problem when
   git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
   in places where git-annex really does run a child process, directly
   or indirectly via a particular git command.

Sponsored-by: KDM on Patreon
2023-12-04 13:40:28 -04:00
Joey Hess
1e31bf8122
copy/move --from-anywhere --to remote
Implementation was simple because it's equivilant to
--from=foo --to remote for each other remote, followed by
--to remote when there's a local copy.

(Or, in the edge case of --from-anywhere --to=here,
it's the same as --to=here.)

Note that, when the local repo does not have a copy,
fromToPerform gets it from a remote, sends it to the destination,
and drops the local copy. Another call to that for a second remote
will notice that the dest now has a copy, and simply drop from the
second remote, avoiding a second transfer.

Also note that, when numcopies doesn't allow dropping it from
everywhere, it will drop it from the cheapest remotes first
(maybe not ideal) up to more expensive remotes, and finally from the local
repo. So the local repo will generally end up holding a copy. Maybe not
ideal in all cases either, but it seems no worse to do that than to end up
with a copy undropped from a remote.

And I'm not entirely happy with the output, eg:

	copy bigfile (from r3...) ok
	copy bigfile ok

That makes sense if you think of the second line as being
the same as what is output by `git-annex copy bigfile --to bar`,
but it's less clear in this context. Maybe add "(from here...)"?
Also the --json output doesn't have a machine-readable field for
the "from" uuid, and maybe it should?

Sponsored-by: Dartmouth College's DANDI project
2023-11-30 16:34:30 -04:00
Joey Hess
1654572bc1
fix --from overriding annex-ignore
Make git-annex get/copy/move --from foo override configuration of
remote.foo.annex-ignore, as documented.

This already worked for remotes supporting hasKeyCheap. For others though,
git-annex copy --from foo would silently not do anything, while
git-annex copy --to foo would use the annex-ignored remote.

Also improved the annex-ignore docs, to reflect that `git-annex get`
without --from will skip using annex-ignored remotes, for example.

Sponsored-by: Dartmouth College's DANDI project
2023-11-30 15:12:07 -04:00
Joey Hess
6e3bcbf4dd
Make git-annex copy --from --to --fast actually fast
Eg when the destination is logged as containing a file, skip
actively checking that it does contain it.

Note that --fast does not prevent other verifications of content
location that are done in a copy --from --to. Perhaps it could, but this
change will already avoid the real unnecessary work of operating on
files that are already in the remote.

And avoiding other verifications
might cause it to fail if the location log thinks that --to does not
contain the content but does. Such complications with `git-annex copy
--to remote --fast` led to commit d006586cd0
which added a note that gets displayed when that fails, mentioning it
might be due to --fast being enabled.

copy --from --to is already complicated enough without needing to worry
about such edge cases, so continuing to doing some verification of
content location after the initial --fast filtering seems ok.

Sponsored-by: Dartmouth College's DANDI project
2023-11-17 17:37:58 -04:00
Joey Hess
96c0e720c5
small refactoring
Sponsored-by: Dartmouth College's DANDI project
2023-11-17 17:23:24 -04:00
Joey Hess
7a8393ce7d
Fix bug in git-annex copy --from --to
Caused it to skip files that were locally present.

Sponsored-by: Dartmouth College's DANDI project
2023-11-17 16:30:20 -04:00
Joey Hess
7d67229884
git-annex log --gnuplot
The gnuplot output is pretty good, but could still be improved with:

* more colors (repeating colors is confusing with a lot of repos)
* better positioning of the legend, making the plot wider and moving it
  from over top of the graph

Sponsored-by: Kevin Mueller on Patreon
2023-11-14 14:56:58 -04:00
Joey Hess
0fdc1a54db
git-annex log --received modifier option
Only counting received and not dropped makes this show the bandwidth of
data coming into the repository, although only in a sense. Since
git-annex branch updates only happen at the end of a command, and we
don't know when a command started, it's only an approximation of the
actual bandwidth. (A previous git-annex branch update made have
happened in a different repository.)

It would be possible to also add a --dropped option, but I don't know
how useful that would be?

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-11-14 14:04:46 -04:00
Joey Hess
21e66ce209
rename --when to --interval
More accurately describes its behavior.
2023-11-14 11:45:16 -04:00
Joey Hess
cde081a025
guard against wrong timestamps in git log
For example, my sound repo has in the git-annex branch a commit from
2036, which is followed by one from 2034, in amoung commits from 2013.
Clearly there was a problem with the clock.

Since git log --date-order has a behavior of
"Show no parents before all of its children are shown", the data still
gets processed ok. The future timestamp just prevented displaying data
after that commit. It seems better, when the clock was wrong, to display
a wrong date, and then return to right dates.

It would be nice to filter out the wrong dates from display entirely,
but that seems it would need to buffer the whole output. This command is
too slow to buffer it all before displaying anything, and anyway this
kind of problem is probably rare.

Sponsored-by: Joshua Antonishen on Patreon
2023-11-13 15:06:04 -04:00
Joey Hess
2ab11fe06e
treat dead repos as 0 size
With this, git annex log --totalsizes can be compared with
git-annex info's "combined annex size of all repositories"
to double-check it works correctly.

In my sound repo, the two match.

In my big repo, the two report slightly different sizes,
with the former being 1.3 gb smaller than the latter.
I don't know the reason for this disreprency. Given the 30+tb size of
the repo, it's a small difference.

It seems possible that a bug in an old version of git-annex could
explain it. Eg, if an old git-annex lost a line when updating trust.log
or a location log in a merge, git-annex info would see only what it
replaced it with, while git-annex log will see the previous value as
well.

Sponsored-by: Leon Schuermann on Patreon
2023-11-13 15:05:48 -04:00
Joey Hess
38b9ebc5fd
newtype MapLog
Noticed that Semigroup instance of Map is not suitable to use
for MapLog. For example, it behaved like this:

ghci>  parseTrustLog "foo 1 timestamp=10\nfoo 2 timestamp=11" <> parseTrustLog "foo X timestamp=12"
fromList [(UUID "foo",LogEntry {changed = VectorClock 11s, value = SemiTrusted})]

Which was wrong, it lost the newer DeadTrusted value.

Luckily, nothing used that Semigroup when operating on a MapLog. And this
provides a safe instance.

Sponsored-by: Graham Spencer on Patreon
2023-11-13 14:37:22 -04:00
Joey Hess
5d8b8a8ad0
git-annex log --totalsizes
Note that dead repositories are not yet handled so their sizes show as
nonzero after they are marked dead.

Sponsored-By: unqueued on Patreon
2023-11-13 13:15:36 -04:00
Joey Hess
dc02236c85
git-annex log --sizes
CSV format so it can be fed into a program to graph it.

Note that dead repositories are not yet handled so their sizes show as
nonzero after they are marked dead.

Sponsored-By: k0ld on Patreon
2023-11-13 13:07:22 -04:00
Joey Hess
6203b8afba
the last line is for the current time 2023-11-10 17:37:55 -04:00
Joey Hess
574514545c
git-annex log --sizesof
This can take a lot of memory. I decided to violate the usual rule in
git-annex that it operate in constant memory no matter how many annexed
objects. In this case, it would be hard to be fast without using a big
map of the location logs. The main difficulty here is that there can be
many git-annex branches and it needs to display a consistent view at a
point in time, which means merging information from multiple git-annex
branches.

I have not checked if there are any laziness leaks in this code. It
takes 1 gb to run in my big repo, which is around what I estimated
before writing it.

2 options that are documented are not yet implemented.

Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the
next change after 12:59 is then. Then it waits until after 2:10 to
display the next change. It ought to wait until after 2:00.

Sponsored-by: Brock Spratlen on Patreon
2023-11-10 17:26:10 -04:00
Joey Hess
561c036664
split out generic git log parser
Sponsored-By: Jack Hill on Patreon
2023-11-10 15:40:03 -04:00
Joey Hess
11cc9f1933
info: Added calculation of combined annex size of all repositories
Factored out overLocationLogs from CmdLine.Seek, which can calculate this
pretty fast even in a large repo. In my big repo, the time to run git-annex
info went up from 1.33s to 8.5s.

Note that the "backend usage" stats are for annexed files in the working
tree only, not all annexed files. This new data source would let that be
changed, but that would be a confusing behavior change. And I cannot
retitle it either, out of fear something uses the current title (eg parsing
the json).

Also note that, while time says "402108maxresident" in my big repo now,
up from "54092maxresident", top shows the RES constant at 64mb, and it
was 48mb before. So I don't think there is a memory leak. I tried using
deepseq to force full evaluation of addKeyCopies and memory use didn't
change, which also says no memory leak. And indeed, not even calling
addKeyCopies resulted in the same memory use. Probably the increased memory
usage is buffering the stream of data from git in overLocationLogs.

Sponsored-by: Brett Eisenberg on Patreon
2023-11-08 13:35:11 -04:00
Joey Hess
8768966d97
improve comments 2023-11-08 12:06:03 -04:00
Joey Hess
f8d35d9480
lookupkey: Sped up --batch
When the file is relative, it does not need to be passed
through git lsfiles to normalize it.

Sponsored-by: Kevin Mueller on Patreon
2023-10-30 14:59:09 -04:00
Joey Hess
d9fd205cbb
push RawFilePath down into Annex.ReplaceFile
Minor optimisation, but a win in every case, except for a couple where
it's a wash.

Note that replaceFile still takes a FilePath, because it needs to
operate on Chars to truncate unicode filenames properly.
2023-10-26 13:36:49 -04:00
Joey Hess
c873586e14
eliminate s2w8 and w82s
Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode
characters. Luckily, genUUIDInNameSpace is only ever used on ASCII
strings as far as I can determine. In particular, git-remote-gcrypt's
gcrypt-id is an ASCII string.
2023-10-26 13:12:57 -04:00
Joey Hess
8bde6101e3
sqlite datbase for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entagle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.

Sponsored-by: unqueued on Patreon
2023-10-23 16:46:22 -04:00
Joey Hess
41f4d0bda9
enableremote: Avoid overwriting existing git remote when passed the uuid of a specialremote that was earlier initialized with the same name 2023-09-22 13:29:48 -04:00
Joey Hess
ef7c867238
fix some build warnings from ghc 9.4.6
It now notices that a RepoLocation may not be Local, in which case
pattern matching on Local wouldn't do.
2023-09-21 13:40:22 -04:00
Joey Hess
a147a31baa
fix some build warnings from ghc 9.4.6
For some reason it doesn't notice that req must be a Req, because
the toplevel function matched on that.
2023-09-21 13:38:36 -04:00
Joey Hess
a18e40bdd7
lookupkey: Added --ref option
Sponsored-by: Joshua Antonishen on Patreon
2023-09-12 12:49:11 -04:00
Joey Hess
7be8950138
propigateAdjustedCommits in seekExportContent
push: When on an adjusted branch, propagate changes to parent branch before
updating export remotes.

This is a somewhat redundant call to propigateAdjustedCommits, since it
also gets called at pushLocal time. That other one needs to come after
importing from importtree remotes though, and seekExportContent has to come
earlier, so I don't see a way to avoid doing it twice.

Note that git-annex sync also manages to avoid the problem, it's only
git-annex push that had the bug.

Sponsored-by: Leon Schuermann on Patreon
2023-09-11 14:54:26 -04:00
Joey Hess
aeaadb8eb8
improve warning message when unable to update export
A misleading message was displayed in several cases.

If the user has run eg: git config
remote.push-win-remote.annex-tracking-branch 'adjusted/main(unlocked)'
That is not supported, and now it will tell them it's not a valid
configuration. A user reported doing that, but I don't know if it's a
common point of confusion. If it is a common problem, a better message
would be possible, or it could convert back from the adjusted branch to
the actual branch.

Sponsored-by: Graham Spencer on Patreon
2023-09-11 14:21:36 -04:00
Joey Hess
49b97b0675
oldkeys: check associated files by default and add --unchecked
Removed the prior code that checked for keys used by current versions of
the files being acted on. It is redundant with the associated files
check (so long as the associated files database is always up-to-date,
which reconcileStaged should accomplish).

Sponsored-by: Luke T. Shumaker on Patreon
2023-08-23 13:46:41 -04:00
Joey Hess
5489c2cdd6
oldkeys --revision-range
Sponsored-by: Brett Eisenberg on Patreon
2023-08-22 15:00:29 -04:00
Joey Hess
cf8b30c914
oldkeys: New command that lists the keys used by old versions of a file
The tricky thing about this turned out to be handling renames and reverts.
For that, it has to make two passes over the git log, and to avoid
buffering a possibly huge amount of logs in memory (ie the whole git log of
an entire repository!), runs git log twice.

(It might be possible to speed this up by asking git log to show a diff,
and so avoid needing to use catKey.)

Sponsored-By: Brock Spratlen on Patreon
2023-08-22 14:51:06 -04:00
Joey Hess
379d58b499
diffdriver: Added --get option
Removed the dontCheck repoExists, because running it in a repo that has not
been initialized yet would update location log with nouuid. And I guess
it's ok for it to only support running in git-annex repos.
2023-08-22 11:58:53 -04:00
Joey Hess
67c99a4db7
info: Added available to the info displayed for a remote
Sponsored-by: Kevin Mueller on Patreon
2023-08-16 14:52:58 -04:00
Joey Hess
9286769d2c
let Remote.availability return Unavilable
This is groundwork for making special remotes like borg be skipped by
sync when on an offline drive.

Added AVAILABILITY UNAVAILABLE reponse and the UNAVAILABLERESPONSE extension
to the external special remote protocol. The extension is needed because
old git-annex, if it sees that response, will display a warning
message. (It does continue as if the remote is globally available, which
is acceptable, and the warning is only displayed at initremote due to
remote.name.annex-availability caching, but still it seemed best to make
this a protocol extension.)

The remote.name.annex-availability git config is no longer used any
more, and is documented as such. It was only used by external special
remotes to cache the availability, to avoid needing to start the
external process every time. Now that availability is queried as an
Annex action, the external is only started by sync (and the assistant),
when they actually check availability.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-08-16 14:31:31 -04:00
Joey Hess
7f7c95b771
move comment 2023-08-16 13:19:17 -04:00
Joey Hess
10b5f79e2d
fix empty tree import when directory does not exist
Fix behavior when importing a tree from a directory remote when the
directory does not exist. An empty tree was imported, rather than the
import failing. Merging that tree would delete every file in the
branch, if those files had been exported to the directory before.

The problem was that dirContentsRecursive returned [] when the directory
did not exist. Better for it to throw an exception. But in commit
74f0d67aa3 back in 2012, I made it never
theow exceptions, because exceptions throw inside unsafeInterleaveIO become
untrappable when the list is being traversed.

So, changed it to list the contents of the directory before entering
unsafeInterleaveIO. So exceptions are thrown for the directory. But still
not if it's unable to list the contents of a subdirectory. That's less of a
problem, because the subdirectory does exist (or if not, it got removed
after being listed, and it's ok to not include it in the list). A
subdirectory that has permissions that don't allow listing it will have its
contents omitted from the list still.

(Might be better to have it return a type that includes indications of
errors listing contents of subdirectories?)

The rest of the changes are making callers of dirContentsRecursive
use emptyWhenDoesNotExist when they relied on the behavior of it not
throwing an exception when the directory does not exist. Note that
it's possible some callers of dirContentsRecursive that used to ignore
permissions problems listing a directory will now start throwing exceptions
on them.

The fix to the directory special remote consisted of not making its
call in listImportableContentsM use emptyWhenDoesNotExist. So it will
throw an exception as desired.

Sponsored-by: Joshua Antonishen on Patreon
2023-08-15 12:57:41 -04:00
Joey Hess
d467c70ef7
change sync content transition plan and fine tune warning
Only display warning when git-annex sync (without --content or
--no-content) is used with repositories that have preferred content
configured.

Sponsored-by: Leon Schuermann on Patreon
2023-08-14 13:51:35 -04:00
Joey Hess
be028f10e5
split out Utility.Url.Parse
This is mostly for git-repair which can't include all of Utility.Url
without adding many dependencies that are not really necessary.
2023-08-14 12:28:10 -04:00
Joey Hess
3efad7f5f4
info: Added --dead-repositories option
I considered a more wide-ranging config option to make other commands
also show dead repositories. But it would be difficult to implement that
because Remote.keyLocations is used to get locations, filtering out dead
repos, and commands like get then try to use those locations. So a config
setting would make dead repos sometimes be acted on by commands.

Sponsored-by: unqueued on Patreon
2023-08-09 12:43:48 -04:00
Joey Hess
68c9b08faf
fix build with unix-2.8.0
Changed the parameters to openFd. So needed to add a small wrapper
library to keep supporting older versions as well.
2023-08-01 18:41:27 -04:00
Joey Hess
aa5e333cb7
fix whitespace
Thanks to a compile warning from new ghc
2023-08-01 18:36:54 -04:00
Joey Hess
518a51a8a0
--explain for preferred/required content matching
And annex.largefiles and annex.addunlocked.

Also git-annex matchexpression --explain explains why its input
expression matches or fails to match.

When there is no limit, avoid explaining why the lack of limit
matches. This is also done when no preferred content expression is set,
although in a few cases it defaults to a non-empty matcher, which will
be explained.

Sponsored-by: Dartmouth College's DANDI project
2023-07-26 14:50:04 -04:00
Joey Hess
7f38355860
dropunused: Support --jobs
Sponsored-by: Kevin Mueller on Patreon
2023-07-21 14:03:34 -04:00
Joey Hess
7fc6503812
fix waiting for all started feed downloads with -J
importfeed bug fix: When -J was used with multiple feeds, some feeds did
not get their items downloaded.

In my case, I had added a feed to the end of the list, and no items from it
were ever downloaded.

Sponsored-by: Leon Schuermann on Patreon
2023-07-11 22:08:35 -04:00
Joey Hess
240bae38f6
sync: When in an adjusted branch, merge changes from the original branch
This causes changes to the original branch to get merged with a single
sync. Before, it took 2 syncs; the first happened to update the synced/
branch, and the second merged changes from the synced/ branch into the
ajusted branch.

Using mergeToAdjustedBranch when tomerge == origbranch is probably
overkill, but it does work fine.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-07-06 12:42:24 -04:00
Joey Hess
51b24aac91
importfeed: Add feedurl to the metadata
(And allow it to be used in the --template although that seems unlikely to
be very useful.)

My use case for this is that one of the podcast feeds I subscribe to is
sometimes leaking episodes of some other podcast. The other podcast is also
very close to spam, so this may be a form of intentional spamming. I have
not been able to catch the podcast feed containing those episodes, so I
don't know which one is at fault. So putting this in the metadata will let
me eventually catch it.
2023-07-06 00:11:38 -04:00
Joey Hess
3d810726af
diffdriver --text support options for diff
Sponsored-by: KDM on Patreon
2023-07-05 15:43:29 -04:00
Joey Hess
3c1d18cb3b
assist: With --jobs, parallelize transferring content to/from remotes
Command.Add.seek starts concurrency with CommandStages. And for
Command.Sync, it needs TransferStages. So, to get both types of concurrency
for the two different parts, it either needs to change the type of
concurrency in between, or just call startConcurrency once for each.

It seems safe enough to call startConcurrency twice, because it does shut
down concurrency (mostly) at the end, and eg the old Annex.workers get
emptied.

Sponsored-by: unqueued on Patreon
2023-07-05 12:47:30 -04:00
Joey Hess
e1fc9e204e
added git-annex satisfy
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.

As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.

Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.

(Also opened a new bug that also affects sync et all, not only this command.)

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-06-29 15:34:53 -04:00
Joey Hess
d5c6197791
diffdriver: Added --text option for easy diffing of the contents of annexed text files
This was already possible, but it was rather hard to come up with the
complex shell command needed.

Note that the diff output starts with "diff a/... b/...".
I left off the "--git" because it's not a git format diff.
2023-06-28 15:27:16 -04:00
Joey Hess
549d390d03
display drop from remote more consistently
With eg copy --to remote

This is particularly an improvement in sync --content output, which
mixes the two, so it's nice to have consistent display.
2023-06-27 19:01:33 -04:00
Joey Hess
d98aa35b3b
reinject: Added --guesskeys option
Sponsored-by: Noam Kremen on Patreon
2023-06-26 14:05:31 -04:00
Joey Hess
39f3d783fe
consolidate 2023-06-20 15:10:11 -04:00
Joey Hess
72715845a1
display destination file before youtube-dl download
Rather than after it, which can leave one wondering what file it's
downloading.

youtubeDl should not ever return Right Nothing in normal operation,
becaause it's already asked youtube-dl if it supports the url. So it
would have to succeed at that, then not download any file, but also
exit successfully, in order for the new error message to display.

Also display the name of yt-dlp when using it.
2023-06-20 14:55:25 -04:00
Joey Hess
958c2fa6d2
Improve resuming interrupted download when using yt-dlp or youtube-dl
Fixes a failure like this:

curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.

That happens because the whole web page has already been downloaded
previously, and kept, so now addurl tries to download it, and curl asks the
server to resume from the last byte. And youtube.com can't, for whatever
stupid reason.

So, delete the temp file after determining that youtube-dl can be used.
2023-06-19 15:01:47 -04:00
Joey Hess
1f09b709fc
skip sending individual files to export remotes
That will fail, and it already exports whole trees.
f6dd34ca81 made it sync content with
import remotes, and if an import remote is also an export remote, that
caused this new failure mode.

Sponsored-by: Brock Spratlen on Patreon
2023-06-19 11:24:32 -04:00
Joey Hess
64738ea157
config: Added the --show-origin and --for-file options
* config: Added the --show-origin and --for-file options.
* config: Support annex.numcopies and annex.mincopies.

There is a little bit of redundancy here with other code elsewhere that
combines the various configs and selects which to use. But really only
for the special case of annex.numcopies, which is a git config that does
not override the annex branch setting and for annex.mincopies, which does
not have a git config but does have gitattributes settings as well as the
annex branch setting.

That seems small enough, and unlikely enough to grow into a mess that it was
worth supporting annex.numcopies and annex.mincopies in git-annex config
--show-origin. Because these settings are a prime thing that someone might
get confused about and want to know where they were configured.

And, it followed that git-annex config might as well support those two
for --set and --get as well. While this is redundant with the speclialized
commands, it's only a little code and it makes it more consistent.

Note that --set does not have as nice output as numcopies/mincopies
commands in some special cases like setting to 0 or a negative number.
It does avoid setting to a bad value thanks to the smart
constructors (eg configuredNumCopies).

As for other git-annex branch configurations that are not set by git-annex
config, things like trust and wanted that are specific to a repository
don't map to a git config name, so don't really fit into git-annex config.
And they are only configured in the git-annex branch with no local override
(at least so far), so --show-origin would not be useful for them.

Sponsored-by: Dartmouth College's DANDI project
2023-06-12 16:24:31 -04:00
Joey Hess
f6dd34ca81
sync content with import remotes
This didn't used to be needed because importKeys would import all
content and so doing another pass was redundant.

But since 40017089f2 it uses
importChanges, so only new files are imported. If a file that was
already imported before was dropped, that would prevent sync --content
from gettng its content again.

Sponsored-by: Jack Hill on Patreon
2023-06-01 18:52:19 -04:00
Joey Hess
40017089f2
use importChanges optimisation
Large speed up to importing trees from special remotes that contain a lot
of files, by only processing changed files.

Benchmarks:

Importing from a special remote that has 10000 files, that have all been
imported before, and 1 new file sped up from 26.06 to 2.59 seconds.

An import with no change and 10000 unchanged files sped up from 24.3 to
1.99 seconds.

Going up to 20000 files, an import with no changes sped up from
125.95 to 3.84 seconds.

Sponsored-by: k0ld on Patreon
2023-06-01 13:47:00 -04:00
Joey Hess
c6acf574c7
implement importChanges optimisaton (not used yet)
For simplicity, I've not tried to make it handle History yet, so when
there is a history, a full import will still be done. Probably the right
way to handle history is to first diff from the current tree to the last
imported tree. Then, diff from the current tree to each of the
historical trees, and recurse through the history diffing from child tree
to parent tree.

I don't think that will need a record of the previously imported
historical trees, and so Logs.Import doesn't store them. Although I did
leave room for future expansion in that log just in case.

Next step will be to change importTree to importChanges and modify
recordImportTree et all to handle it, by using adjustTree.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-31 16:01:34 -04:00
Joey Hess
f1cdb79ca4
assist: honor gitignore
Sponsored-by: Graham Spencer on Patreon
2023-05-24 14:04:09 -04:00
Joey Hess
0f89d221bd
version: Avoid error message when entire output is not read
Sponsored-by: Dartmouth College's Datalad project
2023-05-19 15:00:57 -04:00
Joey Hess
9ed59dab5b
assist: operate on all files in working tree by default
Consistency with sync and internal consistency is more important than
consistency with the assistant, which is not itself consistent about
what it does when run in a subdirectory.

Note that with -C, it will still commit staged changes to files outside
the directory. Like sync does. Presumably if the user is manually
staging things, then running this command, they intend to build up a
commit.

Sponsored-by: unqueued on Patreon
2023-05-19 14:47:05 -04:00
Joey Hess
908b9687cc
assist: fix bug committing just added file when -J is used
Need to wait for worker threads adding files before flushing the queue.
2023-05-18 15:02:10 -04:00
Joey Hess
fc3df2da9e
assist: fix bug commiting just added file 2023-05-18 14:56:13 -04:00
Joey Hess
9619f562f4
fix assist to commit 2023-05-18 14:50:05 -04:00
Joey Hess
9ec5d7fb05
improve --cleanup desc 2023-05-18 14:41:20 -04:00
Joey Hess
e955912ad0
git-annex assist
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.

(Also this fixes another reversion in git-annex sync,
--commit --no-commit, and --message were not enabled, oops.)

See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.

Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.

Sponsored-by: k0ld on Patreon
2023-05-18 14:37:43 -04:00
Joey Hess
935330aaab
fix inverted logic (fixes test fail)
Sponsored-by: Jack Hill on Patreon
2023-05-18 10:48:08 -04:00
Joey Hess
f93a7fce1d
sync: Started transition to --content being enabled by default
When used without --content or --no-content, warn about the upcoming
transition, and suggest using one of the options, or setting
annex.synccontent.

Sponsored-by: Brett Eisenberg on Patreon
2023-05-17 13:23:42 -04:00
Joey Hess
af6b73a7e6
push: Support --cleanup
This option is not specific to sync, so it seemed it should be in either
pull or push as well as sync. Since it does modify the remote, it seems
better to have it in push; the modification of the local repo pulls in
the direction of pull, but not hard enough.

Maybe it would be better to have it in both?

Sponsored-by: Luke Shumaker on Patreon
2023-05-17 12:51:49 -04:00
Joey Hess
40731ff9fd
sync: Added -g as a short option for --no-content
I anticipate that if sync is transitioned to syncing content by default,
people will want a short option. And in repositories where
annex.synccontent = true, they already would. And pull and push sync
content by default, so a short option is useful with them too.

Mnemonic: -g makes only git data be synced
Also, -a makes only annex data be synced.

Would have preferred -c, which would complement -C, but it
was already taken to set git configs.

Sponsored-by: Noam Kremen on Patreon
2023-05-17 12:34:26 -04:00
Joey Hess
5df89d58c7
git-annex pull and push
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivilant to git-annex sync.

In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.

And it would be hard to split out pull and push from sync, as far as the
implementaton goes. The implementation inside sync was easy, just adjust
SyncOptions so it does the right thing.

Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.

Sponsored-by: Kevin Mueller on Patreon
2023-05-16 16:51:07 -04:00
Joey Hess
b1c396a695
remove unused imports 2023-05-16 16:33:02 -04:00
Joey Hess
2e984c51b6
sync --no-pull and --no-push affect download and upload of content
The man page is somewhat vague about this, but I do think it was a bug
that these options didn't alreay behave that way. The options are
documented to disable imports and exports, which is the same operations
just with a special remote that uses trees.

The real motivation for this is that I'm adding git-annex pull and
git-annex push, and I want these options to turn off the equivilant of
those commands. And git-annex pull will certianly download and push
upload.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-05-16 16:25:23 -04:00
Joey Hess
212442dd9b
pullOption should be pushOption in seekExportContent
sync: Fix bug that made --no-pull, rather than --no-push prevent exporting
trees to special remotes.

Sponsored-by: Joshua Antonishen on Patreon
2023-05-16 15:55:24 -04:00
Joey Hess
07e0d2a35b
clean up uninit output
Don't think including the location of .git/annex/objects in the json is
really useful.
2023-05-11 13:52:22 -04:00
Joey Hess
55dfa929d6
uninit: remove unncessary ExistSuccess
That was added in 2011 to prevent writing to the git-annex branch on
shutdown. But, the use of saveState causes pending git-annex branch
writes to be completed before the branch is deleted. So, an unusual exit
is not needed.
2023-05-11 13:50:20 -04:00
Joey Hess
271f3b1ab4
uninit: Support --json and --json-error-messages
Had to convert uninit to do everything that can error out inside a
CommandStart. This was harder than feels nice.

(Also, in passing, converted CommandCheck to use a data type, not a
weird number that it was not clear how it managed to be unique.)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-11 13:43:02 -04:00
Joey Hess
1904cebbb3
fix typo 2023-05-11 13:26:55 -04:00
Joey Hess
02cfef1f91
uninit: Avoid buffering the names of all annexed files in memory
Oops, using the same list twice does prevent streaming in constant memory.

Sponsored-by: unqueued on Patreon
2023-05-11 13:25:55 -04:00
Joey Hess
b8d9c18e98
remove unused import 2023-05-11 13:24:34 -04:00
Joey Hess
cd5108bb47
uninit: remove undocumented suport for specifying files to act on
I think this was just copied from another command without paying
attention to what it did, because there does not seem to be any valid
reason to want to only unannex some files when running uninit.
2023-05-11 13:23:41 -04:00
Joey Hess
de84abb210
configremote: Support --json and --json-error-messages
Seems unlikely to be too useful, but who knows.

Moved the checkSafeConfig call to happen after an action is started, so
it will be captured by --json-error-messages

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:21:42 -04:00
Joey Hess
a242eabc7a
enableremote: Support --json and --json-error-messages
Seems unlikely to be too useful, but who knows. Was trivial anyway.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:09:27 -04:00
Joey Hess
b3cc8dbacb
initremote: Support --json and --json-error-messages
Including special --whatelse handling.

Otherwise, it seems unlikely to be too useful, but who knows.

Refactored code to call starting before displaying error messages.
This makes the error messages be captured by --json-error-messages

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 14:03:40 -04:00
Joey Hess
9812d9aaec
support aeson for Map
Make unused --json use it, which is better than the doubly nested lists
it was using.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 13:51:37 -04:00
Joey Hess
8d8e044458
upgrade: Support --json and --json-error-messages and --json-progress
Seems unlikely to be very useful, but trivial.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 12:54:48 -04:00
Joey Hess
c98fb0b637
merge: Support --json and --json-error-messages and --json-progress
Seems unlikely to be very useful, but trivial.
And, this completes the story that git-annex sync does not need json,
since every sub-operation is available in a command that does support json.
(Well, except for committing, but that's not a git-annex command.)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-10 12:34:19 -04:00
Joey Hess
2fdf0ae38d
include url in json output
The input field is consistently the url of the feed, which makes sense
as that is the user input, but to differentiate multiple urls downloaded
from a feed when using --json-progress -J, need the url that is being
downloaded too.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:59:44 -04:00
Joey Hess
7919349cee
importfeed: Support --json and --json-error-messages and --json-progress
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:51:16 -04:00
Joey Hess
6b54ea69e3
importfeed: Move error to where --json-error-messages can capture it
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:27:23 -04:00
Joey Hess
04ee6c4c6b
importfeed: Support -J (and work toward supporting --json)
Both -J and --json needed importfeed to be refactored to use commandAction.

That was difficult, because of the interrelated nature of downloading feeds
and then downloading files from feeds, both of which needed to use
commandAction. And then checking for problems in feeds has to come after
these actions, which may be run as background jobs.

As for --json support, it's most of the way there, but still has some
warts, so I didn't enable jsonOptions yet. The warts include:

- An initial empty json record is displayed by getCache.
- Input is not populated, should be feed url
- feedProblem at end will not be captured by --json-error-messages
  (see FIXME)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:13:56 -04:00
Joey Hess
a71c831949
renameremote: Support --json and --json-error-messages
Seems unlikely to be useful, but it works so

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 16:25:40 -04:00
Joey Hess
a5d0c85ae1
factor out maybeAddJSONField
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 16:15:41 -04:00
Joey Hess
3d8f93dc0a
reinject: Support --json and --json-error-messages
Also fix support for operating on multiple pairs of files and keys.

Moved notAnnexed to inside starting, so error message will get into the json.

Cannot include the key in the starting as it's not known yet, so instead
add it to the json later.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 15:43:37 -04:00
Joey Hess
91b9915b09
reinit: Support --json and --json-error-messages
Basically same concerns as init..

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 15:07:40 -04:00
Joey Hess
f09a248fe2
init: Support --json and --json-error-messages
Dunno how useful this will be, since about all that's accessible from
the json is whether it succeeded or failed, and the error messages
which were already on stderr.

Note that, when autoenabling a special remote, it would be possible for
one to stop and prompt or output not using Messages and so not output as
part of the json. I don't think that happens, but I'm not 100% sure
something doesn't manage to break it. Of course, the same could be the
case for commands that transfer objects. Using Annex.Init.autoEnableSpecialRemotes
in --json mode would avoid the problem, but I've chosen to wait until I
know it's needed to use it.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 14:58:08 -04:00
Joey Hess
c208442292
unused: Support --json and --json-error-messages
Generalized AddJSONActionItemField to allow it to add several fields. Not entirely
happy with that, since the names of the fields have to be carefully chosen to
not conflict with other json fields. And fields added that way can't be parsed
back in FromJSON, except for the "fields" field that is special cased for metadata.
Still, I couldn't see another way to do it.

Also, omit file:null from the json output. Which does affect other commands,
eg git-annex whereis --all --json. Hopefully that won't break something that expects
a null file. If it did, that could be reverted, but it would be ugly to have
file:null in the unused --json

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-08 14:39:57 -04:00
Joey Hess
365dbc89dc
expire, trust et al, dead, describe: Support --json and --json-error-messages
For expire, the normal output is unchanged, but the --json output includes the uuid
in machine parseable form. Which could be very useful for this somewhat obscure
command. That needed ActionItemUUID to be implemented, which seemed like a lot
of work, but then ---

I had been going to skip implementing them for trust, untrust, dead, semitrust,
and describe, but putting the uuid in the json is useful information, it tells
what uuid git-annex picked given the input. It was not hard to support
these once ActionItemUUID was implemented.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-05 15:33:30 -04:00
Joey Hess
1a9af823bc
addunused, dropunused: Support --json and --json-error-messages
This also changes addunused to display the names of the files that it adds.
That seems like a general usability improvement, and not displaying the input
number does not seem likely to be a problem to a user, since the filename
is based on the key. Displaying the filename was necessary to get it and the key
included in the json.

dropunused does not include the key in the json. It would be possible to
add, but would need more changes. And I doubt that dropunused --json
would be used in a situation where a program cared which keys were
dropped. Note that drop --unused does have the key in its json, so such
a program could just use it. Or could just dropkey --batch with the
specific keys it wants to drop if it cares about specific keys.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-05 14:01:40 -04:00
Joey Hess
1d4bd2dcb8
migrate, undo: Support --json and --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 16:34:35 -04:00
Joey Hess
38fc5d3fc7
rekey, setpresentkey: Support --json and --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 16:03:54 -04:00
Joey Hess
f20c8b087e
fix: Support --json and --json-error-messages
And triaged out some commands that don't need to support these options.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 14:28:21 -04:00
Joey Hess
46c7c30140
log: Support --json and --json-error-messages
Also in passing the --all display was fixed up to not quote keys like filenames.

Note that the check added to compareChanges was needed to avoid logging when
nothing changed.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 12:36:31 -04:00
Joey Hess
f56f6140fa
remote tailing 's' from log --raw-data
log: When --raw-date is used, display only seconds from the epoch, as
documented, omitting a trailing "s" that was included in the output
before.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 11:53:38 -04:00
Joey Hess
c235488e2d
rmurl: Support --json and --json-error-messages
The json does not include an url field, but it does have an input field that is
"file url" when using --batch and ["file", "url"] when using the command line.
I chose not to change that because it would complicate batchInput.
An url field could be added if it turns out to be useful.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-04 11:28:27 -04:00
Joey Hess
6cbcba484c
unannex: Support --json and --json-error-messages
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-03 15:56:20 -04:00
Joey Hess
1beca851ff
back compat fix for info --json on unknown item
This was changed in a0e6fa18eb in a way
that broke datalad, which expected to see a "file" field in the --json.

See 45ddd4b12f
2023-05-01 12:05:21 -04:00
Joey Hess
4d6c918eff
avoid quoting spaces in git-annex find output to terminal
That's too much quoting, the user expects the filename to be copy and
pasteable. It would be ok to slash-escape space ('\ ')
which is what gnu find does, but it doesn't seem necessary either.

${escaped_file} has always quoted spaces though, so keep on doing it
there.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-26 00:18:30 -04:00
Joey Hess
be36e208c2
json object for FileNotFound
When a nonexistant file is passed to a command and  --json-error-messages
is enabled, output a JSON object indicating the problem.

(But git ls-files --error-unmatch still displays errors about such files in
some situations.)

I don't like the duplication of the name of the command introduced by this,
but I can't see a great way around it. One way would be to pass the Command
instead.

When json is not enabled, the stderr is unchanged. This is necessary
because some commands like find have custom output. So dislaying
"find foo not found" would be wrong. So had to complicate things with
toplevelFileProblem having different output with and without json.

When not using --json-error-messages but still using --json, it displays
the error to stderr, but does display a json object without the error. It
does have an errorid though. Unsure how useful that behavior is.

Sponsored-by: Dartmouth College's Datalad project
2023-04-25 19:26:20 -04:00
Joey Hess
31e4b6dee1
catch chdir exception in --autostop
assistant --autostop: Avoid crashing when ~/.config/git-annex/autostart
lists a directory that it cannot chdir to.

Sponsored-by: k0ld on Patreon
2023-04-19 12:42:02 -04:00
Joey Hess
9155ed1072
configremote
New command, currently limited to changing autoenable= setting of a special remote.

It will probably never be used for more than that given the limitations on
it.

Sponsored-by: Brock Spratlen on Patreon
2023-04-18 15:30:49 -04:00
Joey Hess
8728695b9c
support enableremote of git repo changing eg autoenable=
enableremote: Support enableremote of a git remote (that was previously set
up with initremote) when additional parameters such as autoenable= are
passed.

The enableremote special case for regular git repos is intended to handle
ones that don't have a UUID probed, and the user wants git-annex to
re-probe. So, that special case is still needed. But, in that special
case, the user is not passing any extra parameters. So, when there are
parameters, instead run the special remote setup code. That requires there
to be a uuid known already, and it allows changing things like autoenable=

Remote.Git.enableRemote changed to be a no-op if a git remote with the name
already exists. Which it generally will in this case.

Sponsored-by: Jack Hill on Patreon
2023-04-18 14:00:24 -04:00
Joey Hess
160d4c9254
whereused: Fix display of branch:file when run in a subdirectory
The file needs to be relative to the top of the repository
in that case, but it was relative to the subdir.

Sponsored-by: Luke Shumaker on Patreon
2023-04-12 15:18:04 -04:00
Joey Hess
275d974120
improve display of relative path to file
When in a subdirectory, and the file is too, it used to display eg
../subdir/thefile and now will display thefile.
2023-04-12 15:11:44 -04:00
Joey Hess
3e5829721f
fix build 2023-04-12 14:31:56 -04:00
Joey Hess
11790df3e6
fix build 2023-04-12 14:18:29 -04:00
Joey Hess
3346aa9659
safe output to terminal for calckey inprogress and lookupkey
These are quite low-level, but still there is no point in displaying
escape sequences that have been embedded in a key to the terminal.

I think these are the only remaining commands that didn't use safe
output, except for cases where git-annex is speaking a protocol to
itself.

Sponsored-by: Kevin Mueller on Patreon
2023-04-12 14:03:44 -04:00
Joey Hess
a576fc3b12
fix mojibake reversion in display of utf8
When displaying a ByteString like "💕", safeOutput operates on
individual bytes like "\240\159\146\149" and isControl '\146' = True,
so it got truncated to just "\240".

So, only treat the low control characters, and DEL, as control
characters.

Also split Utility.Terminal out of Utility.SafeOutput. The latter needs
win32, but Utility.SafeOutput is used by Control.Exception, which is
used by Setup.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-12 13:53:30 -04:00
Joey Hess
afa5b883dc
find, findkeys, examinekey: escape output to terminal when --format is not used
Note that filenames are not quoted, only escaped. This is to match the
output of --format with escaping.

Sponsored-by: Lawrence Brogan on Patreon
2023-04-11 15:27:07 -04:00
Joey Hess
df6f9f1ee8
filter out control characters and quote filenames
Searched for uses of putStr and hPutStr and changed appropriate ones to filter
out control characters and quote filenames.

This notably does not make find and findkeys quote filenames in their default
output. Because they should only do that when stdout is non a pipe.

A few commands like calckey and lookupkey seem too low-level to make sense to filter
output, so skipped those.

Also when relaying output from other commands that is not progress output,
have git-annex filter out control characters.

Sponsored-by: k0ld on Patreon
2023-04-11 14:27:22 -04:00
Joey Hess
8b6c7bdbcc
filter out control characters in all other Messages
This does, as a side effect, make long notes in json output not
be indented. The indentation is only needed to offset them
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brock Spratlen on Patreon
2023-04-11 12:58:01 -04:00
Joey Hess
a0e6fa18eb
eliminate showStart showStartOther
These were not handling control characters and are redundant.

Sponsored-by: Jack Hill on Patreon
2023-04-10 16:28:58 -04:00
Joey Hess
3290a09a70
filter out control characters in warning messages
Converted warning and similar to use StringContainingQuotedPath. Most
warnings are static strings, some do refer to filepaths that need to be
quoted, and others don't need quoting.

Note that, since quote filters out control characters of even
UnquotedString, this makes all warnings safe, even when an attacker
sneaks in a control character in some other way.

When json is being output, no quoting is done, since json gets its own
quoting.

This does, as a side effect, make warning messages in json output not
be indented. The indentation is only needed to offset warning messages
underneath the display of the file they apply to, so that's ok.

Sponsored-by: Brett Eisenberg on Patreon
2023-04-10 15:55:44 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Joey Hess
063c00e4f7
git style filename quoting for giveup
When the filenames are part of the git repository or other files that
might have attacker-controlled names, quote them in error messages.

This is fairly complete, although I didn't do the one in
Utility.DirWatcher.INotify.hs because that doesn't have access to
Git.Filename or Annex.

But it's also quite possible I missed some. And also while scanning for
these, I found giveup used with other things that could be attacker
controlled to contain control characters (eg Keys). So, I'm thinking
it would also be good for giveup to just filter out control characters.
This commit is then not the only line of defence, but just good
formatting when git-annex displays a filename in an error message.

Sponsored-by: Kevin Mueller on Patreon
2023-04-10 12:56:45 -04:00
Joey Hess
da83652c76
addurl --preserve-filename: reject control characters
As well as escape sequences, control characters seem unlikely to be desired when
doing addurl, and likely to trip someone up. So disallow them as well.

I did consider going the other way and allowing filenames with control characters
and escape sequences, since git-annex is in the process of escaping display
of all filenames. Might still be a better idea?

Also display the illegal filename git quoted when it rejects it.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-04-10 12:18:25 -04:00
Joey Hess
2ba1559a8e
git style quoting for ActionItemOther
Added StringContainingQuotedPath, which is used for ActionItemOther.

In the process, checked every ActionItemOther for those containing
filenames, and made them use quoting.

Sponsored-by: Graham Spencer on Patreon
2023-04-08 16:30:01 -04:00
Joey Hess
d689a5b338
git style filename quoting controlled by core.quotePath
This is by no means complete, but escaping filenames in actionItemDesc does
cover most commands.

Note that for ActionItemBranchFilePath, the value is branch:file, and I
choose to only quote the file part (if necessary). I considered quoting the
whole thing. But, branch names cannot contain control characters, and while
they can contain unicode, git coes not quote unicode when displaying branch
names. So, it would be surprising for git-annex to quote unicode in a
branch name.

The find command is the most obvious command that still needs to be
dealt with. There are probably other places that filenames also get
displayed, eg embedded in error messages.

Some other commands use ActionItemOther with a filename, I think that
ActionItemOther should either be pre-sanitized, or should explicitly not
be used for filenames, so that needs more work.

When --json is used, unicode does not get escaped, but control
characters were already escaped in json.

(Key escaping may turn out to be needed, but I'm ignoring that for now.)

Sponsored-by: unqueued on Patreon
2023-04-08 14:52:26 -04:00
Joey Hess
98a3ba0ea5
restore old registerurl location tracking behavior
registerurl: When an url is claimed by a special remote other than the web,
update location tracking for that special remote.

registerurl's behavior was changed in commit
451171b7c1, apparently accidentially to not
update location tracking except for the web.

This makes registerurl followed by unregisterurl not be a no-op, when the
url happens to be claimed by a remote other than the web. It is a noop when
the url is unclaimed except by the web. I don't like the inconsistency,
and wish that registerurl and unregisterurl never updated location
tracking, which would be more in keeping with them being plumbing.

But there is the fact that it used to behave this way, and also it was
inconsistent that it updated location tracking for the web but not for
other remotes, unlike addurl. And there's an argument that the user might
not know what remote to expect to claim an url, so would be considerably in
the dark when using registerurl. (Although they have to know what content
gets downloaded, since they specify a key..)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-04-05 17:06:44 -04:00
Joey Hess
2b940f7725
registerurl, unregisterurl: Added --remote option
This serves two purposes. --remote=web bypasses other special remotes that
claim the url, same as addurl --raw. And, specifying some other remote
allows making sure that an url is claimed by the remote you expect,
which makes then using setpresentkey not be fragile.

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-04-05 15:54:41 -04:00
Joey Hess
24ae4b291c
addurl, importfeed: Fix failure when annex.securehashesonly is set
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.

Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.

Sponsored-by: unqueued on Patreon
2023-03-27 15:10:46 -04:00
Joey Hess
cd076cd085
Windows: Support urls like "file:///c:/path"
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.

Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.

An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.

It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-03-27 13:38:02 -04:00
Joey Hess
e900e3caf3
avoid build warning on windows 2023-03-27 12:21:40 -04:00
Joey Hess
a0badc5069
sync: Fix parsing of gcrypt::rsync:// urls that use a relative path
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.

(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)

Sponsored-by: Jack Hill on Patreon
2023-03-23 15:20:00 -04:00
Joey Hess
e822df2a09
fix build warnings on windows 2023-03-21 18:41:23 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Yaroslav Halchenko
e018ae1125
Fix ambigous typos 2023-03-17 15:14:47 -04:00
Joey Hess
f1b678face
copy --from --to location tracking update
copy: When --from and --to are combined and the content is already present
on the destination remote, update location tracking as necessary.

Sponsored-by: Dartmouth College's DANDI project
2023-03-13 14:51:09 -04:00
Joey Hess
2323af3736
importfeed: Display feed title
When importing a bunch of feeds, this makes it more clear what it's working
on. Also, I sometimes want to delete a particular feed from a list of feeds
but don't know which url belongs to the feed, and this solves that.

Control characters are filtered out just to protect against some feed
putting escape character stuff in the feed, which could be a
security problem. (Control characters also get filtered out of
importfeed filenames.)

Sponsored-by: Luke Shumaker on Patreon
2023-03-11 13:52:45 -04:00
Joey Hess
ff141c093e
include subdir when checking export branch is checked out
sync: Fix a reversion that prevented sending files to exporttree=yes
remotes when annex-tracking-branch was configured to branch:subdir
(Introduced in version 10.20230214)

Sponsored-by: Kevin Mueller on Patreon
2023-03-10 11:41:52 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
9c3c4c1712
deprecate git-annex status w/o runtime warning
As far as I can see, git-annex status was added to support direct mode, and
like other things added for that, it ought to be deprecated.

Behavior is similar to git status --short, though not identical in a few
cases eg renamed files.

I think datalad does not use this command, although it might have in the
past. Could not find any use of it in the current datalad code.

A deprecation warning at runtime would be the next step, probably will wait
and do that for all the deprecated commands together (except findref).
2023-02-28 16:34:31 -04:00
Joey Hess
80478cc145
support git-annex view in an adjusted branch
Rather than entering a view of the adjusted branch, enter an adjusted
view branch. This way, it's the same as first using git-amnnex view
followed by git-annex adjust, and everything already implemented to
support that works.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-02-27 15:48:58 -04:00
Joey Hess
1c4f4b449a
support --unlock-present adjustment of view branches
When generating the view, check if the key is present.

When syncing in a view branch with an adjustment, run adjustedBranchRefreshFull
the same as is done when syncing in other adjusted branches. This is
needed because the docs for git-annex adjust --unlock-present suggest
using git-annex sync to update the branch when annex.adjustedbranchrefresh
is not set.

Note that, with annex.adjustedbranchrefresh set, it just works! The
adjusted branch gets updated in the usual way and it doesn't matter that
there's a view branch underneath.

And of course, re-running git-annex adjut --unlock-present also works,
as suggested in the docs.

Sponsored-by: Erik Bjäreholt on Patreon
2023-02-27 15:37:57 -04:00
Joey Hess
f09e299156
rawfilepath conversion 2023-02-27 15:06:32 -04:00
Joey Hess
cc32e31161
understand adjusted view branch names
An adjusted view branch has a name like
"refs/heads/adjusted/views/master(author=_)(unlocked)", so it is a view
branch that has been converted to an adjusted branch.

Made Logs.View support such branch names. So now git-annex sync and
pre-commit handle updating metadata on commit in such a branch.

Much remains to be done to fully support adjusted view branches,
including actually applying the adjustment when updating the view branch.

Sponsored-by: Graham Spencer on Patreon
2023-02-27 14:57:58 -04:00
Joey Hess
16d3097a08
fix reversion in info, and add test case
info: Fix reversion in last release involving handling of unsupported input
by continuing to handle any other inputs, before exiting nonzero at the
end.

Sponsored-by: Dartmouth College's Datalad project
2023-02-20 14:31:37 -04:00
Joey Hess
452b080dba
better handling of multiple repositories with the same name
Used to fail with a bad error message, indicating there was no
repository with the specified name, or something like that. Now, suggest
they use the uuid to disambiguate.

* info, enableremotemote, renameremote: Avoid a confusing message when more
  than one repository matches the user provided name.
* info: Exit nonzero when the input is not supported.

Sponsored-by: Kevin Mueller on Patreon
2023-02-13 14:31:09 -04:00
Joey Hess
e9b6efac5a
fix buggy sync to exporttree remote when annex-tracking-branch is not checked out
sync: Fix a bug that caused files to be removed from an importtree=yes
exporttree=yes special remote when the remote's annex-tracking-branch was
not the currently checked out branch.

Sponsored-by: Max Thoursie on Patreon
2023-02-10 15:49:15 -04:00
Joey Hess
c2b3e870df
finishing up sync in view branch
sync: When run in a view branch, avoid updating synced/ branches, or trying
to merge anything from remotes.

Sponsored-by: Erik Bjäreholt on Patreon
2023-02-10 15:27:42 -04:00
Joey Hess
5f9bf51438
sync in view branch updates the view branch
* sync: When run in a view branch, refresh the view branch to reflect any
    changes that have been made to the parent branch or metadata.

This is basically working, but probably needs some more work to deal with
all the edge cases of things sync does.

Sponsored-by: Lawrence Brogan on Patreon
2023-02-08 15:37:28 -04:00
Joey Hess
a11d6e0baf
avoid sync pushing view branches to remotes, and better view branch names
* sync: Avoid pushing view branches to remotes.
* Changed the name of view branches to include the parent branch.
  Existing view branches checked out using an old name will still work.

It does not seem useful for sync to push view branches around, because
the information in a view branch can entirely be derived from other
information in git. And sync doesn't push adjusted branches around either.

The better view branch names make it more in line with adjusted branch
names, but were also needed to make fromViewBranch be able to return the
original branch name.

Kept the old view branch names still working. But, when those branches
exist in a repo, sync will still try to push them as before. Avoiding
that would need more complicated and/or expensive changes to sync.

Sponsored-By: Boyd Stephen Smith Jr. on Patreon
2023-02-08 13:57:48 -04:00
Joey Hess
c209e0f643
add FIELD?=GLOB to git-annex view usage
And also to vadd usage.

Also added some other things to the usage that were omitted before to
save space.

Adding even FIELD?=GLOB made the git-annex --help list of commands grow
too wide for an 80 column display. So, removed the description of
parameters from that list of commands.

Sponsored-By: Brock Spratlen on Patreon
2023-02-07 18:09:10 -04:00
Joey Hess
aa0350ff49
add directory to views for files that lack specified metadata
* view: New field?=glob and ?tag syntax that includes a directory "_"
  in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
  "_" directory in a view.

When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.

annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.

annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.

Sponsored-By: Jack Hill on Patreon
2023-02-07 16:28:46 -04:00
Joey Hess
152be2948b
use transfer stages for copy --from
See commit e04a931439 for an explanation
of why move uses transfer stages for --from, but command stages for
--to. At the point of that commit, copy was actually already using
command stages for everything, so the commit was incorrect about
improving copy --to.

But, the same reasoning about --from applies to copy as to move; when
verification is not done incrementally, download and verification are
the main two stages. The cleanup stage for copy is even less work than
for move (it doesn't drop from the remote).

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 14:07:49 -04:00
Joey Hess
579d9b60c1
improve concurrency of move/copy --from --to
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.

Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.

Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.

Sponsored-by: Dartmouth College's DANDI project
2023-01-24 13:59:39 -04:00
Joey Hess
77266e46dd
fix behavior of copy --from --to
Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:55:16 -04:00
Joey Hess
acc3f6211f
finishing up move --from --to
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.

Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.

Note that if a `git-annex get` is being ran at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. So the location log has not been updated yet, and the move
thinks it's resuming. Resulting in local copy being dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.

I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is ran at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.

So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 17:43:48 -04:00
Joey Hess
f5f799f17e
fully working move --from --to (not release quality)
When the destination already has a copy, it behaves the same as
drop --from really, but display it as a move and implement it
reusing the factored out code from fromPerform.

(Note that willDropMakeItWorse never returns DropAllowed in that
situation, because it's told that dest has a copy. So numcopies is
always checked.)

And when only the source and not the local repo or destination have a
copy, do the full copy from source to local, then copy from local to
dest, then drop from local, then drop from source dance.

This is complicated by fromPerform being hardcoded to assume there is a
local copy, but the local copy has already been dropped. That's why
it uses cleanupfromsrc RemoveNever to avoid the code that makes that
assumption, and finishes with a call to dropfromsrc.

And, since the location log has not yet been updated, checking numcopies
was not working, until I added UnVerifiedRemote dest to the list of
things to check.

This is not yet quite mergeable though. There are two things in the
comment above fromToPerform that are not implemented yet: Checking the
location log before dropping the local copy, and locking the temporary
local copy for drop.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 16:12:33 -04:00
Joey Hess
1abd457e98
push location log updating up to callers of download
Prep for move --to --from, which needs to download from a src repo
without updating the location log for the local repo, before sending the
content on to the dest repo.

Note that caller of download' already update the log themselves.
See previous commit a422a056f2
that pushed it up to download from getViaTmpFrom.

(Also removed in passing a debug print + readline that I accidentially
committed last week on this branch.)

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:47:41 -04:00
Joey Hess
8c349b8802
implement move --from --to when there is a local copy already
This is rather trivial, since it does not need to temporarily get the
local copy.

Added fromPerform' to handle the situation where the local copy
is dropped by another process during the copy to the dest. This avoids
ever re-downloading the local copy before dropping from the src.

Sponsored-by: Dartmouth College's DANDI project
2023-01-23 13:17:35 -04:00
Joey Hess
a46c385aec
move/copy: started implementing --from src --to dest
This is not in a usable state, but I have a possible plan for how to do
it.

Sponsored-by: Dartmouth College's DANDI project
2023-01-20 11:10:38 -04:00
Joey Hess
a6c1d9752b
move/copy: option parsing for --from with --to
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!

The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.

Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.

Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.

Sponsored-by: Dartmouth College's DANDI project
2023-01-18 14:42:39 -04:00
Joey Hess
f8bc208e89
findkeys: New command, very similar to git-annex find but operating on keys
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.

Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.

A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.

Also, cleaned up comments on git-annex-find man page asking for --all
option.

Sponsored-by: Dartmouth College's DANDI project
2023-01-17 14:51:57 -04:00
Joey Hess
9d60385001
convert renameFile to moveFile to support cross-device moves
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.

The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.

As well as auditing for renameFile, also audited for createLink, all of
those are ok as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.

Sponsored-by: Dartmouth College's Datalad project
2022-12-20 15:17:50 -04:00
Joey Hess
2b014f1a8b
don't frontload reconcileStaged in git-annex init
init: Avoid scanning for annexed files, which can be lengthy in a
large repository. Instead that scan is done on demand. This lets git-annex
init be run and some query commands be used in a repository without
waiting.

Note that autoinit already behaved this way, so while this will mean some
commands like git-annex get/unlock/add will do the scan the first time run,
that is not really a significant behavior change.

And, it's really better to have a consistent behavior. The reason for
the inconsistency was a strange bug discussed in
b3c4579c79. Avoiding reconcileStaged in
init will keep avoiding whatever that was.

Sponsored-by: Dartmouth College's DANDI project
2022-11-18 13:58:47 -04:00
Joey Hess
b2cc63d5bf
export: fix multi-file delete bug
export: Fix a bug that left a file on a special remote when two files with
the same content were both deleted in the exported tree.

Case of the wrong data structure leading to the wrong result.
The DiffMap now contains all the old filenames, and all the new filenames.

Note that, when 2 files with the same content are both renamed,
it only renames the first, but deletes and re-exports the second.
Improving that is possible, but it would need to use a different temporary
filename. Anyway, that is an unusual case, and there are known to be other
unusual cases where export does not rename with maximum efficiency, IIRC.
(Or maybe this is the case that I remember?)

Sponsored-by: Dartmouth College's OpenNeuro project
2022-11-09 16:24:37 -04:00
Joey Hess
14f7a386f0
Make git-annex enable-tor work when using the linux standalone build
Clean the standalone environment before running the su command
to run "sh". Otherwise, PATH leaked through, causing it to run
git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set,
which caused that script to not work:

[2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"]
/home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found

Changed programPath to not use GIT_ANNEX_PROGRAMPATH,
but instead run the scripts at the top of GIT_ANNEX_DIR.
That works both when the standalone environment is set up, and when it's
not.

Sponsored-by: Kevin Mueller on Patreon
2022-10-26 15:45:08 -04:00
Joey Hess
731e806c96
use lookupKeyStaged in --batch code paths
Make --batch mode handle unstaged annexed files consistently whether the
file is unlocked or not. Before this, a unstaged locked file
would have the symlink on disk examined and operated on in --batch mode,
while an unstaged unlocked file would be skipped.

Note that, when not in batch mode, unstaged files are skipped over too.
That is actually somewhat new behavior; as late as 7.20191114 a
command like `git-annex whereis .` would operate on unstaged locked
files and skip over unstaged unlocked files. That changed during
optimisation of CmdLine.Seek with apparently little fanfare or notice.

Turns out that rmurl still behaved that way when given an unstaged file
on the command line. It was changed to use lookupKeyStaged to
handle its --batch mode. That also affected its non-batch mode, but
since that's just catching up to the change earlier made to most
other commands, I have not mentioed that in the changelog.

It may be that other uses of lookupKey should also change to
lookupKeyStaged. But it may also be that would slow down some things,
or lead to unwanted behavior changes, so I've kept the changes minimal
for now.

An example of a place where the use of lookupKey is better than
lookupKeyStaged is in Command.AddUrl, where it looks to see if the file
already exists, and adds the url to the file when so. It does not matter
there whether the file is staged or not (when it's locked). The use of
lookupKey in Command.Unused likewise seems good (and faster).

Sponsored-by: Nicholas Golder-Manning on Patreon
2022-10-26 14:43:06 -04:00