Commit graph

630 commits

Author SHA1 Message Date
Joey Hess
7f6b2ca49c
handle overBranchFileContents with read-only unmerged git-annex branches
This makes --all error out in that situation. Which is better than
ignoring information from the branches.

To really handle the branches right, overBranchFileContents would need
to both query all the branches and union merge file contents
(or perhaps not provide any file content), as well as diffing between
branches to find files that are only present in the unmerged branches.
And also, it would need to handle transitions..

Sponsored-by: Dartmouth College's Datalad project
2021-12-27 14:30:51 -04:00
Joey Hess
68257e9076
add git-annex filter-process
filter-process: New command that can make git add/checkout faster when
there are a lot of unlocked annexed files or non-annexed files, but that
also makes git add of large annexed files slower.

Use it by running: git
config filter.annex.process 'git-annex filter-process'

Fully tested and working, but I have not benchmarked it at all.
And, incremental hashing is not done when git add uses it, so extra work is
done in that case.

Sponsored-by: Mark Reidenbach on Patreon
2021-11-04 15:02:36 -04:00
Joey Hess
e8496d62e4
improved bwrate limiting implementation
New method is much better. Avoids unrestrained transfer at the beginning
(except for the first block. Keeps right at or a few kb/s below the
configured limit, with very little varation in the actual reported bandwidth.

Removed the /s part of the config as it's not needed.

Ready to merge.

Sponsored-by: Luke Shumaker on Patreon
2021-09-22 15:27:16 -04:00
Joey Hess
18e00500ce
bwlimit
Added annex.bwlimit and remote.name.annex-bwlimit config that works for git
remotes and many but not all special remotes.

This nearly works, at least for a git remote on the same disk. With it set
to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with
occasional spikes to 160 kb/s. So it needs to delay just a bit longer...
I'm unsure why.

However, at the beginning a lot of data flows before it determines the
right bandwidth limit. A granularity of less than 1s would probably improve
that.

And, I don't know yet if it makes sense to have it be 100ks/1s rather than
100kb/s. Is there a situation where the user would want a larger
granularity? Does granulatity need to be configurable at all? I only used that
format for the config really in order to reuse an existing parser.

This can't support for external special remotes, or for ones that
themselves shell out to an external command. (Well, it could, but it
would involve pausing and resuming the child process tree, which seems
very hard to implement and very strange besides.) There could also be some
built-in special remotes that it still doesn't work for, due to them not
having a progress meter whose displays blocks the bandwidth using thread.
But I don't think there are actually any that run a separate thread for
downloads than the thread that displays the progress meter.

Sponsored-by: Graham Spencer on Patreon
2021-09-21 16:58:10 -04:00
Joey Hess
6d4a728455
Added annex.youtube-dl-command config
This can be used to run some forks of youtube-dl.

Sponsored-by: Brett Eisenberg on Patreon
2021-08-27 09:44:23 -04:00
Joey Hess
1acdd18ea8
deal better with clock skew situations, using vector clocks
* Deal with clock skew, both forwards and backwards, when logging
  information to the git-annex branch.
* GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1)
  rather than needing to be advanced each time a new change is made.
* Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex.

When changing a file in the git-annex branch, the vector clock to use is now
determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK
when set), and comparing it to the newest vector clock already in use in
that file. If a newer time stamp was already in use, advance it forward by
a second instead.

When the clock is set to a time in the past, this avoids logging with
an old timestamp, which would risk that log line later being ignored in favor
of "newer" line that is really not newer.

When a log entry has been made with a clock that was set far ahead in the
future, this avoids newer information being logged with an older timestamp
and so being ignored in favor of that future-timestamped information.
Once all clocks get fixed, this will result in the vector clocks being
incremented, until finally enough time has passed that time gets back ahead
of the vector clock value, and then it will return to usual operation.

(This latter situation is not ideal, but it seems the best that can be done.
The issue with it is, since all writers will be incrementing the last
vector clock they saw, there's no way to tell when one writer made a write
significantly later in time than another, so the earlier write might
arbitrarily be picked when merging. This problem is why git-annex uses
timestamps in the first place, rather than pure vector clocks.)

Advancing forward by 1 second is somewhat arbitrary. setDead
advances a timestamp by just 1 picosecond, and the vector clock could
too. But then it would interfere with setDead, which wants to be
overrulled by any change. So it could use 2 picoseconds or something,
but that seems weird. It could just as well advance it forward by a
minute or whatever, but then it would be harder for real time to catch
up with the vector clock when forward clock slew had happened.

A complication is that many log files contain several different peices of
information, and it may be best to only use vector clocks for the same peice
of information. For example, a key's location log file contains
InfoPresent/InfoMissing for each UUID, and it only looks at the vector
clocks for the UUID that is being changed, and not other UUIDs.

Although exactly where the dividing line is can be hard to determine.
Consider metadata logs, where a field "tag" can have multiple values set
at different times. Should it advance forward past the last tag?
Probably. What about when a different field is set, should it look at
the clocks of other fields? Perhaps not, but currently it does, and
this does not seems like it will cause any problems.

Another one I'm not entirely sure about is the export log, which is
keyed by (fromuuid, touuid). So if multiple repos are exporting to the
same remote, different vector clocks can be used for that remote.
It looks like that's probably ok, because it does not try to determine
what order things occurred when there was an export conflict.

Sponsored-by: Jochen Bartl on Patreon
2021-08-04 12:33:46 -04:00
Joey Hess
26e6def0ef
wording 2021-07-14 17:08:38 -04:00
Joey Hess
47d3dccf19
whereused implemented
except --historical

Sponsored-by: Jack Hill on Patreon
2021-07-14 14:27:21 -04:00
Joey Hess
980613ec8c
man page for git-annex whereused
And an implementation strategy.

Sponsored-by: Brett Eisenberg on Patreon
2021-07-14 13:43:45 -04:00
Joey Hess
8885bd3c5b
addistant: honor annex.delayadd for non-large files
assistant: When adding non-large files to git, honor annex.delayadd
configuration.

Also, don't add non-large files to git when they are still
being written to. This came for free, since the changes to non-large
files get queued up with the ones to large files, and run through the lsof
check.

Sponsored-by: Luke Shumaker on Patreon
2021-07-13 12:17:00 -04:00
Joey Hess
4b1b9d7a83
Added annex.freezecontent-command and annex.thawcontent-command configs
Freeze first sets the file perms, and then runs
freezecontent-command. Thaw runs thawcontent-command before
restoring file permissions. This is in case the freeze command
prevents changing file perms, as eg setting a file immutable does.
Also, changing file perms tends to mess up previously set ACLs.

git-annex init's probe for crippled filesystem uses them, so if file perms
don't work, but freezecontent-command manages to prevent write to a file,
it won't treat the filesystem as crippled.

When the the filesystem has been probed as crippled, the hooks are not
used, because there seems to be no point then; git-annex won't be relying
on locking annex objects down. Also, this avoids them being run when the
file perms have not been changed, in case they somehow rely on
git-annex's setting of the file perms in order to work.

Sponsored-by: Dartmouth College's Datalad project
2021-06-21 14:40:52 -04:00
Joey Hess
189fb05ffb
Added annex.adviceNoSshCaching config.
Sponsored-by: Brock Spratlen on Patreon
2021-05-27 12:37:49 -04:00
Joey Hess
84366fa2d0
fix by improving docs 2021-05-19 11:13:53 -04:00
Joey Hess
984034f335
filter-branch working aside from some edge cases
Added a note to man page about what happens to information that is
recorded in the private journal. Since it uses Branch.get, that
information will be copied when options allow. It seemed better to allow
it and document it than not allow it, since the options allow excluding
repositories and so can be used to exclude private repos if desired.
2021-05-17 13:24:58 -04:00
Joey Hess
a71c002ac1
git-annex-filter-branch man page 2021-05-13 16:17:45 -04:00
Joey Hess
b184fc490a
split out common options to its own page and mention it on each subcommand page
Sometimes users would get confused because an option they were looking
for was not mentioned on a subcommand's man page, and they had not
noticed that the main git-annex man page had a list of common options.
This change lets each subcommand mention the common options, similarly
to how the matching options are handled.

This commit was sponsored by Svenne Krap on Patreon.
2021-05-10 15:00:13 -04:00
Joey Hess
1c05194430
fix pasto 2021-04-27 13:14:24 -04:00
Joey Hess
141a4e9750
consistent name for config
annex.private is a bit ambiguous about what kind of privacy, but
it keeps the name symmetric with remote.foo.annex-private.
2021-04-23 14:39:57 -04:00
Joey Hess
affdffb323
docs 2021-04-21 17:01:03 -04:00
Kyle Meyer
876b134b54 doc/git-annex.mdwn: Fix quoting of annex.supportunlocked 2021-04-21 12:36:34 -04:00
Joey Hess
1b645e1ace
added --debugfilter (and annex.debugfilter) 2021-04-05 15:31:10 -04:00
Joey Hess
798f685077
New annex.supportunlocked config
Can beet to false to avoid some expensive things needed to support unlocked
files.

See my comment for why this only controls what init sets up, and not other
behavior.

I didn't bother with making the v5 upgrade code path look at this, though
it easily could, because the docs say to run git-annex init after setting
it to make it take effect.
2021-03-23 14:04:34 -04:00
Joey Hess
eb594c710e
unregisterurl: New command
Implemented by generalizing registerurl. Without the implicit batch mode
of registerurl since that is only a backwards compatability thing
(see commit 1d1054faa6).
2021-03-01 14:28:24 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
135757d64a
automatic stall detection
annex.stalldetection can now be set to "true" to make git-annex do
automatic stall detection when it detects a remote is updating its transfer
progress consistently enough.

This commit was sponsored by Luke Shumaker on Patreon.
2021-02-03 13:33:57 -04:00
Joey Hess
aec2cf0abe
addon commands
Seems only fair, that, like git runs git-annex, git-annex runs
git-annex-foo.

Implementation relies on O.forwardOptions, so that any options are passed
through to the addon program. Note that this includes options before the
subcommand, eg: git-annex -cx=y foo

Unfortunately, git-annex eats the --help/-h options.
This is because it uses O.hsubparser, which injects that option into each
subcommand. Seems like this should be possible to avoid somehow, to let
commands display their own --help, instead of the dummy one git-annex
displays.

The two step searching mirrors how git works, it makes finding
git-annex-foo fast when "git annex foo" is run, but will also support fuzzy
matching, once findAllAddonCommands gets implemented.

This commit was sponsored by Dr. Land Raider on Patreon.
2021-02-02 16:32:49 -04:00
Joey Hess
c8b1fa67b4
Behavior change: --trust-glacier option no longer overrides trust
Since that can lead to data loss, which should never be enabled by an
option other than --force.

This commit was sponsored by Jake Vosloo on Patreon.
2021-01-07 10:37:43 -04:00
Joey Hess
2bf34fc17f
Behavior change: --trust option no longer overrides trust
Since that can lead to data loss, which should never be enabled by an
option other than --force.

I suppose that using --trust was in some situation, safer than --force,
because it doesn't entirely disable checking for data loss, but only
disables checking involving data that is on the specified repository.
But it seems better to be able to say that data loss only happens with
--force.

This commit was sponsored by Graham Spencer on Patreon.
2021-01-07 10:34:57 -04:00
Joey Hess
cc89699457
mincopies
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.

Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.

This commit was sponsored by Denis Dzyubenko on Patreon.
2021-01-06 14:15:19 -04:00
Joey Hess
0b2b666a38
fix bad annex.largefiles example syntax 2021-01-04 13:11:05 -04:00
Joey Hess
3207e8293b
start borg special remote
Compiles, but unusable so far.
2020-12-18 16:03:51 -04:00
Joey Hess
00352ebe37
man page improvement 2020-12-17 12:17:58 -04:00
Joey Hess
714e665c92
reorg 2020-12-10 11:25:02 -04:00
Joey Hess
677003a6df
rename helper
More consistent name with TransferrerPool
2020-12-09 13:24:24 -04:00
Joey Hess
05c0543e8e
move new interface to git-annex transfer
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-12-09 12:33:56 -04:00
Joey Hess
41f2c308ff
stall detection is working
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.

This commit was sponsored by Luke Shumaker on Patreon.
2020-12-08 15:22:18 -04:00
Joey Hess
2cb3f3a99d
annex.stalldetection docs
Not implemented yet.

This commit was sponsored by Svenne Krap on Patreon.
2020-12-07 16:55:24 -04:00
Joey Hess
557a6e11a6
avoid spurious blank line when updating adjusted branch
git checkout run with --quiet should have no output
2020-11-16 14:41:38 -04:00
Joey Hess
0896038ba7
annex.adjustedbranchrefresh
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.

Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:

* If two files point to one file, then eg, `git annex get foo` will
  update the branch to unlock foo, but will not unlock bar, because it
  does not know about it. Might be fixable by making `git annex get
  bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
  of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info

Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
2020-11-16 14:27:28 -04:00
Joey Hess
877ef84a1b
support --batch -J
--batch combined with -J now runs batch requests concurrently for many
commands. Before, the combination was accepted, but did not enable
concurrency. Since the output of batch requests can be in any order, --json
with the new "input" field is recommended to be used, to determine which
batch request each response corresponds to.

If --json is not used, batch mode still runs concurrently, using the usual
concurrent-output. That will not be very useful for most batch mode users,
probably, but who knows.

If a program was using --batch -J before, and was parsing non-json output,
this could break it. But, it was relying on git-annex not supporting
concurrency despite it being enabled, so it should have expected concurrent
output. So, I think that's ok.

annex.jobs does not enable concurrency in --batch mode, because that would
confuse programs that use --batch but don't expect concurrency.
2020-09-16 12:10:37 -04:00
Joey Hess
e36bae74da
Exposed annex.forward-retry git config
One reason is, 5 is an arbitrary number so ought to be configurable.

The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
2020-09-04 15:16:40 -04:00
Joey Hess
8656afd3e1
rename http special remote to httpalso
"http" was too generic and easy to confuse with web. The new name makes
clear it's used in addition to some other remote. And other protocols
can use the same naming scheme.
2020-09-02 10:41:53 -04:00
Joey Hess
571ec900ac
Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https.
With automatic layout learning!
2020-09-01 15:16:35 -04:00
Joey Hess
b68f214312
Display a message when git-annex has to wait for a pid lock file held by another process 2020-08-26 13:05:34 -04:00
Joey Hess
d3f3e8f587
document that annex.thin beats annex.hardlink 2020-08-25 13:34:37 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
959ae7733a
man pages improvements
Added some examples. Tightened up some language and removed some
unncessary duplicate documentaton.
2020-05-12 09:07:45 -04:00
Joey Hess
9f3c2dfeda
stop using remote.name.annex-readonly for two distinct things 2020-04-23 14:56:03 -04:00
Joey Hess
503abb6d54
remove warning about git gc for annex.alwayscommit=false
I doubt that warning has ever been right, but I'm sure it is not right
now.

For there to be a risk of git gc deleting objects that are in the annex
index, journal files would have to be staged into it, and deleted, but
the index not committed to the git-annex branch. And AFAICS, there is no
code path where that actually happens. I considered adding one recently,
but didn't.

The way it actually works is, as long as the user has annex.alwayscommit=false
the data lives in the journal, where it's safe from git gc. Then when
git-annex is run w/o that config, the journal is staged into the index,
which is immediately committed to the branch. There's no window where
git gc could delete the objects, because git gc only deletes objects
after some time (2 weeks by default).

Now, if git-annex gets suspended at just the wrong time, or interrupted,
then yes, it's possible. But doesn't matter whether that config was ever
set or not. And many uses of git-annex also recover from that situation
by committing to the git-annex branch.
2020-04-15 14:25:33 -04:00