Commit graph

1849 commits

Author SHA1 Message Date
Joey Hess
f58fb6a79a
fix build when dbus is enabled
Broken in commit 8040ecf9b8
2022-07-05 13:06:45 -04:00
Joey Hess
8040ecf9b8
final readonly values moves to AnnexRead
At this point I've checked all AnnexState values and these were all that
remained that could move.

Pity that Annex.repo can't move, but it gets modified sometimes..

A couple of AnnexState values are set by options and could be AnnexRead,
but happen to use Annex when being set.

Sponsored-by: Max Thoursie on Patreon
2022-06-28 16:04:58 -04:00
Joey Hess
cb9cf30c48
move several readonly values to AnnexRead
This improves performance to a small extent in several places.

Sponsored-by: Tobias Ammann on Patreon
2022-06-28 15:40:19 -04:00
Joey Hess
debcf86029
use RawFilePath version of rename
Some small wins, almost certianly swamped by the system calls, but still
worthwhile progress on the RawFilePath conversion.

Sponsored-by: Erik Bjäreholt on Patreon
2022-06-22 16:47:34 -04:00
Joey Hess
d00e23cac9
RawFilePath optimisations 2022-06-22 16:20:08 -04:00
Joey Hess
224a57f9ed
RawFilePath optimisation 2022-06-22 16:11:03 -04:00
Joey Hess
95a04920cf
remove objectDir' 2022-06-22 16:08:49 -04:00
Joey Hess
f80ec74128
RawFilePath optimisation 2022-06-22 16:08:26 -04:00
Joey Hess
78a3d44ea0
get rid of racy addLink
The remaining callers all did not rely on it checking gitignore, so were
easy to convert.

They were susceptable to the same overwrite race as add and fix,
although less likely to have it and a narrower window than add's race.

Command.Rekey in passing got an unncessary call to removeFile deleted.
addSymlink handles deleting any existing worktree file.
2022-06-14 14:47:15 -04:00
Joey Hess
7ace804d8e
avoid writing same symlink twice in a row
Oddly, the second write did not cause it to lose the mtime inherited
from the file being added, although the mtime was not provided to that
write but only to the first. I don't quite know why that worked before!
2022-06-14 14:30:12 -04:00
Joey Hess
5ef79125ad
fix overwrite race with git-annex add of annex symlink
In the unlikely case where git-annex add is run on an annex symlink that
is not already added, and while it's processing it, the annex symlink is
overwritten with something else, avoid git-annex overwriting that with
the symlink again.

Sponsored-by: Jack Hill on Patreon
2022-06-14 14:00:13 -04:00
Joey Hess
dd6dec4eb1
fix add overwrite race with git-annex add to annex
This is not a complete fix for all such races, only the one where a
large file gets changed while adding and gets added to git rather than
to the annex.

addLink needs to go away, any caller of it is probably subject to the
same kind of race. (Also, addLink itself fails to check gitignore when
symlinks are not supported.)

ingestAdd no longer checks gitignore. (It didn't check it consistently
before either, since there were cases where it did not run git add!)
When git-annex import calls it, it's already checked gitignore itself
earlier. When git-annex add calls it, it's usually on files found
by withFilesNotInGit, which handles checking ignores.

There was one other case, when git-annex add --batch calls it. In that
case, old git-annex behaved rather badly, it would seem to add the file,
but git add would later fail, leaving the file as an unstaged annex symlink.
That behavior has also been fixed.

Sponsored-by: Brett Eisenberg on Patreon
2022-06-14 13:37:19 -04:00
Joey Hess
c59ea5b1ca
info: Added --autoenable option
Use cases include using git-annex init --no-autoenable and then going back
and enabling the special remotes that have autoenable configured. As well
as just querying to remember which ones have it enabled.

It lists all special remotes that have autoenable=yes whether currently
enabled or not. And it can be used with --json.

I pondered making this "git-annex info autoenable", but that seemed wrong
because then if the use has a directory named "autoenable", it's unclear
what they are asking for. (Although "git-annex info remote" may be
similarly unclear.) Making it an option does mean that it can't be provided
via --batch though.

Sponsored-by: Dartmouth College's Datalad project
2022-06-01 14:20:38 -04:00
Joey Hess
f35c551d35
make path absolute for display
Avoid suggesting the user add "." to safe.directory.
2022-05-31 12:17:27 -04:00
Joey Hess
478ed28f98
revert windows-specific locking changes that broke tests
This reverts windows-specific parts of 5a98f2d509
There were no code paths in common between windows and unix, so this
will return Windows to the old behavior.

The problem that the commit talks about has to do with multiple different
locations where git-annex can store annex object files, but that is not
too relevant to Windows anyway, because on windows the filesystem is always
treated as criplled and/or symlinks are not supported, so it will only
use one object location. It would need to be using a repo populated
in another OS to have the other object location in use probably.
Then a drop and get could possibly lead to a dangling lock file.

And, I was not able to actually reproduce that situation happening
before making that commit, even when I forced a race. So making these
changes on windows was just begging trouble..

I suspect that the change that caused the reversion is in
Annex/Content/Presence.hs. It checks if the content file exists,
and then called modifyContentDirWhenExists, which seems like it would
not fail, but if something deleted the content file at that point,
that call would fail. Which would result in an exception being thrown,
which should not normally happen from a call to inAnnexSafe. That was a
windows-specific change; the unix side did not have an equivilant
change.

Sponsored-by: Dartmouth College's Datalad project
2022-05-23 13:21:26 -04:00
Joey Hess
63624c40a0
fix typo in comment 2022-05-23 12:53:55 -04:00
Joey Hess
af0d854460
deal with git's changes for CVE-2022-24765
Deal with git's recent changes to fix CVE-2022-24765, which prevent using
git in a repository owned by someone else.

That makes git config --list not list the repo's configs, only global
configs. So annex.uuid and annex.version are not visible to git-annex.
It displayed a message about that, which is not right for this situation.
Detect the situation and display a better message, similar to the one other
git commands display.

Also, git-annex init when run in that situation would overwrite annex.uuid
with a new one, since it couldn't see the old one. Add a check to prevent
it running too in this situation. It may be that this fix has security
implications, if a config set by the malicious user who owns the repo
causes git or git-annex to run code. I don't think any git-annex configs
get run by git-annex init. It may be that some git config of a command
does get run by one of the git commands that git-annex init runs. ("git
status" is the command that prompted the CVE-2022-24765, since
core.fsmonitor can cause it to run a command). Since I don't know how
to exploit this, I'm not treating it as a security fix for now.

Note that passing --git-dir makes git bypass the security check. git-annex
does pass --git-dir to most calls to git, which it does to avoid needing
chdir to the directory containing a git repository when accessing a remote.
So, it's possible that somewhere in git-annex it gets as far as running git
with --git-dir, and git reads some configs that are unsafe (what
CVE-2022-24765 is about). This seems unlikely, it would have to be part of
git-annex that runs in git repositories that have no (visible) annex.uuid,
and git-annex init is the only one that I can think of that then goes on to
run git, as discussed earlier. But I've not fully ruled out there being
others..

The git developers seem mostly worried about "git status" or a similar
command implicitly run by a shell prompt, not an explicit use of git in
such a repository. For example, Ævar Arnfjörð Bjarma wrote:
> * There are other bits of config that also point to executable things,
>   e.g. core.editor, aliases etc, but nothing has been found yet that
>   provides the "at a distance" effect that the core.fsmonitor vector
>   does.
>
>   I.e. a user is unlikely to go to /tmp/some-crap/here and run "git
>   commit", but they (or their shell prompt) might run "git status", and
>   if you have a /tmp/.git ...

Sponsored-by: Jarkko Kniivilä on Patreon
2022-05-20 14:38:27 -04:00
Joey Hess
aa414d97c9
make fsck normalize object locations
The purpose of this is to fix situations where the annex object file is
stored in a directory structure other than where annex symlinks point to.

But it will also move object files from the hashdirmixed back to
hashdirlower if the repo configuration makes that the normal location.
It would have been more work to avoid that than to let it do it.

Sponsored-by: Dartmouth College's Datalad project
2022-05-16 15:38:06 -04:00
Joey Hess
6b5029db29
fix hardcoding of number of hash directories
It can be changed to 1 via a tuning, rather than the 2 this assumed. So
it would have tried to rmdir .git/annex/objects in that case, which
would not hurt anything, but is not what it is supposed to do.

Sponsored-by: Dartmouth College's Datalad project
2022-05-16 15:08:42 -04:00
Joey Hess
5a98f2d509
avoid creating content directory when locking content
If the content directory does not exist, then it does not make sense to
lock the content file, as it also does not exist, and so it's ok for the
lock operation to fail.

This avoids potential races where the content file exists but is then
deleted/renamed, while another process sees that it exists and goes to
lock it, resulting in a dangling lock file in an otherwise empty object
directory.

Also renamed modifyContent to modifyContentDir since it is not only
necessarily used for modifying content files, but also other files in
the content directory.

Sponsored-by: Dartmouth College's Datalad project
2022-05-16 12:34:56 -04:00
Joey Hess
e8a601aa24
incremental verification for retrieval from import remotes
Sponsored-by: Dartmouth College's Datalad project
2022-05-09 15:39:43 -04:00
Joey Hess
2f2701137d
incremental verification for retrieval from all export remotes
Only for export remotes so far, not export/import.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 13:49:33 -04:00
Joey Hess
90950a37e5
support incremental verification when retrieving from export/import remotes
None of the special remotes do it yet, but this lays the groundwork.

Added MustFinishIncompleteVerify so that, when an incremental verify is
started but not complete, it can be forced to finish it. Otherwise, it
would have skipped doing it when verification is disabled, but
verification must always be done when retrievin from export remotes
since files can be modified during retrieval.

Note that retrieveExportWithContentIdentifier doesn't support incremental
verification yet. And I'm not sure if it can -- it doesn't know the Key
before it downloads the content. It seems a new API call would need to
be split out of that, which is provided with the key.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 12:25:04 -04:00
Joey Hess
8675b2b075
rename memoryUnits
It's not just used for memory sizes.
2022-05-05 15:35:11 -04:00
Joey Hess
d266a41f8d
prevent numcopies or mincopies being configured to 0
Ignore annex.numcopies set to 0 in gitattributes or git config, or by
git-annex numcopies or by --numcopies, since that configuration would make
git-annex easily lose data. Same for mincopies.

This is a continuation of the work to make data only be able to be lost
when --force is used. It earlier led to the --trust option being disabled,
and similar reasoning applies here.

Most numcopies configs had docs that strongly discouraged setting it to 0
anyway. And I can't imagine a use case for setting to 0. Not that there
might not be one, but it's just so far from the intended use case of
git-annex, of managing and storing your data, that it does not seem like
it makes sense to cater to such a hypothetical use case, where any
git-annex drop can lose your data at any time.

Using a smart constructor makes sure every place avoids 0. Note that this
does mean that NumCopies is for the configured desired values, and not the
actual existing number of copies, which of course can be 0. The name
configuredNumCopies is used to make that clear.

Sponsored-by: Brock Spratlen on Patreon
2022-03-28 15:20:34 -04:00
Joey Hess
982eb7ed0d
remove vendored http-client-restricted
Removed vendored copy of http-client-restricted, and removed the
HttpClientRestricted build flag that avoided that dependency.

http-client-restricted is in Debian stable, and the i386ancient build also
uses it, so I think this vendored copy is no longer needed.

Sponsored-by: Noam Kremen on Patreon
2022-03-22 11:50:06 -04:00
Joey Hess
952664641a
turn of PackageImports in cabal file
This makes it easier to build eg benchmarks of individual modules.

May be that most of these PackageImports are not really necessary,
dunno.
2022-02-25 13:16:36 -04:00
Joey Hess
51c528980c
avoid accidentally thawing git-annex symlink
It did nothing, since at this point the link is dangling. But when there
is a thaw hook, it would probably not be happy to be asked to run on a
symlink, or might do something unexpected.

Sponsored-by: Dartmouth College's Datalad project
2022-02-24 14:21:23 -04:00
Joey Hess
f4b046252a
Run annex.thawcontent-command before deleting an object file
In case annex.freezecontent-command did something that would prevent
deletion.

Sponsored-by: Dartmouth College's Datalad project
2022-02-24 14:11:02 -04:00
Joey Hess
346007a915
add debugging of freeze and thaw 2022-02-24 14:01:29 -04:00
Joey Hess
28bc5ce232
ignore write bits being set when there is a freeze hook
When annex.freezecontent-command is set, and the filesystem does not
support removing write bits, avoid treating it as a crippled filesystem.

The hook may be enough to prevent writing on its own, and some filesystems
ignore attempts to remove write bits.

Sponsored-by: Dartmouth College's Datalad project
2022-02-24 13:28:31 -04:00
Joey Hess
64ccb4734e
smudge: Warn when encountering a pointer file that has other content appended to it
It will then proceed to add the file the same as if it were any other
file containing possibly annexable content. Usually the file is one that
was annexed before, so the new, probably corrupt content will also be added
to the annex. If the file was not annexed before, the content will be added
to git.

It's not possible for the smudge filter to throw an error here, because
git then just adds the file to git anyway.

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 15:17:08 -04:00
Joey Hess
67245ae00f
fully specify the pointer file format
This format is designed to detect accidental appends, while having some
room for future expansion.

Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.

Dropped the max size of a pointer file down to 32kb, it was around 80 kb,
but without any good reason and certianly there are no valid pointer files
anywhere that are larger than 8kb, because it's just been specified what it
means for a pointer file with additional data even looks like.

I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when eg, catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32k
seems as reasonable an arbitrary number as anything. Increasing it would be
possible, eg to 64k, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 14:20:31 -04:00
Joey Hess
5b373a9dd2
read a consistent amount from pointer file
A few places were reading the max symlink size of a pointer file,
then passing tp parseLinkTargetOrPointer. Which is fine currently, but
to support pointer files with lines of data after the pointer, enough
has to be read that parseLinkTargetOrPointer can be assured of seeing
enough of that data to know if it's correctly formatted.

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 12:52:34 -04:00
Joey Hess
4cd9325c2c
fold parseLinkTarget into parseLinkTargetOrPointer
Only one place remained that differentiated between them.

It is the case that a symlink target that happens to contain a newline
somehow will be treated as a link to a key truncated at the newline.
This is super unlikely to happen, and since a key cannot actually
contain a newline, it's as good a behavior as any. Anyway, this commit
does not change the behavior there, although arguably it should be
changed. Note that getAnnexLinkTarget does prevent a symlink target
containing a newline.

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 12:30:32 -04:00
Joey Hess
ce1b3a9699
info: Allow using matching options in more situations
File matching options like --include will be rejected in situations where
there is no filename to match against. (Or where there is a filename but
it's not relative to the cwd, or otherwise seemed too bothersome to match
against.)

The addition of listKeys' was necessary to avoid using more memory in the
common case of "git-annex info". Adding a filterM would have caused the
list to buffer in memory and not stream. This is an ugly hack, but listKeys
had previously run Annex operations inside unafeInterleaveIO (for direct
mode). And matching against a matcher should hopefully not change any Annex
state.

This does allow for eg `git-annex info somefile --include=*.ext`
although why someone would want to do that I don't really know. But it
seems to make sense to allow it.
But, consider: `git-annex info ./somefile --include=somefile`
This does not match, so will not display info about somefile.
If the user really wants to, they can `--include=./somefile`.

Using matching options like --copies or --in=remote seems likely to be
slower than git-annex find with those options, because unlike such
commands, info does not have optimised streaming through the matcher.

Note that `git-annex info remote` is not the same as
`git-annex info --in remote`. The former shows info about all files in
the remote. The latter shows local keys that are also in that remote.
The output should make that clear, but this still seems like a point
where users could get confused.

Sponsored-by: Jochen Bartl on Patreon
2022-02-21 14:46:07 -04:00
Joey Hess
faf84aa5c2
Avoid git status taking a long time after git-annex unlock of many files.
Implemented by making Git.Queue have a FlushAction, which can accumulate
along with another action on files, and runs only once the other action has
run.

This lets git-annex unlock queue up git update-index actions, without
conflicting with the restagePointerFiles FlushActions.

In a repository with filter-process enabled, git-annex unlock will
often not take any more time than before, though it may when the files are
large. Either way, it should always slow down less than git-annex status
speeds up.

When filter-process is not enabled, git-annex unlock will slow down as much
as git status speeds up.

Sponsored-by: Jochen Bartl on Patreon
2022-02-18 15:06:40 -04:00
Joey Hess
21e40b86d8
have v9 autoupgrade to v10
This was right before commit a27776f602,
which made v6 v7 autoupgrade to v8 but not yet to v10.

Sponsored-by: Dartmouth College's Datalad project
2022-01-26 13:16:06 -04:00
Joey Hess
a27776f602
init --version=6 upgrade to 8 not yet 10
autoUpgradeableVersions had latestVersion (10), but it did not make
sense for asking for old version 6 to get version 10, while asking for
version 8 got version 8. So use defaultVersion (8) instead.

Sponsored-by: Dartmouth College's Datalad project
2022-01-25 13:52:42 -04:00
Joey Hess
3618746a85
fix failing readonly test case
The problem is that withContentLockFile, in a v8 repo, has to take a shared
lock of `.git/annex/content.lck`. But, in a readonly repository, if that
file does not yet exist, it cannot lock it. And while it will sometimes
work to `chmod +r .git/annex`, the repository might be readonly due to
being owned by another user, or due to being mounted readonly.

So, it seems that the only solution is to use some other file than
`.git/annex/content.lck` as the lock file. The inode sential file
was almost the only option that should always exist. (And if it somehow
does not exist, creating an empty one for locking will be ok.)

Wow, what a hack!

Sponsored-by: Dartmouth College's Datalad project
2022-01-21 13:49:31 -04:00
Joey Hess
47084b8a1d
enable filter.annex.process in v9
This has tradeoffs, but is generally a win, and users who it causes git add to
slow down unacceptably for can just disable it again.

It needed to happen in an upgrade, since there are git-annex versions
that do not support it, and using such an old version with a v8
repository with filter.annex.process set will cause bad behavior.
By enabling it in v9, it's guaranteed that any git-annex version that
can use the repository does support it. Although, this is not a perfect
protection against problems, since an old git-annex version, if it's
used with a v9 repository, will cause git add to try to run
git-annex filter-process, which will fail. But at least, the user is
unlikely to have an old git-annex in path if they are using a v9
repository, since it won't work in that repository.

Sponsored-by: Dartmouth College's Datalad project
2022-01-21 13:11:18 -04:00
Joey Hess
dc14221bc3
detect v10 upgrade while running
Capstone of the v10 upgrade process.

Tested with a git-annex drop in a v8 repo that had a local v8 remote.
Upgrading the repo to v10 (with --force) immedaitely caused it to notice
and switch over to v10 locking. Upgrading the remote also caused it to
switch over when operating on the remote.

The InodeCache makes this fairly efficient, just an added stat call per
lock of an object file. After the v10 upgrade, there is no more
overhead.

Sponsored-by: Dartmouth College's Datalad project
2022-01-21 12:56:38 -04:00
Joey Hess
76e365769e
fix crash after drop in v10
After cleaning up the lock file, the content directory is gone, so
freezing it failed.

Sponsored-by: Dartmouth College's Datalad project
2022-01-20 14:03:27 -04:00
Joey Hess
d0a5714409
continue to use v8 by default for now, unless upgraded
Since it's easy to keep supporting v8, using it for a while (eg a few
months) will give users time to upgrade git-annex installations, before
it upgrades their repository to v9.

This commit should be reverted once ready to start upgrading
repositories by default.

Sponsored-by: Dartmouth College's Datalad project
2022-01-20 11:56:05 -04:00
Joey Hess
0904eac8b4
automatic upgrade from v8 to v9
Sponsored-by: Dartmouth College's Datalad project
2022-01-20 11:39:36 -04:00
Joey Hess
cea6f6db92
v10 upgrade locking
The v10 upgrade should almost be safe now. What remains to be done is
notice when the v10 upgrade has occurred, while holding the shared lock,
and switch to using v10 lock files.

Sponsored-by: Dartmouth College's Datalad project
2022-01-20 11:33:14 -04:00
Joey Hess
9d5db6a09a
add upgrade.log
The upgrade from V9 uses this to avoid an automatic upgrade until 1 year
after the V9 update. It can also be used in future such situations.

Sponsored-by: Dartmouth College's Datalad project
2022-01-19 15:52:29 -04:00
Joey Hess
856ce5cf5f
split upgrade into v9 and v10
v10 will run 1 year after the upgrade to v9, to give time for any v8
processes to die. Until that point, the v10 upgrade will be tried by
every process but deferred, so added support for deferring upgrades.

The upgrade prevention lock file that will be used by v10 is not yet
implemented, so it does not yet defer.

Sponsored-by: Dartmouth College's Datalad project
2022-01-19 13:09:33 -04:00
Joey Hess
4f7b8ce09d
fix spelling of upgradeable 2022-01-19 12:14:50 -04:00
Joey Hess
538d02d397
delete content lock file safely after shared lock
Upgrade the shared lock to an exclusive lock, and then delete the
lock file. If there is another process still holding the shared lock,
the first process will fail taking the exclusive lock, and not delete
the lock file; then the other process will later delete it.

Note that, in the time period where the exclusive lock is held, other
attempts to lock the content in place would fail. This is unlikely to be
a problem since it's a short period.

Other attempts to lock the content for removal would also fail in that
time period, but that's no different than a removal failing because
content is locked to prevent removal.

Sponsored-by: Dartmouth College's Datalad project
2022-01-13 14:54:57 -04:00