Commit graph

1825 commits

Author SHA1 Message Date
Joey Hess
16dd3dd4ca
catch more exceptions
I saw this:

  .git/annex/tmp/SHA256E-s1234376--5ba8e06e0163b217663907482bbed57684d7188024155ddc81da0710dfd2687d: openBinaryFile: resource busy (file is locked)

 guess catching IO exceptions did not catch that one.
2021-08-13 16:16:46 -04:00
Joey Hess
ff2dc5eb18
INotify.removeWatch can crash
Unsure why, possibly if the file has been replaced by another file.
2021-08-13 15:35:18 -04:00
Joey Hess
7503b8448b
inotify reports paths relative to directory being watched
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 14:51:15 -04:00
Joey Hess
e07625df8a
convert tailVerify to not finalize the verification
Added failIncremental so it can force failure to verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 13:39:02 -04:00
Joey Hess
9d533b347f
tailVerify: return deferred action when it gets behind
Sponsored-by: Dartmouth College's DANDI project
2021-08-13 12:32:01 -04:00
Joey Hess
b6efba8139
add tailVerify
Not yet used, but this will let all remotes verify incrementally if it's
acceptable to pay the performance price. See comment for details of when
it will perform badly. I anticipate using this for all special remotes
that use fileRetriever. Except perhaps for a few like GitLFS that could
feed the incremental verifier themselves despite using that.

Sponsored-by: Dartmouth College's DANDI project
2021-08-12 14:38:02 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
1acdd18ea8
deal better with clock skew situations, using vector clocks
* Deal with clock skew, both forwards and backwards, when logging
  information to the git-annex branch.
* GIT_ANNEX_VECTOR_CLOCK can now be set to a fixed value (eg 1)
  rather than needing to be advanced each time a new change is made.
* Misuse of GIT_ANNEX_VECTOR_CLOCK will no longer confuse git-annex.

When changing a file in the git-annex branch, the vector clock to use is now
determined by first looking at the current time (or GIT_ANNEX_VECTOR_CLOCK
when set), and comparing it to the newest vector clock already in use in
that file. If a newer time stamp was already in use, advance it forward by
a second instead.

When the clock is set to a time in the past, this avoids logging with
an old timestamp, which would risk that log line later being ignored in favor
of "newer" line that is really not newer.

When a log entry has been made with a clock that was set far ahead in the
future, this avoids newer information being logged with an older timestamp
and so being ignored in favor of that future-timestamped information.
Once all clocks get fixed, this will result in the vector clocks being
incremented, until finally enough time has passed that time gets back ahead
of the vector clock value, and then it will return to usual operation.

(This latter situation is not ideal, but it seems the best that can be done.
The issue with it is, since all writers will be incrementing the last
vector clock they saw, there's no way to tell when one writer made a write
significantly later in time than another, so the earlier write might
arbitrarily be picked when merging. This problem is why git-annex uses
timestamps in the first place, rather than pure vector clocks.)

Advancing forward by 1 second is somewhat arbitrary. setDead
advances a timestamp by just 1 picosecond, and the vector clock could
too. But then it would interfere with setDead, which wants to be
overrulled by any change. So it could use 2 picoseconds or something,
but that seems weird. It could just as well advance it forward by a
minute or whatever, but then it would be harder for real time to catch
up with the vector clock when forward clock slew had happened.

A complication is that many log files contain several different peices of
information, and it may be best to only use vector clocks for the same peice
of information. For example, a key's location log file contains
InfoPresent/InfoMissing for each UUID, and it only looks at the vector
clocks for the UUID that is being changed, and not other UUIDs.

Although exactly where the dividing line is can be hard to determine.
Consider metadata logs, where a field "tag" can have multiple values set
at different times. Should it advance forward past the last tag?
Probably. What about when a different field is set, should it look at
the clocks of other fields? Perhaps not, but currently it does, and
this does not seems like it will cause any problems.

Another one I'm not entirely sure about is the export log, which is
keyed by (fromuuid, touuid). So if multiple repos are exporting to the
same remote, different vector clocks can be used for that remote.
It looks like that's probably ok, because it does not try to determine
what order things occurred when there was an export conflict.

Sponsored-by: Jochen Bartl on Patreon
2021-08-04 12:33:46 -04:00
Joey Hess
6111958440
fix test suite
14683da9eb caused a test suite failure.
When the content of a key is not present, a LinkAnnexFailed is returned,
but replaceFile then tried to move the file into place, and since it was
not written, that crashed.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-08-02 13:59:23 -04:00
Joey Hess
b3c4579c79
work around strange auto-init bug
git-annex get when run as the first git-annex command in a new repo did not
populate unlocked files. (Reversion in version 8.20210621)

I am not entirely happy with this, because I don't understand how
428c91606b caused the problem in the first
place, and I don't fully understand how skipping calling scanAnnexedFiles
during autoinit avoids the problem.

Kept the explicit call to scanAnnexedFiles during git-annex init,
so that when reconcileStaged is expensive, it can be made to run then,
rather than at some later point when the information is needed.

Sponsored-by: Brock Spratlen on Patreon
2021-07-30 18:36:03 -04:00
Joey Hess
748addbe05
remove second pass in scanAnnexedFiles
The pass was needed to populate files when annex.thin was set,
but in commit 73e0cbbb19,
reconcileStaged started to do that. So, this second pass is not needed
any longer.
2021-07-30 17:46:11 -04:00
Joey Hess
817ccbbc47
split verifyKeyContent
This avoids it calling enteringStage VerifyStage when it's used in
places that only fall back to verification rarely, and which might be
called while in TransferStage and be going to perform a transfer after
the verification.
2021-07-29 13:58:40 -04:00
Joey Hess
897fd5c104
add note 2021-07-29 13:14:03 -04:00
Joey Hess
067a9c70c7
simplify code 2021-07-29 12:28:13 -04:00
Joey Hess
3e0b210039
remove unncessary debugs
Keeping the ones in Annex.InodeSentinal
2021-07-29 12:19:37 -04:00
Joey Hess
73e0cbbb19
fix problem populating pointer files
This is a result of an audit of every use of getInodeCaches,
to find places that misbehave when the annex object is not in the inode
cache, despite pointer files for the same key being in the inode cache.

Unfortunately, that is the case for objects that were in v7 repos that
upgraded to v8. Added a note about this gotcha to getInodeCaches.

Database.Keys.reconcileStaged, then annex.thin is set, would fail to
populate pointer files in this situation. Changed it to check if the
annex object is unmodified the same way inAnnex does, falling back to a
checksum if the inode cache is not recorded.

Sponsored-by: Dartmouth College's Datalad project
2021-07-27 14:26:49 -04:00
Joey Hess
de482c7eeb
move verifyKeyContent to Annex.Verify
The goal is that Database.Keys be able to use it; it can't use
Annex.Content.Presence due to an import loop.

Several other things also needed to be moved to Annex.Verify as a
conseqence.
2021-07-27 14:07:23 -04:00
Joey Hess
14683da9eb
fix potential race in updating inode cache
Some uses of linkFromAnnex are inside replaceWorkTreeFile, which was
already safe, but others use it directly on the work tree file, which
was race-prone. Eg, if the work tree file was first removed, then
linkFromAnnex called to populate it, the user could have re-written it in
the interim.

This came to light during an audit of all calls of addInodeCaches,
looking for such races. All the other uses of it seem ok.

Sponsored-by: Brett Eisenberg on Patreon
2021-07-27 13:08:08 -04:00
Joey Hess
e4b2a067e0
fix potential race in updating inode cache
In Annex.Content, the object file was statted after pointer files were
populated. But if annex.thin is set, once the pointer files are
populated, the object file can potentially be modified via the hard
link. So, it was possible, though seemingly very unlikely, for the inode
of the modified object file to be cached.

Command.Fix and Command.Fsck had similar problems, statting the work
tree files after they were in place. Changed them to stat the temp file
that gets moved into place. This does rely on .git/annex being on the
same filesystem. If it's not, the cached inode will not be the same as
the one that the temp file gets moved to. Result will be that git-annex
will later need to do an expensive verification of the content of the
worktree files. Note that the cross-filesystem move of the temp file
already is a larger amount of extra work, so this seems acceptable.

Sponsored-by: Luke Shumaker on Patreon
2021-07-27 12:29:10 -04:00
Joey Hess
3b5a3e168d
check if object is modified before starting to send it
Fix bug that caused some transfers to incorrectly fail with "content
changed while it was being sent", when the content was not changed.

While I don't know how to reproduce the problem that several people
reported, it is presumably due to the inode cache somehow being stale.
So check isUnmodified', and if it's not modified, include the file's
current inode cache in the set to accept, when checking for modification
after the transfer.

That seems like the right thing to do for another reason: The failure
says the file changed while it was being sent, but if the object file was
changed before the transfer started, that's wrong. So it needs to check
before allowing the transfer at all if the file is modified.

(Other calls to sameInodeCache or elemInodeCaches, when operating on inode
caches from the database, could also be problimatic if the inode cache is
somehow getting stale. This does not address such problems.)

Sponsored-by: Dartmouth College's Datalad project
2021-07-26 17:33:49 -04:00
Joey Hess
f195f3b541
more inode cache debugging 2021-07-26 12:57:35 -04:00
Joey Hess
0073384850
add debugging in sameInodeCache 2021-07-26 10:58:07 -04:00
Joey Hess
33a80d083a
sync --quiet
* sync: When --quiet is used, run git commit, push, and pull without
  their ususual output.
* merge: When --quiet is used, run git merge without its usual output.

This might also make --quiet work better for some other commands
that make commits, like git-annex adjust.

Sponsored-by: Kevin Mueller on Patreon
2021-07-19 11:28:47 -04:00
Joey Hess
635e7f3e26
split annexLocations
To avoid mistakes like commit 0ccbed4f6f,
be explicit about the two variants of this.

Incidentially avoids a small amount of overhead in calling reverse.

Sponsored-by: Shae Erisson on Patreon
2021-07-16 14:17:56 -04:00
Joey Hess
0ccbed4f6f
fix oops
dd31fe7b9e broke non-bare repos by using
bare hash dirs first, oops
2021-07-15 21:01:07 -04:00
Joey Hess
dd31fe7b9e
fall back to checking lower case hash directories in normal repo
Fix a bug that prevented getting content from a repository that started out
as a bare repository, or had annex.crippledfilesystem set, and was
converted to a non-bare repository.

This unfortunately means that inAnnex check gets slowed down by a stat call
in normal repos when the content is not present. Oh well, such is the cost
of backwards compatability with old mistakes.

Sponsored-by: Mark Reidenbach on Patreon
2021-07-15 12:16:31 -04:00
Joey Hess
6a581f8b8b
fix init reversion when core.sharedRepository = group
init: Fix misbehavior when core.sharedRepository = group that caused it to
enter an adjusted branch. (Reversion in version 8.20210630)

Commit 4b1b9d7a83 made init call
freezeContent in case there was a hook that could prevent writing in
situations where perms don't. But with the above git config, freezeContent
does not prevent write at all. So init needs to do what freezeContent does
with a non-shared git config.

Or init could check for that config, and skip the probing, since it
won't actually be preventing write to any files. But that would make init
too aware if details of Annex.Perms, and also would break if the git config
were changed after init.

Sponsored-by: Dartmouth College's Datalad project
2021-07-12 10:15:49 -04:00
Joey Hess
9905ec19a7
add pointer to annex.security.allowed-url-schemes
Sponsored-by: Kevin Mueller on Patreon
2021-07-02 10:53:45 -04:00
Joey Hess
3a14648142
dropping unused marks as dead
Dropping an object with drop --unused or dropunused will mark it as
dead, preventing fsck --all from complaining about it after it's been
dropped from all repositories.

If another repository still has a copy, it won't be treated as dead
until it's also dropped from there.

The drop has to use --unused, can't be --key or something else, because
this indicates that the user has recently ran git-annex unused. If it
checked the unused log on every drop, bad things would happen when the
unused log was out of date, eg a file used to be unused but then got
re-added. Marking such a file as dead could be confusing. When the user
uses --unused/dropunused, they must consider the unused information to be
up-to-date.

The particular workflow this enables is:

	git annex add foo
	git annex unannex foo
	git annex unused
	git annex drop --unused / dropunused
	git annex fsck --all # no warnings

The docs for git-annex unannex say to use git-annex unused and dropunused,
so the user should be pointed in this direction when they want to undo an
accidental add.

Sponsored-by: Brock Spratlen on Patreon
2021-06-25 15:22:26 -04:00
Joey Hess
df2001aa88
Improve display of errors when transfers fail
Transfers from or to a local git repo could fail without a reason being
given, if the content failed to verify, or if the object file's stat
changed while it was being copied. Now display messages in these cases.

Sponsored-by: Jack Hill on Patreon
2021-06-25 13:17:04 -04:00
Joey Hess
51c696679f
avoid using temp file size when deciding whether to retry failed transfer
When stall detection is enabled, and a transfer is in progress,
it would display a doubled message:

(transfer already in progress, or unable to take transfer lock) (transfer already in progress, or unable to take transfer lock)

That happened because the forward retry decider had a start size of 0,
and an end size of whatever amount of the object the other process had
downloaded. So it incorrectly thought that the transferrer process had
made progress, when it had in fact immediately given up with that
message.

Instead, use the reported value from the progress meter. If a remote
does not report progress, this will mean it doesn't forward retry, in a
situation where it used to. But most remotes do report progress, and any
remote that does not can be fixed to, by using watchFileSize when
downloading. Also, some remotes might preallocate the temp file (eg
bittorrent), so relying on statting its size at this level to get
progress is dubious.

The same change was made to Annex/Transfer.hs, although only
Annex/TransferrerPool.hs needed to be changed to avoid the duplicate
message.

(An alternate fix would have been to start the retry decider with the
size of the object file before downloading begins, rather than 0.)

Sponsored-by: Brett Eisenberg on Patreon
2021-06-25 12:04:23 -04:00
Joey Hess
0fe550af75
fix windows build 2021-06-22 09:46:06 -04:00
Joey Hess
4b1b9d7a83
Added annex.freezecontent-command and annex.thawcontent-command configs
Freeze first sets the file perms, and then runs
freezecontent-command. Thaw runs thawcontent-command before
restoring file permissions. This is in case the freeze command
prevents changing file perms, as eg setting a file immutable does.
Also, changing file perms tends to mess up previously set ACLs.

git-annex init's probe for crippled filesystem uses them, so if file perms
don't work, but freezecontent-command manages to prevent write to a file,
it won't treat the filesystem as crippled.

When the the filesystem has been probed as crippled, the hooks are not
used, because there seems to be no point then; git-annex won't be relying
on locking annex objects down. Also, this avoids them being run when the
file perms have not been changed, in case they somehow rely on
git-annex's setting of the file perms in order to work.

Sponsored-by: Dartmouth College's Datalad project
2021-06-21 14:40:52 -04:00
Joey Hess
ba62c3467b
remove dead code 2021-06-21 13:54:12 -04:00
Joey Hess
4eb3778aec
remove unused import 2021-06-21 12:32:36 -04:00
Joey Hess
694fe3702c
fix 2 build warnings 2021-06-21 11:27:18 -04:00
Joey Hess
d2be68907c
drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies
Eg, before with a .gitattributes like:

*.2 annex.numcopies=2
*.1 annex.numcopies=1

And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2
would succeed, leaving just 1 copy, despite foo.2 needing 2 copies.
It dropped foo.1 first and then skipped foo.2 since its content was gone.

Now that the keys database includes locked files, this longstanding wart
can be fixed.

Sponsored-by: Noam Kremen on Patreon
2021-06-15 11:38:44 -04:00
Joey Hess
0ed1369dcd
remove unused import 2021-06-15 11:31:59 -04:00
Joey Hess
af9fdf5dba
verify associated files when checking numcopies
Most of this is just refactoring. But, handleDropsFrom
did not verify that associated files from the keys db were still
accurate, and has now been fixed to.

A minor improvement to this would be to avoid calling catKeyFile
twice on the same file, when getting the numcopies and mincopies value,
in the common case where the same file has the highest value for both.
But, it avoids checking every associated file, so it will scale well to
lots of dups already.

Sponsored-by: Kevin Mueller on Patreon
2021-06-15 11:14:52 -04:00
Joey Hess
0b91afb57d
avoid warning 2021-06-15 11:11:55 -04:00
Joey Hess
77517ab506
avoid nub
It's O(N^2) which could matter when there are many dup files using the
same key.
2021-06-15 10:48:11 -04:00
Joey Hess
3af4c9a29a
fix exponential blowup when adding lots of identical files
This was an old problem when the files were being added unlocked,
so the changelog mentions that being fixed. However, recently it's also
affected locked files.

The fix for locked files is kind of stupidly simple. moveAnnex already
handles populating unlocked files, and only does it when the object file
was not already present. So remove the redundant populateUnlockedFiles
call. (That call was added all the way back in
cfaac52b88, and has always been
unncessary.)

Sponsored-by: Dartmouth College's Datalad project
2021-06-15 09:45:55 -04:00
Joey Hess
e147ae07f4
remove supportUnlocked check that is not worth its overhead
moveAnnex only gets to that check if the object file was not present
before. So in the case where dup files are being added repeatedly,
it will only run the first time, and so there's no significant speedup
from doing it; all it avoids is a single sqlite lookup. Since MVar
accesses do have overhead, it's better to optimise for the common case,
where unlocked files are supported.

removeAnnex is less clear cut, but I think mostly is skipped running on
keys when the object has already been dropped, so similar reasoning
applies.
2021-06-15 09:28:56 -04:00
Joey Hess
dcd2c95249
fix windows build 2021-06-14 12:43:26 -04:00
Joey Hess
014dc63a55
avoid sometimes expensive operations when annex.supportunlocked = false
This will mostly just avoid a DB lookup, so things get marginally
faster. But in cases where there are many files using the same key, it
can be a more significant speedup.

Added overhead is one MVar lookup per call, which should be small
enough, since this happens after transferring or ingesting a file,
which is always a lot more work than that. It would be nice, though,
to move getGitConfig to AnnexRead, which there is an open todo about.
2021-06-14 12:40:41 -04:00
Joey Hess
c4f1465a81
check symlink before reading file
This is faster because when multiple files are in a directory, it gets
cached.
2021-06-14 11:53:51 -04:00
Joey Hess
26a9ea12d1
handle edge case of symlink to something that is not really a pointer file
That seems very unlikely to happen, but still, it's possible it could.
And with the recent addition of locked files to the keys db, this could
be called by places that did not call it before, so it seems even more
important it's correct.

Adds an extra stat of the file, and is potentially racy, but both
problems are fixed by the unix-2.8.0 path. I have not tested that path
builds because that package is not yet released and it would be difficult
to install it since it's tightly tied to a ghc version.
2021-06-14 11:35:52 -04:00
Joey Hess
673b2feaf3
rename for clarity
Associated files are recorded now also for locked files, but this is
only needed to populate unlocked files.
2021-06-14 10:55:24 -04:00
Joey Hess
7b6deb1109
display scanning message whenever reconcileStaged has enough files to chew on
Clear visible progress bar first.

Removed showSideActionAfter because it can't be used in reconcileStaged
(import loop). Instead, it counts the number of files it
processes and displays it after it's seen a sufficient to know it's
taking a while.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 12:48:30 -04:00
Joey Hess
13b9a288d3
scanAnnexedFiles in smudge --update
This makes git checkout and git merge hooks do the work to catch up with
changes that they made to the tree. Rather than doing it at some later
point when the user is not thinking about that past operation.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 11:37:47 -04:00
Joey Hess
7f742589f9
claw back annexed file scan speedup
Following commit c941ab6f5b, this avoids
the second, redundant scan when annex.thin is not set.

The benchmark now runs in 35.5 seconds, down from 40 seconds.

Note that the inode cache of the annex object has to be passed to
addInodeCaches now, because it might not already be in the inode caches,
unlike previously.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 11:09:15 -04:00
Joey Hess
c941ab6f5b
avoid double work in git-annex init, second try
reconcileStaged populates the db, so scanAnnexedFiles does not need to
do it again. It still makes a pass over the HEAD tree, but populating
the db was most of the expensive part.

Benchmarking with 100,000 files, git-annex init now takes 40 seconds,
vs 37 seconds with the old, buggy version of this fix. It should be
possible to win those 3 precious seconds per 100k files back, in the
case when when annex.thin is not set, with improvements to reconcileStaged
that avoid needing this second pass.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 09:36:53 -04:00
Joey Hess
2cb7b7b336
Revert "avoid double work in git-annex init"
This reverts commit 0f10f208a7.

The implementation of this turns out to be unsafe; it can lead to a keys
db deadlock. scanAnnexedFiles injects a call to inAnnex into
reconcileStaged, but inAnnex sometimes needs to read from the keys db,
which will try to re-open it when it's in the process of being opened.
The exclusive lock of gitAnnexKeysDbLock will then deadlock.

This needs to be done in some other way...
2021-06-08 09:11:24 -04:00
Joey Hess
0f10f208a7
avoid double work in git-annex init
reconcileStaged was doing a redundant scan to scannAnnexedFiles.

It would probably make sense to move the body of scannAnnexedFiles
into reconcileStaged, the separation does not really serve any purpose.

Sponsored-by: Dartmouth College's Datalad project
2021-06-07 16:50:14 -04:00
Joey Hess
0434674c85
avoid displaying the scanning annexed files message when repo is not large
Avoids users thinking this scan is a big deal, when it's not in the
majority of repos.

showSideActionAfter has some ugly caveats, since it has to display in
the background of another action. I could not see a better way to do it
and it works fine in this particular case. It also doesn't really belong
in Annex.Concurrent, but cannot go in Messages due to an import loop.

Sponsored-by: Dartmouth College's Datalad project
2021-06-04 13:16:48 -04:00
Joey Hess
0f54e5e0ae
speed up initial scanning for annexed files
Streaming through git this way speeds it up by around 25%. This is
similar to the optimisations of seeking annexed files.

Sponsored-by: Dartmouth College's Datalad project
2021-05-31 14:29:34 -04:00
Joey Hess
aa00e171cb
annex.supportunlocked should not prevent scan for annexed files
That scan used to be only for unlocked files, but no longer..
2021-05-31 10:51:39 -04:00
Joey Hess
189fb05ffb
Added annex.adviceNoSshCaching config.
Sponsored-by: Brock Spratlen on Patreon
2021-05-27 12:37:49 -04:00
Joey Hess
cedc28a783
prevent dropping required content of other file using same content
When two files have the same content, and a required content expression
matches one but not the other, dropping the latter file will fail as it
would also remove the content of the required file.

This will slow down drop (w/o --auto), dropunused, mirror, and move, by one
keys db lookup per file. But I did include an optimisation to avoid a
double db lookup in the drop --auto / sync --content case. I suspect that
dropunused could also use PreferredContentChecked True, but haven't
entirely thought it through and it's rarely used with enough files for the
optimisation to matter.

Sponsored-by: Dartmouth College's Datalad project
2021-05-25 11:34:06 -04:00
Joey Hess
f46e4c9b7c
fix case where keys db was not initialized in time
When the keys db is opened for read, and did not exist yet, it used to
skip creating it, and return mempty values. But that prevents
reconcileStaged from populating associated files information in time for
the read. This fixes the one remaining case I know of where
the fix in a56b151f90 didn't work.

Note that, when there is a permissions error, it still avoids creating
the db and returns mempty for all queries. This does mean that
reconcileStaged does not run and so it may want to drop files that it
should not. However, presumably a permissions error on the keys database
also means that the user does not have permission to delete annex
objects, so they won't be able to drop the files anyway.

Sponsored-by: Dartmouth College's Datalad project
2021-05-24 14:46:59 -04:00
Joey Hess
a56b151f90
fix longstanding indeterminite preferred content for duplicated file problem
* drop: When two files have the same content, and a preferred content
  expression matches one but not the other, do not drop the file.
* sync --content, assistant: Fix an edge case where a file that is not
  preferred content did not get dropped.

The sync --content edge case is that handleDropsFrom loaded associated files
and used them without verifying that the information from the database was
not stale.

It seemed best to avoid changing --want-drop's behavior, this way when
debugging a preferred content expression with it, the files matched will
still reflect the expression. So added a note to the --want-drop documentation,
to make clear it may not behave identically to git-annex drop --auto.

While it would be possible to introspect the preferred content
expression to see if it matches on filenames, and only look up the
associated files when it does, it's generally fairly rare for 2 files to
have the same content, and the database lookup is already avoided when
there's only 1 file, so I did not implement that further optimisation.

Note that there are still some situations where the associated files
database does not get locked files recorded in it, which will prevent
this fix from working.

Sponsored-by: Dartmouth College's Datalad project
2021-05-24 14:07:05 -04:00
Joey Hess
428c91606b
include locked files in the keys database associated files
Before only unlocked files were included.

The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.

reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.

On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.

reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticable overhead. What may turn out to be a noticable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.

Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.

Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
run and accesses the keys db, reconcileStaged will run and update it.

There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.

However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
2021-05-21 16:24:37 -04:00
Joey Hess
8b6dad11a2
add createMessage
init: When annex.commitmessage is set, use that message for the commit
that creates the git-annex branch.

This will be used by filter-branch too, and it seems to make sense to let
annex.commitmessage affect it.
2021-05-17 13:07:47 -04:00
Joey Hess
1da9fe5bd8
implemented filter-branch for key info
Not tested yet but should work.

Noted a possible optimisation, which should probably be added, to
speed it up in cases where there is no uuid filtering being done.
It would need Annex.Branch to add a function like getRef that uses
catFileDetails, so the sha is also returned. The difficulty would be
making it support the precached file content; if it didn't it would
probably not be any faster and could even be slower. So probably the
precaching would need to be changed to also cache the sha.
2021-05-17 11:11:39 -04:00
Joey Hess
4ff8a1ae2b
refactoring
filterBranch should be reusable for copy-branch command.

Changed LogVariety to differentiate between LocationLog and UrlLog;
only location logs contain uuids and need to be filtered by uuid,
while url logs do not. This does not change current behavior,
but it will let filterBranch be reused without filtering url logs
incorrectly.
2021-05-13 14:43:25 -04:00
Joey Hess
947d2a10bc
assistant: Fix a crash on startup by avoiding using forkProcess
ghc 8.8.4 seems to have changed something that broke code that has been
successfully using forkProcess since 2012. Likely a change to GC internals.

Since forkProcess has never had clear documentation about how to
use it safely, avoid using it at all. Instead, when git-annex needs to
daemonize itself, re-run the git-annex command, in a new process group
and session.

This commit was sponsored by Luke Shumaker on Patreon.
2021-05-12 15:08:03 -04:00
Joey Hess
4bf7940d6b
fileRef: make paths relative and simplified
Fix behavior of several commands, including reinject, addurl, and rmurl
when given an absolute path to an unlocked file, or a relative path that
leaves and re-enters the repository.

To avoid slowing down all the cases where the paths are already ok
with an unncessary call to getCurrentDirectory, put in an optimisation
in relPathCwdToFile. That will probably also speed up other parts of
git-annex by some small amount, but I have not benchmarked.

Note that I did not convert branchFileRef, because it seems likely that
it will be used with a file that is not provided by the user, so is already
in a sane format. This is certainly true for the way git-annex uses it,
though maybe arguable to the extent Git.Ref is a reusable library.
2021-05-07 13:25:59 -04:00
Joey Hess
4588668a12
fromkey unlocked files support
fromkey: Create an unlocked file when used in an adjusted branch where the
file should be unlocked, or when configured by annex.addunlocked.

There is some overlap with code in Annex.Ingest, however it's not quite the
same because ingesting has a temp file with the content, where here the
content, if any, is in the annex object file. So it eg, makes sense for
Annex.Ingest to copy the execute mode of the content file, but it does not make
sense for fromkey to do that.

Also changed in passing to stage the file in git directly, rather than
using git add. One consequence of that is that if the file is gitignored,
it will still get added, rather than the old behavior:

The following paths are ignored by one of your .gitignore files:
ignored
hint: Use -f if you really want to add them.
hint: Turn this message off by running
hint: "git config advice.addIgnoredFile false"
git-annex: user error (xargs ["-0","git","--git-dir=.git","--work-tree=.","--literal-pathspecs","add","--"] exited 123)

That old behavior was a surprise to me, and so I consider it a bug, and doubt
anyone would have relied on it.

Note that, when on an --hide-missing branch, it is possible to fromkey a key
that is not present (needs --force). The annex link or pointer file still gets
written in this case. It doesn't seem to make any sense not to write it,
because then fromkey would not do anything useful in this case, and this way
the file can be committed and synced to master, and the branch re-adjusted to
hide the new missing file.

This commit was sponsored by Noam Kremen on Patreon.
2021-05-03 11:26:18 -04:00
Joey Hess
4edde98709
improve message
Pluralize copies appropriately.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-04-27 13:44:08 -04:00
Joey Hess
a166d2520b
check mincopies is satisfied even when numcopies is known to be satisfied
I had been assuming that numcopies would be a larger or at most equal to
mincopies, so no need to check both. But users get confused and use configs
that don't really make sense, so make sure to handle mincopies being larger
than numcopies.

Also add something to the mincopies man page to discourage this
misconfiguration.

This commit was sponsored by Denis Dzyubenko on Patreon.
2021-04-27 13:37:18 -04:00
Joey Hess
32138b8cd8
implement annex.privateremote and remote.name.private configs
The slightly unusual parsing in Types.GitConfig avoids the need to look
at the remote list to get configs of remotes. annexPrivateRepos combines
all the configs, and will only be calculated once, so it's nice and
fast.

privateUUIDsKnown and regardingPrivateUUID now need to read from the
annex mvar, so are not entirely free. But that overhead can be optimised
away, as seen in getJournalFileStale. The other call sites didn't seem
worth optimising to save a single MVar access. The feature should have
impreceptable speed overhead when not being used.
2021-04-23 14:21:57 -04:00
Joey Hess
d5a05655b4
Merge branch 'master' into hiddenannex 2021-04-23 13:06:33 -04:00
Joey Hess
657d55c401
convert withKnownUrls to use overBranchFileContents
This only partly fixes importfeed to see journalled files, since it
separately cats metadata directly from the branch. Held off on a
changelog for a bug fix until that's dealt with.
2021-04-23 11:32:25 -04:00
Joey Hess
c687eae80b
got private repos really working
This new TODO will need private indexes to resolve; until then the
private journal has to be checked when private UUIDs are known.
2021-04-21 16:26:23 -04:00
Joey Hess
d0c5f6d2f0
optimisation
Avoid trying to read private journal files when no private uuids are
known.
2021-04-21 16:02:56 -04:00
Joey Hess
24eeacdba8
adapt recent bug fixes to support private journal
At this point, private repos should mostly work, except for a few
commands that directly read from the git-annex branch and will not see
the private journal.

Private index not yet implemented.
2021-04-21 16:01:13 -04:00
Joey Hess
0bb57702e1
Merge branch 'master' into hiddenannex 2021-04-21 15:45:12 -04:00
Joey Hess
653b719472
fix --all to include not yet committed files from the journal
Fix bug caused by recent optimisations that could make git-annex not see
recently recorded status information when configured with
annex.alwayscommit=false.

This does mean that --all can end up processing the same key more than once,
but before the optimisations that introduced this bug, it used to also behave
that way. So I didn't try to fix that; it's an edge case and anyway git-annex
behaves well when run on the same key repeatedly.

I am not too happy with the use of a MVar to buffer the list of files in the
journal. I guess it doesn't defeat lazy streaming of the list, if that
list is actually generated lazily, and anyway the size of the journal is
normally capped and small, so if configs are changed to make it huge and
this code path fire, git-annex using enough memory to buffer it all is not a
large problem.
2021-04-21 15:40:32 -04:00
Joey Hess
74acf17a31
refactoring 2021-04-21 14:29:02 -04:00
Joey Hess
6eb3c0a6b4
fix branch precacheing bug by checking journal
Fix bug caused by recent optimisations that could make git-annex not see
recently recorded status information when configured with
annex.alwayscommit=false.

When not using --all, precaching only gets triggered when the
command actually needs location logs, and so there's no speed hit there.

This is a minor speed hit for --all, because it precaches even when the
location log is not actually going to be used, and so checking the journal
is not necessary. It would have been possible to defer checking the journal
until the cache gets used. But that would complicate the usual Branch.get
code path with two different kinds of caches, and the speed hit is really
minimal. A better way to speed up --all, later, would be to avoid
precaching at all when the location log is not going to be used.
2021-04-21 14:02:15 -04:00
Joey Hess
05989556a2
start implementing hidden git-annex repositories
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.

Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.

All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.

Currently, no UUIDs are treated as private yet, a way to configure that
is needed.

The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.

It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That let Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.

And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concacenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
2021-04-20 15:04:53 -04:00
Joey Hess
b2222e4639
optimisation
Avoid unnecessary conversion to/from String.
2021-04-20 13:13:45 -04:00
Joey Hess
c30557594e
remove now redundant function 2021-04-20 12:42:57 -04:00
Joey Hess
e1a9b79fa6
fix hardcoded origin name in checkAdjustedClone
init: Fix a crash when the repo's was cloned from a repo that had an
adjusted branch checked out, and the origin remote is not named "origin".

The only other hardcoding of the name of origin is in:

- Upgrade.V2, which can be ignored probably
- Annex.Branch, which doesn't fail if it has some other name, but just
  doesn't set up the git-annex branch with quite as linear a history in
  that case.
2021-04-14 18:53:27 -04:00
Joey Hess
d18b37f769
remove part of comment that is no longer relevant 2021-04-14 18:32:15 -04:00
Joey Hess
b86206b553
directory CoW on import 2021-04-14 16:10:09 -04:00
Joey Hess
441f65c2cf
split out Annex.CopyFile
Goal is to use it in Remote.Directory, but also it's nice to shrink Remote.Git.
2021-04-14 14:06:43 -04:00
Joey Hess
8e7dc958d2
forget: Preserve currently exported trees
Avoiding problems with exporttree remotes in some unusual circumstances.

This commit was sponsored by Brett Eisenberg on Patreon.
2021-04-13 15:00:23 -04:00
Joey Hess
bdba2c5914
fastDebug Annex.Branch reads and writes
Reads of cached data are not debugged, only cache misses are, and since
many commands pre-cache location log data, this avoids a slew of
fastDebug calls when running commands such as git-annex get --from
2021-04-06 16:48:24 -04:00
Joey Hess
2e9d4ac754
fix fastDebug to check if debugging is actually enabled
Had to add to AnnexRead an indication of whether debugging is enabled.

Could have just made setupConsole not install a debug output action that
outputs, and have enableDebug be what installs that, but then in the
common case where there is no debug selector, and so all debug output is
selected, it would run the debug output action every time, which entails
an IORef access. Which would make fastDebug too slow..
2021-04-06 16:28:37 -04:00
Joey Hess
13c090b37a
use fastDebug everywhere it can be used
None of these are likely to yeild a noticable speedup though.
2021-04-06 15:41:24 -04:00
Joey Hess
d16d739ce2
implement fastDebug
Most of the changes here involve global option parsing: GlobalSetter
changed so it can both run an Annex action to set state, but can also
change the AnnexRead value, which is immutable once the Annex monad is
running.

That allowed a debugselector value to be added to AnnexRead, seeded
from the git config. The --debugfilter option's GlobalSetter then updates
the AnnexRead.

This improved GlobalSetter can later be used to move more stuff to
AnnexRead. Things that don't involve a git config will be easier to
move, and probably a *lot* of things can be moved eventually.

fastDebug, while implemented, is not used anywhere yet. But it should be
fast..
2021-04-06 15:24:28 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
c2f612292a
start splitting out readonly values from AnnexState
Values in AnnexRead can be read more efficiently, without MVar overhead.
Only a few things have been moved into there, and the performance
increase so far is not likely to be noticable.

This is groundwork for putting more stuff in there, particularly a value
that indicates if debugging is enabled.

The obvious next step is to change option parsing to not run in the
Annex monad to set values in AnnexState, and instead return a pure value
that gets stored in AnnexRead.
2021-04-02 15:51:44 -04:00
Joey Hess
ced91b3fbd
Avoid excess commits to the git-annex branch when stall detection is enabled
When git-annex transferrer started up, and the journal contained something,
it would commit it to the git-annex branch. This caused excess commits to
the branch, in cases where normally several changes would be journalled and
committed together. That generated some excess git objects and was also
just noisy on stdout.

Since transferrer uses enableInteractiveBranchAccess, it does not need to
commit journalled changes, since the optimisation that avoids checking
the journal when reading from the branch is disabled for processes that
call that.

This commit was sponsored by Svenne Krap on Patreon.
2021-04-02 11:57:18 -04:00
Joey Hess
c75f7e1d98
improve comment 2021-04-02 10:35:15 -04:00
Joey Hess
31eb5fddf3
borg: Fix a bug that prevented importing keys of type URL and WORM
Keys stored on the filesystem are mangled by keyFile to avoid problem
chars. So, that mangling has to be reversed when parsing files from a
borg backup back to a key.

The directory special remote also so mangles them. Some other special
remotes do not; eg S3 just serializes the key -- but S3 object names are
not limited to filesystem valid filenames anyway, so a S3 server must
not map them directly to files in any case. It seems unlikely that a
borg backup of some such special remote will get broken by this change.

This commit was sponsored by Graham Spencer on Patreon.
2021-03-26 12:07:00 -04:00
Joey Hess
537f9d9a11
Improved display of errors when accessing a git http remote fails.
New error message:

  Remote foo not usable by git-annex; setting annex-ignore

  http://localhost/foo/config download failed: Configuration of annex.security.allowed-ip-addresses does not allow accessing address ::1

If git config parse fails, or the git config file is not available at the url,
a better error message for that is also shown.

This commit was sponsored by Mark Reidenbach on Patreon.
2021-03-24 14:19:32 -04:00
Joey Hess
fdf1ccbe3f
move comment 2021-03-24 13:57:00 -04:00
Joey Hess
5d78cd9d08
Sped up git-annex init in a clone of an existing repository
Seems that hasOrigin was never finding origin's git-annex branch, so a new
one got created each time. And so then it later needed to merge the two
branches, which is expensive.

Added --no-track to git branch to avoid it displaying a message about
setting up tracking branches. Of course there's no reason to make the
git-annex branch a tracking branch since git-annex auto-merges it.
2021-03-23 15:23:13 -04:00
Joey Hess
798f685077
New annex.supportunlocked config
Can beet to false to avoid some expensive things needed to support unlocked
files.

See my comment for why this only controls what init sets up, and not other
behavior.

I didn't bother with making the v5 upgrade code path look at this, though
it easily could, because the docs say to run git-annex init after setting
it to make it take effect.
2021-03-23 14:04:34 -04:00
Joey Hess
a8b837aaef
add git ls-tree --long parser
Not yet used, but allows getting the size of items in the tree fairly
cheaply.

I noticed that CmdLine.Seek uses ls-tree and the feeds the files into
another long-running process to check their size. That would be an
example of a place that might be sped up by using this. Although in that
particular case, it only needs to know the size of unlocked files, not
locked. And since enabling --long probably doubles the ls-tree runtime
or more, the overhead of using it there may outwweigh the benefit.
2021-03-23 12:47:00 -04:00
Joey Hess
5545e78a1e
Make --debug also enable debugging in child git-annex processes
Especially necessary with stalldetection using child processes for
transfers.

This commit was sponsored by Jack Hill on Patreon.
2021-03-22 14:25:28 -04:00
Joey Hess
0e44c252c8
avoid getting creds from environment during autoenable
When autoenabling special remotes of type S3, weddav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
2021-03-17 09:41:12 -04:00
Joey Hess
526b9ed9d6
update comment 2021-03-16 14:53:29 -04:00
Joey Hess
8bae692486
better interface for catKey'
It only needs the size, so don't require the other stuff. Should let it
be used in more places, making things faster.
2021-03-16 14:52:23 -04:00
Joey Hess
cdd512cd9f
simplify 2021-03-05 14:22:04 -04:00
Joey Hess
fc61915230
use GIT keys for export of non-annexed files
This solves the problem that import of such files gets confused and
converts them back to annexed files.

The import code already used GIT keys internally when it determined a
file should not be annexed. So now when it sees a GIT key that export
used, it already does the right thing.

This also means that even older version of git-annex can import and will
do the right thing, once a fixed version has exported. Still, there may
be other complications around upgrades; still need to think it all
through.

Moved gitShaKey and keyGitSha from Key to Annex.Export since they're
only used for export/import.

Documented GIT keys in backends, since they do appear in the git-annex
branch now.

This commit was sponsored by Graham Spencer on Patreon.
2021-03-05 14:12:11 -04:00
Joey Hess
cbf94fd13d
prep for fixing find --branch --unlocked
Added LinkType to ProvidedInfo, and unified MatchingKey with
ProvidedInfo. They're both used in the same way, so there was no real
reason to keep separate.

Note that addLocked and addUnlocked still set matchNeedsFileName,
because to handle MatchingFile, they do need it. However, they
don't use it when MatchingInfo is provided. This should be ok,
the --branch case will be able skip checking matchNeedsFileName,
since it will provide a filename in any case.
2021-03-02 13:39:31 -04:00
Joey Hess
ee4fd38ecf
remove unused contentFile = Nothing 2021-03-01 16:35:38 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
7db4e62a90
remove accidental duplicated code
The code in Annex.WorkerStage and Annex.Concurrent was 100% identical.
2021-02-03 15:23:52 -04:00
Joey Hess
135757d64a
automatic stall detection
annex.stalldetection can now be set to "true" to make git-annex do
automatic stall detection when it detects a remote is updating its transfer
progress consistently enough.

This commit was sponsored by Luke Shumaker on Patreon.
2021-02-03 13:33:57 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
Joey Hess
6f78497572
When adding files to an adjusted branch set up by --unlock-present, add them unlocked, not locked
Missed this when implementing it because of the default case catching
the new constructor. So, removed that default case to make sure
future types of adjusted branches don't make the same mistake.

Complicated by git-annex addurl --fast which adds the file whose content
is not present, so it needs to stay unlocked when on such a branch.

This commit was sponsored by Brock Spratlen on Patreon.
2021-01-28 12:47:46 -04:00
Joey Hess
34a535ebea
adjust: Fix some bad behavior when unlocked files use URL keys.
This avoids the smudge --clean filter failing on the URL keys.

git checkout runs the post-checkout hook, which runs smudge --update.
That populates all the pointer files, but it neglected to store their inode
caches in the keys db. With that done, and the keys db flushed before
smudge --clean gets run (by restagePointerFile), the isUnmodifiedCheap
check can tell the file is not modified, so will not try to re-ingest it,
which does not work with URL keys because they do not support genKey.

It also seems possible that the isUnmodifiedCheap was also failing for
non-URL keys, which would cause them to be re-ingested, leading to a lot of
extra work. I have not verified that, but don't see why it wouldn't have
happened. So this probably also speeds up checking out adjusted branches.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2021-01-25 17:25:42 -04:00
Joey Hess
5c7e6629cf
Fix a bug in view filename generation when a metadata value ended with "/"
Or ":" or "\" on Windows, eg "c:" again.
2021-01-22 14:05:14 -04:00
Joey Hess
95cd49abdb
fix a bug that prevented git-annex init from working in a submodule
This is probably a reversion, but not sure what caused it. By the time
Annex.Init runs fixupUnusualReposAfterInit, another git-annex process has
at least sometimes already done the necessary fixups. (Eg, one run
indirectly by a git command.) But since the Repo is cached, it doesn't
realize and does them again. So, avoid crashing when git config --unset
fails.

This commit was sponsored by Jack Hill on Patreon.
2021-01-21 15:33:15 -04:00
Joey Hess
3847aa3c9c
change user-visible error to giveup 2021-01-21 14:13:14 -04:00
Joey Hess
7ccddd4aea
display exception as part of warnings
and comment that led to this change
2021-01-19 12:27:42 -04:00
Joey Hess
cc89699457
mincopies
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.

Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.

This commit was sponsored by Denis Dzyubenko on Patreon.
2021-01-06 14:15:19 -04:00
Joey Hess
5ce61c6b2a
add: Significantly speed up adding lots of non-large files to git
* add: Significantly speed up adding lots of non-large files to git,
  by disabling the annex smudge filter when running git add.
* add --force-small: Run git add rather than updating the index itself,
  so any other smudge filters than the annex one that may be enabled will
  be used.
2021-01-04 13:12:28 -04:00
Joey Hess
1c5fc8f047
Git.Queue: allow providing git common options like -c 2021-01-04 12:51:55 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
Joey Hess
6280af2901
generate more compact git-annex branch for imports
Especially from borg, where the content identifier logs
all end up being the same identical file!

But also, for other imports, the location tracking logs can,
in some cases, be identical files.

Bonus optimisation: Avoid looking up (and parsing when set)
GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to.
Although the lookup does happen at startup even when no
log will be written now.
2020-12-23 15:25:16 -04:00
Joey Hess
7916fc98a3
graft in imported tree to avoid gc
Fix a bug that could prevent getting files from an importtree=yes remote,
because the imported tree was allowed to be garbage collected.
2020-12-23 14:27:38 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
15000dee07
improve thirdpartypopulated support
May actually work now.

Note that, importKey now has to add the size to the key if it's supposed
to have size. Remote.Directory relied on the importer adding the size,
which is no longer done, so it was changed; it was the only one.
This way, importKey does not need to behave differently between regular
and thirdpartypopulated imports.
2020-12-21 16:19:44 -04:00
Joey Hess
57b03630b3
support thirdPartyPopulated
These don't have importTree in their config, because they don't support
tree import, but they do still support import, and do not support export
or key/value modification.
2020-12-21 13:49:47 -04:00
Joey Hess
1c054f1cf7
started borg special remote
Still need to implement 3 methods, but importKeyM looks like it will
work well to find annex object files.
2020-12-18 16:56:54 -04:00
Joey Hess
909318dcee
Merge branch 'master' into borg 2020-12-18 15:27:24 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
f62aee0525
fix handling of importtree-only remotes
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.

(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
2020-12-18 15:13:30 -04:00
Joey Hess
400bdb48db
update warnExportImportConflict for import-only remotes 2020-12-17 16:25:46 -04:00
Joey Hess
a4451ac391
add missing space 2020-12-17 15:58:14 -04:00
Joey Hess
26aad24fd3
simplify
As the only blocking operation now is threadDelaySeconds, no need to
calculate actual time and actual expected minimum size.
2020-12-17 12:09:49 -04:00
Joey Hess
6b13574827
Windows: include= and exclude= containing '/' will also match filenames that are written using '\'
And vice-versa, but it's better to use '/' for portability.

Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
2020-12-15 12:39:34 -04:00
Joey Hess
74c1e0660b
propagate git-annex -c on to transferrer child process
git -c was already propagated via environment, but need this for
consistency.

Also, notice it does not use gitAnnexChildProcess to run the
transferrer. So nothing is done about avoid it taking the
pid lock. It's possible that the caller is already doing something that
took the pid lock, and if so, the transferrer will certianly fail,
since it needs to take the pid lock too. This may prevent combining
annex.stalldetection with annex.pidlock, but I have not verified it's
really a problem. If it was, it seems git-annex would have to take
the pid lock when starting a transferrer, and hold it until shutdown,
or would need to take pid lock when starting to use a transferrer,
and hold it until done with a transfer and then drop it. The latter
would require starting the transferrer with pid locking disabled for the
child process, so assumes that the transferrer does not do anyting that
needs locking when not running a transfer.
2020-12-15 11:36:25 -04:00
Joey Hess
00526a6739
pass along -c options to child git-annex processes 2020-12-15 10:49:29 -04:00
Joey Hess
87de360e98
populate new field 2020-12-15 10:37:07 -04:00
Joey Hess
9de5506f19
improve readability and fix a warning 2020-12-14 17:48:30 -04:00
Joey Hess
01527b21d8
add key to FileInfo
MatchingKey is not the thing to use when matching on actual worktreee
files.

Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
2020-12-14 17:42:02 -04:00
Joey Hess
75acf5f440
improve some edge cases around partial initialization
* Guard against running in a repo where annex.uuid is set but
  annex.version is set, or vice-versa.
* Avoid autoinit when a repo does not have annex.version or annex.uuid
  set, but has a git-annex objects directory, suggesting it was used
  by git-annex before.
2020-12-14 13:17:43 -04:00
Joey Hess
19e26f091d
rename and refactor 2020-12-14 12:32:21 -04:00
Joey Hess
0d0f6d9c23
fix stall detection to actually work when fully stalled
When fully stalled, the progress bar doesn't update, so waiting on a
MVar would block forever. There's no need to wait anyway, just wake up
after sleeping the configured period and check the current value.

Luckily Viasat makes it really easy for me to notice this kind of
mistake, by stalling long TCP connections frequently.
2020-12-11 18:28:46 -04:00
Joey Hess
d3f78da0ed
propagate signals to the transferrer process group
Done on unix, could not implement it on windows quite.

The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.

Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-12-11 15:32:00 -04:00
Joey Hess
a422a056f2
make getViaTmpFrom no longer update location log
All callers adjusted to update it themselves.

In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.

This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
2020-12-11 11:50:13 -04:00
Joey Hess
04c12aa6df
custom protocol for transferrer
Rather than using Read/Show, which would force me to preserve data types
into the future.

I considered just deriving json and sending that, but I don't much like
deriving json with data types that have named constructors (like Key
does) because again it locks in data type details.

So instead, used SimpleProtocol, with a fairly complex and unreadable
protocol. But it is as efficient as the p2p protocol at least, and as
future proof.

(Writing my own custom json instances would have worked but I thought
of it too late and don't want to do all the work twice. The only real
benefit might be that aeson could be faster.)

Note that, when a new protocol request type is added later, git-annex
trying to use it will cause the git-annex transferrer to display a
protocol error message. That seems ok; it would only happen if a new
git-annex found an old version of itself in PATH or the program
file. So it's unlikely, and all it can do anyway is display an error.
(The error message could perhaps be improved..)

This commit was sponsored by Jack Hill on Patreon.
2020-12-09 16:13:59 -04:00