Commit graph

2030 commits

Author SHA1 Message Date
Joey Hess
abe4b7ebd6
importfeed: Avoid erroring out when a feed has been repeatedly broken
That can leave other imported files not checked into git, because the git
command queue is not flushed when git-annex errors out. And since it only
happens once git-annex has concluded a feed is broken, it's an intermittent
bug, worst kind. Been seeing it for a while, only tracked down today.

Instead, by returning False, git-annex importfeed will cleanly shutdown and
still exit nonzero.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-11-04 17:41:49 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
Joey Hess
2e9f128dea
moved module and relicensed 2018-10-29 23:13:36 -04:00
Joey Hess
5d97898a7c
touch files with high-resolution timestamp
Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older)
so not really a new dep.
2018-10-29 22:25:21 -04:00
Joey Hess
4431b82bce
migrate: Fix failure to migrate from URL keys. (Reversion introduced in version 6.20180926) 2018-10-29 16:36:36 -04:00
Joey Hess
a622488758
remove CHECKURL-MULTI single url response special case
Removed undocumented special case in handling of a CHECKURL-MULTI response
with only a single file listed. Rather than ignoring the url that was in
the response, use it. This allows external special remotes that want to
provide some better url to do so, although I don't entirely agree with
using CHECKURL-MULTI to accomplish that. I'm more of the feeling that an
undocumented special case that throws data away is just not a good idea.

This could in theory break some external special remote program that relied
on the current behavior, but its seems unlikely that it would because such
a program must already handle the multiple url case, unless it only ever
provides a single url response to CHECKURL-MULTI.

Make addurl --file work with a single item CHECKURL-MULTI response.
It already did for external special remotes due to the special case,
but now it also will for builtin ones like the BitTorrent special remote.

This commit was sponsored by Ilya Shlyakhter on Patron.
2018-10-29 14:52:12 -04:00
Joey Hess
9f87133bf5
snap --version= to auto-upgrade
This makes --version=6 still work, despite v6 not being in
supportedVersions. Which is useful for scripts that use it.

I didn't document it on the man page, because it's indistinguishable
from an automatic upgrade after initting as v6.
2018-10-26 11:44:05 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
Joey Hess
c28ca8294f
optimize smudge --clean of unmodified file
Usually, git won't run clean filter when a file is unmodified. But, when
git checkout runs git annex smudge --update, it populates the pointer
runs git update-index, which sees the file has changed and runs
git annex smudge --clean, which was checksumming the file unncessarily
as it re-ingested it.

With annex.thin set, this is the difference between git checkout of a
branch with a 1 gb file taking 30s and 0.1s.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-25 16:46:46 -04:00
Joey Hess
ca7de61454
git post-checkout and post-merge hooks
* init, upgrade: Install git post-checkout and post-merge hooks that run
  git annex smudge --update.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* precommit: Run git annex smudge --update, because the post-merge
  hook is not run when there is a merge conflict. So the work tree will
  be updated when a commit is made to resolve the merge conflict.
* Note that git has no hooks run after git stash or git cherry-pick,
  so the user will have to manually run git annex smudge --update
  after such commands.

Nothing currently installs the hooks into v6 repos that already exist.
Something will need to be done about that, either move this behavior to v7,
or document that the user will need to manually fix up their v6 repos.

This commit was sponsored by Eric Drechsel on Patreon.
2018-10-25 15:59:51 -04:00
Joey Hess
917a2c6095
defer updating unlocked files until after smudge filter
The smuge filter no longer provides git with annexed file content, to
avoid a git memory leak, and because that did not honor annex.thin.

git annex smudge --update has to be run after a checkout to update
unlocked files in the working tree with annexed file contents.

No hooks yet to run it.

This commit was sponsored by Nick Piper on Patreon.
2018-10-25 15:08:20 -04:00
Joey Hess
f2a4db724c
remove redundant test
populatePointerFile checks the same thing
2018-10-25 14:31:45 -04:00
Joey Hess
fcca7adaff
instrument P2P --debug with connection and thread info
For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 15:52:11 -04:00
Joey Hess
4a6ebb1034
make sync update adjusted branch to hide/unhide
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.

This commit was sponsored by Henrik Riomar.
2018-10-20 14:22:28 -04:00
Joey Hess
4a788fbb3b
sync --content now supports --hide-missing adjusted branches
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.

Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.

When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-19 17:51:25 -04:00
Joey Hess
8be5a7269a
refactor getCurrentBranch
Both Command.Sync and Annex.Ingest had their own versions of this.

The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem
to need it. That is only checking to see if it's in an adjusted unlocked
branch, and when in an adjusted branch, the branch does in fact exist,
so the added check that Git.Branch.current does is fine.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-19 17:29:18 -04:00
Joey Hess
24838547e2
adjust --hide-missing
* At long last there's a way to hide annexed files whose content
  is missing from the working tree: git-annex adjust --hide-missing
* When already in an adjusted branch, running git-annex adjust
  again will update the branch as needed. This is mostly
  useful with --hide-missing to hide/unhide files after their content
  has been dropped or received.

Still needs integration with sync and the assistant, and not as fast as it
could be, but already usable.

This commit was sponsored by Ethan Aubin.
2018-10-18 15:32:42 -04:00
Joey Hess
a6c8de84b6
improve types to allow combining some adjustments
Combinations like --hide-misssing --unlocked seem very useful. On the
other hand, combining --fix with --unlock doesn't make sense because a
file can be either unlocked or a symlink that can be fixed, but not
both.

Changed the serialization of HideMissingAdjustment in passing, but it
has not actually been used yet so nothing will be broken.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-10-18 12:59:05 -04:00
Joey Hess
558520d27a
fix rekey/migrate bookkeeping in v6
After 220317df5a the test suite still
detected a problem; migrate of an unlocked file replaced it with a
pointer file rather than a file with the content.

This was a bookeeping problem; the worktree file was being copied to the object
file and the inode cache updated, but if that database write didn't get
flushed in time, later checks would think the content was not present.
Fixed by copying the object file to the worktree file instead, which
avoids needing to update the inode cache.

Also, only copy when there's a hard link to break, not always.

This commit was sponsored by Brock Spratlen on Patreon.
2018-10-16 17:18:21 -04:00
Joey Hess
220317df5a
v6: fix migrate of unlocked file
After commit b2bafdb2fc the test suite
threw up a failure migrating unlocked files.

I'm not clear how that commit broke it (presumably by inAnnex reporting
the right information now), but the actual problem is plain:
The inodecache for the worktree file is generated, but then the file is
replaced with a copy (unncessarily unless annex.link is set, but the
code always does so) and so linkToAnnex/linkAnnex then fails because it
notices the inode cache is not valid.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-16 16:45:08 -04:00
Joey Hess
bdf6783b92
improve error message 2018-10-16 15:52:40 -04:00
Joey Hess
40dba8e933
prevent find running in bare repo 2018-10-16 10:44:09 -04:00
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
91b799d1a6
export: Fix false positive in export conflict detection
It occurred when the same tree was exported by multiple clones. nub out
identical trees.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-09 15:54:12 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
53526136e8
move commandAction out of CmdLine.Seek
This is groundwork for nested seek loops, eg seeking over all files and
then performing commandActions on a list of remotes, which can be done
concurrently.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-01 14:12:06 -04:00
Joey Hess
9adee3f2fb
sync: Warn when a remote's export is not updated to the current tree because export tracking is not configured.
Only display the warning when the current branch has a tree that is not
the same as the tree in the export.

Note that it doesn't check to see if the current tree is
in incompleteExportedTreeish; it might be worth checking that and reminding
the user about an incomplete export, but when export tracking is not
configured, they are probably not in the right clone of the repository to
resolve the incomplete export.

This commit was sponsored by Ethan Aubin.
2018-09-27 15:41:18 -04:00
Joey Hess
6134431254
clean P2P protocol shutdown on EOF try 2
Same goal as b18fb1e343 but without
breaking backwards compatability. Just return IO exceptions when running
the P2P protocol, so that git-annex-shell can detect eof and avoid the
ugly message.

This commit was sponsored by Ethan Aubin.
2018-09-25 16:49:59 -04:00
Joey Hess
4ecba916a1
annex.maxextensionlength
Added annex.maxextensionlength for use cases where extensions longer than 4
characters are needed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-09-24 12:10:18 -04:00
Joey Hess
1d1054faa6
added -z
Added -z option to git-annex commands that use --batch, useful for
supporting filenames containing newlines.

It only controls input to --batch, the output will still be line delimited
unless --json or etc is used to get some other output. While git often
makes -z affect both input and output, I don't like trying them together,
and making it affect output would have been a significant complication,
and also git-annex output is generally not intended to be machine parsed,
unless using --json or a format option.

Commands that take pairs like "file key" still separate them with a space
in --batch mode. All such commands take care to support filenames with
spaces when parsing that, so there was no need to change it, and it would
have needed significant changes to the batch machinery to separate tose
with a null.

To make fromkey and registerurl support -z, I had to give them a --batch
option. The implicit batch mode they enter when not provided with input
parameters does not support -z as that would have complicated option
parsing. Seemed better to move these toward using the same --batch as
everything else, though the implicit batch mode can still be used.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-09-20 16:11:47 -04:00
Joey Hess
50217f62a1
avoid duplicate add action for v6 unlocked modified file
The new second pass sees the file as type changed because the first
pass's changes have typically not reached git yet. So, have to
explicitly check for unmodified files in the second pass.

Note that, if the file has been touched but not really modified,
the first pass will handle it, and so the second pass does nothing.

This commit was sponsored by Jochen Bartl on Patreon.
2018-09-12 15:20:34 -04:00
Joey Hess
2743224658
change v6 git-annex add of staged unmodified unlocked file
v6: When a file is unlocked but has not been modified, and the unlocking is
only staged, git-annex add did not lock it. Now it will, for consistency
with how modified files are handled and with v5.

Note the removal of the sameInodeCache check. Otherwise it would see
that the unmodified file is unmodified and stop there. That check seems to have
been copied from the direct mode branch. But, direct mode had a specific
reason to check for unmodified content, that does not apply to v6.

The second pass means there is potential for a race, eg the unlocked
file could be modified in between the first and second passes.
No problem with that, since both passes do the same thing.

This commit was sponsored by Jake Vosloo on Patreon.
2018-09-12 14:00:05 -04:00
Joey Hess
fcff64f8bb
optimisation: avoid stat call
This commit was sponsored by Paul Walmsley on Patreon.
2018-09-05 17:26:12 -04:00
Joey Hess
d65a081f3f
improve message 2018-09-02 16:17:50 -04:00
Joey Hess
d0ef049cca
comment typo 2018-09-02 16:16:08 -04:00
Joey Hess
5c99f6247e
per-remote metadata storage
Actually very straightforward reuse of the metadata log file code.
Although I had to add a todo item as git-annex forget won't clean up
dead remote's metadata yet.

This would be worth adding to the external special remote interface
sometime. Have not opened a todo though, guess I'll wait until something
needs it.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 12:23:22 -04:00
Joey Hess
76f32012af
avoid sync/assistant drop from appendonly
Make git-annex sync and the assistant skip trying to drop from appendonly
remotes since it's just going to fail.

git-annex drop and similar commands will still try to drop from
appendonly, so the user will see failure messages when they try to do
that. To do otherwise would be confusing since the user has explicitly
asked for a drop with those commands.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:23:57 -04:00
Joey Hess
8b39db20b5
export appendonly support
Make `git annex export` check appendonly when removing a file from an
export, and not update the location log, since the remote still contains
the content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:18:20 -04:00
Joey Hess
6001b3cf45
fix build warning 2018-08-28 13:17:06 -04:00
Joey Hess
10138056dc
v6: avoid accidental conversion when annex.largefiles is not configured
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.

Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.

Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.

This commit was supported by the NSF-funded DataLad project.
2018-08-27 14:51:10 -04:00
Joey Hess
98fd7ec6c9
recover from race between git mv+commit and git-annex get
Last of the known v6 races.

This also makes git add of a pointer file populate it when its content
is present in the annex. Which makes sense to do, I think.

This commit was supported by the NSF-funded DataLad project.
2018-08-22 16:01:50 -04:00
Joey Hess
7ee3b02d49
replace stack trace with an explanation 2018-08-20 21:26:07 -04:00
Joey Hess
48e9e12961
finally fixed v6 get/drop git status
After updating the worktree for an add/drop, update git's index, so git
status will not show the files as modified.

What actually happens is that the index update removes the inode
information from the index. The next git status (or similar) run
then has to do some work. It runs the clean filter.

So, this depends on the clean filter being reasonably fast and on git
not leaking memory when running it. Both problems were fixed in
a96972015d, but only for git 2.5. Anyone
using an older git will see very expensive git status after an add/drop.

This uses the same git update-index queue as other parts of git-annex, so
the actual index update is fairly efficient. Of course, updating the index
does still have some overhead. The annex.queuesize config will control how
often the index gets updated when working on a lot of files.

This is an imperfect workaround... Added several todos about new
problems this workaround causes. Still, this seems a lot better than the
old behavior.

This commit was supported by the NSF-funded DataLad project.
2018-08-14 16:23:58 -04:00
Joey Hess
a96972015d
massive v6 add speed/memory improvement
v6 add: Take advantage of improved SIGPIPE handler in git 2.5 to speed up
the clean filter by not reading the file content from the pipe. This also
avoids git buffering the whole file content in memory.

When built with an older git, still consumes stdin. If built with a newer
git and used with an older one, it breaks, but that's acceptable --
checking the git version every time would make repeated smudge runs slow.

This commit was supported by the NSF-funded DataLad project.
2018-08-09 18:17:46 -04:00
Joey Hess
12460fcea6
make --batch honor matching options
When --batch is used with matching options like --in, --metadata, etc, only
operate on the provided files when they match those options. Otherwise, a
blank line is output in the batch protocol.

Affected commands: find, add, whereis, drop, copy, move, get

In the case of find, the documentation for --batch already said it honored
the matching options. The docs for the rest didn't, but it makes sense to
have them honor them. While this is a behavior change, why specify the
matching options with --batch if you didn't want them to apply?

Note that the batch output for all of the affected commands could
already output a blank line in other cases, so batch users should
already be prepared to deal with it.

git-annex metadata didn't seem worth making support the matching options,
since all it does is output metadata or set metadata, the use cases for
using it in combination with the martching options seem small. Made it
refuse to run when they're combined, leaving open the possibility for later
support if a use case develops.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-08-08 12:07:06 -04:00
Joey Hess
4d4d238a08
add missing type signature 2018-08-06 15:41:44 -04:00
Joey Hess
38ddd6072d
addurl: Include filename in --json-progress output when known. 2018-08-06 12:53:44 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
cc2cb46857
unused --from: Allow specifiying a repository by uuid or description.
This commit was sponsored by Jake Vosloo on Patreon.
2018-07-11 16:01:35 -04:00