Commit graph

332 commits

Author SHA1 Message Date
Joey Hess
b45b37b088
wait for first pass to complete before second pass
Otherwise the bloom filter may not be fully populated when the second
pass starts, which could have led to incorrect behavior with --all -J,
probably in very rare circumstances.
2020-09-24 14:23:25 -04:00
Joey Hess
167da965b9
remove obsolete comment 2020-09-24 14:22:56 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
f4c4b89aa3
refactor
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.

This commit was sponsored by Ryan Newton on Patreon.
2020-09-07 13:26:16 -04:00
Joey Hess
7bdb0cdc0d
add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.

Renamed runsGitAnnexChildProcess to make clearer where it should be
used.

Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
2020-08-25 14:57:49 -04:00
Joey Hess
5d380c6c5c
when workTreeItems finds a problem with a parameter, don't go on to process it
Part of workTreeItems is trying detect a case
where git porcelain refuses to process a file, and where
git ls-files silently outputs nothing. But, it's hard to perfectly
replicate git's behavior, and besides, git's behavior could change.
So it could be that we warn, but then git ls-files does not skip over
it, and so git-annex also processes it after warning about it.

So, if we think we have a problem with a parameter, display the warning,
and skip processing it at all.

Implementing this was complicated by needing to handle the case where
all command-line parameters get filtered out this way. Which is
different than the case where there are none, because we don't want to
operate on all files in this new case..
2020-08-06 13:47:45 -04:00
Joey Hess
1be92381ec
unify batch mode with non-batch by using AnnexedFileSeeker 2020-07-22 14:23:28 -04:00
Joey Hess
75aab72d23
mostly done with location log precaching
Some nice wins.
2020-07-13 17:04:02 -04:00
Joey Hess
df58609804
convert sync to use seekFilteredKeys
This only speeds up sync --content from 34.75 to 33.17 seconds;
location log precaching will probably be a bigger win.
2020-07-13 15:02:52 -04:00
Joey Hess
4c9ad1de46
optimisation: stream keys through git cat-file --buffer
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!

But, see the TODO for some caveats.
2020-07-10 13:54:52 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
96f6aa39dd
add runsGitAnnexChildProcess calls
This is all the calls to git-annex that seem capable of possibly locking
the same pidlock as their parent. Except possibly for some in the
assistant.
2020-06-17 15:31:03 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
3824645368
change to new waitForAllRunningCommandActions
waitForAllRunningCommandActions is a subset of finishCommandActions
and more appropriate for what is being done here: Just a concurrency
barrier.
2020-05-26 14:00:51 -04:00
Joey Hess
e04a931439
improve transfer stages for some commands
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.

transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.

As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
2020-05-26 11:55:50 -04:00
Joey Hess
0040d2c129
sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from
Now the warning gets displayed, which is better than an arcane git error.

The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.

Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
2020-05-05 14:31:37 -04:00
Joey Hess
c05c4e549e
sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes
Do not sync with a faster remote that was not specified.

That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
2020-04-23 16:08:45 -04:00
Joey Hess
529f488ec4
fix a thundering herd problem
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.

What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.

Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
2020-04-17 17:09:29 -04:00
Joey Hess
c0cd07c36b
Ref ByteString conversion done
Test suite passes.
2020-04-07 17:41:09 -04:00
Joey Hess
87d5583a91
use programPath consistently, not readProgramFile
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.

Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.

I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
2020-03-30 16:06:27 -04:00
Joey Hess
79a0435b77
automate remote.name.skipFetchAll
initremote, enableremote: Set remote.name.skipFetchAll when the remote
cannot be fetched from by git, so git fetch --all will not try to use it.
2020-02-19 13:58:26 -04:00
Joey Hess
69f2d1dd43
remoteConfig rework
remoteAnnexConfig will avoid bugs like
a3a674d15b

Use now more generic remoteConfig in a couple places that built
non-annex config settings manually before.
2020-02-19 13:45:11 -04:00
Joey Hess
72959b23e5
remove mention of receive.denyNonFastforwards on push failure
That was added back in 2013 commit 2af652e1b8
and I'm a bit unclear about the reasons.

It seemed that, at the time, receive.denyNonFastforwards=true, which is
the default in a repo created by git init --shared --bare (but not
without --shared), which the assistant did, caused problems syncing.
But even at the time the bug report showed an error message clearly
explaining that it was a non-fast-forward push being denied.

I tried it with the current version, and since git-annex sync pulls
from the bare repo and merges, it pushes a fast-forward. So there's no
failure to push. (There could be one if another push happened after the
pull, but you'd want it to fail then presumably.)

I'm not 100% sure what changed to make it not be a problem, but I know
I've seen this message in many circumstances and I can't ever recall it
having anything to do with any issue that prevented a push.

Based on doc/forum/non_fast_forward_error_with_git_annex_sync.mdwn,
which showed the problem when syncing from a direct mode repo,
and on doc/forum/receiving_indirect_renames_on_direct_repo___63__/comment_3_0246fff6c7c75f6be45bd257ec3872a5._comment
which seems to show the problem was actually a problem pulling,
I think there's a good chance that the problem actually involved direct
mode.
2020-02-19 11:46:24 -04:00
Joey Hess
06f6eb7a70
--only-annex --no-content combination 2020-02-18 12:29:31 -04:00
Joey Hess
a78eb6dd58
sync --only-annex and annex.synconlyannex
* Added sync --only-annex, which syncs the git-annex branch and annexed
  content but leaves managing the other git branches up to you.
* Added annex.synconlyannex git config setting, which can also be set with
  git-annex config to configure sync in all clones of the repo.

Use case is then the user has their own git workflow, and wants to use
git-annex without disrupting that, so they sync --only-annex to get the
git-annex stuff in sync in addition to their usual git workflow.

When annex.synconlyannex is set, --not-only-annex can be used to override
it.

It's not entirely clear what --only-annex --commit or --only-annex
--push should do, and I left that combination not documented because I
don't know if I might want to change the current behavior, which is that
such options do not override the --only-annex. My gut feeling is that
there is no good reasons to use such combinations; if you want to use
your own git workflow, you'll be doing your own committing and pulling
and pushing.

A subtle question is, how should import/export special remotes be handled?
Importing updates their remote tracking branch and merges it into master.
If --only-annex prevented that git branch stuff, then it would prevent
exporting to the special remote, in the case where it has changes that
were not imported yet, because there would be a unresolved conflict.

I decided that it's best to treat the fact that there's a remote tracking
branch for import/export as an implementation detail in this case. The more
important thing is that an import/export special remote is entirely annexed
content, and so it makes a lot of sense that --only-annex will still sync
with it.
2020-02-17 16:33:10 -04:00
Joey Hess
963239da5c
separate RemoteConfig parsing basically working
Many special remotes are not updated yet and are commented out.
2020-01-14 12:35:08 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
b88f89c1ef
get the most commonly used commands building again
A quick benchmark of whereis shows not much speed improvement, maybe a
few percent. Profiling it found a hotspot, adds to todo.
2019-12-04 13:45:18 -04:00
Joey Hess
99b509572d
post-receive hook updateInstead emulation cleanup
The code is only needed because for a long time, git-annex didn't
install hooks in repos on crippled filesystems. Now it does, and they
work at least on FAT (where all files are executable) and Windows.

It would be possible to remove this code in v8 simply by re-installing
the hooks.
2019-09-11 14:41:51 -04:00
Joey Hess
689d1fcc92
remove most remnants of direct mode
A few remain, as needed for upgrades, and for accessing objects from
remotes that are direct mode repos that have not been converted yet.
2019-08-26 16:27:48 -04:00
Joey Hess
9d36c826c0
use fine-grained WorkerStages when transferring and verifying
This means that Command.Move and Command.Get don't need to
manually set the stage, and is a lot cleaner conceptually.

Also, this makes Command.Sync.syncFile use the worker pool better.
In the scenario where it first downloads content and then uploads it to
some other remotes, it will start in TransferStage, then enter VerifyStage
and then go back to TransferStage for each transfer to the remotes.
Before, it entered CleanupStage after the download, and stayed in it for
the upload, so too many transfer jobs could run at the same time.

Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent
inside onLocal. That has a Annex state for the remote, with no worker pool.
So the resulting calls to enteringStage won't block in there.

While Remote.Git.copyToRemote does do checksum verification, I
realized that should not use a verification slot in the WorkerPool
to do it. Because, it's reading back from eg, a removable disk to checksum.
That will contend with other writes to that disk. It's best to treat
that checksum verification as just part of the transer. So, removed the todo
item about that, as there's nothing needing to be done.
2019-06-19 13:24:20 -04:00
Joey Hess
53882ab4a7
make WorkerStage an open type
Rather than limiting it to PerformStage and CleanupStage, this opens it
up so any number of stages can be added as needed by commands.

Each concurrent command has a set of stages that it uses, and only
transitions between those can block waiting for a free slot in the
worker pool. Calling enteringStage for some other stage does not block,
and has very little overhead.

Note that while before the Annex state was duplicated on the first call
to commandAction, this now happens earlier, in startConcurrency.
That means that seek stage actions should that use startConcurrency
and then modify Annex state won't modify the state of worker threads
they then start. I audited all of them, and only Command.Seek
did so; prepMerge changes the working directory and so has to come
before startConcurrency.

Also, the remote list is built before duplicating the state, which means
that it gets built earlier now than it used to. This would only have an
effect of making commands that end up not needing to perform any actions
unncessary build the remote list (only when they're run with concurrency
enable), but that's a minor overhead compared to commands seeking
through the work tree and determining they don't need to do anything.
2019-06-19 13:05:03 -04:00
Joey Hess
04cc470201
run download checksum verification in separate job pool
get, move, copy, sync: When -J or annex.jobs has enabled concurrency,
checksum verification uses a separate job pool than is used for
downloads, to keep bandwidth saturated.

Not yet done for upload checksum verification, but that only affects
remotes on local disks.
2019-06-17 14:58:02 -04:00
Joey Hess
ba2551da6f
add startingNoMessage
Fixes the last wart in the StartMessage transition. A few commands
include other CommandStart actions that generate output, and
do not themselves need to display a start/end message.
2019-06-12 14:11:23 -04:00
Joey Hess
8e5ea28c26
finish CommandStart transition
The hoped for optimisation of CommandStart with -J did not materialize.
In fact, not runnign CommandStart in parallel is slower than -J3.
So, CommandStart are still run in parallel.

(The actual bad performance I've been seeing with -J in my big repo
has to do with building the remoteList.)

But, this is still progress toward making -J faster, because it gets rid
of the onlyActionOn roadblock in the way of making CommandCleanup jobs
run separate from CommandPerform jobs.

Added OnlyActionOn constructor for ActionItem which fixes the
onlyActionOn breakage in the last commit.

Made CustomOutput include an ActionItem, so even things using it can
specify OnlyActionOn.

In Command.Move and Command.Sync, there were CommandStarts that used
includeCommandAction, so output messages, which is no longer allowed.
Fixed by using startingCustomOutput, but that's still not quite right,
since it prevents message display for the includeCommandAction run
inside it too.
2019-06-12 13:24:01 -04:00
Joey Hess
436f107715
make CommandStart return a StartMessage
The goal is to be able to run CommandStart in the main thread when -J is
used, rather than unncessarily passing it off to a worker thread, which
incurs overhead that is signficant when the CommandStart is going to
quickly decide to stop.

To do that, the message it displays needs to be displayed in the worker
thread, after the CommandStart has run.

Also, the change will mean that CommandStart will no longer necessarily
run with the same Annex state as CommandPerform. While its docs already
said it should avoid modifying Annex state, I audited all the
CommandStart code as part of the conversion. (Note that CommandSeek
already sometimes runs with a different Annex state, and that has not been
a source of any problems, so I am not too worried that this change will
lead to breakage going forward.)

The only modification of Annex state I found was it calling
allowMessages in some Commands that default to noMessages. Dealt with
that by adding a startCustomOutput and a startingUsualMessages.
This lets a command start with noMessages and then select the output it
wants for each CommandStart.

One bit of breakage: onlyActionOn has been removed from commands that used it.
The plan is that, since a StartMessage contains an ActionItem,
when a Key can be extracted from that, the parallel job runner can
run onlyActionOn' automatically. Then commands won't need to worry about
this detail. Future work.

Otherwise, this was a fairly straightforward process of making each
CommandStart compile again. Hopefully other behavior changes were mostly
avoided.

In a few cases, a command had a CommandStart that called a CommandPerform
that then called showStart multiple times. I have collapsed those
down to a single start action. The main command to perhaps suffer from it
is Command.Direct, which used to show a start for each file, and no
longer does.

Another minor behavior change is that some commands used showStart
before, but had an associated file and a Key available, so were changed
to ShowStart with an ActionItemAssociatedFile. That will not change the
normal output or behavior, but --json output will now include the key.
This should not break it for anyone using a real json parser.
2019-06-06 17:13:54 -04:00
Joey Hess
258a7c5cd1
add Key to all ActionItem constructors 2019-06-06 12:53:24 -04:00
Joey Hess
568af1073e
filter exported tree through remote's preferred content setting
The filtering is fairly efficient as far as building the trees goes,
since it reuses adjustTree. But it still needs to traverse the whole
tree, and look up the keys used by every file.

The tree that gets recorded to export.log is the filtered tree.
This way resumes of interrupted sync to an export uses it without
needing to recalculate it. And, a change to the preferred content
settings of the remote will result in a different tree, so the export
will be updated accordingly.

The original tree is still used in the remote tracking branch.
That branch represents the special remote as a git remote, and if it
were a normal git remote, the tree in its head would not be affected by
preferred content.
2019-05-20 11:54:55 -04:00
Joey Hess
3d6f1b7dba
Made git-annex sync --content much faster when all the remotes it's syncing with are export/import remotes
It was unnecessarily going over all files and checking preferred content
against no remotes.
2019-04-10 12:42:10 -04:00
Joey Hess
37041b629d
improve messages around export/import conflicts
A conflict can be caused by either export or import when the remote
supports both.
2019-04-09 13:03:59 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
28e46d947a
avoid sync --content trying to sendKey to exporttree remotes 2019-03-11 14:09:46 -04:00
Joey Hess
057999f0fc
fix sync --content with remote.name.annex-tracking-branch=master:subdir
It was exporting the whole tree not just the subdir. Now tested fully
working in both directions.
2019-03-11 14:07:52 -04:00
Joey Hess
8ae0db925b
fix name of annex-tracking-branch config 2019-03-11 13:56:59 -04:00
Joey Hess
c755788256
sync: import when annex-tracking-branch is configured
This works, and tested syncing both gets changes from a special remote
and sends changes to it, keeping it fully in sync nicely!

But have not tried it with a subdir configured.
2019-03-09 13:57:49 -04:00
Joey Hess
633021e135
--no-push and remote.name.annex-push prevent exporting trees to special remotes
Users may want sync to only export, or only import and this is broadly
analagous to push and pull, so it makes sense to use the same
configuration for it.
2019-03-09 13:21:49 -04:00
Joey Hess
e3a704224f
fix export db locking deadlock 2019-03-07 16:06:02 -04:00
Joey Hess
18d7a1dbbb
make export and sync update special remote tracking branch
The branch is only updated once the export is 100% complete. This way,
if an export is started but interrupted and so the remote does not yet
contain some of the files, an import will make a commit on the old
branch, and so won't delete the missing files.
2019-03-01 16:35:48 -04:00
Joey Hess
4747fa923d
export: Deprecated the --tracking option.
Instead, users can configure remote.<name>.annex-tracking-branch themselves.
2019-02-23 15:54:33 -04:00
Joey Hess
9cebfd7002
purify exportActions
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.

Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.

As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.

This commit was sponsored by Nick Piper on Patreon.
2019-01-30 15:11:40 -04:00
Joey Hess
ad1d422dd7
fix false positive in export conflict detection
Like the earlier fixed one in Command.Export, it occurred when the same
tree was exported by multiple clones. Previous fix was incomplete since
several other places looked at the list of exported trees to detect when
there was an export conflict. Added a single unified function to avoid
missing any places it needed to be fixed.

This commit was sponsored by mo on Patreon.
2019-01-30 12:36:30 -04:00
Joey Hess
894716512d
add a UUIDDesc type containing a ByteString
Groundwork for handling uuid.log using ByteString
2019-01-01 16:17:54 -04:00
Joey Hess
6d381df0e6
sync --content: Fix dropping unwanted content from the local repository
This fixes a bug with the numcopies counting when using sync --content.
It did not always pass the local repo uuid to handleDropsFrom, and so the
numcopies counting was off by one, and unwanted local content would only be
dropped when there were numcopies+1 remote copies.

Also, support dropping local content that has reached an
exporttree remote that is not untrusted (currently only S3 remotes
with versioning).
2018-12-18 13:58:12 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
4a6ebb1034
make sync update adjusted branch to hide/unhide
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.

This commit was sponsored by Henrik Riomar.
2018-10-20 14:22:28 -04:00
Joey Hess
4a788fbb3b
sync --content now supports --hide-missing adjusted branches
This relies on git ls-files --with-tree, which I'm using in a way that
its man page does not document. Hm. I emailed the git list to try to get
the docs improved, but at least the git test suite does test the same
kind of use case I'm using here.

Performance impact when not in an adjusted branch is limited to some
additional MVar accesses, and a single git call to determine the name of
the current branch. So very minimal.

When in an adjusted branch, the performance impact is
in Annex.WorkTree.lookupFile, which starts doing an equal amount of work
for files that didn't exist as it already did for files that were
unlocked.

This commit was sponsored by Jochen Bartl on Patreon.
2018-10-19 17:51:25 -04:00
Joey Hess
8be5a7269a
refactor getCurrentBranch
Both Command.Sync and Annex.Ingest had their own versions of this.

The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem
to need it. That is only checking to see if it's in an adjusted unlocked
branch, and when in an adjusted branch, the branch does in fact exist,
so the added check that Git.Branch.current does is fine.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-19 17:29:18 -04:00
Joey Hess
53526136e8
move commandAction out of CmdLine.Seek
This is groundwork for nested seek loops, eg seeking over all files and
then performing commandActions on a list of remotes, which can be done
concurrently.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-01 14:12:06 -04:00
Joey Hess
9adee3f2fb
sync: Warn when a remote's export is not updated to the current tree because export tracking is not configured.
Only display the warning when the current branch has a tree that is not
the same as the tree in the export.

Note that it doesn't check to see if the current tree is
in incompleteExportedTreeish; it might be worth checking that and reminding
the user about an incomplete export, but when export tracking is not
configured, they are probably not in the right clone of the repository to
resolve the incomplete export.

This commit was sponsored by Ethan Aubin.
2018-09-27 15:41:18 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
a5f598a6aa
remove use of remoteGitConfig
Unfortunately one more use remains..

This should be just as fast as the other method. The remote's Git.Repo
has already had its config read, so Annex.new's call to Git.Config.read
is a noop.

Thid commit was sponsored by andrea rota.
2018-06-05 13:15:04 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
af8546990d
move: --safe/--unsafe and potential drop race fix
move: Added --safe option, which makes move honor numcopies settings.
Also --unsafe enables the default behavior, anticipating that the
default may one day change.

This commit was sponsored by Ethan Aubin.
2018-04-09 16:20:10 -04:00
Joey Hess
db057dcff0
fix sync bug in direct mode
sync: Fix bug that prevented pulling changes into direct mode repositories
that were committed to remotes using git commit rather than git-annex sync.

This commit was supported by the NSF-funded DataLad project.
2018-02-26 14:10:03 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
4781ca297b
showStart variant for when there's no worktree file
Clean up some uses of showStart with "" for the file,
or in some cases, a non-filename description string. That would
generate bad json, although none of the commands doing that
supported --json.

Using "" for the file resulted in output like "foo  rest";
now the extra space is eliminated.

This commit was sponsored by Fernando Jimenez on Patreon.
2017-11-28 15:14:16 -04:00
Joey Hess
e1ac299ad0
better dup key with -J fix
This avoids all the complication about redundant work discussed in
the previous try at fixing this. At the expense of needing each command
that could have the problem to be patched to simply wrap the action in
onlyActionOn once the key is known. But there do not seem to be many
such commands.

onlyActionOn' should not be used with a CommandStart (or CommandPerform),
although the types do allow it. onlyActionOn handles running the whole
CommandStart chain. I couldn't immediately see a way to avoid mistken
use of onlyActionOn'.

This commit was supported by the NSF-funded DataLad project.
2017-10-17 18:48:53 -04:00
Joey Hess
85ed38a574
Avoid repeated checking that files passed on the command line exist.
git annex add, git annex lock etc make multiple seek passes,
and each seek pass checked that files existed. That was unncessary
redundant work.

Fixed by adding a new WorkTreeItem type, make seek actions use it,
and check that the files exist when constructing it.

This commit was supported by the NSF-funded DataLad project.
2017-10-16 14:10:20 -04:00
Joey Hess
e8c9a5c515
sync: Added --cleanup, which removes local and remote synced/ branches.
Also deletes any tagged pushes that the assistant might have done,
since those would also prevent resetting a branch back.

This commit was sponsored by andrea rota.
2017-09-28 14:58:48 -04:00
Joey Hess
d71c65ca0a
add exporter thread to assistant
This is similar to the pusher thread, but a separate thread because git
pushes can be done in parallel with exports, and updating a big export
should not prevent other git pushes going out in the meantime.

The exportThread only runs at most every 30 seconds, since updating an
export is more expensive than pushing. This may need to be tuned.

Added a separate channel for export commits; the committer records a
commit in that channel.

Also, reconnectRemotes records a dummy commit, to make the exporter
thread wake up and make sure all exports are up-to-date. So,
connecting a drive with a directory special remote export will
immediately update it, and getting online will automatically
update S3 and WebDAV exports.

The transfer queue is not involved in exports. Instead, failed
exports are retried much like failed pushes.

This commit was sponsored by Ewen McNeill.
2017-09-20 15:29:13 -04:00
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
94351daba6
configuration to disable automatic merge conflict resolution
* Added annex.resolvemerge configuration, which can be set to false to
  disable the usual automatic merge conflict resolution done by git-annex
  sync and the assistant.
* sync: Added --no-resolvemerge option.

Note that disabling merge conflict resolution is probably not a good idea
in a direct mode repo or adjusted branch. Since updates to both are done
outside the usual work tree, if it fails the tree is not left in a
conflicted state, and it would be hard to manually resolve the conflict.
Still, made annex.resolvemerge be supported in those cases for consistency.

This commit was sponsored by Riku Voipio.
2017-06-01 12:51:01 -04:00
Joey Hess
db1600b2de
de-Maybe remoteGitConfig
It's always set, so does not need to be a Maybe.
2017-05-11 16:05:01 -04:00
Joey Hess
29e73f76ef
Added remote.<name>.annex-push and remote.<name>.annex-pull
The former can be useful to make remotes that don't get fully synced with
local changes, which comes up in a lot of situations.

The latter was mostly added for symmetry, but could be useful (though less
likely to be).

Implementing `remote.<name>.annex-pull` was a bit tricky, as there's no one
place where git-annex pulls/fetches from remotes. I audited all
instances of "fetch" and "pull". A few cases were left not checking this
config:

* Git.Repair can try to pull missing refs from a remote, and if the local
  repo is corrupted, that seems a reasonable thing to do even though
  the config would normally prevent it.
* Assistant.WebApp.Gpg and Remote.Gcrypt and Remote.Git do fetches
  as part of the setup process of a remote. The config would probably not
  be set then, and having the setup fail seems worse than honoring it if it
  is already set.

I have not prevented all the code that does a "merge" from merging branches
from remotes with remote.<name>.annex-pull=false. That could perhaps
be done, but it would need a way to map from branch name to remote name,
and the way refspecs work makes that hard to get really correct. So if the
user fetches manually, the git-annex branch will get merged, for example.
Anther way of looking at/justifying this is that the setting is called
"annex-pull", not "annex-merge".

This commit was supported by the NSF-funded DataLad project.
2017-04-05 13:22:35 -04:00
Joey Hess
64f924dc93
sync --content-of=path
For when you want to sync only some files' contents, not the whole working
tree.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-03-20 16:00:48 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
e6857e75a6
sync hack to make updateInstead work on eg FAT
sync: When syncing with a local repository located on a crippled
filesystem, run the post-receive hook there, since it wouldn't get run
otherwise. This makes pushing to repos on FAT-formatted removable drives
update them when receive.denyCurrentBranch=updateInstead.

Made Remote.Git export onLocal, which was cleaned up to not have so many
caveats about its use.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-02-17 15:21:52 -04:00
Joey Hess
4594bece40
make git-annex:git-annex push quiet again
Recent changes had a side effect of displaying errors in the fairly
common case when this push fails. Since the synced/git-annex push
is always forced, those errors are noise, so hide again.

This means 3 separate pushes are done now, where before it only made 2.
A bit more expensive, but ssh connection caching eliminates most of
the costs.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-02-17 14:03:17 -04:00
Joey Hess
a73c8ce4a1
sync: Improve integration with receive.denyCurrentBranch=updateInstead
By displaying error messages from the remote then it fails to update
its checked out branch.

Error messages in the default receive.denyCurrentBranch are still
suppressed, which matches user expectations.

This commit was sponsored by Nick Daly on Patreon.
2017-02-15 16:13:30 -04:00
Joey Hess
2af5f727a9
forgot to compile last commit; fix mistakes 2017-02-15 13:55:06 -04:00
Joey Hess
69baa45f14
sync, merge: Fail when the current branch has no commits yet, instead of not merging in anything from remotes and appearing to succeed.
At first I wanted to make it go ahead and merge into the newborn branch,
so made it use Git.Branch.currentUnsafe to get the current branch. But that
failed:

fatal: ambiguous argument 'refs/heads/master..refs/heads/synced/master':
unknown revision or path not in the working tree.

A whole nother code path to handle merging into newborn branches seemed
excessive, so went with displaying a warning and propigating failure
status.

This commit was sponsored by Brock Spratlen on Patreon.
2017-02-14 16:09:55 -04:00
Joey Hess
c545701224
make sync --no-commit override annex.annex.autocommit 2017-02-03 14:36:14 -04:00
Joey Hess
b77903af48
New annex.synccontent config setting
.. which can be set to true to make git annex sync default to --content.

This may become the default at some point in the future.

As well as being configuable by git config, it can be configured by
git-annex config to control the default behavior in all clones of a
repository.

Had to add a separate --no-content switch to we can tell if it's been
explicitly set, and should override annex.synccontent. If --content was the
default, this complication would not be necessary.

This commit was sponsored by Jake Vosloo on Patreon.
2017-02-03 14:31:17 -04:00
Joey Hess
ed56dba868
annex.autocommit can be configured via git-annex config
... to control the default behavior in all clones of a repository.

This includes a new Configurable data type, so the GitConfig type indicates
which values can be configured this way.

The implementation should be quite efficient; the config log is only read
once, and only when a Configurable value has not already been set by
git-config.

Indeed, it would be nice in the future to extend this, so that git-config
is itself only read on demand. Some commands may not need to look at the
git configuration at all.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-02-03 13:58:53 -04:00
Joey Hess
10703dc817
improve comment 2016-11-16 16:03:23 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
556b2ded2b
sync: Pass --allow-unrelated-histories to git merge when used with git git 2.9.0 or newer.
This makes merging a remote into a freshly created direct mode repository
work the same as it works in indirect mode.

The git-annex branches would get merged in any case by a sync,
since that doesn't use git merge.

This might need to be revisited later to better mirror git's behavior.
2016-11-15 18:26:17 -04:00
Joey Hess
a569f195b7
fix bugs in handing of deep branches with sync and adjusted branches
* sync: Previously, when run in a branch with a slash in its name,
  such as "foo/bar", the sync branch was "synced/bar". That conflicted
  with the sync branch used for branch "bar", so has been changed to
  "synced/foo/bar".
* adjust: Previously, when adjusting a branch with a slash in its name,
  such as "foo/bar", the adjusted branch was "adjusted/bar(unlocked)".
  That conflicted with the adjusted branch used for branch "bar",
  so has been changed to "adjusted/foo/bar(unlocked)"
* Also, running sync in an adjusted branch did not correctly sync
  changes back to the parent branch when it had a slash in its name.
  This bug has been fixed.

Eliminate use of Git.Ref.under and Git.Ref.basename; using
Git.Ref.underBase and Git.Ref.base make everything handle deep branches
correctly.

Probably noone was adjusting deep branches, and v6 is still experimental
anyway, so I'm not going to worry about the mess that was left by that bug.

In the case of git-annex sync, using a fixed git-annex with an old unfixed
one will mean they use different sync branches for a deep branch, and so
they may stop syncing until the old one is upgraded. However, that's only
a problem when syncing between repositories without going via a central
bare repository. Added a warning about this to the CHANGELOG, but it's
probably not going to affect many people at all.

This commit was sponsored by Riku Voipio.
2016-09-21 15:23:47 -04:00
Joey Hess
eb469bd139
use keyLocations not loggedLocations
Skip dead remotes.
2016-09-06 11:57:45 -04:00
Joey Hess
d13194b230
--branch, stage 2
Show branch:file that is being operated on.

I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
2016-07-20 15:23:43 -04:00
Joey Hess
bf8bf14e8e
--branch, stage 1
Added --branch option to copy, drop, fsck, get, metadata, mirror, move, and
whereis commands. This option makes git-annex operate on files that are
included in a specified branch (or other treeish).

The names of the files from the branch that are being operated on are not
displayed yet; only the keys. Displaying the filenames will need changes
to every affected command.

Also, note that --branch can be specified repeatedly. This is not really
documented, but seemed worth supporting, especially since we may later want
the ability to operate on all branches matching a refspec. However, when
operating on two branches that contain the same key, that key will be
operated on twice.
2016-07-20 12:05:26 -04:00
Joey Hess
fbf5045d4f
sync --content: Fix bug that caused transfers of files to be made to a git remote that does not have a UUID. This particularly impacted clones from gcrypt repositories.
Added guard in Annex.Transfer to prevent this problem at a deeper level.

I'm unhappy ith NoUUID, but having Maybe UUID instead wouldn't help either
if nothing checked that there was a UUID. Since there legitimately need to
be Remotes that do not have a UUID, I can't see a way to fix it at the type
level, short making there be two separate types of Remotes.
2016-06-02 13:50:43 -04:00
Joey Hess
7945dd3c3e
refactor 2016-04-22 14:35:48 -04:00
Joey Hess
46e3319995
assistant: Deal with upcoming git's refusal to merge unrelated histories by default
git 2.8.1 (or perhaps 2.9.0) is going to prevent git merge from merging in
unrelated branches. Since the webapp's pairing etc features often combine
together repositories with unrelated histories, work around this behavior
change by setting GIT_MERGE_ALLOW_UNRELATED_HISTORIES when the assistant
merges.

Note though that this is not done for git annex sync's merges, so
it will follow git's default or configured behavior.
2016-04-22 14:26:44 -04:00
Joey Hess
5e190913a4
add AdjBranch newtype; some simplications 2016-04-09 15:10:26 -04:00
Joey Hess
887ef93a7f
run out of tree merge with --no-ff
This is how direct mode does it too, and somehow, for reasons that
currently escape me, this makes git merge not care if it's run with an
empty work tree.
2016-04-06 18:40:28 -04:00
Joey Hess
60bdffe43e
fix auto merge conflict resolution when doing out of tree merge for adjusted branch 2016-04-06 17:32:04 -04:00
Joey Hess
eb9ac8d6d7
sync: Show output of git commit.
Rationalle: User might have hook scripts whose output they want to see.
Also, git commit output may tell the user they forgot to add a file.
The output is not too ugly when there's nothing to commit.
2016-04-05 16:22:21 -04:00