Commit graph

1404 commits

Author SHA1 Message Date
Joey Hess
3066bdb1fb
fix annex.largefiles largerthan/smallerthan bug
Fix bug in handling of annex.largefiles that use largerthan/smallerthan.
When adding a modified file, it incorrectly used the file size of the old
version of the file, not the current size.

That was the only largefiles limit that didn't directly look at the file on
disk already. Added a new type to keep straight the two different ways such
a limit can be matched. I kind of wanted to extend MatchingFile or FileInfo
to indicate that the matcher is supposed to operate on files from disk or
annex, but it turned out to be too complex to implement it that way.

This also changes the LimitAnnexFiles case when lookupFileKey does not find
a key. It used to fall back to statting the file, now it always returns
False. I doubt the old code could really get to that point, but if it
somehow does, it's better for preferred content matching to be consistent.
2019-09-30 17:15:08 -04:00
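The distinction drawn in commit 3066bdb1fb, between a size limit that looks at the file on disk and one that looks at the annexed version, can be modelled roughly as below. This is a Python sketch, not git-annex's Haskell code; `MatchInfo`, `annexed_key_size` and both function names are hypothetical and only illustrate the two ways of matching.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchInfo:
    """Hypothetical stand-in for what a size limit gets to look at."""
    path: str                        # file in the working tree
    annexed_key_size: Optional[int]  # size recorded for the annexed version, if any


def larger_than_on_disk(limit: int, info: MatchInfo) -> bool:
    # largefiles-style matching: always stat the current file on disk.
    return os.path.getsize(info.path) > limit


def larger_than_in_annex(limit: int, info: MatchInfo) -> bool:
    # Annex-style matching: use the size recorded for the key.
    # With no key, answer False instead of falling back to a stat.
    if info.annexed_key_size is None:
        return False
    return info.annexed_key_size > limit


def demo() -> tuple:
    # A file that was annexed at 10 bytes and has since grown to 2000 bytes.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"x" * 2000)
        info = MatchInfo(path=path, annexed_key_size=10)
        return (larger_than_on_disk(1000, info), larger_than_in_annex(1000, info))
```

Using the annex-style check when adding the modified file reproduces the bug described above: the stale 10-byte size is compared against the limit instead of the current 2000 bytes.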
Joey Hess
9f27d03945
fix a typo that didn't matter so far 2019-09-27 14:08:16 -04:00
Joey Hess
fda1bdd679
Added --mimetype and --mimeencoding file matching options.
Already had these for largefiles matching, but I forgot to add them as
command-line options.
2019-09-19 12:09:59 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
fef3cd055d
Removed support for git versions older than 2.1
debian oldoldstable has 2.1, and that's what i386ancient uses. It would be
better to require git 2.2, which is needed to use adjusted branches, but
can't do that w/o losing support for some old linux kernels or a
complicated git backport.
2019-09-11 16:14:43 -04:00
Joey Hess
061231621e
Merge branch 'master' into v7-default 2019-09-10 16:06:43 -04:00
Joey Hess
94c75d2bd9
init: Fix a reversion that broke initialization on systems that need to use pid locking
This brings back .git/annex/misctmp, but only for init. If an init
is interrupted while probing using that temp directory, the files it left
will get deleted 1 week later by a subsequent git-annex run.
2019-09-10 13:37:07 -04:00
Joey Hess
f845195354
Added annex.autoupgraderepository configuration
Can be set to false to prevent any automatic repository upgrades.

Also, removed direct mode specific upgrade code in Annex.Init, and made
needsUpgrade always include the name/path of the repo, so if
there's a problem it's clear what repo has the problem.

And, made needsUpgrade catch any exceptions that might occur during the
upgrade, so it can display a more useful error message than just the
exception.
2019-09-01 13:42:26 -04:00
Joey Hess
8f0af15020
one missed thing for automatic v5 -> v7 upgrades 2019-08-30 17:35:10 -04:00
Joey Hess
3f0eef4baa
v7 for all repositories
* Default to v7 for new repositories.
* Automatically upgrade v5 repositories to v7.
2019-08-30 14:09:14 -04:00
Joey Hess
d6e1f09ed2
init: Catch more exceptions when testing locking. 2019-08-29 12:19:07 -04:00
Joey Hess
4e30b06ffb
remove unused import 2019-08-28 15:38:29 -04:00
Joey Hess
e804f48f82
remove a few more isDirect tests 2019-08-28 11:53:10 -04:00
Joey Hess
863db16e53
remove unused 2019-08-27 16:13:49 -04:00
Joey Hess
9b1331881c
reorg remaining direct mode code
Only used for upgrading, so put it under there.
2019-08-27 14:05:38 -04:00
Joey Hess
e395ba2cb0
remove unused code 2019-08-27 13:57:17 -04:00
Joey Hess
da6f4d8887
remove direct mode support from Annex.Content
No longer used. The only possible user of it would be code in
Upgrade.V5, so I verified that the parts of Annex.Content it used were
not used to manipulate direct mode files.
2019-08-27 13:14:06 -04:00
Joey Hess
770b8ff926
clearer message when direct mode upgrade fails
When a remote is being upgraded, the message looked as if the local
repo was where the problem was. So include the path of the repo.
2019-08-27 12:23:34 -04:00
Joey Hess
586db7f06d
Avoid making a commit when upgrading from direct mode to v7
Three reasons:

* Committing as part of an upgrade is very unusual and unexpected.
* The commit was failing with a weird error message when done during an
  automatic upgrade.
* Let me remove more of that sweet^Whorrible direct mode code.
2019-08-26 16:35:44 -04:00
Joey Hess
689d1fcc92
remove most remnants of direct mode
A few remain, as needed for upgrades, and for accessing objects from
remotes that are direct mode repos that have not been converted yet.
2019-08-26 16:27:48 -04:00
Joey Hess
20741b1eb4
Automatically convert direct mode repositories to v7 with adjusted unlocked branches
* Automatically convert direct mode repositories to v7 with adjusted
  unlocked branches and set annex.thin.
* init: When run on a crippled filesystem with --version=5,
  will error out, since version 7 is needed for adjusted unlocked branch.
* direct: This command always errors out as direct mode is no longer
  supported.
* indirect: This command has become a deprecated noop.
* proxy: This command is deprecated because it was only needed in direct
  mode. (But it continues to work.)

Also removed mentions of direct mode throughout the documentation.

I have not removed all the direct mode code yet.
2019-08-26 15:05:25 -04:00
Joey Hess
f6fb4b8cdb
avoid side message when doing automatic upgrade to v7
An automatic upgrade is supposed to be silent.
2019-08-26 13:54:52 -04:00
Joey Hess
5877a15d7b
fix hard links when upgrading from direct mode
When upgrading a direct mode repo to v7 with adjusted unlocked branches,
fix a bug that prevented annex.thin from taking effect for the files in
working tree.

The hard links used to be ok, but commit 8e22114735 accidentally
broke them. It repopulates the worktree file, which is already a hard link,
and when it's creating the new file, the link count is already 2, and so it
doesn't make a hard link then.
2019-08-26 13:54:39 -04:00
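The link-count problem described in commit 5877a15d7b can be reproduced with a small Python sketch. It assumes, as the message implies, that a hard link is only made while the object has no other links; `populate` is a hypothetical helper, not a git-annex function.

```python
import os
import shutil
import tempfile


def populate(obj: str, dest: str) -> str:
    """Populate dest from obj; hard link only if obj has no other links yet."""
    if os.stat(obj).st_nlink > 1:
        shutil.copyfile(obj, dest)
        return "copy"
    os.link(obj, dest)
    return "hardlink"


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        obj = os.path.join(tmp, "object")
        with open(obj, "w") as f:
            f.write("content")
        # First population: link count is 1, so a hard link is made.
        first = populate(obj, os.path.join(tmp, "worktree-file"))
        # Repopulating while the old hard link still exists: link count is 2,
        # so the new file ends up as a copy.
        second = populate(obj, os.path.join(tmp, "worktree-file.new"))
        return [first, second]
```

The second call is the situation the commit fixes: the worktree file was already a hard link when it was repopulated, so the replacement was not linked.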
Joey Hess
1e02360283
remove only case 2019-08-26 13:28:28 -04:00
Joey Hess
2fd27c6df5
assistant: When creating a new repository use v7 adjusted branches with annex.thin
Rather than direct mode, which this is a small step on the path to
removing.

Init on a crippled filesystem already used v7 adjusted branches,
and like that, this doesn't pose any interoperability issues with old
versions of git-annex that clone the same repo, because files are only
unlocked on the adjusted branch.
2019-08-26 12:54:14 -04:00
Joey Hess
b599e8e6ac
move module only used by assistant 2019-08-26 12:32:45 -04:00
Joey Hess
bb16a26109
use headExists
Turns out that 7be690f326 broke the
test suite on the i386ancient builder. There, git show-ref --verify HEAD
fails with "'HEAD' - not a valid ref". Apparently git 2.1.4 didn't
support that.

headExists works there and does the same thing.
2019-08-19 11:12:19 -04:00
Joey Hess
f845636e30
correct license to AGPL
This code was already AGPL, except for the bit split out
to Utility/MD5.hs in commit 426053cb6c.
That commit accidentally updated the license of this file from AGPL
to GPL.

Thanks to Sean Whitton for spotting this.
2019-08-17 14:08:07 -04:00
Joey Hess
e4a8366162
fix edge case failure in prop_view_roundtrips
"./" made it fail, because that gets eliminated
2019-08-16 11:35:32 -04:00
Joey Hess
dc672863c3
init: Install working hook scripts when run on a crippled filesystem and on Windows 2019-08-13 15:14:17 -04:00
Joey Hess
868942e19b
fix unused module import warnings when building on windows 2019-08-08 12:18:53 -04:00
Joey Hess
8ba4de2d9c
remove unused import 2019-07-30 12:16:41 -04:00
Joey Hess
5080a7be1e
fix build 2019-07-29 12:41:45 -04:00
Joey Hess
426053cb6c
Corrected some license statements
In 40ecf58d4b I changed the license of code I
wrote from GPL to AGPL. But, two files containing code I wrote combined
with code by others were updated to say their license is AGPL, while in
fact part of it was (the code I wrote) but part remained under the original
license (the code written by others).

Remote/Ddar.hs is now changed entirely back to GPL 3.

Annex/DirHashes.hs stays AGPL, but I broke out Utility/MD5.hs with the code
not written by me, and corrected its license statement to GPL-2, which
is the actual version of the GPL included with the code in its original
distribution at http://www.cs.ox.ac.uk/people/ian.lynagh/md5/
2019-07-28 14:27:33 -04:00
Joey Hess
4c5a489f3e
avoid build warning when built w/o magic-mime 2019-07-22 11:03:26 -04:00
Joey Hess
7fd650355e
merge from http-client-restricted
I made some improvements to its API after splitting it out of git-annex,
so merge those back in.

This is groundwork for removing the embedded copy of it and depending on
it.

Also moved the managerResponseTimeout disabling to Annex.Url as it's
git-annex specific.

This commit was sponsored by Ethan Aubin on Patreon.
2019-07-17 16:48:50 -04:00
Joey Hess
7be690f326
check headRef not Branch.current
Support running v7 upgrade in a repo where there is no branch checked out,
but HEAD is set directly to some other ref.

This commit was sponsored by Jack Hill on Patreon.
2019-07-16 12:36:29 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of several Haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would complicate
backporting new versions of git-annex to Debian 9 (stretch), which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
26c54d6ea3
make metered more generic
Allow it to be used when the Key is not known.
2019-06-25 12:33:36 -04:00
Joey Hess
8355dba5cc
plumb MeterUpdate into getKey
No behavior changes, but this shows everywhere that a progress meter
could be displayed when hashing a file to add to the annex.

Many of the places don't make sense to display a progress meter though,
eg when importing the copy of the file probably swamps the hashing of
the file.
2019-06-25 11:43:24 -04:00
Joey Hess
84e729fda5
fix init default description reversion
init: Fix a reversion in the last release that prevented automatically
generating and setting a description for the repository.

Seemed best to factor out uuidDescMapRaw that does not
have the default mempty description behavior.

I don't much like that behavior, but I know things depend on it.
One thing in particular is `git annex info` which lists the uuids and
descriptions; if the current repo has been initialized in some way that
means it does not have a description, it would not show up w/o that.

(Not only repos created due to this bug might lack that. For example a repo
that was marked dead and had --drop-dead delete its git-annex branch info,
and then came back from the dead would similarly not be in the uuid.log.
Also there have been other versions of git-annex that didn't set a default
description; for years there was no default description.)
2019-06-20 20:30:24 -04:00
Joey Hess
ba433bdc85
refactor 2019-06-19 20:19:38 -04:00
Joey Hess
26f0f8b20f
optimisation
Avoid an unnecessary STM transaction. This will happen when the worker
pool is not completely full of the new stage, which is the common case.

In the uncommon case, this adds only a tiny bit of overhead for the
extra traversal of the worker pool. And the thread is going to block
for some time anyway.
2019-06-19 20:13:19 -04:00
Joey Hess
37d505dd6b
avoid STM deadlock
When all worker threads are running and enteringStage is called,
it waits for an idle slot. If all of the other threads then call it in
turn, a deadlock occurs.

This is the same problem I didn't actually fix in
5a9842d7ed.

Fixed by doing two separate STM transactions, the first replaces its
active thread with an idle thread, and the second waits for another idle
thread. That guarantees there will eventually be an idle thread to find.

The changes to WorkerPool were necessary because it can't add an idle
thread containing the Annex state and go on to run an action using that
same state, so I had to remove the Annex state from IdleWorker.
2019-06-19 18:15:25 -04:00
Joey Hess
9671248fff
speed up enteringStage in non-concurrent mode
Avoid a STM transaction.

Also got rid of UnallocatedWorkerPool.
2019-06-19 15:47:54 -04:00
Joey Hess
05a908c3c9
fix oops 2019-06-19 14:52:44 -04:00
Joey Hess
9d36c826c0
use fine-grained WorkerStages when transferring and verifying
This means that Command.Move and Command.Get don't need to
manually set the stage, and is a lot cleaner conceptually.

Also, this makes Command.Sync.syncFile use the worker pool better.
In the scenario where it first downloads content and then uploads it to
some other remotes, it will start in TransferStage, then enter VerifyStage
and then go back to TransferStage for each transfer to the remotes.
Before, it entered CleanupStage after the download, and stayed in it for
the upload, so too many transfer jobs could run at the same time.

Note that, in Remote.Git, it uses runTransfer and also verifyKeyContent
inside onLocal. That has an Annex state for the remote, with no worker pool.
So the resulting calls to enteringStage won't block in there.

While Remote.Git.copyToRemote does do checksum verification, I
realized that should not use a verification slot in the WorkerPool
to do it. Because, it's reading back from eg, a removable disk to checksum.
That will contend with other writes to that disk. It's best to treat
that checksum verification as just part of the transfer. So, removed the todo
item about that, as there's nothing needing to be done.
2019-06-19 13:24:20 -04:00
Joey Hess
53882ab4a7
make WorkerStage an open type
Rather than limiting it to PerformStage and CleanupStage, this opens it
up so any number of stages can be added as needed by commands.

Each concurrent command has a set of stages that it uses, and only
transitions between those can block waiting for a free slot in the
worker pool. Calling enteringStage for some other stage does not block,
and has very little overhead.

Note that while before the Annex state was duplicated on the first call
to commandAction, this now happens earlier, in startConcurrency.
That means that seek stage actions that use startConcurrency
and then modify Annex state won't modify the state of worker threads
they then start. I audited all of them, and only Command.Seek
did so; prepMerge changes the working directory and so has to come
before startConcurrency.

Also, the remote list is built before duplicating the state, which means
that it gets built earlier now than it used to. This would only have an
effect of making commands that end up not needing to perform any actions
unnecessarily build the remote list (only when they're run with concurrency
enabled), but that's a minor overhead compared to commands seeking
through the work tree and determining they don't need to do anything.
2019-06-19 13:05:03 -04:00
Joey Hess
8e5ea28c26
finish CommandStart transition
The hoped for optimisation of CommandStart with -J did not materialize.
In fact, not running CommandStart in parallel is slower than -J3.
So, CommandStart are still run in parallel.

(The actual bad performance I've been seeing with -J in my big repo
has to do with building the remoteList.)

But, this is still progress toward making -J faster, because it gets rid
of the onlyActionOn roadblock in the way of making CommandCleanup jobs
run separate from CommandPerform jobs.

Added OnlyActionOn constructor for ActionItem which fixes the
onlyActionOn breakage in the last commit.

Made CustomOutput include an ActionItem, so even things using it can
specify OnlyActionOn.

In Command.Move and Command.Sync, there were CommandStarts that used
includeCommandAction, so output messages, which is no longer allowed.
Fixed by using startingCustomOutput, but that's still not quite right,
since it prevents message display for the includeCommandAction run
inside it too.
2019-06-12 13:24:01 -04:00
Joey Hess
436f107715
make CommandStart return a StartMessage
The goal is to be able to run CommandStart in the main thread when -J is
used, rather than unnecessarily passing it off to a worker thread, which
incurs overhead that is significant when the CommandStart is going to
quickly decide to stop.

To do that, the message it displays needs to be displayed in the worker
thread, after the CommandStart has run.

Also, the change will mean that CommandStart will no longer necessarily
run with the same Annex state as CommandPerform. While its docs already
said it should avoid modifying Annex state, I audited all the
CommandStart code as part of the conversion. (Note that CommandSeek
already sometimes runs with a different Annex state, and that has not been
a source of any problems, so I am not too worried that this change will
lead to breakage going forward.)

The only modification of Annex state I found was it calling
allowMessages in some Commands that default to noMessages. Dealt with
that by adding a startCustomOutput and a startingUsualMessages.
This lets a command start with noMessages and then select the output it
wants for each CommandStart.

One bit of breakage: onlyActionOn has been removed from commands that used it.
The plan is that, since a StartMessage contains an ActionItem,
when a Key can be extracted from that, the parallel job runner can
run onlyActionOn' automatically. Then commands won't need to worry about
this detail. Future work.

Otherwise, this was a fairly straightforward process of making each
CommandStart compile again. Hopefully other behavior changes were mostly
avoided.

In a few cases, a command had a CommandStart that called a CommandPerform
that then called showStart multiple times. I have collapsed those
down to a single start action. The main command to perhaps suffer from it
is Command.Direct, which used to show a start for each file, and no
longer does.

Another minor behavior change is that some commands used showStart
before, but had an associated file and a Key available, so were changed
to ShowStart with an ActionItemAssociatedFile. That will not change the
normal output or behavior, but --json output will now include the key.
This should not break it for anyone using a real json parser.
2019-06-06 17:13:54 -04:00
Joey Hess
258a7c5cd1
add Key to all ActionItem constructors 2019-06-06 12:53:24 -04:00
Joey Hess
659640e224
separate queue for cleanup actions
When running multiple concurrent actions, the cleanup phase is run in a
separate queue than the main action queue. This can make some commands
faster, because less time is spent on bookkeeping in between each file
transfer.

But as far as I can see, nothing will be sped up much by this yet, because
all the existing cleanup actions are very light-weight. This is just groundwork
for deferring checksum verification to cleanup time.

This change does mean that if the user expects -J2 will mean that they see no
more than 2 jobs running at a time, they may be surprised to see 4 in some
cases (if the cleanup actions are slow enough to notice).

It might also make sense to enable background cleanup without the -J,
for at least one cleanup action. Indeed, that's the behavior that -J1
has now. At some point in the future, it may make sense to make the
behavior with no -J the same as -J1. The only reason it's not currently
is that git-annex can build w/o concurrent-output, and also any bugs
in concurrent-output (such as perhaps misbehaving on non-VT100 compatible
terminals) are avoided by default by only using it when -J is used.
2019-06-05 17:54:35 -04:00
Joey Hess
c04b2af3e1
improved WorkerPool abstraction
No behavior changes.
2019-06-05 14:26:48 -04:00
Joey Hess
082e1f1738
Don't try to import .git directories from special remotes
Because git does not support storing git repositories inside a git
repository.
2019-06-04 15:14:20 -04:00
Joey Hess
67c06f5121
add back support for ftp urls
Add back support for ftp urls, which was disabled as part of the fix for
security hole CVE-2018-10857 (except for configurations which enabled curl
and bypassed public IP address restrictions). Now it will work if allowed
by annex.security.allowed-ip-addresses.
2019-05-30 14:51:34 -04:00
Joey Hess
1871295765
rename annex.security.allowed-http-addresses
Renamed annex.security.allowed-http-addresses to
annex.security.allowed-ip-addresses because it is not really specific to
the http protocol, also limiting eg, git-annex's use of ftp and via
youtube-dl, several other protocols.

The old name for the config will still work.

If both old and new name are set, the new name will win.
2019-05-30 12:43:40 -04:00
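The precedence rule from commit 1871295765 (the old name still works, the new name wins when both are set) amounts to a two-step lookup. A minimal Python sketch, where the `config` mapping and the function name are assumptions for illustration:

```python
from typing import Mapping, Optional

NEW_NAME = "annex.security.allowed-ip-addresses"
OLD_NAME = "annex.security.allowed-http-addresses"


def allowed_ip_addresses(config: Mapping[str, str]) -> Optional[str]:
    """Return the configured value, preferring the new name over the old one."""
    if NEW_NAME in config:
        return config[NEW_NAME]
    # Fall back to the old name so existing configurations keep working.
    return config.get(OLD_NAME)
```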
Joey Hess
a14f6ce758
fix repo description setting bugs
* init: When the repository already has a description, don't change it.
* describe: When run with no description parameter it used to set
  the description to "", now it will error out.
2019-05-23 12:51:01 -04:00
Joey Hess
16a2bed710
avoid build warning on Windows about unused import 2019-05-23 12:15:33 -04:00
Joey Hess
e06feb7316
honor preferred content when importing
Importing from a special remote honors its preferred content too; unwanted
files are not imported. But, some preferred content expressions can't be
checked before files are imported, and trying to import with such an
expression will fail.

Tested this with scenarios including changing the preferred content
expression and making sure merging the import didn't delete files that were
no longer wanted.

There was one minor inefficiency mentioned in the todo that I punted on.
2019-05-21 14:38:06 -04:00
Joey Hess
0bd39c1315
remove a TODO I checked yesterday 2019-05-21 12:54:39 -04:00
Joey Hess
3b9a19171a
Merge branch 'master' into preferred 2019-05-21 11:34:45 -04:00
Joey Hess
5e1221ad53
Improve shape of commit tree when importing from unversioned special remotes
Make the import have the previous import as a parent, so eg `git log --stat`
displays a useful diff.

Also a minor optimisation, only calculate the depth of the imported history
once.
2019-05-21 11:32:54 -04:00
Joey Hess
97fd9da6e7
add back non-preferred files to imported tree
Prevents merging the import from deleting the non-preferred files from
the branch it's merged into.

adjustTree previously appended the new list of items to the old, which
could result in it generating a tree with multiple files with the same
name. That is not good and confuses some parts of git. Gave it a
function to resolve such conflicts.

That allowed dealing with the problem of what happens when the import
contains some files (or subtrees) with the same name as files that were
filtered out of the export. The files from the import win.
2019-05-20 16:43:52 -04:00
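The conflict handling that commit 97fd9da6e7 adds to adjustTree can be pictured as merging two item lists by name with a resolver function. The Python below is only a model under that assumption; `TreeItem`, `merge_items` and `import_wins` are hypothetical names, not git-annex's API.

```python
from typing import Callable, List, NamedTuple


class TreeItem(NamedTuple):
    name: str
    sha: str
    source: str  # hypothetical label: "import" or "added-back"


Resolver = Callable[[TreeItem, TreeItem], TreeItem]


def merge_items(old: List[TreeItem], new: List[TreeItem], resolve: Resolver) -> List[TreeItem]:
    """Combine two item lists so that each name appears at most once."""
    by_name = {item.name: item for item in old}
    for item in new:
        existing = by_name.get(item.name)
        # Plain appending would leave two entries with the same name here.
        by_name[item.name] = item if existing is None else resolve(existing, item)
    return sorted(by_name.values(), key=lambda item: item.name)


def import_wins(a: TreeItem, b: TreeItem) -> TreeItem:
    # The files from the import win over files added back from the export.
    return a if a.source == "import" else b
```

For example, merging an imported `foo` with an added-back `foo` and `bar` yields one `bar` and the imported `foo`.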
Joey Hess
568af1073e
filter exported tree through remote's preferred content setting
The filtering is fairly efficient as far as building the trees goes,
since it reuses adjustTree. But it still needs to traverse the whole
tree, and look up the keys used by every file.

The tree that gets recorded to export.log is the filtered tree.
This way, resuming an interrupted sync to an export uses it without
needing to recalculate it. And, a change to the preferred content
settings of the remote will result in a different tree, so the export
will be updated accordingly.

The original tree is still used in the remote tracking branch.
That branch represents the special remote as a git remote, and if it
were a normal git remote, the tree in its head would not be affected by
preferred content.
2019-05-20 11:54:55 -04:00
Joey Hess
354c0eb57f
support standard and groupwanted in keyless mode
Only when the preferred content expression includes them will a parse
failure due to them needing keys result in the preferred content
expression not parsing in keyless mode.
2019-05-14 14:59:03 -04:00
Joey Hess
9411a7c93c
matching preferred content before key is known
This will let import try to match preferred content expressions before
downloading the content and generating its key.

If an expression needs a key, preferredContentParser with
preferredContentKeylessTokens will fail to parse it.

standard and groupwanted are not in preferredContentKeylessTokens
because they may refer to an expression that refers to a key.
That needs further work to support them.
2019-05-14 14:28:23 -04:00
Joey Hess
aa7710982b
avoid list lookup by parseToken
Minor optimisation to parsing of a preferred content expression.
2019-05-14 13:11:29 -04:00
Joey Hess
c1957b6aeb
whitespace 2019-05-14 13:01:50 -04:00
Joey Hess
5cc0ee70c0
factor out MatchFiles Annex
This makes parseToken more general
2019-05-14 12:44:50 -04:00
Joey Hess
82186ca58f
annex.jobs=cpus etc
Added the ability to run one job per CPU (core), by setting annex.jobs=cpus,
or using option --jobs=cpus or -Jcpus.

Built with future expansion in mind, including not defaulting matching on
Concurrency so more constructors can later be added, and using "cpu"
instead of "0".
2019-05-10 13:27:08 -04:00
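How a jobs setting such as the one added in commit 82186ca58f might be interpreted is sketched below in Python. Only the `cpus` keyword comes from the commit message; the function name and the handling of numeric values are assumptions made for the sketch.

```python
import os


def parse_jobs(value: str) -> int:
    """Turn a jobs setting into a number of concurrent jobs."""
    if value == "cpus":
        # One job per CPU core; os.cpu_count() may return None.
        return max(1, os.cpu_count() or 1)
    # Assumption: anything else is a plain job count.
    return int(value)
```

Using a keyword rather than a magic number such as 0 leaves room for further keywords later, which is the future expansion the message mentions.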
Joey Hess
2d33122215
avoid ingest lockdown file escaping the withOtherTmp call
Fixes bug that caused git-annex to fail to add a file when another
git-annex process cleaned up the temp directory it was using.

Solution is just to push withOtherTmp out to a higher level, so that
the whole ingest process can be completed inside it.

But in the assistant, that was not practical to do, since withOtherTmp runs
in the Annex monad and the assistant does not. Worked around by introducing
a separate temp directory that only the assistant uses for lockdown.
Since only one assistant can run at a time, it's easy to clean up that
directory of old cruft at startup.
2019-05-07 13:04:57 -04:00
Joey Hess
2a41712ef1
avoid stageJournal escaping withOtherTmp
This is only done for correctness sake; I don't see any way that it
would have caused a problem here. The jlog file escaped withOtherTmp
so another process could swoop in and delete it, but the file is only
used as a buffer for a list of filenames, and its handle gets rewound
and they're read back out, which will still work even if it's already
been deleted.

The only reason I didn't just pre-delete the file and keep the handle
open is I'm not sure that works on all OS's (eg Windows). If there was
a problem that this fixed it might involve an OS that doesn't support
deleting an open file or something like that.
2019-05-07 11:57:12 -04:00
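The reasoning in commit 2a41712ef1, that a buffer file can still be rewound and read back after something else deletes it, holds on POSIX systems and can be checked with this Python sketch. `buffer_and_read_back` is a hypothetical helper; as the message notes, the same trick may not work on every OS (eg Windows).

```python
import os
import tempfile
from typing import List


def buffer_and_read_back(names: List[str]) -> List[str]:
    """Write names to a temp file, delete it while open, then read them back."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+") as handle:
        for name in names:
            handle.write(name + "\n")
        handle.flush()
        # Simulate another process cleaning up the temp directory.
        os.unlink(path)
        # The open handle still refers to the data, so rewinding works (POSIX).
        handle.seek(0)
        return [line.rstrip("\n") for line in handle]
```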
Joey Hess
b03e65d260
Improved locking when multiple git-annex processes are writing to the .git/index file 2019-05-06 15:15:12 -04:00
Joey Hess
c5e0f9b3a5
fix setting imported tree
bf7ecd6892 went too far and broke
importing, the old tree was used on the remote tracking branch and not
the newly imported tree.

Test suite noticed the problem luckily.
2019-05-06 14:38:02 -04:00
Joey Hess
bf7ecd6892
fix export subtree reversion
Fix reversion in last release that caused wrong tree to be written to
remote tracking branch after an export of a subtree.

The invariant "commitsha should have the treesha as its tree"
was not met due to a bug. Guarantee it's met by catting the commitsha
to find its actual tree. A little bit slower, but this is not run often.
2019-05-06 13:57:13 -04:00
Joey Hess
96dfba7b53
fix build w/o MagicMime more 2019-05-03 11:30:20 -04:00
Joey Hess
740c9f7da8
fix build w/o MagicMime 2019-05-03 11:20:25 -04:00
Joey Hess
ab36f2f535
fix windows build 2019-05-03 10:58:34 -04:00
Joey Hess
ec697721e4
simplify
and a bit faster using Eq this way
2019-05-01 15:34:07 -04:00
Joey Hess
700a3f2787
Merge branch 'master' into import-from-s3 2019-05-01 14:30:52 -04:00
Joey Hess
a32f31235a
reuse old imported commits
This avoids proliferation of different import commits for the same
trees, and makes the resulting git history nice.
2019-05-01 14:20:26 -04:00
Joey Hess
2bd0e07ed8
make merge commit on export that preserves the import history 2019-05-01 13:13:00 -04:00
Joey Hess
d1c283b691
comments 2019-05-01 12:37:54 -04:00
Joey Hess
1503b86a14
make import tree from remote generate a merge commit
This way no history is lost, neither what was exported to the remote,
or the history of changes that is imported from it. No complicated
correlation of two possibly very different histories is needed, just
record what we know and then git merge will do a good job.

Also, it notices when the remote tracking branch doesn't need to be updated,
and avoids doing anything, so noop remotes are super cheap.

The only catch here is that, since the commits generated for imports
from the remote don't have a stable date or author/committer, each
(non-noop) import generates different commits for the same imported
trees. So, when the imported remote tracking branch is merged into master
and then a change is imported again, there will be an extra series of
commits, which will get more and more expensive each time.

This seems to call for making stable commits for imports. Also that
seems a good idea to make importing in several repositories have the
same result.
2019-04-30 16:13:21 -04:00
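Why unstable dates lead to different commits for the same imported tree, as commit 1503b86a14 observes, follows from how git hashes a commit object: the id covers the tree, parents, author, committer, dates and message. A Python sketch of that hashing, where the identity string and timestamps are made up for illustration:

```python
import hashlib
from typing import List

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's well-known empty tree id


def commit_id(tree: str, parents: List[str], ident: str, timestamp: int, message: str) -> str:
    """Compute the SHA1 id git would assign to a commit with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # Git hashes the object type and body length, a NUL byte, then the body.
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


IDENT = "importer <importer@example.invalid>"  # hypothetical identity

# Same tree, different dates: two different commits.
unstable_a = commit_id(EMPTY_TREE, [], IDENT, 1556654400, "import")
unstable_b = commit_id(EMPTY_TREE, [], IDENT, 1556654460, "import")

# Same tree, fixed date and identity: the same commit every time.
stable_a = commit_id(EMPTY_TREE, [], IDENT, 0, "import")
stable_b = commit_id(EMPTY_TREE, [], IDENT, 0, "import")
```

Here `unstable_a` and `unstable_b` differ while `stable_a` and `stable_b` are identical, which is the property a stable import commit would rely on.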
Joey Hess
b69d11ec42
wip 2019-04-30 14:00:27 -04:00
Joey Hess
28b4310abe
typo 2019-04-30 12:22:13 -04:00
Joey Hess
9dd764e6f7
Added mimeencoding= term to annex.largefiles expressions.
* Added mimeencoding= term to annex.largefiles expressions.
  This is probably mostly useful to match non-text files with eg
  "mimeencoding=binary"
* git-annex matchexpression: Added --mimeencoding option.
2019-04-30 12:17:22 -04:00
Joey Hess
18cf21d3ed
wip 2019-04-26 10:17:02 -04:00
Joey Hess
f08cd6a4ac
set S3 version id in retrieveExportWithContentIdentifierS3
This is necessary because of checks for a S3 version id being set
done when deleting the export or overwriting or renaming it.
2019-04-24 15:13:07 -04:00
Joey Hess
2d0dd34916
initial work toward correctly merging deeper import histories
Pure code is tested working, including with even histories that merge
several lines of development. Needs to be hooked up to git histories
next.
2019-04-23 16:34:19 -04:00
Joey Hess
29705d83f4
convert History to use Set
This way the Ord instance doesn't care what order parent
Histories come in.
2019-04-23 15:08:37 -04:00
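The effect of commit 29705d83f4, storing parent histories in a set so that their order does not matter when comparing, has a close analogue in Python. This is a model of the idea using equality and hashing rather than Haskell's Ord; the `History` class here is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class History:
    """A commit together with the histories of its parents."""
    commit: str
    parents: FrozenSet["History"] = frozenset()


a = History("a")
b = History("b")

# The same merge, with parents listed in either order.
merge_ab = History("m", frozenset([a, b]))
merge_ba = History("m", frozenset([b, a]))
```

With a list or tuple of parents the two merges would compare as different; with a set, `merge_ab == merge_ba`.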
Joey Hess
833980c0bc
indicate when an old version of a file is being imported 2019-04-19 15:05:08 -04:00
Joey Hess
f95f340c73
sync: When listing contents on an import remote fails, proceed with other syncing instead of aborting
Switch listContents to being a proper CommandStart, so if it throws an
exception, it will be treated like any other command action that fails.

downloadImport apparently does not ever throw an exception,
and itself uses commandAction, so it can't be a CommandStart.
2019-04-10 17:02:56 -04:00
Joey Hess
6babb2c73f
remove wrong uniqueness constraint from ContentIdentifier db
Fix bug that caused importing from a special remote to repeatedly download
unchanged files when multiple files in the remote have the same content.

Unfortunately, there's really no good way to remove a uniqueness constraint
from a sqlite database. The best that can be done is to make a new table
and copy the data over. But that would require using persistent's
migrations or raw sql, and I don't want to do either.

Instead, a sledgehammer approach: Renamed .git/annex/cid to
.git/annex/cids. When the new database doesn't exist, it will be populated
from the git-annex branch.

Nothing deletes the old database. Don't want to delete it out from under
some long-running git-annex process that might be using it. It could
eventually be deleted. But this is such a new feature, probably few repos
have the database in any case.
2019-04-09 19:58:24 -04:00
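The approach that commit 6babb2c73f decided against, making a new table and copying the data over, looks roughly like this with raw SQL. The Python sketch uses an invented two-column `cids` table and assumes the unwanted constraint was on the key column; the real schema is not shown in the commit message.

```python
import sqlite3


def open_old_db() -> sqlite3.Connection:
    """Create an in-memory database with the (assumed) unwanted UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cids (cid TEXT NOT NULL, key TEXT NOT NULL UNIQUE)")
    return conn


def try_insert(conn: sqlite3.Connection, cid: str, key: str) -> bool:
    """Insert a row; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO cids (cid, key) VALUES (?, ?)", (cid, key))
        return True
    except sqlite3.IntegrityError:
        return False


def rebuild_without_unique(conn: sqlite3.Connection) -> None:
    # Copy everything into a table that lacks the constraint, then swap names.
    conn.executescript("""
        CREATE TABLE cids_new (cid TEXT NOT NULL, key TEXT NOT NULL);
        INSERT INTO cids_new SELECT cid, key FROM cids;
        DROP TABLE cids;
        ALTER TABLE cids_new RENAME TO cids;
    """)
```

With the constraint in place, a second file with the same content (same key, different content identifier) cannot be recorded. git-annex avoided this migration entirely by switching to a fresh database file and repopulating it from the git-annex branch.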
Joey Hess
37041b629d
improve messages around export/import conflicts
A conflict can be caused by either export or import when the remote
supports both.
2019-04-09 13:03:59 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
c714a260a9
include remote name for consistency with export output 2019-03-11 14:46:37 -04:00
Joey Hess
e1fdc8b374
record only subtree in export db and log after importing a subtree 2019-03-11 13:45:01 -04:00
Joey Hess
6149a3b9bd
added progress bars
clean up old todo item I checked earlier, see commit
dec30d2b14
2019-03-08 12:43:03 -04:00
Joey Hess
e412129523
concurrency and status messages when downloading from import 2019-03-08 12:33:44 -04:00
Joey Hess
e3a704224f
fix export db locking deadlock 2019-03-07 16:06:02 -04:00
Joey Hess
4efd431136
remove obsolete TODO
updateExportDb runs addExportedLocation
2019-03-07 15:11:24 -04:00
Joey Hess
71fec9060c
move 2019-03-07 12:56:40 -04:00
Joey Hess
68d1661251
cross-repo import now working correctly 2019-03-07 12:31:35 -04:00
Joey Hess
ee251b2e2e
implement updating the ContentIdentifier db with info from the git-annex branch
untested

This won't be super slow, but it does need to diff two likely large
trees, and since the git-annex branch rarely sits still, it will most
likely be run at the beginning of every import.

A possible speed improvement would be to only run this when the database
did not contain a ContentIdentifier. But that would only speed up
imports when there is no new version of a file on the special remote,
at most renames of existing files being imported.

A better speed improvement would be to record something in the git-annex
branch that indicates when an import has been run, and only do the diff
if the git-annex branch has record of a newer import than we've seen
before. Then, it would only run when there is in fact new
ContentIdentifier information available from a remote. Certainly doable,
but didn't want to complicate things yet.
2019-03-06 18:04:30 -04:00
Joey Hess
cd3a2b023a
initial try at using storeExportWithContentIdentifier
Untested, and I'm not sure about the locking of the ContentIdentifier db.
2019-03-04 17:50:41 -04:00
Joey Hess
00722ba1f8
lock before writing to the ContentIdentifier db 2019-03-04 16:47:30 -04:00
Joey Hess
aaacf431d8
handle importtree=yes config
For now, it's only allowed when exporttree=yes is also set.
That simplified the implementation, but could later be changed if
there's a remote that makes sense to be an import but not an export.
However, it may work just as well to make a remote be readonly to
prevent export to it while still allowing import.
2019-03-04 16:07:35 -04:00
Joey Hess
3cd19fb4d0
use InodeCache to avoid races in import from directory special remote
This does not avoid all possible races, but it does avoid all likely
ones, and is demonstrably better than git's own handling of races
where files get modified at the same time as it's updating the working
tree.

The main thing this won't detect is not the unlikely race where part
of a file gets changed while it's being copied and then the file is
restored to its original condition before the modification check.
No, it's more likely that the limitations of checking inode, size,
and mtime won't detect certain modifications, involving eg mmapped
files.
2019-03-04 13:57:23 -04:00
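A rough Python sketch of the check described above, assuming a snapshot of inode, size, and mtime taken before the copy and compared afterwards. The function names and the `during_copy` hook are hypothetical and only stand in for a concurrent writer; this is not git-annex's InodeCache code.

```python
import os
import shutil
import tempfile

def snapshot(path):
    """Return the (inode, size, mtime) triple used to detect modification."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)

def copy_if_unchanged(src, dst, during_copy=None):
    """Copy src to dst; return True only if src looks unchanged afterwards."""
    before = snapshot(src)
    shutil.copyfile(src, dst)
    if during_copy is not None:
        during_copy()  # hypothetical hook simulating a racing writer
    if snapshot(src) != before:
        os.remove(dst)  # the copy may be inconsistent, so discard it
        return False
    return True

with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
    with open(src, "w") as f:
        f.write("hello")
    clean = copy_if_unchanged(src, dst)

    def writer():
        with open(src, "a") as f:
            f.write(" world")

    raced = copy_if_unchanged(src, dst, during_copy=writer)
    dst_left_behind = os.path.exists(dst)
```

As the message notes, a change that leaves all three values intact would go unnoticed by a check of this kind.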
Joey Hess
519cadd1de
refactor RemoteTrackingBranch
Not specific to Import; export will use it too.
2019-03-01 14:47:56 -04:00
Joey Hess
1c8793691a
import: update location log for removed files 2019-03-01 13:26:59 -04:00
Joey Hess
d0066d9a87
fully update export db during import
This makes exporting immediately after import and merge be a no-op.
2019-02-27 15:29:41 -04:00
Joey Hess
b1f10fbb4d
update location log during import 2019-02-27 13:58:03 -04:00
Joey Hess
45aacd888b
import downloader complete (untested)
Made some api changes.

listImportableContents needs to provide the size
of the data, so the downloader can check disk free space.

retrieveExportWithContentIdentifier is passed the filepath to write to

Use temporary "CID" key during download of a ContentIdentifier from a
remote, so withTmp can be used and then move the content to the real key
once it's known.
2019-02-27 13:15:02 -04:00
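The temporary-key idea can be sketched in Python roughly as follows: download under a name derived from the content identifier, and move the file to its content-derived name once the hash is known. The `CID-` prefix, the `SHA256--` key format, and `store_download` are all assumptions made for this illustration.

```python
import hashlib
import os

def store_download(tmp_dir, objects_dir, cid, chunks):
    """Write chunks under a temporary cid-based name, then move the file
    to a name derived from its content hash once that is known."""
    tmp_name = "CID-" + hashlib.sha256(cid.encode()).hexdigest()
    tmp_path = os.path.join(tmp_dir, tmp_name)
    digest = hashlib.sha256()
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            digest.update(chunk)
    key = "SHA256--" + digest.hexdigest()  # hypothetical key format
    os.replace(tmp_path, os.path.join(objects_dir, key))
    return key
```

Calling `store_download(tmp, objects, "etag:1234", [b"hello ", b"world"])` would leave one file in `objects` named after the SHA-256 of the content and nothing in `tmp`.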
Joey Hess
f4b773e9a1
incomplete action to download files from import 2019-02-26 15:25:28 -04:00
Joey Hess
e4e464da65
import command is updating tracking branch 2019-02-26 13:15:48 -04:00
Joey Hess
d805401708
fairly happy with buildImportCommit now
still not yet tested
2019-02-23 15:47:55 -04:00
Joey Hess
33bb62ff13
fix parent 2019-02-22 12:44:22 -04:00
Joey Hess
bab6c570b0
buildImportTrees is fully working
buildImportCommit not yet tested
2019-02-22 12:41:17 -04:00
Joey Hess
7af55de83c
optimisation: use graftTree to remember the export branch
Sped up git-annex export in repositories with lots of keys.

Old method read whole git-annex branch tree into memory.
2019-02-22 11:16:22 -04:00
Joey Hess
8fdea8f444
WIP
Added graftTree but it's buggy.

Should use graftTree in Annex.Branch.graftTreeish; it will be faster
than the current implementation there.

Started Annex.Import, but untested and it doesn't yet handle tree
grafting.
2019-02-21 17:32:59 -04:00
Joey Hess
9887a378fe
renamings to make clear when old-format logs are being used 2019-02-21 13:43:44 -04:00
Joey Hess
a818bc5e73
add Database.ContentIdentifier
Does not yet have a way to update with new information from the
git-annex branch, which will be needed when multiple repos are importing
from the same remote.
2019-02-20 16:59:10 -04:00
Joey Hess
1e95bc4fd1
avoid git warning about CRLF in restagePointerFile
Saw it on Windows, could probably also happen on linux with some
configuration. Since this is a pointer file, the warning does not apply.
2019-02-18 18:35:36 -04:00
Joey Hess
1a367cad83
Fix path separator bug on Windows that completely broke git-annex since version 7.20190122. 2019-02-18 17:16:39 -04:00
Joey Hess
c7893bf9b7
init: Fix bug when direct mode needs to be enabled on a crippled filesystem, that left the repository in indirect mode. 2019-02-15 12:34:03 -04:00
Joey Hess
ed2a8498a4
fix build w/o libmagic 2019-02-09 13:49:46 -04:00
Joey Hess
9d53e1cddf
add a missing import 2019-02-08 13:24:21 -04:00
Joey Hess
6cba1950f2
avoid importing Git into module used by Setup
That would have needed Setup-Depends to include unix and other
libraries.
2019-02-08 13:16:10 -04:00
Joey Hess
c3f47ba389
make .noannex file prevent repo fixups
Avoid performing repository fixups for submodules and git-worktrees
when there's a .noannex file that will prevent git-annex from being
used in the repository.

This change is ok as long as the .noannex file is really going to prevent
git-annex from being used. But, init --force could override the file.
Which would result in the repo being initialized without the fixups
having run.

To avoid that situation decided to change init, to not let --force be used
to override a .noannex file. Instead the user can just delete the file.
2019-02-05 14:43:23 -04:00
Joey Hess
7b46b43c48
fromkey: Made idempotent
If the worktree file already exists, and is annexed and uses the same
key, avoid failing, nothing needs to be done.

Had to add lookupFileNotHidden to handle the case where an adjust --hide-missing
is in use, and the worktree file was hidden due to the object content
being missing. lookupFile would return the key of the hidden file,
but it makes sense that after fromkey succeeds, the worktree must
contain the file it was supposed to set up.
2019-02-05 13:13:13 -04:00
Joey Hess
a64fca92f6
Fix race in cleanup of othertmp directory that could result in a failure attempting to access it.
Need to create the directory after the lock is held, not before.

The other racing process would need to shut down at just the wrong time,
running cleanupOtherTmp.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2019-02-02 13:56:31 -04:00
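The ordering fix can be illustrated with a small Unix-only Python sketch: the directory is created only once the lock is held, so a concurrent cleanup cannot remove it between creation and use. The helper names and the use of a single exclusive `flock` are assumptions for this sketch, not git-annex's locking scheme.

```python
import fcntl
import os
import shutil
from contextlib import contextmanager

@contextmanager
def locked(lock_path):
    """Hold an exclusive flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock

def with_tmp_dir(lock_path, tmp_dir, action):
    """Take the lock first, then create the directory and run the action."""
    with locked(lock_path):
        os.makedirs(tmp_dir, exist_ok=True)
        return action(tmp_dir)

def cleanup_tmp_dir(lock_path, tmp_dir):
    """Remove the directory, but only while holding the same lock."""
    with locked(lock_path):
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Creating the directory before taking the lock would leave a window in which `cleanup_tmp_dir` from another process deletes it.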
Joey Hess
2e9becf989
typo 2019-01-24 00:10:16 -04:00
Joey Hess
467c3b393d
refactor magic 2019-01-23 12:40:59 -04:00
Joey Hess
47cb1a98b6
remove seemingly bogus sigINT handler stuff
I am very doubtful that commit 613e747d91
was right about this doing anything, and I've verified that without it,
ctrl-c sends sigINT to child processes, and git-annex get does not
continue to the next item.

It seems likely that the real problem back then was something catching
the async exception.

Hard to see how installing a default signal handler could cause any
change from default behavior either.

One reason to want to get rid of this cruft now is that tasty has a
sigINT handler of its own, and this would override it.
(Tasty is not currently setting that handler up the way git-annex uses
it, due to a problem in tasty, but that will hopefully change.)
2019-01-21 17:21:02 -04:00
Joey Hess
67c5a628eb
fix build with old ghc 2019-01-18 14:09:35 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
c3afb3434d
remove recently added cache from KeyVariety
Adding that field broke the Read/Show serialization back-compat,
and also the Eq and Ord instances were not blinded to it, which broke
git annex fsck and probably more.

I think that the new approach used in formatKeyVariety will be nearly
as fast, but have not benchmarked it.
2019-01-16 16:33:08 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached serialization the same.
2019-01-16 16:21:59 -04:00
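The requirements listed above (equality that ignores the cache, and a printed form that is unchanged by it) can be sketched in Python as below. This is an analogy, not the Haskell code; the `Key` fields and the simplified serialization format are assumptions.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

@dataclass(frozen=True)
class Key:
    variety: str
    name: str
    size: Optional[int] = None
    # Cached serialization: excluded from ==, hash, and repr.
    cached: Optional[str] = field(default=None, compare=False, repr=False)

def serialize(key):
    """Return the cached serialization if present, else build it."""
    if key.cached is not None:
        return key.cached
    size = "" if key.size is None else "-s%d" % key.size
    return "%s%s--%s" % (key.variety, size, key.name)

def with_cache(key):
    """Return a copy of key that carries its serialization."""
    return replace(key, cached=serialize(key))

def with_name(key, name):
    """Changing a field must drop the cache, or it would go stale."""
    return replace(key, name=name, cached=None)

plain = Key("SHA256", "abc", 3)
cached = with_cache(plain)
```

Here `plain == cached` holds and both have the same `repr`, while `with_name` shows the gotcha the earlier commit mentions: every field update has to reset the cache.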
Joey Hess
2be6130053
better function name 2019-01-14 20:59:09 -04:00
Joey Hess
1b6319a2c8
double speed of keyFile
Optimising for the common case of nothing needing to be escaped, from 5.434 μs
to 1.727 μs.

In the uncommon case, it only runs around 70 ns slower.
2019-01-14 20:52:54 -04:00
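A Python sketch of the fast-path idea: scan once for characters that need escaping and return the input untouched when there are none. The escape table and `escape_name` are assumptions for illustration; the timings quoted above apply to the Haskell code, not to this sketch.

```python
# Hypothetical escape table, chosen so the output is safe as a file name.
ESCAPES = {"&": "&a", "%": "&s", ":": "&c", "/": "%"}

def escape_name(s):
    """Escape s for use as a file name, with a fast path for the common case."""
    if not any(c in ESCAPES for c in s):
        return s  # common case: nothing to escape, no new string is built
    return "".join(ESCAPES.get(c, c) for c in s)
```

For example, `escape_name("a/b&c%d:e")` gives `"a%b&ac&sd&ce"`, while a name with none of the four special characters is returned as the same object.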
Joey Hess
d9a33d98cf
remove unused import 2019-01-14 18:29:10 -04:00
Joey Hess
d5bbf123fd
bugfix
The first item in the list from split '&' did not start with a '&'
2019-01-14 17:42:18 -04:00
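To make the point about the first split item concrete, here is a small Python sketch of decoding by splitting on `&`. The first piece is text that preceded any `&`, so it is kept as-is; every later piece begins with an escape code. The escape codes and `unescape_name` are assumptions for illustration.

```python
# Hypothetical escape codes that follow an '&' in an escaped name.
CODES = {"a": "&", "s": "%", "c": ":"}

def unescape_name(s):
    """Reverse the escaping: '%' stands for '/', and '&x' is an escape code."""
    first, *rest = s.replace("%", "/").split("&")
    out = [first]  # the first piece never started with '&'
    for piece in rest:
        code = piece[:1]
        if code in CODES:
            out.append(CODES[code] + piece[1:])
        else:
            out.append("&" + piece)  # unknown code: keep the text literally
    return "".join(out)
```

Treating the first piece like the others would mangle any name whose first character happens to be `a`, `s`, or `c`.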
Joey Hess
e0c4ac99b5
convert serializeKey' to strict ByteString
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer the common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
2019-01-14 17:03:46 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
Joey Hess
0a8d93cb8a
convert to ByteString 2019-01-14 14:02:47 -04:00
Joey Hess
1791447cc8
avoid creating work tree files in subdirectories in an edge case
A keyName could contain "/", though this is unlikely and certainly only
ever could happen with WORM keys.

The change to addunused to escape that is no problem at all.

The change to VariantFile to escape it means that different versions of
git-annex could resolve a merge conflict differently in this case, which
is unfortunate. There would be different .variant files used, so the two
resolutions would themselves merge together without additional
conflicts, but the user would have to clean up the extra .variant
files.
2019-01-14 13:14:25 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
ff0a2bee2d
avoid unnecessary conversion from and back to ByteString 2019-01-14 12:40:13 -04:00