Commit graph

64 commits

Author SHA1 Message Date
Joey Hess
d9fd205cbb
push RawFilePath down into Annex.ReplaceFile
Minor optimisation, but a win in every case, except for a couple where
it's a wash.

Note that replaceFile still takes a FilePath, because it needs to
operate on Chars to truncate unicode filenames properly.
2023-10-26 13:36:49 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
5a7f253974
support git 2.34.0's handling of merge conflict between annexed and non-annexed file
This version of git -- or its new default "ort" resolver -- handles such
a conflict by staging two files, one with the original name and the other
named file~ref. Use unmergedSiblingFile when the latter is detected.

(It doesn't do that when the conflict is between a directory and a file
or symlink though, so see previous commit for how that case is handled.)

The sibling file has to be deleted separately, because cleanConflictCruft
may not delete it -- that only handles files that are annex links,
but the sibling file may be the non-annexed file side of the conflict.

The graftin code had assumed that, when the other side of a conclict
is a symlink, the file in the work tree will contain the non-annexed
content that we want it to contain. But that is not the case with the new
git; the file may be the annex link and needs to be replaced with the
content, while the annex link will be written as a -variant file.

(The weird doesDirectoryExist check in graftin turns out to still be
needed, test suite failed when I tried to remove it.)

Test suite passes with new git with ort resolver default. Have not tried it
with old git or other defaults.

Sponsored-by: Noam Kremen on Patreon
2021-11-22 16:10:24 -04:00
Joey Hess
33a80d083a
sync --quiet
* sync: When --quiet is used, run git commit, push, and pull without
  their ususual output.
* merge: When --quiet is used, run git merge without its usual output.

This might also make --quiet work better for some other commands
that make commits, like git-annex adjust.

Sponsored-by: Kevin Mueller on Patreon
2021-07-19 11:28:47 -04:00
Joey Hess
1c5fc8f047
Git.Queue: allow providing git common options like -c 2021-01-04 12:51:55 -04:00
Joey Hess
a3b714ddd9
finish fixing removeLink on windows
9cb250f7be got the ones in RawFilePath,
but there were others that used the one from unix-compat, which fails at
runtime on windows. To avoid this,
import System.PosixCompat.Files hiding removeLink

This commit was sponsored by Ethan Aubin.
2020-11-24 13:20:44 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
b4b02e4c61
more RawFilePath conversion
412/645
2020-10-30 13:31:35 -04:00
Joey Hess
ca80c3154c
more RawFilePath conversion
removeFile changed to removeLink, because AFAICS it should be fine to
remove non-file things here. In particular, it's fine to remove a
symlink, since we're about to write a symlink. (removeLink does not
remove directories, so file, symlink, and unix socket are the only
possibilities.)
2020-10-30 13:07:41 -04:00
Joey Hess
e505c03bcc
more RawFilePath conversion
nukeFile replaced with removeWhenExistsWith removeLink, which allows
using RawFilePath. Utility.Directory cannot use RawFilePath since setup
does not depend on posix.

This commit was sponsored by Graham Spencer on Patreon.
2020-10-29 10:50:29 -04:00
Joey Hess
62372ee052
resolvemerge: Improve cleanup of cruft left in the working tree by a conflicted merge
This commit was sponsored by Jake Vosloo on Patreon.
2020-09-07 16:50:27 -04:00
Joey Hess
0e21a3221e
clean up old code
withworktree is no longer doing anything useful so remove it
2020-09-07 16:16:15 -04:00
Joey Hess
03dee56546
revert change that broke test suite
Opened a new bug about it.

This commit was sponsored by Ethan Aubin.
2020-09-07 15:42:38 -04:00
Joey Hess
d120c73302
sync, assistant: When merge.directoryRenames is not set, default it it to "false"
Works better with automatic merge conflict resolution than git's ususual
default of "conflict".

This is not done when automatic merge conflict resolution is disabled.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-09-07 13:50:58 -04:00
Joey Hess
f4c4b89aa3
refactor
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.

This commit was sponsored by Ryan Newton on Patreon.
2020-09-07 13:26:16 -04:00
Joey Hess
69053a93a2
resolvemerge: Improve cleanup of files that were deleted by one side of a conflicted merge, and modified by the other side
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.

My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the situation was the file appears as "added by them", because that
caused the file that they added to be moved into a directory we renamed.

That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-07 12:25:57 -04:00
Joey Hess
a360437215
make automerge behavior when one side deleted explict
This does not actually change how the merge conflict is resolved when
one side deleted the file, but it was not documented before, and I think
it only worked by accident.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-07 12:01:03 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
eaa49ab53d
convert replaceFile to createDirectoryUnder
Since it was used on both worktree and .git/annex files, split into
multiple functions.

In passing, this also improves permissions of created directories in
.git/annex, using createAnnexDirectory on those.
2020-03-06 11:31:01 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
c756006374
fix hacked up AutoMerge module to work again 2019-12-02 10:51:43 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
4e30b06ffb
remove unused import 2019-08-28 15:38:29 -04:00
Joey Hess
689d1fcc92
remove most remnants of direct mode
A few remain, as needed for upgrades, and for accessing objects from
remotes that are direct mode repos that have not been converted yet.
2019-08-26 16:27:48 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named, it's a not a blob necessarily, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
187b3e7780
enable LambdaCase and convert around 10% of places that could use it
Needs ghc 7.6.1, so minimum base version increased slightly. All builds
are well above this version of ghc, and debian oldstable is as well.

Code that could use lambdacase can be found by running:
git grep -B 1 'case ' | less
and searching in less for "<-"

This commit was sponsored by andrea rota.
2017-11-15 16:59:32 -04:00
Joey Hess
94351daba6
configuration to disable automatic merge conflict resolution
* Added annex.resolvemerge configuration, which can be set to false to
  disable the usual automatic merge conflict resolution done by git-annex
  sync and the assistant.
* sync: Added --no-resolvemerge option.

Note that disabling merge conflict resolution is probably not a good idea
in a direct mode repo or adjusted branch. Since updates to both are done
outside the usual work tree, if it fails the tree is not left in a
conflicted state, and it would be hard to manually resolve the conflict.
Still, made annex.resolvemerge be supported in those cases for consistency.

This commit was sponsored by Riku Voipio.
2017-06-01 12:51:01 -04:00
Joey Hess
9569d6be63
Fix bad automatic merge conflict resolution between an annexed file and a directory with the same name when in an adjusted branch.
When running in an overlay work tree, all unchanged files show as deleted,
so this code that stages deletions should not run.
2016-06-07 12:53:35 -04:00
Joey Hess
4efc26ca6c
move keys db closure to AutoMerge
This makes git-annex sync also do it, which makes sure that the keys db
info is fresh when doing a sync --content.
2016-05-16 15:11:14 -04:00
Joey Hess
46e3319995
assistant: Deal with upcoming git's refusal to merge unrelated histories by default
git 2.8.1 (or perhaps 2.9.0) is going to prevent git merge from merging in
unrelated branches. Since the webapp's pairing etc features often combine
together repositories with unrelated histories, work around this behavior
change by setting GIT_MERGE_ALLOW_UNRELATED_HISTORIES when the assistant
merges.

Note though that this is not done for git annex sync's merges, so
it will follow git's default or configured behavior.
2016-04-22 14:26:44 -04:00
Joey Hess
b7c8bf5274
Preserve execute bits of unlocked files in v6 mode.
When annex.thin is set, adding an object will add the execute bits to the
work tree file, and this does mean that the annex object file ends up
executable.

This doesn't add any complexity that wasn't already present, because git
annex add of an executable file has always ingested it so that the annex
object ends up executable.

But, since an annex object file can be executable or not, when populating
an unlocked file from one, the executable bit is always added or removed
to match the mode of the pointer file.
2016-04-14 14:47:08 -04:00
Joey Hess
887ef93a7f
run out of tree merge with --no-ff
This is how direct mode does it too, and somehow, for reasons that
currently escape me, this makes git merge not care if it's run with an
empty work tree.
2016-04-06 18:40:28 -04:00
Joey Hess
60bdffe43e
fix auto merge conflict resolution when doing out of tree merge for adjusted branch 2016-04-06 17:32:04 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
bafcbe95c3
fix one more test failure with v6 unlocked file merge conflict resolution 2016-01-08 15:23:15 -04:00
Joey Hess
b3d60ca285
use TopFilePath for associated files
Fixes several bugs with updates of pointer files. When eg, running
git annex drop --from localremote
it was updating the pointer file in the local repository, not the remote.
Also, fixes drop ../foo when run in a subdir, and probably lots of other
problems. Test suite drops from ~30 to 11 failures now.

TopFilePath is used to force thinking about what the filepath is relative
to.

The data stored in the sqlite db is still just a plain string, and
TopFilePath is a newtype, so there's no overhead involved in using it in
DataBase.Keys.
2016-01-05 17:22:19 -04:00
Joey Hess
a2c056df65
convert isPointerFile from Annex to IO 2016-01-01 13:22:38 -04:00
Joey Hess
5057fffccd
flush queue before cleaning cruft
Else, queued file stages won't have reached the index, and it won't find
everthing.

This evidently fixes a reversion in my work today, although I don't see how
I broke it. It didn't use to flush the queue first, before, and worked
somehow.

Test suite for v5 is back to 100% green now.
2015-12-29 17:35:57 -04:00
Joey Hess
f3be28eedc
test suite noticed a direct mode reversion 2015-12-29 17:12:57 -04:00
Joey Hess
10ecc43790
rename 2015-12-29 17:02:14 -04:00
Joey Hess
996ae9b172
don't disable smudge filter while merging
The smudge filter does need to be run, because if the key is in the local
annex already (due to renaming, or a copy of a file added, or a new file
added and its content has already arrived), git merge smudges the file and
this should provide its content.

This does probably mean that in merge conflict resolution, git smudges the
existing file, re-copying all its content to it, and then the file is
deleted. So, not efficient.
2015-12-29 16:36:21 -04:00
Joey Hess
24bbaa2346
avoid renaming file when auto-resolving conflict in annex pointer
This is a behavior change for merge conflicts between locked files
that both pointed to the same key, in different ways.
Before, the conflict was resolved, but the file was renamed to .variant.
This was unnecessary, because there was only one variant.

Of course, this also handles conflicts between unlocked and locked, or even
two unlocked files with different pointer contents.
2015-12-29 16:35:34 -04:00
Joey Hess
b6b34f4916
automatic conflict resolution for v6 unlocked files
Several tricky parts:

* When the conflict is just between the same key being locked and unlocked,
  the unlocked version wins, and the file is not renamed in this case.

* Need to update associated file map when conflict resolution renames
  an unlocked file.

* git merge runs the smudge filter on the conflicting file, and actually
  overwrites the file with the same content it had before, and so
  invalidates its inode cache. This makes it difficult to know when it's
  safe to remove such files as conflict cruft, without going so far as to
  compare their entire contents.

  Dealt with this by preventing the smudge filter from populating the file
  when a merge is run. However, that also prevents the smudge filter being
  run for non-conflicting files, so eg moving a file won't put its new
  content into place.

* Ideally, if a merge or a merge conflict resolution renames an unlocked
  file, the file in the work tree can just be moved, rather than copying
  the content to a new worktree file.

  This is attempted to be done in merge conflict resolution, but
  due to git merge's behavior of running smudge filters, what actually
  seems to happen is the old worktree file with the content is deleted and
  rewritten as a pointer file, so doesn't get reused.

So, this is probably not as efficient as it optimally could be.
If that becomes a problem, could look into running the merge in a separate
worktree and updating the real worktree more efficiently, similarly to the
direct mode merge. However, the direct mode merge had a lot of bugs, and
I'd rather not use that more error-prone method unless really needed.
2015-12-29 15:41:09 -04:00
Joey Hess
664cc987e8
support pointer files
Backend.lookupFile is changed to always fall back to catKey when
operating on a file that's not a symlink.

catKey is changed to understand pointer files, as well as annex symlinks.

Before, catKey needed a file mode witness, to be sure it was looking at a
symlink. That was complicated stuff. Now, it doesn't actually care if a
file in git is a symlink or not; in either case asking git for the content
of the file will get the pointer to the key.

This does mean that git-annex will treat a link
foo -> WORM--bar as a git-annex file, and also treats
a regular file containing annex/objects/WORM--bar as a git-annex file.

Calling catKey could make git-annex commands need to do more work than
before. This would especially be the case if a repo contained many regular
files, and only a few annexed files, as now git-annex will need to ask
git about the contents of the regular files.
2015-12-07 15:35:36 -04:00
Joey Hess
27eaa6f410
avoid making post-merge-conflict-resolution commit when no conflicts were resolved
sync, merge, assistant: When git merge failed for a reason other than a
conflicted merge, such as a crippled filesystem not allowing particular
characters in filenames, git-annex would make a merge commit that could
omit such files or otherwise be bad. Fixed by aborting the whole merge
process when git merge fails for any reason other than a merge conflict.
2015-10-15 14:22:46 -04:00
Joey Hess
eb33569f9d remove Params constructor from Utility.SafeCommand
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.

In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.

Problems noticed while doing this conversion:

	* Some uses of Params "oneword" which was entirely unnecessary
	  overhead.
	* A few places that built up a list of parameters with ++
	  and then used Params to split it!

Test suite passes.
2015-06-01 13:52:23 -04:00
Joey Hess
2b79e6fe08 a few hlints 2015-04-11 00:10:34 -04:00
Joey Hess
70736d2b41 Repository tuning parameters can now be passed when initializing a repository for the first time.
* init: Repository tuning parameters can now be passed when initializing a
  repository for the first time. For details, see
  http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
  that has been tuned in incompatable ways.
2015-01-27 17:38:06 -04:00