57 commits

Author SHA1 Message Date
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipeline.
2019-12-09 15:07:21 -04:00
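The idea in bdec7fed9c, keeping a path as raw bytes so that nothing has to be decoded and re-encoded along the way, can be sketched outside Haskell. What follows is a Python analogy and not git-annex code: `bytes` stands in for RawFilePath, `os.path.join` on bytes stands in for the </> of filepath-bytestring, and the helper name and the "objects" directory are made up for illustration.

```python
import os


def object_path(repo: bytes, key: bytes) -> bytes:
    """Join path components as bytes, with no decode/encode round trip.

    Hypothetical helper: the "objects" directory name is illustrative only.
    """
    return os.path.join(repo, b"objects", key)


if __name__ == "__main__":
    # A file name that is not valid UTF-8 passes through untouched as bytes.
    print(object_path(b"/tmp/repo", b"caf\xe9.txt"))
```

Because every component is already bytes, the join never consults a text encoding, which is the property the commit is after.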
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
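As a rough illustration of what 5f391179f1 does, here is a Python sketch of statting a file through a byte path. It is an analogy only: `os.fsencode` plays the role of toRawFilePath, `os.stat` accepting bytes plays the role of the RawFilePath getFileStatus, and `file_size` and `demo` are hypothetical names.

```python
import os
import tempfile


def file_size(path: bytes) -> int:
    """Stat a file given its path as bytes; no str conversion happens here."""
    return os.stat(path).st_size


def demo() -> int:
    # Create a throwaway file, then stat it through its bytes path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    raw_path = os.fsencode(tmp.name)  # analogous to toRawFilePath
    try:
        return file_size(raw_path)
    finally:
        os.remove(raw_path)
```

The conversion happens once, at the edge where a text path enters; everything after that works on the raw bytes.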
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agony of making it build), but still very
unmergeable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	33%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certainly benefit even more.
2019-11-26 16:01:58 -04:00
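The speedup column in the 067aabdd48 benchmark can be recomputed from the old and new timings. The sketch below assumes speedup means old/new - 1; the commit message does not state the formula, but this definition reproduces the 28% in the first row. The function name is hypothetical.

```python
def speedup_percent(old: float, new: float) -> float:
    """Speedup as old/new - 1, in percent (an assumed definition)."""
    return (old / new - 1.0) * 100.0


# (number of files, old seconds, new seconds) taken from the benchmark table
RUNS = [(48500, 4.77, 3.73), (12500, 1.36, 1.02), (20, 0.075, 0.074)]

if __name__ == "__main__":
    for files, old, new in RUNS:
        print(f"{files}\t{old}\t{new}\t{speedup_percent(old, new):.0f}%")
```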
Joey Hess
99536e3a0b
remove one more warningIO
Had to generalize Git.Queue so it can run an Annex action, yipes.

Only remaining warningIO are in the legacy chunk code.
2019-11-12 10:45:52 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of source files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
1e95bc4fd1
avoid git warning about CRLF in restagePointerFile
Saw it on Windows, could probably also happen on linux with some
configuration. Since this is a pointer file, the warning does not apply.
2019-02-18 18:35:36 -04:00
Joey Hess
1a367cad83
Fix path separator bug on Windows that completely broke git-annex since version 7.20190122.
2019-02-18 17:16:39 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
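Parsing "using either path separator", as 5d98cba923 puts it, might look like the following Python sketch. It assumes the key is the final component of the link target, and it is not a port of the Haskell parser; `last_component` is a hypothetical name and the sample targets in the usage note are illustrative.

```python
import re

# Split on either path separator, so Windows-style targets parse too.
_SEPARATORS = re.compile(rb"[/\\]")


def last_component(link_target: bytes) -> bytes:
    """Return the final path component of a link target, as bytes."""
    parts = [p for p in _SEPARATORS.split(link_target) if p]
    return parts[-1] if parts else b""
```

For example, `last_component(b"..\\.git\\annex\\objects\\aa\\bb\\KEY\\KEY")` and the same target written with forward slashes both yield `b"KEY"`, so no separate munging pass is needed on Windows.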
Joey Hess
53905490df
convert Git.HashObject to use ByteStrings
Both lazy and strict, because sometimes it's more efficient to build a
small strict bytestring, and other times better to lazily stream.
2019-01-03 13:21:01 -04:00
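The strict versus lazy trade-off in 53905490df can be illustrated with git's blob hashing. This Python sketch computes a SHA-1 blob id directly instead of talking to git hash-object, so it is a stand-in for the idea rather than for Git.HashObject itself; both function names are hypothetical, and a repository using SHA-1 object ids is assumed.

```python
import hashlib
import io
from typing import BinaryIO


def blob_id_strict(data: bytes) -> str:
    """Hash a small payload held entirely in memory."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def blob_id_streamed(stream: BinaryIO, size: int, chunk: int = 65536) -> str:
    """Hash a large payload chunk by chunk; the size must be known up front."""
    h = hashlib.sha1(b"blob %d\x00" % size)
    for block in iter(lambda: stream.read(chunk), b""):
        h.update(block)
    return h.hexdigest()
```

The strict form is cheaper for something as small as a symlink target or pointer, while the streamed form keeps memory flat for large content.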
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common
2019-01-03 11:37:02 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
54d49eeac8
avoid update-index race
This commit was supported by the NSF-funded DataLad project.
2018-08-17 16:03:40 -04:00
Joey Hess
0f25d48639
pass absolute path to update-index
Test suite found a case where this is necessary.

And the man page says this, although current behavior is not as
documented..

           Note that files beginning with .  are discarded.
           This includes ./file and dir/./file. If you don’t want
           this, then use cleaner names.

This may hit path length limits on Windows. shrug

This commit was supported by the NSF-funded DataLad project.
2018-08-16 16:00:29 -04:00
Joey Hess
82a239675f
narrow the race where a file gets modified before update-index
Check just before running update-index if the worktree file's content is
still the same, don't update it when it's been modified. This narrows
the race window a lot, from possibly minutes or hours, to seconds or
less.

(Use replaceFile so that the worktree update happens atomically,
allowing the InodeCache of the new worktree file to itself be gathered
w/o any other race.)

This doesn't eliminate the race; it can still occur in the window before
update-index runs. When annex.queue is large, a lot of files will be
statted by the checks, and so the window may still be large enough to be a
problem.

When only a few files are being processed, the window is as small as it
is in the race where a modification gets overwritten by git-annex when
it updates the worktree. Or maybe as small as whatever race git
checkout/pull/merge may have when the worktree gets modified during it.
Still, I've kept a todo about this race.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 15:56:43 -04:00
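The check described in 82a239675f (replace the worktree file atomically, remember its stat data, and compare again just before the index update) can be sketched in Python as below. `Snapshot` is a hypothetical stand-in for an InodeCache, and which fields it holds is an assumption; the real InodeCache may track different ones.

```python
import os
import tempfile
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical stand-in for an InodeCache: cheap stat-based identity."""
    inode: int
    size: int
    mtime_ns: int


def snapshot(path: str) -> Snapshot:
    st = os.stat(path)
    return Snapshot(st.st_ino, st.st_size, st.st_mtime_ns)


def replace_atomically(path: str, content: bytes) -> Snapshot:
    """Write to a temp file, snapshot it, then rename it over the target."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(content)
    snap = snapshot(tmp)
    os.replace(tmp, path)  # atomic on POSIX; inode, size and mtime carry over
    return snap


def still_unmodified(path: str, expected: Snapshot) -> bool:
    """Run just before the index update; False means skip this file."""
    try:
        return snapshot(path) == expected
    except FileNotFoundError:
        return False


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file")
        snap = replace_atomically(path, b"annexed content")
        before = still_unmodified(path, snap)
        with open(path, "ab") as f:  # simulate a user edit in the window
            f.write(b" plus an edit")
        after = still_unmodified(path, snap)
    return before, after
```

As the commit message says, this only narrows the window: a modification that lands after `still_unmodified` returns but before the index update still slips through.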
Joey Hess
82cfcfc838
better index file refresh method
Use git update-index --refresh, since it's a little bit more
efficient and the user can be told to run it if a locked index prevents
git-annex from running it.

This also fixes the problem where an annexed file was deleted in the index
and a get of another file that uses the same key caused the index update to
add back the deleted file. update-index will not add back the deleted file.

Documented in tips/unlocked_files.mdwn the gotcha that the index update
may conflict with other operations. I can't see any way to possibly avoid
that conflict.

One new todo about a race that causes a modification to be accidentally
staged.

Note that the assistant only flushes the git command queue when it
commits a modification. I have not tested the assistant with v6 unlocked
files, but assume most users of the assistant won't care if the index
shows a file as modified for a while.

This commit was supported by the NSF-funded DataLad project.
2018-08-16 14:16:24 -04:00
Joey Hess
0b7f6d24d3
rename BlobType and add submodule to it
This was badly named; it's not necessarily a blob, but anything that a
tree can refer to.

Also removed the Show instance which was used for serialization to git
format, instead use fmtTreeItemType.

This commit was supported by the NSF-funded DataLad project.
2018-05-14 14:45:41 -04:00
Joey Hess
fc845e6530
more lambda-case conversion
2017-12-05 15:00:50 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
34530e59d9
Avoid using a lot of memory when large objects are present in the git repository
.. and have to be checked to see if they are a pointer to an annexed file.

Cases where such memory use could occur included, but were not limited to:
  - git commit -a of a large unlocked file (in v5 mode)
  - git-annex adjust when a large file was checked into git directly
Generally, any use of catKey was a potential problem.

Fix by using git cat-file --batch-check to check size before catting.
This adds another git batch process, which is included in the CatFileHandle
for simplicity.

There could be performance impact, anywhere catKey is used. Particularly
likely to affect adjusted branch generation speed, and operations on
unlocked files in v6 mode. Hopefully since the --batch-check and
--batch read the same data, disk buffering will avoid most overhead.
Leaving only the overhead of talking to the process over the pipe and
whatever computation --batch-check needs to do.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2016-10-05 15:24:13 -04:00
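The size check from 34530e59d9 can be sketched as a small parser for one line of `git cat-file --batch-check` output, whose default format is `<object> <type> <size>`, or `<object> missing` when the object does not exist. This Python sketch does not start a git process; the cutoff value and both function names are assumptions made for illustration.

```python
from typing import Optional, Tuple

MAX_POINTER_SIZE = 8192  # hypothetical cutoff, chosen for illustration


def parse_batch_check(line: str) -> Optional[Tuple[str, str, int]]:
    """Parse '<object> <type> <size>'; return None for anything else."""
    fields = line.strip().split(" ")
    if len(fields) != 3:
        return None
    name, objtype, size = fields
    try:
        return name, objtype, int(size)
    except ValueError:
        return None


def worth_catting(line: str) -> bool:
    """Only small blobs can be pointers, so only those get read in full."""
    info = parse_batch_check(line)
    return info is not None and info[1] == "blob" and info[2] <= MAX_POINTER_SIZE
```

Only when `worth_catting` is true would the object be requested from the `--batch` process, so a multi-gigabyte blob never has to be held in memory.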
Joey Hess
b7c8bf5274
Preserve execute bits of unlocked files in v6 mode.
When annex.thin is set, adding an object will add the execute bits to the
work tree file, and this does mean that the annex object file ends up
executable.

This doesn't add any complexity that wasn't already present, because git
annex add of an executable file has always ingested it so that the annex
object ends up executable.

But, since an annex object file can be executable or not, when populating
an unlocked file from one, the executable bit is always added or removed
to match the mode of the pointer file.
2016-04-14 14:47:08 -04:00
Joey Hess
2046502407
v6: Close pointer file handles more quickly, to avoid problems on Windows.
Was using L.readFile, so the Handle would remain open until the garbage
collector got around to it. Changed to explicit open and close, so we know
it's always closed when the function returns.
2016-04-04 15:42:33 -04:00
Joey Hess
88a4a6f396
Sped up git-annex add in direct mode and v6 by using git hash-object --batch.
Speeds up hashSymlink and hashPointerFile.
2016-03-14 15:58:46 -04:00
Joey Hess
1df49506c4
Correct git-annex info to include unlocked files in v6 repository.
An unlocked present file does not have a pointer file in the worktree, so
info skipped counting it.

It may be that unused was also affected by the problem, but it seemed not
to be in my tests. I think because of the use of the associatedFilesFilter.

This fix slows down both info and unused a little bit, since they have to
query the contents of files from git, but only when handling unlocked files.
2016-03-14 13:14:01 -04:00
Joey Hess
b0081598c7
Fix memory leak in last release, which affected commands like git-annex status when a large non-annexed file is present in the work tree.
The whole file was strictly read, and so buffered in memory, and remained
buffered for some time when running git-annex status.
2016-02-19 14:45:26 -04:00
Joey Hess
adc27f081a
escape slashes in annex pointer files
The problem with having the slashes unescaped is, it broke parsing, since
the parser takes the filename to get the part containing the key.
That particularly affected URL keys.

This makes the format be the same as symlinks point to, which keeps things
simple.

Existing pointer files will continue to work ok.
2016-02-16 14:10:08 -04:00
Joey Hess
7899f7248a
force strict file read
Avoid possibly having the file open still when it gets deleted.

Needed on Windows, particularly.
2016-02-15 16:47:34 -04:00
Joey Hess
4d89a1ffd1
allow \r in pointer files
git-annex doesn't write \r, but it can be present due to line ending
conversions or perhaps user edits.
2016-02-15 16:37:40 -04:00
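A pointer parser that tolerates \r, as in 4d89a1ffd1, paired with a formatter that always ends with a newline, as in 06a8256bf6, might look like this Python sketch. The `/annex/objects/` prefix is an assumption, since this log does not spell out the pointer format, and both function names are hypothetical.

```python
from typing import Optional

POINTER_PREFIX = b"/annex/objects/"  # assumed prefix, not stated in this log


def format_pointer(key: bytes) -> bytes:
    """Always end the pointer with a newline."""
    return POINTER_PREFIX + key + b"\n"


def parse_pointer(content: bytes) -> Optional[bytes]:
    """Return the key from the first line, tolerating a trailing CR."""
    first_line = content.split(b"\n", 1)[0].rstrip(b"\r")
    if not first_line.startswith(POINTER_PREFIX):
        return None
    key = first_line[len(POINTER_PREFIX):]
    return key or None
```

Stripping the carriage return on read means a file rewritten by line ending conversion still resolves to the same key, while the writer never emits one.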
Joey Hess
f9d79d194b
Windows: Fix v6 unlocked files to actually work.
Pointer files were not being treated as annex content, so "git annex get"
didn't replace them with the object.
2016-02-15 16:12:18 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports
2016-01-20 16:36:33 -04:00
Joey Hess
a2c056df65
convert isPointerFile from Annex to IO
2016-01-01 13:22:38 -04:00
Joey Hess
06a8256bf6
always format pointer file with a trailing newline
Before the smudge filter added a trailing newline, but other things that
wrote formatPointer to a file did not.

also some new pointer staging code to use later
2015-12-10 16:06:58 -04:00
Joey Hess
78a6b8ce05
refactor and improve pointer file handling code
2015-12-09 14:27:43 -04:00
Joey Hess
afc5153157
update my email address and homepage url
2015-01-21 12:50:09 -04:00
Joey Hess
6ecd3ff421
diffdriver: New git-annex command, to make git external diff drivers work with annexed files.
Closes https://github.com/datalad/datalad/issues/18
2014-11-24 16:14:06 -04:00
Joey Hess
ba42b67c70
Fix bug in automatic merge conflict resolution
When one side is an annexed symlink, and the other side is a non-annexed symlink.

In this case, git-merge does not replace the annexed symlink in the work
tree with the non-annexed symlink, which is different from its handling of
conflicts between annexed symlinks and regular files or directories.
So, while git-annex generated the correct merge commit, the work tree
didn't get updated to reflect it.
See comments on bug for additional analysis.

Did not add this to the test suite yet; just unloaded a truckload of firewood
and am feeling lazy.

This commit was sponsored by Adam Spiers.
2014-07-08 13:55:11 -04:00
Joey Hess
67fd06af76
add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatible way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
1572c460e8
avoid using openFile when withFile can be used
Potentially fixes some FD leak if an action on an opened file handle fails
for some reason. There have been some hard to reproduce reports of
git-annex leaking FDs, and this may solve them.
2014-02-03 10:19:06 -04:00
Joey Hess
b405295aee
hlint
test suite still passes
2013-09-25 03:09:06 -04:00
Joey Hess
7b0970b340
Fix inverted logic in last release's fix for data loss bug, that caused git-annex sync on FAT or other crippled filesystems to add symlink standin files to the annex.
2013-07-30 16:08:09 -04:00
Joey Hess
ecdfa40cbe
avoid false positives when detecting core.symlinks=false symlink standin files
If the file is > 8192 bytes, it's certainly not a symlink file.

And if it contains nuls or newlines or whitespace, it's certainly
not a link to annexed content. But it might be a tarball containing
a git-annex repo.
2013-07-20 19:28:02 -04:00
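The negative checks listed in ecdfa40cbe translate almost directly into code. This Python sketch applies the 8192 byte limit and the NUL/whitespace rule from the commit message; the function name is hypothetical, and a True result only means the file was not ruled out, not that it really is a standin.

```python
MAX_STANDIN_SIZE = 8192  # from the commit message: larger files cannot be standins


def could_be_symlink_standin(content: bytes) -> bool:
    """Cheap negative checks only; True still needs a real parse of the target."""
    if len(content) > MAX_STANDIN_SIZE:
        return False
    # NUL bytes or any ASCII whitespace (including newlines) rule the file out.
    return not any(b in b"\x00 \t\n\r\x0b\x0c" for b in content)
```

A tarball that happens to contain a git-annex repository fails these checks on its first NUL byte, which is the false positive the commit is guarding against.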
Joey Hess
ae341c1a37
avoid reading files that are not symlinks when core.symlinks=false
This hack is only needed on FAT filesystems, so there's no point in doing
it the rest of the time. And it's possible for there to be a false
positive, so it's best to avoid the hack when possible.
2013-07-20 19:14:29 -04:00
Joey Hess
d80a0f62a4
avoid lazy read of file contents
On Windows, that means the file could still be open when later code wants
to delete it, which fails. Since we're only reading 8k anyway, just read
it, strictly. However, avoid reading the whole file strictly, so no
getContentsStrict here.
2013-06-17 21:12:09 -04:00
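The bounded strict read from d80a0f62a4 has a close Python analogue: read at most 8k inside a block that closes the handle on exit, so later code can delete the file even on Windows. This is a sketch of the pattern and not a translation of the Haskell; `read_prefix` and `demo` are hypothetical names.

```python
import os
import tempfile


def read_prefix(path: str, limit: int = 8192) -> bytes:
    """Read at most `limit` bytes and close the handle before returning."""
    with open(path, "rb") as f:
        return f.read(limit)


def demo() -> int:
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 20000)
    try:
        prefix = read_prefix(tmp.name)
    finally:
        os.remove(tmp.name)  # safe: no handle is left open on the file
    return len(prefix)
```

The read is strict but capped, so it avoids both the lingering handle of a lazy read and the memory cost of reading a whole large file.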
Joey Hess
b7674b464b
typo in comment
2013-06-17 20:45:04 -04:00
Joey Hess
25cb9a48da
fix the day's Windows permissions damage
2013-05-14 20:15:14 -04:00
Joey Hess
8a2ff023a3
convert from internal git path when checking symlink standin file
2013-05-14 15:08:40 -05:00
Joey Hess
e7936b1a34
always try to read symlink; only fall back to looking inside file
On Windows with Cygwin, checking out a git-annex repo will create symlinks
on disk, so we need to always try to read the symlink, even when
core.symlinks says they're not supported.
2013-05-14 14:18:47 -04:00
Joey Hess
03e8594369
fix the day's windows permissions damage
2013-05-12 19:09:48 -04:00
Joey Hess
73d2f8b280
deal with git using / internally, even on DOS
2013-05-12 17:29:49 -05:00