Commit graph

31 commits

Author SHA1 Message Date
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
1100e0d3c9
include upgrade code back in
Remaining things that need to be fixed up to get this branch into a
basically mergeable state: remotes, commands, and the assistant
2019-12-02 12:16:46 -04:00
Joey Hess
e2d4c133f5
init: fix data loss bug
Fix bug that lost modifications to unlocked files when init is re-ran in an
already initialized repo.

In retrospect needing scanUnlockedFiles False in the direct mode upgrade
path was a good hint that it was unsafe when used with True.

However, this bug did not affect upgrade from v5. In such an upgrade, an
unlocked file that is modified is left as-is. The only place
scanUnlockedFiles True did overwrite modified unlocked files is during an
git-annex init of a repo that was already initialized by git-annex.

(I also tried a scenario where the repo had not been initialized by
git-annex yet, but was cloned from a v7 repo with an unlocked file, and the
pointer file replaced with some other content, and the data loss did not
occur in that situation.)

Since the fixed scanUnlockedFiles avoids overwriting non-pointer files,
it should be safe to run in any situation, so there's no need any longer
for the parameter.
2019-11-05 12:41:15 -04:00
Joey Hess
1558e03014
Refuse to upgrade direct mode repositories when git is older than 2.22
That git fixed a memory leak that could cause an OOM during the upgrade.

Most git-annex builds have a new enough git already.
OSX git was upgraded with brew.

Linux i386ancient build's git was too old. Upgrading it to a fixed
git didn't work (due to the newer git not working with the old ssh,
https://bugs.chromium.org/p/git/issues/detail?id=7 )

Choices to deal with that were:

* Somehow make direct mode upgrade work with the old git, avoiding its
  OOM problem. One way would be to switch the repo to indirect mode
  first, and so upgrade to a repo with locked files. Not good when
  the filesystem does not support symlinks.
* backport the OOM fix from git 2.22
  (And do what about the version number so git-annex knows it's fixed?)
* backport openssh (and possibly more stuff)
* move the i386ancient build to at least Debian stretch (still backporting git)
  But this will make it no longer work with some of the ancient kernels it
  targets.

Of those, backporting the OOM fix seemed the best approach. Put "oomfix"
in the git version number to indicate it.

I have not automated building the git backport, so here's the patch I
used:

diff -ur orig/git-2.1.4/convert.c git-2.1.4/convert.c
--- orig/git-2.1.4/convert.c	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/convert.c	2019-08-29 20:05:04.371872338 +0100
@@ -404,7 +404,7 @@
 	if (start_async(&async))
 		return 0;	/* error was already reported */

-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		error("read from external filter %s failed", cmd);
 		ret = 0;
 	}
diff -ur orig/git-2.1.4/GIT-VERSION-GEN git-2.1.4/GIT-VERSION-GEN
--- orig/git-2.1.4/GIT-VERSION-GEN	2014-12-18 18:42:18.000000000 +0000
+++ git-2.1.4/GIT-VERSION-GEN	2019-08-29 20:06:39.132743228 +0100
@@ -1,7 +1,7 @@
 #!/bin/sh

 GVF=GIT-VERSION-FILE
-DEF_VER=v2.1.4
+DEF_VER=v2.1.4.oomfix

 LF='
 '
diff -ur orig/git-2.1.4/configure git-2.1.4/configure
--- orig/git-2.1.4/configure	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/configure	2019-08-29 20:27:45.896380015 +0100
@@ -580,8 +580,8 @@
 # Identity of this package.
 PACKAGE_NAME='git'
 PACKAGE_TARNAME='git'
-PACKAGE_VERSION='2.1.4'
-PACKAGE_STRING='git 2.1.4'
+PACKAGE_VERSION='2.1.4.oomfix'
+PACKAGE_STRING='git 2.1.4.oomfix'
 PACKAGE_BUGREPORT='git@vger.kernel.org'
 PACKAGE_URL=''

diff -ur orig/git-2.1.4/version git-2.1.4/version
--- orig/git-2.1.4/version	2014-12-18 18:42:19.000000000 +0000
+++ git-2.1.4/version	2019-08-29 20:06:17.572545210 +0100
@@ -1 +1 @@
-2.1.4
+2.1.4.oomfix
2019-08-29 15:24:41 -04:00
Joey Hess
75d1fbad17
improve comment 2019-08-27 14:08:46 -04:00
Joey Hess
9b1331881c
reorg remaining direct mode code
Only used for upgrading, so put it under there.
2019-08-27 14:05:38 -04:00
Joey Hess
11e3b2397c
update location log for missing content during direct mode conversion
If a direct mode file is deleted or modified, and there are no other
files containing the content, the content was lost. That's a normal
thing that can happen in direct mode, but not in v7, so the upgrade
code has to notice it in order for the location log to be accurate.
2019-08-27 13:54:21 -04:00
Joey Hess
da6f4d8887
remove direct mode support from Annex.Content
No longer used. The only possible user of it would be code in
Upgrade.V5, so I verified that the parts of Annex.Content it used were
not used to manipulate direct mode files.
2019-08-27 13:14:06 -04:00
Joey Hess
37b42e72e7
catch exceptions in upgrade
Exceptions due to eg, not being able to write to the repo are not very
useful. There tends to be an error message from git about permission
denied.
2019-08-27 12:33:35 -04:00
Joey Hess
586db7f06d
Avoid making a commit when upgrading from direct mode to v7
Three reasons:

* Committing as part of an upgrade is very unusual and unexpected.
* The commit was failing with a weird error message when done during an
  automatic upgrade.
* Let me remove more of that sweet^Whorrible direct mode code.
2019-08-26 16:35:44 -04:00
Joey Hess
689d1fcc92
remove most remnants of direct mode
A few remain, as needed for upgrades, and for accessing objects from
remotes that are direct mode repos that have not been converted yet.
2019-08-26 16:27:48 -04:00
Joey Hess
5877a15d7b
fix hard links when upgrading from direct mode
When upgrading a direct mode repo to v7 with adjusted unlocked branches,
fix a bug that prevented annex.thin from taking effect for the files in
working tree.

The hard links used to be ok, but commit 8e22114735 accidentially
broke them. It repopulates the worktree file, which is already a hard link,
and when it's creating the new file, the link count is already 2, and so it
doesn't make a hard link then.
2019-08-26 13:54:39 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
Joey Hess
bbf7dcc193
fix bugs involving v7 unlocked files and direct mode
* Fix bug upgrading from direct mode to v7: when files in the repository
  were already committed as v7 unlocked files elsewhere, and the
  content was present in the direct mode repository, the annexed files
  got their full content checked into git.
* Fix bug that caused v7 unlocked files in a direct mode repository
  to get locked when committing.

This commit was sponsored by Nick Piper on Patreon.
2018-12-11 13:47:35 -04:00
Joey Hess
a6c8de84b6
improve types to allow combining some adjustments
Combinations like --hide-misssing --unlocked seem very useful. On the
other hand, combining --fix with --unlock doesn't make sense because a
file can be either unlocked or a symlink that can be fixed, but not
both.

Changed the serialization of HideMissingAdjustment in passing, but it
has not actually been used yet so nothing will be broken.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-10-18 12:59:05 -04:00
Joey Hess
401a79675b
run git status before enabling clean filter
Avoids annex.largefiles inconsitency and also avoids a lot of
unneccessary calls to the clean filter when a large repo's clone
is being initialized.

This commit was supported by the NSF-funded DataLad project.
2018-08-28 10:36:22 -04:00
Joey Hess
148bd0dbfd
refactor 2016-10-17 14:58:33 -04:00
Joey Hess
b7c8bf5274
Preserve execute bits of unlocked files in v6 mode.
When annex.thin is set, adding an object will add the execute bits to the
work tree file, and this does mean that the annex object file ends up
executable.

This doesn't add any complexity that wasn't already present, because git
annex add of an executable file has always ingested it so that the annex
object ends up executable.

But, since an annex object file can be executable or not, when populating
an unlocked file from one, the executable bit is always added or removed
to match the mode of the pointer file.
2016-04-14 14:47:08 -04:00
Joey Hess
5e190913a4
add AdjBranch newtype; some simplications 2016-04-09 15:10:26 -04:00
Joey Hess
c3e0859846
Upgrading a direct mode repository to v6 has changed to enter an adjusted unlocked branch.
This makes the direct mode to v6 upgrade able to be performed in one clone
of a repository without affecting other clones, which can continue using v5
and direct mode.
2016-04-04 13:17:24 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
b3d60ca285
use TopFilePath for associated files
Fixes several bugs with updates of pointer files. When eg, running
git annex drop --from localremote
it was updating the pointer file in the local repository, not the remote.
Also, fixes drop ../foo when run in a subdir, and probably lots of other
problems. Test suite drops from ~30 to 11 failures now.

TopFilePath is used to force thinking about what the filepath is relative
to.

The data stored in the sqlite db is still just a plain string, and
TopFilePath is a newtype, so there's no overhead involved in using it in
DataBase.Keys.
2016-01-05 17:22:19 -04:00
Joey Hess
e9a33088e8
reorder
May be unlocked files pulled from elsewhere, but converting from direct
mode will add more. So, do the scan for them first, since it empties
anything currently in the database.
2016-01-01 15:15:15 -04:00
Joey Hess
f36f24197a
scan for unlocked files on init/upgrade of v6 repo 2016-01-01 15:09:42 -04:00
Joey Hess
121f5d5b0c
annex.thin
Decided it's too scary to make v6 unlocked files have 1 copy by default,
but that should be available to those who need it. This is consistent with
git-annex not dropping unused content without --force, etc.

* Added annex.thin setting, which makes unlocked files in v6 repositories
  be hard linked to their content, instead of a copy. This saves disk
  space but means any modification of an unlocked file will lose the local
  (and possibly only) copy of the old version.
* Enable annex.thin by default on upgrade from direct mode to v6, since
  direct mode made the same tradeoff.
* fix: Adjusts unlocked files as configured by annex.thin.
2015-12-27 15:59:59 -04:00
Joey Hess
4392140946
make linkAnnex detect when the file changes as it's being copied/linked in
This fixes a race where the modified file ended up in annex/objects, and
the InodeCache stored in the database was for the modified version, so
git-annex didn't know it had gotten modified.

The race could occur when the smudge filter was running; now it gets the
InodeCache before generating the Key, which avoids the race.
2015-12-22 15:20:03 -04:00
Joey Hess
f9d077186a
implemented upgrade of direct mode repo to v6 2015-12-15 16:00:26 -04:00
Joey Hess
3311c48631
move InodeSentinal from direct mode code to its own module
Will be used outside of direct mode for v6 unlocked files, and is already
used outside of direct mode when adding files to annex.
2015-12-09 15:52:11 -04:00
Joey Hess
ccc49861ca
add v6; keep v5 working for now and manual upgrade
Since all places where a repo is used in direct mode need to have git-annex
upgraded before the repo can safely be converted to v6, the upgrade needs
to be manual for now.

I suppose that at some point I'll want to drop all the direct mode support
code. At that point, will stop supporting v5, and will need to auto-upgrade
any remaining v5 repos. If possible, I'd like to carry the direct mode
support for say, a year or so, to give people plenty of time to upgrade and
avoid disruption.
2015-12-04 16:14:48 -04:00