Commit graph

167 commits

Author SHA1 Message Date
Joey Hess
a928190985 optimise ObjectHashLower handling
A repo tuned with ObjectHashLower only ever uses lower-case hash
directories, so unlike a bare repo which could have old mixed case
directories, there is no need to do an expensive check of the filesystem in
this case.
2015-06-12 13:05:44 -04:00
Joey Hess
0a998032ed Fix bug that prevented enumerating locally present objects in repos tuned with annex.tune.objecthash1=true
Need to walk 1 level of subdirs less in this case.

The git-annex branch traversal code didn't have a similar bug.
2015-06-11 15:15:05 -04:00
Joey Hess
2b79e6fe08 a few hlints 2015-04-11 00:10:34 -04:00
Joey Hess
cf903d5a3c fixup annex link target calculation when submodules are used in filesystems not supporting symlinks 2015-03-04 16:08:41 -04:00
Joey Hess
df4543eb38 avoid checking location of content when calculating gitAnnexLink
It doesn't matter if the object is present or not, gitAnnexLink should
always yield the same symlink target.

This is an optimisation; no behavior should be changed.
2015-03-04 15:45:16 -04:00
Joey Hess
af254615b2 use WAL mode to ensure read from db always works, even when it's being written to
Also, moved the database to a subdir, as there are multiple files.

This seems to work well with concurrent fscks, although they still do
redundant work due to the commit granularity. Occasionally two writes will
conflict, and one is then deferred and happens later.

Except, with 3 concurrent fscks, I got failures:

git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked)

Argh!!!
2015-02-18 15:54:24 -04:00
Joey Hess
3414229354 fsck: Multiple incremental fscks of different repos (some remote) can now be in progress at the same time in the same repo without it getting confused about which files have been checked for which remotes. 2015-02-17 17:08:11 -04:00
Joey Hess
afb3e3e472 avoid crash when starting fsck --incremental when one is already running
Turns out sqlite does not like having its database deleted out from
underneath it. It might suffice to empty the table, but I would rather
start each fsck over with a new database, so I added a lock file, and
running incremental fscks use a shared lock.

This leaves one concurrency bug left; running two concurrent fsck --more
will lead to: "SQLite3 returned ErrorBusy while attempting to perform step."
and one or both will fail. This is a concurrent writers problem.
2015-02-17 13:30:24 -04:00
Joey Hess
91e9146d1b convert incremental fsck to using sqlite database
Did not keep backwards compat for sticky bit records. An incremental fsck
that is already in progress will start over on upgrade to this version.

This is not yet ready for merging. The autobuilders need to have sqlite
installed.

Also, interrupting a fsck --incremental does not commit the database.
So, resuming with fsck --more restarts from beginning.

Memory: Constant during a fsck of tens of thousands of files.
(But, it does seem to buffer whole transation in memory, so
may really scale with number of files.)

CPU: ?
2015-02-16 15:35:26 -04:00
Joey Hess
3e78b83875 Windows: Fix bug in dropping an annexed file, which caused a symlink to be staged that contained backslashes. 2015-02-09 15:37:26 -04:00
Joey Hess
c8163ce29a use a Set 2015-01-28 18:17:10 -04:00
Joey Hess
e0187d5d12 test suite found a problem with today's work
". def" did not do what I thought it would, at all.
2015-01-28 18:05:08 -04:00
Joey Hess
009bd050c1 implement annex.tune.objecthashlower
Split out Annex.DirHashes which never really belonged in Locations.
2015-01-28 16:52:08 -04:00
Joey Hess
0fd5f257d0 groundwork for parameterizing hash depth 2015-01-28 15:55:17 -04:00
Joey Hess
ba3825441c rework Differences data type
Eliminated complexity and future proofed. The most important change is that
all functions over Difference are now total; any Difference that can be
expressed should be handled. Avoids needs for sanity checking of inputs,
and version skew with the future.

Also, the difference.log now serializes a [Difference], not a Differences.
This saves space and keeps it simpler.

Note that [Difference] might contain conflicting differences (eg,
[Version5, Version6]. In this case, one of them needs to consistently win
over the others, probably based on Ord.
2015-01-28 13:50:02 -04:00
Joey Hess
70736d2b41 Repository tuning parameters can now be passed when initializing a repository for the first time.
* init: Repository tuning parameters can now be passed when initializing a
  repository for the first time. For details, see
  http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
  that has been tuned in incompatable ways.
2015-01-27 17:38:06 -04:00
Joey Hess
09a66f702d Revert "remove absNormPathUnix, using my absPathFrom replacement"
This reverts commit a7f05c007b.

Consider: relPathDirToFile (absPathFrom "/tmp/repo/xxx" "y/bar") "/tmp/repo/.git/annex/objects/xxx"

This needs to always yield "../../../.git/annex/objects/xxx" but on
Windows, it is "..\\..\\/tmp/repo/.git/annex/objects/xxx"
2015-01-21 13:54:47 -04:00
Joey Hess
a7f05c007b remove absNormPathUnix, using my absPathFrom replacement 2015-01-21 13:37:09 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
3bab5dfb1d revert parentDir change
Reverts 965e106f24

Unfortunately, this caused breakage on Windows, and possibly elsewhere,
because parentDir and takeDirectory do not behave the same when there is a
trailing directory separator.
2015-01-09 13:11:56 -04:00
Joey Hess
858d776352 Merge branch 'master' into relativepaths
Conflicts:
	Locations.hs
	debian/changelog
2015-01-06 19:00:01 -04:00
Joey Hess
965e106f24 made parentDir return a Maybe FilePath; removed most uses of it
parentDir is less safe than takeDirectory, especially when working
with relative FilePaths. It's really only useful in loops that
want to terminate at /

This commit was sponsored by Audric SCHILTKNECHT.
2015-01-06 18:55:56 -04:00
Joey Hess
cd865c3b8f Switch to using relative paths to the git repository.
This allows the git repository to be moved while git-annex is running in
it, with fewer problems.

On Windows, this avoids some of the problems with the absurdly small
MAX_PATH of 260 bytes. In particular, git-annex repositories should
work in deeper/longer directory structures than before. See
http://git-annex.branchable.com/bugs/__34__git-annex:_direct:_1_failed__34___on_Windows/

There are several possible ways this change could break git-annex:

1. If it changes its working directory while it's running, that would
   be Bad News. Good news everyone! git-annex never does so. It would also
   break thread safety, so all such things were stomped out long ago.

2. parentDir "." -> "" which is not a valid path. I had to fix one
   instace of this, and I should probably wipe all calls to parentDir out
   of the git-annex code base; it was never a good idea.

3. Things like relPathDirToFile require absolute input paths,
   and code assumes that the git repo path is absolute and passes it to it
   as-is. In the case of relPathDirToFile, I converted it to not make
   this assumption.

Currently, the test suite has 16 failures.
2015-01-06 16:19:41 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
8f93982df6 use same hash directories for chunked key as are used for its parent
This avoids a proliferation of hash directories when using new-style
chunking, and should improve performance since chunks are accessed
in sequence and so should have a common locality.

Of course, when a chunked key is encrypted, its hash directories have no
relation to the parent key.

This commit was sponsored by Christian Kellermann.
2014-07-25 16:09:23 -04:00
Joey Hess
58acaf8026 prospective fix for bad_merge_commit_deleting_all_files
Assuming my analysis of a race is correct. In any case, this certianly closes a
race..
2014-07-09 15:08:19 -04:00
Joey Hess
a44fd2c019 export CreateProcess fields from Utility.Process
update code to avoid cwd and env redefinition warnings
2014-06-10 19:20:14 -04:00
Joey Hess
7dc6804154 unannex, uninit: Avoid committing after every file is unannexed, for massive speedup.
pre-commit hook lock added, so unannex can prevent the hook from running
in a confusing state.

This commit was sponsored by Fredrik Hammar
2014-03-21 14:41:05 -04:00
Joey Hess
3c3744c9a9 use https when .git/annex/privkey.pem and .git/annex/certificate.pem exist (untested)
I have not managed to generate a key that is accepted by the old version of
warp-tls I have here.
2014-02-28 21:32:18 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
40cec65ace more hlint 2014-02-11 10:48:52 -04:00
Joey Hess
897d877472 work around absNormPath not working on Windows
When making git-annex links, we want unix-style paths in the link targets.
2014-02-06 17:17:35 -04:00
Joey Hess
721cc0cd22 rework annexed object locking in direct mode & support Windows
Seems that locking of annexed objects when they're being dropped was broken
in direct mode:

* When taking the lock before dropping, it created the .git/annex/objects
  file, as an empty file. It seems that the dropping code deleted that,
  but that is not right, and for all I know could in some situation cause
  a corrupted object to leak out.
* When the lock was checked, it actually tried to open each direct mode
  file, and checked if it was locked. Not the same lock used above, and
  could also fail if some consumer of the file locked it.

Fixed this, and added windows support by switching direct mode to lock a
.lck file.
2014-01-28 16:43:11 -04:00
Joey Hess
d345e5b52f add git fsck to cronner, and UI for repository repair (not yet wired up) 2013-10-22 16:02:52 -04:00
Joey Hess
396e47b07e tighten file2key to not produce invalid keys with no keyName
A file named "foo-" or "foo-bar" was taken as a key's file, with a backend
of "foo", and an empty keyName. This led to various problems, especially
because converting that key back to a file did not yeild the same filename.
2013-10-16 12:46:24 -04:00
Joey Hess
af5e1d0494 half way complete cronner thread to run scheduled activities 2013-10-08 11:48:28 -04:00
Joey Hess
635c9a1549 assistant: Detect stale git lock files at startup time, and remove them.
Extends the index.lock handling to other git lock files. I surveyed
all lock files used by git, and found more than I expected. All are
handled the same in git; it leaves them open while doing the operation,
possibly writing the new file content to the lock file, and then closes
them when done.

The gc.pid file is excluded because it won't affect the normal operation
of the assistant, and waiting for a gc to finish on startup wouldn't be
good.

All threads except the webapp thread wait on the new startup sanity checker
thread to complete, so they won't try to do things with git that fail
due to stale lock files. The webapp thread mostly avoids doing that kind of
thing itself. A few configurators might fail on lock files, but only if the
user is explicitly trying to run them. The webapp needs to start
immediately when the user has opened it, even if there are stale lock
files.

Arranging for the threads to wait on the startup sanity checker was a bit
of a bear. Have to get all the NotificationHandles set up before the
startup sanity checker runs, or they won't see its signal. Perhaps
the NotificationBroadcaster is not the best interface to have used for
this. Oh well, it works.

This commit was sponsored by Michael Jakl
2013-10-05 17:04:21 -04:00
Joey Hess
1be4d281d6 Better sanitization of problem characters when generating URL and WORM keys.
FAT has a lot of characters it does not allow in filenames, like ? and *
It's probably the worst offender, but other filesystems also have
limitiations.

In 2011, I made keyFile escape : to handle FAT, but missed the other
characters. It also turns out that when I did that, I was also living
dangerously; any existing keys that contained a : had their object
location change. Oops.

So, adding new characters to escape to keyFile is out. Well, it would be
possible to make keyFile behave differently on a per-filesystem basis, but
this would be a real nightmare to get right. Consider that a rsync special
remote uses keyFile to determine the filenames to use, and we don't know
the underlying filesystem on the rsync server..

Instead, I have gone for a solution that is backwards compatable and
simple. Its only downside is that already generated URL and WORM keys
might not be able to be stored on FAT or some other filesystem that
dislikes a character used in the key. (In this case, the user can just
migrate the problem keys to a checksumming backend. If this became a big
problem, fsck could be made to detect these and suggest a migration.)

Going forward, new keys that are created will escape all characters that
are likely to cause problems. And if some filesystem comes along that's
even worse than FAT (seems unlikely, but here it is 2013, and people are
still using FAT!), additional characters can be added to the set that are
escaped without difficulty.

(Also, made WORM limit the part of the filename that is embedded in the key,
to deal with filesystem filename length limits. This could have already
been a problem, but is more likely now, since the escaping of the filename
can make it longer.)

This commit was sponsored by Ian Downes
2013-10-05 15:01:49 -04:00
Joey Hess
3dac026598 move some code around 2013-10-05 13:49:45 -04:00
Joey Hess
83b4b8d589 rename confusing function
The index.lck file is not a lock file. Kept the historical name for now as
changing it would be work.
2013-10-03 15:06:58 -04:00
Joey Hess
7a9a16b337 lockJournal when running performTransitions
This may not strictly be needed -- the transition code bypasses the
journal. However, this ensures that the git-annex branch is only
committed with the journal locked. This will allow for further
improvements.
2013-10-03 14:37:46 -04:00
Joey Hess
4c954661a1 git-annex-shell: Added support for operating inside gcrypt repositories.
* Note that the layout of gcrypt repositories has changed, and
  if you created one you must manually upgrade it.
  See http://git-annex.branchable.com/upgrades/gcrypt/
2013-09-24 17:25:47 -04:00
Joey Hess
fcd5c167ef untested transition detection on merging, and transition running code 2013-08-28 15:57:42 -04:00
Joey Hess
24c8a6042b importfeed: Ignores transient problems with feeds. Only exits nonzero when a feed has repeatedly had a problems for at least 1 day. 2013-08-03 01:40:21 -04:00
Joey Hess
a96e982bd3 fuzz tester 2013-05-23 19:00:46 -04:00
Joey Hess
03e8594369 fix the day's windows permissions damage 2013-05-12 19:09:48 -04:00
Joey Hess
a2f83b28f3 fix path separator 2013-05-12 17:18:34 -05:00
Joey Hess
f1b0a4b404 Use lower case hash directories for storing files on crippled filesystems, same as is already done for bare repositories.
* since this is a crippled filesystem anyway, git-annex doesn't use
  symlinks on it
* so there's no reason to use the mixed case hash directories that we're
  stuck using to avoid breaking everyone's symlinks to the content
* so we can do what is already done for all bare repos, and make non-bare
  repos on crippled filesystems use the all-lower case hash directories
* which are, happily, all 3 letters long, so they cannot conflict with
  mixed case hash directories
* so I was able to 100% fix this and even resuming `git annex add` in the
  test case will recover and it will all just work.
2013-04-04 15:46:33 -04:00
Joey Hess
38d61f934d Update working tree files fully atomically
This avoids commit churn by the assistant when eg,
replacing a file with a symlink.

But, just as importantly, it prevents the working tree being left with a
deleted file if git-annex, or perhaps the whole system, crashes at the
wrong time.

(It also probably avoids confusing displays in file managers.)
2013-04-02 15:02:00 -04:00