Commit graph

795 commits

Author SHA1 Message Date
Joey Hess
54f87ef95f
get associated files from Keys database 2015-12-26 15:09:53 -04:00
Joey Hess
7593917147
cleanup 2015-12-26 15:09:47 -04:00
Joey Hess
289a3592c3
support v6 unlocked files
This optimisation was not necessary, and didn't work for v6 unlocked files.
Typically only a small number of files will be changed by a commit, so just
catKey them all.
2015-12-26 15:04:26 -04:00
Joey Hess
60c36ef6ba
make views work with v6 unlocked files
Have to only use the view index in one place; lookupFile was failing for
unlocked files because it was run using the view index, which was empty.
2015-12-26 14:52:58 -04:00
Joey Hess
49fca49991
remove dead code 2015-12-26 14:45:07 -04:00
Joey Hess
f324ad24c1
improve comment 2015-12-26 13:47:36 -04:00
Joey Hess
0c03629173
clean up cruft in assistant fast rename code path 2015-12-22 18:03:47 -04:00
Joey Hess
d8a8c77a8f
move cleanOldKey into ingest 2015-12-22 16:55:49 -04:00
Joey Hess
cfaac52b88
populate unlocked files with newly available content when ingesting
This can happen when ingesting a new file in either locked or unlocked
mode, when some unlocked files in the repo use the same key, and the
content was not locally available before.
2015-12-22 16:22:28 -04:00
Joey Hess
4f60234690
finish v6 support for assistant
Seems to basically work now!
2015-12-22 15:23:27 -04:00
Joey Hess
4392140946
make linkAnnex detect when the file changes as it's being copied/linked in
This fixes a race where the modified file ended up in annex/objects, and
the InodeCache stored in the database was for the modified version, so
git-annex didn't know it had gotten modified.

The race could occur when the smudge filter was running; now it gets the
InodeCache before generating the Key, which avoids the race.
2015-12-22 15:20:03 -04:00
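The fix above comes down to snapshotting the file before it is copied, then checking that it has not changed. Below is a minimal sketch of that pattern. FileSnapshot, snapshot and copyDetectingChange are hypothetical names, not git-annex's API. The real InodeCache also records the inode number, which this sketch leaves out so that it only needs the directory package.

```haskell
import Data.Time.Clock (UTCTime)
import System.Directory (copyFile, getFileSize, getModificationTime)

-- Hypothetical, simplified stand-in for git-annex's InodeCache. The real one
-- also records the inode number (which needs the unix package); this sketch
-- only compares size and modification time.
data FileSnapshot = FileSnapshot Integer UTCTime
  deriving (Eq, Show)

snapshot :: FilePath -> IO FileSnapshot
snapshot f = FileSnapshot <$> getFileSize f <*> getModificationTime f

-- Snapshot the source before copying (and before any key is generated from
-- it), then again afterwards. If the two differ, the file changed mid-copy,
-- so return Nothing instead of a cache entry that would wrongly describe the
-- copied content.
copyDetectingChange :: FilePath -> FilePath -> IO (Maybe FileSnapshot)
copyDetectingChange src dest = do
  before <- snapshot src
  copyFile src dest
  after <- snapshot src
  return (if before == after then Just before else Nothing)
```

The snapshot is taken first, so any modification made during the copy shows up as a mismatch. Recording a cache entry for the modified version would hide it.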
Joey Hess
8e9608d7f0
refactoring
no behavior changes
2015-12-22 13:42:58 -04:00
Joey Hess
ca2c977704
wip v6 support for assistant
Files are not yet added to v6 repos in unlocked mode.
2015-12-21 18:41:15 -04:00
Joey Hess
35f6a78b66
fix reversion in v5 git-annex add of unlocked file
In v5, lookupFile is supposed to only look at symlinks on disk (except when
in direct mode).

Note that v6 also has a bug when a locked file's symlink is deleted and is
replaced with a new file. It sees that a link is staged and gets that
key.
2015-12-16 14:27:12 -04:00
Joey Hess
38a23928e9
temporarily remove cached keys database connection
The problem is that shutdown is not always called, particularly in the test
suite. So, a database connection would be opened, possibly some changes
queued, and then not shut down.

One way this can happen is when using Annex.eval or Annex.run with a new
state. A better fix might be to make both of them call Keys.shutdown
(and be sure to do it even if the annex action threw an error).

Complication: Sometimes they're run reusing an existing state, so shutting
down a database connection could cause problems for other users of that
same state. I think this would need an MVar holding the database handle,
so it could be emptied once shut down, and another user of the database
connection could then start up a new one if it got shut down. But, what if
2 threads were concurrently using the same database handle and one shut it
down while the other was writing to it? Urgh.

Might have to go that route eventually to get the database access to run
fast enough. For now, a quick fix to get the test suite happier, at the
expense of speed.
2015-12-16 14:05:26 -04:00
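The message considers keeping the database handle in an MVar, so that shutdown can empty it and the next user can reopen it. Here is a minimal sketch of that idea. It uses a hypothetical DbHandle type and hypothetical withDb/shutdownDb helpers, not git-annex's Keys API. Because the MVar is held for the whole action, users run one at a time. That avoids the problem of one thread shutting the handle down while another writes to it, but it costs parallelism.

```haskell
import Control.Concurrent.MVar

-- Hypothetical handle type standing in for an open database connection.
newtype DbHandle = DbHandle String

-- Nothing means "not currently open": the next user opens a fresh one.
type DbVar = MVar (Maybe DbHandle)

-- Run an action with the handle, opening it first if it was shut down.
-- modifyMVar restores the MVar if the action throws, so an error cannot
-- leave it permanently empty.
withDb :: DbVar -> IO DbHandle -> (DbHandle -> IO a) -> IO a
withDb var open act = modifyMVar var $ \mh -> do
  h <- maybe open return mh
  r <- act h
  return (Just h, r)

-- Close the handle (if open) and empty the MVar, so a later withDb reopens.
shutdownDb :: DbVar -> (DbHandle -> IO ()) -> IO ()
shutdownDb var close = modifyMVar_ var $ \mh -> do
  mapM_ close mh
  return Nothing
```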
Joey Hess
7d0e79b9e1
Use git-annex init --version=6 to get v6 for now
Not ready to make it default because of the direct mode upgrade needing to
all happen at once.
2015-12-15 17:17:13 -04:00
Joey Hess
f9d077186a
implemented upgrade of direct mode repo to v6 2015-12-15 16:00:26 -04:00
Joey Hess
cdd27b8920
reorg 2015-12-15 15:34:28 -04:00
Joey Hess
2bc920e266
update inode cache to cover file even when nothing needs to be done to linkAnnex
This covers the case where multiple files have the same content and are
added with git add. Previously only the one that was linked to the annex
got its inode cached; now both are.
2015-12-15 13:02:33 -04:00
Joey Hess
1dad3af3fc
checked getKeysPresent; it's ok for v6 unlocked files
When a v6 unlocked file is removed from the work tree,
unused doesn't show it. When it gets removed from the index,
unused does show it. This is the same as a locked file.
2015-12-11 16:12:42 -04:00
Joey Hess
7790e059b2
finish v6 git-annex lock
This was a doozy!
2015-12-11 15:28:34 -04:00
Joey Hess
50e83b606c
only make 1 hardlink max between pointer file and annex object
If multiple files point to the same annex object, the user may want to
modify them independently, so don't use a hard link.

Also, check diskreserve when copying.
2015-12-11 14:00:21 -04:00
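The policy above reduces to two small decisions: hard-link only the first pointer file, and copy only when annex.diskreserve allows it. The sketch below shows both. Populate, choosePopulate and enoughSpace are hypothetical helpers written for illustration; they are not git-annex functions.

```haskell
-- Hypothetical helpers illustrating the policy; not git-annex's API.
data Populate = HardLink | Copy
  deriving (Eq, Show)

-- The first pointer file may share the object's inode; any further pointer
-- files get independent copies so they can be modified separately.
choosePopulate :: Int -> Populate
choosePopulate linkedPointers
  | linkedPointers == 0 = HardLink
  | otherwise = Copy

-- annex.diskreserve: only copy if at least `reserve` bytes stay free.
enoughSpace :: Integer -> Integer -> Integer -> Bool
enoughSpace reserve free size = free - size >= reserve
```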
Joey Hess
c608a752a5
Merge branch 'master' into smudge 2015-12-11 13:50:31 -04:00
Joey Hess
abd66c7089
fsck: Failed to honor annex.diskreserve when checking a remote. 2015-12-11 13:50:27 -04:00
Joey Hess
c910b4e255
wip 2015-12-11 10:42:18 -04:00
Joey Hess
9dffd3d255
add generalized linkAnnex' 2015-12-10 16:08:19 -04:00
Joey Hess
06a8256bf6
always format pointer file with a trailing newline
Before, the smudge filter added a trailing newline, but other things that
wrote formatPointer to a file did not.

Also added some new pointer staging code, to use later.
2015-12-10 16:06:58 -04:00
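One way to keep every writer consistent is to have a single formatting function that always appends the newline. The sketch below does that. The name mirrors formatPointer from the message, but the plain String key and the bare `annex/objects/` prefix are assumptions made for illustration.

```haskell
-- Hypothetical: keys represented as plain strings; the prefix follows the
-- "annex/objects/" requirement described in a later-listed commit.
formatPointer :: String -> String
formatPointer key = "annex/objects/" ++ key ++ "\n"
```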
Joey Hess
f80a3d8cd0
check InodeCache in inAnnex et al
This avoids querying the database when the content file doesn't exist
(or otherwise fails the provided check). However, it does add the overhead of
querying the database, and will certainly impact performance.
2015-12-10 14:51:04 -04:00
Joey Hess
2b8f6b8b2f
check inode cache in prepSendAnnex
This does mean one query of the database every time an object is sent.
May impact performance.
2015-12-10 14:50:52 -04:00
Joey Hess
3b2a7f216d
move 2015-12-10 14:20:38 -04:00
Joey Hess
3719d1b390
make clear when code is using deprecated direct mode files 2015-12-09 19:43:15 -04:00
Joey Hess
aa88851ec1
reorder 2015-12-09 19:38:37 -04:00
Joey Hess
ce73a96e4e
use InodeCache when dropping a key to see if a pointer file can be safely reset
The Keys database can hold multiple inode caches for a given key. One for
the annex object, and one for each pointer file, which may not be hard
linked to it.

Inode caches for a key are recorded when its content is added to the annex,
but only if it has known pointer files. This is to avoid the overhead of
maintaining the database when not needed.

When the smudge filter outputs a file's content, the inode cache is not
updated, because git's smudge interface doesn't let us write the file. So,
dropping will fall back to doing an expensive verification then. Ideally,
git's interface would be improved, and then the inode cache could be
updated then too.
2015-12-09 17:54:54 -04:00
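The drop-time check described above does a cheap InodeCache comparison first and falls back to an expensive verification only when that fails. A minimal sketch follows. The three-field InodeCache and the safeToReset helper are hypothetical simplifications, and the expensive check (for example checksumming) is passed in as an action.

```haskell
-- Hypothetical simplified InodeCache: inode, size, mtime (seconds).
data InodeCache = InodeCache Integer Integer Integer
  deriving (Eq, Show)

-- A pointer file may be reset when dropping only if it is unmodified.
-- Cheap check first: does its current cache match any cache recorded for
-- the key (the annex object or any pointer file)? Only if not, run the
-- expensive verification.
safeToReset :: [InodeCache] -> InodeCache -> IO Bool -> IO Bool
safeToReset recorded current expensiveCheck
  | current `elem` recorded = return True
  | otherwise = expensiveCheck
```

When the cache matches, the expensive action is never run at all. That is exactly the speedup the InodeCache is meant to give.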
Joey Hess
5e8c628d2e
add inode cache to the db
Renamed the db to keys, since it holds various info about Keys.

Dropping a key will update its pointer files, as long as their content can
be verified to be unmodified. This falls back to checksum verification, but
I want it to use an InodeCache of the key, for speed. But, I have not made
anything populate that cache yet.
2015-12-09 17:00:37 -04:00
Joey Hess
3311c48631
move InodeSentinal from direct mode code to its own module
Will be used outside of direct mode for v6 unlocked files, and is already
used outside of direct mode when adding files to annex.
2015-12-09 15:52:11 -04:00
Joey Hess
8a818088a3
link/copy pointer files to object content when it's added 2015-12-09 15:27:29 -04:00
Joey Hess
751120c171
avoid pre-commit hook messing up new-style unlocked files in v6 repo 2015-12-09 15:18:54 -04:00
Joey Hess
78a6b8ce05
refactor and improve pointer file handling code 2015-12-09 14:27:43 -04:00
Joey Hess
712c9fc590
require "annex/objects/" before key in pointer files
This removes ambiguity, because while someone might have "WORM--foo" in a
file that's not intended to be a git-annex pointer file,
"annex/objects/WORM--foo" is less likely.

Also, 664cc987e8 had a caveat about symlink
targets being parsed as pointer files, and now the same parser is used for
both.

I did not include any hash directories before the key in the pointer file,
as they're not needed. However, if they were included, the parser would
still work ok.
2015-12-07 15:45:08 -04:00
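The parsing rule described above requires "annex/objects/" and then takes the key from the last path component. Because of that, optional hash directories and full symlink targets parse the same way. Here is a minimal sketch; parsePointer and afterMarker are hypothetical names, and only the first line of the content is examined, which is an assumption.

```haskell
import Data.List (isPrefixOf, tails)

-- Everything after the first occurrence of the marker, if any.
afterMarker :: String -> String -> Maybe String
afterMarker marker s =
  case [drop (length marker) t | t <- tails s, marker `isPrefixOf` t] of
    (r:_) -> Just r
    [] -> Nothing

-- Content is only treated as a pointer if "annex/objects/" appears; the key
-- is the final path component, so hash directories are simply skipped.
parsePointer :: String -> Maybe String
parsePointer content = do
  let firstLine = takeWhile (/= '\n') content
  rest <- afterMarker "annex/objects/" firstLine
  let key = reverse (takeWhile (/= '/') (reverse rest))
  if null key then Nothing else Just key
```

With this rule a bare "WORM--foo" is rejected. A symlink target such as "../../.git/annex/objects/Xk/Wp/WORM--foo/WORM--foo" yields the same key as the pointer file "annex/objects/WORM--foo".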
Joey Hess
664cc987e8
support pointer files
Backend.lookupFile is changed to always fall back to catKey when
operating on a file that's not a symlink.

catKey is changed to understand pointer files, as well as annex symlinks.

Before, catKey needed a file mode witness, to be sure it was looking at a
symlink. That was complicated stuff. Now, it doesn't actually care if a
file in git is a symlink or not; in either case asking git for the content
of the file will get the pointer to the key.

This does mean that git-annex will treat a link
foo -> WORM--bar as a git-annex file, and will also treat
a regular file containing annex/objects/WORM--bar as a git-annex file.

Calling catKey could make git-annex commands need to do more work than
before. This would especially be the case if a repo contained many regular
files, and only a few annexed files, as now git-annex will need to ask
git about the contents of the regular files.
2015-12-07 15:35:36 -04:00
Joey Hess
62a2fba1cd
Merge branch 'master' into smudge 2015-12-07 12:29:34 -04:00
Joey Hess
2936153fc4
fix temp filename
Was not putting it inside the temp dir, but next to it!

This was just wrong, and it led to a longer filename than desired being
used, leading to some bug reports.
2015-12-06 16:54:01 -04:00
Joey Hess
6e71094e7d
avoid too long temp dir template
The filename might be at or close to the filename length limit, so using it
as the template for the temp dir would then fail.
2015-12-06 16:42:40 -04:00
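The fix above keeps the temp dir template short enough that the unique suffix added to it cannot push the name past the filesystem limit. Below is a minimal sketch of deriving such a template. tempDirTemplate and uniqueSuffixRoom are hypothetical. The 20-character suffix allowance is an assumed figure, and the sketch counts characters, not bytes.

```haskell
-- Room left for the unique suffix a temp-directory creator appends
-- (an assumed figure for this sketch).
uniqueSuffixRoom :: Int
uniqueSuffixRoom = 20

-- Use only the final path component, truncated so template plus suffix stays
-- within nameMax. Counts characters, not bytes, which is a simplification.
tempDirTemplate :: Int -> FilePath -> FilePath
tempDirTemplate nameMax file = take (nameMax - uniqueSuffixRoom) base
  where
    base = reverse (takeWhile (/= '/') (reverse file))
```

For a 300-character filename and a 255-character limit, the template is cut to 235 characters.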
Joey Hess
e7f75b079d
don't let git-annex direct be run in a v6 repo 2015-12-04 16:33:09 -04:00
Joey Hess
ccc49861ca
add v6; keep v5 working for now and manual upgrade
Since all places where a repo is used in direct mode need to have git-annex
upgraded before the repo can safely be converted to v6, the upgrade needs
to be manual for now.

I suppose that at some point I'll want to drop all the direct mode support
code. At that point, will stop supporting v5, and will need to auto-upgrade
any remaining v5 repos. If possible, I'd like to carry the direct mode
support for, say, a year or so, to give people plenty of time to upgrade and
avoid disruption.
2015-12-04 16:14:48 -04:00
Joey Hess
34ead644d9
auto-configure filter.annex.smudge and clean on init 2015-12-04 16:14:11 -04:00
Joey Hess
983c1894eb
avoid unnecessary reading of git-annex branch data when matching on annex.largefiles
This makes git annex clean not look at the git-annex branch at all,
and so speeds it up by 50% or more.
2015-12-04 15:06:41 -04:00
Joey Hess
99b2a524a0
clean filter should update location log when adding new content to annex 2015-12-04 14:20:32 -04:00
Joey Hess
2c6454a2e2
basic clean filter working 2015-12-04 13:39:14 -04:00
Joey Hess
0d432dd1a4
annex object file mode for core.sharedRepository
When core.sharedRepository is set, annex object files are not made mode
444, since that prevents a user other than the file owner from locking
them. Instead, a mode such as 664 is used in this case.
2015-11-18 15:45:32 -04:00
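The mode choice above is the read-only 444 mode, with group write added when the repository is shared. A minimal sketch of that choice follows. The helper and its Bool argument are hypothetical simplifications: core.sharedRepository actually has several possible values, and this only shows the group-writable "such as 664" case.

```haskell
import Data.Bits ((.|.))
import Numeric (showOct)

-- Hypothetical helper; modes are plain integers written as octal literals.
annexObjectMode :: Bool -> Int
annexObjectMode sharedRepository
  | sharedRepository = 0o444 .|. 0o220 -- 664: owner and group may write, so they can lock
  | otherwise = 0o444 -- read-only for everyone
```

Rendered in octal with showOct, the two results are "664" and "444".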
Joey Hess
3449c0e8ec
avoid spawning file size polling thread when not in -J mode 2015-11-16 21:21:58 -04:00
Joey Hess
e97fce35a6
Display progress meter in -J mode when downloading from the web.
Including in addurl, and get --from web, but also in S3 and External
special remotes when a web url is known for content in those remotes.
2015-11-16 21:00:54 -04:00
Joey Hess
262c37c16e
add missing checkSaneLock wrapper for pidlocks 2015-11-16 15:35:41 -04:00
Joey Hess
bb86eebfbd
init: Automatically enable annex.pidlock when necessary. 2015-11-13 13:35:29 -04:00
Joey Hess
aaf1ef268d
convert from Utility.LockPool to Annex.LockPool everywhere 2015-11-12 18:13:37 -04:00
Joey Hess
aa4192aea6
pid locking configuration and abstraction layer for git-annex
(not actually used anywhere yet)
2015-11-12 17:50:34 -04:00
Joey Hess
7c741302cc
assistant: Pass ssh-options through 3 more git pull/push calls that were missed before.
It was used for regular pull, but not for regular push, tagged push, or the
fallback fetching.
2015-11-10 16:52:30 -04:00
Joey Hess
7938b87864
add: Fix error recovery rollback to not move the ingested file content out of the annex back to the file, because other files may point to that same content. Instead, copy the ingested file content out to recover.
That was not a data loss, but it came close!
2015-11-06 15:28:20 -04:00
Joey Hess
51e60259e1
fix replaceFile makeAnnexLink race
replaceFile created a temp file, which was guaranteed to not overlap with
another temp file. However, makeAnnexLink then deleted that file, in
preparation for making the symlink in its place. This caused a race, since
some other replaceFile could create a temp file, using the same name!

I was able to reproduce the race easily running git-annex add -J10 in a
directory with 100 files (all with different contents). Some files would
get ingested into the annex, but their annex links would fail to be added.

There could be other situations where this same problem could occur.
Perhaps when the assistant is adding a file, if the user manually also ran
git-annex add. Perhaps in cases not involving adding a file.

The new replaceFile makes a temporary directory, which is guaranteed to be
unique, and doesn't make a temp file in there. makeAnnexLink can thus
create the symlink without problem and the race is avoided.

Audited all calls to replaceFile to make sure that the old behavior of
providing an empty temp file was not relied on.

The general problem of asking for a temp file and deleting it as part of
the process of using it could reach beyond replaceFile. Did some quick
audits and didn't find other cases of it. Symlink creation code is probably
the main place that would tend to make that mistake.
2015-11-06 15:08:19 -04:00
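The new approach described above can be sketched with the directory package. A freshly created directory is private to its creator, so the action can create its file or symlink inside it without racing anyone. This is an illustration of the idea, not git-annex's code. makeUniqueDir and this replaceFile are hypothetical, and the ".tmpN" naming scheme is an assumption.

```haskell
import Control.Exception (finally)
import System.Directory (createDirectory, removeDirectoryRecursive, renameFile)
import System.FilePath ((</>))
import System.IO.Error (isAlreadyExistsError, tryIOError)

-- createDirectory fails if the path already exists, so a successful call
-- means this process owns the directory; no other caller can be handed it.
makeUniqueDir :: FilePath -> IO FilePath
makeUniqueDir base = go (0 :: Int)
  where
    go n = do
      let dir = base ++ ".tmp" ++ show n
      r <- tryIOError (createDirectory dir)
      case r of
        Right () -> return dir
        Left e
          | isAlreadyExistsError e -> go (n + 1)
          | otherwise -> ioError e

-- The action gets a path inside the private directory where nothing exists
-- yet, so it can create a file or symlink there directly (no temp file to
-- delete first); the result is then renamed over the target.
replaceFile :: FilePath -> (FilePath -> IO ()) -> IO ()
replaceFile file action = do
  tmpdir <- makeUniqueDir file
  let tmp = tmpdir </> "new"
  (action tmp >> renameFile tmp file) `finally` removeDirectoryRecursive tmpdir
```

The temp directory sits next to the target, so the final rename stays on the same filesystem.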
Joey Hess
31472161e4
merge git command queue when joining with concurrent thread 2015-11-05 18:21:48 -04:00
Joey Hess
a4dd8503b8
add regions to concurrent output
still no progress displays when getting files etc, but a big improvement
2015-11-04 14:52:07 -04:00
Joey Hess
640dba43b6
enableremote: List uuids and descriptions of remotes that can be enabled, and accept either the uuid or the description in lieu of the name. 2015-10-26 14:55:40 -04:00
Joey Hess
806819be57
Avoid displaying network transport warning when a ssh remote does not yet have an annex.uuid set.
Instead, only display transport error if the configlist output doesn't
include an annex.uuid line, even an empty one.

A recent change made git-annex init try to get all the remote uuids, and so
the transport error would be displayed by it. It was also displayed when
eg, copying files to a remote that had no uuid yet.
2015-10-15 15:36:54 -04:00
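The rule above reports a transport error only when configlist output has no annex.uuid line at all, and an empty line still counts as present. The sketch below assumes configlist emits "key=value" lines. hasUuidLine and showTransportError are hypothetical names.

```haskell
import Data.List (isPrefixOf)

-- An empty "annex.uuid=" line still counts: the remote was reached, it just
-- has no uuid yet. (Assumes "key=value" lines in the configlist output.)
hasUuidLine :: String -> Bool
hasUuidLine = any ("annex.uuid=" `isPrefixOf`) . lines

showTransportError :: String -> Bool
showTransportError = not . hasUuidLine
```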
Joey Hess
3879f6e6be
do tmp dir cleanup in error case too 2015-10-15 14:27:14 -04:00
Joey Hess
27eaa6f410
avoid making post-merge-conflict-resolution commit when no conflicts were resolved
sync, merge, assistant: When git merge failed for a reason other than a
conflicted merge, such as a crippled filesystem not allowing particular
characters in filenames, git-annex would make a merge commit that could
omit such files or otherwise be bad. Fixed by aborting the whole merge
process when git merge fails for any reason other than a merge conflict.
2015-10-15 14:22:46 -04:00
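The decision logic above works like this: only a real conflict leads to conflict resolution, any other merge failure aborts everything, and the resolution commit is made only if something was resolved. A minimal sketch follows, using hypothetical types and function names.

```haskell
-- Hypothetical summary of how `git merge` exited.
data MergeOutcome = MergeClean | MergeConflicted | MergeFailedOther
  deriving (Eq, Show)

data NextStep = Finished | ResolveConflicts | AbortWholeMerge
  deriving (Eq, Show)

nextStep :: MergeOutcome -> NextStep
nextStep MergeClean = Finished
nextStep MergeConflicted = ResolveConflicts
-- e.g. a crippled filesystem rejected some filenames: committing now could
-- silently omit files, so give up instead.
nextStep MergeFailedOther = AbortWholeMerge

-- Only make the post-resolution commit if something was actually resolved.
shouldCommitResolution :: [FilePath] -> Bool
shouldCommitResolution resolved = not (null resolved)
```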
Joey Hess
9e90c033d3
Changed drop ordering when using git annex sync --content or the assistant, to drop from remotes first and from the local repo last. This works better with the behavior changes to drop in many cases. 2015-10-14 12:33:02 -04:00
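The ordering change above amounts to a stable partition: remotes first, in their existing order, and the local repository last. A minimal sketch follows, using a hypothetical Location type.

```haskell
-- Hypothetical representation of where a copy can be dropped from.
data Location = Local | Remote String
  deriving (Eq, Show)

-- Remotes first (keeping their relative order), the local repo last.
dropOrder :: [Location] -> [Location]
dropOrder ls = filter (/= Local) ls ++ filter (== Local) ls
```

Dropping locally last keeps the local copy around while the remote drops are being checked. That plausibly fits the newer drop rules, which require a locked copy.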
Joey Hess
1ff7610118
fix windows build 2015-10-12 15:48:59 -04:00
Joey Hess
f9adb905fc
Avoid unnecessary write to the location log when a file is unlocked and then added back with unchanged content.
Implemented with no additional overhead of compares etc.

This is safe to do for presence logs because of their locality of change;
a given repo's presence logs are only ever changed in that repo, or in a
repo that has just been actively changing the content of that repo.

So, we don't need to worry about a split-brain situation where there'd
be disagreement about the location of a key in a repo. And so, it's ok to
not update the timestamp when that's the only change that would be made
due to logging presence info.
2015-10-12 14:46:47 -04:00
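The argument above means a presence update can be skipped whenever the repository's current status already matches, because then only the timestamp would change. Here is a minimal sketch using a hypothetical single-line-per-repo log model.

```haskell
data Status = Present | Absent
  deriving (Eq, Show)

-- Hypothetical simplified log entry for one repository.
data LogLine = LogLine {timestamp :: Integer, status :: Status}
  deriving (Eq, Show)

-- Returns the line to write, or Nothing when the status is unchanged and
-- only the timestamp would be updated.
updatePresence :: Maybe LogLine -> Integer -> Status -> Maybe LogLine
updatePresence (Just old) _ s | status old == s = Nothing
updatePresence _ now s = Just (LogLine now s)
```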
Joey Hess
fa9333e99f
use action, not sideAction
sideAction is for things not generally related to the current action being
performed. And, it adds a newline after the side action. This was not the
right thing to use for stuff like "checksum", where doing a checksum is
part of the git annex get process, and indeed we want it to display
"(checksum...) ok"
2015-10-11 13:29:44 -04:00
Joey Hess
3b89d5a20c
implement lockContent for ssh remotes 2015-10-09 16:55:41 -04:00
Joey Hess
e392ec112f
also generate a drop safety proof for move --from remote 2015-10-09 16:16:03 -04:00
Joey Hess
6a72045707
fix local dropping to not require extra locking of copies, but only that the local copy be locked for removal 2015-10-09 15:48:02 -04:00
Joey Hess
1043880432
improve message when drop failed due to no locked copy 2015-10-09 15:14:25 -04:00
Joey Hess
b021321aae
rename constructor 2015-10-09 15:01:33 -04:00
Joey Hess
45e1a7c361
verify local copy of content with locking 2015-10-09 14:57:32 -04:00
Joey Hess
4c6095b6f5
content locking during drop working for local git remotes
Only ssh remotes lack locking now
2015-10-09 13:12:58 -04:00
Joey Hess
ceb5819538
finish and use lockContent interface 2015-10-09 12:36:04 -04:00
Joey Hess
cf79dffa4c
improve drop proof code 2015-10-09 11:09:46 -04:00
Joey Hess
f57ac29be1
refactor 2015-10-09 10:30:22 -04:00
Joey Hess
7f5958eec2
TrustedCopy is good enough to allow dropping
By definition, a trusted repository is trusted to always have its location
tracking log accurate. Thus, it should never be in a position where content
is being dropped from it concurrently, as that would result in the location
tracking log not being accurate.
2015-10-08 18:34:48 -04:00
Joey Hess
e4a33967a1
try harder to verify until at least one VerifiedCopyLock is obtained
This avoids a failure where eg, we start with RecentlyVerifiedCopies
for all remotes, and so don't do any active verification, which is
required.

Also, dedup the list of VerifiedCopies when checking if we have enough,
in case 2 copies of a UUID slip in.
2015-10-08 18:20:36 -04:00
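The checks above are: count distinct UUIDs, so duplicates cannot inflate the total, and require at least one copy that was actively locked or is trusted. The sketch below is simplified. TrustedCopy and VerifiedCopyLock appear in these messages, but this type, RecentlyVerifiedCopy and enoughCopies are hypothetical stand-ins for the real definitions.

```haskell
import Data.List (nub)

type UUID = String

-- Simplified, hypothetical version of the VerifiedCopy type.
data VerifiedCopy
  = TrustedCopy UUID
  | RecentlyVerifiedCopy UUID
  | VerifiedCopyLock UUID
  deriving (Eq, Show)

copyUUID :: VerifiedCopy -> UUID
copyUUID (TrustedCopy u) = u
copyUUID (RecentlyVerifiedCopy u) = u
copyUUID (VerifiedCopyLock u) = u

-- A lock, or a trusted repository, is strong enough to anchor a drop.
strong :: VerifiedCopy -> Bool
strong (RecentlyVerifiedCopy _) = False
strong _ = True

-- Count distinct UUIDs (so two entries for one repo count once) and require
-- at least one strong copy among them.
enoughCopies :: Int -> [VerifiedCopy] -> Bool
enoughCopies need copies =
  length (nub (map copyUUID copies)) >= need && any strong copies
```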
Joey Hess
b17f5da6c9
require 1 locked copy while dropping from local or a remote
See doc/bugs/concurrent_drop--from_presence_checking_failures.mdwn for
discussion about why 1 locked copy is all we can require, and how this
fixes concurrent dropping bugs.

Note that, since nothing generates a VerifiedCopyLock yet, this commit
breaks dropping temporarily.
2015-10-08 18:11:39 -04:00
Joey Hess
c75c79864d
support invalidating existing VerifiedCopys 2015-10-08 17:58:32 -04:00
Joey Hess
90f7c4b6a2
add VerifiedCopy data type
There should be no behavior changes in this commit, it just adds a more
expressive data type and adjusts code that had been passing around a [UUID]
or sometimes a Maybe Remote to instead use [VerifiedCopy].

Although, since some functions were taking two different [UUID] lists,
there's some potential for me to have gotten it horribly wrong.
2015-10-08 16:55:11 -04:00
Joey Hess
beedf1da25
unused import 2015-10-08 14:59:34 -04:00
Joey Hess
9cb9dab69b
I think this comment is stale/confusing; remove 2015-10-08 14:51:44 -04:00
Joey Hess
4d50958ed7
add lockContentShared
Also, rename lockContent to lockContentExclusive

inAnnexSafe should perhaps be eliminated, and instead use
`lockContentShared inAnnex`. However, I'm waiting on that, as there are
only 2 call sites for inAnnexSafe and it's fiddly.
2015-10-08 14:29:35 -04:00
Joey Hess
2def1d0a23
other 80% of avoiding verification when hard linking to objects in shared repo
In c6632ee5c8, it actually only handled
uploading objects to a shared repository. Avoiding verification when
downloading objects from a shared repository was a lot harder.

On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.

As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.

It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
2015-10-02 14:35:12 -04:00
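The new ability described above lets a remote report that it verified content during the download, so the separate post-download check can be skipped. A minimal sketch follows. The Verification type and needsPostDownloadCheck are hypothetical names used for illustration.

```haskell
-- Hypothetical result a remote could return alongside a download.
data Verification = Verified | UnVerified
  deriving (Eq, Show)

-- Run the post-download check only if verification is enabled and the
-- remote did not already vouch for the content.
needsPostDownloadCheck :: Bool -> Verification -> Bool
needsPostDownloadCheck verifyEnabled v = verifyEnabled && v /= Verified
```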
Joey Hess
7c7fe895f9
disabling verification also disables size verification
It's not expensive to do size verification, but let's be consistent and
turn it off too.
2015-10-02 12:38:02 -04:00
Joey Hess
c6632ee5c8
avoid verification when hard linking to objects in shared repository
Such a repository is implicitly trusted, so there's no point.
2015-10-02 12:36:03 -04:00
Joey Hess
2fb3722ce9
Do verification of checksums of annex objects downloaded from remotes.
* When annex objects are received into git repositories, their checksums are
  verified then too.
* To get the old, faster, behavior of not verifying checksums, set
  annex.verify=false, or remote.<name>.annex-verify=false.
* setkey, rekey: These commands also now verify that the provided file
  matches the key, unless annex.verify=false.
* reinject: Already verified content; this can now be disabled by
  setting annex.verify=false.

recvkey and reinject already did verification, so removed now duplicate
code from them. fsck still does its own verification, which is ok since it
does not use getViaTmp, so verification doesn't happen twice when using fsck
--from.
2015-10-01 15:56:39 -04:00
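These settings combine into one verification switch that governs both the size check and the checksum check. The companion commit above says size verification is turned off along with checksums. In the sketch below, the precedence (remote.<name>.annex-verify before annex.verify, defaulting to verifying) is an assumption, and effectiveVerify, Expected and verifyDownload are hypothetical names.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Assumed precedence: remote.<name>.annex-verify, then annex.verify, then
-- the default of verifying.
effectiveVerify :: Maybe Bool -> Maybe Bool -> Bool
effectiveVerify remoteSetting globalSetting =
  fromMaybe True (remoteSetting <|> globalSetting)

data Expected = Expected {expectedSize :: Integer, expectedChecksum :: String}

-- With verification off, neither size nor checksum is checked. With it on,
-- the cheap size comparison runs first and (&&) short-circuits on mismatch.
verifyDownload :: Bool -> Expected -> Integer -> String -> Bool
verifyDownload enabled ex size checksum
  | not enabled = True
  | otherwise = size == expectedSize ex && checksum == expectedChecksum ex
```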
Joey Hess
b72d3fbeba
rename function 2015-10-01 14:18:57 -04:00
Joey Hess
807ba6a903
refactor 2015-10-01 14:07:06 -04:00
Joey Hess
dc2f1f09b7
Improve robustness of direct mode merge, avoiding a crash if the index file is missing.
I couldn't find a good way to make an *empty* index file (zero byte file
won't do), so I punted and just don't make index.lock when there's no index
yet. This means some other git process could race and write an index file
at the same time as the merge is ongoing, in theory. Only happens in new
repos though.
2015-09-22 13:00:18 -04:00
Joey Hess
b88739f0d0
avoid auto-enabling a remote that's already enabled 2015-09-14 15:34:15 -04:00
Joey Hess
c919489c3e
avoid autoenable of dead special remotes 2015-09-14 15:28:14 -04:00
Joey Hess
9cfb96c53d
Special remotes configured with autoenable=true will be automatically enabled when git-annex init is run. 2015-09-14 14:49:48 -04:00
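The autoenable behavior comes from this commit and the two listed above it. It reduces to a filter: a special remote is enabled at init only if it has autoenable=true, is not dead, and is not already enabled. A minimal sketch follows, with a hypothetical SpecialRemote record and toAutoEnable function.

```haskell
-- Hypothetical summary of what init knows about each special remote.
data SpecialRemote = SpecialRemote
  { remoteName :: String
  , autoenable :: Bool
  , isDead :: Bool
  , alreadyEnabled :: Bool
  }

-- Names of the remotes init should enable.
toAutoEnable :: [SpecialRemote] -> [String]
toAutoEnable rs =
  [remoteName r | r <- rs, autoenable r, not (isDead r), not (alreadyEnabled r)]
```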
Joey Hess
97962591d6
init: Fix reversion in detection of repo made with git clone --shared 2015-09-09 13:56:37 -04:00
Joey Hess
c242e248e8
Fix reversion in init when run as root, introduced in version 5.20150731. 2015-08-19 12:36:17 -04:00
Joey Hess
0f5d6c09ac
importfeed --relaxed: Avoid hitting the urls of items in the feed. 2015-08-19 12:24:55 -04:00