Commit graph

1725 commits

Author SHA1 Message Date
Joey Hess
5afc2eaa54
reinject --known: Avoid second, unncessary checksum of file. 2016-11-07 12:07:36 -04:00
Joey Hess
8dcf79694d
enable forwardRetry for command-line transfers
If a transfer fails for some reason, but some data managed to be sent, the
transfer will be retried. (The assistant already did this.)

Possible impacts:

* More ssh prompts if ssh needs to prompt for a password to connect to a
  host, or is prompting about some other problem like a ssh key mismatch.

* More data transfer due to retrying, epecially when a remote does not
  support resuming a transfer.

  In the worst case, a lot of data will be transferred but it fails before
  the end, and then all that data gets transferred again plus one byte more;
  repeat until it manages to get the whole file.
2016-10-26 15:38:27 -04:00
Joey Hess
0b1c061382
importfeed: Drop URL parameters from file extension.
Thanks, James MacMahon.
2016-10-17 16:02:05 -04:00
Joey Hess
8e22114735
upgrade: Handle upgrade to v6 when the repository already contains v6 unlocked files whose content is already present.
Closes https://github.com/datalad/datalad/issues/1020

The use of runWriter in scanUnlockedFiles broke due to this change;
it failed with blocked indefinitely in mvar, because the database write
handle was taken while linkFromAnnex needed to also write to it (to update
the inode cache). So, switched to using a separate runWriter for each call
to addAssociatedFileFast. A little less efficient, but not greatly; the
writes should all still be cached.
2016-10-17 15:19:47 -04:00
Joey Hess
ee309d6941
lock: Fix edge cases where data loss could occur in v6 mode.
In the case where the pointer file is in place, and not the content
of the object, lock's  performNew was called with filemodified=True,
which caused it to try to repopulate the object from an unmodified
associated file, of which there were none. So, the content of the object
got thrown away incorrectly. This was the cause (although not the root
cause) of data loss in https://github.com/datalad/datalad/issues/1020

The same problem could also occur when the work tree file is modified,
but the object is not, and lock is called with --force. Added a test case
for this, since it's excercising the same code path and is easier to set up
than the problem above.

Note that this only occurred when the keys database did not have an inode
cache recorded for the annex object. Normally, the annex object would be in
there, but there are of course circumstances where the inode cache is out
of sync with reality, since it's only a cache.

Fixed by checking if the object is unmodified; if so we don't need to
try to repopulate it. This does add an additional checksum to the unlock
path, but it's already checksumming the worktree file in another case,
so it doesn't slow it down overall.

Further investigation found a similar problem occurred when smudge --clean
is called on a file and the inode cache is not populated. cleanOldKeys
deleted the unmodified old object file in this case. This was also
fixed by checking if the object is unmodified.

In general, use of getInodeCaches and sameInodeCache is potentially
dangerous if the inode cache has not gotten populated for some reason.
Better to use isUnmodified. I breifly auited other places that check the
inode cache, and did not see any immediate problems, but it would be easy
to miss this kind of problem.
2016-10-17 13:58:43 -04:00
Joey Hess
f867fc157f
When auto-upgrading a v3 remote, avoid upgrading to version 6, instead keep it at version 5.
Fixes a bug introduced with v6 mode that I didn't notice until now.
Probably not many v3 repos left out there, and upgrading them to v6 mode
is not disastrous, only a little premature.

This commit was sponsored by Riku Voipio
2016-10-05 16:23:09 -04:00
Joey Hess
166d70db77
convert TMVars that are never left empty into TVars
This is probably more efficient, and it avoids mistakenly leaving them
empty.
2016-09-30 19:51:16 -04:00
Joey Hess
c910004d50
addurl, importfeed: Improve behavior when file being added is gitignored. 2016-09-21 17:21:48 -04:00
Joey Hess
a569f195b7
fix bugs in handing of deep branches with sync and adjusted branches
* sync: Previously, when run in a branch with a slash in its name,
  such as "foo/bar", the sync branch was "synced/bar". That conflicted
  with the sync branch used for branch "bar", so has been changed to
  "synced/foo/bar".
* adjust: Previously, when adjusting a branch with a slash in its name,
  such as "foo/bar", the adjusted branch was "adjusted/bar(unlocked)".
  That conflicted with the adjusted branch used for branch "bar",
  so has been changed to "adjusted/foo/bar(unlocked)"
* Also, running sync in an adjusted branch did not correctly sync
  changes back to the parent branch when it had a slash in its name.
  This bug has been fixed.

Eliminate use of Git.Ref.under and Git.Ref.basename; using
Git.Ref.underBase and Git.Ref.base make everything handle deep branches
correctly.

Probably noone was adjusting deep branches, and v6 is still experimental
anyway, so I'm not going to worry about the mess that was left by that bug.

In the case of git-annex sync, using a fixed git-annex with an old unfixed
one will mean they use different sync branches for a deep branch, and so
they may stop syncing until the old one is upgraded. However, that's only
a problem when syncing between repositories without going via a central
bare repository. Added a warning about this to the CHANGELOG, but it's
probably not going to affect many people at all.

This commit was sponsored by Riku Voipio.
2016-09-21 15:23:47 -04:00
Joey Hess
0e30e71e9c
info: Support being passed a treeish, and show info about the annexed files in it similar to how a directory is handled. 2016-09-15 12:51:00 -04:00
Joey Hess
3e22d60549
copy, move, mirror: Support --json and --json-progress. 2016-09-09 16:24:26 -04:00
Joey Hess
a108235565
better locking for json with -J
Avoid threads emitting json at the same time and scrambling, which was
still possible even with the buffering, just less likely.

Converted json IO actions to JSONChunk data too.
2016-09-09 15:51:34 -04:00
Joey Hess
05d4438383
addurl, get: Added --json-progress option, which adds progress objects to the json output.
This doesn't work right when used with -J yet, and there is some really
ugly hand-crafting of part of the json output.
2016-09-09 15:06:54 -04:00
Joey Hess
8ef494a833
disentangle concurrency and message type
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.

Concurrent json output is currently a mess though since threads output
chunks over top of one-another.
2016-09-09 12:57:42 -04:00
Joey Hess
31289da691
get -J: Download different files from different remotes when the remotes have the same costs.
Only done in -J mode because only if there's concurrency can downloading
from two remotes be faster. Without concurrency, it's likely the case that
sequential downloads from the same remote are faster than switching back
and forth between two remotes.

There is some hairy MVar code here, but basically it just keeps
the activeremotes MVar full except when deciding which remote to assign
to a thread.

Also affects gets by sync --content -J

This commit was sponsored by Jochen Bartl.
2016-09-06 12:45:21 -04:00
Joey Hess
eb469bd139
use keyLocations not loggedLocations
Skip dead remotes.
2016-09-06 11:57:45 -04:00
Joey Hess
5d70eaacaf
examimekey: Allow being run in a git repo that is not initialized by git-annex yet.
No reason not to; indeed there's no real reason to need a git repository
at all except the implementation uses the Annex monad.
2016-09-05 12:26:59 -04:00
Joey Hess
10ddf2c3bd
remove TransferObserver
unused after last commit
2016-08-03 13:46:20 -04:00
Joey Hess
f461bcae4b
Re-enable accumulating transfer failure log files for command-line actions
This was disabled in commit 61ccf95004,
because only the assistant used them, and they were clutter. But, now
--failed also uses them.

Remove the failure log files after successful transfers. Should avoid
most of the clutter problems.

Commit 61ccf95004 mentions a subtle behavior
change, which has now been reverted:

    There is one behavior change from this. If glacier is being used, and a
    manual git annex get --from glacier fails because the file isn't available
    yet, the assistant will no longer later see that failed transfer file and
    retry the get.
2016-08-03 13:41:07 -04:00
Joey Hess
1a0e2c9901
get, move, copy, mirror: Added --failed switch which retries failed copies/moves
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.

Noisy due to some refactoring into Types/
2016-08-03 12:37:12 -04:00
Joey Hess
f0886a1bdd
info: When run on a file now includes an indication of whether the content is present locally. 2016-07-30 12:29:59 -04:00
Joey Hess
bf3327ff25
Added metadata --batch option, which allows getting, setting, deleting, and modifying metadata for multiple files/keys. 2016-07-27 10:46:25 -04:00
Joey Hess
928fbb162d
improved use of Aeson for JSONActionItem 2016-07-26 19:50:02 -04:00
Joey Hess
870873bdaa
Removed dependency on json library; all JSON is now handled by aeson.
I've eyeballed all --json commands, and the only difference should be
that some fields are re-ordered.
2016-07-26 19:15:34 -04:00
Joey Hess
8bc8469c38
saner format for metadata --json
metadata --json output format has changed, adding a inner json object
named "fields" which contains only the fields and their values.

This should be easier to parse than the old format, which mixed up
metadata fields with other keys in the json object.

Any consumers of the old format will need to be updated.

This adds a dependency on unordered-containers for parsing MetaData
from JSON, but it's a free dependency; aeson pulls in that library.
2016-07-26 15:41:04 -04:00
Joey Hess
a030d0a8b7
allow using Aeson for streaming JSON output
Keeping Text.JSON use for now, because it seems a better fit for most of
the commands, which don't use very structured JSON objects, but just output
whatever fields suites them. But this lets Aeson be used when a more
structured data type is available to serialize to JSON.
2016-07-26 13:30:07 -04:00
Joey Hess
d13194b230
--branch, stage 2
Show branch:file that is being operated on.

I had to make ActionItem a type and not a type class because
withKeyOptions' passed two different types of values when using the type
class, and I could not get the type checker to accept that.
2016-07-20 15:23:43 -04:00
Joey Hess
847944e6b1
more generic showStart' 2016-07-20 14:03:54 -04:00
Joey Hess
bf8bf14e8e
--branch, stage 1
Added --branch option to copy, drop, fsck, get, metadata, mirror, move, and
whereis commands. This option makes git-annex operate on files that are
included in a specified branch (or other treeish).

The names of the files from the branch that are being operated on are not
displayed yet; only the keys. Displaying the filenames will need changes
to every affected command.

Also, note that --branch can be specified repeatedly. This is not really
documented, but seemed worth supporting, especially since we may later want
the ability to operate on all branches matching a refspec. However, when
operating on two branches that contain the same key, that key will be
operated on twice.
2016-07-20 12:05:26 -04:00
Joey Hess
c4d011bf3e
log: Added --all option. 2016-07-17 15:15:08 -04:00
Joey Hess
0c713a94bd
uninit: Fix crash due to trying to write to deleted keys db.
Reversion introduced by v6 mode support, affects v5 too.

Also fix a similar crash when the webapp is used to delete a repository.
2016-07-12 14:18:35 -04:00
Joey Hess
5642daa651
fsck: Fix a reversion in direct mode fsck of a file that is present when the location log thinks it is not. Reversion introduced in version 5.20151208. 2016-07-12 13:41:03 -04:00
Joey Hess
5171e98988
drop: Add --batch and --json options. 2016-07-06 11:56:11 -04:00
Joey Hess
ed8ecbea0c
get: Add --batch and --json options. 2016-07-05 08:57:45 -04:00
Joey Hess
b6b5a11601
Make git clean filter preserve the backend that was used for a file. 2016-06-09 15:17:08 -04:00
Joey Hess
7fe2ecff91
Fix update of associated files db when unlocking a file in a v6 repo. 2016-06-09 14:45:00 -04:00
Joey Hess
0bc7fee660
Make lock and unlock work in v6 repos on files whose content is not present. 2016-06-09 14:40:44 -04:00
Joey Hess
74e01a2d01
move --to: Better behavior when system is completely out of disk space; drop content from disk before writing location log.
I noticed move --to failing when there was no disk space. The file was sent
to the remote, but it crashed before it could be dropped locally. This
could fix that.
2016-06-05 13:51:22 -04:00
Joey Hess
9996e04f41
list: Do not include dead repositories. 2016-06-04 14:33:31 -04:00
Joey Hess
26887745a0
refactor isBareRepo 2016-06-02 16:59:47 -04:00
Joey Hess
fbf5045d4f
sync --content: Fix bug that caused transfers of files to be made to a git remote that does not have a UUID. This particularly impacted clones from gcrypt repositories.
Added guard in Annex.Transfer to prevent this problem at a deeper level.

I'm unhappy ith NoUUID, but having Maybe UUID instead wouldn't help either
if nothing checked that there was a UUID. Since there legitimately need to
be Remotes that do not have a UUID, I can't see a way to fix it at the type
level, short making there be two separate types of Remotes.
2016-06-02 13:50:43 -04:00
Joey Hess
1b3bde0625
enableremote: Remove annex-ignore configuration from a remote. 2016-05-24 15:58:27 -04:00
Joey Hess
b33a649a25
enableremote: Can now be used to explicitly enable git-annex to use git remotes. Using the command this way prevents other git-annex commands from probing new git remotes to auto-enable them. 2016-05-24 15:24:38 -04:00
Joey Hess
91df4c6b53
Pass the various gnupg-options configs to gpg in several cases where they were not before.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.

RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
2016-05-23 17:03:20 -04:00
Joey Hess
16efe45a35
remove unused 2016-05-23 16:46:43 -04:00
Joey Hess
eda5d9cc74
adjust: Add --fix adjustment, which is useful when the git directory is in a nonstandard place. 2016-05-16 17:18:33 -04:00
Joey Hess
76170b0457
add: Adding a v6 pointer file used to annex it; now the pointer file is added to git as-is.
(git add of a pointer file already did the right thing)
2016-05-16 15:30:40 -04:00
Joey Hess
0860731760
reorder associated file db update
There's a potential race where the smudge filter is run at the same time an
object is being downloaded. If the download finished after the inAnnex
check, and before the keys db was updated, the associated file would not
get updated with the downloaded content.

I'm not sure this closes the race; it may only narrow the window. Problem
is, the keys database needs to communicate between two different processes.
In the case of the assistant, the transferkeys command is the other
process, and it closes the db handle after getting the file. So, it should
re-open the database and so see the update that the smudge filter has
written to it. But, what if the smudge filter takes a while to update the
database?
2016-05-16 14:55:05 -04:00
Joey Hess
5f0b551c0c
assistant: Fix race in v6 mode that caused downloaded file content to sometimes not replace pointer files.
The keys database handle needs to be closed after merging, because the
smudge filter, in another process, updates the database. Old cached info
can be read for a while from the open database handle; closing it ensures
that the info written by the smudge filter is available.

This is pretty horribly ad-hoc, and it's especially nasty that the
transferrer closes the database every time.
2016-05-16 14:49:12 -04:00
Joey Hess
9f05be393e
adjust: If the adjusted branch already exists, avoid overwriting it, since it might contain changes that have not yet been propigated to the original branch.
Could not think of a foolproof way to detect if the old adjusted branch was
just behind the current branch. It's possible that the user amended the
adjusting commit at the head of the adjusted branch, for example.

I decided to bail in this situation, instead of just entering the old
branch, so that if git annex adjust succeeds the user is always in a
*current* adjusted branch, not some old and out of date one.

What could perhaps be done is enter the old branch and then update it. But
that seems too magical; the user may have rebased master or something or
may not want to propigate the changes from the old branch. Best to error
out.
2016-05-13 14:04:22 -04:00