Commit graph

534 commits

Author SHA1 Message Date
Joey Hess
d3ba9fe5c8
matchexpression: New plumbing command to check if a preferred content expression matches some data. 2016-01-25 16:16:18 -04:00
Joey Hess
f9c5aa84e0
add database benchmark
The benchmark shows that the database access is quite fast indeed!
And, it scales linearly to the number of keys, with one exception,
getAssociatedKey.

Based on this benchmark, I don't think I need worry about optimising
for cases where all files are locked and the database is mostly empty.
In those cases, database access will be misses, and according to this
benchmark, should add only 50 milliseconds to runtime.

(NB: There may be some overhead to getting the database opened and locking
the handle that this benchmark doesn't see.)

joey@darkstar:~/src/git-annex>./git-annex benchmark
setting up database with 1000
setting up database with 10000
benchmarking keys database/getAssociatedFiles from 1000 (hit)
time                 62.77 μs   (62.70 μs .. 62.85 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 62.81 μs   (62.76 μs .. 62.88 μs)
std dev              201.6 ns   (157.5 ns .. 259.5 ns)

benchmarking keys database/getAssociatedFiles from 1000 (miss)
time                 50.02 μs   (49.97 μs .. 50.07 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.09 μs   (50.04 μs .. 50.17 μs)
std dev              206.7 ns   (133.8 ns .. 295.3 ns)

benchmarking keys database/getAssociatedKey from 1000 (hit)
time                 211.2 μs   (210.5 μs .. 212.3 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 211.0 μs   (210.7 μs .. 212.0 μs)
std dev              1.685 μs   (334.4 ns .. 3.517 μs)

benchmarking keys database/getAssociatedKey from 1000 (miss)
time                 173.5 μs   (172.7 μs .. 174.2 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 173.7 μs   (173.0 μs .. 175.5 μs)
std dev              3.833 μs   (1.858 μs .. 6.617 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (hit)
time                 64.01 μs   (63.84 μs .. 64.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 64.85 μs   (64.34 μs .. 66.02 μs)
std dev              2.433 μs   (547.6 ns .. 4.652 μs)
variance introduced by outliers: 40% (moderately inflated)

benchmarking keys database/getAssociatedFiles from 10000 (miss)
time                 50.33 μs   (50.28 μs .. 50.39 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 50.32 μs   (50.26 μs .. 50.38 μs)
std dev              202.7 ns   (167.6 ns .. 252.0 ns)

benchmarking keys database/getAssociatedKey from 10000 (hit)
time                 1.142 ms   (1.139 ms .. 1.146 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.142 ms   (1.140 ms .. 1.144 ms)
std dev              7.142 μs   (4.994 μs .. 10.98 μs)

benchmarking keys database/getAssociatedKey from 10000 (miss)
time                 1.094 ms   (1.092 ms .. 1.096 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.095 ms   (1.095 ms .. 1.097 ms)
std dev              4.277 μs   (2.591 μs .. 7.228 μs)
2016-01-12 13:07:03 -04:00
Joey Hess
121f5d5b0c
annex.thin
Decided it's too scary to make v6 unlocked files have 1 copy by default,
but that should be available to those who need it. This is consistent with
git-annex not dropping unused content without --force, etc.

* Added annex.thin setting, which makes unlocked files in v6 repositories
  be hard linked to their content, instead of a copy. This saves disk
  space but means any modification of an unlocked file will lose the local
  (and possibly only) copy of the old version.
* Enable annex.thin by default on upgrade from direct mode to v6, since
  direct mode made the same tradeoff.
* fix: Adjusts unlocked files as configured by annex.thin.
2015-12-27 15:59:59 -04:00
Joey Hess
723e4e31a1
merge clean into smudge command
The git filter config can be used to map the single git-annex command to
the 2 actions, and this avoids "git annex clean" being used for this thing,
it might have a better use for that name later.
2015-12-04 15:32:47 -04:00
Joey Hess
20ca89dfa3
skeleton smudge/clean filters 2015-12-04 13:03:39 -04:00
Joey Hess
f16e235983
addurl, importfeed: Changed to honor annex.largefiles settings, when the content of the url is downloaded. (Not when using --fast or --relaxed.)
importfeed just calls addurl functions, so inherits this from it.

Note that addurl still generates a temp file, and uses that key to download
the file. It just adds it to the work tree at the end when the file is small.
2015-12-02 15:12:33 -04:00
Joey Hess
dc8099872a
import: Changed to honor annex.largefiles settings. 2015-12-02 14:49:03 -04:00
Joey Hess
5ec67335f4
improve annex.largefiles documentation 2015-12-02 14:26:49 -04:00
Joey Hess
7fce3a0f81
more warnings about networked filesystems 2015-11-13 15:55:16 -04:00
Joey Hess
aa4192aea6
pid locking configuration and abstraction layer for git-annex
(not actually used anywhere yet)
2015-11-12 17:50:34 -04:00
Joey Hess
2fb3722ce9 Do verification of checksums of annex objects downloaded from remotes.
* When annex objects are received into git repositories, their checksums are
  verified then too.
* To get the old, faster, behavior of not verifying checksums, set
  annex.verify=false, or remote.<name>.annex-verify=false.
* setkey, rekey: These commands also now verify that the provided file
  matches the key, unless annex.verify=false.
* reinject: Already verified content; this can now be disabled by
  setting annex.verify=false.

recvkey and reinject already did verification, so removed now duplicate
code from them. fsck still does its own verification, which is ok since it
does not use getViaTmp, so verification doesn't happen twice when using fsck
--from.
2015-10-01 15:56:39 -04:00
Joey Hess
ffa8221517 annex.hardlink extended to also try to use hard links when copying from the repository to a remote.
Also, it used to only check that one of the repos was not in direct mode;
now when either repo is direct mode, annex.hardlink won't have an effect.
2015-09-14 12:13:38 -04:00
Yaroslav Halchenko
72129503a9 DOC: refer to corresponding manpage not to non-existing PREFERRED CONTENT section 2015-09-02 12:05:08 -07:00
Øyvind A. Holm
67f7de5986 doc/*.mdwn: Minor fixes (typos, letter case) 2015-07-26 04:21:06 +02:00
Joey Hess
386b8c394e got bash completion working for "git annex" not just "git-annex"
This needs a patch to git to cause the git-annex completion to be
auto-loaded when completing "git annex <tab>". Otherwise, it will only
load when "git-annex" is tab completed. Once loaded, it works for both
uses. I've submitted the git patch to the git mailing list.
2015-07-16 13:32:23 -04:00
Joey Hess
42948e960f typo 2015-07-13 13:25:49 -04:00
Joey Hess
b4d22e6d49 doc updates 2015-07-10 13:49:37 -04:00
Joey Hess
a51b98cdd5 sync: When annex.autocommit=false, avoid making any commit of local changes, while still merging with remote to the extent possible. 2015-07-07 16:36:11 -04:00
Joey Hess
1529add61a Brought back the setkey plumbing command that was removed in 2011, since we found a use case for it. Note that the command's syntax was changed for consistency. 2015-07-02 17:44:25 -04:00
Joey Hess
a099dc3f6a comment and warning 2015-07-02 15:21:25 -04:00
anarcat
0d2151beb7 explicitely describe exit status in the standard section 2015-06-23 16:56:03 +00:00
Joey Hess
8b74aec3ea Increased the default annex.bloomaccuracy from 1000 to 10000000
This makes git annex unused use around 48 mb more memory than it did before,
but the massive increase in accuracy makes this worthwhile for all but the
smallest systems.

Also, I want to use the bloom filter for sync --all --content, to avoid
dropping files that the preferred content doesn't want, and 1/1000
false positives would be far too many in that use case, even if it were
acceptable for unused.

Actual memory use numbers:

1000: 21.06user 3.42system 0:26.40elapsed 92%CPU (0avgtext+0avgdata 501552maxresident)k
1000000: 21.41user 3.55system 0:26.84elapsed 93%CPU (0avgtext+0avgdata 549496maxresident)k
10000000: 21.84user 3.52system 0:27.89elapsed 90%CPU (0avgtext+0avgdata 549920maxresident)k

Based on these numbers, 10 million seemed a better pick than 1 million.
2015-06-16 18:12:00 -04:00
Joey Hess
f8ab3bc449 dead --key: Can be used to mark a key as dead. 2015-06-09 14:52:05 -04:00
Antoine Beaupré
1393797373 add and fix refs in man mainpage 2015-05-29 12:12:11 -04:00
Joey Hess
823bb8031b add annex.used-refspec 2015-05-14 15:44:08 -04:00
Joey Hess
ef2202fd94 required: New command, like wanted, but for required content.
Also refactored some code to reduce duplication.
2015-04-18 16:04:35 -04:00
Joey Hess
ce0a82f493 contentlocationn: New plumbing command. 2015-04-09 15:34:47 -04:00
Joey Hess
9445556c97 rethought distributed fsck; instead add activity.log and expire command
This is much more space efficient!
2015-04-05 12:50:02 -04:00
Joey Hess
20fb91a7ad WIP on making --quiet silence progress, and infra for concurrent progress bars 2015-04-03 16:48:30 -04:00
Øyvind A. Holm
490e97ec10 Various typo fixes in doc/*.mdwn 2015-04-02 01:50:17 +02:00
Joey Hess
9e25cbde20 importfeed: Avoid downloading a redundant item from a feed whose guid has been downloaded before, even when the url has changed.
To support this, always store itemid in metadata; before this was only done
when annex.genmetadata was set.
2015-03-31 13:30:13 -04:00
Joey Hess
cd6b62f35e --auto is no longer a global option; only get, drop, and copy accept it.
Not a behavior change unless you were passing it to a command that ignored it.
2015-03-25 17:06:14 -04:00
Joey Hess
0b029570a7 finished splitting out man pages for all commands 2015-03-25 12:09:49 -04:00
Joey Hess
0850e8eaf9 separated man pages for all the maintenance commands 2015-03-24 15:23:59 -04:00
Joey Hess
f10282807e separated man pages for all the setup commands while at the gate in ATL 2015-03-23 18:20:42 -04:00
Joey Hess
3cc7c03721 Man pages for individual commands now available, and can be opened using "git annex help <command>" 2015-03-23 17:50:03 -04:00
Joey Hess
daec4b007a splitting up the man page
Common command man pages all split out and often expanded.

A few sections split out into their own pages.

Still need to do all the other commands..
2015-03-23 15:36:10 -04:00
Joey Hess
c233f98564 migrate: --force will force migration of keys already using the destination backend. Useful in rare cases. 2015-03-23 12:11:16 -04:00
Joey Hess
798da6cf2e Added a post-update-annex hook, which is run after the git-annex branch is updated. Needed for git update-server-info.
See https://github.com/datalad/datalad/issues/1#issuecomment-84094406
2015-03-20 14:52:58 -04:00
Joey Hess
e6158130c6 checkpresentkey: New plumbing command to check if a key can be verified to be present on a remote. 2015-03-20 11:44:46 -04:00
Joey Hess
50ef4105e3 readpresentkey: New plumbing command for checking location log. 2015-03-20 11:22:27 -04:00
Joey Hess
abfe3c09b2 registerurl: New plumbing command for mass-adding urls to keys. 2015-03-15 14:37:33 -04:00
Joey Hess
b24bb6b435 fromkey: Add stdin mode. 2015-03-15 14:07:43 -04:00
Joey Hess
fa180c1ba1 fromkey --force: Skip test that the key has its content in the annex. 2015-03-15 13:51:58 -04:00
Joey Hess
504dda82a4 addurl: Added --raw option, which bypasses special handling of quvi, bittorrent etc urls. 2015-03-05 14:46:08 -04:00
Joey Hess
022461d773 add a link 2015-02-25 15:49:18 -04:00
Joey Hess
68725d27e5 wording 2015-02-25 14:31:17 -04:00
Joey Hess
8066a1c3cc The file matching options are now only accepted by commands that can actually use them. 2015-02-06 17:16:41 -04:00
Joey Hess
dfab5e6ff4 import: Support file matching options such as --exclude, --include, --smallerthan, --largerthan 2015-02-06 15:58:06 -04:00
Joey Hess
febb1c2082 groupwanted: New command to set the groupwanted preferred content expression. 2015-02-06 15:12:42 -04:00
Joey Hess
ba3825441c rework Differences data type
Eliminated complexity and future proofed. The most important change is that
all functions over Difference are now total; any Difference that can be
expressed should be handled. Avoids needs for sanity checking of inputs,
and version skew with the future.

Also, the difference.log now serializes a [Difference], not a Differences.
This saves space and keeps it simpler.

Note that [Difference] might contain conflicting differences (eg,
[Version5, Version6]. In this case, one of them needs to consistently win
over the others, probably based on Ord.
2015-01-28 13:50:02 -04:00
Joey Hess
70736d2b41 Repository tuning parameters can now be passed when initializing a repository for the first time.
* init: Repository tuning parameters can now be passed when initializing a
  repository for the first time. For details, see
  http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
  that has been tuned in incompatable ways.
2015-01-27 17:38:06 -04:00
Joey Hess
587f6a919b addurl: When a Content-Disposition header suggests a filename to use, addurl will consider using it, if it's reasonable and doesn't conflict with an existing file. (--file overrides this) 2015-01-22 14:52:52 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
4fcf65cda3 devblog 2015-01-13 18:36:17 -04:00
Joey Hess
534c29deae implemented old Richih wishlist about remote/uuid info
* info: Can now display info about a given uuid.
  * Added to remote/uuid info: Count of the number of keys present
    on the remote, and their size. This is rather expensive to calculate,
    so comes last and --fast will disable it.
  * Git remote info now includes the date of the last sync with the remote.
2015-01-13 18:13:14 -04:00
Joey Hess
43dc7f678f setpresentkey: A new plumbing-level command. 2014-12-29 15:16:40 -04:00
Joey Hess
aba3e11776 sync: Now supports remote groups, the same way git remote update does. 2014-12-29 13:42:58 -04:00
Jean Jordaan
c011fe156d Get rid of mysterious "_why_" 2014-12-20 14:18:24 +02:00
Joey Hess
0e44d95964 update for torrents 2014-12-18 00:57:41 -04:00
Joey Hess
a7690de016 Added bittorrent special remote
addurl behavior change: When downloading an url ending in .torrent,
it will download files from bittorrent, instead of the old behavior
of adding the torrent file to the repository.

Added Recommends on aria2 and bittornado | bittorrent.

This commit was sponsored by Asbjørn Sloth Tønnesen.
2014-12-16 23:22:46 -04:00
Joey Hess
6ecd3ff421 diffdriver: New git-annex command, to make git external diff drivers work with annexed files.
Closes https://github.com/datalad/datalad/issues/18
2014-11-24 16:14:06 -04:00
Joey Hess
13260ccc3a undo command
This commit was sponsored by Andrew Cant.
2014-11-14 14:41:07 -04:00
Joey Hess
d22d650f59 proxy command is closer to plumbing than a general use command 2014-11-13 13:59:00 -04:00
Joey Hess
864086a956 proxy: for all your direct mode repository munging needs
This allows bypassing the direct mode guard in a safe way to do all sorts
of things including git revert, git mv, git checkout ...

This commit was sponsored by the WikiMedia Foundation.
2014-11-12 15:51:46 -04:00
Joey Hess
a0297915c1 add per-remote-type info
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.

Remote types that support encryption or chunking also include that in their
info.

This commit was sponsored by Ævar Arnfjörð Bjarmason.
2014-10-21 14:36:09 -04:00
Joey Hess
aafaa363e3 info: When passed the name or uuid of a remote, displays info about that remote.
No per-remote-type info yet.

This commit was sponsored by Stanley Yamane.
2014-10-21 14:35:07 -04:00
Joey Hess
4a9e70c705 info: When run on a single annexed file, displays some info about the file, including its key and size. 2014-10-21 13:24:15 -04:00
Joey Hess
fe994e58e5 clarify that sync only commits changes to files already added to the repo 2014-09-18 15:43:20 -04:00
Joey Hess
3c49a5d29c typo 2014-09-12 12:29:04 -04:00
Joey Hess
b874f84086 New annex.hardlink setting. Closes: #758593
* New annex.hardlink setting. Closes: #758593
* init: Automatically detect when a repository was cloned with --shared,
  and set annex.hardlink=true, as well as marking the repository as
  untrusted.

Had to reorganize Logs.Trust a bit to avoid a cycle between it and
Annex.Init.
2014-09-05 13:44:09 -04:00
Joey Hess
4b3f03ef38 clarify that --all doesn't operate on a single file 2014-08-19 12:11:19 -04:00
Yaroslav Halchenko
2d2d0a4d75 doc/ minor typos/trailing whitespaces + extension on get options 2014-08-19 01:22:24 -04:00
Joey Hess
93f20541f5 testremote --fast 2014-08-03 18:08:34 -04:00
Joey Hess
1ee24a0366 testremote now tests with and without encryption 2014-08-01 17:52:40 -04:00
Joey Hess
20d7295386 improve testremote command, adding chunk size testing
And also a --size parameter to configure the basic object size.
2014-08-01 16:50:24 -04:00
Joey Hess
9720ee9e56 testremote: New command to test uploads/downloads to a remote.
This only performs some basic tests so far; no testing of chunking or
resuming. Also, the existing encryption type of the remote is used; it
would be good later to derive an encrypted and a non-encrypted version of
the remote and test them both.

This commit was sponsored by Joseph Liu.
2014-08-01 15:10:01 -04:00
Joey Hess
c03e1c5648 add new section for testing commands 2014-08-01 12:49:26 -04:00
Joey Hess
522a0922b8 sync: Fix git sync with local git remotes even when they don't have an annex.uuid set.
Catch an exception when ensureInitialized is run in a non-initted
repository. In this case, just read the git config, so that the Git.Repo
object is not LocalUnknown, which is what is used to represent remotes
on eg, drives that are not connected.

The assistant already got this right, and like with the assistant, this
causes an implicit git-annex init of the local remote on the second sync,
once the git-annex branch has been pushed to it.
See this comment for more analysis:
http://git-annex.branchable.com/todo/Recovering_from_a_bad_sync/#comment-64e469a2c1969829ee149cbb41b1c138

This commit was sponsored by jscit.
2014-07-15 14:27:43 -04:00
Joey Hess
618b1d2a38 improve documentation 2014-07-14 14:37:14 -04:00
Joey Hess
cb66ca3a76 resolvemerge: New plumbing command that runs the automatic merge conflict resolver. 2014-07-11 16:45:18 -04:00
Joey Hess
f22a77890e improve documentation about sync 2014-07-11 14:08:40 -04:00
Joey Hess
d0c1a22e7c import metadata from feeds
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.

Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.

Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.

(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)

The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.

This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
2014-07-03 14:15:00 -04:00
Joey Hess
fd23e819c5
clarify what mtime field in find --format is 2014-06-30 19:03:23 -04:00
Fraser Tweedale
4eb72392b4 execute remote.<name>.annex-shell on remote, if set
It is useful to be able to specify an alternative git-annex-shell
program to execute on the remote, e.g., to run a version not on the
PATH.  Use remote.<name>.annex-shell if specified, instead of the
default "git-annex-shell" i.e., first so-named executable on the
PATH.
2014-05-16 15:46:43 -04:00
Joey Hess
00986d19f4 group: When no groups are specified to set, lists the current groups of a repository. 2014-05-16 14:43:40 -04:00
Robie Basak
4184566627 ddar special remote 2014-05-15 16:32:44 -04:00
Joey Hess
ecc3dc8433 findref: New command, like find but shows files in a specified git ref. 2014-04-17 18:41:24 -04:00
Joey Hess
915d038bec reinit: New command that can initialize a new reposotory using the configuration of a previously known repository. Useful if a repository got deleted and you want to clone it back the way it was. 2014-04-15 20:13:35 -04:00
Joey Hess
03ce5cd8d2
found a way to make uninit always fast
To do so, I slightly changed the behavior of unannex. Now in fast mode, it
only makes a hard link when the annexed file's link count is 1. This avoids
unannexing 2 files with the same content in fast mode from hard linking
them together. (One will end up hard linked to the annex, which the docs
warn about.)

With that change, uninit can simply always run unannex in fast mode. Since
.git/annex/objects is being blown away anyway, there's no worry in this
case about a hard link pointing into it causing an annexed object to be
modified.
2014-04-15 14:23:08 -04:00
Joey Hess
49005d5c28 document uninit --fast and also why it's not the default 2014-04-15 14:12:42 -04:00
Joey Hess
b82582caf1 improve desc 2014-04-14 14:14:54 -04:00
Joey Hess
836f2cc816 importfeed: Filename template can now contain an itempubdate variable. Needs feed 0.3.9.2. 2014-04-07 16:55:04 -04:00
Joey Hess
43909723b3 added git-annex remotedaemon
So far, handling connecting to git-annex-shell notifychanges, and
pulling immediately when a change is pushed to a remote.

A little bit buggy (crashes after the first pull), but it already works!

This commit was sponsored by Mark Sheppard.
2014-04-06 19:10:23 -04:00
Joey Hess
18970377df typo 2014-03-31 20:15:01 -04:00
Joey Hess
dcf4136340
improve metadata documentation 2014-03-26 16:55:29 -04:00
Joey Hess
2f538dd65c add --include-dotfiles: New option, perhaps useful for backups. 2014-03-26 14:52:07 -04:00
Joey Hess
fb8a32cc7f notifications on drop 2014-03-22 15:01:48 -04:00
Joey Hess
e426fac273 add desktop notifications
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.

This commit was sponsored by THM Schoemaker.
2014-03-22 14:12:19 -04:00
Joey Hess
8bcd67b9d8 metadata: Add --get (from bremner) 2014-03-15 17:29:40 -04:00