Commit graph

417 commits

Author SHA1 Message Date
Joey Hess
2cecc8d2a3
Added GIT_ANNEX_VECTOR_CLOCK environment variable
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.

Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).

There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.

A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)

So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.

Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 14:19:58 -04:00
Joey Hess
bcf276655c
Keys marked as dead are now skipped by --all.
fsck already special-cased dead keys to make --all not report errors with
them, and it makes sense to also expand that to whereis. I think it makes
sense for dead keys to be skipped by all uses of --all, so mistakes can be
completely forgotten about and not come back to haunt us.

The speed impact of testing if the key is dead is negligible for fsck and
whereis, since they use the location log anyway and it gets cached.
This does slow down a few commands that support --all, in particular
metadata --all runs around 2x as slow. I don't think metadata
--all is often used though. It might slow down copy/move/mirror
--all and get --all.
log --all is not affected (does not use the normal --all machinery).

Dead keys will still be processed by --incomplete, --branch,
--failed, and --key. Although it would be unlikely for a dead key to
ave in incomplete or failed transfer. It seems to make perfect sense for
--branch to process keys on the branch, even if dead.

(fsck's special-casing of dead keys was left in, so if one of these options
causes a dead key to be fscked, there will be a nice message.)

This commit was supported by the NSF-funded DataLad project.
2017-05-09 12:55:21 -04:00
Joey Hess
c3970f6c1a
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting.
This commit was supported by the NSF-funded DataLad project.
2017-03-30 19:35:30 -04:00
Joey Hess
1c4e5f65fc
Drop support for building with old versions of directory, feed, and http-types. 2017-03-10 15:57:41 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
27eca014be
fix up Read instance incompatability caused by recent commit
9c4650358c changed the Read instance for
Key.

I've checked all uses of that instance (by removing it and seeing what
breaks), and they're all limited to the webapp, except one.
That is GitAnnexDistribution's Read instance.

So, 9c4650358c would have broken upgrades
of git-annex from downloads.kitenet.net. Once the .info files there got
updated for a new release, old releases would have failed to parse them
and never upgraded.

To fix this, I found a way to make the .info files that contain
GitAnnexDistribution values be readable by the old version of git-annex.

This commit was sponsored by Ewen McNeill.
2017-02-24 18:59:12 -04:00
Joey Hess
ed56dba868
annex.autocommit can be configured via git-annex config
... to control the default behavior in all clones of a repository.

This includes a new Configurable data type, so the GitConfig type indicates
which values can be configured this way.

The implementation should be quite efficient; the config log is only read
once, and only when a Configurable value has not already been set by
git-config.

Indeed, it would be nice in the future to extend this, so that git-config
is itself only read on demand. Some commands may not need to look at the
git configuration at all.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-02-03 13:58:53 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
5676d267b5
forgot to add this new source file 2017-01-30 17:36:45 -04:00
Joey Hess
26d23e38f1
vicfg: Include the numcopies configuation.
Docs say vicfg can configure everything from git-annex branch,
so it ought to configure numcopies.

Note that commenting out existing numcopies does not unset it.

This commit was sponsored by Thom May on Patreon.
2017-01-30 15:27:25 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
7cae6c746c
Optimised git-annex branch log file timestamp parsing. 10% speedup
This sped up git annex find --not --in web from 6.64s to 5.69s.
The optimised parser is probably more like 50% faster than the general one
it replaced.
2016-09-29 14:04:53 -04:00
Joey Hess
c9082cf0e4
move Arbitrary instance to new Types.Transfer module
Avoid orphan instance warning
2016-09-05 14:52:06 -04:00
Joey Hess
d17f08afdc
avoid warning about orphan Arbirary instance 2016-09-05 14:51:07 -04:00
Joey Hess
1a0e2c9901
get, move, copy, mirror: Added --failed switch which retries failed copies/moves
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.

Noisy due to some refactoring into Types/
2016-08-03 12:37:12 -04:00
Joey Hess
c4d011bf3e
log: Added --all option. 2016-07-17 15:15:08 -04:00
Joey Hess
176cd98293
remove \r from Arbitrary for log tests 2016-05-27 12:04:49 -04:00
Joey Hess
eba68572dc
Split lines in the git-annex branch on \r as well as \n, to deal with \r\n terminated lines written by some versions of git-annex on Windows.
This fixes strange displays in some cases, including whereis showing
many duplicate locations, and showing more total copies than actually
exist.

It's unknown if that lead to data loss when eg, dropping. At the moment,
it seems unlikely it could, since the UUID with \r's appended is not the
same as a UUID without, and so no remote matches it.

It's also unknown if \r's can leak in on windows, perhaps when merging the
git-annex branch.
2016-05-27 11:45:13 -04:00
Joey Hess
823c28d2dc
nub transitionList to avoid ugly message after repeated transitions, and avoid redundant work for repeated ForgetDeadRemotes transitions 2016-05-18 12:26:38 -04:00
Joey Hess
8ab27235ea
reinject: Added new mode which can reinject known files into the annex.
For example: git-annex reinject --known /mnt/backup/*
2016-04-22 13:49:32 -04:00
Joey Hess
403b56fb91
Limit annex.largefiles parsing to the subset of preferred content expressions that make sense in its context.
So, not "standard" or "lackingcopies", etc.
2016-02-03 15:04:42 -04:00
Joey Hess
cdf5977053
simplify 2016-02-03 13:23:34 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
1f3358512a
refactor 2016-01-19 15:55:32 -04:00
Joey Hess
983c1894eb
avoid unnecessary reading of git-annex branch data when matching on annex.largefiles
This makes git annex clean not look at the git-annex branch at all,
and so speeds it up by 50% or more.
2015-12-04 15:06:41 -04:00
Joey Hess
b0626230b7
fix use of hifalutin terminology 2015-11-16 14:37:31 -04:00
Joey Hess
aaf1ef268d
convert from Utility.LockPool to Annex.LockPool everywhere 2015-11-12 18:13:37 -04:00
Joey Hess
f9adb905fc
Avoid unncessary write to the location log when a file is unlocked and then added back with unchanged content.
Implemented with no additional overhead of compares etc.

This is safe to do for presence logs because of their locality of change;
a given repo's presence logs are only ever changed in that repo, or in a
repo that has just been actively changing the content of that repo.

So, we don't need to worry about a split-brain situation where there'd
be disagreement about the location of a key in a repo. And so, it's ok to
not update the timestamp when that's the only change that would be made
due to logging presence info.
2015-10-12 14:46:47 -04:00
Joey Hess
6fbabfcf16 oops, didn't mean to commit this debug 2015-10-06 17:28:20 -04:00
Joey Hess
ba7ecf68c0 analysis 2015-10-06 17:11:52 -04:00
Joey Hess
16947ef654 Fix bug in combination of preferred and required content settings. When one was set to the empty string and the other set to some expression, this bug caused all files to be wanted, instead of only files matching the expression.
Avoid: MAny `MOr` otherexpression
Which matches anything.
2015-09-15 12:50:14 -04:00
Joey Hess
6e829939e9 add test case that all standard group preferred content expressions parse 2015-06-17 13:44:19 -04:00
Joey Hess
5c960601aa 4 ns optimisation of repeated calls to hasDifference on the same Differences
I want this as fast as possible, so it can be added to code paths without
slowing them down.

Avoid the set lookup, and rely on laziness,
drops runtime from 14.37 ns to 11.03 ns according to this criterion benchmark:

import Criterion.Main
import qualified Types.Difference as New
import qualified Types.DifferenceOld as Old

main :: IO ()
main = defaultMain
	[ bgroup "hasDifference"
		[ bench "new" $ whnf (New.hasDifference New.OneLevelObjectHash) new
		, bench "old" $ whnf (Old.hasDifference Old.OneLevelObjectHash) old
		]
	]
  where
	s = "fromList [ObjectHashLower, OneLevelObjectHash, OneLevelBranchHash]"
	new = New.readDifferences s
	old = Old.readDifferences s

A little bit of added boilerplate, but I suppose it's worth it to not
need to worry about set lookup overhead. Note that adding more differences
would slow down the old implementation; the new implementation will run
the same speed.
2015-06-11 16:34:35 -04:00
Joey Hess
f8ab3bc449 dead --key: Can be used to mark a key as dead. 2015-06-09 14:52:05 -04:00
Joey Hess
6eefc5db65 fsck: Ignore keys that are known to be dead when running in --all mode or a in a bare repo. Otherwise, still reports files with lost contents, even if the content is dead. 2015-06-09 14:08:57 -04:00
Joey Hess
53ede1a10e parse X in location log file as indicating a dead key
A dead key is both not present at the location that thinks it has a copy,
and also is assumed to probably not be present anywhere else. Although
there may be lurking disconnected repos that somehow still have a copy.

Suprisingly few changes needed for this! This is because the presence log
code only really concerns itself with keys that are present, and dead keys
are not present.

Note that both the location and web log can be parsed as having a dead key.
I don't see any value to having keys listed as dead in the web log, but
since it doesn't change any behavior, there was no point in not parsing it.
2015-06-09 13:28:30 -04:00
Joey Hess
6383d22ffa remove back-compat code for old version of containers
Already b-d on a newer version.
2015-06-06 15:23:53 -04:00
Joey Hess
87f28bb2ea ignore failure to clean up stale transfer lock file
Perhaps due to permissions problem, or perhaps a race with another process
also cleaning up.
2015-05-19 23:46:42 -04:00
Joey Hess
9de5cd2966 fix crash in stale transfer lockfile cleanup code
Need to differentiate between the lockfile not being locked, and it not
existing.
2015-05-19 23:35:24 -04:00
Joey Hess
ecb0d5c087 use lock pools throughout git-annex
The one exception is in Utility.Daemon. As long as a process only
daemonizes once, which seems reasonable, and as long as it avoids calling
checkDaemon once it's already running as a daemon, the fcntl locking
gotchas won't be a problem there.

Annex.LockFile has it's own separate lock pool layer, which has been
renamed to LockCache. This is a persistent cache of locks that persist
until closed.

This is not quite done; lockContent stil needs to be converted.
2015-05-19 14:09:52 -04:00
Joey Hess
6915b71c57 lock pools to work around non-concurrency/composition safety of POSIX fcntl 2015-05-18 15:57:17 -04:00
Joey Hess
7ebf234616 Stale transfer lock and info files will be cleaned up automatically when get/unused/info commands are run.
Deleting lock files is tricky, tricky stuff. I think I got it right!
2015-05-12 20:11:23 -04:00
Joey Hess
643b233860 an optimization that also fixes a reversion
This is a little optimisation; avoid loading the info file for the
download of the current key when checking for other downloads.

The reversion it fixes is sorta strange.
a812d598ef broke checking for transfers
that were already in progress. Indeed, the transfer lock was not held
after getTransfers was called.

Why? I think it's magic in ghc's handling of getLock and setLock,
although it's hard to tell since those functions are almost entirely
undocumented as to their semantics.

Something, either the RTS (or maybe it's linux?) notices that the
same process has taken a lock and is now calling getLock on a FD attached
to the same file. So, it drops the lock.

So, this optimisation avoids that problematic behavior.
2015-05-12 18:34:49 -04:00
Joey Hess
a812d598ef Take space that will be used by running downloads into account when checking annex.diskreserve. 2015-05-12 15:20:22 -04:00
Joey Hess
03667a162a couple of AMP warnings I missed before 2015-05-10 16:51:03 -04:00
Joey Hess
ec267aa1ea rejigger imports for clean build with ghc 7.10's AMP changes
The explict import Prelude after import Control.Applicative is a trick
to avoid a warning.
2015-05-10 16:20:30 -04:00
Joey Hess
6c2d5b5e41 more time-1.5 fixes 2015-05-10 15:36:58 -04:00
Joey Hess
33a2264546 fix build warning with time 1.5 2015-05-10 15:28:23 -04:00
Joey Hess
a5a53ca011 forgot to add new module 2015-05-10 15:23:38 -04:00
Joey Hess
6cf62a9bde support time-1.5.0
This no longer uses old-locale's defaultTimeLocale, but provides one
of its own.

Factored out a Logs.TimeStamp.
2015-05-10 15:21:35 -04:00
Joey Hess
06211738c1 Fix activity log parsing.
I had some cargo culting in there that used the wrong type, so it failed
to parse old logs, and overwrote them with the new log.
2015-04-09 21:02:38 -04:00
Joey Hess
9445556c97 rethought distributed fsck; instead add activity.log and expire command
This is much more space efficient!
2015-04-05 12:50:02 -04:00
Joey Hess
656fc1c881 fsck: Added --distributed and --expire options, for distributed fsck. 2015-04-01 17:53:16 -04:00
Joey Hess
9e25cbde20 importfeed: Avoid downloading a redundant item from a feed whose guid has been downloaded before, even when the url has changed.
To support this, always store itemid in metadata; before this was only done
when annex.genmetadata was set.
2015-03-31 13:30:13 -04:00
Joey Hess
f0195b2a43 Fix GETURLS in external special remote protocol to strip downloader prefix from logged url info before checking for the specified prefix.
This doesn't change what GETURLS returns, but only whether it matches
any prefix that the external special remote asked for.
2015-03-27 18:49:03 -04:00
Joey Hess
6045406deb Added SETURIPRESENT and SETURIMISSING to external special remote protocol
Useful for things like ipfs that don't use regular urls.

An external special remote can add a regular url to a key, and then
git-annex get will download it from the web. But for ipfs, we want to
instead tell git-annex that the uri uses OtherDownloader. Before this
change, the external special remote protocol lacked a way to do that.
2015-03-05 13:50:15 -04:00
Joey Hess
b0575c621f implement annex.tune.branchhash1
I hope this doesn't impact speed much -- it does have to pull out a value
from Annex state every time it accesses the branch now.

The test case I dropped has never caught any problems that I can remember,
and would have been rather difficult to convert.
2015-01-28 17:17:26 -04:00
Joey Hess
e8c376e0ad import Data.Default in Common 2015-01-28 16:11:28 -04:00
Joey Hess
037d86e046 refactor 2015-01-28 13:56:38 -04:00
Joey Hess
ba3825441c rework Differences data type
Eliminated complexity and future proofed. The most important change is that
all functions over Difference are now total; any Difference that can be
expressed should be handled. Avoids needs for sanity checking of inputs,
and version skew with the future.

Also, the difference.log now serializes a [Difference], not a Differences.
This saves space and keeps it simpler.

Note that [Difference] might contain conflicting differences (eg,
[Version5, Version6]. In this case, one of them needs to consistently win
over the others, probably based on Ord.
2015-01-28 13:50:02 -04:00
Joey Hess
70736d2b41 Repository tuning parameters can now be passed when initializing a repository for the first time.
* init: Repository tuning parameters can now be passed when initializing a
  repository for the first time. For details, see
  http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
  that has been tuned in incompatable ways.
2015-01-27 17:38:06 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
3bab5dfb1d revert parentDir change
Reverts 965e106f24

Unfortunately, this caused breakage on Windows, and possibly elsewhere,
because parentDir and takeDirectory do not behave the same when there is a
trailing directory separator.
2015-01-09 13:11:56 -04:00
Joey Hess
965e106f24 made parentDir return a Maybe FilePath; removed most uses of it
parentDir is less safe than takeDirectory, especially when working
with relative FilePaths. It's really only useful in loops that
want to terminate at /

This commit was sponsored by Audric SCHILTKNECHT.
2015-01-06 18:55:56 -04:00
Joey Hess
43dc7f678f setpresentkey: A new plumbing-level command. 2014-12-29 15:16:40 -04:00
Joey Hess
589a048a7d fix addurl behavior when location and url logs are inconsistent
The url log could have an url for a key, while the location log thinks it's
not present in the web. In this case, addurl --file url would not do
anything. Fixed it to re-add the web as a location.

I don't know how this situation could arise, but I saw it in the wild in
the conference_proceedings repo, affecting key
URL-s17806003--http://mirror.linux.org.au/pub/linux.conf.au/2014/Wednesday/53-Building_Effective_Alliances_around_the_Trans-Pacific_Partnershi-c0505b631127ccc67e38e637344d988e
Investigating the presence log, it looked like that key
was originally listed as present in the web, then in commit
56abf9e9f3e691ed9d83513037d4019313321ca3 someone else's git-annex
set it and some other things to not present in the web. It would be
interesting to know what that user did, but I doubt I'll be able to find
out. All I can tell from this investigation is that the inconsistency was
not introduced when originally addurl-ing the url.
2014-12-29 14:22:47 -04:00
Joey Hess
7e422269a6 move dummy uuids to Annex.UUID 2014-12-17 13:57:52 -04:00
Joey Hess
a7690de016 Added bittorrent special remote
addurl behavior change: When downloading an url ending in .torrent,
it will download files from bittorrent, instead of the old behavior
of adding the torrent file to the repository.

Added Recommends on aria2 and bittornado | bittorrent.

This commit was sponsored by Asbjørn Sloth Tønnesen.
2014-12-16 23:22:46 -04:00
Joey Hess
30bf112185 Urls can now be claimed by remotes. This will allow creating, for example, a external special remote that handles magnet: and *.torrent urls. 2014-12-08 19:15:07 -04:00
Joey Hess
cb6e16947d add stub claimUrl 2014-12-08 13:40:15 -04:00
Joey Hess
8093008ef4 External special remote protocol now includes commands for setting and getting the urls associated with a key. 2014-12-08 13:32:46 -04:00
Joey Hess
db9121ecee vicfg: Deleting configurations now resets to the default, where before it has no effect.
Added a Default instance for TrustLevel, and was able to use that to clear
up several other parts of the code too.

This commit was sponsored by Stephan Schulz
2014-10-14 14:15:07 -04:00
Joey Hess
9fd95d9025 indent with tabs not spaces
Found these with:
git grep "^  " $(find -type  f -name \*.hs) |grep -v ':  where'

Unfortunately there is some inline hamlet that cannot use tabs for
indentation.

Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
2014-10-09 15:09:26 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
b874f84086 New annex.hardlink setting. Closes: #758593
* New annex.hardlink setting. Closes: #758593
* init: Automatically detect when a repository was cloned with --shared,
  and set annex.hardlink=true, as well as marking the repository as
  untrusted.

Had to reorganize Logs.Trust a bit to avoid a cycle between it and
Annex.Init.
2014-09-05 13:44:09 -04:00
Joey Hess
59eae904b1 final scary locking refactoring (for now)
Note that while before checkTransfer this called getLock with WriteLock,
getLockStatus's use of ReadLock will also notice any exclusive locks.
Since transfer info files are only locked exclusively, never shared,
there is no behavior change.

Also, fixes checkLocked to actually return Just False when the file
exists, but is not locked.
2014-08-20 19:30:40 -04:00
Joey Hess
ec7dd0446a more lock file refactoring 2014-08-20 17:03:04 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
f4f82e2741 deriving Show 2014-08-01 16:30:33 -04:00
Joey Hess
80cc554c82 add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later)
If a newer git-annex starts logging something else in the chunk log, it
won't be used by this version, but it will be preserved when updating the
log.
2014-07-28 13:19:08 -04:00
Joey Hess
014794f4ed improve a bit 2014-07-24 17:18:14 -04:00
Joey Hess
e2c44bf656 implement chunk logs
Slightly tricky as they are not normal UUIDBased logs, but are instead maps
from (uuid, chunksize) to chunkcount.

This commit was sponsored by Frank Thomas.
2014-07-24 16:23:36 -04:00
Joey Hess
d0c1a22e7c import metadata from feeds
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.

Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.

Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.

(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)

The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.

This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
2014-07-03 14:15:00 -04:00
Joey Hess
c00b459f3a unused: Avoid checking view branches for unused files.
This avoids a potential slowdown when using lots of views.

I think that it makes sense for unused to ignore (local) view branches,
since these are by definition supposed to be views of an existing branch,
so looking at the branch should be sufficient (and if the view is out of
date and has files that have since been deleted from the branch, the user's
intent is not to preserve those from unused reaping).
2014-06-04 14:03:41 -04:00
Joey Hess
9eaabf0382 webapp: avoid overwriting remote configs when enabling it
Avoid stomping on existing group and preferred content settings
when enabling or combining with an already existing remote.

Two level fix. First, use defaultStandardGroup rather than
setStandardGroup, so if there is an existing configuration in the git-annex
branch, it's not overwritten.

To handle pre-existing ssh remotes (including gcrypt), a second level is
needed, because before syncing with the remote, it's configuration won't be
available locally. (And syncing could take a long time.) So, in this case,
keep track of whether the remote is being created or enabled, and only set
configs when creating it.

This commit was sponsored by Anders Lannerback.
2014-05-30 14:03:04 -04:00
Joey Hess
065248f3d2 Added required content configuration.
This includes checking when dropping files that any required content
configuration is satisfied. However, it does not yet include an active
check on the required content; the location log is trusted when checking
the required content expression.
2014-03-29 16:03:33 -04:00
Joey Hess
fe19e15040 reorg matcher types; no non-type code changes 2014-03-29 14:43:34 -04:00
Joey Hess
e426fac273 add desktop notifications
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.

This commit was sponsored by THM Schoemaker.
2014-03-22 14:12:19 -04:00
Joey Hess
ed30b81e2c Improve behavior when unable to parse a preferred content expression (thanks, ion).
Fall back to "present" as the preferred conent expression, which will
not result in any content movement.
2014-03-20 00:10:12 -04:00
Joey Hess
f64c2d6138 toplevel lastchanged field 2014-03-19 19:10:55 -04:00
Joey Hess
6848f09a12
better timestamp format 2014-03-18 19:01:50 -04:00
Joey Hess
caa97d1271 Each for each metadata field, there's now an automatically maintained "$field-lastchanged" that gives the timestamp of the last change to that field.
Note that this is a nearly entirely free feature. The data was already
stored in the metadata log in an easily accessible way, and already was
parsed to a time when parsing the log. The generation of the metadata
fields may even be done lazily, although probably not entirely (the map
has to be evaulated to when queried).
2014-03-18 18:55:43 -04:00
Joey Hess
6a4dd42328 finish wiring up groupwanted 2014-03-15 17:08:55 -04:00
Joey Hess
417aea25be vicfg: Allows editing preferred content expressions for groups.
This is stored in the git-annex branch, but not yet actually hooked up and
used.
2014-03-15 16:17:01 -04:00
Joey Hess
431d805a96 factored out a generic MapLog from uuid-based logs
UUIDBased is just a MapLog with a UUID for the field.
2014-03-15 13:45:25 -04:00
Joey Hess
b7eb1d834a Avoid encoding errors when using the unused log file. 2014-03-15 11:57:27 -04:00
Joey Hess
3551d40b05 "standard" can now be used as a first-class keyword in preferred content expressions.
For example "standard or (include=otherdir/*)" or even "not standard"

Note that the implementation avoids any potential for loops (if a
standard preferred content expression itself mentioned standard).

This commit was sponsored by Jochen Bartl.
2014-03-14 15:04:33 -04:00
Joey Hess
67f09bca6d fully fix fsck memory use by iterative fscking
Not very well tested, but I'm sure it doesn't eg, loop forever.
2014-03-12 15:18:43 -04:00
Joey Hess
9f27339e80 remove uninofrmative warning
dateUnusedLog is only used to show a timestamp in the webapp, so
not worth a warning
2014-03-12 12:42:51 -04:00
Joey Hess
c2e8c21ca6 view, vfilter: Add support for filtering tags and values out of a view, using !tag and field!=value.
Note that negated globs are not supported. Would have complicated the code
to add them, without changing the data type serialization in a
non-backwards-compatable way.

This commit was sponsored by Denver Gingerich.
2014-03-02 14:53:19 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
8d5158fa31 Preserve metadata when staging a new version of an annexed file.
Performance impact: When adding a large tree of new files, this needs
to do some git cat-file queries to check if any of the files already
existed and might need a metadata copy. I tried a benchmark in a copy
of my sound repository (so there was already a significant git tree
to check against.

Adding 10000 small files, with a cold cache:
  before: 1m48.539s
  after:  1m52.791s

So, impact is 0.0004 seconds per file added. Which seems acceptable, so did
not add some kind of configuration to enable/disable this.

This commit was sponsored by Lisa Feilen.
2014-02-24 14:41:33 -04:00
Joey Hess
7498c5dd96 annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files 2014-02-23 00:08:29 -04:00
Joey Hess
bdfc8e1f44 fix build with old version of Data.Set that lacks toDescList 2014-02-21 11:30:31 -04:00
Joey Hess
cfed7f6a5d remove special case for tags in view branch names
Just having "_" for tags=* turned out to be too hard to understand.

Note that this invalidaes all current views.
2014-02-19 17:38:45 -04:00
Joey Hess
c85a482136 improve view branch name when there are a list of values 2014-02-19 16:35:00 -04:00
Joey Hess
dd7b99c860 add tip about metadata driven views (and more flexible view filtering)
While writing this documentation, I realized that there needed to be a way
to stay in a view like tag=* while adding a filter like tag=work that
applies to the same field.

So, there are really two ways a view can be refined. It can have a new
"field=explicitvalue" filter added to it, which does not change the
"shape" of the view, but narrows the files it shows.
Or, it can have a new view added, which adds another level of
subdirectories.

So, added a vfilter command, which takes explicit values to add to the
filter, and rejects changes that would change the shape of the view.

And, made vadd only accept changes that change the shape of the view.

And, changed the View data type slightly; now components that can match
multiple metadata values can be visible, or not visible.

This commit was sponsored by Stelian Iancu.
2014-02-19 16:29:56 -04:00
Joey Hess
39ebfa1a2e pre-commit: Update metadata when committing changes to annexed files within a view.
So the user can now switch to a view and then move files around within it
to manage metadata. For example, moving a file into a new directory
when in the tags=* view adds a tag to it.

Implementation is fairly efficient. One diff-index, which is no more
expensive than the first stage of a git commit, followed by possibly
some cat-file --batch traffic to find the key (when deleting a file).
Very similar to what's done in direct mode when committing. And like
direct mode when updating the WC after a merge, it has to buffer the
diff-tree values in order to make 2 passes over them.

When not in a view, pre-commit now does one extra git symbolic-ref,
which is tiny overhead.

This commit was sponsored by Andrew Eskridge.
2014-02-19 14:17:58 -04:00
Joey Hess
02259d2a55 speed up currentView when not in a view
Avoid reading the view log when the branch is clearly not a view branch.
2014-02-19 12:52:47 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
2bf338f443 fixed vpop 2014-02-18 21:09:25 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
a18eae9a0f
nice git ack space optimisation when setting the same metadata value for multiple files 2014-02-13 01:57:43 -04:00
Joey Hess
361aee0470
avoid churning in git to no benefit when optimising metadata log
I think this is now optimal.
2014-02-12 23:24:04 -04:00
Joey Hess
8076530284 improve simplifier 2014-02-12 22:50:41 -04:00
Joey Hess
a05ac13e92 fix metadata log simplifier and additional quickcheck tests 2014-02-12 22:27:55 -04:00
Joey Hess
9f7e76130e add metadata command to get/set metadata
Adds metadata log, and command.

Note that unsetting field values seems to currently be broken.
And in general this has had all of 2 minutes worth of testing.

This commit was sponsored by Julien Lefrique.
2014-02-12 21:30:33 -04:00
Joey Hess
c390e896d1 fix windows build (and make --stop work on windows, incidentially)
The Utility.PID will clean up other code soon.
2014-02-11 15:25:59 -04:00
Joey Hess
4f7e72b51a fix parsing of unused log; keys can contain spaces 2014-02-08 15:27:11 -04:00
Joey Hess
a44e01c29c --in can now refer to files that were located in a repository at some past date. For example, --in="here@{yesterday}" 2014-02-06 12:43:56 -04:00
Joey Hess
1572c460e8 avoid using openFile when withFile can be used
Potentially fixes some FD leak if an action on an opened file handle fails
for some reason. There have been some hard to reproduce reports of
git-annex leaking FDs, and this may solve them.
2014-02-03 10:19:06 -04:00
Joey Hess
32f1f68dc9 typo 2014-01-28 17:17:21 -04:00
Joey Hess
f0dfac4d96 fix build with old ghc that used old-time type 2014-01-28 17:14:43 -04:00
Joey Hess
eefda291c6 fix warning 2014-01-28 14:43:20 -04:00
Joey Hess
891c85cd88 use locking on Windows
This is all the easy cases, where there was already a separate lock file.
2014-01-28 14:42:03 -04:00
Joey Hess
3518c586cf fix transfers of key with no associated file
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.

As part of this, preferred content checks pass the Key around.

Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.

I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
2014-01-23 16:44:02 -04:00
Joey Hess
e0bd088f08 add webapp UI to manage unused files 2014-01-23 15:09:43 -04:00
Joey Hess
3da0064657 assistant unused file handling
Make sanity checker run git annex unused daily, and queue up transfers
of unused files to any remotes that will have them. The transfer retrying
code works for us here, so eg when a backup disk remote is plugged in,
any transfers to it are done. Once the unused files reach a remote,
they'll be removed locally as unwanted.

If the setup does not cause unused files to go to a remote, they'll pile
up, and the sanity checker detects this using some heuristics that are
pretty good -- 1000 unused files, or 10% of disk used by unused files,
or more disk wasted by unused files than is left free. Once it detects
this, it pops up an alert in the webapp, with a button to take action.

TODO: Webapp UI to configure this, and also the ability to launch an
immediate cleanup of all unused files.

This commit was sponsored by Simon Michael.
2014-01-22 22:53:18 -04:00
Joey Hess
4b55afe9e9 add "unused" preferred content expression
With a really nice optimisation that keeps it from having any overhead
in normal operation!

This commit was sponsored by Ulises Vitulli.
2014-01-22 16:35:32 -04:00
Joey Hess
ae3cd632bd add timestamps to unused log files
This will be used in expiring old unused objects. The timestamp is when it
was first noticed it was unused.

Backwards compatability: It supports reading old format unused log files.
The old version of git-annex will ignore lines in log files written by the
new version, so the worst interop problem would be git annex dropunused not
knowing some numbers that git-annex unused reported.
2014-01-22 15:33:02 -04:00
Joey Hess
f7cdc40f7b reorg 2014-01-21 18:08:56 -04:00
Joey Hess
0ef282a116 numcopies cleanup, part 2
This includes several bug fixes.
2014-01-21 17:25:39 -04:00
Joey Hess
b40df4f0d0 reorganize numcopies code (no behavior changes)
Move stuff into Logs.NumCopies. Add a NumCopies newtype.

Better names for various serialization classes that are specific to one
thing or another.
2014-01-21 16:08:59 -04:00
Joey Hess
d66535f065 global numcopies setting
* numcopies: New command, sets global numcopies value that is seen by all
  clones of a repository.
* The annex.numcopies git config setting is deprecated. Once the numcopies
  command is used to set the global number of copies, any annex.numcopies
  git configs will be ignored.
* assistant: Make the prefs page set the global numcopies.

This global numcopies setting is needed to let preferred content
expressions operate on numcopies.

It's also convenient, because typically if you want git-annex to preserve N
copies of files in a repo, you want it to do that no matter which repo it's
running in. Making it global avoids needing to warn the user about gotchas
involving inconsistent annex.numcopies settings.
(See changes to doc/numcopies.mdwn.)

Added a new variety of git-annex branch log file, that holds only 1 value.
Will probably be useful for other stuff later.

This commit was sponsored by Nicolas Pouillard.
2014-01-20 16:47:56 -04:00
Joey Hess
93161d0dea copyright year 2014-01-08 16:29:15 -04:00
Joey Hess
3e68c1c2fd add remote state logs
This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.

GETSTATE and SETSTATE are added to the external special remote protocol.

Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.

The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.

This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.

This commit was sponsored by Daniel Hofer.
2014-01-03 16:35:57 -04:00
Joey Hess
8e3032df2d added GETWANTED, SETWANTED for Tobias's flickr remote
This was unexpectedly difficult because of a depdenency cycle. To parse a
preferred content expression involves several things that need to operate
on the list of remotes. Which needs Remote.External. The only way to avoid
this cycle (I tried breaking it at several points) was to skip parsing the
expression in SETWANTED.

That's sorta ok, because git-annex already has to deal with unparsable
preferred content expressions being stored, in order to handle eg,
upgrades. But I'm still not very happy that I cannot check it.

I feel this is a strong indication that I need to beware of further
bloating the special remote protocol interface.
2014-01-01 20:12:20 -04:00
Joey Hess
f0a6de1ca2 add PreferredContentExpression type 2014-01-01 19:58:02 -04:00
Richard Hartmann
974fe009bf Another round of s/amoung/among/ 2013-12-19 12:30:53 -04:00
Joey Hess
f931272681 syntax 2013-12-11 00:18:58 -04:00
Joey Hess
011b8bc7ec pull in Win32-extras, to be able to get current process id in Windows
Fixed up a number of things that had worked around there not being a way to
get that.

Most notably, transfer info files on windows now include the process id,
since no locking is currently done. This means the file format varies
between windows and unix.
2013-12-11 00:15:10 -04:00
Joey Hess
ecd42aef8e different PID types for Unix and Windows
Windows has a larger (unsigned) PID space, so cannot use the unix CInt
there.

Note that TransferInfo does not yet ever get the TransferPid populated,
as there is missing locking.
2013-12-10 23:48:42 -04:00
Joey Hess
6edac746f0 merge improved fsck types from git-repair and some associated changes 2013-11-30 14:29:11 -04:00
Joey Hess
53ab737723 clean up cruft left in log by bug 2013-11-09 14:30:26 -04:00
Joey Hess
8e1b8af6e7 fix crash on empty description
Caused by bug fixed in 46cf00ffd8
2013-11-09 13:50:44 -04:00
Joey Hess
049e80e865 refactor 2013-10-28 14:05:55 -04:00
Joey Hess
d345e5b52f add git fsck to cronner, and UI for repository repair (not yet wired up) 2013-10-22 16:02:52 -04:00
Joey Hess
92d5452a19 write via temp file 2013-10-14 16:15:38 -04:00
Joey Hess
296e21b381 add schedule command
Mostly because it gives me an excuse and a hook to document the schedule
expression format.
2013-10-13 15:40:38 -04:00