Commit graph

129 commits

Author SHA1 Message Date
Joey Hess
8885bd3c5b
addistant: honor annex.delayadd for non-large files
assistant: When adding non-large files to git, honor annex.delayadd
configuration.

Also, don't add non-large files to git when they are still
being written to. This came for free, since the changes to non-large
files get queued up with the ones to large files, and run through the lsof
check.

Sponsored-by: Luke Shumaker on Patreon
2021-07-13 12:17:00 -04:00
Joey Hess
1b63132ca3
add searchPathContents
And rename related functions for consistency.
2021-02-02 19:06:15 -04:00
Joey Hess
5a1e73617d
finished this stage of the RawFilePath conversion
Finally compiles again, and test suite passes.

This commit was sponsored by Brock Spratlen on Patreon.
2020-11-04 14:20:37 -04:00
Joey Hess
fcf5d11c63
add "input" field to json output
The use case of this field is mostly to support -J combined with --json.
When that is implemented, a user will be able to look at the field to
determine which of the requests they have sent it corresponds to.

The field typically has a single value in its list, but in some cases
mutliple values (eg 2 command-line params) are combined together and the
list will have more.

Note that json parsing was already non-strict, so old git-annex metadata
--json --batch can be fed json produced by the new git-annex and will
not stumble over the new field.
2020-09-15 16:22:44 -04:00
Joey Hess
c31e1be781
convert KeySource to RawFilePath 2020-02-21 10:04:44 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
3266ad3ff7
everything is building again
However, the test suite fails some quickchecks, so this branch is not
yet in a mergeable state.
2019-12-05 15:10:23 -04:00
Joey Hess
0be23bae2f
refactor
Better to not have a single function module, and better to have a more
specific type than Bool.

This commit was sponsored by Jack Hill on Patreon
2019-11-11 19:10:52 -04:00
Joey Hess
3b34d123ed
Added annex.allowsign option.
This commit was sponsored by Ilya Shlyakhter on Patreon.
2019-11-11 16:28:56 -04:00
Joey Hess
3f0eef4baa
v7 for all repositories
* Default to v7 for new repositories.
* Automatically upgrade v5 repositories to v7.
2019-08-30 14:09:14 -04:00
Joey Hess
689d1fcc92
remove most remnants of direct mode
A few remain, as needed for upgrades, and for accessing objects from
remotes that are direct mode repos that have not been converted yet.
2019-08-26 16:27:48 -04:00
Joey Hess
8355dba5cc
plumb MeterUpdate into getKey
No behavior changes, but this shows everywhere that a progress meter
could be displayed when hashing a file to add to the annex.

Many of the places don't make sense to display a progress meter though,
eg when importing the copy of the file probably swamps the hashing of
the file.
2019-06-25 11:43:24 -04:00
Joey Hess
2d33122215
avoid ingest lockdown file escaping the withOtherTmp call
Fixes bug that caused git-annex to fail to add a file when another
git-annex process cleaned up the temp directory it was using.

Solution is just to push withOtherTmp out to a higher level, so that
the whole ingest process can be completed inside it.

But in the assistant, that was not practical to do, since withOtherTmp runs
in the Annex monad and the assistant does not. Worked around by introducing
a separate temp directory that only the assistant uses for lockdown.
Since only one assistant can run at a time, it's easy to clean up that
directory of old cruft at startup.
2019-05-07 13:04:57 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
4a6ebb1034
make sync update adjusted branch to hide/unhide
This completes initial support for --hide-missing, although the
assistant still needs to be updated and it perhaps needs to be sped up,
and maybe there needs to be a way for git-annex get to operate on
missing files. Opened some more todos for those things.

This commit was sponsored by Henrik Riomar.
2018-10-20 14:22:28 -04:00
Joey Hess
8be5a7269a
refactor getCurrentBranch
Both Command.Sync and Annex.Ingest had their own versions of this.

The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem
to need it. That is only checking to see if it's in an adjusted unlocked
branch, and when in an adjusted branch, the branch does in fact exist,
so the added check that Git.Branch.current does is fine.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-10-19 17:29:18 -04:00
Joey Hess
d71c65ca0a
add exporter thread to assistant
This is similar to the pusher thread, but a separate thread because git
pushes can be done in parallel with exports, and updating a big export
should not prevent other git pushes going out in the meantime.

The exportThread only runs at most every 30 seconds, since updating an
export is more expensive than pushing. This may need to be tuned.

Added a separate channel for export commits; the committer records a
commit in that channel.

Also, reconnectRemotes records a dummy commit, to make the exporter
thread wake up and make sure all exports are up-to-date. So,
connecting a drive with a directory special remote export will
immediately update it, and getting online will automatically
update S3 and WebDAV exports.

The transfer queue is not involved in exports. Instead, failed
exports are retried much like failed pushes.

This commit was sponsored by Ewen McNeill.
2017-09-20 15:29:13 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
f617988a29
Make import --deduplicate and --skip-duplicates only hash once, not twice
import: --deduplicate and --skip-duplicates were implemented inneficiently;
they unncessarily hashed each file twice. They have been improved to only
hash once.

The new approach is to lock down (minimally) and hash files, and then
reuse that information when importing them.

This was rather tricky, especially in detecting changes to files while
they are being imported.

The output of import changed slightly. While before it silently skipped
over files with eg --skip-duplicates, now it shows each file as it starts
to act on it. Since every file is hashed first thing, it would otherwise
not be clear what file import is chewing on. (Actually, it wasn't clear
before when any of the duplicates switches were used.)

This commit was sponsored by Alexander Thompson on Patreon.
2017-02-09 15:32:22 -04:00
Joey Hess
1a0e2c9901
get, move, copy, mirror: Added --failed switch which retries failed copies/moves
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.

Noisy due to some refactoring into Types/
2016-08-03 12:37:12 -04:00
Joey Hess
b7c8bf5274
Preserve execute bits of unlocked files in v6 mode.
When annex.thin is set, adding an object will add the execute bits to the
work tree file, and this does mean that the annex object file ends up
executable.

This doesn't add any complexity that wasn't already present, because git
annex add of an executable file has always ingested it so that the annex
object ends up executable.

But, since an annex object file can be executable or not, when populating
an unlocked file from one, the executable bit is always added or removed
to match the mode of the pointer file.
2016-04-14 14:47:08 -04:00
Joey Hess
7c20bf6e7a
make sync aware of adjusted branches
So, it will pull and push the original branch, not the adjusted one.

And, for merging, it will use updateAdjustedBranch (not implemented yet).

Note that remaining uses of Git.Branch.current need to be checked too;
for things that should act on the original branch, and not the adjusted
branch.
2016-02-29 15:23:08 -04:00
Joey Hess
4b819bee2b
avoid confusing git with a modified ctime in clean filter
Linking the file to the tmp dir was not necessary in the clean
filter, and it caused the ctime to change, which caused git to think
the file was changed. This caused git status to get slow as it kept
re-cleaning unchanged files.
2016-01-07 17:48:04 -04:00
Joey Hess
7c5c7bb04a
fix OSX build 2015-12-28 13:28:21 -04:00
Joey Hess
c4152654d2
combine PendingAddChanges for the same file into one
In v6 unlocked mode, this fixes a problem that was making eg,
echo > file cause the assistant to copy the file to the annex object,
instead of hard linking it. That because 2 change events were seen
(one for opening the file and one for closing) and processed together
the file was then locked down twice. Which meant it had mutiple hard links,
and so prevented linkAnnex from hard linking it.

There might be scenarios where multiple events come in, but staggered such
that a file gets locked down repeatedly, and it would still be copied to
the annex object in that case.
2015-12-22 17:52:39 -04:00
Joey Hess
cfaac52b88
populate unlocked files with newly available content when ingesting
This can happen when ingesting a new file in either locked or unlocked
mode, when some unlocked files in the repo use the same key, and the
content was not locally available before.
2015-12-22 16:22:28 -04:00
Joey Hess
4f60234690
finish v6 support for assistant
Seems to basically work now!
2015-12-22 15:23:27 -04:00
Joey Hess
8e9608d7f0
refactoring
no behavior changes
2015-12-22 13:42:58 -04:00
Joey Hess
ca2c977704
wip v6 support for assistant
Files are not yet added to v6 repos in unlocked mode.
2015-12-21 18:41:15 -04:00
Joey Hess
3311c48631
move InodeSentinal from direct mode code to its own module
Will be used outside of direct mode for v6 unlocked files, and is already
used outside of direct mode when adding files to annex.
2015-12-09 15:52:11 -04:00
Joey Hess
90f7c4b6a2
add VerifiedCopy data type
There should be no behavior changes in this commit, it just adds a more
expressive data type and adjusts code that had been passing around a [UUID]
or sometimes a Maybe Remote to instead use [VerifiedCopy].

Although, since some functions were taking two different [UUID] lists,
there's some potential for me to have gotten it horribly wrong.
2015-10-08 16:55:11 -04:00
Joey Hess
addc82dab7 removed all uses of undefined from code base
It's a code smell, can lead to hard to diagnose error messages.
2015-04-19 00:38:29 -04:00
Joey Hess
7b32e7acb5 make segmentXargs preserve order 2015-04-02 00:31:36 -04:00
Joey Hess
2b7f3ee3f2 assistant: Committing a whole lot of files at once could overflow command-line length limits and cause the commit to fail. This only happened when using the assistant in an indirect mode repository. 2015-03-26 14:02:35 -04:00
Joey Hess
32b3bed086 add a comment 2015-03-26 13:43:30 -04:00
Joey Hess
f5b830e07c sync, assistant: Include repository name in head branch commit message.
Note that while the assistant detects changes made to remote names, I left
the commit message fixed rather than calculating it after every commit. It
doesn't seem worth the CPU to do the latter.
2015-02-11 13:34:05 -04:00
Joey Hess
70736d2b41 Repository tuning parameters can now be passed when initializing a repository for the first time.
* init: Repository tuning parameters can now be passed when initializing a
  repository for the first time. For details, see
  http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
  that has been tuned in incompatable ways.
2015-01-27 17:38:06 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
d41849bc23
support commit.gpgsign
Support users who have set commit.gpgsign, by disabling gpg signatures for
git-annex branch commits and commits made by the assistant.

The thinking here is that a user sets commit.gpgsign intending the commits
that they manually initiate to be gpg signed. But not commits made in the
background, whether by a deamon or implicitly to the git-annex branch.
gpg signing those would be at best a waste of CPU and at worst would fail,
or flood the user with gpg passphrase prompts, or put their signature on
changes they did not directly do.

See Debian bug #753720.

Also makes all commits done by git-annex go through a few central control
points, to make such changes easier in future.

Also disables commit.gpgsign in the test suite.

This commit was sponsored by Antoine Boegli.
2014-07-04 11:53:51 -04:00
Joey Hess
501cc8623a assistant: Fix one-way assistant->assistant sync in direct mode.
When in direct mode, update the master branch after committing to the
annex/direct/master branch. Also, update the synced/master branch.

This fixes a topology A->B where both A and B are in direct mode and
running the assistant, and a change is made to B. Before this fix, A pulled
the changes from B, but since they were only on the annex/direct/master
branch, it did not merge them.

Note that I considered making the assistant merge the
remotes/B/annex/direct/master, but decided to keep it simple and only merge
the sync branches as before.
2014-06-16 11:32:13 -04:00
Joey Hess
e4d7e2ebde fix for Windows file timestamp timezone madness
On Windows, changing the time zone causes the apparent mtime of files to
change. This confuses git-annex, which natually thinks this means the files
have actually been modified (since THAT'S WHAT A MTIME IS FOR, BILL <sheesh>).

Work around this stupidity, by using the inode sentinal file to detect if
the timezone has changed, and calculate a TSDelta, which will be applied
when generating InodeCaches.

This should add no overhead at all on unix. Indeed, I sped up a few
things slightly in the refactoring.

Seems to basically work! But it has a big known problem:
If the timezone changes while the assistant (or a long-running command)
runs, it won't notice, since it only checks the inode cache once, and
so will use the old delta for all new inode caches it generates for new
files it's added. Which will result in them seeming changed the next time
it runs.

This commit was sponsored by Vincent Demeester.
2014-06-12 13:42:21 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
964a181026 try to drop unused object if it does not need to be transferred anywhere 2014-01-23 16:51:16 -04:00
Joey Hess
b7e3fe2ebd flip for clarity 2013-12-16 16:24:57 -04:00
Joey Hess
58c7b0a56d assistant: Always batch changes found in startup scan.
Batch detection is heuristic, so can sometimes fail. I observed one such
failure while starting up in a repository with 87000 files. After the first
several batches of ~5000 files, it fell out of batch mode, and never
re-entered it, and so made many more commits of a few files at a time
than necessary.

So, let's always use batch mode when in the startup scan. This avoids the
heuristic there, at least.

There is clearly also room to improve the heuristic. Possibly 10 files is
too high a bar to be found during a commit, on a system that can commit
quickly.
2013-12-16 16:16:19 -04:00
Joey Hess
9d323a98e2 avoid trying to use lsof when it's not in path and --forced 2013-12-04 17:39:44 -04:00