git-annex

Author	SHA1	Message	Date
Joey Hess	cd076cd085	Windows: Support urls like "file:///c:/path" That is a legal url, but parseUrl parses it to "/c:/path" which is not a valid path on Windows. So as a workaround, use parseURIPortable everywhere, which removes the leading slash when run on windows. Note that if an url is parsed like this and then serialized back to a string, it will be different from the input. Which could potentially be a problem, but is probably not in practice. An alternative way to do it would be to have an uriPathPortable that fixes up the path after parsing. But it would be harder to make sure that is used everywhere, since uriPath is also used when constructing an URI. It's also worth noting that System.FilePath.normalize "/c:/path" yields "c:/path". The reason I didn't use it is that it also may change "/" to "\" in the path and I wanted to keep the url changes minimal. Also noticed that convertToWindowsNativeNamespace handles "/c:/path" the same as "c:/path". Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-03-27 13:38:02 -04:00
Joey Hess	a0badc5069	sync: Fix parsing of gcrypt::rsync:// urls that use a relative path Such an url is not valid; parseURI will fail on it. But git-annex doesn't actually need to parse the url, because all it needs to do to support syncing with it is know that it's not a local path, and use git pull and push. (Note that there is no good reason for the user to use such an url. An absolute url is valid and I patched git-remote-gcrypt to support them years ago. Still, users gonna do anything that tools allow, and git-remote-gcrypt still supports them.) Sponsored-by: Jack Hill on Patreon	2023-03-23 15:20:00 -04:00
Joey Hess	e822df2a09	fix build warnings on windows	2023-03-21 18:41:23 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Yaroslav Halchenko	e018ae1125	Fix ambiguous typos	2023-03-17 15:14:47 -04:00
Joey Hess	a6bebe3c0f	make hashFile support paths with newlines git hash-object --stdin-paths is a newline protocol so it cannot support them. It would help to not use absPath, when the problem is that the repository itself is in a path with a newline. But, there's a reason it used absPath, which is that git hash-object --stdin-paths actually chdirs to the top of the repository on startup! That is not documented, and I think is a bug in git. I considered making the path relative to the top of the repo, but then what if this is a git bug and gets fixed? git-annex would break horribly. So instead, keep the absPath, but when the path contains a newline, fall back to running git hash-object once per file, which avoids the problem with newlines and --stdin-paths. It will be slower, but this is an edge case. (Similar slow code paths are already used elsewhere when dealing with filenames with newlines and other parts of git that use line-based protocols.) Sponsored-by: Dartmouth College's Datalad project	2023-03-13 13:43:40 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath-using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equivalents from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	f09e299156	rawfilepath conversion	2023-02-27 15:06:32 -04:00
Joey Hess	672258c8f4	Revert "revert recent bug fix temporarily for release" This reverts commit `16f1e24665`.	2023-02-14 14:11:23 -04:00
Joey Hess	16f1e24665	revert recent bug fix temporarily for release Decided this bug is not severe enough to delay the release until tomorrow, so this will be re-applied after the release.	2023-02-14 14:06:29 -04:00
Joey Hess	c1ef4a7481	Avoid Git.Config.updateLocation adding "/.git" to the end of the repo path to a bare repo when git config is not allowed to list the configs due to the CVE-2022-24765 fix. That resulted in a confusing error message, and prevented the nice message that explains how to mark the repo as safe to use. Made isBare a tristate so that the case where core.bare is not returned can be handled. The handling in updateLocation is to check if the directory contains config and objects and if so assume it's bare. Note that if that heuristic is somehow wrong, it would construct a repo that thinks it's bare but is not. That could cause follow-on problems, but since git-annex then checks checkRepoConfigInaccessible, and skips using the repo anyway, a wrong guess should not be a problem. Sponsored-by: Luke Shumaker on Patreon	2023-02-14 14:00:36 -04:00
Joey Hess	c1f4d536b2	fix comment	2023-02-14 13:28:02 -04:00
Joey Hess	49ee07f93d	fix flush of a closed file handle Avoids displaying a warning about git-annex restage needing to be run in situations where it does not. Closing a handle flushes it anyway, so no need for an explicit flush. The handle does get closed twice, but that's fine, the second one does nothing. Sponsored-by: Dartmouth College's DANDI project	2022-09-30 14:02:31 -04:00
Joey Hess	bfa451fc4e	pass --git-dir when reading git config when it was specified explicitly Let GIT_DIR and --git-dir override git's protection against operating in a repository owned by another user. This is the same behavior other git commands have. Sponsored-by: Jarkko Kniivilä on Patreon	2022-09-26 14:38:34 -04:00
Joey Hess	6a3bd283b8	add restage log When pointer files need to be restaged, they're first written to the log, and then when the restage operation runs, it reads the log. This way, if the git-annex process is interrupted before it can do the restaging, a later git-annex process can do it. Currently, this lets a git-annex get/drop command be interrupted and then re-run, and as long as it gets/drops additional files, it will clean up after the interrupted command. But more changes are needed to make it easier to restage after an interrupted process. Kept using the git queue to run the restage action, even though the list of files that it builds up for that action is not actually used by the action. This could perhaps be simplified to make restaging a cleanup action that gets registered, rather than using the git queue for it. But I wasn't sure if that would cause visible behavior changes: when eg dropping a large number of files, the git queue currently flushes periodically, and so it restages incrementally, rather than all at the end. In restagePointerFiles, it reads the restage log twice, once to get the number of files and size, and a second time to process it. This seemed better than reading the whole file into memory, since potentially a huge number of files could be in there. Probably the OS will cache the file in memory and there will not be much performance impact. It might be better to keep running tallies in another file though. But updating that atomically with the log seems hard. Also note that it's possible for calcRestageLog to see a different file than streamRestageLog does. More files may be added to the log in between. That is ok, it will only cause the filterprocessfaster heuristic to operate with slightly out of date information, so it may make the wrong choice for the files that got added and be a little slower than ideal. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:47:24 -04:00
Joey Hess	9c76e503cf	generalize refreshIndex to MonadIO Sponsored-by: Dartmouth College's DANDI project	2022-09-23 14:28:52 -04:00
Joey Hess	8d26fdd670	skip checkRepoConfigInaccessible when git directory specified explicitly Fix a reversion that prevented git-annex from working in a repository when --git-dir or GIT_DIR is specified to relocate the git directory to somewhere else. (Introduced in version 10.20220525) checkRepoConfigInaccessible could still run git config --list, just passing --git-dir. It seems not necessary, because I know that passing --git-dir bypasses git's check for repo ownership. I suppose it might be that git eventually changes to check something about the ownership of the working tree, so passing --git-dir without --work-tree would still be worth doing. But for now this is the simple fix. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-09-20 14:52:43 -04:00
Joey Hess	9621beabc4	cache credentials in memory when doing http basic auth to a git remote When accessing a git remote over http needs a git credential prompt for a password, cache it for the lifetime of the git-annex process, rather than repeatedly prompting. The git-lfs special remote already caches the credential when discovering the endpoint. And presumably commands like git pull do as well, since they may download multiple urls from a remote. The TMVar CredentialCache is read, so two concurrent calls to getBasicAuthFromCredential will both prompt for a credential. There would already be two concurrent password prompts in such a case, and existing uses of `prompt` probably avoid it. Anyway, it's no worse than before.	2022-09-09 14:20:32 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	fbc3c223a6	filter-process: Fix protocol for empty files This caused git to complain that filter-process failed and kill it with signal 15. Because it wrote an extra flushPkt for an empty file, which git did not expect, and so git saw an unexpected response to the next request. Luckily, filter-process is only used by default in v9 and up, and v8 is still the default. Also, git had to be updating an empty file, followed by another file, which is a fairly unlikely situation. And git restarts filter-process after this happens and uses it to filter the rest of the files. So this isn't a crippling bug. Sponsored-by: Luke Shumaker on Patreon	2022-07-13 17:13:54 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	dca6e96e31	debug output of git security check probe This is so, if there's some other failure that triggers it, --debug will show what went wrong. See https://github.com/datalad/datalad/issues/6708 Sponsored-by: Dartmouth College's Datalad project	2022-05-31 12:25:11 -04:00
Joey Hess	af0d854460	deal with git's changes for CVE-2022-24765 Deal with git's recent changes to fix CVE-2022-24765, which prevent using git in a repository owned by someone else. That makes git config --list not list the repo's configs, only global configs. So annex.uuid and annex.version are not visible to git-annex. It displayed a message about that, which is not right for this situation. Detect the situation and display a better message, similar to the one other git commands display. Also, git-annex init when run in that situation would overwrite annex.uuid with a new one, since it couldn't see the old one. Add a check to prevent it from running in this situation too. It may be that this fix has security implications, if a config set by the malicious user who owns the repo causes git or git-annex to run code. I don't think any git-annex configs get run by git-annex init. It may be that some git config of a command does get run by one of the git commands that git-annex init runs. ("git status" is the command that prompted CVE-2022-24765, since core.fsmonitor can cause it to run a command.) Since I don't know how to exploit this, I'm not treating it as a security fix for now. Note that passing --git-dir makes git bypass the security check. git-annex does pass --git-dir to most calls to git, which it does to avoid needing to chdir to the directory containing a git repository when accessing a remote. So, it's possible that somewhere in git-annex it gets as far as running git with --git-dir, and git reads some configs that are unsafe (what CVE-2022-24765 is about). This seems unlikely; it would have to be part of git-annex that runs in git repositories that have no (visible) annex.uuid, and git-annex init is the only one that I can think of that then goes on to run git, as discussed earlier. But I've not fully ruled out there being others.
The git developers seem mostly worried about "git status" or a similar command implicitly run by a shell prompt, not an explicit use of git in such a repository. For example, Ævar Arnfjörð Bjarma wrote: > * There are other bits of config that also point to executable things, > e.g. core.editor, aliases etc, but nothing has been found yet that > provides the "at a distance" effect that the core.fsmonitor vector > does. > > I.e. a user is unlikely to go to /tmp/some-crap/here and run "git > commit", but they (or their shell prompt) might run "git status", and > if you have a /tmp/.git ... Sponsored-by: Jarkko Kniivilä on Patreon	2022-05-20 14:38:27 -04:00
Joey Hess	0406c33f58	fix git-annex repair false positive Avoid treating refs/annex/last-index or other refs that are not commit objects as evidence of repository corruption. The repair code checks to find bad refs by trying to run `git log` on them, and assumes that no output means something is broken. But git log on a tree object is empty. This was worth fixing generally, not as a special case, since it's certainly possible that other things store tree or other objects in refs. Sponsored-by: Max Thoursie on Patreon	2022-05-04 11:32:23 -04:00
Joey Hess	faf84aa5c2	Avoid git status taking a long time after git-annex unlock of many files. Implemented by making Git.Queue have a FlushAction, which can accumulate along with another action on files, and runs only once the other action has run. This lets git-annex unlock queue up git update-index actions, without conflicting with the restagePointerFiles FlushActions. In a repository with filter-process enabled, git-annex unlock will often not take any more time than before, though it may when the files are large. Either way, it should always slow down less than git status speeds up. When filter-process is not enabled, git-annex unlock will slow down as much as git status speeds up. Sponsored-by: Jochen Bartl on Patreon	2022-02-18 15:06:40 -04:00
Joey Hess	a03e9107cb	wording	2021-12-14 13:53:36 -04:00
Joey Hess	681d8611be	fix flush order reversion commit `c2e46f4707` caused the queue to possibly be flushed in the wrong order when it contained a mix of different actions.	2021-12-14 13:51:00 -04:00
Joey Hess	c2e46f4707	improve git command queue flushing with time limit So that eg, addurl of several large files that take time to download will update the index for each file, rather than deferring the index updates to the end. In cases like an add of many smallish files, where a new file is being added every few seconds, the queue will still build up a lot of changes which are flushed at once, for best performance. Since the default queue size is 10240, often it only gets flushed once at the end, same as before. (Notice that updateQueue updated _lastchanged when adding a new item to the queue without flushing it; that is necessary to avoid it flushing the queue every 5 minutes in this case.) But, when it takes more than 5 minutes to add a file, the overhead of updating the index immediately is probably small, so do it after each file. This avoids git-annex potentially taking a very very long time indeed to stage newly added files, which can be annoying to the user who would like to get on with doing something with the files it's already added, eg using git mv to rename them to a better name. This is only likely to cause a problem if it takes, say, 30 seconds to update the index; doing an extra 30 seconds of work after every 5 minute file add would be less optimal. Normally, updating the index takes significantly less time than that. On an SSD with 100k files it takes less than 1 second, and the index write time is bound by disk read and write so is not too much worse on a hard drive. So I hope this will not impact users, although if it does turn out to, the time limit could be made configurable. A perhaps better way to do it would be to have a background worker thread that wakes up every 60 seconds or so and flushes the queue. That is made somewhat difficult because the queue can contain Annex actions and so this would add a new source of concurrency issues. So I'm trying to avoid that approach if possible.
Sponsored-by: Erik Bjäreholt on Patreon	2021-12-14 12:23:19 -04:00
Joey Hess	a62f2e141b	convert some error to giveup error has a backtrace, but these are non-internal errors, so a backtrace is unlikely to be useful	2021-12-09 14:36:54 -04:00
Joey Hess	5a7f253974	support git 2.34.0's handling of merge conflict between annexed and non-annexed file This version of git -- or its new default "ort" resolver -- handles such a conflict by staging two files, one with the original name and the other named file~ref. Use unmergedSiblingFile when the latter is detected. (It doesn't do that when the conflict is between a directory and a file or symlink though, so see previous commit for how that case is handled.) The sibling file has to be deleted separately, because cleanConflictCruft may not delete it -- that only handles files that are annex links, but the sibling file may be the non-annexed file side of the conflict. The graftin code had assumed that, when the other side of a conflict is a symlink, the file in the work tree will contain the non-annexed content that we want it to contain. But that is not the case with the new git; the file may be the annex link and needs to be replaced with the content, while the annex link will be written as a -variant file. (The weird doesDirectoryExist check in graftin turns out to still be needed, test suite failed when I tried to remove it.) Test suite passes with new git with ort resolver default. Have not tried it with old git or other defaults. Sponsored-by: Noam Kremen on Patreon	2021-11-22 16:10:24 -04:00
Joey Hess	a0758bdd10	dynamically disable filter-process in restagePointerFile when it would be slower Based on my earlier benchmark, I have a rough cost model for how expensive it is for git-annex smudge to be run on a file, vs how expensive it is for a gigabyte of a file's content to be read and piped through to filter-process. So, using that cost model, it can decide if using filter-process will be more or less expensive than running the smudge filter on the files to be restaged. It turned out to be really annoying to temporarily disable filter-process. I did find a way, but urk, this is horrible. Notice that, if it's interrupted with it disabled, it will remain disabled until the next time restagePointerFile runs. Which could be some time later. If the user runs `git add` or `git checkout` on a lot of small files before that, they will see slower than expected performance. (This commit also deletes where I wrote down the benchmark results earlier.) Sponsored-by: Noam Kremen on Patreon	2021-11-08 16:20:34 -04:00
Joey Hess	483e82ae0e	update	2021-11-05 10:53:11 -04:00
Joey Hess	a5a7d8433d	add pktLineHeaderLength	2021-11-04 15:37:39 -04:00
Joey Hess	218e1983ad	reorg	2021-11-04 15:03:12 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	d706b49979	handle unhandled case	2021-11-04 14:36:48 -04:00
Joey Hess	b1f9dadafe	git long-running filter process implementation This module is not used yet, but the plan is to use it for smudge/clean filtering, at least as an option. In some circumstances, using this interface may perform better than the interface git-annex is currently using. Sponsored-by: Brock Spratlen on Patreon	2021-11-03 15:41:26 -04:00
Joey Hess	e9685aac5b	git pkt-line implementation This module is not used yet, but the plan is to implement the long running filter process for smudge/clean. Sponsored-by: Shae Erisson on Patreon	2021-11-03 15:30:25 -04:00
Joey Hess	b36cc0320e	avoid crashing tilde expansion on user who does not exist git does not crash when there's a remote configured for a user who does not exist, and this prevents git-annex from crashing too. Consider that a user might exist on one system but not another, and the git repo be moved between systems. So not crashing is desirable. Note that git fetch seems to mishandle a remote path like ~foo/bar when the user does not exist. While it does access ./~foo/bar, and gets as far as running git-upload-pack on the path, it then complains there is no such repo. So different parts of git seem to be doing different things in that edge case. Anyway, git-annex does not need to be bug-for-bug compatible with git. Sponsored-by: Jack Hill on Patreon	2021-10-13 09:16:36 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	1dc82f177f	use bytestring filepaths more This should be more efficient, and allocate less. Sponsored-by: Graham Spencer on Patreon	2021-10-05 15:44:02 -04:00
Joey Hess	ee31698825	remove errant print debug	2021-10-03 18:18:04 -04:00
Joey Hess	9012fa0187	reinject: Fix crash when reinjecting a file from outside the repository Commit `4bf7940d6b` introduced this problem, but was otherwise doing a good thing. Problem being that fileRef "/foo" used to return ":./foo", which was actually wrong, but as long as there was no foo in the local repository, catKey could operate on it without crashing. After that fix though, fileRef would return eg "../../foo", resulting in fileRef returning ":./../../foo", which will make git cat-file crash since that's not a valid path in the repo. Fix is simply to make fileRef detect paths outside the repo and return Nothing. Then catKey can be skipped. This needed several bugfixes to dirContains as well, in previous commits. In Command.Smudge, this led to needing to check for Nothing. That case should actually never happen, because the fileoutsiderepo check will detect it earlier. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 14:06:34 -04:00
Joey Hess	e47b4badb3	separate handles for cat-file and cat-file --batch-check This avoids starting one process when only the other one is needed. Eg in git-annex smudge --clean, this reduces the total number of cat-file processes that are started from 4 to 2. The only performance penalty is that when both are needed, it has to do twice as much work to maintain the two Maps. But both are very small, consisting of 1 or 2 items, so that work is negligible. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 13:16:13 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	33a80d083a	sync --quiet * sync: When --quiet is used, run git commit, push, and pull without their usual output. * merge: When --quiet is used, run git merge without its usual output. This might also make --quiet work better for some other commands that make commits, like git-annex adjust. Sponsored-by: Kevin Mueller on Patreon	2021-07-19 11:28:47 -04:00
Joey Hess	fcd1b93a7d	whereused --historical Does not check the reflog, but otherwise works. It's possible for it to display something that is not an annexed file, if a non-annexed file somehow ends up containing something that looks like the key's name. This seems very unlikely to happen, and it would add a lot of complexity to detect it and somehow skip over that file, since the git log would need to either be run again, or not limited to 1 result and canceled once enough results have been read. Also, it kind of seems ok, if a file refers to a key, to consider that as a place the key was used, for some definition of used. So, I punted on dealing with that. May revisit later. Sponsored-by: Brock Spratlen on Patreon	2021-07-14 15:38:28 -04:00
Joey Hess	d2c48404a8	assistant: Avoid unnecessary git repository repair In a situation where git fsck gets confused about a commit that is made while it's running. Sponsored-by: Graham Spencer on Patreon	2021-06-30 18:00:16 -04:00
Joey Hess	75d29de8d4	avoid using CopyFile If .git/config symlinks are ever a thing, this will handle them better. But mostly, this is to avoid git-repair needing to include CopyFile, which would complicate it unduly.	2021-06-29 13:21:21 -04:00
Joey Hess	6b0d732746	repair: Fix reversion in version 8.20200522 that prevented fetching missing objects from remotes In commit `dfc4e641b5` git repair was changed to use remote name, not url, when fetching. But it fetches into a temporary git repo, which doesn't have remotes configured. Oops. (In my defense, that commit was made just as covid lockdown started. But testing? Urk.) Sponsored-by: Mark Reidenbach on Patreon	2021-06-29 13:15:15 -04:00
Joey Hess	199391befe	make repair interruption safe Fixed bug that interrupting git-annex repair (or assistant) while it was fixing repository corruption would lose objects that were contained in pack files. Unpack all pack files and move objects into place before deleting the pack files. The old approach moved the pack files to a temp directory before unpacking them, which was not interruption safe. Sponsored-By: Jochen Bartl on Patreon	2021-06-29 13:14:28 -04:00
Joey Hess	70dbe61fc2	remove unnecessary liftIO	2021-06-07 14:51:12 -04:00
Joey Hess	efae085272	fixed reconcileStaged crash when index is locked or in conflict Eg, when git commit runs the smudge filter. Commit `428c91606b` introduced the crash, as write-tree fails in those situations. Now it will work, and git-annex always gets up-to-date information even in those situations. It does need to do a bit more work, each time git-annex is run with the index locked. Although if the index is unmodified from the last time write-tree succeeded, that work is avoided.	2021-05-24 11:33:23 -04:00
Joey Hess	984034f335	filter-branch working aside from some edge cases Added a note to man page about what happens to information that is recorded in the private journal. Since it uses Branch.get, that information will be copied when options allow. It seemed better to allow it and document it than not allow it, since the options allow excluding repositories and so can be used to exclude private repos if desired.	2021-05-17 13:24:58 -04:00
Joey Hess	1d16654a22	convert formatLsTree to ByteString for speed	2021-05-17 10:46:24 -04:00
Joey Hess	4bf7940d6b	fileRef: make paths relative and simplified Fix behavior of several commands, including reinject, addurl, and rmurl when given an absolute path to an unlocked file, or a relative path that leaves and re-enters the repository. To avoid slowing down the cases where the paths are already ok with an unnecessary call to getCurrentDirectory, I put an optimisation in relPathCwdToFile. That will probably also speed up other parts of git-annex by some small amount, but I have not benchmarked. Note that I did not convert branchFileRef, because it seems likely that it will be used with a file that is not provided by the user, so is already in a sane format. This is certainly true for the way git-annex uses it, though maybe arguable to the extent Git.Ref is a reusable library.	2021-05-07 13:25:59 -04:00
Joey Hess	32138b8cd8	implement annex.privateremote and remote.name.private configs The slightly unusual parsing in Types.GitConfig avoids the need to look at the remote list to get configs of remotes. annexPrivateRepos combines all the configs, and will only be calculated once, so it's nice and fast. privateUUIDsKnown and regardingPrivateUUID now need to read from the annex mvar, so are not entirely free. But that overhead can be optimised away, as seen in getJournalFileStale. The other call sites didn't seem worth optimising to save a single MVar access. The feature should have imperceptible speed overhead when not being used.	2021-04-23 14:21:57 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00
Joey Hess	5712a7ef93	fix incomplete pattern match warning There was not really a bug here, because the 2 lists are always the same length, but the compiler does not know that.	2021-03-30 12:59:53 -04:00
Joey Hess	4611813ef1	Fix bug importing from a special remote into a subdirectory more than one level deep Which generated unusual git trees that could confuse git merge, since they incorrectly had 2 subtrees with the same name. Root of the bug was a) not testing that at all! but also b) confusing graftdirs, which contains eg "foo/bar" with non-recursively read trees, which would contain eg "bar" when reading a subtree of "foo". It's worth noting that Annex.Import uses graftTree, but it really shouldn't have needed to. Eg, when importing into foo/bar from a remote, it's enough to generate a tree of foo/bar/x, foo/bar/y, and does not include other files that are at the top of the master branch. It uses graftTree, so it does include the other files, as well as the foo/bar tree. git merge will do the same thing for both trees. With that said, switching it away from graftTree would result in another import generating a new commit that seems to delete files that were there in a previous commit, so it probably has to keep using graftTree since it used it before. This commit was sponsored by Kevin Mueller on Patreon.	2021-03-26 16:04:36 -04:00
Joey Hess	5d78cd9d08	Sped up git-annex init in a clone of an existing repository Seems that hasOrigin was never finding origin's git-annex branch, so a new one got created each time. And so then it later needed to merge the two branches, which is expensive. Added --no-track to git branch to avoid it displaying a message about setting up tracking branches. Of course there's no reason to make the git-annex branch a tracking branch since git-annex auto-merges it.	2021-03-23 15:23:13 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and then feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	ed717cf646	fix handling of subtree I don't think this actually fixes any buggy behavior in git-annex, I just noticed that using treeItemToLsTreeItem and then serializing it resulted in something starting with "160000 blob" rather than "160000 commit"	2021-03-12 13:24:19 -04:00
Joey Hess	4b57e1c0ad	allow adjusttreeitem to remove submodules	2021-03-12 13:19:23 -04:00
Joey Hess	e07eabbf7f	Fix support for local gcrypt repositories with a space in their URI Git.Remote.parseRemoteLocation had a hack to handle URIs that contained characters like spaces, which is something git unfortunately allows despite not being a valid URI. However, that hack looked for "//" to guess something was an URI, and these gcrypt URIs, being to a local path, don't contain that. So instead escape all illegal characters and check if the resulting thing is an URI. And that was already done by Git.Construct.fromUrl, so internally the gcrypt URI with a space looks like "gcrypt::foo%20bar" and that needs to be de-escaped when converting back from URI to local repo path. This change might also allow a few other almost-valid URIs to be handled as URIs by git-annex. None that contain "//" will change, and any behavior change should result in git-annex doing closer to a right thing than it did before, probably. This commit was sponsored by Noam Kremen on Patreon.	2021-03-09 12:49:51 -04:00
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unnecessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	e3224ff77d	formatLsTree did not use a tab where git does Fixed that, and made parserLsTree accept the space as well as tab. Fixes a reversion that made import of a tree from a special remote result in a merge that deleted files that were not preferred content of that special remote.	2021-01-28 12:36:37 -04:00
Joey Hess	e7134ca1eb	avoid partial functions in Git.Url After the last commit, it was able to throw errors just due to an unparseable url. This avoids needing to worry about that, as long as the call site has already checked that it has a parseable url.	2021-01-18 15:07:23 -04:00
Joey Hess	2aa4fab62a	avoid crashing when there are remotes using unparseable urls Including the non-standard URI form that git-remote-gcrypt uses for rsync. Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port number. But git could have a remote with that, it would try to run git-remote-ook to handle it. So, git-annex has to allow for such things, rather than crashing. This commit was sponsored by Luke Shumaker on Patreon.	2021-01-18 14:59:08 -04:00
Joey Hess	5193aae385	Bug fix: Fix tilde expansion in ssh urls when the tilde is the last character in the url. Thanks, Grond for the patch.	2021-01-18 12:22:48 -04:00
Joey Hess	dc0caef297	merge from git-repair	2021-01-11 21:57:35 -04:00
Joey Hess	33bcee86f1	avoid using wildcard near bug kyle fixed	2021-01-07 13:44:23 -04:00
Kyle Meyer	fd161da2c2	adjustTree: Consider submodule deletions In addition to regular file deletions, the removefiles argument passed to adjustTree may contain removed submodules. When making the new tree, filter these out in the same way that is done for regular files so that the deletion is propagated.	2021-01-07 13:43:09 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	1c5fc8f047	Git.Queue: allow providing git common options like -c	2021-01-04 12:51:55 -04:00
Joey Hess	cd776ecb2e	avoid combining queued commands with different params I don't think this affected git-annex currently, but if the same command was queued twice with different params, one set of params was thrown away, and the files going with those were run with the other set of params.	2021-01-04 12:41:19 -04:00
Joey Hess	5d8e4a7c74	avoid borg list of archives that have been listed before This makes sync a lot faster in the common case where there's no new backup. There's still room for it to be faster. Currently the old imported tree has to be traversed, to generate the ImportableContents. Which then gets turned around to generate the new imported tree, which is identical. So, it would be possible to just return a "no new imports", or an ImportableContents that has a way to graft in a tree. The latter is probably too far to go to optimise this, unless other things need it. The former might be worth it, but it's already pretty fast, since git ls-tree is pretty fast.	2020-12-22 14:06:40 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	804808d569	squash build warnings on windows	2020-11-23 14:00:17 -04:00
Joey Hess	fcb1d67b41	fix build on windows	2020-11-20 12:53:25 -04:00
Joey Hess	ff0927bde9	converted reads from stderr to use hGetLineUntilExitOrEOF These are all unlikely to suffer from the inherited stderr fd problem, but who knows, it could happen.	2020-11-19 16:21:17 -04:00
Joey Hess	66497d39b3	convert git config reading to use hGetLineUntilExitOrEOF Much nicer than the old hack of waiting for a few seconds for stderr to be read.	2020-11-19 15:38:43 -04:00
Joey Hess	6b63278f31	init: When writing hook scripts, set all execute bits, not only the user execute bit	2020-11-17 13:31:12 -04:00
Joey Hess	0121f5f6d3	support parsing numeric git configs as bool I'm not sure if git documents it aside from 0 and 1, but any integer can be interpreted as a bool by it. Doing the same in git-annex is good for consistency. Also, I am planning a config that starts out as a numeric range, but will later transition to a simple bool (hopefully), which this interpretation supports well.	2020-11-16 10:09:25 -04:00
Joey Hess	885974be99	add newtypes for QuickCheck to avoid LANG=C issues All properties changed to use them, except for prop_encode_c_decode_c_roundtrip, which already filtered to ascii for other reasons. A few modules had to be split out, because Setup does not build-depend on QuickCheck.	2020-11-09 20:21:18 -04:00
Joey Hess	2c8cf06e75	more RawFilePath conversion Converted file mode setting to it, and follow-on changes. Compiles up through 369/646. This commit was sponsored by Ethan Aubin.	2020-11-05 18:45:37 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	eb42cd4d46	more RawFilePath conversion 535/645 This commit was sponsored by Brett Eisenberg on Patreon.	2020-11-03 10:11:04 -04:00
Joey Hess	55400a03d3	more RawFilePath conversion This commit was sponsored by Luke Shumaker on Patreon.	2020-11-02 16:31:28 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	f45ad178cb	more RawFilePath conversion At 318/645 after 4k lines of changes This commit was sponsored by Jake Vosloo on Patreon.	2020-10-29 12:03:50 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	8d66f7ba0f	more RawFilePath conversion Added a RawFilePath createDirectory and kept making stuff build. Up to 296/645 This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-28 17:25:59 -04:00
Joey Hess	b8bd2e45e3	more RawFilePath conversion Notable wins in Annex.Locations which was sometimes doing 6 conversions in a single function call. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-10-28 16:24:14 -04:00
Joey Hess	6c29817748	RawFilePath version of getCurrentDirectory This commit was sponsored by Jochen Bartl on Patreon	2020-10-28 16:03:45 -04:00
Joey Hess	08cbaee1f8	more RawFilePath conversion Most of Git/ builds now. Notable win is toTopFilePath no longer double converts This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-10-28 15:55:30 -04:00
Joey Hess	f167851628	Revert "pass --git-dir, rather than changing cwd" This reverts commit `c142696c58`. It turns out it was not needed; `681313dfd4` fixed up the git dir, so setting cwd to it works ok. But worse, this commit broke the test suite massively. I don't understand how. git-annex get was failing. Very weirdly, git-annex find in a fresh clone of an annex repo, during autoinit, was displaying a side message -- but side messages are disabled when running find.	2020-10-23 16:09:50 -04:00
Joey Hess	681313dfd4	deal with .git pointer file in Git.CurrentRepo This fixes the bug. Note, it's only done when GIT_DIR is set. When it's not set, Git.Construct already handled it. This is why it was only noticed with this git submodule command. This commit was sponsored by Brett Eisenberg on Patreon.	2020-10-23 14:56:12 -04:00
Joey Hess	c142696c58	pass --git-dir, rather than changing cwd If .git is a gitlink file, setting cwd to it will fail, but --git-dir will succeed. And this is the only place where it sets cwd when running git, everywhere else already uses --git-dir. Note that, git-annex's submodule fixup code usually converts gitlink files to symlinks, so this wasn't usually a problem. Still, worth fixing. This commit was sponsored by Svenne Krap on Patreon.	2020-10-23 13:36:56 -04:00

786 commits