git-annex

Author	SHA1	Message	Date
Joey Hess	2b014f1a8b	don't frontload reconcileStaged in git-annex init init: Avoid scanning for annexed files, which can be lengthy in a large repository. Instead that scan is done on demand. This lets git-annex init be run and some query commands be used in a repository without waiting. Note that autoinit already behaved this way, so while this will mean some commands like git-annex get/unlock/add will do the scan the first time run, that is not really a significant behavior change. And, it's really better to have a consistent behavior. The reason for the inconsistency was a strange bug discussed in `b3c4579c79`. Avoiding reconcileStaged in init will keep avoiding whatever that was. Sponsored-by: Dartmouth College's DANDI project	2022-11-18 13:58:47 -04:00
Joey Hess	b2cc63d5bf	export: fix multi-file delete bug export: Fix a bug that left a file on a special remote when two files with the same content were both deleted in the exported tree. Case of the wrong data structure leading to the wrong result. The DiffMap now contains all the old filenames, and all the new filenames. Note that, when 2 files with the same content are both renamed, it only renames the first, but deletes and re-exports the second. Improving that is possible, but it would need to use a different temporary filename. Anyway, that is an unusual case, and there are known to be other unusual cases where export does not rename with maximum efficiency, IIRC. (Or maybe this is the case that I remember?) Sponsored-by: Dartmouth College's OpenNeuro project	2022-11-09 16:24:37 -04:00
Joey Hess	14f7a386f0	Make git-annex enable-tor work when using the linux standalone build Clean the standalone environment before running the su command to run "sh". Otherwise, PATH leaked through, causing it to run git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set, which caused that script to not work: [2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"] /home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found Changed programPath to not use GIT_ANNEX_PROGRAMPATH, but instead run the scripts at the top of GIT_ANNEX_DIR. That works both when the standalone environment is set up, and when it's not. Sponsored-by: Kevin Mueller on Patreon	2022-10-26 15:45:08 -04:00
Joey Hess	731e806c96	use lookupKeyStaged in --batch code paths Make --batch mode handle unstaged annexed files consistently whether the file is unlocked or not. Before this, an unstaged locked file would have the symlink on disk examined and operated on in --batch mode, while an unstaged unlocked file would be skipped. Note that, when not in batch mode, unstaged files are skipped over too. That is actually somewhat new behavior; as late as 7.20191114 a command like `git-annex whereis .` would operate on unstaged locked files and skip over unstaged unlocked files. That changed during optimisation of CmdLine.Seek with apparently little fanfare or notice. Turns out that rmurl still behaved that way when given an unstaged file on the command line. It was changed to use lookupKeyStaged to handle its --batch mode. That also affected its non-batch mode, but since that's just catching up to the change earlier made to most other commands, I have not mentioned that in the changelog. It may be that other uses of lookupKey should also change to lookupKeyStaged. But it may also be that would slow down some things, or lead to unwanted behavior changes, so I've kept the changes minimal for now. An example of a place where the use of lookupKey is better than lookupKeyStaged is in Command.AddUrl, where it looks to see if the file already exists, and adds the url to the file when so. It does not matter there whether the file is staged or not (when it's locked). The use of lookupKey in Command.Unused likewise seems good (and faster). Sponsored-by: Nicholas Golder-Manning on Patreon	2022-10-26 14:43:06 -04:00
Joey Hess	b2ee2496ee	remove whenAnnexed and ifAnnexed In preparation for adding a new variation on lookupKey. Sponsored-by: Max Thoursie on Patreon	2022-10-26 14:06:32 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done in Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex switched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses of onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	c2ad84b423	all keys are still present on versioned remote after import of a tree When importing from versioned remotes, fix tracking of the content of deleted files. Only S3 supports versioning so far, so only it was affected. But, the draft import/export interface for external remotes also seemed to need a change, so that versionedExport could be set.	2022-10-11 13:05:40 -04:00
Joey Hess	4a42c69092	take lock in checkLogFile and calcLogFile move: Fix openFile crash with -J This does make them a bit slower, although usually the log file is not very big, so even when it's being rewritten, they will not block for long taking the lock. Still, little slowdowns may add up when moving a lot of files. A less expensive fix would be to use something lower level than openFile that does not check if the file is already open for write by another thread. But GHC does not seem to provide anything convenient; even mkFD checks for a writing thread. fullLines is no longer necessary since these functions no longer will read the file while it's being written. Sponsored-by: Dartmouth College's DANDI project	2022-10-07 13:19:17 -04:00
Joey Hess	44d763468a	add missing whitespace in warning message	2022-10-04 13:30:22 -04:00
Joey Hess	70d2ece381	improve usage These commands operate on not only remotes, but any way a repository can be specified, including "here" etc. Sponsored-by: Graham Spencer on Patreon	2022-10-03 13:49:42 -04:00
Joey Hess	15f9fcbcb1	avoid combining multiple words provided to trust/untrust/dead * trust, untrust, semitrust, dead: Fix behavior when provided with multiple repositories to operate on. * trust, untrust, semitrust, dead: When provided with no parameters, do not operate on a repository that has an empty name. The man page and usage already indicated that multiple repos could be provided to these commands, but they actually used unwords to combine everything into a string, and found a repo matching that string. This was especially bad when no parameters resulted in the empty string and some repo happened to have an empty description. This does change the behavior, and it's possible someone relied on the current behavior to eg, trust a repo by name with the name not quoted into a single parameter. But fixing the empty string bug and matching the documentation are worth breaking that usage. Note that git-annex init/reinit do still unwords multiple parameters when provided to them. That is inconsistent behavior, but it certainly seems possible that something does run git-annex init with an unquoted description, and I don't think it's worth breaking that just to make it more consistent with these other commands. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-10-03 13:48:40 -04:00
Joey Hess	ce65f11de0	enable-tor: Fix breakage caused by git's fix for CVE-2022-24765 This relies on `bfa451fc4e` and is a bit of an ugly hack. Sponsored-by: Noam Kremen on Patreon	2022-09-26 14:48:58 -04:00
Joey Hess	2478e9e03a	restage: New git-annex command, handles restaging unlocked files This is much easier and less failure-prone than having the user run git update-index --refresh themselves. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 16:29:59 -04:00
Joey Hess	b17e328175	avoid unnecessary locking by checkLogFile Like the comment says, this works without locking. It looks like I originally copied another function and forgot to remove the locking. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 14:01:43 -04:00
Joey Hess	451a7ce77f	vicfg: Include mincopies configuration Sponsored-by: k0ld on Patreon	2022-09-15 15:11:59 -04:00
Joey Hess	eefc026370	fix reversion on skipping dead keys in --all/bare Fix a reversion that made dead keys not be skipped when operating on all keys via --all or in a bare repo. (Introduced in version 8.20200720) Also, improved the documentation of git-annex-dead, it does not only apply to fsck --all. Also, made git-annex fsck, when run on a file whose key is dead, display that. Before, it displayed that only when run with --all, but with this fix, it skips dead keys with --all. But it can still be run on a file that uses a dead key, and displaying "This key is dead" explains to the user why it does not consider missing content for it to be a problem. Sponsored-by: k0ld on Patreon	2022-09-13 14:38:13 -04:00
Joey Hess	c62fe5e9a8	avoid redundant prompt for http password in git-annex get that does autoinit autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git remote has not been probed yet, that will do a http get that will prompt for a password. And then the parent process will subsequently prompt for a password when getting annexed files from the remote. So the solution is for autoEnableSpecialRemotes to run remoteList before the subprocess, which will probe for the uuid for the git remote in the same process that will later be used to get annexed files. But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git, so Annex.Init cannot import Remote.List. Had to pass remoteList into functions in Annex.Init to get around this dependency loop.	2022-09-09 14:43:43 -04:00
Joey Hess	d4fd966396	avoid dup check of guardSafeToUseRepo Speeds up init slightly, and reduces the number of syscalls by the dynamic linker. Sponsored-by: Dartmouth College's Datalad project	2022-08-29 13:52:58 -04:00
Joey Hess	94029995fa	fix git-annex add regression on deleted file Fix a regression in 10.20220624 that caused git-annex add to crash when there was an unstaged deletion. Sponsored-by: Dartmouth College's Datalad project	2022-08-19 12:55:49 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	21cfd0ea98	fix reversion `3a513cfe73` caused a reversion in addurl. The type of addSmall changed, but the void prevented the type checker from helping notice this. Since it now returns a CommandPerform, the cleanup action has to be run. Sponsored-by: Dartmouth College's Datalad project	2022-08-09 13:49:30 -04:00
Joey Hess	3a513cfe73	add --dry-run: New option This is intended for users who want to see what it would output in order to eg, check if a file would be added to git or the annex. It is not intended as a way for scripts to get information. Sponsored-by: Dartmouth College's Datalad project	2022-08-03 11:16:04 -04:00
Joey Hess	570b1aa6a1	Allow find --branch to be used in a bare repository, the same as the deprecated findref can be This will allow later fully deprecating and removing findref. Sponsored-by: Erik Bjäreholt on Patreon	2022-07-29 12:52:12 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	2d65c4ff1d	avoid unix-compat's rename On Windows, that does not support long paths https://github.com/jacobstanley/unix-compat/issues/56 Instead, use System.Directory.renamePath, which does support long paths. Sponsored-by: Dartmouth College's Datalad project	2022-07-12 14:55:02 -04:00
Joey Hess	201e41cffd	add: Fix reversion when adding an annex link that has been moved to another directory Fixes commit `f259be7f39` Sponsored-by: Dartmouth College's Datalad project	2022-07-05 16:22:41 -04:00
Joey Hess	149d12f188	support --backend again in addurl and importfeed Missed these two when converting from a global option. Sponsored-by: Dartmouth College's Datalad project	2022-07-05 15:35:43 -04:00
Joey Hess	cddcabfbb5	include filename in fsck warning So it's available in --quiet mode. The same was already done in other fsck warnings. Sponsored-by: Noam Kremen on Patreon	2022-06-29 14:33:35 -04:00
Joey Hess	b223988e22	remove --backend from global options --backend is no longer a global option, and is only accepted by commands that actually need it. Three commands that used to support backend but don't any longer are watch, webapp, and assistant. It would be possible to make them support it, but I doubt anyone used the option with these. And in the case of webapp and assistant, the option was handled inconsistently, only taking effect when the command is run with an existing git-annex repo, not when it creates a new one. Also, renamed GlobalOption etc to AnnexOption. Because there are many options of this type that are not actually global (any more) and get added to commands that need them. Sponsored-by: Kevin Mueller on Patreon	2022-06-29 13:33:25 -04:00
Joey Hess	543993b068	remove redundant pattern match brilliant spot by new ghc	2022-06-28 15:40:27 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	d00e23cac9	RawFilePath optimisations	2022-06-22 16:20:08 -04:00
Joey Hess	af1a45c69c	use replaceWorkTreeFile when fixing an annex symlink This does not change any behavior, but it's useful for all worktree changes to be made using this. Sponsored-by: Graham Spencer on Patreon	2022-06-22 13:41:41 -04:00
Joey Hess	f259be7f39	fix overwrite race with small file that got large When adding a small file, it does not get locked down, so can be modified after git-annex checks that it's small. The use of queued git add made the race window nice and wide too. Fixed by checking if the file has changed, and by not using git add. Instead, have to recapitulate git add's handling of things like symlinks and executable files. Sponsored-by: Jochen Bartl on Patreon	2022-06-14 16:38:56 -04:00
Joey Hess	78a3d44ea0	get rid of racy addLink The remaining callers all did not rely on it checking gitignore, so were easy to convert. They were susceptible to the same overwrite race as add and fix, although less likely to have it and a narrower window than add's race. Command.Rekey in passing got an unnecessary call to removeFile deleted. addSymlink handles deleting any existing worktree file.	2022-06-14 14:47:15 -04:00
Joey Hess	64c7f60f7a	fixed overwrite race with git-annex fix Similar to git-annex add, git-annex fix queued git add, so if a file got modified before git add ran, the wrong content would be staged, perhaps a large file content. Sponsored-by: Brock Spratlen on Patreon	2022-06-14 14:19:58 -04:00
Joey Hess	5ef79125ad	fix overwrite race with git-annex add of annex symlink In the unlikely case where git-annex add is run on an annex symlink that is not already added, and while it's processing it, the annex symlink is overwritten with something else, avoid git-annex overwriting that with the symlink again. Sponsored-by: Jack Hill on Patreon	2022-06-14 14:00:13 -04:00
Joey Hess	dd6dec4eb1	fix add overwrite race with git-annex add to annex This is not a complete fix for all such races, only the one where a large file gets changed while adding and gets added to git rather than to the annex. addLink needs to go away, any caller of it is probably subject to the same kind of race. (Also, addLink itself fails to check gitignore when symlinks are not supported.) ingestAdd no longer checks gitignore. (It didn't check it consistently before either, since there were cases where it did not run git add!) When git-annex import calls it, it's already checked gitignore itself earlier. When git-annex add calls it, it's usually on files found by withFilesNotInGit, which handles checking ignores. There was one other case, when git-annex add --batch calls it. In that case, old git-annex behaved rather badly, it would seem to add the file, but git add would later fail, leaving the file as an unstaged annex symlink. That behavior has also been fixed. Sponsored-by: Brett Eisenberg on Patreon	2022-06-14 13:37:19 -04:00
Joey Hess	6d0b243d9d	avoid cleaning up move log when drop from remote fails move: Improve resuming a move that succeeded in transferring the content, but where dropping failed due to eg a network problem, in cases where numcopies checks prevented the resumed move from dropping the object from the source repository. This was earlier done for moves that got interrupted during the drop stage. Sponsored-by: Svenne Krap on Patreon	2022-06-09 15:26:25 -04:00
Joey Hess	c59ea5b1ca	info: Added --autoenable option Use cases include using git-annex init --no-autoenable and then going back and enabling the special remotes that have autoenable configured. As well as just querying to remember which ones have it enabled. It lists all special remotes that have autoenable=yes whether currently enabled or not. And it can be used with --json. I pondered making this "git-annex info autoenable", but that seemed wrong because then if the user has a directory named "autoenable", it's unclear what they are asking for. (Although "git-annex info remote" may be similarly unclear.) Making it an option does mean that it can't be provided via --batch though. Sponsored-by: Dartmouth College's Datalad project	2022-06-01 14:20:38 -04:00
Joey Hess	0d50c90794	init: Added --no-autoenable option Someone may disagree with what repositories are set to autoenable and it's good to have local overrides. See https://github.com/datalad/datalad/issues/6634 Sponsored-by: Dartmouth College's Datalad project	2022-06-01 13:27:49 -04:00
Joey Hess	aa414d97c9	make fsck normalize object locations The purpose of this is to fix situations where the annex object file is stored in a directory structure other than where annex symlinks point to. But it will also move object files from the hashdirmixed back to hashdirlower if the repo configuration makes that the normal location. It would have been more work to avoid that than to let it do it. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:38:06 -04:00
Joey Hess	5a98f2d509	avoid creating content directory when locking content If the content directory does not exist, then it does not make sense to lock the content file, as it also does not exist, and so it's ok for the lock operation to fail. This avoids potential races where the content file exists but is then deleted/renamed, while another process sees that it exists and goes to lock it, resulting in a dangling lock file in an otherwise empty object directory. Also renamed modifyContent to modifyContentDir since it is not only necessarily used for modifying content files, but also other files in the content directory. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 12:34:56 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrieving from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	8675b2b075	rename memoryUnits It's not just used for memory sizes.	2022-05-05 15:35:11 -04:00
Joey Hess	fd65de0eb9	multicast: Support uftp 5.0 by switching from aes256-cbc to aes256-gcm aes256-gcm is supported by both 4.x and 5.x, while 5.x dropped aes256-cbc. Sponsored-by: Graham Spencer on Patreon	2022-04-19 12:02:10 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	6079b0c72c	fix reversion add: Avoid unnecessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it. This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit. I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surprises me, it seems likely it's wrong.. But for now, this is the smallest possible fix. Sponsored-by: Dartmouth College's Datalad project	2022-03-21 15:54:04 -04:00
Joey Hess	a314a8dfd0	add back lost packString The patch that removed it did not break anything, since the strings it's used on are all ASCII not unicode. But I like making sure to use packString everywhere just in case the code later changes in a way that needs it.	2022-03-02 18:22:38 -04:00

2595 commits