git-annex

Author	SHA1	Message	Date
Joey Hess	1c4f4b449a	support --unlock-present adjustment of view branches When generating the view, check if the key is present. When syncing in a view branch with an adjustment, run adjustedBranchRefreshFull the same as is done when syncing in other adjusted branches. This is needed because the docs for git-annex adjust --unlock-present suggest using git-annex sync to update the branch when annex.adjustedbranchrefresh is not set. Note that, with annex.adjustedbranchrefresh set, it just works! The adjusted branch gets updated in the usual way and it doesn't matter that there's a view branch underneath. And of course, re-running git-annex adjust --unlock-present also works, as suggested in the docs. Sponsored-by: Erik Bjäreholt on Patreon	2023-02-27 15:37:57 -04:00
Joey Hess	7d839176c3	support generation of unlocked views Just make pointer files rather than symlinks, easy. As for the other adjustments: --lock is the default for views --fix happens automatically in views --hide-missing probably does not make sense when combined with views, because deleting a file from a view removes metadata --unlock-present will need a bit more work	2023-02-27 15:07:36 -04:00
Joey Hess	f09e299156	rawfilepath conversion	2023-02-27 15:06:32 -04:00
Joey Hess	cc32e31161	understand adjusted view branch names An adjusted view branch has a name like "refs/heads/adjusted/views/master(author=_)(unlocked)", so it is a view branch that has been converted to an adjusted branch. Made Logs.View support such branch names. So now git-annex sync and pre-commit handle updating metadata on commit in such a branch. Much remains to be done to fully support adjusted view branches, including actually applying the adjustment when updating the view branch. Sponsored-by: Graham Spencer on Patreon	2023-02-27 14:57:58 -04:00
Joey Hess	2a966f49f2	overwrite old adjusted view branch When git-annex adjust is run in a view branch, and the adjusted branch already exists, overwrite the old adjusted branch with the new one without being forced. Usually overwriting an adjusted branch is avoided because it could lose data. But when a view branch has been adjusted, there is no data to lose in the adjusted branch, because the only changes that can be made of significance are to move files between directories. Which changes metadata on commit. And the old branch has already been committed. Sponsored-by: Lawrence Brogan on Patreon	2023-02-27 14:35:27 -04:00
Joey Hess	9b1fe37818	improve adjusted branch name parsing to support adjusted view branches An adjusted view branch has a name like "adjusted/views/master(author=_)(unlocked)" and so the adjustment starts at the last open paren, not the first open paren. Note that git-annex sync still does not do anything useful when run in such a branch, because it does not realize that it is a view branch. This is only groundwork for adjusted view branches. This also fixes adjusted branches when the basis branch name contains parens for some other reason, though that is not common in a git branch name. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2023-02-27 14:09:05 -04:00
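The parsing rule in this commit (the adjustment starts at the last open paren, not the first) can be sketched as follows. This is an illustration in Python, not git-annex's Haskell code; the function name `split_adjusted_branch` and the handling of malformed names are assumptions made for the example.

```python
def split_adjusted_branch(name):
    """Split 'adjusted/BASIS(ADJUSTMENT)' into (basis, adjustment).

    Returns None when the name does not look like an adjusted branch.
    Hypothetical helper, written only to illustrate the last-paren rule.
    """
    prefix = "adjusted/"
    if not name.startswith(prefix) or not name.endswith(")"):
        return None
    # Drop the prefix and the final closing paren.
    body = name[len(prefix):-1]
    # Split at the LAST open paren, so parens inside the basis
    # (such as the view description) stay part of the basis.
    basis, sep, adjustment = body.rpartition("(")
    if not sep or not basis:
        return None
    return basis, adjustment
```

With the branch name quoted in the commit message, `split_adjusted_branch("adjusted/views/master(author=_)(unlocked)")` yields the basis `views/master(author=_)` and the adjustment `unlocked`; splitting at the first paren would instead cut the view description in half.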
Joey Hess	da61d564f1	fix view reversion caused by optimisation view: Fix a reversion in 10.20230214 that omitted a file from a view when the file had no metadata set, but the view only used path fields. Sponsored-by: Jack Hill on Patreon	2023-02-16 15:18:17 -04:00
Joey Hess	826b225ca8	Sped up view branch construction by 50% A benchmark in my sound repository with `git-annex view feedtitle=*` took 2:52 wall clock time before and 1:58 after. Though it still only used 130% of CPU. This is the same kind of optimisation that is in seekFilteredKeys, though that precaches location logs while this streams the metadata logs direct to parsing them. seekFilteredKeys contains more streaming, to find the annexed files, and this could be further sped up with similar streaming. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-02-13 13:29:57 -04:00
Joey Hess	bb4550c7c1	sync: Warn when the adjusted basis ref cannot be found As happens eg when the user has renamed branches. Sponsored-by: Graham Spencer on Patreon	2023-02-10 14:33:21 -04:00
Joey Hess	5f9bf51438	sync in view branch updates the view branch * sync: When run in a view branch, refresh the view branch to reflect any changes that have been made to the parent branch or metadata. This is basically working, but probably needs some more work to deal with all the edge cases of things sync does. Sponsored-by: Lawrence Brogan on Patreon	2023-02-08 15:37:28 -04:00
Joey Hess	aa0350ff49	add directory to views for files that lack specified metadata * view: New field?=glob and ?tag syntax that includes a directory "_" in the view for files that do not have the specified metadata set. * Added annex.viewunsetdirectory git config to change the name of the "_" directory in a view. When in a view using the new syntax, old git-annex will fail to parse the view log. It errors with "Not in a view.", which is not ideal. But that only affects view commands. annex.viewunsetdirectory is included in the View for a couple of reasons. One is to avoid needing to warn the user that it should not be changed when in a view, since that would confuse git-annex. Another reason is that it helped with plumbing the value through to some pure functions. annex.viewunsetdirectory is actually mangled the same as any other view directory. So if it's configured to something like "N/A", there won't be multiple levels of directories, which would also confuse git-annex. Sponsored-By: Jack Hill on Patreon	2023-02-07 16:28:46 -04:00
Joey Hess	579d9b60c1	improve concurrency of move/copy --from --to Use separate stages for download and upload. In the common case where it downloads the file from one remote and then uploads to the other, those are by far the most expensive operations, and there's a decent chance the two remotes bottleneck on different resources. Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads will be started both downloading from the src remote. They will probably finish at the same time. Then two threads will be started uploading to the dst remote. They will probably take the same time as well. Before this change, it would alternate back and forth, bottlenecking on src and dst. With this change, as soon as the two threads start uploading to dst, two more threads are able to start, downloading from src. So bandwidth to both remotes is saturated more often. Other commands that use transferStages only send in one direction at a time. So the worker threads for the other direction will sit idle, and there will be no change in their behavior. Sponsored-by: Dartmouth College's DANDI project	2023-01-24 13:59:39 -04:00
Joey Hess	acc3f6211f	finishing up move --from --to Lock the local content for drop after getting it from src, to prevent another process from using the local content as a copy and dropping it from src, which would prevent dropping the local content after sending it to dest. Support resuming an interrupted move that downloaded the content from src, leaving the local content populated. In this case, the location log has not been updated to say the content is present locally, so we can assume that it's resuming and go ahead and drop the local content after sending it to dest. Note that if a `git-annex get` is being run at the same time as a `git-annex move --from --to`, it may get a file just before the move processes it. So the location log has not been updated yet, and the move thinks it's resuming. Resulting in local copy being dropped after it's sent to the dest. This race is something we'll just have to live with, it seems. I also gave up on the idea of checking if the location log had been updated by a `git-annex get` that is run at the same time. That wouldn't work, because the location log is precached in the seek stage, so reading it again after sending the content to dest would not notice changes made to it, unless the cache were invalidated, which would slow it down a lot. That idea anyway was subject to races where it would not detect the concurrent `git-annex get`. So concurrent `git-annex get` will have results that may be surprising. To make that less surprising, updated the documentation of this feature to be explicit that it downloads content to the local repository temporarily. Sponsored-by: Dartmouth College's DANDI project	2023-01-23 17:43:48 -04:00
Joey Hess	1abd457e98	push location log updating up to callers of download Prep for move --to --from, which needs to download from a src repo without updating the location log for the local repo, before sending the content on to the dest repo. Note that callers of download' already update the log themselves. See previous commit `a422a056f2` that pushed it up to download from getViaTmpFrom. (Also removed in passing a debug print + readline that I accidentally committed last week on this branch.) Sponsored-by: Dartmouth College's DANDI project	2023-01-23 13:47:41 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Joey Hess	2fa7656627	switch to readMaybe to handle values with leading number followed by non-number readish ignores a trailing string after a number, but to support values like "YYYY:MM:DD" which it makes sense to compare lexicographically, require the whole string to be parsed as a number in order to enable numeric comparison. Sponsored-by: Max Thoursie on Patreon	2022-12-22 14:33:47 -04:00
Joey Hess	9d60385001	convert renameFile to moveFile to support cross-device moves Improve handling of some .git/annex/ subdirectories being on other filesystems, in the bittorrent special remote, and youtube-dl integration, and git-annex addurl. The only one of these that I've confirmed to be a problem is in the bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are on different filesystems. As well as auditing for renameFile, also audited for createLink, all of those are ok as are the other remaining renameFile calls. Also audited all code paths that use .git/annex/othertmp, and did not find any other cross-device problems. So, removing mention of othertmp needing to be on the same device. Sponsored-by: Dartmouth College's Datalad project	2022-12-20 15:17:50 -04:00
Joey Hess	aa6919737c	--metadata lexicographical comparisons Change --metadata comparisons < > <= and >= to fall back to lexicographical comparisons when one or both values being compared are not numbers. Sponsored-by: Erik Bjäreholt on Patreon	2022-12-12 13:33:24 -04:00
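The comparison rule from this commit, combined with the stricter whole-string number parsing from 2fa7656627, might look roughly like this. It is a Python sketch rather than the Haskell implementation; `parse_number` and `compare_metadata` are hypothetical names, and Python's `float()` accepts a few spellings (such as "inf" or surrounding whitespace) that the real parser may not.

```python
import operator


def parse_number(s):
    """Return s as a float only if the WHOLE string parses as a number.

    A value like "10abc" or "2022:12:22" yields None rather than a
    leading-number prefix. (Assumption: float() stands in for the real
    number parser.)
    """
    try:
        return float(s)
    except ValueError:
        return None


def compare_metadata(op, a, b):
    """Apply op (e.g. operator.lt) numerically when both values are
    numbers, otherwise fall back to comparing the strings."""
    na, nb = parse_number(a), parse_number(b)
    if na is not None and nb is not None:
        return op(na, nb)
    # One or both values are not numbers: lexicographical comparison.
    return op(a, b)
```

For example, `compare_metadata(operator.lt, "9", "10")` compares 9 < 10 numerically, while `compare_metadata(operator.lt, "2022:12:22", "2023:01:05")` compares the two strings, which orders dates written in that form correctly.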
Joey Hess	65f9e7a3c7	fix deadlock in restagePointerFiles Fix a hang that occasionally occurred during commands such as move. (A bug introduced in 10.20220927, in commit `6a3bd283b8`) The restage.log was kept locked while running a complex index refresh action. In an unusual situation, that action could need to write to the restage log, which caused a deadlock. The solution is a two-stage process. First the restage.log is moved to a work file, which is done with the lock held. Then the content of the work file is read and processed, which happens without the lock being held. This is all done in a crash-safe manner. Note that streamRestageLog may not be fully safe to run concurrently with itself. That's ok, because restagePointerFiles uses it with the index lock held, so only one can be run at a time. streamRestageLog does delete the restage.old file at the end without locking. If a calcRestageLog is run concurrently, it will either see the file content before it was deleted, or will see it's missing. Either is ok, because at most this will cause calcRestageLog to report more work remains to be done than there is. Sponsored-by: Dartmouth College's Datalad project	2022-12-08 14:36:11 -04:00
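The two-stage process described in this commit can be illustrated with a small Python sketch. It is not the git-annex code: `drain_log` and its parameters are hypothetical, `lock` is any context manager standing in for the real lock on restage.log, and the copy-then-delete step here is simpler than a genuinely crash-safe implementation (a crash between the copy and the delete would leave entries duplicated).

```python
import os


def drain_log(log_path, work_path, lock, process):
    """Process log entries without holding the lock while processing.

    Stage 1 (lock held): move the log's content into a work file.
    Stage 2 (lock released): read the work file and process each entry.
    Because the lock is free during stage 2, `process` may itself
    append to the log without deadlocking.
    """
    with lock:
        if os.path.exists(log_path):
            # Append, so work left over from an interrupted run is kept.
            with open(log_path) as src, open(work_path, "a") as dst:
                dst.write(src.read())
            os.remove(log_path)

    if os.path.exists(work_path):
        with open(work_path) as f:
            for line in f:
                process(line.rstrip("\n"))
        os.remove(work_path)
```

Entries that `process` writes to the log during stage 2 are not handled by the current call; they stay in the log for the next one, which matches the commit's point that a later run may simply see more work remaining.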
Joey Hess	43f681d4c1	Support parsing yt-dlp output to display download progress Before this fix, no progress was displayed when yt-dlp was used. Sponsored-by: Graham Spencer on Patreon	2022-11-21 15:04:36 -04:00
Joey Hess	5256be61c1	When youtube-dl is not available in PATH, use yt-dlp instead Debian is going to drop youtube-dl which is not active upstream, and yt-dlp is the replacement. This will make it be used if youtube-dl gets removed. If an old version of youtube-dl remains installed, git-annex will still use it. That might not be desirable, but changing git-annex to use yt-dlp in preference to youtube-dl when both are installed risks breaking when the user has annex.youtube-dl-options set to something that is supported by youtube-dl, but not by yt-dlp. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-11-21 14:40:33 -04:00
Joey Hess	2b014f1a8b	don't frontload reconcileStaged in git-annex init init: Avoid scanning for annexed files, which can be lengthy in a large repository. Instead that scan is done on demand. This lets git-annex init be run and some query commands be used in a repository without waiting. Note that autoinit already behaved this way, so while this will mean some commands like git-annex get/unlock/add will do the scan the first time run, that is not really a significant behavior change. And, it's really better to have a consistent behavior. The reason for the inconsistency was a strange bug discussed in `b3c4579c79`. Avoiding reconcileStaged in init will keep avoiding whatever that was. Sponsored-by: Dartmouth College's DANDI project	2022-11-18 13:58:47 -04:00
Joey Hess	14f7a386f0	Make git-annex enable-tor work when using the linux standalone build Clean the standalone environment before running the su command to run "sh". Otherwise, PATH leaked through, causing it to run git-annex.linux/bin/sh, but GIT_ANNEX_DIR was not set, which caused that script to not work: [2022-10-26 15:07:02.145466106] (Utility.Process) process [938146] call: pkexec ["sh","-c","cd '/home/joey/tmp/git-annex.linux/r' && '/home/joey/tmp/git-annex.linux/git-annex' 'enable-tor' '1000'"] /home/joey/tmp/git-annex.linux/bin/sh: 4: exec: /exe/sh: not found Changed programPath to not use GIT_ANNEX_PROGRAMPATH, but instead run the scripts at the top of GIT_ANNEX_DIR. That works both when the standalone environment is set up, and when it's not. Sponsored-by: Kevin Mueller on Patreon	2022-10-26 15:45:08 -04:00
Joey Hess	731e806c96	use lookupKeyStaged in --batch code paths Make --batch mode handle unstaged annexed files consistently whether the file is unlocked or not. Before this, an unstaged locked file would have the symlink on disk examined and operated on in --batch mode, while an unstaged unlocked file would be skipped. Note that, when not in batch mode, unstaged files are skipped over too. That is actually somewhat new behavior; as late as 7.20191114 a command like `git-annex whereis .` would operate on unstaged locked files and skip over unstaged unlocked files. That changed during optimisation of CmdLine.Seek with apparently little fanfare or notice. Turns out that rmurl still behaved that way when given an unstaged file on the command line. It was changed to use lookupKeyStaged to handle its --batch mode. That also affected its non-batch mode, but since that's just catching up to the change earlier made to most other commands, I have not mentioned that in the changelog. It may be that other uses of lookupKey should also change to lookupKeyStaged. But it may also be that would slow down some things, or lead to unwanted behavior changes, so I've kept the changes minimal for now. An example of a place where the use of lookupKey is better than lookupKeyStaged is in Command.AddUrl, where it looks to see if the file already exists, and adds the url to the file when so. It does not matter there whether the file is staged or not (when it's locked). The use of lookupKey in Command.Unused likewise seems good (and faster). Sponsored-by: Nicholas Golder-Manning on Patreon	2022-10-26 14:43:06 -04:00
Joey Hess	b2ee2496ee	remove whenAnnexed and ifAnnexed In preparation for adding a new variation on lookupKey. Sponsored-by: Max Thoursie on Patreon	2022-10-26 14:06:32 -04:00
Joey Hess	6fbd337e34	avoid unnecessary keys db writes; doubled speed! When running eg git-annex get, for each file it has to read from and write to the keys database. But it's reading exclusively from one table, and writing to a different table. So, it is not necessary to flush the write to the database before reading. This avoids writing the database once per file, instead it will buffer 1000 changes before writing. Benchmarking getting 1000 small files from a local origin, git-annex get now takes 13.62s, down from 22.41s! git-annex drop now takes 9.07s, down from 18.63s! Wowowowowowowow! (It would perhaps have been better if there were separate databases for the two tables. At least it would have avoided this complexity. Ah well, this is better than splitting the table in an annex.version upgrade.) Sponsored-by: Dartmouth College's Datalad project	2022-10-12 15:33:16 -04:00
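The buffering idea in this commit (queue changes and write them out in batches of 1000 instead of once per file) can be sketched as below. The class `WriteQueue` and its `commit` callback are hypothetical names for illustration; the real change lives in git-annex's Haskell database layer.

```python
class WriteQueue:
    """Buffer database changes and commit them in batches.

    `commit` is a caller-supplied function that writes a list of
    changes in one go; `limit` is the batch size (1000 in the commit
    message). Both names are assumptions made for this sketch.
    """

    def __init__(self, commit, limit=1000):
        self.commit = commit
        self.limit = limit
        self.pending = []

    def add(self, change):
        """Queue a change; write the batch once the limit is reached."""
        self.pending.append(change)
        if len(self.pending) >= self.limit:
            self.flush()

    def flush(self):
        """Write any queued changes. Must be called before exiting,
        since nothing flushes the queue automatically."""
        if self.pending:
            self.commit(self.pending)
            self.pending = []
```

The trade-off is the one the commit relies on: reads that only touch a different table do not need to see the queued writes, so skipping the per-file flush is safe, but anything still in the queue is lost unless `flush` runs before the process stops.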
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done in Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex switched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses of onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe.
The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	c2ad84b423	all keys are still present on versioned remote after import of a tree When importing from versioned remotes, fix tracking of the content of deleted files. Only S3 supports versioning so far, so only it was affected. But, the draft import/export interface for external remotes also seemed to need a change, so that versionedExport could be set.	2022-10-11 13:05:40 -04:00
Joey Hess	7059322a6c	Support "inbackend" in preferred content expressions Well, actually, fix a typo that has always been in the implementation of that. "inbacked" used to work, but let's not tell users about that; they might try to use it and expect git-annex to keep supporting the typo.. Sponsored-by: Jack Hill on Patreon	2022-09-26 16:06:49 -04:00
Joey Hess	b411a1ce74	remove unnecessary do block Left by Reiko's patch	2022-09-26 13:10:25 -04:00
Reiko Asakura	1d48153bb8	Run freeze and thaw hooks on crippled filesystems The user sets these hooks deliberately so they should always be run. For example this allows hooks to be used to manage file permissions on NTFS volumes in WSL1.	2022-09-26 13:05:39 -04:00
Joey Hess	98eb5ff84f	fix windows build	2022-09-26 12:08:04 -04:00
Joey Hess	e62e4eaaf2	refactor for legibility	2022-09-23 18:53:06 -04:00
Joey Hess	2478e9e03a	restage: New git-annex command, handles restaging unlocked files This is much easier and less failure-prone than having the user run git update-index --refresh themselves. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 16:29:59 -04:00
Joey Hess	f7146c153b	fix restaging of transferred files after stalldetection kicks in Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:55:40 -04:00
Joey Hess	6a3bd283b8	add restage log When pointer files need to be restaged, they're first written to the log, and then when the restage operation runs, it reads the log. This way, if the git-annex process is interrupted before it can do the restaging, a later git-annex process can do it. Currently, this lets a git-annex get/drop command be interrupted and then re-run, and as long as it gets/drops additional files, it will clean up after the interrupted command. But more changes are needed to make it easier to restage after an interrupted process. Kept using the git queue to run the restage action, even though the list of files that it builds up for that action is not actually used by the action. This could perhaps be simplified to make restaging a cleanup action that gets registered, rather than using the git queue for it. But I wasn't sure if that would cause visible behavior changes, when eg dropping a large number of files, currently the git queue flushes periodically, and so it restages incrementally, rather than all at the end. In restagePointerFiles, it reads the restage log twice, once to get the number of files and size, and a second time to process it. This seemed better than reading the whole file into memory, since potentially a huge number of files could be in there. Probably the OS will cache the file in memory and there will not be much performance impact. It might be better to keep running tallies in another file though. But updating that atomically with the log seems hard. Also note that it's possible for calcRestageLog to see a different file than streamRestageLog does. More files may be added to the log in between. That is ok, it will only cause the filterprocessfaster heuristic to operate with slightly out of date information, so it may make the wrong choice for the files that got added and be a little slower than ideal. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 15:47:24 -04:00
Joey Hess	8718125ae4	refactor the restage runner Sponsored-by: Dartmouth College's DANDI project	2022-09-23 13:12:17 -04:00
Joey Hess	6e3c9bea2e	drain transferrer read handle when shutting it down Fixes updating git index file after getting an unlocked file when annex.stalldetection is set. The transferrer may want to send additional protocol messages when it's shut down. Closing the read handle prevented it from doing that, and caused it to crash rather than cleanly shutting down. Draining the handle without processing the protocol seemed ok to do, because anything it outputs is going to be some side message displayed at shutdown. Displaying those once per transferrer process that is running seems unnecessary. Sponsored-by: Dartmouth College's DANDI project	2022-09-22 14:39:39 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Joey Hess	d2c842e9a1	don't force use of conduit in withUrlOptionsPromptingCreds Use curl for downloads from git remotes when annex.url-options and other git configs are set. If the url needs a password, curl will fail, and git credential will not be used to prompt for it. But the user can set --netrc in url-options and put the password in the netrc file. This also means that url-options settings like -4 will take effect. That was the case before commit `1883f7ef8f` forced conduit to be used.	2022-09-09 16:07:32 -04:00
Joey Hess	c62fe5e9a8	avoid redundant prompt for http password in git-annex get that does autoinit autoEnableSpecialRemotes runs a subprocess, and if the uuid for a git remote has not been probed yet, that will do a http get that will prompt for a password. And then the parent process will subsequently prompt for a password when getting annexed files from the remote. So the solution is for autoEnableSpecialRemotes to run remoteList before the subprocess, which will probe for the uuid for the git remote in the same process that will later be used to get annexed files. But, Remote.Git imports Annex.Init, and Remote.List imports Remote.Git, so Annex.Init cannot import Remote.List. Had to pass remoteList into functions in Annex.Init to get around this dependency loop.	2022-09-09 14:43:43 -04:00
Joey Hess	9621beabc4	cache credentials in memory when doing http basic auth to a git remote When accessing a git remote over http needs a git credential prompt for a password, cache it for the lifetime of the git-annex process, rather than repeatedly prompting. The git-lfs special remote already caches the credential when discovering the endpoint. And presumably commands like git pull do as well, since they may download multiple urls from a remote. The TMVar CredentialCache is read, so two concurrent calls to getBasicAuthFromCredential will both prompt for a credential. There would already be two concurrent password prompts in such a case, and existing uses of `prompt` probably avoid it. Anyway, it's no worse than before.	2022-09-09 14:20:32 -04:00
Joey Hess	d4fd966396	avoid dup check of guardSafeToUseRepo Speeds up init slightly, and reduces the number of syscalls by the dynamic linker. Sponsored-by: Dartmouth College's Datalad project	2022-08-29 13:52:58 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	b801812660	init: probe if sqlite works Help the user get annex.dbdir configured when their filesystem is not one that sqlite works on. The change in Database.Handle makes an error from sqlite not be ignored besides being displayed, which it was before. I can't see any reason git-annex would want to ignore these errors. I chose to use the fsck database rather than the keys database because opening the keys database populates it, and see commit `b3c4579c79`. The placement of the call to checkSqliteWorks inside checkInitializeAllowed avoids annex.uuid getting set before it's called. Sponsored-by: Dartmouth College's Datalad project	2022-08-17 13:12:26 -04:00
Joey Hess	840bd50390	make it easier to use curl for unusual url schemes Use curl when annex.security.allowed-url-schemes includes an url scheme not supported by git-annex internally, as long as annex.security.allowed-ip-addresses is configured to allow using curl. Sponsored-by: Luke Shumaker on Patreon	2022-08-15 12:22:13 -04:00
Joey Hess	4cfe17a9e8	use a subdirectory of annex.dbdir This allows annex.dbdir to be set globally or always set to the same value when needed. Each repository uses a subdirectory of it. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 13:18:15 -04:00
Joey Hess	a335c1e46e	annex.dbdir fully working Completes work started in `e60766543f` I've verified that all the sqlite databases get stored in annex.dbdir and are created successfully. If annex.dbdir does not exist, it will be created; its parent directory must already exist though. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 13:06:58 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	a23fd7349f	work around git segfault Work around bug in git 2.37 that causes a segfault when core.untrackedCache is set, and broke git-annex init. Depending on when git gets fixed and how widely the buggy versions are used, this could be reverted quite soon, or need to linger for a long time. It only makes git-annex init a tiny bit slower in a new repo. Sponsored-by: Max Thoursie on Patreon	2022-08-04 14:20:57 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	d905232842	use ResourcePool for hash-object handles Avoid starting an unnecessary number of git hash-object processes when concurrency is enabled. Sponsored-by: Dartmouth College's DANDI project	2022-07-25 17:32:39 -04:00
Joey Hess	63cef2ae0b	v8 repositories automatically upgrade to v9 (And v9 later on to v10.) When v9/v10 were added, making v8 automatically upgrade was deferred "for a few months" to prevent interoperability problems if users also have an old version of git-annex. Of course that could still be the case, but there has been a good amount of time and this can't be put off forever. Allow setting annex.autoupgraderepository to false to avoid this upgrade. Previously, that only prevented upgrades from no longer supported git-annex versions, but v8 is still supported, and users may want to keep on v8 to interoperate with an old git-annex version. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-07-25 16:20:04 -04:00
Joey Hess	cbe12b9bc3	force fully strict read of journal file again I was thinking that discardIncompleteAppend would make it strict, since it looks at the end of the bytestring. But, it's applied lazily.. This probably fixes windows, which was failing: git-annex.exe: .git\annex\journal\trust.log: DeleteFile "\\\\?\\C:\\Users\\runneradmin\\.t\\5\\tmprepo22\\.git\\annex\\journal\\trust.log": permission denied (The process cannot access the file because it is being used by another process.)	2022-07-22 11:36:21 -04:00
Joey Hess	4e88137a28	prevent appends except when annex.alwayscompact=false I would like for a new repo version to enable appends, but to do so safely would need a v11 followed by a 1 year delay followed by a v12 that does it. Since a similar v9 and v10 transition is currently happening, and is less than 6 months along in most repos, it does not feel wise to stack up another year-long transition behind that. What if I need to hurry up a new repo version for some other change? Added todo so I remember to make this change at some time when a v11 and probably v12 repo version do make sense. Sponsored-by: Dartmouth College's DANDI project	2022-07-20 13:23:55 -04:00
Joey Hess	d275874e6c	handling of interrupted appends An append that is interrupted and writes part of a line is now dealt with by subsequent reads and appends. This also handles a read that happens at the same time as an append to the file. Old versions of git-annex will still see a partially written line, and could get confused. Since appends are currently done for url logs and location logs, the confusion is limited to a substring of the actual url or UUID of the remote being read. This will not affect writes, since the journal file is locked when reading in preparation for writing. However, the bad data can be output by git-annex and used by other things, or could cause surprising behavior by git-annex. Including eg, downloading the content of the wrong url. So, something needs to be done to prevent old versions of git-annex from running in a repository where this appending is being done.. Sponsored-by: Dartmouth College's DANDI project	2022-07-20 12:40:49 -04:00
Joey Hess	6f1fd3abdd	no locking of journal on read after all Finally have a final design, and it turns out not to need locking on read.	2022-07-20 10:57:28 -04:00
Joey Hess	d0860b7f0e	fix build After `28b0aaea54`	2022-07-18 16:44:32 -04:00
Joey Hess	28b0aaea54	re-add lock journal before reading journal files This reverts commit `2e6e9876e3`. This is gonna be needed after all.. The append will only be atomic if the journal is locked, because the file being appended will have to be moved out of the way to avoid an old version of git-annex seeing an incomplete write to it. When git-annex finds that the file is not in the journal, and checks the append location, locking will be needed to avoid a race causing it to miss it in the append location too due to it being moved back to the journal.	2022-07-18 16:40:25 -04:00
Joey Hess	36f0bdcd57	add annex.alwayscompact Added annex.alwayscompact setting which can be unset to speed up writes to the git-annex branch in some cases. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 16:39:19 -04:00
Joey Hess	ccff639651	Merge branch 'master' into append	2022-07-18 14:17:15 -04:00
Joey Hess	de18d92de6	efficient but unsafe journal file append This is only for checking performance, it's not safe. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 14:17:12 -04:00
Joey Hess	1c40b927aa	minor optimisation Avoid re-writing the file when the journal directory did not exist.	2022-07-18 13:50:35 -04:00
Joey Hess	2e6e9876e3	Revert "lock journal before reading journal files" This reverts commit `47358a6f95`. This added overhead, and will not be needed, because appends are going to have to be made atomic for other reasons than avoiding incomplete reads of data being appended. In particular, when git-annex is interrupted in the middle of an append, it must not leave the file with a partially written line. So appending has to somehow be made fully atomic.	2022-07-18 13:38:12 -04:00
Joey Hess	ce455223df	split out appending to journal from writing, high level only Currently this is not an improvement, but it allows for optimising appendJournalFile later. With an optimised appendJournalFile, this will greatly speed up access patterns like git-annex addurl of a lot of urls to the same key, where the log file can grow rather large. Appending rather than re-writing the journal file for each line can save a lot of disk writes. It still has to read the current journal or branch file, to check if it can append to it, and so when the journal file does not exist yet, it can write the old content from the branch to it. Probably the re-reads are better cached by the filesystem than repeated writes. (If the re-reads turn out to keep performance bad, they could be eliminated, at the cost of not being able to compact the log when replacing old information in it. That could be enabled by a switch.) While the immediate need is to affect addurl writes, it was implemented at the level of presence logs, so will also perhaps speed up location logs. The only added overhead is the call to isNewInfo, which only needs to compare ByteStrings. Helping to balance that out, it avoids compactLog when it's able to append. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 13:22:50 -04:00
Joey Hess	47358a6f95	lock journal before reading journal files This is not currently necessary; journal files are updated atomically. However, for faster appends to large journal files, locking on read will be needed, because appends are not atomic. Sponsored-by: Dartmouth College's DANDI project	2022-07-15 14:43:29 -04:00
Joey Hess	a2b1f369d1	disable journalIgnorable in enableInteractiveBranchAccess Fix a reversion that prevented --batch commands (and the assistant) from noticing data written to the journal by other commands. I have not identified which commit broke this for sure, but probably it was `aeca7c2207`. --batch commands that wrote to the journal avoided the problem since journalIgnorable gets unset on write. It's a little bit surprising that nobody noticed that query --batch commands did not see data written by other commands. Sponsored-by: Dartmouth College's DANDI project	2022-07-15 13:48:41 -04:00
Joey Hess	91abd872d3	complete a comment	2022-07-15 12:59:59 -04:00
Joey Hess	ad467791c1	optimise journal writes to not mkdir journal directory when it already exists Sponsored-by: Dartmouth College's DANDI project	2022-07-14 12:29:39 -04:00
Joey Hess	1b680d330b	revert accidental change	2022-07-13 15:17:08 -04:00
Joey Hess	68e9b7f987	comment	2022-07-13 13:44:43 -04:00
Joey Hess	f58fb6a79a	fix build when dbus is enabled Broken in commit `8040ecf9b8`	2022-07-05 13:06:45 -04:00
Joey Hess	8040ecf9b8	final readonly values moves to AnnexRead At this point I've checked all AnnexState values and these were all that remained that could move. Pity that Annex.repo can't move, but it gets modified sometimes.. A couple of AnnexState values are set by options and could be AnnexRead, but happen to use Annex when being set. Sponsored-by: Max Thoursie on Patreon	2022-06-28 16:04:58 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	debcf86029	use RawFilePath version of rename Some small wins, almost certainly swamped by the system calls, but still worthwhile progress on the RawFilePath conversion. Sponsored-by: Erik Bjäreholt on Patreon	2022-06-22 16:47:34 -04:00
Joey Hess	d00e23cac9	RawFilePath optimisations	2022-06-22 16:20:08 -04:00
Joey Hess	224a57f9ed	RawFilePath optimisation	2022-06-22 16:11:03 -04:00
Joey Hess	95a04920cf	remove objectDir'	2022-06-22 16:08:49 -04:00
Joey Hess	f80ec74128	RawFilePath optimisation	2022-06-22 16:08:26 -04:00
Joey Hess	78a3d44ea0	get rid of racy addLink The remaining callers all did not rely on it checking gitignore, so were easy to convert. They were susceptible to the same overwrite race as add and fix, although less likely to have it and a narrower window than add's race. Command.Rekey in passing got an unnecessary call to removeFile deleted. addSymlink handles deleting any existing worktree file.	2022-06-14 14:47:15 -04:00
Joey Hess	7ace804d8e	avoid writing same symlink twice in a row Oddly, the second write did not cause it to lose the mtime inherited from the file being added, although the mtime was not provided to that write but only to the first. I don't quite know why that worked before!	2022-06-14 14:30:12 -04:00
Joey Hess	5ef79125ad	fix overwrite race with git-annex add of annex symlink In the unlikely case where git-annex add is run on an annex symlink that is not already added, and while it's processing it, the annex symlink is overwritten with something else, avoid git-annex overwriting that with the symlink again. Sponsored-by: Jack Hill on Patreon	2022-06-14 14:00:13 -04:00
Joey Hess	dd6dec4eb1	fix add overwrite race with git-annex add to annex This is not a complete fix for all such races, only the one where a large file gets changed while adding and gets added to git rather than to the annex. addLink needs to go away, any caller of it is probably subject to the same kind of race. (Also, addLink itself fails to check gitignore when symlinks are not supported.) ingestAdd no longer checks gitignore. (It didn't check it consistently before either, since there were cases where it did not run git add!) When git-annex import calls it, it's already checked gitignore itself earlier. When git-annex add calls it, it's usually on files found by withFilesNotInGit, which handles checking ignores. There was one other case, when git-annex add --batch calls it. In that case, old git-annex behaved rather badly, it would seem to add the file, but git add would later fail, leaving the file as an unstaged annex symlink. That behavior has also been fixed. Sponsored-by: Brett Eisenberg on Patreon	2022-06-14 13:37:19 -04:00
Joey Hess	c59ea5b1ca	info: Added --autoenable option Use cases include using git-annex init --no-autoenable and then going back and enabling the special remotes that have autoenable configured. As well as just querying to remember which ones have it enabled. It lists all special remotes that have autoenable=yes whether currently enabled or not. And it can be used with --json. I pondered making this "git-annex info autoenable", but that seemed wrong because then if the user has a directory named "autoenable", it's unclear what they are asking for. (Although "git-annex info remote" may be similarly unclear.) Making it an option does mean that it can't be provided via --batch though. Sponsored-by: Dartmouth College's Datalad project	2022-06-01 14:20:38 -04:00
Joey Hess	f35c551d35	make path absolute for display Avoid suggesting the user add "." to safe.directory.	2022-05-31 12:17:27 -04:00
Joey Hess	478ed28f98	revert windows-specific locking changes that broke tests This reverts windows-specific parts of `5a98f2d509`. There were no code paths in common between windows and unix, so this will return Windows to the old behavior. The problem that the commit talks about has to do with multiple different locations where git-annex can store annex object files, but that is not too relevant to Windows anyway, because on windows the filesystem is always treated as crippled and/or symlinks are not supported, so it will only use one object location. It would need to be using a repo populated in another OS to have the other object location in use probably. Then a drop and get could possibly lead to a dangling lock file. And, I was not able to actually reproduce that situation happening before making that commit, even when I forced a race. So making these changes on windows was just begging trouble.. I suspect that the change that caused the reversion is in Annex/Content/Presence.hs. It checks if the content file exists, and then called modifyContentDirWhenExists, which seems like it would not fail, but if something deleted the content file at that point, that call would fail. Which would result in an exception being thrown, which should not normally happen from a call to inAnnexSafe. That was a windows-specific change; the unix side did not have an equivalent change. Sponsored-by: Dartmouth College's Datalad project	2022-05-23 13:21:26 -04:00
Joey Hess	63624c40a0	fix typo in comment	2022-05-23 12:53:55 -04:00
Joey Hess	af0d854460	deal with git's changes for CVE-2022-24765 Deal with git's recent changes to fix CVE-2022-24765, which prevent using git in a repository owned by someone else. That makes git config --list not list the repo's configs, only global configs. So annex.uuid and annex.version are not visible to git-annex. It displayed a message about that, which is not right for this situation. Detect the situation and display a better message, similar to the one other git commands display. Also, git-annex init when run in that situation would overwrite annex.uuid with a new one, since it couldn't see the old one. Add a check to prevent it running too in this situation. It may be that this fix has security implications, if a config set by the malicious user who owns the repo causes git or git-annex to run code. I don't think any git-annex configs get run by git-annex init. It may be that some git config of a command does get run by one of the git commands that git-annex init runs. ("git status" is the command that prompted the CVE-2022-24765, since core.fsmonitor can cause it to run a command). Since I don't know how to exploit this, I'm not treating it as a security fix for now. Note that passing --git-dir makes git bypass the security check. git-annex does pass --git-dir to most calls to git, which it does to avoid needing chdir to the directory containing a git repository when accessing a remote. So, it's possible that somewhere in git-annex it gets as far as running git with --git-dir, and git reads some configs that are unsafe (what CVE-2022-24765 is about). This seems unlikely, it would have to be part of git-annex that runs in git repositories that have no (visible) annex.uuid, and git-annex init is the only one that I can think of that then goes on to run git, as discussed earlier. But I've not fully ruled out there being others.. 
The git developers seem mostly worried about "git status" or a similar command implicitly run by a shell prompt, not an explicit use of git in such a repository. For example, Ævar Arnfjörð Bjarma wrote: > * There are other bits of config that also point to executable things, > e.g. core.editor, aliases etc, but nothing has been found yet that > provides the "at a distance" effect that the core.fsmonitor vector > does. > > I.e. a user is unlikely to go to /tmp/some-crap/here and run "git > commit", but they (or their shell prompt) might run "git status", and > if you have a /tmp/.git ... Sponsored-by: Jarkko Kniivilä on Patreon	2022-05-20 14:38:27 -04:00
Joey Hess	aa414d97c9	make fsck normalize object locations The purpose of this is to fix situations where the annex object file is stored in a directory structure other than where annex symlinks point to. But it will also move object files from the hashdirmixed back to hashdirlower if the repo configuration makes that the normal location. It would have been more work to avoid that than to let it do it. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:38:06 -04:00
Joey Hess	6b5029db29	fix hardcoding of number of hash directories It can be changed to 1 via a tuning, rather than the 2 this assumed. So it would have tried to rmdir .git/annex/objects in that case, which would not hurt anything, but is not what it is supposed to do. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:08:42 -04:00
Joey Hess	5a98f2d509	avoid creating content directory when locking content If the content directory does not exist, then it does not make sense to lock the content file, as it also does not exist, and so it's ok for the lock operation to fail. This avoids potential races where the content file exists but is then deleted/renamed, while another process sees that it exists and goes to lock it, resulting in a dangling lock file in an otherwise empty object directory. Also renamed modifyContent to modifyContentDir since it is not only necessarily used for modifying content files, but also other files in the content directory. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 12:34:56 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	2f2701137d	incremental verification for retrieval from all export remotes Only for export remotes so far, not export/import. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 13:49:33 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrieving from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	8675b2b075	rename memoryUnits It's not just used for memory sizes.	2022-05-05 15:35:11 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	982eb7ed0d	remove vendored http-client-restricted Removed vendored copy of http-client-restricted, and removed the HttpClientRestricted build flag that avoided that dependency. http-client-restricted is in Debian stable, and the i386ancient build also uses it, so I think this vendored copy is no longer needed. Sponsored-by: Noam Kremen on Patreon	2022-03-22 11:50:06 -04:00
Joey Hess	952664641a	turn off PackageImports in cabal file This makes it easier to build eg benchmarks of individual modules. May be that most of these PackageImports are not really necessary, dunno.	2022-02-25 13:16:36 -04:00
Joey Hess	51c528980c	avoid accidentally thawing git-annex symlink It did nothing, since at this point the link is dangling. But when there is a thaw hook, it would probably not be happy to be asked to run on a symlink, or might do something unexpected. Sponsored-by: Dartmouth College's Datalad project	2022-02-24 14:21:23 -04:00


1971 commits