git-annex

Author	SHA1	Message	Date
Joey Hess	8a3beabf35	use RawFilePath for opening sqlite databases Fix a crash opening sqlite databases when run in a non-unicode locale, with a remote that uses a non-unicode filepath. In that situation converting to Text fails. The fix needs git-annex to be built with persistent-sqlite 2.13.3. Building against older versions still works, but that version is used when building with stack. Database.RawFilePath is a lot of code copied from persistent-sqlite and lightly modified, since only 1 function in persistent-sqlite was made to support RawFilePath. This is a bit of a pain, and I hope that persistent-sqlite will eventually switch to using OsPath, allowing this module to be removed from git-annex. Sponsored-by: k0ld on Patreon	2023-12-26 18:31:52 -04:00
Joey Hess	6d789c9c81	sync, push: Avoid trying to send individual files to special remotes configured with importtree=yes exporttree=no That will always fail. It already skipped doing this when exporttree=yes.	2023-12-26 15:56:58 -04:00
Joey Hess	aec7bed1aa	prepping for release	2023-12-26 15:40:55 -04:00
Joey Hess	9a67ed0f10	importtree: support preferred content expressions needing keys When importing from a special remote, support preferred content expressions that use terms that match on keys (eg "present", "copies=1"). Such terms are ignored when importing, since the key is not known yet. When "standard" or "groupwanted" is used, the terms in those expressions also get pruned accordingly. This does allow setting preferred content to "not (copies=1)" to make a special remote into a "source" type of repository. Importing from it will import all files. Then exporting to it will drop all files from it. In the case of setting preferred content to "present", it's pruned on import, so everything gets imported from it. Then on export, it's applied, and everything in it is left on it, and no new content is exported to it. Since the old behavior on these preferred content expressions was for importtree to error out, there's no backwards compatibility to worry about. Except that sync/pull/etc will now import where before it errored out.	2023-12-18 16:27:59 -04:00
Joey Hess	eb59da9dd2	Lower precision of timestamps in git-annex branch This can reduce the size of the branch by up to 8%. My test was running git-annex add 1000 times on one file each. Lots of different high-resolution timestamps were recorded before and eliminating those, after packing, the git repo was 8% smaller. Due to the use of vector clocks, high resolution timestamps are not necessary to make clear which information is most recent when eg, a value is changed repeatedly in the same second. In such a case, the vector clock will be advanced to the next second after the last modification. For example, running git-annex numcopies 1; git-annex numcopies 2 The first will record the current second, while the next records the second after that even if it runs in the same second. As for conflicting information written to two different clones of the repository, this will make git-annex sometimes pick information that was written earlier in a second over information written later in the same second. Usually git-annex does not write conflicting information, but there are some cases where it could. Eg, storing an object on a remote can update the remote state log with some state. If two repos both store the same object, and end up storing different remote state for some reason, this can result in one that ran a tiny bit later winning. Such a situation seems unlikely to be user visible. And a small amount of clock skew could already result in such things. The only case I can think of where this might be a user visible change is if a configuration command like git-annex numcopies is being run in 2 clones of a repository on the same machine at very close to the same time. Then the user will know which they ran last, and git-annex won't. If that did become a problem, this could be dialed back to eg log milliseconds with still some space saving.	2023-12-11 15:04:06 -04:00
Joey Hess	86dbe9a825	migrate: support adding size back to URL keys migrate: Support adding size to URL keys that were added with --relaxed, by running eg: git-annex migrate --backend=URL foo Since url keys cannot be generated, that used to fail. Make it notice that the backend is not changed, and just get the size of the content. Sponsored-by: Brock Spratlen on Patreon	2023-12-08 16:22:14 -04:00
Joey Hess	257f01729c	distributed migration for pull and sync --content pull, sync: When operating on content, automatically hard link objects that have been migrated. Added annex.syncmigrations config that can be set to false to prevent pull and sync from migrating object content. I think that true is a good default for this config, because it avoids users having to re-download migrated content or learning about migration. But, some users will surely not like it, whether because it does take some time (especially for the first git-annex branch scan when there is a long history), or because they want to deal with it manually, or because their filesystem doesn't support hard links and they don't want it to copy objects. Sponsored-by: k0ld on Patreon	2023-12-08 14:18:18 -04:00
Joey Hess	4ed71b34de	migrate --apply And avoid migrate --update/--apply migrating when the new key was already present in the repository, and got dropped. Luckily, the location log allows distinguishing from the new key never having been present! That is mostly useful for --apply because otherwise dropped files would keep coming back until the old objects were reaped as unused. But it seemed to make sense to also do it for --update, for consistency in edge cases if nothing else. One case where --update can use it is when one branch got migrated earlier, and we dropped the file, and now another branch has migrated the same file. Sponsored-by: Jack Hill on Patreon	2023-12-08 13:23:46 -04:00
Joey Hess	f1ce15036f	started migrate --update This is most of the way there, but not quite working. The layout of migrate.tree/ needs to be changed to follow this approach. git log will list all the files in tree order, so the new layout needs to alternate old and new keys. Can that be done? git may not document tree order, or may not preserve it here. Alternatively, change to using git log --format=raw and extract the tree header from that, then use git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new That will be a little more expensive, but only when there are lots of migrations. Sponsored-by: Joshua Antonishen on Patreon	2023-12-07 15:50:52 -04:00
Joey Hess	a6eb7d7339	prevent relatedTemplate from truncating a filename to end in "." Avoid a problem with temp file names ending in "." on certain filesystems that have problems with such filenames. relatedTemplate is quite an ugly hack really; since it doesn't know the max filename length of the filesystem it can only assume that the filename is max allowed length. When given the input "lh.aparc.DKTatlas.annot", it wants to reserve 20 characters for tempfile so it truncates to "lh.". That ending period is apparently a problem on some filesystems (FAT eats it, but does not throw EINVAL; ntfs does not seem bothered by it, I don't know what FUSE filesystem the bug reporter was really using). Sponsored-by: Brett Eisenberg on Patreon	2023-12-05 12:38:14 -04:00
Joey Hess	0485dd3161	sync: Fix locking problems during merge when annex.pidlock is set Presumably git merge sometimes needs to verify if a worktree file is modified, and so will then run git-annex filter-process which would try to take the pid lock. And for whatever reason, git-annex sync already had the pidlock held. I have not replicated that, but it does make enough sense to deploy the workaround. Like I said back in commit `7bdb0cdc0d`, Arguably, it would be better to have a way to make any process git-annex runs have the env var set. But then it would need to take the pid lock when running any and all processes, and that would be a problem when git-annex runs two processes concurrently. So, I'm left doing it ad-hoc in places where git-annex really does run a child process, directly or indirectly via a particular git command. Sponsored-by: KDM on Patreon	2023-12-04 13:40:28 -04:00
Joey Hess	1e31bf8122	copy/move --from-anywhere --to remote Implementation was simple because it's equivalent to --from=foo --to remote for each other remote, followed by --to remote when there's a local copy. (Or, in the edge case of --from-anywhere --to=here, it's the same as --to=here.) Note that, when the local repo does not have a copy, fromToPerform gets it from a remote, sends it to the destination, and drops the local copy. Another call to that for a second remote will notice that the dest now has a copy, and simply drop from the second remote, avoiding a second transfer. Also note that, when numcopies doesn't allow dropping it from everywhere, it will drop it from the cheapest remotes first (maybe not ideal) up to more expensive remotes, and finally from the local repo. So the local repo will generally end up holding a copy. Maybe not ideal in all cases either, but it seems no worse to do that than to end up with a copy undropped from a remote. And I'm not entirely happy with the output, eg: copy bigfile (from r3...) ok copy bigfile ok That makes sense if you think of the second line as being the same as what is output by `git-annex copy bigfile --to bar`, but it's less clear in this context. Maybe add "(from here...)"? Also the --json output doesn't have a machine-readable field for the "from" uuid, and maybe it should? Sponsored-by: Dartmouth College's DANDI project	2023-11-30 16:34:30 -04:00
Joey Hess	1654572bc1	fix --from overriding annex-ignore Make git-annex get/copy/move --from foo override configuration of remote.foo.annex-ignore, as documented. This already worked for remotes supporting hasKeyCheap. For others though, git-annex copy --from foo would silently not do anything, while git-annex copy --to foo would use the annex-ignored remote. Also improved the annex-ignore docs, to reflect that `git-annex get` without --from will skip using annex-ignored remotes, for example. Sponsored-by: Dartmouth College's DANDI project	2023-11-30 15:12:07 -04:00
Joey Hess	bacd781c4f	releasing package git-annex version 10.20231129	2023-11-29 16:01:01 -04:00
Joey Hess	f3f864fc6d	findkeys: Support --largerthan and --smallerthan Sponsored-by: Brett Eisenberg on Patreon	2023-11-28 11:51:43 -04:00
Joey Hess	6e3bcbf4dd	Make git-annex copy --from --to --fast actually fast Eg when the destination is logged as containing a file, skip actively checking that it does contain it. Note that --fast does not prevent other verifications of content location that are done in a copy --from --to. Perhaps it could, but this change will already avoid the real unnecessary work of operating on files that are already in the remote. And avoiding other verifications might cause it to fail if the location log thinks that --to does not contain the content but does. Such complications with `git-annex copy --to remote --fast` led to commit `d006586cd0` which added a note that gets displayed when that fails, mentioning it might be due to --fast being enabled. copy --from --to is already complicated enough without needing to worry about such edge cases, so continuing to do some verification of content location after the initial --fast filtering seems ok. Sponsored-by: Dartmouth College's DANDI project	2023-11-17 17:37:58 -04:00
Joey Hess	7a8393ce7d	Fix bug in git-annex copy --from --to Caused it to skip files that were locally present. Sponsored-by: Dartmouth College's DANDI project	2023-11-17 16:30:20 -04:00
Joey Hess	7d67229884	git-annex log --gnuplot The gnuplot output is pretty good, but could still be improved with: * more colors (repeating colors is confusing with a lot of repos) * better positioning of the legend, making the plot wider and moving it from over top of the graph Sponsored-by: Kevin Mueller on Patreon	2023-11-14 14:56:58 -04:00
Joey Hess	0fdc1a54db	git-annex log --received modifier option Only counting received and not dropped makes this show the bandwidth of data coming into the repository, although only in a sense. Since git-annex branch updates only happen at the end of a command, and we don't know when a command started, it's only an approximation of the actual bandwidth. (A previous git-annex branch update may have happened in a different repository.) It would be possible to also add a --dropped option, but I don't know how useful that would be? Sponsored-by: Nicholas Golder-Manning on Patreon	2023-11-14 14:04:46 -04:00
Joey Hess	574514545c	git-annex log --sizesof This can take a lot of memory. I decided to violate the usual rule in git-annex that it operate in constant memory no matter how many annexed objects. In this case, it would be hard to be fast without using a big map of the location logs. The main difficulty here is that there can be many git-annex branches and it needs to display a consistent view at a point in time, which means merging information from multiple git-annex branches. I have not checked if there are any laziness leaks in this code. It takes 1 gb to run in my big repo, which is around what I estimated before writing it. 2 options that are documented are not yet implemented. Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the next change after 12:59 is then. Then it waits until after 2:10 to display the next change. It ought to wait until after 2:00. Sponsored-by: Brock Spratlen on Patreon	2023-11-10 17:26:10 -04:00
Joey Hess	11cc9f1933	info: Added calculation of combined annex size of all repositories Factored out overLocationLogs from CmdLine.Seek, which can calculate this pretty fast even in a large repo. In my big repo, the time to run git-annex info went up from 1.33s to 8.5s. Note that the "backend usage" stats are for annexed files in the working tree only, not all annexed files. This new data source would let that be changed, but that would be a confusing behavior change. And I cannot retitle it either, out of fear something uses the current title (eg parsing the json). Also note that, while time says "402108maxresident" in my big repo now, up from "54092maxresident", top shows the RES constant at 64mb, and it was 48mb before. So I don't think there is a memory leak. I tried using deepseq to force full evaluation of addKeyCopies and memory use didn't change, which also says no memory leak. And indeed, not even calling addKeyCopies resulted in the same memory use. Probably the increased memory usage is buffering the stream of data from git in overLocationLogs. Sponsored-by: Brett Eisenberg on Patreon	2023-11-08 13:35:11 -04:00
Joey Hess	4e35067325	windows hook scripts newlines without CR Windows: When git-annex init is installing hook scripts, it will avoid ending lines with CR for portability. Existing hook scripts that do have CR line endings will not be changed. While it would be possible to have git-annex init upgrade them, users would need to know to use that command to do that, and it would add complexity that does not seem warranted for the portability benefit alone. Sponsored-by: Luke T. Shumaker on Patreon	2023-11-02 13:37:04 -04:00
Joey Hess	f8d35d9480	lookupkey: Sped up --batch When the file is relative, it does not need to be passed through git ls-files to normalize it. Sponsored-by: Kevin Mueller on Patreon	2023-10-30 14:59:09 -04:00
Joey Hess	39ca30e004	Windows: Consistently avoid ending output lines with CR This matches the behavior of git on Windows, which does not end lines with CR either. Previously, git-annex used to always write lines with putStrLn, so would output CR on Windows. Then parts of it changed to use ByteString.putStrLn, which does not output CR. That left its output inconsistent, sometimes within the same command. The point of this commit is to get back to consistency. Having the same behavior as git is a nice bonus. It would be much harder to make it consistently output CR, because every place it uses ByteString.putStrLn or similar would need to be changed. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-10-30 14:43:43 -04:00
Joey Hess	eb42935e58	Windows: Fix CRLF handling in some log files In particular, the mergedrefs file was written with CR added to each line, but read without CRLF handling. This resulted in each update of the file adding CR to each line in it, growing the number of lines, while also preventing the optimisation from working, so it remerged unnecessarily. writeFile and readFile do NewlineMode translation on Windows. But the ByteString conversion prevented that from happening any longer. I've audited for other cases of this, and found three more (.git/annex/index.lck, .git/annex/ignoredrefs, and .git/annex/import/). All of those also only prevent optimisations from working. Some other files are currently both read and written with ByteString, but old git-annex may have written them with NewlineMode translation. Other files are at risk for breakage later if the reader gets converted to ByteString. This is a minimal fix, but should be enough, as long as I remember to use fileLines when splitting a ByteString into lines. This leaves files written using ByteString without CR added, but that's ok because old git-annex has no difficulty reading such files. When the mergedrefs file has gotten lines that end with "\r\r\r\n", this will eventually clean it up. Each update will remove a single trailing CR. Note that S8.lines is still used in eg Command.Unused, where it is parsing git show-ref, and similar in Git/*. git commands don't include CR in their output so that's ok. Sponsored-by: Joshua Antonishen on Patreon	2023-10-30 14:23:23 -04:00
Joey Hess	0da1d40cd4	Improve memory use of --all when using annex.private This does not improve Annex.Branch.files at all, since it still uses ++ to combine the lists, so forcing all but the last one. But when there are a lot of files in the private journal, it does avoid --all (or a bare repo) from buffering the filenames in memory. See commit `653b719472` for prior discussion of this buffering. Sponsored-by: Graham Spencer on Patreon	2023-10-24 13:20:55 -04:00
Joey Hess	8bde6101e3	sqlite database for importfeed importfeed: Use caching database to avoid needing to list urls on every run, and avoid using too much memory. Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster, and memory use dropped from 203000k to 59408k. Database.ImportFeed is Database.ContentIdentifier with the serial number filed off. There is a bit of code duplication I would like to avoid, particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use the persistent sqlite tables, so despite the code being the same, they cannot be factored out. Since this database includes the contentidentifier metadata, it will be slightly redundant if a sqlite database is ever added for metadata. I did consider making such a generic database and using it for this. But, that would then need importfeed to update both the url database and the metadata database, which is twice as much work diffing the git-annex branch trees. Or would entangle updating two databases in a complex way. So instead it seems better to optimise the database that importfeed needs, and if the metadata database is used by another command, use a little more disk space and do a little bit of redundant work to update it. Sponsored-by: unqueued on Patreon	2023-10-23 16:46:22 -04:00
Joey Hess	6a61c7ff45	Fix crash of enableremote when the special remote has embedcreds=yes The crash occurred because writeCreds got called twice, and writeFileProtected neglected to close its file handle, so the file was open for write when written the second time. It seems unnecessary and suboptimal that writeCreds gets called twice. One call is from getRemoteCredPair and the other from setRemoteCredPair'. What happens is that in the enableremote case, code that also runs at initremote does unnecessary work. Might be possible to improve that, but I've gone for the simple fix. Sponsored-by: k0ld on Patreon	2023-10-20 13:19:12 -04:00
Joey Hess	c268dc5878	only stage regular files from the journal git-annex only writes regular files there, but other things may drop junk like empty .DAV directories around the tree. And trying to hash such things can have weird and hard to understand effects. So it seems best to do a small amount of work in statting the journal file to make sure it's a regular file. Sponsored-by: Jack Hill on Patreon	2023-10-10 13:22:02 -04:00
Joey Hess	b9240d2c5d	releasing package git-annex version 10.20230926	2023-09-26 13:29:49 -04:00
Joey Hess	41f4d0bda9	enableremote: Avoid overwriting existing git remote when passed the uuid of a specialremote that was earlier initialized with the same name	2023-09-22 13:29:48 -04:00
Joey Hess	54da44d42a	Support being built with crypton rather than cryptonite crypton is a fork of cryptonite, and cryptonite's github repo has been archived. Some deps are already using crypton so it's clearly the way forward. Added a build flag without a default, so cabal configure will select on its own which to use. stack files pin to cryptonite for now. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-09-21 12:43:42 -04:00
Joey Hess	a18e40bdd7	lookupkey: Added --ref option Sponsored-by: Joshua Antonishen on Patreon	2023-09-12 12:49:11 -04:00
Joey Hess	7be8950138	propigateAdjustedCommits in seekExportContent push: When on an adjusted branch, propagate changes to parent branch before updating export remotes. This is a somewhat redundant call to propigateAdjustedCommits, since it also gets called at pushLocal time. That other one needs to come after importing from importtree remotes though, and seekExportContent has to come earlier, so I don't see a way to avoid doing it twice. Note that git-annex sync also manages to avoid the problem, it's only git-annex push that had the bug. Sponsored-by: Leon Schuermann on Patreon	2023-09-11 14:54:26 -04:00
Joey Hess	29ae536637	adb send to final filename not tmp file Avoids some problems with unusual characters in exporttree filenames that confuse adb shell commands. In particular, with a filename that contains \351, adb push sends the file to the correct filename in /sdcard. And running find on the android device roundtrips the filename. But, running mv on that filename on the android device fails with "bad <filename>: No such file or directory". Interestingly, ls on android works, and rm fails. adb push to the final name avoids this problem. But what about atomicity? Well, I tried an adb push and interrupted it part way through. The file was present while the push was running, but was removed once the push got interrupted. I also tried yanking the cable while adb push was running, and the partially received file was also deleted then. That avoids most problems. An import that runs at the same time as an export will see the partially sent file. But that is unlikely to be done, and if it did happen, it would notice that the imported file had changed in the meantime and discard it. Note that, since rm on the android device fails on these filenames, exporting a tree where the file is deleted is going to fail to remove it. I don't see what I can do about that, so long as android is using an rm that has issues with filename encodings. This was tested on a phone where find, ls, and rm all come from Toybox 0.8.6. Sponsored-by: unqueued on Patreon	2023-09-11 13:13:05 -04:00
Joey Hess	baf8e4f6ed	Override safe.bareRepository for git remotes Fix using git remotes that are bare when git is configured with safe.bareRepository = explicit Sponsored-by: Dartmouth College's DANDI project	2023-09-07 14:56:26 -04:00
Joey Hess	cbfd214993	set safe.directory when getting config for git-annex-shell or git remotes Fix more breakage caused by git's fix for CVE-2022-24765, this time involving a remote (either local or ssh) that is a repository not owned by the current user. Sponsored-by: Dartmouth College's DANDI project	2023-09-07 14:40:50 -04:00
Joey Hess	32cb2bd3fa	Fix linker optimisation in linux standalone tarballs Was only symlinking when there is a usr/ directory, but with usr/ merge, there are none. Sponsored-by: Dartmouth College's Datalad project	2023-09-07 12:59:27 -04:00
Joey Hess	50300a47fe	Removed the vendored git-lfs and the GitLfs build flag AFAICS all git-annex builds are using the git-lfs library not the vendored copy. Debian stable now includes a new enough haskell-git-lfs package as well. Last time this was tried it did not.	2023-08-28 13:12:31 -04:00
Joey Hess	a0a42e7ec1	releasing package git-annex version 10.20230828	2023-08-28 13:04:06 -04:00
Joey Hess	43e2a66a31	wording	2023-08-28 12:13:58 -04:00
Joey Hess	cf8b30c914	oldkeys: New command that lists the keys used by old versions of a file The tricky thing about this turned out to be handling renames and reverts. For that, it has to make two passes over the git log, and to avoid buffering a possibly huge amount of logs in memory (ie the whole git log of an entire repository!), runs git log twice. (It might be possible to speed this up by asking git log to show a diff, and so avoid needing to use catKey.) Sponsored-By: Brock Spratlen on Patreon	2023-08-22 14:51:06 -04:00
Joey Hess	379d58b499	diffdriver: Added --get option Removed the dontCheck repoExists, because running it in a repo that has not been initialized yet would update location log with nouuid. And I guess it's ok for it to only support running in git-annex repos.	2023-08-22 11:58:53 -04:00
Joey Hess	724ceeb1a9	avoid unnecessary use of curl when conduit will do Avoid using curl when annex.security.allowed-ip-addresses is set but neither annex.web-options nor annex.security.allowed-url-schemes is set to a value that needs curl. Bug introduced in `840bd50390` Sponsored-By: Brock Spratlen on Patreon	2023-08-22 10:25:53 -04:00
Joey Hess	7aac60769a	implement Unavailable for gcrypt Sponsored-by: Brett Eisenberg on Patreon	2023-08-16 15:54:54 -04:00
Joey Hess	977403d338	implement Unavailable for borg bup ddar directory rsync Only gcrypt remains to add support for. (Well, possibly also adb?) Sponsored-by: Luke T. Shumaker on Patreon	2023-08-16 15:48:09 -04:00
Joey Hess	67c99a4db7	info: Added available to the info displayed for a remote Sponsored-by: Kevin Mueller on Patreon	2023-08-16 14:52:58 -04:00
Joey Hess	9286769d2c	let Remote.availability return Unavailable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	75275ed41f	update Last commit also removed curl from linux standalone tarball. Which may or may not have been a mistake.. I'm inclined to go ahead and simplify it.	2023-08-15 14:22:30 -04:00
Joey Hess	571a516ed2	Stop bundling curl in the OSX dmg New curl binary links to libldap with a @loader_path that prevents using the binary when the dmg is used elsewhere. See https://github.com/datalad/git-annex/issues/170 git-annex doesn't use curl by default anyway, so it doesn't really need to be included in the dmg.	2023-08-15 14:21:53 -04:00

1663 commits
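
The timestamp change described in eb59da9dd2 can be illustrated with a small sketch. It is written in Python rather than git-annex's Haskell, and `next_clock` and its parameters are hypothetical names chosen for illustration, not taken from the git-annex source. It assumes only what the commit message states: timestamps are recorded at whole-second precision, and a change made in the same second as the previous one is advanced to the following second.

```python
from typing import Optional


def next_clock(now: float, last: Optional[int]) -> int:
    """Return a whole-second timestamp that is strictly later than `last`.

    `now` is the current time in seconds (possibly fractional).
    `last` is the timestamp recorded by the previous change, or None
    if nothing has been recorded yet. Both names are hypothetical.
    """
    current = int(now)  # drop the sub-second precision
    if last is not None and current <= last:
        # Same second (or clock behind): advance past the last change
        # so the newer value still sorts as most recent.
        return last + 1
    return current


# Two changes made within the same wall-clock second, as in
# "git-annex numcopies 1; git-annex numcopies 2":
first = next_clock(100.2, None)    # records second 100
second = next_clock(100.7, first)  # records second 101
```

The second call records the second after the first even though both ran within the same second, which is the ordering property the commit message relies on.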