41421 commits

Author SHA1 Message Date
Joey Hess
67245ae00f
fully specify the pointer file format
This format is designed to detect accidental appends, while having some
room for future expansion.

Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.
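The rule above can be sketched in Haskell. This is a simplified, hypothetical parser, not git-annex's own parseLinkTargetOrPointer: it assumes a pointer file's first line is "/annex/objects/" followed by the key, and that any further lines must start with a tag (assumed here to be "/annex/"). Anything else, such as appended text, means the file is not treated as a pointer.

```haskell
import Data.List (isPrefixOf)

-- Prefix of the first line of a pointer file.
pointerPrefix :: String
pointerPrefix = "/annex/objects/"

-- Assumed tag that every additional line must start with (hypothetical).
extraLineTag :: String
extraLineTag = "/annex/"

-- Return the key if the content is a well-formed pointer file, or Nothing
-- if it is not one (for example because other content was appended).
parsePointer :: String -> Maybe String
parsePointer content =
  case lines content of
    (firstLine : extraLines)
      | pointerPrefix `isPrefixOf` firstLine
      , let key = drop (length pointerPrefix) firstLine
      , not (null key)
      , all (extraLineTag `isPrefixOf`) extraLines
      -> Just key
    _ -> Nothing
```

A file with stray text after the pointer line then falls through to being annexed like any other file.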

Dropped the max size of a pointer file down to 32kb. It was around 80kb,
but without any good reason, and there are certainly no valid pointer files
anywhere that are larger than 8kb, because what a pointer file with
additional data even looks like has only just been specified.

I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when, e.g., catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32kb
seems as reasonable an arbitrary number as any. Increasing it would be
possible, e.g. to 64kb, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.
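As a worked version of the limit: 32kb is 32 * 1024 = 32768 bytes. The sketch below assumes, hypothetically, that a file counts as a possible pointer file only when its size is at most maxPointerSz; the helper name mayBePointerFile and the Integer type are invented for illustration.

```haskell
-- 32kb limit from the commit message; the Integer type is an assumption.
maxPointerSz :: Integer
maxPointerSz = 32 * 1024

-- Hypothetical helper: only files no larger than maxPointerSz are read and
-- parsed as possible pointer files, which bounds the memory used per file.
mayBePointerFile :: Integer -> Bool
mayBePointerFile size = size <= maxPointerSz
```

An 80kb file (81920 bytes), which fit under the old limit, would now never be considered a pointer file.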

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 14:20:31 -04:00
Joey Hess
649464619e
read up to and including maxPointerSz
For consistency with everything else.

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 12:54:40 -04:00
Joey Hess
5b373a9dd2
read a consistent amount from pointer file
A few places were reading the max symlink size of a pointer file,
then passing it to parseLinkTargetOrPointer. That is fine currently, but
to support pointer files with lines of data after the pointer, enough
has to be read that parseLinkTargetOrPointer can be assured of seeing
enough of that data to know if it's correctly formatted.
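One way to keep the amount read consistent is to route every caller through a single helper. This is a hypothetical sketch (readPointerPrefix is not a git-annex function): it reads at most limit bytes from the start of a file using Data.ByteString.hGet, which returns fewer bytes only at end of file, so callers passing the same limit always hand the parser the same prefix.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: read up to and including `limit` bytes from the
-- start of a file. Because every caller uses the same limit, the parser
-- always sees the same prefix, including any lines after the pointer.
readPointerPrefix :: Int -> FilePath -> IO B.ByteString
readPointerPrefix limit path =
  withBinaryFile path ReadMode $ \h -> B.hGet h limit
```

A caller would pass maxPointerSz as the limit.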

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 12:52:34 -04:00
Joey Hess
4cd9325c2c
fold parseLinkTarget into parseLinkTargetOrPointer
Only one place remained that differentiated between them.

A symlink target that somehow happens to contain a newline will be
treated as a link to a key truncated at the newline.
This is super unlikely to happen, and since a key cannot actually
contain a newline, it's as good a behavior as any. Anyway, this commit
does not change the behavior there, although arguably it should be
changed. Note that getAnnexLinkTarget does prevent a symlink target
containing a newline.
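The truncation behavior can be illustrated with a simplified, hypothetical extractor (keyFromTarget is not git-annex's code): it keeps only the first line of the target, then takes the last path component if that line mentions annex/objects/. A newline in a symlink target therefore cuts the key short.

```haskell
import Data.List (isInfixOf)

-- Hypothetical extractor: works on both a symlink target such as
-- "../../.git/annex/objects/Xx/Yy/KEY/KEY" and pointer file content
-- such as "/annex/objects/KEY\n". Only the first line is considered.
keyFromTarget :: String -> Maybe String
keyFromTarget target
  | "annex/objects/" `isInfixOf` firstLine && not (null key) = Just key
  | otherwise = Nothing
  where
    firstLine = takeWhile (/= '\n') target
    -- The key is the last path component of the first line.
    key = reverse (takeWhile (/= '/') (reverse firstLine))
```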

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 12:30:32 -04:00
Joey Hess
38816a9ae9
comment 2022-02-23 11:23:48 -04:00
moortgat-pick
44b9c65af5 2022-02-23 12:53:15 +00:00
moortgat-pick
ebbe840c4c 2022-02-23 12:46:37 +00:00
moortgat-pick
2a68a613a9 2022-02-23 12:45:11 +00:00
https://christian.amsuess.com/chrysn
9d43856075 Added a comment: inodes of git vs. git-annex 2022-02-23 12:18:54 +00:00
Joey Hess
a6b53cb739
add news item for git-annex 10.20220222 2022-02-22 13:34:58 -04:00
Joey Hess
1c4b0b4c2b
releasing package git-annex version 10.20220222 2022-02-22 13:33:45 -04:00
Joey Hess
866cad6582
comment 2022-02-22 12:04:10 -04:00
yarikoptic
7e9ebea910 Added a comment 2022-02-21 21:53:03 +00:00
yarikoptic
c59a7b1275 Added a comment 2022-02-21 21:48:59 +00:00
yarikoptic
8ee2661b85 Added a comment 2022-02-21 21:46:15 +00:00
Joey Hess
83827a6822
comment 2022-02-21 15:59:29 -04:00
Joey Hess
5a8b15f6db
comment 2022-02-21 15:46:12 -04:00
Joey Hess
ba907b6682
comment and close, not a bug really 2022-02-21 15:39:38 -04:00
Joey Hess
80f244d7b6
comment 2022-02-21 15:13:51 -04:00
Joey Hess
9c1ff6d024
comment 2022-02-21 15:00:55 -04:00
Joey Hess
ce1b3a9699
info: Allow using matching options in more situations
File matching options like --include will be rejected in situations where
there is no filename to match against. (Or where there is a filename but
it's not relative to the cwd, or otherwise seemed too bothersome to match
against.)

The addition of listKeys' was necessary to avoid using more memory in the
common case of "git-annex info". Adding a filterM would have caused the
list to buffer in memory and not stream. This is an ugly hack, but listKeys
had previously run Annex operations inside unsafeInterleaveIO (for direct
mode). And matching against a matcher should hopefully not change any Annex
state.
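The streaming concern can be sketched with a lazy monadic filter. This is a hypothetical helper (lazyFilterIO is not in git-annex or base); it uses System.IO.Unsafe.unsafeInterleaveIO so each predicate call runs only when the next result is demanded. Control.Monad.filterM in IO, by contrast, runs the predicate over the whole list before returning anything, which is the buffering described above. As with the matcher, this is only safe when the predicate's effects do not matter.

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical lazy filter: results are produced on demand, so a long (even
-- infinite) list of keys streams through the predicate instead of being
-- buffered. Only safe when the predicate's effects are harmless to defer.
lazyFilterIO :: (a -> IO Bool) -> [a] -> IO [a]
lazyFilterIO _ [] = return []
lazyFilterIO p (x : xs) = unsafeInterleaveIO $ do
  keep <- p x
  rest <- lazyFilterIO p xs
  return (if keep then x : rest else rest)
```

Taking the first few results from an infinite input terminates, which filterM in IO would not.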

This does allow for eg `git-annex info somefile --include=*.ext`
although why someone would want to do that I don't really know. But it
seems to make sense to allow it.
But, consider: `git-annex info ./somefile --include=somefile`
This does not match, so will not display info about somefile.
If the user really wants to, they can `--include=./somefile`.
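Why `./somefile` fails to match can be shown with a toy glob matcher. This is a hypothetical sketch, not git-annex's matcher: it assumes the pattern is compared against the filename exactly as given on the command line, with * matching any run of characters.

```haskell
import Data.List (tails)

-- Toy glob: '*' matches any (possibly empty) run of characters; every
-- other character must match literally.
globMatch :: String -> String -> Bool
globMatch [] s = null s
globMatch ('*' : ps) s = any (globMatch ps) (tails s)
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch (_ : _) [] = False
```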

Using matching options like --copies or --in=remote seems likely to be
slower than git-annex find with those options, because unlike such
commands, info does not have optimised streaming through the matcher.

Note that `git-annex info remote` is not the same as
`git-annex info --in remote`. The former shows info about all files in
the remote. The latter shows local keys that are also in that remote.
The output should make that clear, but this still seems like a point
where users could get confused.
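The distinction can be modelled with plain lists of keys. This is a hypothetical simplification (the function names are invented): the first function stands for `git-annex info remote`, the second for `git-annex info --in remote`.

```haskell
-- `git-annex info remote`: describes everything the remote holds.
infoAboutRemote :: [String] -> [String]
infoAboutRemote remoteKeys = remoteKeys

-- `git-annex info --in remote`: only local keys also present in the remote.
infoInRemote :: [String] -> [String] -> [String]
infoInRemote localKeys remoteKeys = filter (`elem` remoteKeys) localKeys
```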

Sponsored-by: Jochen Bartl on Patreon
2022-02-21 14:46:07 -04:00
Joey Hess
d36de3edf9
comment 2022-02-21 12:49:36 -04:00
Joey Hess
3187639735
retitle 2022-02-21 12:13:59 -04:00
Atemu
6ca9f5e18a 2022-02-20 18:03:35 +00:00
xloem
499b940dc5 Added a comment 2022-02-19 07:50:53 +00:00
xloem
7976d9d303 removed 2022-02-19 07:48:48 +00:00
xloem
6143695820 Added a comment: free dweb storage services 2022-02-19 07:47:28 +00:00
xloem
32d138b52f Added a comment: free dweb storage services 2022-02-19 07:47:07 +00:00
ycp@f118e050dc106530b9cf62ead031e05eef7b1687
7f167d4be1 Added a comment 2022-02-19 01:15:24 +00:00
yarikoptic
b481ec2738 Added a comment 2022-02-18 21:56:19 +00:00
yarikoptic
9d2e6a60f0 Added a comment 2022-02-18 20:18:04 +00:00
yarikoptic
12fc03a9b2 initial thinking for a possible safe guard 2022-02-18 20:02:40 +00:00
yarikoptic
4643788916 bug/question on the semantic of find --unlocked 2022-02-18 19:17:37 +00:00
Joey Hess
faf84aa5c2
Avoid git status taking a long time after git-annex unlock of many files.
Implemented by making Git.Queue have a FlushAction, which can accumulate
along with another action on files, and runs only once the other action has
run.

This lets git-annex unlock queue up git update-index actions, without
conflicting with the restagePointerFiles FlushActions.

In a repository with filter-process enabled, git-annex unlock will
often not take any more time than before, though it may when the files are
large. Either way, it should always slow down less than git status
speeds up.

When filter-process is not enabled, git-annex unlock will slow down as much
as git status speeds up.
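The queueing idea can be modelled purely. This is a hypothetical sketch of the concept, not Git.Queue's real types: file actions accumulate into one batched command, a flush action is registered once however many times it is queued, and flushing runs the batch first and the flush actions after it. Steps are returned as strings so the ordering is visible.

```haskell
-- Hypothetical queue model: files waiting for one batched
-- `git update-index` run, plus named flush actions to run afterwards.
data Queue = Queue
  { queuedFiles :: [FilePath]
  , flushActions :: [String]
  }

emptyQueue :: Queue
emptyQueue = Queue [] []

-- Queue another file for the batched command.
addFile :: FilePath -> Queue -> Queue
addFile f q = q { queuedFiles = queuedFiles q ++ [f] }

-- Flush actions accumulate: queuing the same one twice keeps one copy.
addFlushAction :: String -> Queue -> Queue
addFlushAction a q
  | a `elem` flushActions q = q
  | otherwise = q { flushActions = flushActions q ++ [a] }

-- Run the batched command first, then each flush action once.
flushQueue :: Queue -> [String]
flushQueue q =
  ["git update-index " ++ unwords (queuedFiles q) | not (null (queuedFiles q))]
    ++ flushActions q
```

Interleaving file and flush additions still yields one update-index run followed by a single restage.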

Sponsored-by: Jochen Bartl on Patreon
2022-02-18 15:06:40 -04:00
Joey Hess
c68f52c6a2
restage pointer file after unlock
This avoids a later git status or similar taking a long time to run
as it runs git-annex smudge once per file. While v9 repositories do
avoid that taking long when the files are small, large files can still
make git status take a very long time.

This does make unlock slower, because now git-annex smudge is being run
once per file unlocked. However, the next commit should speed that up in
many cases.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-02-18 14:55:52 -04:00
Joey Hess
07215cfeb5
complete annex.skipunknown transition
annex.skipunknown now defaults to false, so commands like `git annex get foo*`
will not silently skip over files/dirs that are not checked into git.
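The effect of the new default can be sketched as follows. This is a hypothetical, simplified model (resolveArgs and its error text are invented): with annex.skipunknown false, a file argument that is not checked into git is an error instead of being silently skipped. The real command reports errors in its own way; only the default value comes from the commit.

```haskell
-- New default from this commit.
defaultSkipUnknown :: Bool
defaultSkipUnknown = False

-- Hypothetical: split command-line file arguments into ones git knows
-- about and ones it does not, then skip or reject the unknown ones.
resolveArgs :: Bool -> [FilePath] -> [FilePath] -> Either String [FilePath]
resolveArgs skipUnknown tracked args
  | skipUnknown || null unknown = Right known
  | otherwise = Left ("not checked into git: " ++ unwords unknown)
  where
    known = filter (`elem` tracked) args
    unknown = filter (`notElem` tracked) args
```

Setting annex.skipunknown to true in git config should restore the old skipping behavior.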

Sponsored-by: Brock Spratlen on Patreon
2022-02-18 13:18:05 -04:00
Joey Hess
56d180864f
comment 2022-02-18 12:35:49 -04:00
Joey Hess
52bc18850e
comment 2022-02-18 12:16:19 -04:00
Joey Hess
544eaff1e1
comment 2022-02-18 12:10:55 -04:00
nluv4hs@705031de2adc81421f76ad6025dc4d1519d5361a
9f9b1488ed 2022-02-16 17:18:23 +00:00
ycp@f118e050dc106530b9cf62ead031e05eef7b1687
49b0a3a1bd Added a comment: I have the same problem 2022-02-16 08:03:05 +00:00
Joey Hess
0edf01d7d4
registerurl,unregisterurl: rework output and support --json
* registerurl, unregisterurl: Improved output when reading from stdin
  to be more like other batch commands.
* registerurl, unregisterurl: Added --json and --json-error-messages options.

Note that this did change the --batch output in a way that could possibly
break something that expected the old output to never change. I think it's
acceptable to break that because there has never been a guarantee of
unchanging output format except with --batch for most commands. The old
output was just really weird too!

One possible wart is that "git-annex registerurl" with no options now
seems to just hang, since it's waiting for stdin input. Before, it said
"registerurl (stdin)", which was clearer about what's happening. But this
is a deprecated mode anyway; --batch makes clear what's happening. If
anything, this problem would be a reason to eventually remove the support
for reading from stdin without --batch.
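Batch input for these commands can be sketched as line parsing. This assumes each stdin line holds a key and a URL separated by a single space; parseBatchLine and parseBatch are hypothetical names, and the --json output format is deliberately not modelled here.

```haskell
-- Hypothetical parser for one batch line of the form "KEY URL".
parseBatchLine :: String -> Either String (String, String)
parseBatchLine line =
  case break (== ' ') line of
    (key, ' ' : url)
      | not (null key) && not (null url) -> Right (key, url)
    _ -> Left ("bad batch input: " ++ show line)

-- Parse every line of stdin-style input, keeping one result per line so
-- each line can be reported on separately.
parseBatch :: String -> [Either String (String, String)]
parseBatch = map parseBatchLine . lines
```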

Sponsored-by: Dartmouth College's Datalad project
2022-02-14 13:29:20 -04:00
Joey Hess
291dc0d1a9
comment 2022-02-14 12:42:37 -04:00
Joey Hess
2ac8734454
comment 2022-02-10 15:44:04 -04:00
yarikoptic
f87de5ff7f initial report on a failing test 2022-02-10 15:55:56 +00:00
yarikoptic
c908046235 initial todo for --json for registerurl 2022-02-09 21:39:46 +00:00
Jakube
90af8c5e99 2022-02-08 20:50:56 +00:00
Jakube
f06265dc8e 2022-02-08 20:46:18 +00:00
jonas@ab8487518c600ac0c785f4f6ca641c219f2bcfdc
32bf4d3457 Added a comment 2022-02-08 18:04:45 +00:00
Joey Hess
ad2f0446a0
comment 2022-02-08 13:24:28 -04:00