Commit graph

39899 commits

Author SHA1 Message Date
Joey Hess
13a6bfff49
comments 2021-05-25 16:37:32 -04:00
Joey Hess
f5dc06077d
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-25 13:10:34 -04:00
Joey Hess
b5f5475ed6
New matching options --excludesamecontent and --includesamecontent
The normalisation of filenames turns out to be the tricky part here,
because the associated files coming out of the keys db may look like
"./foo/bar" or "../bar". For the former to match a glob like "foo/*",
it needs to be normalised.

Note that, on windows, normalise "./foo/bar" = "foo\\bar"
which a glob like "foo/*" won't match. So the glob is matched a second
time, on the toInternalGitPath, allowing the user to provide a glob
with the slashes in either direction. However, this still won't support
some wacky edge cases like the user providing a glob of "foo/bar\\*".

Sponsored-by: Dartmouth College's Datalad project
2021-05-25 13:08:18 -04:00
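
The two-pass glob match described in b5f5475ed6 might look roughly like the sketch below. It is illustrative Python, not git-annex's Haskell code. The names `to_internal_git_path` and `matches_glob` are hypothetical, `ntpath.normpath` stands in for the Windows behaviour of `normalise`, and `fnmatch` stands in for git-annex's glob matcher (unlike a path-aware matcher, its `*` also crosses directory separators).

```python
import fnmatch
import ntpath


def to_internal_git_path(path: str) -> str:
    """Hypothetical stand-in: git's internal paths use forward slashes."""
    return path.replace("\\", "/")


def matches_glob(glob: str, associated_file: str) -> bool:
    """Match a glob against an associated file from the keys db (sketch)."""
    # Normalise first, so "./foo/bar" becomes "foo\bar" under Windows rules.
    native = ntpath.normpath(associated_file)
    # First pass: match against the native (backslash) form.
    if fnmatch.fnmatchcase(native, glob):
        return True
    # Second pass: match against the forward-slash form, so the user's
    # glob may use slashes in either direction.
    return fnmatch.fnmatchcase(to_internal_git_path(native), glob)
```

Under these assumptions, `matches_glob("foo/*", "./foo/bar")` succeeds on the second pass, while a mixed-separator glob such as `foo/bar\*` matches neither form, which is the unsupported edge case the commit message mentions.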
Lukey
2ccf525b7f Added a comment 2021-05-25 16:48:26 +00:00
Joey Hess
cd73fcc92c
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-25 11:45:02 -04:00
Joey Hess
483fc4dc6b
Merge branch 'trackassociated' 2021-05-25 11:43:52 -04:00
Joey Hess
e9c95ef890
comments 2021-05-25 11:43:46 -04:00
Joey Hess
cedc28a783
prevent dropping required content of other file using same content
When two files have the same content, and a required content expression
matches one but not the other, dropping the latter file will fail as it
would also remove the content of the required file.

This will slow down drop (w/o --auto), dropunused, mirror, and move, by one
keys db lookup per file. But I did include an optimisation to avoid a
double db lookup in the drop --auto / sync --content case. I suspect that
dropunused could also use PreferredContentChecked True, but haven't
entirely thought it through and it's rarely used with enough files for the
optimisation to matter.

Sponsored-by: Dartmouth College's Datalad project
2021-05-25 11:34:06 -04:00
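
The check described in cedc28a783 can be pictured with a small sketch. It is a simplification under stated assumptions, not the actual implementation: `safe_to_drop` is a hypothetical name, a plain dict stands in for the keys db, and the required content expression is reduced to a predicate on filenames.

```python
import fnmatch
from typing import Callable, Dict, List


def safe_to_drop(
    file: str,
    key: str,
    associated_files: Dict[str, List[str]],
    is_required: Callable[[str], bool],
) -> bool:
    """Return False if dropping `file` would remove required content."""
    # One lookup per file: find every file that shares this key's content.
    files = set(associated_files.get(key, [])) | {file}
    # Dropping the content affects all of them, so refuse the drop if the
    # required content expression matches any one of them.
    return not any(is_required(f) for f in files)


# Hypothetical data: two files with identical content, one in archive/.
keys_db = {"KEY1": ["archive/a.iso", "tmp/copy.iso"], "KEY2": ["tmp/b.iso"]}


def required(f: str) -> bool:
    return fnmatch.fnmatchcase(f, "archive/*")
```

Here `safe_to_drop("tmp/copy.iso", "KEY1", keys_db, required)` is False, because archive/a.iso uses the same content and is required, while dropping tmp/b.iso is allowed.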
Atemu
7ed4c4a35c 2021-05-25 14:51:21 +00:00
Joey Hess
7029ef1c3d
improve changelog 2021-05-25 10:08:29 -04:00
Joey Hess
01331f0b8f
required content update 2021-05-25 10:04:29 -04:00
Joey Hess
45c0fb29f0
update 2021-05-25 09:58:46 -04:00
datamanager
b6f6c7c778 Added a comment: is there some way to remove a file I've committed? 2021-05-25 13:10:35 +00:00
Atemu
82ee0f053b Added a comment 2021-05-25 11:00:39 +00:00
Joey Hess
9a5981a153
comment 2021-05-24 16:43:06 -04:00
Joey Hess
125a28c58e
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-24 16:31:58 -04:00
Joey Hess
07c98a4ce2
update 2021-05-24 16:31:14 -04:00
Joey Hess
63de81b52a
Merge branch 'master' into trackassociated 2021-05-24 16:27:24 -04:00
Joey Hess
2de49c186f
update 2021-05-24 16:27:07 -04:00
Joey Hess
44a0d21e57
Merge branch 'master' into trackassociated 2021-05-24 16:24:53 -04:00
Joey Hess
73e1507c72
fix deadlock
git-annex test hung, at varying points depending
on when git decided to run the smudge clean filter.

Recent changes to reconcileStaged caused a deadlock, when git write-tree
for some reason decides to run the smudge clean filter. The filter then
tries to open the keys db, and blocks waiting for the lock file that its
grandparent has locked.

I don't know why git write-tree does that. It's supposed to only write a
tree from the index which needs no smudge/clean filtering.

I've verified that, in a situation where git write-tree runs the clean
filter, disabling the filter results in a tree being written that
contains the annex link, not, e.g., the worktree file content. So it seems
safe to disable the clean filter, but also this seems likely to be
working around a bug in git because it seems it is running the clean
filter in a situation where the object has already been cleaned.

Sponsored-by: Dartmouth College's Datalad project
2021-05-24 16:19:26 -04:00
Joey Hess
5d18994736
clearer language 2021-05-24 14:54:51 -04:00
Joey Hess
f46e4c9b7c
fix case where keys db was not initialized in time
When the keys db is opened for read, and did not exist yet, it used to
skip creating it, and return mempty values. But that prevents
reconcileStaged from populating associated files information in time for
the read. This fixes the one remaining case I know of where
the fix in a56b151f90 didn't work.

Note that, when there is a permissions error, it still avoids creating
the db and returns mempty for all queries. This does mean that
reconcileStaged does not run and so it may want to drop files that it
should not. However, presumably a permissions error on the keys database
also means that the user does not have permission to delete annex
objects, so they won't be able to drop the files anyway.

Sponsored-by: Dartmouth College's Datalad project
2021-05-24 14:46:59 -04:00
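
The behaviour described in f46e4c9b7c, where the keys db is created even when opened for read but queries return empty values if it cannot be opened, could be sketched as follows. This is a hedged illustration using SQLite from Python. The class `KeysDb`, the table layout, and the method names are assumptions made for this example and are not git-annex's schema or API.

```python
import sqlite3
from pathlib import Path
from typing import List, Optional


class KeysDb:
    """Hypothetical handle over an associated-files table."""

    def __init__(self, path: Path) -> None:
        self.conn: Optional[sqlite3.Connection] = None
        try:
            # Create the database even when it is only being read, so that
            # a reconcile step can populate it before the first query.
            self.conn = sqlite3.connect(str(path))
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS associated "
                "(annex_key TEXT, file TEXT, UNIQUE (annex_key, file))"
            )
        except (OSError, sqlite3.OperationalError):
            # The database cannot be opened or created (for example, a
            # permissions error): answer every query with an empty result.
            self.conn = None

    def add_associated_file(self, key: str, file: str) -> None:
        if self.conn is not None:
            with self.conn:  # commits on success
                self.conn.execute(
                    "INSERT OR IGNORE INTO associated VALUES (?, ?)",
                    (key, file),
                )

    def associated_files(self, key: str) -> List[str]:
        if self.conn is None:
            return []
        rows = self.conn.execute(
            "SELECT file FROM associated WHERE annex_key = ? ORDER BY file",
            (key,),
        )
        return [row[0] for row in rows]
```

In this sketch a database under an unwritable or missing directory yields a handle whose queries all return an empty list, mirroring the mempty fallback described in the commit message.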
Joey Hess
a56b151f90
fix longstanding indeterminate preferred content for duplicated file problem
* drop: When two files have the same content, and a preferred content
  expression matches one but not the other, do not drop the file.
* sync --content, assistant: Fix an edge case where a file that is not
  preferred content did not get dropped.

The sync --content edge case is that handleDropsFrom loaded associated files
and used them without verifying that the information from the database was
not stale.

It seemed best to avoid changing --want-drop's behavior. This way, when
debugging a preferred content expression with it, the files matched will
still reflect the expression. So I added a note to the --want-drop documentation,
to make clear it may not behave identically to git-annex drop --auto.

While it would be possible to introspect the preferred content
expression to see if it matches on filenames, and only look up the
associated files when it does, it's generally fairly rare for 2 files to
have the same content, and the database lookup is already avoided when
there's only 1 file, so I did not implement that further optimisation.

Note that there are still some situations where the associated files
database does not get locked files recorded in it, which will prevent
this fix from working.

Sponsored-by: Dartmouth College's Datalad project
2021-05-24 14:07:05 -04:00
Joey Hess
78be7cf73f
remove warning about combining options
the option parser no longer allows combining --want-get/--want-drop with
options like --all
2021-05-24 13:53:28 -04:00
Joey Hess
d62d6e2fcf
note about a wart
All code that uses associated files already deals with this problem,
which used to be worse. Unfortunately I was not able to entirely
eliminate it, although it happens in fewer cases now.
2021-05-24 12:05:49 -04:00
Joey Hess
c1b5028211
update 2021-05-24 11:59:01 -04:00
Joey Hess
13423f337c
refactoring 2021-05-24 11:38:22 -04:00
Joey Hess
efae085272
fixed reconcileStaged crash when index is locked or in conflict
Eg, when git commit runs the smudge filter.

Commit 428c91606b introduced the crash,
as write-tree fails in those situations. Now it will work, and git-annex
always gets up-to-date information even in those situations. It does
need to do a bit more work, each time git-annex is run with the index
locked. Although if the index is unmodified from the last time
write-tree succeeded, that work is avoided.
2021-05-24 11:33:23 -04:00
Joey Hess
3698e804d4
Merge branch 'master' into trackassociated 2021-05-24 10:24:53 -04:00
parhuzamos
54e1ac849a Added a comment 2021-05-24 09:33:50 +00:00
Ilya_Shlyakhter
bcedcef97f Added a comment: defining preferred content state 2021-05-23 20:39:23 +00:00
alt
8aedf51032 2021-05-23 03:07:50 +00:00
falsifian
2866d53797 On second thought, simpler not to mention the version. 2021-05-23 01:12:33 +00:00
falsifian
fb681d4fcf git-annex is available for OpenBSD 6.9. 2021-05-23 01:11:56 +00:00
Atemu
3eb6a3b05f Added a comment 2021-05-22 17:31:01 +00:00
Lukey
be6bf5ba35 Added a comment 2021-05-22 17:19:19 +00:00
Atemu
8daca82623 Added a comment 2021-05-22 17:02:07 +00:00
Atemu
21fba1cdb8 Added a comment 2021-05-22 10:20:35 +00:00
Atemu
0b89436b47 Added a comment 2021-05-22 09:55:31 +00:00
strmd
4ef58fd093 Added a comment 2021-05-22 05:16:45 +00:00
Joey Hess
b81f5532c6
comment 2021-05-21 16:44:44 -04:00
Joey Hess
428c91606b
include locked files in the keys database associated files
Before, only unlocked files were included.

The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.

reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.

On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.

reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticeable overhead. What may turn out to be a noticeable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. The same applies after adding a
lot of files, deleting a lot of files, moving a large subdirectory, etc.

Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.

Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
runs and accesses the keys db, reconcileStaged will run and update it.

There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.

However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
2021-05-21 16:24:37 -04:00
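
The diff-based update that 428c91606b describes for reconcileStaged can be illustrated with a small sketch. It is a simplified model under assumptions, not the real code: trees are plain dicts from path to key, `reconcile_staged` is a hypothetical Python name, and the cleanup of stale entries on upgrade is not modelled.

```python
from typing import Dict, Optional, Set

Tree = Dict[str, str]  # path -> annex key, for locked and unlocked files


def reconcile_staged(
    previous: Optional[Tree],
    current: Tree,
    associated: Dict[str, Set[str]],
) -> None:
    """Update key -> associated files from a diff of two index trees."""
    # On upgrade there is no recorded previous tree, so diff from the
    # empty tree, which adds every file in the current index.
    old: Tree = previous or {}
    # Deleted or changed paths: remove the old association.
    for path, key in old.items():
        if current.get(path) != key:
            associated.get(key, set()).discard(path)
    # New or changed paths: record the new association.
    for path, key in current.items():
        if old.get(path) != key:
            associated.setdefault(key, set()).add(path)
```

The cost of each run is proportional to the size of the diff, which is why a branch change or a large directory move is the slow case the commit message warns about.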
Joey Hess
df0b75cdc4
complications 2021-05-21 14:18:38 -04:00
Joey Hess
1d9bad51d2
plan for these 2021-05-21 13:50:26 -04:00
Joey Hess
f39b7c3663
comment 2021-05-21 12:39:35 -04:00
Joey Hess
d5e18c8710
comment 2021-05-21 12:26:00 -04:00
Joey Hess
a26e7d763d
comment 2021-05-21 12:07:21 -04:00
Joey Hess
442398e1e0
comment 2021-05-21 11:48:57 -04:00
Joey Hess
414dc39a12
comment 2021-05-21 11:31:38 -04:00