2010-10-10 19:54:02 +00:00
{- git-annex file locations
2010-10-27 20:53:54 +00:00
 -
add content retention files
This allows lockContentShared to lock content for, e.g., 10 minutes, and
if the process then gets terminated before it can unlock, the content
will remain locked for that amount of time.
The Windows implementation is not yet tested.
In P2P.Annex, a duration of 10 minutes is used. This way, when p2pstdio
or remotedaemon is serving the P2P protocol, and is asked to
LOCKCONTENT, and that process gets killed, the content will not be
subject to deletion. This is not a perfect solution to
doc/todo/P2P_locking_connection_drop_safety.mdwn yet, but it gets most
of the way there, without needing any P2P protocol changes.
This is only done in v10 and higher repositories (or on Windows). It
might be possible to backport it to v8 or earlier, but it would
complicate locking even further, and without a separate lock file, might
be hard. I think that by the time this fix reaches a given user, they
will probably have been running git-annex 10.x long enough that their v8
repositories will have upgraded to v10 after the 1 year wait. And it's
not as if git-annex hasn't already been subject to this problem (though
I have not heard of any data loss caused by it) for 6 years already, so
waiting another fraction of a year on top of however long it takes this
fix to reach users is unlikely to be a problem.
2024-07-03 18:44:38 +00:00
 - Copyright 2010-2024 Joey Hess <id@joeyh.name>
2010-10-27 20:53:54 +00:00
 -
2019-03-13 19:48:14 +00:00
 - Licensed under the GNU AGPL version 3 or higher.
2010-10-10 19:54:02 +00:00
 -}
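The content retention idea in the "add content retention files" commit message can be reduced to a small decision rule. The sketch below is a pure model built on stated assumptions:

- The retention period is stored as an absolute expiry time in POSIX seconds.
- `PosixSeconds`, `RetainUntil`, `retainFor` and `mayDrop` are hypothetical names, not git-annex functions.
- How git-annex actually records the timestamp is not modelled. It presumably uses the file named by `gitAnnexContentRetentionTimestamp`.

```haskell
-- Hypothetical model of content retention; not git-annex's actual API.
-- Times are plain POSIX seconds to keep the sketch dependency-free.
newtype PosixSeconds = PosixSeconds Integer
  deriving (Eq, Ord, Show)

-- | The moment until which content must be kept.
newtype RetainUntil = RetainUntil PosixSeconds
  deriving (Eq, Show)

-- | Start a retention period of the given length (600 s = 10 minutes).
retainFor :: Integer -> PosixSeconds -> RetainUntil
retainFor secs (PosixSeconds now) = RetainUntil (PosixSeconds (now + secs))

-- | Dropping is allowed only if no live process holds the lock and any
-- retention period has expired. A retention time written by a process
-- that was later killed still blocks the drop until it runs out.
mayDrop :: Bool -> Maybe RetainUntil -> PosixSeconds -> Bool
mayDrop lockHeld retention now
  | lockHeld = False
  | otherwise = case retention of
      Nothing -> True
      Just (RetainUntil expiry) -> now >= expiry
```

Suppose a process is killed mid-LOCKCONTENT after calling `retainFor 600`. In this model the content stays protected for up to 10 minutes, the duration the commit message gives for P2P.Annex.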

2019-01-14 18:02:47 +00:00
{-# LANGUAGE OverloadedStrings #-}

2016-01-20 20:36:33 +00:00
module Annex.Locations (
2010-10-13 00:04:36 +00:00
	keyFile,
2010-10-13 07:41:12 +00:00
	fileKey,
2011-12-02 18:39:47 +00:00
	keyPaths,
2012-11-19 03:59:39 +00:00
	keyPath,
assistant: Detect stale git lock files at startup time, and remove them.
Extends the index.lock handling to other git lock files. I surveyed
all lock files used by git, and found more than I expected. All are
handled the same in git; it leaves them open while doing the operation,
possibly writing the new file content to the lock file, and then closes
them when done.
The gc.pid file is excluded because it won't affect the normal operation
of the assistant, and waiting for a gc to finish on startup wouldn't be
good.
All threads except the webapp thread wait on the new startup sanity checker
thread to complete, so they won't try to do things with git that fail
due to stale lock files. The webapp thread mostly avoids doing that kind of
thing itself. A few configurators might fail on lock files, but only if the
user is explicitly trying to run them. The webapp needs to start
immediately when the user has opened it, even if there are stale lock
files.
Arranging for the threads to wait on the startup sanity checker was a bit
of a bear. Have to get all the NotificationHandles set up before the
startup sanity checker runs, or they won't see its signal. Perhaps
the NotificationBroadcaster is not the best interface to have used for
this. Oh well, it works.
This commit was sponsored by Michael Jakl
2013-10-05 21:02:11 +00:00
	annexDir,
2013-09-24 21:25:47 +00:00
	objectDir,
2011-01-27 21:00:32 +00:00
	gitAnnexLocation,
2022-05-16 19:19:48 +00:00
	gitAnnexLocation',
2015-06-11 19:14:42 +00:00
	gitAnnexLocationDepth,
2013-04-04 19:46:33 +00:00
	gitAnnexLink,
2016-05-16 21:05:42 +00:00
	gitAnnexLinkCanonical,
2014-01-28 20:01:19 +00:00
	gitAnnexContentLock,
2024-07-03 18:44:38 +00:00
	gitAnnexContentRetentionTimestamp,
	gitAnnexContentRetentionTimestampLock,
2022-01-20 15:33:14 +00:00
	gitAnnexContentLockLock,
2013-02-19 20:26:07 +00:00
	gitAnnexInodeSentinal,
	gitAnnexInodeSentinalCache,
2021-07-16 18:16:05 +00:00
	annexLocationsBare,
	annexLocationsNonBare,
2024-08-02 18:07:45 +00:00
	annexLocation,
2024-08-04 15:58:07 +00:00
	exportAnnexObjectLocation,
2011-01-27 21:00:32 +00:00
	gitAnnexDir,
	gitAnnexObjectDir,
2019-01-17 19:40:44 +00:00
	gitAnnexTmpOtherDir,
	gitAnnexTmpOtherLock,
	gitAnnexTmpOtherDirOld,
2019-05-07 17:04:39 +00:00
	gitAnnexTmpWatcherDir,
2014-02-26 20:52:56 +00:00
	gitAnnexTmpObjectDir,
	gitAnnexTmpObjectLocation,
2017-11-29 17:49:52 +00:00
	gitAnnexTmpWorkDir,
2011-01-27 21:00:32 +00:00
	gitAnnexBadDir,
2011-04-29 17:59:00 +00:00
	gitAnnexBadLocation,
2011-01-27 21:00:32 +00:00
	gitAnnexUnusedLog,
2022-08-11 20:57:44 +00:00
	gitAnnexKeysDbDir,
2015-12-09 21:00:37 +00:00
	gitAnnexKeysDbLock,
2018-08-22 17:04:12 +00:00
	gitAnnexKeysDbIndexCache,
2012-09-25 18:16:34 +00:00
	gitAnnexFsckState,
2015-02-18 19:54:24 +00:00
	gitAnnexFsckDbDir,
2019-11-06 20:27:25 +00:00
	gitAnnexFsckDbDirOld,
2015-02-17 21:08:11 +00:00
	gitAnnexFsckDbLock,
2013-10-22 20:02:52 +00:00
	gitAnnexFsckResultsLog,
2022-01-19 19:51:04 +00:00
	gitAnnexUpgradeLog,
	gitAnnexUpgradeLock,
2018-10-25 18:43:13 +00:00
	gitAnnexSmudgeLog,
	gitAnnexSmudgeLock,
add restage log
When pointer files need to be restaged, they're first written to the
log, and then when the restage operation runs, it reads the log. This
way, if the git-annex process is interrupted before it can do the
restaging, a later git-annex process can do it.
Currently, this lets a git-annex get/drop command be interrupted and
then re-run, and as long as it gets/drops additional files, it will
clean up after the interrupted command. But more changes are
needed to make it easier to restage after an interrupted process.
Kept using the git queue to run the restage action, even though the
list of files that it builds up for that action is not actually used by
the action. This could perhaps be simplified to make restaging a cleanup
action that gets registered, rather than using the git queue for it. But
I wasn't sure if that would cause visible behavior changes: e.g., when
dropping a large number of files, the git queue currently flushes
periodically, and so it restages incrementally, rather than all at the
end.
In restagePointerFiles, it reads the restage log twice, once to get
the number of files and size, and a second time to process it.
This seemed better than reading the whole file into memory, since
potentially a huge number of files could be in there. Probably the OS
will cache the file in memory and there will not be much performance
impact. It might be better to keep running tallies in another file
though. But updating that atomically with the log seems hard.
Also note that it's possible for calcRestageLog to see a different file
than streamRestageLog does. More files may be added to the log in
between. That is ok, it will only cause the filterprocessfaster heuristic to
operate with slightly out of date information, so it may make the wrong
choice for the files that got added and be a little slower than ideal.
Sponsored-by: Dartmouth College's DANDI project
2022-09-23 18:38:59 +00:00
	gitAnnexRestageLog,
fix deadlock in restagePointerFiles
Fix a hang that occasionally occurred during commands such as move.
(A bug introduced in 10.20220927, in
commit 6a3bd283b8af53f810982e002e435c0d7c040c59)
The restage.log was kept locked while running a complex index refresh
action. In an unusual situation, that action could need to write to the
restage log, which caused a deadlock.
The solution is a two-stage process. First the restage.log is moved to a
work file, which is done with the lock held. Then the content of the work
file is read and processed, which happens without the lock being held.
This is all done in a crash-safe manner.
Note that streamRestageLog may not be fully safe to run concurrently
with itself. That's ok, because restagePointerFiles uses it with the
index lock held, so only one can be run at a time.
streamRestageLog does delete the restage.old file at the end without
locking. If a calcRestageLog is run concurrently, it will either see the
file content before it was deleted, or will see it's missing. Either is
ok, because at most this will cause calcRestageLog to report more
work remains to be done than there is.
Sponsored-by: Dartmouth College's Datalad project
2022-12-08 18:18:54 +00:00
	gitAnnexRestageLogOld,
2022-09-23 18:38:59 +00:00
	gitAnnexRestageLock,
sync: use log to track adjusted branch needs updating
Speeds up sync in an adjusted branch by avoiding re-adjusting the branch
unnecessarily, particularly when it is adjusted with --hide-missing or
--unlock-present.
When there are a lot of files, that was the majority of the time of a
--no-content sync.
Uses a log file, which is updated when content presence changes. This
adds a little bit of overhead to every file get/drop when on such an
adjusted branch. The overhead is minimal for get of any size of file,
but might be noticeable for drop in some cases. It seems like a reasonable
trade-off. It would be possible to update the log file only at the end, but
then it would not happen if the command is interrupted.
When not in an adjusted branch, there should be no additional overhead.
(getCurrentBranch is an MVar read, and it avoids the MVar read of
getGitConfig.)
Note that this does not deal with situations such as:
git checkout master, git-annex get, git checkout adjusted branch,
git-annex sync. The sync won't know that the adjusted branch needs to be
updated. Dealing with that would add overhead to operation in non-adjusted
branches, which I don't like. Also, there are other situations like having
two adjusted branches that both need to be updated like this, and switching
between them and sync not updating.
This does mean a behavior change to sync, since it did previously deal
with those situations. But, the documentation did not say that it did.
The man pages only talk about sync updating the adjusted branch after
it transfers content.
I did consider making sync keep track of content it transferred (and
dropped) and only update the adjusted branch then, not to catch up to other
changes made previously. That would perform better. But it seemed rather
hard to implement, and also it would have problems with races with a
concurrent get/drop, which this implementation avoids.
And it seemed pretty likely someone had gotten used to get/drop followed by
sync updating the branch. It seems much less likely someone is switching
branches, doing get/drop, and then switching back and expecting sync to update
the branch.
Re-running git-annex adjust still does a full re-adjusting of the branch,
for anyone who needs that.
Sponsored-by: Leon Schuermann on Patreon
2023-06-08 18:35:26 +00:00
	gitAnnexAdjustedBranchUpdateLog,
	gitAnnexAdjustedBranchUpdateLock,
2023-12-06 19:38:01 +00:00
	gitAnnexMigrateLog,
	gitAnnexMigrateLock,
2023-12-07 19:50:52 +00:00
	gitAnnexMigrationsLog,
	gitAnnexMigrationsLock,
2020-10-21 14:31:56 +00:00
	gitAnnexMoveLog,
	gitAnnexMoveLock,
2019-11-06 21:13:39 +00:00
	gitAnnexExportDir,
2017-09-04 17:52:22 +00:00
	gitAnnexExportDbDir,
2017-09-18 16:12:11 +00:00
	gitAnnexExportLock,
2019-03-07 19:59:44 +00:00
	gitAnnexExportUpdateLock,
2019-05-20 20:37:04 +00:00
	gitAnnexExportExcludeLog,
2023-05-31 19:45:23 +00:00
	gitAnnexImportDir,
	gitAnnexImportLog,
2019-02-20 20:59:10 +00:00
	gitAnnexContentIdentifierDbDir,
	gitAnnexContentIdentifierLock,
sqlite database for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.
Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.
Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.
Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or it would entangle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.
Sponsored-by: unqueued on Patreon
2023-10-23 20:12:26 +00:00
	gitAnnexImportFeedDbDir,
	gitAnnexImportFeedDbLock,
2024-08-12 15:19:58 +00:00
	gitAnnexRepoSizeDbDir,
2013-10-08 15:48:28 +00:00
	gitAnnexScheduleState,
2012-07-01 18:29:00 +00:00
	gitAnnexTransferDir,
2012-09-26 16:06:44 +00:00
	gitAnnexCredsDir,
2014-03-01 01:32:18 +00:00
	gitAnnexWebCertificate,
	gitAnnexWebPrivKey,
2013-08-03 05:40:21 +00:00
	gitAnnexFeedStateDir,
	gitAnnexFeedState,
2012-12-18 19:04:44 +00:00
	gitAnnexMergeDir,
2011-06-23 15:37:26 +00:00
	gitAnnexJournalDir,
start implementing hidden git-annex repositories
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.
Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.
All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.
Currently, no UUIDs are treated as private yet, a way to configure that
is needed.
The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.
It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only for internal use. That lets Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.
And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concatenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
2021-04-20 18:32:41 +00:00
	gitAnnexPrivateJournalDir,
2011-10-03 20:32:36 +00:00
	gitAnnexJournalLock,
2019-05-06 19:15:12 +00:00
	gitAnnexGitQueueLock,
2011-12-11 18:14:28 +00:00
	gitAnnexIndex,
2021-04-20 18:32:41 +00:00
	gitAnnexPrivateIndex,
2013-10-03 19:06:58 +00:00
	gitAnnexIndexStatus,
2014-02-18 21:38:23 +00:00
	gitAnnexViewIndex,
	gitAnnexViewLog,
2016-07-17 16:11:05 +00:00
	gitAnnexMergedRefs,
2013-08-28 19:57:42 +00:00
	gitAnnexIgnoredRefs,
2012-06-11 05:20:19 +00:00
	gitAnnexPidFile,
2015-11-12 21:47:31 +00:00
	gitAnnexPidLockFile,
2012-06-13 17:35:15 +00:00
	gitAnnexDaemonStatusFile,
2020-10-20 19:06:55 +00:00
	gitAnnexDaemonLogFile,
2013-05-23 23:00:46 +00:00
	gitAnnexFuzzTestLogFile,
2012-07-26 03:13:01 +00:00
	gitAnnexHtmlShim,
2012-09-18 21:50:07 +00:00
	gitAnnexUrlFile,
2012-10-03 21:04:52 +00:00
	gitAnnexTmpCfgFile,
2012-01-20 19:34:52 +00:00
	gitAnnexSshDir,
2012-03-04 20:00:24 +00:00
	gitAnnexRemotesDir,
2012-08-31 22:59:57 +00:00
	gitAnnexAssistantDefaultDir,
2015-01-28 19:55:17 +00:00
	HashLevels(..),
2011-04-02 17:49:03 +00:00
	hashDirMixed,
2011-06-22 21:51:48 +00:00
	hashDirLower,
Better sanitization of problem characters when generating URL and WORM keys.
FAT has a lot of characters it does not allow in filenames, like ? and *
It's probably the worst offender, but other filesystems also have
limitations.
In 2011, I made keyFile escape : to handle FAT, but missed the other
characters. It also turns out that when I did that, I was also living
dangerously; any existing keys that contained a : had their object
location change. Oops.
So, adding new characters to escape to keyFile is out. Well, it would be
possible to make keyFile behave differently on a per-filesystem basis, but
this would be a real nightmare to get right. Consider that a rsync special
remote uses keyFile to determine the filenames to use, and we don't know
the underlying filesystem on the rsync server.
Instead, I have gone for a solution that is backwards compatible and
simple. Its only downside is that already generated URL and WORM keys
might not be able to be stored on FAT or some other filesystem that
dislikes a character used in the key. (In this case, the user can just
migrate the problem keys to a checksumming backend. If this became a big
problem, fsck could be made to detect these and suggest a migration.)
Going forward, new keys that are created will escape all characters that
are likely to cause problems. And if some filesystem comes along that's
even worse than FAT (seems unlikely, but here it is 2013, and people are
still using FAT!), additional characters can be added to the set that are
escaped without difficulty.
(Also, made WORM limit the part of the filename that is embedded in the key,
to deal with filesystem filename length limits. This could have already
been a problem, but is more likely now, since the escaping of the filename
can make it longer.)
This commit was sponsored by Ian Downes
2013-10-05 19:01:49 +00:00
	preSanitizeKeyName,
2017-08-17 19:09:38 +00:00
	reSanitizeKeyName,
2010-10-11 21:52:46 +00:00
) where
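The "Better sanitization" commit message attached to `preSanitizeKeyName` describes escaping every character likely to cause filesystem problems when a new URL or WORM key is generated. Below is a minimal sketch of one such escaping scheme. It assumes a comma-plus-decimal-code encoding. `sanitizeKeyName` is a hypothetical name, and both the set of safe characters and the encoding are illustrative; they are not taken from git-annex.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord)

-- | Hypothetical escaping scheme, not git-annex's actual one: keep
-- characters assumed safe on FAT and similar filesystems, and replace
-- everything else with ',' followed by its decimal code point.
sanitizeKeyName :: String -> String
sanitizeKeyName = concatMap escape
  where
    escape c
      | isAsciiUpper c || isAsciiLower c || isDigit c = [c]
      | c `elem` ".-_" = [c]            -- assumed safe everywhere
      | c == ',' = ",,"                 -- the escape character escapes itself
      | otherwise = ',' : show (ord c)  -- e.g. '?' becomes ",63"
```

A function like this would run only when a key is first generated. Keys that already exist therefore keep their names and object locations, which is the backwards-compatibility property the commit message relies on.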
2010-10-10 19:54:02 +00:00

2013-10-05 19:01:49 +00:00
import Data.Char
2015-01-28 19:55:17 +00:00
import Data.Default
2019-01-14 18:02:47 +00:00
import qualified Data.ByteString.Char8 as S8
2019-12-09 17:49:05 +00:00
import qualified System.FilePath.ByteString as P
2010-10-16 20:20:49 +00:00

2011-10-04 02:24:57 +00:00
import Common
2017-02-24 17:42:30 +00:00
import Key
2013-10-22 20:02:52 +00:00
import Types.UUID
2016-01-20 20:36:33 +00:00
import Types.GitConfig
2015-01-27 21:38:06 +00:00
import Types.Difference
2024-05-15 21:33:38 +00:00
import Types.BranchState
2024-08-04 15:58:07 +00:00
import Types.Export
2011-06-30 17:16:57 +00:00
import qualified Git
2016-05-16 21:05:42 +00:00
import qualified Git.Types as Git
2015-02-09 19:24:33 +00:00
import Git.FilePath
2015-01-28 20:51:40 +00:00
import Annex.DirHashes
2015-03-04 20:08:41 +00:00
import Annex.Fixup
2019-12-11 18:12:22 +00:00
import qualified Utility.RawFilePath as R
2010-10-10 19:54:02 +00:00

2011-01-27 21:00:32 +00:00
{- Conventions:
 -
 - Functions ending in "Dir" should always return values ending with a
 - trailing path separator. Most code does not rely on that, but a few
 - things do.
 -
2024-04-06 13:50:58 +00:00
 - Everything else should not end in a trailing path separator.
2011-01-27 21:00:32 +00:00
 -
 - Only functions (with names starting with "git") that build a path
2015-01-06 19:31:24 +00:00
 - based on a git repository should return full path relative to the git
 - repository. Everything else returns path segments.
2011-01-27 21:00:32 +00:00
 -}

2011-03-03 18:51:57 +00:00
{- The directory git annex uses for local state, relative to the .git
 - directory -}
2019-12-18 20:45:03 +00:00
annexDir :: RawFilePath
annexDir = P.addTrailingPathSeparator "annex"
2019-12-11 18:12:22 +00:00

2011-03-03 18:51:57 +00:00
{- The directory git annex uses for locally available object content,
 - relative to the .git directory -}
2022-06-22 20:08:49 +00:00
objectDir :: RawFilePath
objectDir = P.addTrailingPathSeparator $ annexDir P.</> "objects"
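The conventions comment and the `annexDir`/`objectDir` definitions can be illustrated with `String` paths from the standard `filepath` package. This sketch assumes POSIX separators. `annexDirS`, `objectDirS` and `gitAnnexObjectDirS` are hypothetical analogues; the real functions use `RawFilePath` with `System.FilePath.ByteString`.

```haskell
import System.FilePath (addTrailingPathSeparator, (</>))

-- String analogues of annexDir and objectDir: "...Dir" values are path
-- segments that end in a separator.
annexDirS :: FilePath
annexDirS = addTrailingPathSeparator "annex"

objectDirS :: FilePath
objectDirS = addTrailingPathSeparator (annexDirS </> "objects")

-- | Hypothetical "git..." style helper: a "...Dir" segment becomes a
-- full path by prefixing the repository's .git directory.
gitAnnexObjectDirS :: FilePath -> FilePath
gitAnnexObjectDirS gitdir = addTrailingPathSeparator (gitdir </> objectDirS)
```

Because `</>` does not insert a second separator when its first argument already ends in one, `annexDirS </> "objects"` yields `annex/objects`. The outer `addTrailingPathSeparator` then restores the trailing separator the convention requires.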
2019-12-11 18:12:22 +00:00

2021-07-16 18:16:05 +00:00
{- Annexed file's possible locations relative to the .git directory
 - in a non-bare repository.
2015-01-28 20:51:40 +00:00
 -
2021-07-16 18:16:05 +00:00
 - Normally it is hashDirMixed. However, it's always possible that a
 - bare repository was converted to non-bare, or that the crippled
 - filesystem setting changed, so both locations still need to be checked. -}
annexLocationsNonBare :: GitConfig -> Key -> [RawFilePath]
annexLocationsNonBare config key =
	map (annexLocation config key) [hashDirMixed, hashDirLower]

{- Annexed file's possible locations relative to a bare repository. -}
annexLocationsBare :: GitConfig -> Key -> [RawFilePath]
annexLocationsBare config key =
	map (annexLocation config key) [hashDirLower, hashDirMixed]
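Both location lists contain the same two candidates and differ only in which layout comes first. The order matters because, as the `gitAnnexLocation` comment explains, the location returned when the object is not present is where new content gets stored. The sketch below uses hypothetical layout tags and placeholder hash directories; the directory names are not real hash output.

```haskell
-- Hypothetical stand-ins for hashDirMixed and hashDirLower.
data HashLayout = Mixed | Lower
  deriving (Eq, Show)

-- | Layouts in the order they are tried, mirroring
-- annexLocationsNonBare and annexLocationsBare.
candidateLayouts :: Bool -> [HashLayout]
candidateLayouts isBare
  | isBare = [Lower, Mixed]
  | otherwise = [Mixed, Lower]

-- | Illustrative object path under a layout; the hash directory names
-- are placeholders, not output of the real hashing functions.
objectPathFor :: String -> HashLayout -> FilePath
objectPathFor key layout = "annex/objects/" ++ hashdir ++ key ++ "/" ++ key
  where
    hashdir = case layout of
      Mixed -> "Xy/Zw/"
      Lower -> "abc/def/"
```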
2015-01-28 20:51:40 +00:00

2019-12-11 18:12:22 +00:00
annexLocation :: GitConfig -> Key -> (HashLevels -> Hasher) -> RawFilePath
2022-06-22 20:08:49 +00:00
annexLocation config key hasher = objectDir P.</> keyPath key (hasher $ objectHashLevels config)
2010-10-13 07:41:12 +00:00

2024-08-04 15:58:07 +00:00
{- For exporttree remotes with annexobjects=true, objects are stored
 - in this location as well as in the exported tree. -}
exportAnnexObjectLocation :: GitConfig -> Key -> ExportLocation
exportAnnexObjectLocation gc k =
	mkExportLocation $
		".git" P.</> annexLocation gc k hashDirLower

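`exportAnnexObjectLocation` simply prefixes `.git` to the object's `hashDirLower` location. A `String` sketch follows; `exportObjectPath` and `isUnderDotGit` are hypothetical helpers. Git does not track paths containing a `.git` component, so an exported tree presumably never contains a file under this prefix. Such object paths should therefore not collide with exported files.

```haskell
import System.FilePath ((</>), splitDirectories)

-- | Hypothetical String analogue of exportAnnexObjectLocation: given an
-- object's hashDirLower location relative to .git, return the path under
-- which the exporttree remote would also store it.
exportObjectPath :: FilePath -> FilePath
exportObjectPath annexLoc = ".git" </> annexLoc

-- | True when an export path starts with a ".git" component.
isUnderDotGit :: FilePath -> Bool
isUnderDotGit p = case splitDirectories p of
  (".git" : _) -> True
  _ -> False
```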
2015-06-11 19:14:42 +00:00
{- Number of subdirectories from the gitAnnexObjectDir
 - to the gitAnnexLocation. -}
gitAnnexLocationDepth :: GitConfig -> Int
gitAnnexLocationDepth config = hashlevels + 1
  where
	HashLevels hashlevels = objectHashLevels config
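The depth is `hashlevels + 1` under an assumed layout: one directory per hash level, then an extra directory named after the key that holds the object file. The sketch checks that arithmetic on concrete paths. `objectRelPath`, `locationDepth` and `subdirCount` are hypothetical names.

```haskell
import System.FilePath ((</>), splitDirectories)

-- | Assumed layout under the object directory: one directory per hash
-- level, then a directory named after the key, then the object file.
objectRelPath :: [String] -> String -> FilePath
objectRelPath hashdirs key = foldr (</>) (key </> key) hashdirs

-- | Same arithmetic as gitAnnexLocationDepth.
locationDepth :: Int -> Int
locationDepth hashlevels = hashlevels + 1

-- | Number of directories in a relative object path, i.e. every
-- component except the final file name.
subdirCount :: FilePath -> Int
subdirCount p = length (splitDirectories p) - 1
```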

2015-01-06 19:31:24 +00:00
{- Annexed object's location in a repository.
2011-11-29 03:08:11 +00:00
 -
 - When there are multiple possible locations, returns the one where the
 - file is actually present.
 -
 - When the file is not present, returns the location where the file should
 - be stored.
2011-11-29 02:43:51 +00:00
 -}
2019-12-11 18:12:22 +00:00
gitAnnexLocation :: Key -> Git.Repo -> GitConfig -> IO RawFilePath
2022-05-16 19:19:48 +00:00
gitAnnexLocation = gitAnnexLocation' R.doesPathExist

gitAnnexLocation' :: (RawFilePath -> IO Bool) -> Key -> Git.Repo -> GitConfig -> IO RawFilePath
gitAnnexLocation' checker key r config = gitAnnexLocation'' key r config
2019-12-09 17:49:05 +00:00
	(annexCrippledFileSystem config)
	(coreSymlinks config)
2022-05-16 19:19:48 +00:00
	checker
2019-12-11 18:12:22 +00:00
	(Git.localGitDir r)
2019-12-09 17:49:05 +00:00

2022-05-16 19:19:48 +00:00
gitAnnexLocation'' :: Key -> Git.Repo -> GitConfig -> Bool -> Bool -> (RawFilePath -> IO Bool) -> RawFilePath -> IO RawFilePath
gitAnnexLocation'' key r config crippled symlinkssupported checker gitdir
2013-04-04 19:46:33 +00:00
	{- Bare repositories default to hashDirLower for new
2016-05-10 19:00:19 +00:00
	 - content, as it's more portable. But check all locations. -}
2021-07-16 18:16:05 +00:00
	| Git.repoIsLocalBare r = checkall annexLocationsBare
2021-07-15 16:16:31 +00:00
	{- If the repository is configured to only use lower, no need
	 - to check both. -}
2016-05-10 19:00:19 +00:00
	| hasDifference ObjectHashLower (annexDifferences config) =
		only hashDirLower
2021-07-16 18:16:05 +00:00
	{- Repositories on crippled filesystems use the same layout as bare
	 - repos for new content, unless symlinks are supported too. -}
2016-05-16 21:19:07 +00:00
	| crippled = if symlinkssupported
2021-07-16 18:16:05 +00:00
		then checkall annexLocationsNonBare
		else checkall annexLocationsBare
	| otherwise = checkall annexLocationsNonBare
2012-10-29 01:27:15 +00:00
  where
2016-05-10 19:00:19 +00:00
	only = return . inrepo . annexLocation config key
2021-07-16 18:16:05 +00:00
	checkall f = check $ map inrepo $ f config key
2016-05-10 19:00:19 +00:00

2019-12-11 18:12:22 +00:00
	inrepo d = gitdir P.</> d
2015-03-04 19:44:36 +00:00
	check locs@(l:_) = fromMaybe l <$> firstM checker locs
2012-10-29 01:27:15 +00:00
	check [] = error "internal"
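The guards in `gitAnnexLocation''` amount to two steps:

1. Choose an ordered list of candidate layouts.
2. Take the first candidate that exists, falling back to the first candidate when none does.

The pure model below uses a plain existence predicate in place of the `IO` checker. `RepoInfo`, `layoutsToCheck` and `chooseLocation` are hypothetical names, and `lowerOnly` stands in for the `ObjectHashLower` difference.

```haskell
import Data.List (find)
import Data.Maybe (fromMaybe)

data Layout = Mixed | Lower
  deriving (Eq, Show)

-- | Hypothetical summary of the inputs that gitAnnexLocation'' consults.
data RepoInfo = RepoInfo
  { isBare :: Bool
  , lowerOnly :: Bool          -- stands in for the ObjectHashLower difference
  , crippledFs :: Bool
  , symlinksSupported :: Bool
  }

-- | Layouts to consider, in priority order, following the guards.
layoutsToCheck :: RepoInfo -> [Layout]
layoutsToCheck ri
  | isBare ri = bare
  | lowerOnly ri = [Lower]
  | crippledFs ri = if symlinksSupported ri then nonBare else bare
  | otherwise = nonBare
  where
    bare = [Lower, Mixed]
    nonBare = [Mixed, Lower]

-- | Pick the first candidate that exists; if none does, fall back to the
-- first candidate, which is where new content should be stored.
chooseLocation :: (FilePath -> Bool) -> [FilePath] -> FilePath
chooseLocation _ [] = error "no candidate locations"
chooseLocation exists locs@(l:_) = fromMaybe l (find exists locs)
```

With `lowerOnly` there is a single candidate, so `chooseLocation` returns it without the predicate mattering. This matches `only`, which skips the existence check entirely.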
2011-01-27 21:00:32 +00:00

2015-03-04 20:08:41 +00:00
{- Calculates a symlink target to link a file to an annexed object. -}
2020-10-28 20:24:14 +00:00
gitAnnexLink :: RawFilePath -> Key -> Git.Repo -> GitConfig -> IO RawFilePath
2015-01-27 21:38:06 +00:00
gitAnnexLink file key r config = do
2020-10-28 20:24:14 +00:00
	currdir <- R.getCurrentDirectory
2017-05-15 21:13:08 +00:00
	let absfile = absNormPathUnix currdir file
2015-03-04 20:08:41 +00:00
	let gitdir = getgitdir currdir
2022-05-16 19:19:48 +00:00
	loc <- gitAnnexLocation'' key r config False False (\_ -> return True) gitdir
2020-10-28 20:24:14 +00:00
	toInternalGitPath <$> relPathDirToFile (parentDir absfile) loc
2015-01-21 17:54:47 +00:00
  where
2015-03-04 20:08:41 +00:00
	getgitdir currdir
		{- This special case is for git submodules on filesystems not
		 - supporting symlinks; generate a link target that will
		 - work portably. -}
2015-04-11 04:10:34 +00:00
		| not (coreSymlinks config) && needsSubmoduleFixup r =
2020-10-28 20:24:14 +00:00
			absNormPathUnix currdir (Git.repoPath r P.</> ".git")
2015-03-04 20:08:41 +00:00
		| otherwise = Git.localGitDir r
2020-10-28 20:24:14 +00:00
	absNormPathUnix d p = toInternalGitPath $
		absPathFrom (toInternalGitPath d) (toInternalGitPath p)
|
2013-04-04 19:46:33 +00:00
|
|
|
|
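{- A minimal sketch of the relative link target that gitAnnexLink computes:
 - the object's location expressed relative to the directory containing the
 - link. git-annex uses its own relPathDirToFile and absNormPathUnix helpers;
 - relTo below is a hypothetical simplification that assumes both paths are
 - already absolute and normalised, and uses System.FilePath.Posix so that,
 - like toInternalGitPath, it always yields forward slashes.
 -
```haskell
import System.FilePath.Posix (joinPath, splitDirectories, takeDirectory)

-- Relative path from directory `from` to file `to`.
-- Assumes both are absolute, normalised POSIX paths (no "." or "..").
relTo :: FilePath -> FilePath -> FilePath
relTo from to = joinPath (replicate (length fromRest) ".." ++ toRest)
  where
    fs = splitDirectories from
    ts = splitDirectories to
    -- number of leading path components the two paths share
    common = length (takeWhile id (zipWith (==) fs ts))
    fromRest = drop common fs
    toRest = drop common ts

-- Symlink target for a work tree file pointing at an annexed object.
linkTarget :: FilePath -> FilePath -> FilePath
linkTarget absfile objectloc = relTo (takeDirectory absfile) objectloc
```
 - Relative targets like this keep working when the whole repository is
 - moved, since they never mention the absolute location of the work tree.
 -}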
{- Calculates a symlink target as would be used in a typical git
 - repository, with .git in the top of the work tree. -}
gitAnnexLinkCanonical :: RawFilePath -> Key -> Git.Repo -> GitConfig -> IO RawFilePath
gitAnnexLinkCanonical file key r config = gitAnnexLink file key r' config'
  where
	r' = case r of
		Git.Repo { Git.location = l@Git.Local { Git.worktree = Just wt } } ->
			r { Git.location = l { Git.gitdir = wt P.</> ".git" } }
		_ -> r
	config' = config
		{ annexCrippledFileSystem = False
		, coreSymlinks = True
		}

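{- A minimal sketch of the two record updates in gitAnnexLinkCanonical,
 - using hypothetical, much smaller Repo and Config records in place of
 - Git.Repo and GitConfig. When a work tree is known, the git directory is
 - replaced by <worktree>/.git, and the config is forced to the
 - non-crippled, symlink-supporting case, so the computed target matches
 - what a typical repository would use.
 -
```haskell
import System.FilePath.Posix ((</>))

-- Hypothetical simplified stand-ins for Git.Repo and GitConfig.
data Repo = Repo
  { worktree :: Maybe FilePath
  , gitdir :: FilePath
  } deriving (Show, Eq)

data Config = Config
  { crippledFileSystem :: Bool
  , symlinks :: Bool
  } deriving (Show, Eq)

-- Pretend the git directory is at the top of the work tree, if there is one.
canonicalRepo :: Repo -> Repo
canonicalRepo r = case worktree r of
  Just wt -> r { gitdir = wt </> ".git" }
  Nothing -> r

-- Pretend the filesystem is not crippled and supports symlinks.
canonicalConfig :: Config -> Config
canonicalConfig c = c { crippledFileSystem = False, symlinks = True }
```
 - For example, a repository whose git directory lives elsewhere (such as a
 - submodule's directory under .git/modules/) is treated as if its .git were
 - at the top of its own work tree.
 -}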
{- File used to lock a key's content. -}
gitAnnexContentLock :: Key -> Git.Repo -> GitConfig -> IO RawFilePath
gitAnnexContentLock key r config = do
	loc <- gitAnnexLocation key r config
	return $ loc <> ".lck"

{- File used to indicate a key's content should not be dropped until after
 - a specified time. -}
gitAnnexContentRetentionTimestamp :: Key -> Git.Repo -> GitConfig -> IO RawFilePath
gitAnnexContentRetentionTimestamp key r config = do
	loc <- gitAnnexLocation key r config
	return $ loc <> ".rtm"

{- Lock file for gitAnnexContentRetentionTimestamp -}
gitAnnexContentRetentionTimestampLock :: Key -> Git.Repo -> GitConfig -> IO RawFilePath
gitAnnexContentRetentionTimestampLock key r config = do
	loc <- gitAnnexLocation key r config
	return $ loc <> ".rtl"

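{- A minimal sketch relating the per-key files above to the object
 - location, plus a drop check for the retention timestamp. The suffixes come
 - from the code above; encoding the timestamp as POSIX seconds and the
 - mayDrop rule are assumptions for illustration, not git-annex's actual file
 - format or logic.
 -
```haskell
-- Sibling files derived from an object's location (suffixes as above).
contentLock, retentionTimestamp, retentionTimestampLock :: FilePath -> FilePath
contentLock loc = loc ++ ".lck"
retentionTimestamp loc = loc ++ ".rtm"
retentionTimestampLock loc = loc ++ ".rtl"

-- Hypothetical rule: content may be dropped only once the recorded
-- retention time (POSIX seconds) has passed; no timestamp means no hold.
mayDrop :: Integer -> Maybe Integer -> Bool
mayDrop _ Nothing = True
mayDrop now (Just retainUntil) = now > retainUntil
```
 - Because each name is derived from gitAnnexLocation, the lock and retention
 - files always sit next to the object they refer to.
 -}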
{- Lock that is held when taking the gitAnnexContentLock to support the v10
 - upgrade.
 -
 - This uses the gitAnnexInodeSentinal file, because it needs to be a file
 - that exists in the repository, even when it's an old v8 repository that
 - is mounted read-only. The gitAnnexInodeSentinal is created by git-annex
 - init, so should already exist.
 -}
gitAnnexContentLockLock :: Git.Repo -> RawFilePath
gitAnnexContentLockLock = gitAnnexInodeSentinal

gitAnnexInodeSentinal :: Git.Repo -> RawFilePath
gitAnnexInodeSentinal r = gitAnnexDir r P.</> "sentinal"

gitAnnexInodeSentinalCache :: Git.Repo -> RawFilePath
gitAnnexInodeSentinalCache r = gitAnnexInodeSentinal r <> ".cache"

{- The annex directory of a repository. -}
gitAnnexDir :: Git.Repo -> RawFilePath
gitAnnexDir r = P.addTrailingPathSeparator $ Git.localGitDir r P.</> annexDir

{- The part of the annex directory where file contents are stored. -}
gitAnnexObjectDir :: Git.Repo -> RawFilePath
gitAnnexObjectDir r = P.addTrailingPathSeparator $
	Git.localGitDir r P.</> objectDir

{- .git/annex/tmp/ is used for temp files for key's contents -}
gitAnnexTmpObjectDir :: Git.Repo -> RawFilePath
gitAnnexTmpObjectDir r = P.addTrailingPathSeparator $
	gitAnnexDir r P.</> "tmp"

{- .git/annex/othertmp/ is used for other temp files -}
gitAnnexTmpOtherDir :: Git.Repo -> RawFilePath
gitAnnexTmpOtherDir r = P.addTrailingPathSeparator $
	gitAnnexDir r P.</> "othertmp"

{- Lock file for gitAnnexTmpOtherDir. -}
gitAnnexTmpOtherLock :: Git.Repo -> RawFilePath
gitAnnexTmpOtherLock r = gitAnnexDir r P.</> "othertmp.lck"

{- .git/annex/misctmp/ was used by old versions of git-annex and is still
 - used during initialization -}
gitAnnexTmpOtherDirOld :: Git.Repo -> RawFilePath
gitAnnexTmpOtherDirOld r = P.addTrailingPathSeparator $
	gitAnnexDir r P.</> "misctmp"

{- .git/annex/watchtmp/ is used by the watcher and assistant -}
gitAnnexTmpWatcherDir :: Git.Repo -> RawFilePath
gitAnnexTmpWatcherDir r = P.addTrailingPathSeparator $
	gitAnnexDir r P.</> "watchtmp"

{- The temp file to use for a given key's content. -}
gitAnnexTmpObjectLocation :: Key -> Git.Repo -> RawFilePath
gitAnnexTmpObjectLocation key r = gitAnnexTmpObjectDir r P.</> keyFile key

{- Given a temp file such as gitAnnexTmpObjectLocation, makes a name for a
 - subdirectory in the same location, that can be used as a work area
 - when receiving the key's content.
 -
 - There are ordering requirements for creating these directories;
 - use Annex.Content.withTmpWorkDir to set them up.
 -}
gitAnnexTmpWorkDir :: RawFilePath -> RawFilePath
gitAnnexTmpWorkDir p =
	let (dir, f) = P.splitFileName p
	-- Using a prefix avoids name conflict with any other keys.
	in dir P.</> "work." <> f

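{- A minimal sketch of gitAnnexTmpWorkDir, using String paths and
 - System.FilePath.Posix in place of RawFilePath and the P module. Note the
 - precedence: <> (infixr 6) binds tighter than </> (infixr 5), so
 - dir </> "work." <> f parses as dir </> ("work." <> f), making the work
 - area a sibling of the temp file with a "work." prefix.
 -
```haskell
import System.FilePath.Posix (splitFileName, (</>))

-- Work area for a temp file: same directory, "work." prefix on the name.
tmpWorkDir :: FilePath -> FilePath
tmpWorkDir p =
  let (dir, f) = splitFileName p
  in dir </> "work." <> f
```
 - As the code comment notes, the prefix keeps the work directory's name from
 - clashing with any key's temp file in the same directory.
 -}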
{- .git/annex/bad/ is used for bad files found during fsck -}
gitAnnexBadDir :: Git.Repo -> RawFilePath
gitAnnexBadDir r = P.addTrailingPathSeparator $ gitAnnexDir r P.</> "bad"

{- The bad file to use for a given key. -}
gitAnnexBadLocation :: Key -> Git.Repo -> RawFilePath
gitAnnexBadLocation key r = gitAnnexBadDir r P.</> keyFile key

{- .git/annex/foounused is used to number possibly unused keys -}
gitAnnexUnusedLog :: RawFilePath -> Git.Repo -> RawFilePath
gitAnnexUnusedLog prefix r = gitAnnexDir r P.</> (prefix <> "unused")

{- .git/annex/keysdb/ contains a database of information about keys. -}
gitAnnexKeysDbDir :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexKeysDbDir r c = fromMaybe (gitAnnexDir r) (annexDbDir c) P.</> "keysdb"

{- Lock file for the keys database. -}
gitAnnexKeysDbLock :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexKeysDbLock r c = gitAnnexKeysDbDir r c <> ".lck"

{- Contains the stat of the last index file that was
 - reconciled with the keys database. -}
gitAnnexKeysDbIndexCache :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexKeysDbIndexCache r c = gitAnnexKeysDbDir r c <> ".cache"

{- .git/annex/fsck/uuid/ is used to store information about incremental
 - fscks. -}
gitAnnexFsckDir :: UUID -> Git.Repo -> Maybe GitConfig -> RawFilePath
gitAnnexFsckDir u r mc = case annexDbDir =<< mc of
	Nothing -> go (gitAnnexDir r)
	Just d -> go d
  where
	go d = d P.</> "fsck" P.</> fromUUID u

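{- A minimal sketch of the annexDbDir override shared by several database
 - locations in this module, with String paths and a hypothetical Config
 - newtype in place of GitConfig. Most of them fall back to the annex
 - directory with fromMaybe; gitAnnexFsckDir takes a Maybe GitConfig, so it
 - first combines the two Maybe layers with =<<. fsckDir below writes that
 - case expression with fromMaybe, which is equivalent.
 -
```haskell
import Data.Maybe (fromMaybe)
import System.FilePath.Posix ((</>))

-- Hypothetical stand-in for GitConfig's annexDbDir setting.
newtype Config = Config { dbDir :: Maybe FilePath }

-- Pattern used by e.g. gitAnnexKeysDbDir and gitAnnexExportDir.
dbLocation :: FilePath -> Config -> FilePath -> FilePath
dbLocation annexdir c name = fromMaybe annexdir (dbDir c) </> name

-- Pattern used by gitAnnexFsckDir, where the config itself is optional.
fsckDir :: FilePath -> String -> Maybe Config -> FilePath
fsckDir annexdir uuid mc = fromMaybe annexdir (dbDir =<< mc) </> "fsck" </> uuid
```
 - So gitAnnexFsckState, which passes Nothing, always stays under
 - .git/annex/fsck/, while the fsck database and its lock follow annexDbDir
 - when it is set.
 -}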
{- used to store information about incremental fscks. -}
gitAnnexFsckState :: UUID -> Git.Repo -> RawFilePath
gitAnnexFsckState u r = gitAnnexFsckDir u r Nothing P.</> "state"

{- Directory containing database used to record fsck info. -}
gitAnnexFsckDbDir :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexFsckDbDir u r c = gitAnnexFsckDir u r (Just c) P.</> "fsckdb"

{- Directory containing old database used to record fsck info. -}
gitAnnexFsckDbDirOld :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexFsckDbDirOld u r c = gitAnnexFsckDir u r (Just c) P.</> "db"

{- Lock file for the fsck database. -}
gitAnnexFsckDbLock :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexFsckDbLock u r c = gitAnnexFsckDir u r (Just c) P.</> "fsck.lck"

{- .git/annex/fsckresults/uuid is used to store results of git fscks -}
gitAnnexFsckResultsLog :: UUID -> Git.Repo -> RawFilePath
gitAnnexFsckResultsLog u r =
	gitAnnexDir r P.</> "fsckresults" P.</> fromUUID u

{- .git/annex/upgrade.log is used to record repository version upgrades. -}
gitAnnexUpgradeLog :: Git.Repo -> RawFilePath
gitAnnexUpgradeLog r = gitAnnexDir r P.</> "upgrade.log"

gitAnnexUpgradeLock :: Git.Repo -> RawFilePath
gitAnnexUpgradeLock r = gitAnnexDir r P.</> "upgrade.lck"

{- .git/annex/smudge.log is used to log smudged worktree files that need to
 - be updated. -}
gitAnnexSmudgeLog :: Git.Repo -> RawFilePath
gitAnnexSmudgeLog r = gitAnnexDir r P.</> "smudge.log"

gitAnnexSmudgeLock :: Git.Repo -> RawFilePath
gitAnnexSmudgeLock r = gitAnnexDir r P.</> "smudge.lck"

{- .git/annex/restage.log is used to log worktree files that need to be
 - restaged in git -}
gitAnnexRestageLog :: Git.Repo -> RawFilePath
gitAnnexRestageLog r = gitAnnexDir r P.</> "restage.log"

{- .git/annex/restage.old is used while restaging files in git -}
gitAnnexRestageLogOld :: Git.Repo -> RawFilePath
gitAnnexRestageLogOld r = gitAnnexDir r P.</> "restage.old"

gitAnnexRestageLock :: Git.Repo -> RawFilePath
gitAnnexRestageLock r = gitAnnexDir r P.</> "restage.lck"

{- .git/annex/adjust.log is used to log when the adjusted branch needs to
 - be updated. -}
gitAnnexAdjustedBranchUpdateLog :: Git.Repo -> RawFilePath
gitAnnexAdjustedBranchUpdateLog r = gitAnnexDir r P.</> "adjust.log"

gitAnnexAdjustedBranchUpdateLock :: Git.Repo -> RawFilePath
gitAnnexAdjustedBranchUpdateLock r = gitAnnexDir r P.</> "adjust.lck"

{- .git/annex/migrate.log is used to log migrations before committing them. -}
gitAnnexMigrateLog :: Git.Repo -> RawFilePath
gitAnnexMigrateLog r = gitAnnexDir r P.</> "migrate.log"

gitAnnexMigrateLock :: Git.Repo -> RawFilePath
gitAnnexMigrateLock r = gitAnnexDir r P.</> "migrate.lck"

{- .git/annex/migrations.log is used to log committed migrations. -}
gitAnnexMigrationsLog :: Git.Repo -> RawFilePath
gitAnnexMigrationsLog r = gitAnnexDir r P.</> "migrations.log"

gitAnnexMigrationsLock :: Git.Repo -> RawFilePath
gitAnnexMigrationsLock r = gitAnnexDir r P.</> "migrations.lck"

{- .git/annex/move.log is used to log moves that are in progress,
 - to better support resuming an interrupted move. -}
gitAnnexMoveLog :: Git.Repo -> RawFilePath
gitAnnexMoveLog r = gitAnnexDir r P.</> "move.log"

gitAnnexMoveLock :: Git.Repo -> RawFilePath
gitAnnexMoveLock r = gitAnnexDir r P.</> "move.lck"

{- .git/annex/export/ is used to store information about
 - exports to special remotes. -}
gitAnnexExportDir :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexExportDir r c = fromMaybe (gitAnnexDir r) (annexDbDir c) P.</> "export"

{- Directory containing database used to record export info. -}
gitAnnexExportDbDir :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexExportDbDir u r c =
	gitAnnexExportDir r c P.</> fromUUID u P.</> "exportdb"

{- Lock file for export database. -}
gitAnnexExportLock :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexExportLock u r c = gitAnnexExportDbDir u r c <> ".lck"

{- Lock file for updating the export database with information from the
 - repository. -}
gitAnnexExportUpdateLock :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexExportUpdateLock u r c = gitAnnexExportDbDir u r c <> ".upl"

{- Log file used to keep track of files that were in the tree exported to a
 - remote, but were excluded by its preferred content settings. -}
gitAnnexExportExcludeLog :: UUID -> Git.Repo -> RawFilePath
gitAnnexExportExcludeLog u r = gitAnnexDir r P.</> "export.ex" P.</> fromUUID u

{- Directory containing database used to record remote content ids.
 -
 - (This used to be "cid", but a problem with the database caused it to
 - need to be rebuilt with a new name.)
 -}
gitAnnexContentIdentifierDbDir :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexContentIdentifierDbDir r c =
	fromMaybe (gitAnnexDir r) (annexDbDir c) P.</> "cidsdb"

{- Lock file for writing to the content id database. -}
gitAnnexContentIdentifierLock :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexContentIdentifierLock r c = gitAnnexContentIdentifierDbDir r c <> ".lck"

{- .git/annex/import/ is used to store information about
 - imports from special remotes. -}
gitAnnexImportDir :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexImportDir r c = fromMaybe (gitAnnexDir r) (annexDbDir c) P.</> "import"

{- File containing state about the last import done from a remote. -}
gitAnnexImportLog :: UUID -> Git.Repo -> GitConfig -> RawFilePath
gitAnnexImportLog u r c =
	gitAnnexImportDir r c P.</> fromUUID u P.</> "log"

{- Directory containing database used by importfeed. -}
gitAnnexImportFeedDbDir :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexImportFeedDbDir r c =
	fromMaybe (gitAnnexDir r) (annexDbDir c) P.</> "importfeed"

{- Lock file for writing to the importfeed database. -}
gitAnnexImportFeedDbLock :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexImportFeedDbLock r c = gitAnnexImportFeedDbDir r c <> ".lck"

{- Directory containing reposize database. -}
gitAnnexRepoSizeDbDir :: Git.Repo -> GitConfig -> RawFilePath
gitAnnexRepoSizeDbDir r c =
	fromMaybe (gitAnnexDir r) (annexDbDir c) P.</> "reposize"

{- .git/annex/schedulestate is used to store information about when
 - scheduled jobs were last run. -}
gitAnnexScheduleState :: Git.Repo -> RawFilePath
gitAnnexScheduleState r = gitAnnexDir r P.</> "schedulestate"

{- .git/annex/creds/ is used to store credentials to access some special
 - remotes. -}
gitAnnexCredsDir :: Git.Repo -> RawFilePath
gitAnnexCredsDir r = P.addTrailingPathSeparator $ gitAnnexDir r P.</> "creds"

2014-03-01 01:32:18 +00:00
{- .git/annex/certificate.pem and .git/annex/privkey.pem are used by the
 - webapp when HTTPS is enabled. -}
gitAnnexWebCertificate :: Git.Repo -> FilePath
2019-12-18 20:45:03 +00:00
gitAnnexWebCertificate r = fromRawFilePath $ gitAnnexDir r P.</> "certificate.pem"
2014-03-01 01:32:18 +00:00
gitAnnexWebPrivKey :: Git.Repo -> FilePath
2019-12-18 20:45:03 +00:00
gitAnnexWebPrivKey r = fromRawFilePath $ gitAnnexDir r P.</> "privkey.pem"
2014-03-01 01:32:18 +00:00

2024-02-29 17:26:06 +00:00
{- .git/annex/feedstate/ is used by importfeed to record per-key (url) state. -}
2020-11-03 22:34:27 +00:00
gitAnnexFeedStateDir :: Git.Repo -> RawFilePath
gitAnnexFeedStateDir r = P.addTrailingPathSeparator $
	gitAnnexDir r P.</> "feedstate"
2013-08-03 05:40:21 +00:00

2020-11-03 22:34:27 +00:00
gitAnnexFeedState :: Key -> Git.Repo -> RawFilePath
gitAnnexFeedState k r = gitAnnexFeedStateDir r P.</> keyFile k
2013-08-03 05:40:21 +00:00

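Per-key feed state is a single file named by `keyFile` inside the feedstate directory. Because `keyFile` turns "/" into "%", each key maps to one flat file there. The sketch below is a String-based approximation with hypothetical names; it takes an already-escaped key file name as input. The example name `URL--http&c%%example.com%feed` is what the `keyFile` escaping rules produce for `URL--http://example.com/feed`.

```haskell
import System.FilePath ((</>), addTrailingPathSeparator)

-- Hypothetical String version of gitAnnexFeedStateDir: the directory is
-- returned with a trailing path separator.
feedStateDir :: FilePath -> FilePath
feedStateDir annexDir = addTrailingPathSeparator (annexDir </> "feedstate")

-- Hypothetical String version of gitAnnexFeedState. The first argument is
-- assumed to be the already-escaped output of keyFile.
feedState :: String -> FilePath -> FilePath
feedState escapedKey annexDir = feedStateDir annexDir </> escapedKey
```

`</>` does not add a second separator when the directory already ends in one, so the trailing separator is harmless here.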
2020-11-12 16:40:35 +00:00
{- .git/annex/merge/ is used as an empty work tree for merges in
 - adjusted branches. -}
2012-12-18 19:04:44 +00:00
gitAnnexMergeDir :: Git.Repo -> FilePath
2019-12-18 20:45:03 +00:00
gitAnnexMergeDir r = fromRawFilePath $
	P.addTrailingPathSeparator $ gitAnnexDir r P.</> "merge"
2012-12-18 19:04:44 +00:00

2012-09-26 16:06:44 +00:00
{- .git/annex/transfer/ is used to record keys currently
2012-08-23 17:42:13 +00:00
 - being transferred, and other transfer bookkeeping info. -}
2020-10-29 16:02:46 +00:00
gitAnnexTransferDir :: Git.Repo -> RawFilePath
gitAnnexTransferDir r =
2019-12-18 20:45:03 +00:00
	P.addTrailingPathSeparator $ gitAnnexDir r P.</> "transfer"
2012-07-01 18:29:00 +00:00

2011-06-23 13:56:04 +00:00
{- .git/annex/journal/ is used to journal changes made to the git-annex
 - branch -}
2024-05-15 21:33:38 +00:00
gitAnnexJournalDir :: BranchState -> Git.Repo -> RawFilePath
gitAnnexJournalDir st r = P.addTrailingPathSeparator $
	case alternateJournal st of
		Nothing -> gitAnnexDir r P.</> "journal"
		Just d -> d
2011-06-23 13:56:04 +00:00

start implementing hidden git-annex repositories
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.
Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.
All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.
Currently, no UUIDs are treated as private yet, a way to configure that
is needed.
The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.
It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That let Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.
And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concatenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
2021-04-20 18:32:41 +00:00
{- .git/annex/journal-private/ is used to journal changes regarding private
 - repositories. -}
2024-05-15 21:33:38 +00:00
gitAnnexPrivateJournalDir :: BranchState -> Git.Repo -> RawFilePath
gitAnnexPrivateJournalDir st r = P.addTrailingPathSeparator $
	case alternateJournal st of
		Nothing -> gitAnnexDir r P.</> "journal-private"
		Just d -> d
2021-04-20 18:32:41 +00:00
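Both journal directory functions pick the same way: if the branch state carries an alternate journal, that directory is used as-is; otherwise a fixed subdirectory of `.git/annex` is used. As in the source, an alternate journal replaces the public and the private journal directory with the same path. This is a minimal sketch with `JournalState` standing in for `BranchState` (a hypothetical simplification that models only the alternate journal) and `fromMaybe` in place of the `case` expression.

```haskell
import Data.Maybe (fromMaybe)
import System.FilePath ((</>), addTrailingPathSeparator)

-- Hypothetical, simplified stand-in for BranchState: only the alternate
-- journal directory is modelled.
newtype JournalState = JournalState { alternateJournalDir :: Maybe FilePath }

-- Use the alternate journal when set, otherwise the named subdirectory
-- ("journal" or "journal-private") of the annex directory.
journalDirFor :: String -> JournalState -> FilePath -> FilePath
journalDirFor name st annexDir = addTrailingPathSeparator $
  fromMaybe (annexDir </> name) (alternateJournalDir st)
```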
2011-10-03 20:32:36 +00:00
{- Lock file for the journal. -}
2020-10-29 16:02:46 +00:00
gitAnnexJournalLock :: Git.Repo -> RawFilePath
gitAnnexJournalLock r = gitAnnexDir r P.</> "journal.lck"
2011-10-03 20:32:36 +00:00

2019-05-06 19:15:12 +00:00
{- Lock file for flushing a git queue that writes to the git index or
 - other git state that should only have one writer at a time. -}
2020-10-29 16:02:46 +00:00
gitAnnexGitQueueLock :: Git.Repo -> RawFilePath
gitAnnexGitQueueLock r = gitAnnexDir r P.</> "gitqueue.lck"
2019-05-06 19:15:12 +00:00

2011-12-11 18:14:28 +00:00
{- .git/annex/index is used to stage changes to the git-annex branch -}
2020-11-05 22:45:37 +00:00
gitAnnexIndex :: Git.Repo -> RawFilePath
gitAnnexIndex r = gitAnnexDir r P.</> "index"
2011-12-11 18:14:28 +00:00

2021-04-20 18:32:41 +00:00
{- .git/annex/index-private is used to store information that is not to
 - be exposed to the git-annex branch. -}
gitAnnexPrivateIndex :: Git.Repo -> RawFilePath
gitAnnexPrivateIndex r = gitAnnexDir r P.</> "index-private"

2013-10-03 19:06:58 +00:00
{- Holds the ref of the git-annex branch that the index was last updated to.
 -
 - The .lck in the name is a historical accident; this is not used as a
 - lock. -}
2020-10-29 18:20:57 +00:00
gitAnnexIndexStatus :: Git.Repo -> RawFilePath
gitAnnexIndexStatus r = gitAnnexDir r P.</> "index.lck"
2011-12-11 20:11:13 +00:00

2014-02-18 21:38:23 +00:00
{- The index file used to generate a filtered branch view. -}
2020-11-05 22:45:37 +00:00
gitAnnexViewIndex :: Git.Repo -> RawFilePath
gitAnnexViewIndex r = gitAnnexDir r P.</> "viewindex"
2014-02-18 21:38:23 +00:00

{- File containing a log of recently accessed views. -}
2020-10-29 16:02:46 +00:00
gitAnnexViewLog :: Git.Repo -> RawFilePath
gitAnnexViewLog r = gitAnnexDir r P.</> "viewlog"
2014-02-18 21:38:23 +00:00

2016-07-17 16:11:05 +00:00
{- List of refs that have already been merged into the git-annex branch. -}
2020-10-29 18:20:57 +00:00
gitAnnexMergedRefs :: Git.Repo -> RawFilePath
gitAnnexMergedRefs r = gitAnnexDir r P.</> "mergedrefs"
2016-07-17 16:11:05 +00:00

2013-08-28 19:57:42 +00:00
{- List of refs that should not be merged into the git-annex branch. -}
2020-10-29 18:20:57 +00:00
gitAnnexIgnoredRefs :: Git.Repo -> RawFilePath
gitAnnexIgnoredRefs r = gitAnnexDir r P.</> "ignoredrefs"
2013-08-28 19:57:42 +00:00

2012-06-11 05:20:19 +00:00
{- Pid file for daemon mode. -}
2020-10-29 14:33:12 +00:00
gitAnnexPidFile :: Git.Repo -> RawFilePath
gitAnnexPidFile r = gitAnnexDir r P.</> "daemon.pid"
2012-06-11 05:20:19 +00:00

2015-11-12 21:47:31 +00:00
{- Pid lock file for pidlock mode -}
2020-10-29 14:33:12 +00:00
gitAnnexPidLockFile :: Git.Repo -> RawFilePath
gitAnnexPidLockFile r = gitAnnexDir r P.</> "pidlock"
2015-11-12 21:47:31 +00:00

2012-06-13 17:35:15 +00:00
{- Status file for daemon mode. -}
gitAnnexDaemonStatusFile :: Git.Repo -> FilePath
2019-12-18 20:45:03 +00:00
gitAnnexDaemonStatusFile r = fromRawFilePath $
	gitAnnexDir r P.</> "daemon.status"
2012-06-13 17:35:15 +00:00

2012-06-11 04:39:09 +00:00
{- Log file for daemon mode. -}
2020-11-04 18:20:37 +00:00
gitAnnexDaemonLogFile :: Git.Repo -> RawFilePath
gitAnnexDaemonLogFile r = gitAnnexDir r P.</> "daemon.log"
2012-06-11 04:39:09 +00:00

2013-05-23 23:00:46 +00:00
{- Log file for fuzz test. -}
gitAnnexFuzzTestLogFile :: Git.Repo -> FilePath
2019-12-18 20:45:03 +00:00
gitAnnexFuzzTestLogFile r = fromRawFilePath $
	gitAnnexDir r P.</> "fuzztest.log"
2013-05-23 23:00:46 +00:00

2012-07-26 03:13:01 +00:00
{- Html shim file used to launch the webapp. -}
2020-11-04 18:20:37 +00:00
gitAnnexHtmlShim :: Git.Repo -> RawFilePath
gitAnnexHtmlShim r = gitAnnexDir r P.</> "webapp.html"
2012-07-26 03:13:01 +00:00

2012-09-18 21:50:07 +00:00
{- File containing the url to the webapp. -}
2020-11-04 18:20:37 +00:00
gitAnnexUrlFile :: Git.Repo -> RawFilePath
gitAnnexUrlFile r = gitAnnexDir r P.</> "url"
2012-09-18 21:50:07 +00:00

2012-10-03 21:04:52 +00:00
{- Temporary file used to edit configuration from the git-annex branch. -}
2020-10-30 19:55:59 +00:00
gitAnnexTmpCfgFile :: Git.Repo -> RawFilePath
gitAnnexTmpCfgFile r = gitAnnexDir r P.</> "config.tmp"
2012-10-03 21:04:52 +00:00

2012-01-20 19:34:52 +00:00
{- .git/annex/ssh/ is used for ssh connection caching -}
2020-10-29 16:02:46 +00:00
gitAnnexSshDir :: Git.Repo -> RawFilePath
gitAnnexSshDir r = P.addTrailingPathSeparator $ gitAnnexDir r P.</> "ssh"
2012-01-20 19:34:52 +00:00

2012-03-04 20:00:24 +00:00
{- .git/annex/remotes/ is used for remote-specific state. -}
2020-10-28 21:25:59 +00:00
gitAnnexRemotesDir :: Git.Repo -> RawFilePath
gitAnnexRemotesDir r =
2019-12-18 20:45:03 +00:00
	P.addTrailingPathSeparator $ gitAnnexDir r P.</> "remotes"
2012-03-04 20:00:24 +00:00

2012-08-31 22:59:57 +00:00
{- This is the base directory name used by the assistant when making
 - repositories, by default. -}
gitAnnexAssistantDefaultDir :: FilePath
gitAnnexAssistantDefaultDir = "annex"

Better sanitization of problem characters when generating URL and WORM keys.
FAT has a lot of characters it does not allow in filenames, like ? and *
It's probably the worst offender, but other filesystems also have
limitations.
In 2011, I made keyFile escape : to handle FAT, but missed the other
characters. It also turns out that when I did that, I was also living
dangerously; any existing keys that contained a : had their object
location change. Oops.
So, adding new characters to escape to keyFile is out. Well, it would be
possible to make keyFile behave differently on a per-filesystem basis, but
this would be a real nightmare to get right. Consider that a rsync special
remote uses keyFile to determine the filenames to use, and we don't know
the underlying filesystem on the rsync server.
Instead, I have gone for a solution that is backwards compatible and
simple. Its only downside is that already generated URL and WORM keys
might not be able to be stored on FAT or some other filesystem that
dislikes a character used in the key. (In this case, the user can just
migrate the problem keys to a checksumming backend. If this became a big
problem, fsck could be made to detect these and suggest a migration.)
Going forward, new keys that are created will escape all characters that
are likely to cause problems. And if some filesystem comes along that's
even worse than FAT (seems unlikely, but here it is 2013, and people are
still using FAT!), additional characters can be added to the set that are
escaped without difficulty.
(Also, made WORM limit the part of the filename that is embedded in the key,
to deal with filesystem filename length limits. This could have already
been a problem, but is more likely now, since the escaping of the filename
can make it longer.)
This commit was sponsored by Ian Downes
2013-10-05 19:01:49 +00:00
{- Sanitizes a String that will be used as part of a Key's keyName,
2017-08-17 18:46:33 +00:00
 - dealing with characters that cause problems.
2013-10-05 19:01:49 +00:00
 -
2020-07-20 18:06:05 +00:00
 - This is used when a new Key is initially being generated, eg by genKey.
2023-03-14 02:39:16 +00:00
 - Unlike keyFile and fileKey, it does not need to be a reversible
2016-06-02 01:46:58 +00:00
 - escaping. Also, it's ok to change this to add more problematic
2013-10-05 19:01:49 +00:00
 - characters later, unlike changing keyFile, which could result in the
 - filenames used for existing keys changing and contents getting lost.
 -
 - It is, however, important that the input and output of this function
 - have a 1:1 mapping, to avoid two different inputs mapping to the
 - same key.
 -}
preSanitizeKeyName :: String -> String
2017-08-17 19:09:38 +00:00
preSanitizeKeyName = preSanitizeKeyName' False

preSanitizeKeyName' :: Bool -> String -> String
preSanitizeKeyName' resanitize = concatMap escape
2013-10-05 19:01:49 +00:00
  where
2014-10-09 18:53:13 +00:00
	escape c
2013-10-05 19:01:49 +00:00
		| isAsciiUpper c || isAsciiLower c || isDigit c = [c]
2019-01-14 18:02:47 +00:00
		| c `elem` ['.', '-', '_'] = [c] -- common, assumed safe
		| c `elem` ['/', '%', ':'] = [c] -- handled by keyFile
2013-10-05 19:01:49 +00:00
		-- , is safe and uncommon, so will be used to escape
		-- other characters. By itself, it is escaped to
		-- doubled form.
2017-08-17 19:09:38 +00:00
		| c == ',' = if not resanitize
			then ",,"
			else ","
2014-02-11 05:35:11 +00:00
		| otherwise = ',' : show (ord c)
2013-10-05 19:01:49 +00:00

2017-08-17 19:09:38 +00:00
{- Converts a keyName that has been sanitized with an old version of
 - preSanitizeKeyName to be sanitized with the new version. -}
reSanitizeKeyName :: String -> String
reSanitizeKeyName = preSanitizeKeyName' True

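A worked example of the sanitizing rules: `?` is code point 63 and `*` is 42, so `what?*.txt` becomes `what,63,42.txt`, and a literal `,` is doubled to `,,`. Doubling is what keeps the mapping 1:1: a literal `,63` becomes `,,63`, which cannot collide with the `,63` produced for `?`. The sketch below is a standalone copy of the guards in `preSanitizeKeyName'` under the hypothetical name `sanitize`. Passing `True` (as `reSanitizeKeyName` does) leaves every `,` alone, so escapes that are already present survive and output of the current version is returned unchanged.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord)

-- Standalone copy of the escaping rules in preSanitizeKeyName'.
-- resanitize = False behaves like preSanitizeKeyName,
-- resanitize = True behaves like reSanitizeKeyName.
sanitize :: Bool -> String -> String
sanitize resanitize = concatMap escape
  where
    escape c
      | isAsciiUpper c || isAsciiLower c || isDigit c = [c]
      | c `elem` ['.', '-', '_'] = [c]
      | c `elem` ['/', '%', ':'] = [c] -- left for keyFile to escape
      | c == ',' = if resanitize then "," else ",,"
      | otherwise = ',' : show (ord c)
```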
2011-12-02 18:39:47 +00:00
{- Converts a key into a filename fragment without any directory.
2010-10-13 07:41:12 +00:00
 -
 - Escape "/" in the key name, to keep a flat tree of files and avoid
 - issues with keys containing "/../" or ending with "/" etc.
 -
 - "/" is escaped to "%" because it's short and rarely used, and resembles
 - a slash.
 - "%" is escaped to "&s", and "&" to "&a"; this ensures that the mapping
 - is one to one.
2013-10-05 19:01:49 +00:00
 - ":" is escaped to "&c", because it seemed like a good idea at the time.
 -
 - Changing what this function escapes and how is not a good idea, as it
 - can cause existing objects to get lost.
2011-10-16 04:04:26 +00:00
 -}
2019-12-18 20:45:03 +00:00
keyFile :: Key -> RawFilePath
keyFile k =
2019-11-22 20:24:04 +00:00
	let b = serializeKey' k
2019-11-22 23:13:05 +00:00
	in if S8.any (`elem` ['&', '%', ':', '/']) b
2019-01-15 00:52:54 +00:00
		then S8.concatMap esc b
		else b
2016-09-26 20:47:59 +00:00
  where
	esc '&' = "&a"
	esc '%' = "&s"
	esc ':' = "&c"
	esc '/' = "%"
2019-01-14 18:02:47 +00:00
	esc c = S8.singleton c
2010-10-13 07:41:12 +00:00

2013-10-05 17:49:45 +00:00
{- Reverses keyFile, converting a filename fragment (ie, the basename of
 - the symlink target) into a key. -}
2019-12-18 20:45:03 +00:00
fileKey :: RawFilePath -> Maybe Key
fileKey = deserializeKey' . S8.intercalate "/" . map go . S8.split '%'
2016-09-26 20:47:59 +00:00
  where
2019-01-15 00:59:09 +00:00
	go = S8.concat . unescafterfirst . S8.split '&'
	unescafterfirst [] = []
	unescafterfirst (b:bs) = b : map (unesc . S8.uncons) bs
2019-01-14 18:02:47 +00:00
	unesc :: Maybe (Char, S8.ByteString) -> S8.ByteString
	unesc Nothing = mempty
	unesc (Just ('c', b)) = S8.cons ':' b
	unesc (Just ('s', b)) = S8.cons '%' b
	unesc (Just ('a', b)) = S8.cons '&' b
	unesc (Just (c, b)) = S8.cons c b
2013-10-05 17:49:45 +00:00

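The keyFile/fileKey pair round-trips because "%" in an escaped name can only come from "/", and every "&" starts a two-character escape. For example, the made-up key name `SHA256E-s1--a/b%c&d:e` escapes to `SHA256E-s1--a%b&sc&ad&ce`. This String-based sketch mirrors both directions under hypothetical names. The real functions work on serialized keys as ByteStrings, and `fileKey` additionally deserializes the result into a `Key`, which is left out here.

```haskell
import Data.List (intercalate)

-- Hypothetical String mirror of keyFile's escaping. (keyFile also skips
-- the rewrite when none of the four special characters occur; the
-- output is the same either way.)
escapeKeyName :: String -> String
escapeKeyName = concatMap esc
  where
    esc '&' = "&a"
    esc '%' = "&s"
    esc ':' = "&c"
    esc '/' = "%"
    esc c = [c]

-- Split on a delimiter, keeping empty fields between adjacent delimiters.
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (field, []) -> [field]
  (field, _ : rest) -> field : splitOnChar d rest

-- Hypothetical mirror of fileKey without deserialization: "%" becomes
-- "/" again, then each "&x" escape is decoded.
unescapeKeyName :: String -> String
unescapeKeyName = intercalate "/" . map go . splitOnChar '%'
  where
    go = concat . unescAfterFirst . splitOnChar '&'
    unescAfterFirst [] = []
    unescAfterFirst (b : bs) = b : map unesc bs
    unesc ('c' : rest) = ':' : rest
    unesc ('s' : rest) = '%' : rest
    unesc ('a' : rest) = '&' : rest
    unesc other = other
```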
2015-01-28 20:51:40 +00:00
{- A location to store a key on a special remote that uses a filesystem.
 - A directory hash is used, to protect against filesystems that dislike
 - having many items in a single directory.
2011-12-02 18:39:47 +00:00
 -
 - The file is put in a directory with the same name; this allows
 - write-protecting the directory to avoid accidental deletion of the file.
 -}
2019-12-11 18:12:22 +00:00
keyPath :: Key -> Hasher -> RawFilePath
keyPath key hasher = hasher key P.</> f P.</> f
2012-10-29 01:27:15 +00:00
  where
2019-12-18 20:45:03 +00:00
	f = keyFile key
2011-12-02 18:39:47 +00:00

2023-03-14 02:39:16 +00:00
{- All possible locations to store a key in a special remote
2015-01-28 20:51:40 +00:00
 - using different directory hashes.
 -
2021-07-16 18:16:05 +00:00
 - This is compatible with the annexLocationsNonBare and annexLocationsBare,
 - for interoperability between special remotes and git-annex repos.
2011-03-15 21:47:00 +00:00
 -}
2019-12-11 18:12:22 +00:00
keyPaths :: Key -> [RawFilePath]
2015-01-28 22:01:54 +00:00
keyPaths key = map (\h -> keyPath key (h def)) dirHashes
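`keyPath` nests the escaped key file inside a directory of the same name, below the hash directories. `keyPaths` produces one such candidate per hashing scheme. Here is a minimal sketch with a hypothetical `HasherSketch` type that takes the escaped file name instead of a `Key`. The hash directory strings `ab/cd` and `Xy/Zw` are made-up placeholders; real hashers compute their directories from the key.

```haskell
import System.FilePath ((</>))

-- Hypothetical stand-in for Hasher: returns the hash directories for an
-- escaped key file name. Real hashers derive these from the Key.
type HasherSketch = String -> FilePath

-- The file sits in a directory with its own name, under the hash dirs,
-- mirroring keyPath.
keyPathSketch :: String -> HasherSketch -> FilePath
keyPathSketch f hasher = hasher f </> f </> f

-- One candidate location per hashing scheme, mirroring keyPaths.
keyPathsSketch :: String -> [HasherSketch] -> [FilePath]
keyPathsSketch f hashers = map (keyPathSketch f) hashers
```

A reader looking for a key on a special remote can try each candidate in turn, which is what lets remotes written with different hashing schemes interoperate.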