Merge branch 'master' into hiddenannex

Joey Hess 2021-04-21 13:04:40 -04:00
commit 9b870e29fd
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
13 changed files with 205 additions and 7 deletions

View file

@@ -251,6 +251,12 @@ test runannex mkr mkk =
lockContentForRemoval k noop removeAnnex
get r k
, check "fsck downloaded object" fsck
, check "retrieveKeyFile resume from 0" $ \r k -> do
tmp <- fromRawFilePath <$> prepTmp k
liftIO $ writeFile tmp ""
lockContentForRemoval k noop removeAnnex
get r k
, check "fsck downloaded object" fsck
, check "retrieveKeyFile resume from 33%" $ \r k -> do
loc <- fromRawFilePath <$> Annex.calcRepo (gitAnnexLocation k)
tmp <- fromRawFilePath <$> prepTmp k
@@ -261,12 +267,6 @@ test runannex mkr mkk =
lockContentForRemoval k noop removeAnnex
get r k
, check "fsck downloaded object" fsck
, check "retrieveKeyFile resume from 0" $ \r k -> do
tmp <- fromRawFilePath <$> prepTmp k
liftIO $ writeFile tmp ""
lockContentForRemoval k noop removeAnnex
get r k
, check "fsck downloaded object" fsck
, check "retrieveKeyFile resume from end" $ \r k -> do
loc <- fromRawFilePath <$> Annex.calcRepo (gitAnnexLocation k)
tmp <- fromRawFilePath <$> prepTmp k

View file

@@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2021-04-21T16:11:15Z"
content="""
Still failing on Windows after the other fix.
More context on the failure shows it cannot be related to exporting;
this is a pure key/value store operation.
key size 1048576; directory remote chunksize=0 encryption=none
removeKey when not present: OK (0.07s)
present False: OK (0.07s)
storeKey: OK (0.02s)
present True: OK
storeKey when already present: OK (0.02s)
present True: OK
retrieveKeyFile: OK (0.17s)
fsck downloaded object: OK
retrieveKeyFile resume from 33%: FAIL
Exception: .git\annex\objects\d78\ee7\SHA256E-s1048576--b9be1c0379146c0bc17c03d1caa8fb1c9d25cc741f59c09ab27379d5fc41862d.this-is-a-test-key\SHA256E-s1048576--b9be1c0379146c0bc17c03d1caa8fb1c9d25cc741f59c09ab27379d5fc41862d.this-is-a-test-key: DeleteFile "\\\\?\\C:\\Users\\runneradmin\\.t\\main2\\.git\\annex\\objects\\d78\\ee7\\SHA256E-s1048576--b9be1c0379146c0bc17c03d1caa8fb1c9d25cc741f59c09ab27379d5fc41862d.this-is-a-test-key\\SHA256E-s1048576--b9be1c0379146c0bc17c03d1caa8fb1c9d25cc741f59c09ab27379d5fc41862d.this-is-a-test-key": permission denied (Access is denied.)
Also, the directory special remote exporttree tests actually pass!
"""]]

View file

@@ -0,0 +1,38 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2021-04-21T16:21:15Z"
content="""
Note that, before it fails, retrieveKeyFile has already succeeded once.
It may be that the cause is that the earlier retrieveKeyFile leaves
the annex object file somehow inaccessible.
Or it may be that the cause is in Command.TestRemote's code
that sets up this "resume from 33%" test case:
    loc <- fromRawFilePath <$> Annex.calcRepo (gitAnnexLocation k)
    tmp <- fromRawFilePath <$> prepTmp k
    partial <- liftIO $ bracket (openBinaryFile loc ReadMode) hClose $ \h -> do
            sz <- hFileSize h
            L.hGet h $ fromInteger $ sz `div` 3
    liftIO $ L.writeFile tmp partial
    lockContentForRemoval k noop removeAnnex -- appears that this is what fails to delete the file
If the handle that is opened to read the annex object file somehow
causes it to linger in a locked state past when the handle should be closed,
it could cause the later failure to delete the annex object file, since Windows
may consider an open file handle to be a lock.
(Some issues with ghc not promptly closing file handles, in a version
from the last year or so, come to mind...)
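For illustration, a handle-safe way to do that partial read — a minimal sketch, not the actual Command.TestRemote code — forces a strict read, so the handle is certainly closed before any later attempt to delete the file:

    import qualified Data.ByteString as B
    import System.IO (withBinaryFile, IOMode(ReadMode), hFileSize)

    -- Read the first third of a file strictly; withBinaryFile closes the
    -- handle before this returns, so nothing lingers to hold a lock.
    readFirstThird :: FilePath -> IO B.ByteString
    readFirstThird path = withBinaryFile path ReadMode $ \h -> do
        sz <- hFileSize h
        B.hGet h (fromInteger (sz `div` 3))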
I've swapped the order of the resume from 33% and resume from 0%
tests. The 0% test does not open a handle that way. So if the
resume from 0% still fails, we'll know for sure the problem is not
caused by the 33% test.
If it is caused by the CoW changes, it seems likely to involve fileCopier's
code that tries to preserve the source file's mode. Before the CoW
changes, I don't think that was done by Remote.Directory's retrieveKeyFile.
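For reference, "preserving the source file's mode" amounts to something like this sketch (using unix-compat; illustrative only, not fileCopier's actual implementation):

    import System.PosixCompat.Files (getFileStatus, fileMode, setFileMode)

    -- After copying src to dst, copy src's permission bits onto dst.
    -- On Windows unix-compat can only approximate the mode.
    preserveMode :: FilePath -> FilePath -> IO ()
    preserveMode src dst = do
        st <- getFileStatus src
        setFileMode dst (fileMode st)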
"""]]

View file

@@ -0,0 +1,26 @@
Hello everyone.
I'm new to local multi-device file sync, and I just read the project overviews and FAQs as well as most of the documentation of **git-annex**, **Mutagen**, **Syncthing**, and **Unison**. I'm a little stuck thinking everything through to the end, so maybe I could ask some of you for your advice and/or opinion.
## What do I want to achieve?
Synchronized folders and files as well as symlinks. LAN-only preferred, no online/cloud, i.e. everything should, if possible, work without any internet connection whatsoever.
## How many and which devices are in use?
Three, at least. We have three Mac devices on our network, plus optionally a Raspberry Pi with some storage attached that could serve as network storage (SSHFS, NFS, AFP, et cetera) and serve files between the Mac devices; an Apple Time Capsule with 2 TB of storage would also be available.
## Is real-time synchronization necessary?
Not really; it would be okay to automate, i.e. auto-start, the check/sync every hour, for example. I think this is one of the main differences between Syncthing and Unison: Unison needs to be “started” manually after making changes to files, whereas Syncthing just runs in the background and propagates changes to all other devices as soon as something is changed?
## Are the devices used at the same time?
Generally, I'd like to say no. In most cases the three Mac devices are not used at the same moment in time.
## Are all devices always-on?
Not really. The Mac devices (old MacBook, new MacBook, Mac mini) are often in sleep mode, I guess; the Raspberry Pi on my network is always-on, though.
Unless I've forgotten to write anything down, I think that's everything I have to say, i.e. what I am asking/looking for. Based on these requirements, what would you say would be the better way to go, and, if you don't mind, please elaborate why?
Thank you so much, everyone.

View file

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 4"
date="2021-04-20T18:30:39Z"
content="""
Perhaps Git Annex could have first-class support for a local special remote inside the .git/annex dir where files that aren't checked out are stored in a more efficient manner.
This would mainly be useful for old versions of files you want to keep in the repo but don't need immediate access to, or for bare repos like in the OP's case. Once special remotes support compression, it might actually make sense to make it the default storage method for bare repos.
Ideally these could be set to be any local special remote backend; bup, for example, would make an ideal candidate for storing old versions of documents efficiently.
Having files in such a \"local special remote\" would then be equivalent to having them in the regular .git/annex/objects dir for tracking purposes.
"""]]

View file

@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="re: Are my unlocked, annexed files still safe?"
date="2021-04-21T16:32:42Z"
content="""
@pat:
> Basically, unlock gives me an editable copy of the file - but I always
> have the original version, and can revert or check it out if I need
> to. Is that correct?
Yes, it's a copy as long as you don't set `annex.thin=true` (as you
mention). Just as with locked files, though, you may not be able to
get the content back from an earlier version if you've dropped unused
content.
"""]]

View file

@@ -0,0 +1,17 @@
[[!comment format=mdwn
username="kyle"
avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3"
subject="re: clarifying unlocked files"
date="2021-04-21T16:34:50Z"
content="""
@Ilya_Shlyakhter:
> Does the locked/unlocked state apply to one particular path within the
> repo, or to a particular key?
A particular path.
> Can the same key be used by both a locked and an unlocked file?
Yes.
"""]]

View file

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="pat"
avatar="http://cdn.libravatar.org/avatar/6b552550673a6a6df3b33364076f8ea8"
subject="Are my unlocked, annexed files still safe?"
date="2021-04-21T15:47:48Z"
content="""
I want to double-check something: if I've annexed and committed files, I believe they are safely stored in git-annex even if I unlock them (as long as I don't use `--thin`). If I annex copies of the same file, annex will only store it once, and use a symlink for the two original files. But if I unlock them, I can edit them independently.
Basically, unlock gives me an editable copy of the file - but I always have the original version, and can revert or check it out if I need to. Is that correct?
"""]]

View file

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="clarifying unlocked files"
date="2021-04-21T16:08:05Z"
content="""
Does the locked/unlocked state apply to one particular path within the repo, or to a particular key? Can the same key be used by both a locked and an unlocked file?
"""]]

View file

@@ -1187,7 +1187,7 @@ repository, using [[git-annex-config]]. See its man page for a list.)
And when multiple files in the work tree have the same content, only
one of them gets hard linked to the annex.
* `annex.supportunlocked'
* `annex.supportunlocked`
By default git-annex supports unlocked files as well as locked files,
so this defaults to true. If set to false, git-annex will only support

View file

@@ -145,6 +145,29 @@ all reads followed by writes do go via Annex.Branch.change, so Annex.Branch.get
can just concatenate the two without worrying about it leaking back out in a
later write.
> Implementing this is in progress, in the `hiddenannex` branch.
>
> Got the separate journal mostly working. No separate index yet.
> No way to configure what repo is hidden yet. --[[Joey]]
>
> Implementation notes:
>
> * CmdLine.Seek precaches git-annex branch
> location logs, but that does not include private ones. Since they're
> cached, the private ones don't get read. Result is eg, whereis finds no
> copies. Either need to disable CmdLine.Seek precaching when there's
> hidden repos, or could make the cache indicate it's only of public
> info, so private info still gets read. (A sketch of that idea follows this list.)
> * CmdLine.Seek contains a LsTreeRecursive over the branch to handle
> --all, and again that won't see private information, including even
> annexed files that are only present in the hidden repo.
> * (And I wonder, don't both the caches above already miss things in
> the journal?)
> * Any other direct accesses of the branch, not going through
> Annex.Branch, also need to be fixed (and may be missing journal files
> already?) Command.ImportFeed.knownItems is one. Command.Log behavior
> needs to be investigated, may be ok. And Logs.Web.withKnownUrls is another.
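A minimal sketch of the second option for the precaching item above, with hypothetical names (not git-annex's actual types): the cache records that it only covers public info, and lookups layer any private journal content on top.

    import qualified Data.ByteString.Lazy as L
    import qualified Data.Map.Strict as M

    data CacheScope = PublicOnly | PublicAndPrivate
        deriving (Eq)

    data LogCache = LogCache
        { cacheScope :: CacheScope
        , cachedLogs :: M.Map FilePath L.ByteString
        }

    -- Look up a log's content; when the cache only holds public info,
    -- append any (hypothetical) private journal content before returning.
    lookupLog :: LogCache -> (FilePath -> IO L.ByteString) -> FilePath -> IO L.ByteString
    lookupLog cache readPrivateJournal f =
        case M.lookup f (cachedLogs cache) of
            Nothing -> readPrivateJournal f
            Just public
                | cacheScope cache == PublicAndPrivate -> return public
                | otherwise -> (public <>) <$> readPrivateJournal f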
## networks of hidden repos
There are a lot of complications involving using hidden repos as remotes.

View file

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="auto-expire temp repos"
date="2021-04-21T15:37:37Z"
content="""
As a possible simpler alternative, maybe add an option to [[git-annex-dead]] to mark a repo dead from a future time onwards? I often have temp repos created on temp cloud instances. I mark them untrusted right after cloning, and then manually mark them dead after the cloud instance is gone. If the latter part were automated, would that cover most of what hidden repos do?
"""]]

View file

@@ -0,0 +1,15 @@
[[!comment format=mdwn
username="Atemu"
avatar="http://cdn.libravatar.org/avatar/d1f0f4275931c552403f4c6707bead7a"
subject="comment 3"
date="2021-04-20T18:05:27Z"
content="""
Would it perhaps be possible to set the compression using filters like file name/extension?
For example, I wouldn't want GA to waste time on compressing multimedia files that are already at entropy and, since they make up the majority of my special remote's content, re-writing them would be very time intensive (even more so when remote solutions are involved).
Certain compressors might also work better on some files types compared to others.
This could be very important to scientists using datalad, as they are likely to A. be working with very specific kinds of data where certain compressors might significantly outperform others and B. have large quantities of data where compression is essential.
If compressors are going to be limited to a known-safe selection, an important aspect to keep in mind would be compression levels as some compressors like zstd can range from lzo-like performance characteristics to lzma ones.
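To make the idea concrete, here is a sketch of what extension-based compressor selection could look like (purely hypothetical; no such setting exists in git-annex today):

    import Data.Char (toLower)
    import System.FilePath (takeExtension)

    data Compressor = NoCompression | Zstd Int | Xz Int
        deriving (Show)

    -- Pick a compressor per file: skip data that is already at entropy,
    -- compress text-like formats harder, default to a fast level.
    chooseCompressor :: FilePath -> Compressor
    chooseCompressor f
        | ext `elem` [".mkv", ".mp4", ".flac", ".jpg", ".png"] = NoCompression
        | ext `elem` [".txt", ".csv", ".tsv", ".vcf", ".json"] = Zstd 19
        | otherwise = Zstd 3
      where
        ext = map toLower (takeExtension f)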
Definitely a +1 on this one though, it would be very useful for my use case as well.
"""]]