This commit is contained in:
parent
ad7dfdb37f
commit
8b5b078df3
1 changed files with 25 additions and 0 deletions
25
doc/bugs/chunks_not_deduplicated.mdwn
Normal file
@ -0,0 +1,25 @@
### Please describe the problem.
When data is chunked for storage on a remote, identical chunks may be stored more than once under different key names, so they are never deduplicated.
### What steps will reproduce the problem?
```
git annex init .
dd count=16 bs=1024 if=/dev/urandom of=rand1.bin
dd count=15 skip=1 bs=1024 if=rand1.bin of=rand2.bin
git annex add rand1.bin rand2.bin
mkdir data
git annex initremote data type=directory directory=data encryption=none chunk=1024
git annex copy --all --to=data
find data -type f | xargs sha1sum | sort # shows doubly-stored identical chunks
```
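To make the duplicates easier to spot than in the full sorted listing, the last step could be narrowed to print only the digests that occur more than once. This is a rough sketch: `duplicate_chunks` is a hypothetical helper name, not a git-annex command, and it assumes each chunk sits in the remote as a plain, unencrypted file.

```shell
# Print each SHA-1 digest that is shared by more than one file under a directory.
# duplicate_chunks is a hypothetical helper, not part of git-annex.
duplicate_chunks() (
    # Hash every file, keep only the digest column, then report repeated digests.
    find "$1" -type f -exec sha1sum {} + | awk '{print $1}' | sort | uniq -d
)
```

Running `duplicate_chunks data` after the steps above should list one digest per chunk that the two files have in common. Since rand2.bin is rand1.bin without its first 1024-byte block, I would expect 15 of them, provided nothing else is stored in the directory.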
### What version of git-annex are you using? On what operating system?
10.20221004-gbf27a02b0 on redhat 7
### Please provide any additional information below.
I imagine an optional upgrade to a new key format for chunks, in which each chunk has its own hash and the larger key it is part of is identified separately (e.g. in the data content or in metadata). Identical chunks would then be deduplicated by existing means.
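
To illustrate what I mean, here is a toy sketch of that layout, outside of git-annex. Everything in it is an assumption made up for the example: the function names `store_chunked` and `restore_chunked`, the `chunks/` and `manifests/` directories, and the use of the file's base name as the manifest name. It is not how git-annex names or stores chunks.

```shell
# Hypothetical sketch: store each chunk under the hash of its own content and
# record the chunk order in a per-file manifest. Not git-annex's actual layout.
store_chunked() (
    file=$1; store=$2; chunksize=${3:-1024}
    mkdir -p "$store/chunks" "$store/manifests"
    tmp=$(mktemp -d)
    # Split into fixed-size pieces; the suffixes sort in file order.
    split -b "$chunksize" -a 6 "$file" "$tmp/part."
    manifest="$store/manifests/$(basename "$file")"
    : > "$manifest"
    for part in "$tmp"/part.*; do
        [ -e "$part" ] || continue   # an empty input produces no parts
        digest=$(sha1sum "$part" | awk '{print $1}')
        # A chunk with this content may already be stored; keep a single copy.
        [ -e "$store/chunks/$digest" ] || mv "$part" "$store/chunks/$digest"
        echo "$digest" >> "$manifest"
    done
    rm -rf "$tmp"
)

# Reassemble a file on stdout by concatenating its chunks in manifest order.
restore_chunked() (
    name=$1; store=$2
    while read -r digest; do
        cat "$store/chunks/$digest"
    done < "$store/manifests/$name"
)
```

With rand1.bin and rand2.bin from the steps above, `store_chunked rand1.bin store` followed by `store_chunked rand2.bin store` should leave only the distinct chunks in `store/chunks`, and `restore_chunked rand2.bin store` writes the original content back out.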
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yesterday I recovered a bricked phone by accessing flash partitions I had backed up with git-annex. I had deleted them locally by accident, but git-annex cobbled many of them together from remotes. The ones it didn't find still had their hashes preserved in the repository, so I was able to compare them with hashes on the device and reconstruct all but one. The one I didn't reconstruct can be regenerated.