git-annex@31849d241f10c295b30a9707352ae5c7d743adb7 2017-01-24 17:15:32 +00:00 committed by admin
parent bc58a9402b
commit 6ce01ca9e5

@ -0,0 +1,65 @@
To make sure we can archive our data safely, we need to:
1. Store revisions
2. Allow files to be tracked while moved to archival spaces
3. Be platform-agnostic
4. Sync
5. Protect against bit-rot
Items 1 and 3 are handled by git itself: everything is a straightforward graph structure composed of plain-text pointers (accepting that some filesystems do not easily expose file metadata, but that's on them, as we can simply choose to use a different system if that's important).
Items 2 and 4 seem to be handled by git-annex.
**But 5 is missing.**
Thankfully, we already have a technology that can fill in elegantly here: parity files.
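To make the idea concrete, here is a minimal sketch of how parity data lets a lost block be rebuilt. It is a deliberate simplification: it uses a single XOR parity block, which can recover exactly one missing block, whereas par2 uses Reed-Solomon coding and can recover several. The function names (`split_blocks`, `make_parity`, `recover_block`) and the tiny block size are illustrative assumptions, not part of par2 or git-annex.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_blocks(data, block_size):
    """Split data into fixed-size blocks, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def make_parity(data, block_size=4):
    """Compute one parity block covering every block of the data."""
    return xor_blocks(split_blocks(data, block_size))


def recover_block(blocks, parity):
    """Rebuild the single block marked None from the survivors plus parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost block")
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


# Hypothetical file contents, standing in for contract.pdf.
original = b"contract contents"
blocks = split_blocks(original, 4)
parity = make_parity(original, 4)

# Simulate bit-rot that destroys the third block entirely.
damaged = list(blocks)
damaged[2] = None

# Rebuild the block, then strip the zero padding added by split_blocks.
restored = b"".join(recover_block(damaged, parity))[:len(original)]
```

The parity block is stored separately from the data, which is the role the `.par2` files play in the layouts below.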
### Two potential user stories
#### Put everything together
- This user wants everything together and in the filesystem in case one of the tools she relies on disappears.
- Might have a structure like this:
    - Project
        - documents
            - contract.pdf
            - contract.pdf.vol000+01.par2
            - contract.pdf.vol001+02.par2
            - contract.pdf.vol003+04.par2
            - Client brochure.zip
            - Client brochure.zip.vol000+01.par2
            - Client brochure.zip.vol001+02.par2
            - Client brochure.zip.vol003+04.par2
- Or like this:
    - Project
        - documents
            - contract.pdf
            - Client brochure.zip
        - documents.vol000+01.par2
        - documents.vol001+02.par2
        - documents.vol003+04.par2
#### Keep everything clean
- This user doesn't want to clutter folders with extra files. He would rather have only the data files themselves, in case they need to be zipped and sent to clients. With the first layout, he would delete `*.par2` before zipping, leading to potential data loss.
- Might have a structure like this:
    - Project
        - documents
            - contract.pdf
            - Client brochure.zip
        - [git-annex]
            - contract.pdf.vol000+01.par2
            - contract.pdf.vol001+02.par2
            - contract.pdf.vol003+04.par2
            - Client brochure.zip.vol000+01.par2
            - Client brochure.zip.vol001+02.par2
            - Client brochure.zip.vol003+04.par2
This would also enhance the data-checking capabilities of git-annex, as data loss could be fixed and new parity files generated from the recovered files transparently, self-healing the archive.
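
The verify-then-repair step could look roughly like the sketch below. It assumes the par2cmdline tool is installed as `par2`, that its `verify` and `repair` subcommands take the path of a `.par2` file, and that an exit status of 0 means success. The helper names `par2_command` and `self_heal` are hypothetical, and nothing here reflects how git-annex itself would implement this.

```python
import subprocess


def par2_command(action, par2_file):
    """Build a par2cmdline invocation; action is 'verify' or 'repair'."""
    if action not in ("verify", "repair"):
        raise ValueError(f"unsupported par2 action: {action}")
    return ["par2", action, str(par2_file)]


def self_heal(par2_file, run=subprocess.run):
    """Verify the protected files and attempt a repair if verification fails.

    Returns 'ok', 'repaired', or 'unrecoverable'. The `run` parameter exists
    so the external call can be replaced in tests (an assumption of this
    sketch, not a par2 feature).
    """
    if run(par2_command("verify", par2_file)).returncode == 0:
        return "ok"
    if run(par2_command("repair", par2_file)).returncode == 0:
        return "repaired"
    return "unrecoverable"
```

A caller might invoke `self_heal` on the parity set for `contract.pdf` during a routine data check, and regenerate the parity files whenever the result is `'repaired'`.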