110 lines
4.7 KiB
Markdown
110 lines
4.7 KiB
Markdown
Finish "git annex watch" command, which runs, in the background, watching via
|
|
inotify for changes, and automatically annexing new files, etc.
|
|
|
|
There is a `watch` branch in git that adds such a command. To make this
|
|
really useful, it needs to:
|
|
|
|
- on startup, add any files that have appeared since last run **done**
|
|
- on startup, fix the symlinks for any renamed links **done**
|
|
- on startup, stage any files that have been deleted since last run
|
|
(seems to require a `git commit -a` on startup, or at least a
|
|
`git add --update`, which will notice deleted files) **done**
|
|
- notice new files, and git annex add **done**
|
|
- notice renamed files, auto-fix the symlink, and stage the new file location
|
|
**done**
|
|
- handle cases where directories are moved outside the repo, and stop
|
|
watching them **done**
|
|
- when a whole directory is deleted or moved, stage removal of its
|
|
contents from the index **done**
|
|
- notice deleted files and stage the deletion
|
|
(tricky; there's a race with add since it replaces the file with a symlink..)
|
|
**done**
|
|
- Gracefully handle when the default limit of 8192 inotified directories
|
|
is exceeded. This can be tuned by root, so help the user fix it.
|
|
**done**
|
|
- periodically auto-commit staged changes (avoid autocommitting when
|
|
lots of changes are coming in)
|
|
- tunable delays before adding new files, etc
|
|
- coleasce related add/rm events for speed and less disk IO
|
|
- don't annex `.gitignore` and `.gitattributes` files **done**
|
|
- configurable option to only annex files meeting certian size or
|
|
filename criteria
|
|
- option to check files not meeting annex criteria into git directly
|
|
- honor .gitignore, not adding files it excludes (difficult, probably
|
|
needs my own .gitignore parser to avoid excessive running of git commands
|
|
to check for ignored files)
|
|
- Possibly, when a directory is moved out of the annex location,
|
|
unannex its contents.
|
|
- Support OSes other than Linux; it only uses inotify currently.
|
|
OSX and FreeBSD use the same mechanism, and there is a Haskell interface
|
|
for it,
|
|
|
|
## the races
|
|
|
|
Many races need to be dealt with by this code. Here are some of them.
|
|
|
|
* File is added and then removed before the add event starts.
|
|
|
|
Not a problem; The add event does nothing since the file is not present.
|
|
|
|
* File is added and then removed before the add event has finished
|
|
processing it.
|
|
|
|
**Minor problem**; When the add's processing of the file (checksum and so
|
|
on) fails due to it going away, there is an ugly error message, but
|
|
things are otherwise ok.
|
|
|
|
* File is added and then replaced with another file before the annex add
|
|
moves its content into the annex.
|
|
|
|
Fixed this problem; Now it hard links the file to a temp directory and
|
|
operates on the hard link, which is also made unwritable.
|
|
|
|
* A process has a file open for write, another one closes it, and so it's
|
|
added. Then the first process modifies it.
|
|
|
|
**Currently unfixed**; This changes content in the annex, and fsck will
|
|
later catch the inconsistency.
|
|
|
|
Possible fixes:
|
|
|
|
* Somehow track or detect if a file is open for write by any processes.
|
|
* Or, when possible, making a copy on write copy before adding the file
|
|
would avoid this.
|
|
* Or, as a last resort, make an expensive copy of the file and add that.
|
|
* Tracking file opens and closes with inotify could tell if any other
|
|
processes have the file open. But there are problems.. It doesn't
|
|
seem to differentiate between files opened for read and for write.
|
|
And there would still be a race after the last close and before it's
|
|
injected into the annex, where it could be opened for write again.
|
|
Would need to detect that and undo the annex injection or something.
|
|
|
|
* File is added and then replaced with another file before the annex add
|
|
makes its symlink.
|
|
|
|
**Minor problem**; The annex add will fail creating its symlink since
|
|
the file exists. There is an ugly error message, but the second add
|
|
event will add the new file.
|
|
|
|
* File is added and then replaced with another file before the annex add
|
|
stages the symlink in git.
|
|
|
|
Now fixed; `git annex watch` avoids running `git add` because of this
|
|
race. Instead, it stages symlinks directly into the index, without
|
|
looking at what's currently on disk.
|
|
|
|
* Link is moved, fixed link is written by fix event, but then that is
|
|
removed by the user and replaced with a file before the event finishes.
|
|
|
|
Now fixed; same fix as previous race above.
|
|
|
|
* File is removed and then re-added before the removal event starts.
|
|
|
|
Not a problem; The removal event does nothing since the file exists,
|
|
and the add event replaces it in git with the new one.
|
|
|
|
* File is removed and then re-added before the removal event finishes.
|
|
|
|
Not a problem; The removal event removes the old file from the index, and
|
|
the add event adds the new one.
|
|
|