Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.
Benchmarking vs master, this git-annex find is significantly faster!
Specifically:
num files old new speedup
48500 4.77 3.73 28%
12500 1.36 1.02 66%
20 0.075 0.074 0% (so startup time is unchanged)
That's without really finishing the optimization. Things still to do:
* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.
It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Somehow forgot about the case where the current export db tree is the
same one in the export log, and it warned about an export conflict when
getting a file in that situation. Of course it's no conflict at all!
This commit was sponsored by Jochen Bartl on Patreon.
Like the earlier fixed one in Command.Export, it occurred when the same
tree was exported by multiple clones. Previous fix was incomplete since
several other places looked at the list of exported trees to detect when
there was an export conflict. Added a single unified function to avoid
missing any places it needed to be fixed.
This commit was sponsored by mo on Patreon.
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.
This commit was sponsored by Trenton Cronholm on Patreon.
I don't know the circumstances, but have a report of this:
git-annex: failed to commit changes to sqlite database: Just SQLite3 returned
ErrorConstraint while attempting to perform step.
All 3 tables in the export db have uniqueness constraints on them,
insertUnique is used for all the rest, but this use of insertMany
means it doesn't check the constraint. I guess that's what caused the
crash, but I have not been able to test it yet.
Use putMany when available, as it should be faster than mapM of insertMany.
This commit was sponsored by Brock Spratlen on Patreon.
Now when one repository has exported a tree, another repository can get
files from the export, after syncing.
There's a bug: While the database update works, somehow the database on
disk does not get updated, and so the database update is run the next
time, etc. Wasn't able to figure out why yet.
This commit was sponsored by Ole-Morten Duesund on Patreon.
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.
Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.
And some code refactoring.
This commit was sponsored by Francois Marier on Patreon.
The subtle part of this is what happens when the remote fails to remove
an empty directory. The removal from the export needs to fail in that
case, so the removal will be tried again later. However, removeExportLocation
has already been run and changed the export db, so if the next run
checks getExportLocation, it might decide nothing remains to be done,
leaving the empty directory.
Dealt with that by making removeEmptyDirectories, handle a failure
by calling addExportLocation, reverting the database changes so the next
run will be guaranteed to try deleting the empty directory again.
This commit was sponsored by Thomas Hochstein on Patreon.
The export database has writes made to it and then expects to read back
the same data immediately. But, the way that Database.Handle does
writes, in order to support multiple writers, makes that not work, due
to caching issues. This resulted in export re-uploading files it had
already successfully renamed into place.
Fixed by allowing databases to be opened in MultiWriter or SingleWriter
mode. The export database only needs to support a single writer; it does
not make sense for multiple exports to run at the same time to the same
special remote.
All other databases still use MultiWriter mode. And by inspection,
nothing else in git-annex seems to be relying on being able to
immediately query for changes that were just written to the database.
This commit was supported by the NSF-funded DataLad project.
Removed uncorrect UniqueKey key in db schema; a key can appear multiple
times with different files.
The database has to be flushed after each removal. But when adding files
to the export, lots of changes are able to be queued up w/o flushing.
So it's still fairly efficient.
If large removals of files from exports are too slow, an alternative
would be to make two passes over the diff, one pass queueing deletions
from the database, then a flush and the a second pass updating the
location log. But that would use more memory, and need to look up
exportKey twice per removed file, so I've avoided such optimisation yet.
This commit was supported by the NSF-funded DataLad project.
Went with a separate db per export remote, rather than a single export
database. Mostly because there will probably not be a lot of separate
export remotes, and it might be convenient to be able to delete a given
remote's export database.
This commit was supported by the NSF-funded DataLad project.