remove old closed bugs and todo items to speed up wiki updates and reduce size

Remove closed bugs and todos that were last edited before 2014.

Command line used:

    for f in $(grep -l '\[\[done\]\]' *.mdwn); do if [ -z $(git log --since=2014 --pretty=oneline "$f") ]; then git rm  $f; git rm -rf $(echo "$f" | sed 's/.mdwn$//'); fi; done
commit 222f78e9ea (parent e157467f92)
Joey Hess, 2014-05-29 15:23:05 -04:00
1970 changed files with 0 additions and 56952 deletions


@ -1,8 +0,0 @@
It seems that currently, syncing will result in every branch winding
up everywhere within the network of git annex nodes. It would be great
if one could keep some branches purely local.
The «fetch» part of «sync» seems to respect the fetch refspec in the
git config, but the push part seems to always push everything.
> [[done]]


@ -1,18 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="108.236.230.124"
subject="comment 1"
date="2014-05-15T19:51:48Z"
content="""
No, it does not:
<pre>
push wren
[2014-05-15 15:50:33 JEST] call: git [\"--git-dir=/home/joey/lib/big/.git\",\"--work-tree=/home/joey/lib/big\",\"push\",\"wren\",\"+git-annex:synced/git-annex\",\"master:synced/master\"]
[2014-05-15 15:50:39 JEST] read: git [\"--git-dir=/home/joey/lib/big/.git\",\"--work-tree=/home/joey/lib/big\",\"push\",\"wren\",\"master\"]
</pre>
That is the entirety of what's pushed: the git-annex branch, and the currently checked out branch.
I don't see a bug here.
"""]]


@ -1,18 +0,0 @@
[[!comment format=mdwn
username="zardoz"
ip="92.227.51.179"
subject="comment 2"
date="2014-05-16T08:40:47Z"
content="""
Joey, thanks for clearing that up. In my test-case I only had two
branches, and I mistook it for pushing everything. Actually, what I
wanted to achieve was the following:
Have a main repo M with branches A and A-downstream, and have a
downstream repo D with just A-downstream. What confused me was that
the main repo always pushed A to D. I suppose if I just have the two
branches, I would achieve the desired effect by not using «annex
sync», and instead just pushing the git-annex branch manually; would
that be the way to go?
"""]]
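A plain-git sketch of the manual route (repo and branch names below are illustrative): configuring an explicit push refspec for a remote limits what a bare `git push` sends, so only the downstream branch ever reaches D.

```shell
set -e
d=$(mktemp -d)
git init -q "$d/M"                      # main repo
git init -q --bare "$d/D.git"           # downstream repo
cd "$d/M"
git config user.email you@example.com
git config user.name You
echo content > file && git add file && git commit -q -m init
git branch A-downstream                 # the only branch D should see
git remote add D "$d/D.git"
# Restrict what `git push D` sends to just A-downstream:
git config remote.D.push refs/heads/A-downstream:refs/heads/A-downstream
git push -q D
git --git-dir="$d/D.git" branch         # only A-downstream arrived
```

With `remote.D.push` set, an unqualified push honours the refspec and the other branches stay local; a `git annex sync` in the same repo would still use its own refspecs, so this only covers the manual-push workflow asked about.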


@ -1,4 +0,0 @@
It would be wonderful if a pre-built package would be available for Synology NAS. Basically, this is an ARM-based Linux. It has most of the required shell commands either out of the box or easily available (through ipkg). But I think it would be difficult to install the Haskell compiler and all the required modules, so it would probably be better to cross-compile targeting ARM.
> [[done]]; the standalone armel tarball has now been tested working on
> Synology. --[[Joey]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 10"
date="2013-06-02T17:23:43Z"
content="""
I updated the C program to simplify it so it uses a static path for `_chrooter`. In the previous version, I suspect that one can play with symlinks and use it to get a root shell. So, if `_chrooter` is not installed in `/opt/bin` this file has to be edited too before compilation.
"""]]


@ -1,9 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 11"
date="2013-06-03T09:55:54Z"
content="""
One last update, and then I'll stop spamming this thread: I've implemented access control and simplified customisation. All this has been moved to https://bitbucket.org/franckp/gasp
"""]]


@ -1,9 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlJEI45rGczFAnuM7gRSj4C6s9AS9yPZDc"
nickname="Kevin"
subject="SynoCommunity"
date="2013-06-26T18:12:39Z"
content="""
Creating an installable git-annex package available via [SynoCommunity](http://www.synocommunity.com/) would be awesome. They have created [cross-compilation tools](https://github.com/SynoCommunity/spksrc) to help build the packages and integrate the start/stop scripts with the package manager.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnrP-0DGtHDJbWSXeiyk0swNkK1aejoN3c"
nickname="sebastien"
subject="comment 13"
date="2013-08-06T12:18:35Z"
content="""
I posted an issue to the SynoCommunity GitHub for that; I hope someone has time to package this great feature.
"""]]


@ -1,30 +0,0 @@
[[!comment format=mdwn
username="lorenzo"
ip="84.75.27.69"
subject="Running Debian squeeze binaries on libc 2.5 based NAS"
date="2013-10-27T23:56:26Z"
content="""
Following the suggestions on this page, I tried to run the binaries that Debian provides on my LaCie NetworkSpace, which is another one of these NAS devices with an old libc. After uploading the binaries and required libraries and using `LD_LIBRARY_PATH` to force the loader to use the versions I uploaded, I was still getting a segfault (similar to what Franck was experiencing), while running git-annex in a chroot worked.
It turns out that it is possible to solve the problem without a chroot, by not loading the binary directly but substituting it with a script that calls the correct `ld-linux.so.3`. Assume you have uncompressed the files from the deb packages in `/opt/git-annex`.
First create a directory `/opt/git-annex/usr/bin/git-annex.exec` and copy the executable `/opt/git-annex/usr/bin/git-annex` there.
Then create a script `/opt/git-annex/usr/bin/git-annex` with the following contents:

    #!/bin/bash
    PREFIX=/opt/git-annex
    export GCONV_PATH=$PREFIX/usr/lib/gconv
    exec $PREFIX/lib/ld-linux.so.3 --library-path $PREFIX/lib/:$PREFIX/usr/lib/ $PREFIX/usr/bin/git-annex.exec/git-annex \"$@\"

The `GCONV_PATH` setting is important to prevent the app from failing with the message:

    git-annex.exec: mkTextEncoding: invalid argument (Invalid argument)
The original executable is moved to a different directory instead of being simply renamed to make sure that `$0` is correct when the executable starts. The parameter for the linker `--library-path` is used instead of the environment variable `LD_LIBRARY_PATH` to make sure that the programs exec'ed by git-annex do not have the variable set.
Some more info about the approach: [[http://www.novell.com/coolsolutions/feature/11775.html]]
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.87"
subject="comment 15"
date="2013-12-16T05:55:29Z"
content="""
Following the example of @lorenzo, I have made all the git-annex Linux standalone builds include glibc and shims to make the linker use it.
Now that there's a [[forum/new_linux_arm_tarball_build]], it may *just work* on Synology.
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 1"
date="2013-05-24T15:55:42Z"
content="""
There are already git-annex builds for arm available from e.g. Debian. There's a good chance that, assuming you match up the arm variant (armel, armhf, etc) and that the NAS uses glibc and does not have too old a version, the binary could just be copied in, possibly with some other libraries, and work. This is what's done for the existing Linux standalone builds.
So, I look at this bug report as \"please add a standalone build for arm\", not as a request to support a specific NAS which I don't have ;)
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 2"
date="2013-05-24T21:31:44Z"
content="""
I tried to run the binary from the Debian package; unfortunately, after installing tons of libraries, git-annex fails, complaining that GLIBC is not recent enough. Perhaps a static build for ARM (armel) could solve the problem? Thanks again for your help!
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
nickname="joey"
subject="comment 3"
date="2013-05-25T04:42:22Z"
content="""
Which Debian package? Different ones link to different libcs.
(It's not really possible to statically link something with as many dependencies as git-annex on linux anymore, unfortunately.)
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 4"
date="2013-05-25T07:40:13Z"
content="""
I've actually tried several: 4.20130521 on sid, 3.20120629~bpo60+2 on squeeze-backports, 3.20120629 on wheezy and jessie, plus a package for Ubuntu 11.02. All of them try to load GLIBC 2.6/2.7 while my system only has 2.5... I'll try a different approach: install Debian in a chroot on the NAS and extract all the required files, including all libraries.
"""]]


@ -1,23 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 5"
date="2013-05-25T10:03:24Z"
content="""
Unfortunately, the chroot approach does not work either. While git-annex works fine when I'm inside the chroot, it doesn't work outside of it. If I don't copy libc, I get a version error (just like before, so this is expected):

    git-annex: /lib/libc.so.6: version `GLIBC_2.7' not found (required by /opt/share/git-annex/bin/git-annex)
    git-annex: /lib/libc.so.6: version `GLIBC_2.6' not found (required by /opt/share/git-annex/bin/git-annex)
    git-annex: /lib/libc.so.6: version `GLIBC_2.7' not found (required by /opt/share/git-annex/lib/libgmp.so.10)

When I copy libc from the Debian chroot, it then complains about libpthread:

    git-annex: relocation error: /lib/libpthread.so.0: symbol __default_rt_sa_restorer, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

If I then copy libpthread as well, I get:

    Illegal instruction (core dumped)

So, I'm stuck... :-(
I'll try to find a way to use the version in the chroot instead of trying to export it to the host system...
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawln3ckqKx0x_xDZMYwa9Q1bn4I06oWjkog"
nickname="Michael"
subject="bind mount"
date="2013-05-25T15:55:52Z"
content="""
You could bind-mount (e.g. `mount -o bind /data /chroot/data`) your main Synology filesystem into the chroot for git-annex to use.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 7"
date="2013-05-25T19:01:29Z"
content="""
This is indeed what I'm doing. But I need to make a wrapper that will call the command in the chroot. Thanks for the tip anyway. :-)
"""]]


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmqz6wCn-Q1vzrsHGvEJHOt_T5ZESilxhc"
nickname="Sören"
subject="comment 8"
date="2013-05-26T13:50:31Z"
content="""
I have a Synology NAS too, so I thought I could try to run git-annex in a Debian chroot.
As it [turns out](http://forum.synology.com/wiki/index.php/What_kind_of_CPU_does_my_NAS_have), my model (DS213+) runs on a PowerPC CPU instead of ARM. Unfortunately, it isn't compatible with PPC in Debian either because it is a different PowerPC variant.
There is an unofficial Debian port called [powerpcspe](http://wiki.debian.org/PowerPCSPEPort), but ghc doesn't build there yet for [some reason](http://buildd.debian-ports.org/status/package.php?p=git-annex&suite=sid).
Any chance that there will be a build for this architecture at some point in the future, or should I look for another NAS instead? ;-)
"""]]


@ -1,29 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkwjBDXkP9HAQKhjTgThGOxUa1B99y_WRA"
nickname="Franck"
subject="comment 9"
date="2013-06-02T13:14:56Z"
content="""
Hi, I finally succeeded! :-)
Here are the main steps:
1. install `debian-chroot` on the NAS
2. create an account `gitannex` in Debian
3. configure git on this account (this is important; otherwise git complains and fails): `git config --global user.email YOUR_EMAIL` and `git config --global user.name YOUR_NAME`
4. install `gcc` on the NAS (using `ipkg`)
5. download the files here: https://www.dropbox.com/sh/b7z68a730aj3mnm/95nFOzE1QP
6. edit `_chrooter` to fit your settings (probably there is nothing to change if your Debian is freshly installed)
7. run `make install`, everything goes to `/opt/bin`, if you change this, you should also edit line 17 in file `gasp`
8. create an account `gitannex` on the NAS (doesn't need to be the same name as in Debian, but I feel it is easier)
9. edit its `.ssh/authorized_keys` to prefix lines as follows `command=\"gasp\" THE_PUBLIC_KEY_AS_USUAL`
10. it should work
11. the repositories will be in the Debian account, but it's easy to symlink them in the NAS account if you wish
The principle is as follows: `command=\"gasp\"` causes `gasp` to be launched on SSH connection instead of the original command given to `ssh`. This command is retrieved by `gasp` and prefixed with `chrooter-` (so, e.g., running `ssh git` on the client results in running `chrooter-git` on the NAS). The `chrooter-*` commands are symlinks to `chrooter`, a setuid-root binary that launches `_chrooter`. (This intermediary binary is necessary because `_chrooter` is a script, which cannot be setuid, and setuid is required for the chroot and identity change.) Finally, `_chrooter` starts the `debian-chroot` service, chroots into the target dir, changes identity, and eventually launches the original command as if it had been launched directly by the `gitannex` user in Debian. `_chrooter` and `gasp` are Python scripts; I did not use shell in order to avoid error-prone issues with spaces in arguments (which need to be passed around several times in the process).
I'll try now to add command-line parameters to `gasp` in order to restrict the commands that can be run through SSH and the repositories allowed.
Cheers,
Franck
"""]]
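The forced-command dispatch can be sketched in a few lines of shell (the real `gasp` is a Python script; the names here are only illustrative). sshd exposes the client's original command in `$SSH_ORIGINAL_COMMAND`, and the wrapper remaps it to a `chrooter-`-prefixed binary:

```shell
# Sketch only: the real gasp also handles quoting and access control.
gasp_dispatch() {
    # Split the client's command into words (rough, for demonstration):
    set -- $SSH_ORIGINAL_COMMAND
    cmd=$1; shift
    # The real wrapper would `exec "chrooter-$cmd" "$@"`; we just show it:
    echo "chrooter-$cmd $*"
}

SSH_ORIGINAL_COMMAND='git upload-pack myrepo' gasp_dispatch
# → chrooter-git upload-pack myrepo
```

An `authorized_keys` line of the form `command="gasp" ssh-rsa AAAA…` forces every connection with that key through the wrapper, which is what makes the restriction enforceable.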


@ -1,5 +0,0 @@
Especially on Mac OS X (and Windows, and maybe Android), it would be great to be able to check in the webapp whether an upgrade is available. Deeper integration with these OSes would be even better: for example on Mac OS X, a status-bar icon lists available upgrades for some programs, including LibreOffice and others which are not installed by default.
Also, it would be great to be able to download and install git-annex upgrades directly from the webapp.
> comprehensively [[done]]; [[design/assistant/upgrading]] --[[Joey]]


@ -1,17 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.246"
subject="comment 1"
date="2013-11-15T20:51:18Z"
content="""
I have thought about doing this, especially if there is ever a security hole in git-annex.
All it needs is a file containing the version number to be written alongside the git-annex build, and git-annex knowing whether it was built as a standalone build, so it knows to check that file.
As for actually performing the upgrade:
* Easy on Linux
* Not sure on OSX.. Is it possible to use hdiutil attach to replace a dmg while a program contained in it is currently running?
* Probably impossible on Android, at least not without using double the space. Probably better to get git-annex into an app store.
* Doable on Windows, but would need git-annex to be distributed in a form that is not an installer.exe.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlzlNQbf6wBgv9j6-UqfpXcQyAYMF8S3t4"
nickname="Tim"
subject="comment 2"
date="2014-01-12T09:19:31Z"
content="""
I am pretty sure you know about it, but have you seen https://f-droid.org/? I was rather surprised that git-annex isn't yet listed in that \"store\".
"""]]


@ -1,19 +0,0 @@
### Please describe the problem.
Great work on git annex! One possible enhancement occurred to me: it would be very useful if the "whereis" command supported looking up the location of files by arbitrary keys. This way one could inspect the location of old content which is not currently checked out in the tree.
In a related vein, the "unused" command could report old filenames or describe the associated commits. Tracking old versions is a great feature of your git-based approach, but currently tasks such as pruning selected content seem unwieldy (though I might be missing existing solutions). You can easily "cut off" the history by forcing a drop of all unused content. It would be cool if one could somehow "address" old versions by filename and commit/date and selectively drop just these. The same goes for the "whereis" command, where one could e.g. query which remote holds content that was stored under some filename at some specific date.
Thanks, cheers!
> I agree that it's useful to run whereis on a specific key. This can
> now be done using `git annex whereis --key KEY`
> [[done]] --[[Joey]]
>
> To report old filenames, unused would have to search back through the
> contents of symlinks in old versions of the repo, to find symlinks that
> referred to a key. The best way I know how to do that is `git log -S$KEY`,
> which is what unused suggests you use. But this is slow --
> searching for a single key in one of my repos takes 25 seconds.
> That's why it doesn't do it for you.
>


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="zardoz"
ip="92.227.51.179"
subject="comment 1"
date="2014-05-13T20:34:33Z"
content="""
I suppose that makes sense. Would it be cheaper to just retrieve the most recent filename? That would seem to be enough for many practical purposes. But I guess this would still possibly have to go through many revisions. I wonder if such a restricted search can be done by git, though; maybe using plumbing commands.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="zardoz"
ip="134.147.14.84"
subject="comment 2"
date="2014-05-15T13:03:47Z"
content="""
Okay, I suppose one way of doing a search like that would be to do a «git log --stat -S'KEY' $commit», starting with HEAD and then walking the parents.
"""]]
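That walk can be tried with plain git, no git-annex required, by faking an annex pointer as file content (the key and filenames below are made up). `-S` restricts the walk to commits that change the number of occurrences of the key, and `-n1` stops at the most recent one:

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q .
git config user.email you@example.com
git config user.name You
KEY='SHA256E-s9--0123abcd'                 # stand-in for an annex key
echo "annex/objects/$KEY" > song.mp3       # simulates the symlink target
git add song.mp3 && git commit -q -m 'add song'
git rm -q song.mp3 && git commit -q -m 'drop song'
# Most recent commit touching the key, plus the filename(s) involved:
git log -S"$KEY" -n1 --name-only --format=%s
```

This prints the newest matching commit ("drop song" here) and the filename the key was last stored under, without walking the rest of history.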


@ -1,3 +0,0 @@
One problem I am having is that I could never get the XMPP pairing to work, so whenever I switch machines I have to manually run sync once on the command line to get the changes. Would it be possible to have a "sync now" button of some sort that triggers a sync on the repos?
> moved from forum; [[done]] --[[Joey]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnR6E5iUghMWdUGlbA9CCs8DKaoigMjJXw"
nickname="Efraim"
subject="comment 1"
date="2014-03-06T20:37:36Z"
content="""
Not quite a sync button, but when I want to force a sync now, I turn sync off and back on for one of the repos from the webapp, and then it syncs.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.146"
subject="comment 2"
date="2014-03-06T22:12:27Z"
content="""
I've added a \"Sync now\" item to the menu for each remote. It can be used to sync with an individual remote, or, if picked from the menu for the local repository, it causes a sync with every one of its remotes at once.
"""]]


@ -1,117 +0,0 @@
Hi, I am planning to use git-annex-assistant for two use cases, but I would like to ask about the options or planned roadmap for dropped/removed files from the repository.
Use cases:
1. sync a working directory between laptop, home computer, and work computer
2. archive functionality for my photographs
Both use cases have one common factor: some files become obsolete, and
in the long run nobody is interested in keeping their revisions. Take
photographs. The usual workflow I follow is to import all photographs to the
filesystem, then assess (select) the good ones I want to keep, and then
process them in whatever way.
The problem I have with git-annex(-assistant) is that it starts to revision
all of the files the moment they are added to the directory. This is welcome
at first, but it becomes an issue if you are used to putting 80% of the size
of your imported files in the trash.
I am aware of what git-annex is not. I have been reading the documentation for
the "git-annex drop" and "unused" options, including the forums. I do understand
that I can delete all revisions of a file if I drop it, remove it, and
then run git annex unused 1..### (on all synced
repositories).
What I miss is the option to have the above process automated/replicated to the other synced repositories.
I would formulate the 'use case' requirements for git-annex as:
* a command to drop a file, including its revisions, from all annex repositories
  (for example, moving a file to a /trash folder that schedules
  its deletion)
* an option to keep at most e.g. the last 10 revisions of a file
* an option to keep previous revisions only if they are younger than 6 months
Finally, how do I file a feature request for git-annex?
> By moving it here ;-) --[[Joey]]
> So, let's spec out a design.
>
> * Add preferred content terminal to configure whether a repository wants
> to hang on to unused content. Simply `unused`.
> (It cannot include a timestamp, because there's
> no way repos can agree on when a key became unused.) **done**
> * In order to quickly match that terminal, the Annex monad will need
> to keep a Set of unused Keys. This should only be loaded on demand.
> **done**
> NB: There is some potential for a great many unused Keys to cause
> memory usage to balloon.
> * Client repositories will end their preferred content with
> `and (not unused)`. Transfer repositories too, because typically
> only client repos connect to them, and so otherwise unused files
> would build up there. Backup repos would want unused files. I
> think that archive repos would too. **done**
> * Make the assistant check for unused files periodically. Exactly
> how often may need to be tuned, but once per day seems reasonable
> for most repos. Note that the assistant could also notice on the
> fly when files are removed and mark their keys as unused if that was
> the last associated file. (Only currently possible in direct mode.)
> **done**
> * After scanning for unused files, it makes sense for the
> assistant to queue transfers of unused files to any remotes that
> do want them (eg, backup remotes). If the files can successfully be
> sent to a remote, that will lead to them being dropped locally as
> they're not wanted.
> * Add a git config setting like annex.expireunused=7d. This causes
> *deletion* of unused files after the specified time period if they are
> not able to be moved to a repo that wants them.
> (The default should be annex.expireunused=false.)
> * How to detect how long a file has been unused? We can't look at the
> time stamp of the object; we could use the mtime of the .map file,
> but that's direct mode only and may be replaced with a database
> later. Seems best to just keep a unused log file with timestamps.
> **done**
> * After the assistant scans for unused files, if annex.expireunused
> is not set, and there is some significant quantity of unused files
> (eg, more than 1000, or more than 1 gb, or more than the amount of
> remaining free disk space),
> it can pop up a webapp alert asking to configure it. **done**
> * Webapp interface to configure annex.expireunused. Reasonable values
> are no expiring, or any number of days. **done**
>
> [[done]] This does not cover every use case that was requested.
> But I don't see a cheap way to ensure it keeps eg the past 10 versions of
> a file. I guess that if you care about that, you leave
> annex.expireunused=false, and set up a backup repository where the unused
> files will be moved to.
>
> Note that since the assistant uses direct mode by default, old versions
> of modified files are not guaranteed to be retained. But they very well
> might be. For example, if a file is replicated to 2 clients, and one
> client directly edits it, or deletes it, it loses the old version,
> but the other client will still be storing that old version.
>
> ## Stability analysis for unused in preferred content expressions
>
> This is tricky, because two repos that are otherwise entirely
> in sync may have differing opinions about whether a key is unused,
> depending on when each last scanned for unused keys.
>
> So, this preferred content terminal is *not stable*.
> It may be possible to write preferred content expressions
> that constantly moved such keys around without reaching a steady state.
>
> Example:
>
> A and B are clients directly connected, and both also connected
> to BACKUP.
>
> A deletes F. B syncs with A, and runs unused check; decides F
> is unused. B sends F to BACKUP. B will then think A doesn't want F,
> and will drop F from A. Next time A runs a full transfer scan, it will
> *not* find F (because the file was deleted!). So it won't get F back from
> BACKUP.
>
> So, it looks like the fact that unused files are not going to be
> looked for on the full transfer scan seems to make this work out ok.
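> A rough command-line sketch of wiring this up (repo names are
> illustrative, and `standard` stands for the group's usual expression):
>
>     git annex wanted . "standard and (not unused)"
>     git annex group mybackup backup
>     git annex wanted mybackup standard
>
> A backup repo's standard expression wants everything, unused content
> included, so unused files migrate there and can then be dropped locally.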


@ -1,7 +0,0 @@
Firefox is my default browser, but as we all know, it doesn't load quickly. If I don't have Firefox running but I want to access the git-annex webapp, I'd rather launch the webapp in some small, quick browser like QupZilla than wait for Firefox to load.
Could git-annex have a setting, maybe a "webapp --browser" option and/or a setting in the config file, to specify the browser to launch?
> git-annex uses the standard `git config web.browser` if you set it.
> [[done]]
> --[[Joey]]


@ -1,7 +0,0 @@
A failure during "make test" should be signalled to the caller by means of
a non-zero exit code. Without that signal, it's very hard to run the
regression test suite in an automated fashion.
> git-annex used to have a Makefile that ignored make test exit status,
> but that was fixed in commit dab5bddc64ab4ad479a1104748c15d194e138847,
> on October 6th. [[done]] --[[Joey]]


@ -1,9 +0,0 @@
Git-annex doesn't compile with the latest version of monad-control. Would it be hard to support that new version?
> I have been waiting for it to land in Debian before trying to
> deal with its changes.
>
> There is now a branch in git called `new-monad-control` that will build
> with the new monad-control. --[[Joey]]
>> Now merged to master. [[done]] --[[Joey]]


@ -1,16 +0,0 @@
Please provide a command that basically performs something like:

    git annex get --auto
    for i in `git remote`; do git annex copy --to "$i" --auto; done

The use case is this:
I have a very large repo (300,000 files) in three places. Now I want the fastest possible way to ensure that every file exists in annex.numcopies copies. This should scan every file one time and then get it or copy it to other repos as needed. Right now, I run "git annex get --auto" in every repo, which is a waste of time, since most of the files never change anyway!
> Now `git annex sync --content` does effectively just what the for
> loop shown above does. [[done]]
>
> The only difference is that copy --auto proactively downloads otherwise
> unwanted files to satisfy numcopies, and sync --content does not.
> We need a [[preferred_content_numcopies_check]] to solve that.
> --[[Joey]]


@ -1,24 +0,0 @@
Support Amazon S3 as a file storage backend.
There's a haskell library that looks good. Not yet in Debian.
Multiple ways of using S3 are possible. Currently implemented as
a special type of git remote.
Before this can be closed, I need to fix:
## encryption
TODO
## unused checking
One problem is `git annex unused`. Currently it only looks at the local
repository, not remotes. But if something is dropped from the local repo,
and you forget to drop it from S3, cruft can build up there.
This could be fixed by adding a hook to list all keys present in a remote.
Then unused could scan remotes for keys, and if they were not used locally,
offer the possibility to drop them from the remote.
[[done]]


@ -1,10 +0,0 @@
It's very confusing to me that the same repo viewed from different client systems can have different names and descriptions. This implies that making changes to a remote repo from one system only affects how that system sees the repo, but it seems to affect how the entire git-annex "pair" or "network of repos" sees it.
I think it would be good if the names and descriptions of repos were synced across clients.
> The descriptions of repositories are synced. (They're stored in git-annex:uuid.log)
>
> git allows for the same repository to be referred to using as many different remote names as you want to set up. git-annex inherits this,
> and I can't see this changing; there are very good reasons for remotes to
> have this flexibility. [[done]]
> --[[Joey]]


@ -1,18 +0,0 @@
This is just an idea, and I have no idea if it would work (that's why I'm asking):
**Would it be possible to use ASICs made for Bitcoin mining inside git-annex to offload the hashing of files?**
I got the idea, because I have two RaspberryPis here:
- one runs my git-annex archive. It is really slow at hashing, so I resorted to using the WORM backend
- another one runs 2 old-ish ASIC miners. They are just barely "profitable" right now, so in a few months they will be obsolete
Both devices do some kind of `SHA256`. I have a feeling this is either extremely easy or extremely complicated to do… :)
> git-annex uses binaries such as `sha256sum` for hashing large files (large is
> currently hardcoded as bigger than 1MB). If you insert a binary with the same
> interface as `sha256sum` into your `$PATH`, git-annex will automatically use
> it. If you want to use ASIC hashing even for small files, you need to tweak
> `Backend/Hash.hs`. --[[HelmutGrohne]]
>> [[done]] --[[Joey]]
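A minimal version of the `$PATH` shim Helmut describes (the ASIC front-end is hypothetical, so this wrapper just falls through to the real binary; the point is that anything calling `sha256sum` picks the wrapper up):

```shell
set -e
real=$(command -v sha256sum)            # remember the coreutils binary
d=$(mktemp -d)
cat > "$d/sha256sum" <<EOF
#!/bin/sh
# Hypothetical fast path: exec an ASIC front-end here if one existed,
# e.g.: exec asic-sha256 "\$@"
exec "$real" "\$@"
EOF
chmod +x "$d/sha256sum"
PATH="$d:$PATH"                         # shim now shadows the real binary
printf hello | sha256sum                # runs via the wrapper
```

The wrapper must accept the same arguments and produce the same output format as coreutils `sha256sum`, since that is the interface git-annex expects.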


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.172"
subject="comment 1"
date="2014-02-20T17:42:10Z"
content="""
I feel that Helmut has the right approach to this general type of thing.
I doubt that bitcoin ASICs feature a fast data transfer bus, because bitcoin is a pretty low-data-volume protocol. Additionally AIUI, bitcoin ASICs get their speed by hashing in parallel, which allows them to try many variations of a block at once. So they probably rely on most of the data remaining the same and only a small amount changing. So it's doubtful this would be a win.
"""]]


@ -1,30 +0,0 @@
Hi,
it would be great if the importfeed command would be able to read feeds generated by youtube (like for playlists). The youtube playlist feed contains links to separate youtube video pages, which quvi handles just fine. Currently I use the following python script:
    #!/usr/bin/env python
    import feedparser
    import sys

    d = feedparser.parse('http://gdata.youtube.com/feeds/api/playlists/%s' % sys.argv[1])
    for entry in d.entries:
        print entry.link
and then
    kasimon@pc:~/annex/YouTube/debconf13$ youtube-playlist-urls PLz8ZG1e9MPlzefklz1Gv79icjywTXycR- | xargs git annex addurl --fast
    addurl Welcome_talk.webm ok
    addurl Bits_from_the_DPL.webm ok
    addurl Debian_Cosmology.webm ok
    addurl Bits_from_the_DPL.webm ok
    addurl Debian_Cosmology.webm ok
    addurl Debian_on_Google_Compute_Engine.webm ok
    ^C
to create a backup of youtube media I'd like to keep.
It would be great if this functionality could be integrated directly into git annex.
Best
Karsten
> [[done]] --[[Joey]]


@ -1,9 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.227"
subject="comment 1"
date="2013-12-29T18:21:32Z"
content="""
Ok, so importfeed looks for items in a feed with enclosures, but this feed is not a podcast feed. So it needs to look at some of the `<link>`s
to find pages that quvi supports. (There might be other links that are not video pages, for all I know.) Looks like `getItemLink` finds the right links, and then I just need to filter through quvi.
"""]]


@ -1,29 +0,0 @@
As per IRC

    22:13:10 < RichiH> joeyh: btw, i have been pondering a `git annex import --lazy` or some such which basically goes through a directory and deletes everything i find in the annex it run from
    22:50:39 < joeyh> not sure of the use case
    23:41:06 < RichiH> joeyh: the use case is "i have important a ton of data into my annexes. now, i am going through the usual crud of cp -ax'ed, rsync'ed, and other random 'new disk, move stuff around and just put a full dump over there' file dumps and would like to delete everything that's annexed already"
    23:41:33 < RichiH> joeyh: that would allow me to spend time on dealing with the files which are not yet annexed
    23:41:54 < RichiH> instead of verifying file after file which has been imported already
    23:43:19 < joeyh> have you tried just running git annex import in a subdirectory and then deleting the dups?
    23:45:34 < joeyh> or in a separate branch for that matter, which you could then merge in, etc
    23:54:08 < joeyh> Thinking anout it some more, it would need to scan the whole work tree to see what keys were there, and populate a lookup table. I prefer to avoid things that need git-annex to do such a large scan and use arbitrary amounts of memory.
    00:58:11 < RichiH> joeyh: that would force everything into the annex, though
    00:58:20 < RichiH> a plain import, that is
    00:58:53 < RichiH> in a usual data dump directory, there's tons of stuff i will never import
    00:59:00 < RichiH> i want to delete large portions of it
    00:59:32 < RichiH> but getting rid of duplicates first allows me to spend my time focused on stuff humans are good at: deciding
    00:59:53 < RichiH> whereas the computer can focus on stuff it's good at: mindless comparision of bits
    01:00:15 < RichiH> joeyh: as you're saying this is complex, maybe i need to rephrase
    01:01:40 < RichiH> what i envision is git annex import --foo to 1) decide what hashing algorithm should be used for this file 2) hash that file 3) look into the annex if that hash is annexed 3a) optionally verify numcopies within the annex 4) delete the file in the source directory
    01:01:47 < RichiH> and then move on to the next file
    01:02:00 < RichiH> if the hash does not exist in the annex, leave it alone
    01:02:50 < RichiH> if the hash exists in annex, but numcopies is not fulfilled, just import it as a normal import would
    01:03:50 < RichiH> that sounds quite easy, to me; in fact i will prolly script it if you decide not to implement it
    01:04:07 < RichiH> but i think it's useful for a _lot_ of people who migrate tons of data into annexes
    01:04:31 < RichiH> thus i would rather see this upstream and not hacked locally

The only failure mode I see in the above is "file has been dropped elsewhere, numcopies not fulfilled, but that info is not synched to the local repo, yet" -- This could be worked around by always importing the data.
> [[done]] as `git annex import --deduplicate`.
> --[[Joey]]
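A sketch of the deduplication check described above. This is an illustration only, not git-annex's implementation: it assumes the SHA256E backend's key layout (`SHA256E-s<size>--<sha256><extension>`) and takes the set of already-annexed keys from the caller.

```python
import hashlib
import os


def sha256e_key(path):
    # Approximate a git-annex SHA256E backend key:
    # SHA256E-s<size>--<sha256 of content><extension>
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    ext = os.path.splitext(path)[1]
    return "SHA256E-s%d--%s%s" % (os.path.getsize(path), h.hexdigest(), ext)


def lazy_import(paths, annexed_keys):
    # Delete files whose content is already annexed; leave everything
    # else in place for a human to decide about, per the proposal above.
    removed = []
    for path in paths:
        if sha256e_key(path) in annexed_keys:
            os.remove(path)
            removed.append(path)
    return removed
```

A numcopies check would slot in where the key lookup happens, at the cost of consulting location logs per key.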

View file

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
nickname="Richard"
subject="comment 1"
date="2013-08-06T14:22:03Z"
content="""
To expand a bit on the use case:
I have several migration directories which I simply moved to new systems or disks with the help of `cp -ax` or `rsync`.
As I don't _need_ the data per se and merely want to hold on to it in case I ever happen to need it again and as disk space is laughably cheap, I have a lot of duplicates.
While I can at least detect bit flips with the help of checksum lists, cleaning those duplicates of duplicated duplicates is quite some effort.
To make things worse, photos, music, videos, letters and whatnot are thrown into the same container directories.
All in all, getting data out of those data dumps and into a clean structure is quite an effort.
`git annex import --lazy` would help with this effort as I could start with the first directory, sort stuff by hand, and annex it.
As soon as data lives in any of my annexes, I could simply run `git annex import --lazy` to get rid of all duplicates while retaining the unannexed files.
Iterating through this process a few times, I will be left with clean annexes on the one hand and stuff I can simply delete on the other hand.
I could script all this by hand on my own machine, but I am _certain_ that others would find easy, integrated, and unit tested support for whittling down data dumps over time useful.
"""]]

View file

@ -1,6 +0,0 @@
That would make assessing weird reports like [[bugs/Should_UUID__39__s_for_Remotes_be_case_sensitive__63__/]] easier and quicker.
> No, if people want to file a bug report, it's up to them to tell me
> relevant details about their OS. I'm not going down the rathole
> of making git-annex muck about trying to gather such information.
> [[done]] --[[Joey]]

View file

@ -1,6 +0,0 @@
As per DebConf13: Introduce a one-shot command to synchronize everything,
including data, with the other remotes.
Especially useful for the debconf annex.
> [[done]]; `git annex sync --content` --[[Joey]]

View file

@ -1,4 +0,0 @@
Seems pretty self-explanatory.
> This was already implemented, the --exclude option can be used
> for find as well as most any other subcommand. --[[Joey]] [[done]]

View file

@ -1,22 +0,0 @@
`--all` would make git-annex operate on either every key with content
present (or in some cases like `get` and `copy --from` on
every keys with content not present).
This would be useful when a repository has a history with deleted files
whose content you want to keep (so you're not using `dropunused`).
Or when you have a lot of branches and just want to be able to fsck
every file referenced in any branch (or indeed, any file referenced in any
ref). It could also be useful (or even a
good default) in a bare repository.
A problem with the idea is that `.gitattributes` values for keys not
currently in the tree would not be available (without horrific amounts of
grubbing thru history to find where/when the key used to exist). So
`numcopies` set via `.gitattributes` would not work. This would be a
particular problem for `drop` and for `--auto`.
--[[Joey]]
> [[done]]. The .gitattributes problem was solved simply by not
> supporting `drop --all`. `--auto` also cannot be mixed with --all for
> similar reasons. --[[Joey]]

View file

@ -1,18 +0,0 @@
There should be a backend where the file content is stored.. in a git
repository!
This way, you know your annexed content is safe & versioned, but you only
have to deal with the pain of git with large files in one place, and can
use all of git-annex's features everywhere else.
> Speaking as a future user, do very, very much want. -- RichiH
>> Might also be interesting to use `bup` in the git backend, to work
>> around git's big file issues there. So git-annex would pull data out
>> of the git backend using bup. --[[Joey]]
>>> Very much so. Generally speaking, having one or more versioned storage back-ends with current data in the local annexes sounds incredibly useful. Still being able to get at old data in via the back-end and/or making offline backups of the full history are excellent use cases. -- RichiH
[[done]], the bup special remote type is written! --[[Joey]]
> Yay! -- RichiH

View file

@ -1,3 +0,0 @@
Maybe add the icon /usr/share/doc/git-annex/html/logo.svg to the .desktp file.
> [[done]] long ago.. --[[Joey]]

View file

@ -1,14 +0,0 @@
I would like to attach metadata to annexed files (objects) without
cluttering the workdir with files containing this metadata. A common use
case would be to add titles to my photo collection that could than end up
in a generated photo album.
Depending on the implementation it might also be possible to use the metadata facility for a threaded commenting system.
The first question is whether the metadata is attached to the objects and
thus shared by all paths pointing to the same data object or to paths in
the worktree. I've no preference here at this point.
> This is [[done]]; see [[design/metadata]].
> The metadata is attached to objects, not to files.
> --[[Joey]]

View file

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.0.63"
subject="comment 1"
date="2013-08-24T19:58:54Z"
content="""
I don't know if git-annex is the right vehicle to fix this. It seems that a more generic fix that would work in non-git-annex repos would be better.
I can answer your question though: The metadata such as urls and locations that git-annex stores in the git-annex branch is attached to objects, and not to work tree paths.
"""]]

View file

@ -1,10 +0,0 @@
When the [[design/assistant]] is running on a pair of remotes, I've seen
them get out of sync, such that every pull and merge results in a conflict,
that then has to be auto-resolved.
This seems similar to the laddering problem described in this old bug:
[[bugs/making_annex-merge_try_a_fast-forward]]
--[[Joey]]
Think I've fixed this. [[done]] --[[Joey]]

View file

@ -1,31 +0,0 @@
Client repos do not want files in archive directories. This can turn
out to be confusing to users who are using archive directories for their
own purposes and not aware of this special case in the assistant. It can
seem like the assistant is failing to sync their files.
I thought, first, that it should have a checkbox to enable the archive
directory behavior.
However, I think I have a better idea. Change the preferred content
expression for clients, so they want files in archive directories, *until*
those files land in an archive.
This way, only users who set up an archive repo get this behavior. And they
asked for it by setting up that repo!
Also, the new behavior will mean that files in archive directories still
propagate around to clients. Consider this topology:

    client A ---- client B ---- archive

If a file is created in client A, and moved to an archive directory before
it syncs to B, it will never get to the archive, and will continue wasting
space on A. With the new behavior, A and B effectively serve as transfer
repositories for archived content.
Something vaguely like this should work as the preferred content
expression for the clients:

    exclude=archive/* or (include=archive/* and (not (copies=archive:1 or copies=smallarchive:1)))

> [[done]] --[[Joey]]
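The behavior that expression encodes can be modelled in a few lines. This is a toy illustration of the intended semantics, not git-annex's preferred-content matcher:

```python
def client_wants(path, archive_copies):
    # Toy model of:
    #   exclude=archive/* or
    #   (include=archive/* and (not copies=archive:1))
    # where archive_copies counts archive repos holding the file's content.
    in_archive = path.startswith("archive/") or "/archive/" in path
    if not in_archive:
        return True  # not in an archive directory: clients always want it
    return archive_copies == 0  # archived: wanted only until an archive has it
```

So a file in `archive/` keeps being wanted (and transferred) by A and B until some archive repository gets a copy, at which point the clients can drop it.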

View file

@ -1,40 +0,0 @@
The [[design/assistant]] would be better if git-annex used ghc's threaded
runtime (`ghc -threaded`).
Currently, whenever the assistant code runs some external command, all
threads are blocked waiting for it to finish.
For transfers, the assistant works around this problem by forking separate
upload processes, and not waiting on them until it sees an indication that
they have finished the transfer. While this works, it's messy.. threaded
would be better.
When pulling, pushing, and merging, the assistant runs external git
commands, and this does block all other threads. The threaded runtime would
really help here.
[[done]]; the assistant now builds with the threaded runtime.
Some work still remains to run certain long-running external git commands
in their own threads to prevent them blocking things, but that is easy to
do, now. --[[Joey]]
---
Currently, git-annex seems unstable when built with the threaded runtime.
The test suite tends to hang when testing add. `git-annex` occasionally
hangs, apparently in a futex lock. This is not the assistant hanging, and
git-annex does not otherwise use threads, so this is surprising. --[[Joey]]
> I've spent a lot of time debugging this, and trying to fix it, in the
> "threaded" branch. There are still deadlocks. --[[Joey]]
>> Fixed, by switching from `System.Cmd.Utils` to `System.Process`
>> --[[Joey]]
---
It would be possible to not use the threaded runtime. Instead, we could
have a child process pool, with associated continuations to run after a
child process finishes. Then periodically do a nonblocking waitpid on each
process in the pool in turn (waiting for any child could break anything not
using the pool!). This is probably a last resort...
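The last-resort design in that paragraph would look roughly like this (a POSIX-only sketch; the real code would be Haskell, this just shows the polling shape):

```python
import os


class ChildPool:
    """Track child processes with a continuation to run when each exits."""

    def __init__(self):
        self.pending = {}  # pid -> continuation

    def spawn(self, argv, continuation):
        pid = os.fork()
        if pid == 0:
            os.execvp(argv[0], argv)  # child: never returns
        self.pending[pid] = continuation

    def poll(self):
        # Nonblocking waitpid on each pooled pid in turn; waiting for
        # *any* child here could reap children not using the pool.
        for pid in list(self.pending):
            reaped, status = os.waitpid(pid, os.WNOHANG)
            if reaped == pid:
                continuation = self.pending.pop(pid)
                continuation(os.WEXITSTATUS(status))
```

The caller would drive `poll()` periodically from its event loop.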

View file

@ -1,29 +0,0 @@
It should be possible for clones to learn about how to contact
each other without remotes needing to always be explicitly set
up. Say that `.git-annex/remote.log` is maintained by git-annex
to contain:

    UUID hostname URI

The URI comes from configured remotes and maybe from
`file://$(pwd)`, or even `ssh://$(hostname -f)`
for the current repo. This format will merge without
conflicts or data loss.
Then when content is believed to be in a UUID, and no
configured remote has it, the remote.log can be consulted and
URIs that look likely tried. (file:// ones if the hostname
is the same (or maybe always -- a removable drive might tend
to be mounted at the same location on different hosts),
otherwise ssh:// ones.)
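Because the proposed file is one line per fact, "merge without conflicts or data loss" reduces to a line union. A sketch of such a merge driver (names are illustrative):

```python
def merge_remote_log(ours, theirs):
    # Union-merge two versions of remote.log ("UUID hostname URI" per
    # line): keep every line seen on either side, collapse duplicates,
    # and sort so the result is independent of merge order.
    lines = set(ours.splitlines()) | set(theirs.splitlines())
    lines.discard("")
    return "".join(line + "\n" for line in sorted(lines))
```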
Question: When should git-annex update the remote.log?
(If not just on init.) Whenever it reads in a repo's remotes?
> This sounds useful and the log should be updated every time any remote is being accessed. A counter or timestamp (yes, distributed times may be wrong/different) could be used to auto-prune old entries via a global and per-remote config setting. -- RichiH
---
I no longer think I'd use this myself, I find that my repositories quickly
grow the paths I actually use, somewhat organically. Unofficial paths
across university quads come to mind. [[done]] --[[Joey]]

View file

@ -1,7 +0,0 @@
Remotes log should probably be stored in ".git/annex/remote.log"
instead of ".git-annex/remote.log" to prevent leaking credentials.
> The idea is to distribute the info between repositories, which is
> why it'd go in `.git-annex`. Of course that does mean that repository
> location information would be included, and if that'd not desirable
> this feature would need to be turned off. --[[Joey]]

View file

@ -1,15 +0,0 @@
A "git annex watch" command would help make git-annex usable by users who
don't know how to use git, or don't want to bother typing the git commands.
It would run, in the background, watching via inotify for changes, and
automatically annexing new files, etc.
The blue sky goal would be something automated like dropbox, except fully
distributed. All files put into the repository would propagate out
to all the other clones of it, as network links allow. Note that while
dropbox allows modifying files, git-annex freezes them upon creation,
so this would not be 100% equivalent to dropbox. --[[Joey]]
This is a big project with its own [[design pages|design/assistant]].
> [[done]].. at least, we have a watch command and an assistant, which
> is still being developed. --[[Joey]]

View file

@ -1,20 +0,0 @@
Some commands cause a union merge unnecessarily. For example, `git annex add`
modifies the location log, which first requires reading the current log (if
any), which triggers a merge.
Would be good to avoid these unnecessary union merges. First because it's
faster and second because it avoids a possible delay when a user might
ctrl-c and leave the repo in an inconsistent state. In the case of an add,
the file will be in the annex, but no location log will exist for it (fsck
fixes that).
It may be that all that's needed is to modify Annex.Branch.change
to read the current value, without merging. Then commands like `get`, that
query the branch, will still cause merges, and commands like `add` that
only modify it, will not. Note that for a command like `get`, the merge
occurs before it has done anything, so ctrl-c should not be a problem
there.
This is a delicate change, I need to take care.. --[[Joey]]
> [[done]] (assuming I didn't miss any cases where this is not safe!) --[[Joey]]

View file

@ -1,7 +0,0 @@
This backend is not finished.
In particular, while files can be added using it, git-annex will not notice
when their content changes, and will not create a new key for the new sha1
of the new content.
[[done]]; use unlock subcommand and commit changes with git

View file

@ -1,159 +0,0 @@
[[done]] !!!
The use of `.git-annex` to store logs means that if a repo has branches
and the user switched between them, git-annex will see different logs in
the different branches, and so may miss info about what remotes have which
files (though it can re-learn).
An alternative would be to store the log data directly in the git repo
as `pristine-tar` does. Problem with that approach is that git won't merge
conflicting changes to log files if they are not in the currently checked
out branch.
It would be possible to use a branch with a tree like this, to avoid
conflicts:

    key/uuid/time/status

As long as new files are only added, and old timestamped files deleted,
there would be no conflicts.
A related problem though is the size of the tree objects git needs to
commit. Having the logs in a separate branch doesn't help with that.
As more keys are added, the tree object size will increase, and git will
take longer and longer to commit, and use more space. One way to deal with
this is simply by splitting the logs among subdirectories. Git then can
reuse trees for most directories. (Check: Does it still have to build
dup trees in memory?)
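The subdirectory split can be as simple as keying on a hash of the key; git-annex ended up with a scheme in this spirit (short hash-derived directory levels), though the exact layout below is illustrative:

```python
import hashlib


def log_path(key):
    # Spread log files across hash-prefix subdirectories so each tree
    # object stays small and most subtrees are unchanged (and therefore
    # reusable) from commit to commit.
    digest = hashlib.md5(key.encode()).hexdigest()
    return "%s/%s/%s.log" % (digest[:3], digest[3:6], key)
```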
Another approach would be to have git-annex *delete* old logs. Keep logs
for the currently available files, or something like that. If other log
info is needed, look back through history to find the first occurrence of a
log. Maybe even look at other branches -- so if the logs were on master,
a new empty branch could be made and git-annex would still know where to
get keys in that branch.
Would have to be careful about conflicts when deleting and bringing back
files with the same name. And would need to avoid expensive searching thru
all history to try to find an old log file.
## fleshed out proposal
Let's use one branch per uuid, named git-annex/$UUID.
- I came to realize this would be a good idea when thinking about how
to upgrade. Each individual annex will be upgraded independently,
so each will want to make a branch, and if the branches aren't distinct,
they will merge conflict for sure.
- TODO: What will need to be done to git to make it push/pull these new
branches?
- A given repo only ever writes to its UUID branch. So no conflicts.
- **problem**: git annex move needs to update log info for other repos!
(possibly solvable by having git-annex-shell update the log info
when content is moved using it)
- (BTW, UUIDs probably don't compress well, and this reduces the bloat of having
them repeated lots of times in the tree.)
- Per UUID branches mean that if it wants to find a file's location
among configured remotes, it can examine only their branches, if
desired.
- It's important that the per-repo branches propagate beyond immediate
remotes. If there is a central bare repo, that means push --all. Without
one, it means that when repo B pulls from A, and then C pulls from B,
C needs to get A's branch -- which means that B should have a tracking
branch for A's branch.
In the branch, only one file is needed. Call it locationlog. git-annex
can cache location log changes and write them all to locationlog in
a single git operation on shutdown.
- TODO: what if it's ctrl-c'd with changes pending? Perhaps it should
collect them to .git/annex/locationlog, and inject that file on shutdown?
- This will be less overhead than the current staging of all the log files.
The log is not appended to, so in git we have a series of commits each of
which replaces the log's entire contents.
To find locations of a key, all (or all relevant) branches need to be
examined, looking backward through the history of each until a log
with an indication of the presence/absence of the key is found.
- This will be less expensive for files that have recently been added
or transfered.
- It could get pretty slow when digging deeper.
- Only 3 places in git-annex will be affected by any slowdown: move --from,
get and drop. (Update: Now also unused, whereis, fsck)
## alternate
As above, but use a single git-annex branch, and keep the per-UUID
info in their own log files. Hope that git can auto-merge as long as
each observing repo only writes to its own files. (Well, it can, but for
non-fast-forward merges, the git-annex branch would need to be checked out,
which is problematic.)
Use filenames like:

    <observing uuid>/<location uuid>

That allows one repo to record another's state when doing a
`move`.
## outside the box approach
If the problem is limited to only that the `.git-annex/` files make
branching difficult (and not to the related problem that commits to them
and having them in the tree are sorta annoying), then a simple approach
would be to have git-annex look in other branches for location log info
too.
The problem would then be that any locationlog lookup would need to look in
all other branches (any branch could have more current info after all),
which could get expensive.
## way outside the box approach
Another approach I have been mulling over is keeping the log file
branch checked out in .git/annex/logs/ -- this would be a checkout of a git
repository inside a git repository, using "git fake bare" techniques. This
would solve the merge problem, since git auto merge could be used. It would
still mean all the log files are on-disk, which annoys some. It would
require some tighter integration with git, so that after a pull, the log
repo is updated with the data pulled. --[[Joey]]
> Seems I can't use git fake bare exactly. Instead, the best option
> seems to be `git clone --shared` to make a clone that uses
> `.git/annex/logs/.git` to hold its index etc, but (mostly) uses
> objects from the main repo. There would be some bloat,
> as commits to the logs made in there would not be shared with the main
> repo. Using `GIT_OBJECT_DIRECTORY` might be a way to avoid that bloat.
## notes
Another approach could be to use git-notes. It supports merging branches
of notes, with union merge strategy (a hook would have to do this after
a pull, it's not done automatically).
Problem: Notes are usually attached to git
objects, and there are no git objects corresponding to git-annex keys.
Problem: Notes are not normally copied when cloning.
------
## eliminating the merge problem
Most of the above options are complicated by the problem of how to merge
changes from remotes. It should be possible to deal with the merge
problem generically. Something like this:
* We have a local branch `B`.
* For remotes, there are also `origin/B`, `otherremote/B`, etc.
* To merge two branches `B` and `foo/B`, construct a merge commit that
makes each file have all lines that were in either version of the file,
with duplicates removed (probably). Do this without checking out a tree.
-- now implemented as git-union-merge
* As a `post-merge` hook, merge `*/B` into `B`. This will ensure `B`
is always up-to-date after a pull from a remote.
* When pushing to a remote, nothing need to be done, except ensure
`B` is either successfully pushed, or the push fails (and a pull needs to
be done to get the remote's changes merged into `B`).
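What became `git-union-merge` does essentially the following to each file's two blob versions (a pure-content sketch; the real tool drives git plumbing so no checkout is needed):

```python
def union_merge(ours, theirs):
    # Keep every line present in either version, drop duplicates,
    # preserve first-seen order: this merge can never conflict.
    seen = set()
    merged = []
    for line in ours.splitlines() + theirs.splitlines():
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return "".join(line + "\n" for line in merged)
```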

View file

@ -1,23 +0,0 @@
The checkout subcommand replaces the symlink that normally points at a
file's content, with a copy of the file. Once you've checked a file out,
you can edit it, and `git commit` it. On commit, git-annex will detect
if the file has been changed, and if it has, `add` its content to the
annex.
> Internally, this will need to store the original symlink to the file, in
> `.git/annex/checkedout/$filename`.
>
> * git-annex uncheckout moves that back
> * git-annex pre-commit hook checks each file being committed to see if
> it has a symlink there, and if so, removes the symlink and adds the new
> content to the annex.
>
> And it seems the file content should be copied, not moved or hard linked:
>
> * Makes sure other annexes can find it if transferring it from
> this annex.
> * Ensures it's always available for uncheckout.
> * Avoids the last copy of a file's content being lost when
> the checked out file is modified.
[[done]]

View file

@ -1,105 +0,0 @@
Currently [[/direct_mode]] allows the user to point many normally safe
git commands at his foot and pull the trigger. At LCA2013, a git-annex
user suggested modifying direct mode to make this impossible.
One way to do it would be to move the .git directory. Instead, make there
be a .git-annex directory in direct mode repositories. git-annex would know
how to use it, and would be extended to support all known safe git
commands, passing parameters through, and in some cases verifying them.
So, for example, `git annex commit` would run `git commit --git-dir=.git-annex`
However, `git annex commit -a` would refuse to run, or even do something
intelligent that does not involve staging every direct mode file.
----
One source of problems here is that there is some overlap between git-annex
and git commands. Ie, `git annex add` cannot be a passthrough for `git
add`. The git wrapper could instead be another program, or it could be
something like `git annex git add`
--[[Joey]]
----
Or, no git wrapper could be provided. Limit the commands to only git-annex
commands. This should be all that is needed to manage a direct mode
repository simply, and if the user is doing something complicated that
needs git access, they can set `GIT_DIR=.git-annex` and be careful not to
shoot off their foot. (Or can just switch to indirect mode!)
This wins on simplicity, and if it's the wrong choice a git wrapper
can be added later. --[[Joey]]
---
Implementation: Pretty simple really. Already did the hard lifting to
support `GIT_DIR`, so only need to override the default git directory
in direct mode when that's not set to `.git-annex`.
A few things hardcode ".git", including Assistant.Threads.Watcher.ignored
and `Seek.withPathContents`, and parts of `Git.Construct`.
---
Transition: git-annex should detect when it's in a direct mode repository
with a .git directory and no .git-annex directory, and transparently
do the move to transition to the new scheme. (And remember that `git annex
indirect` needs to move it back.)
# alternative approach: move index
Rather than moving .git, maybe move .git/index?
This would cause git to think that all files in the tree were deleted.
So git commit -a would make a commit that removes them from git history.
But, the files in the work tree are not touched by this.
Also, git checkout, git merge, and other things that manipulate the work
tree refuse to do anything if they'd change a file that they think is
untracked.
Hmm, this doesn't solve the user accidentally running git add on an annexed
file; the whole file still gets added.
# alternative approach: fake bare repo
Set core.bare to true. This prevents all work tree operations,
so prevents any foot shooting. It still lets the user run commands like
git log, even on files in the tree, and git fetch, and push, and git
config, etc.
Even better, it integrates with other tools, like `mr`, so they know
it's a git repo.
This seems really promising. But of course, git-annex has its own set of
behaviors in a bare repo, so will need to recognise that this repo is not
really bare, and avoid them.
> [[done]]!! --[[Joey]]
(Git may also have some bare repo behaviors that are unwanted. One example
is that git allows pushes to the current branch in a bare repo,
even when `receive.denyCurrentBranch` is set.)
> This is indeed a problem. Indeed, `git annex sync` successfully
> pushes changes to the master branch of a fake bare direct mode repo.
>
> And then, syncing in the repo that was pushed to causes the changes
> that were pushed to the master branch to get reverted! This happens
> because sync commits; commit sees that files are staged in index
> differing from the (pushed) master, and commits the "changes"
> which revert it.
>
> Could fix this using an update hook, to reject the updated of the master
> branch. However, won't work on crippled filesystems! (No +x bit)
>
> Could make git annex sync detect this. It could reset the master
> branch to the last one committed, before committing. Seems very racy
> and hard to get right!
>
> Could make direct mode operate on a different branch, like
> `annex/direct/master` rather than `master`. Avoid pushing to that
> branch (`git annex sync` can map back from it to `master` and push there
> instead). A bit clumsy, but works.

View file

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawn-KDr_Z4CMkjS0v_TxQ08SzAB5ecHG3K0"
nickname="Glen"
subject="This sounds good"
date="2013-06-25T10:30:07Z"
content="""
I think we might have been talking about this feature.. Seems like a good idea to me.
Glen
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawm7AuSfii_tCkLyspL6Mr0ATlO6OxLNYOo"
nickname="Georg"
subject="comment 2"
date="2013-09-20T11:29:04Z"
content="""
Maybe make a git sub-namespace of commands. Yeah, I know, something like git annex git-add sounds a bit on the verbose side, but it would allow access to possibly all git commands regardless of name clashes.
"""]]

View file

@ -1,7 +0,0 @@
I've an external USB hard disc attached to my (fritzbox) router that is only accessible through SMB/CIFS. I'd like have all my annexed files on this drive in kind of direct-mode so that I can also access the files without git-annex.
I tried to put a direct-mode repo on the drive, but this is painfully slow: the git-annex process then runs on my desktop and accesses the repo over SMB, through the slow fritzbox USB link.
I'd wish that git-annex could be told to just use a (mounted) folder as a direct-mode remote.
> [[done]]; dup. --[[Joey]]

View file

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.64"
subject="comment 1"
date="2013-11-23T19:03:58Z"
content="""
It's not clear to me what you are requesting here.
You seem to say that running git-annex inside a mountpoint is slow. Ok. So, what possible changes to git-annex could make it fast, given that the bottleneck is the SMB/USB?
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnR6E5iUghMWdUGlbA9CCs8DKaoigMjJXw"
nickname="Efraim"
subject="comment 2"
date="2013-11-26T09:26:53Z"
content="""
perhaps he's looking to expand the addurl option to accept file://path/to/video.mp4, or smb://... , so a file can be imported without moving its content inside the annex.
"""]]

View file

@ -1,11 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmicVKRM8vJX4wPuAwlLEoS2cjmFXQkjkE"
nickname="Thomas"
subject="never mind"
date="2013-12-01T18:34:05Z"
content="""
grossmeier.net did a much better job to explain what I want:
[[New special remote suggeston - clean directory]]
Please close this issue as duplicate of the above.
"""]]

View file

@ -1,18 +0,0 @@
Say I have some files on remote A. But I'm away from it, and transferring
files from B to C. I'd like to avoid transferring any files I already have
on A.
Something like:
git annex copy --to C --exclude-on A
This would not contact A, just use its cached location log info.
I suppose I might also sometime want to only act on files that are
thought/known to be on A.
git annex drop --only-on A
--[[Joey]]
[[done]]

View file

@ -1,9 +0,0 @@
Apparently newer gnupg has support for hardware-accelerated AES-NI. It
would be good to have an option to use that. I also wonder if using the
same symmetric key for many files presents a security issues (and whether
using GPG keys directly would be more secure).
> [[done]]; you can now use encryption=pubkey when setting up a special
> remote to use pure public keys without the hybrid symmetric key scheme.
> Which you choose is up to you. Also, annex.gnupg-options can configure
> the ciphers used. --[[Joey]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.152.108.145"
subject="comment 1"
date="2013-08-01T17:10:56Z"
content="""
There is a remote.name.annex-gnupg-options git-config setting that can be used to pass options to gpg on a per-remote basis.
> also wonder if using the same symmetric key for many files presents a security issues (and whether using GPG keys directly would be more secure).
I am not a cryptographer, but I have today run this question by someone with a good amount of crypto knowledge. My understanding is that reusing a symmetric key is theoretically vulnerable to eg known-plaintext or chosen-plaintext attacks. And that modern ciphers like AES and CAST (gpg default) are designed to resist such attacks.
If someone was particularly concerned about these attack vectors, it would be pretty easy to add a mode where git-annex uses public key encryption directly. With the disadvantage, of course, that once a file was sent to a special remote and encrypted for a given set of public keys, other keys could not later be granted access to it.
"""]]

View file

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
nickname="Richard"
subject="comment 2"
date="2013-08-02T07:21:50Z"
content="""
Using symmetric keys is significantly cheaper, computation-wise.
The scheme of encrypting symmetric keys with asymmetric ones is ancient, well-proven, and generally accepted as a good approach.
Using per-key files makes access control more fine-grained, and is only a real performance issue once, while creating the private key, and a little bit every time more than one file needs to be decrypted, as more than one symmetric key needs to be taken care of.
"""]]

View file

@ -1,17 +0,0 @@
[[!comment format=mdwn
username="guilhem"
ip="129.16.20.209"
subject="comment 3"
date="2013-08-19T13:44:35Z"
content="""
AES-NI acceleration will be used by default providing you're using
the new modularized GnuPG (v2.x) and libgcrypt ≥ 1.5.0. Of course it
only speeds up AES encryption, while GnuPG uses CAST by default; you can
either set `personal-cipher-preferences` to AES or AES256 in your
`gpg.conf` or, as joeyh hinted at, set `remote.<name>.annex-gnupg-options`
as described in the manpage.
By the way, I observed a significant speed up when using `--compress-algo none`.
Image, music and video files are typically hard to compress further, and it seems
that's where gpg spent most of its time, at least on the few files I benchmarked.
"""]]

View file

@ -1,4 +0,0 @@
Using an rsync remote is currently very slow when there are a lot of files, since rsync appears to be called for each file copied. It would be awesome if each call to rsync was amortized to copy many files; rsync is very good at copying many small files quickly.
> [[done]]; bug submitter was apparently not using a version
> with rsync connection caching. --[[Joey]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.152.108.145"
subject="comment 1"
date="2013-08-01T16:06:42Z"
content="""
I cannot see a way to do this using rsync's current command-line interface. Ideas on how to do it are welcome.
"""]]

View file

@ -1,24 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawln4uCaqZRd5_nRQ-iLcJyGctIdw8ebUiM"
nickname="Edward"
subject="Just put multiple source files"
date="2013-08-01T16:29:04Z"
content="""
It seems like you can just put multiple source files on the command line:
ed@ed-Ubu64 /tmp$ touch a b c d
ed@ed-Ubu64 /tmp$ mkdir test
ed@ed-Ubu64 /tmp$ rsync -avz a b c d test
sending incremental file list
a
b
c
d
sent 197 bytes received 88 bytes 570.00 bytes/sec
total size is 0 speedup is 0.00
ed@ed-Ubu64 /tmp$ ls test
a b c d
It also appears to work with remote transfers too.
"""]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.152.108.145"
subject="comment 3"
date="2013-08-01T16:58:49Z"
content="""
git-annex needs to build a specific directory structure on the rsync remote though. It seems it would need to build the whole tree locally, containing only the files it wants to send.
When using encryption, it would need to encrypt all the files it's going to send and store them locally until it's built the tree. That could use a lot of disk space.
Also, there's the problem of checking which files are already present in the remote, to avoid re-encrypting and re-sending them. Currently this is done by running rsync with the url of the file, and checking its exit code. rsync does not seem to have an interface that would allow checking multiple files in one call. So any optimisation of the number of rsync calls would only eliminate 1/2 of the current number.
When using ssh:// urls, the rsync special remote already uses ssh connection caching, which I'd think would eliminate most of the overhead. (If you have a version of git-annex older than 4.20130417, you should upgrade to get this feature.) It should not take very long to start up a new rsync over a cached ssh connection. rsync:// is probably noticably slower.
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawln4uCaqZRd5_nRQ-iLcJyGctIdw8ebUiM"
nickname="Edward"
subject="Thanks"
date="2013-08-01T17:03:23Z"
content="""
I am using an old version of git-annex. I'll try the newer one and see if the connection caching helps!
"""]]

View file

@ -1,5 +0,0 @@
Find a way to copy a file with a progress bar, while still preserving
stat. Easiest way might be to use pv and fix up the permissions etc
after?
[[done]]
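For illustration, here's one way to sketch this in Python (git-annex itself is Haskell, and the `pv` route above is another option): copy in chunks so progress can be reported as you go, then fix up the stat metadata afterwards.

```python
import os
import shutil
import sys

def copy_with_progress(src, dst, chunk_size=1 << 20):
    """Copy src to dst in chunks, printing a percentage to stderr,
    then copy mode and timestamps from the source (like cp -p)."""
    total = os.path.getsize(src)
    done = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            done += len(chunk)
            pct = 100 * done // total if total else 100
            sys.stderr.write("\r%3d%%" % pct)
    sys.stderr.write("\n")
    # Fix up permissions, atime/mtime after the copy completes.
    shutil.copystat(src, dst)
```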

View file

@ -1,11 +0,0 @@
add a git annex fsck that finds keys that have no referring file
(done)
* Need per-backend fsck support. sha1 can checksum all files in the annex.
WORM can check filesize.
* Both can check that annex.numcopies is satisfied. Probably only
querying the locationlog, not doing an online verification.
[[done]]

View file

@ -1,13 +0,0 @@
`git annex fsck --from remote`
Basically, this needs to receive each file in turn from the remote, to a
temp file, and then run the existing fsck code on it. Could be quite
expensive, but sometimes you really want to check.
An unencrypted directory special remote could be optimised, by not actually
copying the file, just dropping a symlink, etc.
The WORM backend doesn't care about file content, so it would be nice to
avoid transferring the content at all, and only send the size.
> [[done]] --[[Joey]]
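The per-file check described above could look roughly like this Python sketch (the real code is Haskell; `fetch_to` is a hypothetical stand-in for the actual transfer, and keys follow git-annex's BACKEND-sSIZE--DIGEST shape):

```python
import hashlib
import os
import tempfile

def fsck_key_from_remote(key, fetch_to):
    """Fetch the remote's copy of a key to a temp file and verify it.

    key: e.g. 'SHA256-s11--<hexdigest>'.
    fetch_to: callable that downloads the content to a given path.
    WORM-style keys carry no checksum, so only size is checked.
    """
    backend, size_field, _, digest = key.split("-", 3)
    expected_size = int(size_field[1:])
    fd, tmp = tempfile.mkstemp()
    os.close(fd)
    try:
        fetch_to(tmp)
        if os.path.getsize(tmp) != expected_size:
            return False
        if backend.startswith("SHA"):
            algo = hashlib.new("sha" + backend[3:])
            with open(tmp, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    algo.update(chunk)
            return algo.hexdigest() == digest
        return True  # no checksum in key: size check only
    finally:
        os.unlink(tmp)
```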

View file

@ -1,15 +0,0 @@
[[done]]
I've been considering adding a `git-annex-shell` command. This would
be similar to `git-shell` (and in fact would pass unknown commands off to
`git-shell`).
## Reasons
* Allows locking down an account to only be able to use git-annex (and
git).
* Avoids needing to construct complex shell commands to run on the remote
system. (Mostly already avoided by the plumbing level commands.)
* Could possibly allow multiple things to be done with one ssh connection
in future.
* Allows expanding `~` and `~user` in repopath on the remote system.

View file

@ -1,32 +0,0 @@
`git-annex unused` has to compare large sets of data
(all keys with content present in the repository,
with all keys used by files in the repository), and so
uses more memory than git-annex typically needs.
It used to be a lot worse (hundreds of megabytes).
Now it only needs enough memory to store a Set of all Keys that currently
have content in the annex. On a lightly populated repository, it runs in
quite low memory use (like 8 mb) even if the git repo has 100 thousand
files. On a repository with lots of file contents, it will use more.
Still, I would like to reduce this to a purely constant memory use,
as running in constant memory no matter the repo size is a git-annex design
goal.
One idea is to use a bloom filter.
For example, construct a bloom filter of all keys used by files in
the repository. Then for each key with content present, check if it's
in the bloom filter. Since there can be false positives, this might
miss finding some unused keys. The probability/size of filter
could be tunable.
> Fixed in `bloom` branch in git. --[[Joey]]
>> [[done]]! --[[Joey]]
Another way might be to scan the git log for files that got removed
or changed what key they pointed to. Correlate with keys with content
currently present in the repository (possibly using a bloom filter again),
and that would yield a shortlist of keys that are probably not used.
Then scan thru all files in the repo to make sure that none point to keys
on the shortlist.
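To make the bloom filter idea concrete, here's a minimal sketch in Python (the real implementation is in Haskell; the filter size and hashing scheme here are arbitrary illustrations):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: constant memory, possible false positives,
    never false negatives."""

    def __init__(self, size_bits=2 ** 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            h = hashlib.sha256(b"%d:%s" % (i, key.encode())).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Keys referenced by files in the work tree go into the filter;
# each key with content present is then tested against it. Keys not
# in the filter are definitely unused; hits might be false positives.
used = BloomFilter()
used.add("SHA256-s100--aaaa")
assert "SHA256-s100--aaaa" in used  # no false negatives, ever
```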

View file

@ -1,13 +0,0 @@
Would help a lot when having to add large(ish) numbers of remotes.
Maybe detect this kind of commit message and ask the user whether to automatically add them? See [[auto_remotes]]:
> Question: When should git-annex update the remote.log? (If not just on init.) Whenever it reads in a repo's remotes?
----
I'm not sure that the above suggestion is going down a path that really
makes sense. If you want a list of repository UUIDs and descriptions,
it's there in machine-usable form in `.git-annex/uuid.log`, there is no
need to try to pull this info out of git commit messages. --[[Joey]]
[[done]]

View file

@ -1,39 +0,0 @@
gitosis and gitolite should support git-annex being used to send/receive
files from the repositories they manage. Users with read-only access
could only get files, while users with write access could also put and drop
files.
Doing this right requires modifying both programs, to add [[git-annex-shell]]
to the list of things they can run, and only allow through appropriate
git-annex-shell subcommands to read-only users.
I have posted an RFC for modifying gitolite to the
[gitolite mailing list](http://groups.google.com/group/gitolite?lnk=srg).
> I have not developed a patch yet, but all that git-annex needs is a way
> to ssh to the server and run the git-annex-shell command there.
> git-annex-shell is very similar to git-shell. So, one way to enable
> it is simply to set GL_ADC_PATH to a directory containing git-annex-shell.
>
> But, that's not optimal, since git-annex-shell will send off receive-pack
> commands to git, which would bypass gitolite's permissions checking.
> Also, it makes sense to limit readonly users to only download, not
> upload/delete files from git-annex. Instead, I suggest adding something
> like this to gitolite's config:
# If set, users with W access can write file contents into the git-annex,
# and users with R access can read file contents from the git-annex.
$GL_GIT_ANNEX = 0;
> If this makes sense, I'm sure I can put a patch together for your
> review. It would involve modifying gl-auth-command so it knows how
> to run git-annex-shell, and how to parse out the "verb" from a
> git-annex-shell command line, and modifying R_COMMANDS and W_COMMANDS.
As I don't write python, someone else is needed to work on gitosis.
--[[Joey]]
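The verb-checking part that gl-auth-command (or gitosis) would need could be sketched like this in Python; the verb lists here are illustrative, not the definitive set:

```python
import shlex

# Subcommands a read-only user may run vs. ones that modify the annex.
READONLY_VERBS = {"configlist", "inannex", "sendkey"}
WRITE_VERBS = {"recvkey", "dropkey", "commit"}

def annex_shell_allowed(ssh_original_command, can_write):
    """Decide whether an incoming git-annex-shell command is permitted
    for a user with (or without) write access."""
    words = shlex.split(ssh_original_command)
    if not words or words[0] != "git-annex-shell":
        return False  # hand off to git-shell / normal permission checks
    # The verb is the first non-option argument after the program name.
    verb = next((w for w in words[1:] if not w.startswith("-")), None)
    if verb in READONLY_VERBS:
        return True
    if verb in WRITE_VERBS:
        return can_write
    return False
```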
> [[done]]; support for gitolite is in its `pu` branch, and some changes
> made to git-annex. Word is gitosis is not being maintained so I won't
> worry about trying to support it. --[[Joey]]

View file

@ -1,5 +0,0 @@
how to handle git rm file? (should try to drop keys that have no
referring file, if it seems safe..)
[[done]] -- I think that git annex unused and dropunused are the best
solution to this.

View file

@ -1,18 +0,0 @@
A repository like http://annex.debconf.org/debconf-share/ has a git repo
published via http. When getting files from such a repo, git-annex tries
two urls. One url would be used by a bare repo, and the other by a non-bare
repo. (This is due to the directory hashing change.) Result is every file
download from a non-bare http repo starts with a 404 and then it retries
with the right url.
Since git-annex already downloads the .git/config to find the uuid of the
http repo, it could also look at it to see if the repo is bare. If not,
set a flag, and try the two urls in reverse order, which would almost
always avoid this 404 problem.
(The real solution is probably to flag day and get rid of the old-style
directory hashing, but that's been discussed elsewhere.)
--[[Joey]]
[[done]]
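The fix amounts to ordering the candidate URLs by the bareness read from the already-downloaded `.git/config`. A Python sketch (paths and helper names illustrative):

```python
def is_bare(git_config_text):
    """Parse the downloaded .git/config to see if the repo is bare."""
    in_core = False
    for line in git_config_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            in_core = line == "[core]"
        elif in_core and "=" in line:
            k, _, v = line.partition("=")
            if k.strip() == "bare":
                return v.strip().lower() == "true"
    return False

def annex_object_urls(base, key, hashdir, bare):
    """Candidate download URLs, most likely first: a bare repo serves
    objects under annex/objects/, a non-bare one under .git/annex/objects/."""
    bare_url = "%s/annex/objects/%s/%s/%s" % (base, hashdir, key, key)
    nonbare_url = "%s/.git/annex/objects/%s/%s/%s" % (base, hashdir, key, key)
    return [bare_url, nonbare_url] if bare else [nonbare_url, bare_url]
```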

View file

@ -1,8 +0,0 @@
The IA would find it useful to be able to control the http headers
git-annex get, addurl, etc uses. This will allow setting cookies, for
example.
* annex-web-headers=blah
* Perhaps also annex-web-headers-command=blah
[[done]]

View file

@ -1,8 +0,0 @@
> josh: Do you do anything in git-annex to try to make the files immutable?
> For instance, removing write permission, or even chattr?
> joey: I don't, but that's a very good idea
> josh: Oh, I just thought of another slightly crazy but handy idea.
> josh: I'd hate to run into a program which somehow followed the symlink and then did an unlink to replace the file.
> josh: To break that, you could create a new directory under annex's internal directory for each file, and make the directory have no write permission.
[[done]] and done --[[Joey]]
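Both ideas together, sketched in Python (illustrative only; note that root, or a chattr-capable attacker, can still get past plain permission bits):

```python
import os
import stat

def protect_object(objfile):
    """Make an annexed object hard to clobber: remove write permission
    from the file, and from its parent directory so the file cannot be
    unlinked and replaced either."""
    # File becomes read-only for everyone.
    os.chmod(objfile, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    # Parent directory loses write permission, blocking unlink/rename.
    parent = os.path.dirname(objfile)
    mode = os.stat(parent).st_mode
    os.chmod(parent, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```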

View file

@ -1,24 +0,0 @@
Justin Azoff realized git-annex should have an incremental fsck.
This requires storing the last fsck time of each object.
I would not be strongly opposed to sqlite, but I think there are other
places the data could be stored. One possible place is the mode or mtime
of the .git/annex/objects/xx/yy/$key directories (the parent directories
of where the content is stored). Perhaps the sticky bit could be used to
indicate the content has been fsked, and the mtime indicate the time
of last fsck. Anything that dropped or put in content would need to
clear the sticky bit. --[[Joey]]
> Basic incremental fsck is done now.
>
> Some enhancements would include:
>
> * --max-age=30d Once the incremental fsck completes and was started 30 days ago,
> start a new one.
> * --time-limit --size-limit --file-limit: Limit how long the fsck runs.
>> Calling this [[done]]. The `--incremental-schedule` option
>> allows scheduling time between incremental fscks. `--time-limit` is
>> done. I implemented `--smallerthan` independently. Not clear what
>> `--file-limit` would be. --[[Joey]]
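The sticky-bit/mtime bookkeeping from the first paragraph can be sketched in Python (the shipped implementation records fsck state differently; this just illustrates the idea):

```python
import os
import stat

def mark_fscked(objdir):
    """Record a successful fsck: set the sticky bit, bump the mtime."""
    mode = os.stat(objdir).st_mode
    os.chmod(objdir, mode | stat.S_ISVTX)
    os.utime(objdir)  # mtime now records the time of last fsck

def last_fscked(objdir):
    """Time of last fsck, or None if never fscked (or if content was
    dropped or put since, which clears the sticky bit)."""
    st = os.stat(objdir)
    return st.st_mtime if st.st_mode & stat.S_ISVTX else None

def clear_fsck_mark(objdir):
    """Anything that drops or puts content must clear the sticky bit."""
    st = os.stat(objdir)
    os.chmod(objdir, st.st_mode & ~stat.S_ISVTX)
```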

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
nickname="Justin"
subject="comment 1"
date="2012-09-20T14:11:57Z"
content="""
I have a [proof of concept written in python](https://github.com/JustinAzoff/git-annex-background-fsck/blob/master/git-annex-background-fsck).
You can run it and point it the root of an annex or to a subdirectory. In my brief testing it seems to work :-)
the goal would be to have options like
git annex fsck /data/annex --check-older-than 1w --check-for 2h --max-load-avg 0.5
"""]]

View file

@ -1,52 +0,0 @@
I have two repos, using SHA1 backend and both using git.
The first one is a laptop, the second one is a usb drive.
When I drop a file on the laptop repo, the file is not available on that repo until I run *git annex get*
But when the usb drive is plugged in the file is actually available.
How about adding a feature to link some/all files to the remote repo?
e.g.
We have *railscasts/196-nested-model-form-part-1.mp4* file added to git, and only available on the usb drive:
$ git annex whereis 196-nested-model-form-part-1.mp4
whereis 196-nested-model-form-part-1.mp4 (1 copy)
a7b7d7a4-2a8a-11e1-aebc-d3c589296e81 -- origin (Portable usb drive)
I can see the link with:
$ cd railscasts
$ ls -ls 196*
8 lrwxr-xr-x 1 framallo staff 193 Dec 20 05:49 196-nested-model-form-part-1.mp4 -> ../.git/annex/objects/Wz/6P/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e
I save this in a variable just to make the example more clear:
ID=".git/annex/objects/Wz/6P/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e"
The file doesn't exist on the local repo:
$ ls ../$ID
ls: ../$ID: No such file or directory
however I can create a link to access that file on the remote repo.
First I create a needed dir:
$ mkdir ../.git/annex/objects/Wz/6P/SHA256-s16898930--43679c67cd968243f58f8f7fb30690b5f3f067574e318d609a01613a2a14351e/
Then I link to the remote file:
$ ln -s /mnt/usb_drive/repo_folder/$ID ../$ID
now I can open the file in the laptop repo.
I think it could be easy to implement. Maybe it's a naive approach, but it looks appealing.
Checking whether it's a real file or a link shouldn't impact performance.
The limitation is that it would only work with remote repos on local dirs.
It also allows you to have one directory structure like AFS or other distributed filesystems: if the file is not local, I go to the remote server.
Which is great for apps like Picasa, iTunes, and friends that depend on the file location.
> This is a duplicate of [[union_mounting]]. So closing it: [[done]].
>
> It's a good idea, but making sure git-annex correctly handles these links in all cases is a subtle problem that has not yet been tackled. --[[Joey]]

View file

@ -1,18 +0,0 @@
Some podcasts don't include a sortable date as the first thing in their episode title, which makes listening to them in order challenging if not impossible.
The date the item was posted is part of the RSS standard, so we should parse that and provide a new importfeed template option "itemdate".
(For the curious, I tried "itemid" thinking that might give me something close, but it doesn't. I used --template='${feedtitle}/${itemid}-${itemtitle}${extension}' and get:
http___openmetalcast.com__p_1163-Open_Metalcast_Episode__93__Headless_Chicken.ogg
or
http___www.folkalley.com_music_podcasts__name_2013_08_21_alleycast_6_13.mp3-Alleycast___06.13.mp3
that "works" but is ugly :)
Would love to be able to put a YYYYMMDD at the beginning and then the title.
> [[done]]; itempubdate will use form YYYY-MM-DD (or the raw date string
> if the feed does not use a parsable form). --[[Joey]]
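The behavior described can be sketched in Python: parse the RFC 2822 pubDate the feed supplies, format it as YYYY-MM-DD, and fall back to the raw string when it can't be parsed (the real code is Haskell; this just mirrors the logic):

```python
from email.utils import parsedate_to_datetime

def itempubdate(raw_pubdate):
    """Format a feed item's pubDate as YYYY-MM-DD, or return the raw
    string when the date is not in a parsable form."""
    try:
        return parsedate_to_datetime(raw_pubdate).strftime("%Y-%m-%d")
    except (TypeError, ValueError):
        return raw_pubdate
```

So a template like `--template='${feedtitle}/${itempubdate}-${itemtitle}${extension}'` sorts episodes chronologically.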

View file

@ -1,16 +0,0 @@
[[!comment format=mdwn
username="http://grossmeier.net/"
nickname="greg"
subject="Without knowing Haskell"
date="2014-04-06T04:55:31Z"
content="""
Maybe this just requires adding:
, fieldMaybe \"itemdate\" $ getFeedPubDate $ item i
on line 214 in Command/ImportFeed.hs ??
It is supported by [Text.Feed.Query](http://hackage.haskell.org/package/feed-0.3.9.2/docs/Text-Feed-Query.html)
I have no haskell dev env so I can't test this, but if my suggestion is true, I might set one up :)
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.244"
subject="comment 2"
date="2014-04-07T19:51:27Z"
content="""
https://github.com/sof/feed/issues/6
"""]]

View file

@ -1,25 +0,0 @@
The `Makefile` should respect a `PREFIX` passed on the commandline so git-annex can be installed in (say) `$HOME`.
Simple patch:
[[!format diff """
diff --git a/Makefile b/Makefile
index b8995b2..5b1a6d4 100644
--- a/Makefile
+++ b/Makefile
@@ -3,7 +3,7 @@ all=git-annex $(mans) docs
GHC?=ghc
GHCMAKE=$(GHC) $(GHCFLAGS) --make
-PREFIX=/usr
+PREFIX?=/usr
CABAL?=cabal # set to "./Setup" if you lack a cabal program
# Am I typing :make in vim? Do a fast build.
"""]]
--[[anarcat]]
> [[done]] --[[Joey]]
> > thanks! ;) --[[anarcat]]

View file

@ -1,22 +0,0 @@
The traditional way of marking commandline flags in a manpage is with a `.B` (for Bold, I guess). It doesn't seem to be used by mdwn2man, which makes the manpage look a little more dull than it could.
The following patch makes those options come out more obviously:
[[!format diff """
diff --git a/Build/mdwn2man b/Build/mdwn2man
index ba5919b..7f819ad 100755
--- a/Build/mdwn2man
+++ b/Build/mdwn2man
@@ -8,6 +8,7 @@ print ".TH $prog $section\n";
while (<>) {
s{(\\?)\[\[([^\s\|\]]+)(\|[^\s\]]+)?\]\]}{$1 ? "[[$2]]" : $2}eg;
+ s/\`([^\`]*)\`/\\fB$1\\fP/g;
s/\`//g;
s/^\s*\./\\&./g;
if (/^#\s/) {
"""]]
I tested it against the git-annex manpage and it seems to work well. --[[anarcat]]
> [[done]], thanks --[[Joey]]
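For clarity, the substitution the patch adds (the Perl `s/\`([^\`]*)\`/\\fB$1\\fP/g` line) is equivalent to this Python:

```python
import re

def bold_inline_code(line):
    """Turn `foo` spans into roff bold (\\fBfoo\\fP), as the patch does,
    so inline code stands out in the rendered manpage."""
    return re.sub(r"`([^`]*)`", r"\\fB\1\\fP", line)
```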

View file

@ -1,5 +0,0 @@
Support for remote git repositories (ssh:// specifically can be made to
work, although the other end probably needs to have git-annex
installed..)
[[done]], at least get and put work..

View file

@ -1,100 +0,0 @@
We had some informal discussions on IRC about improving the output of the `whereis` command.
[[!toc levels=2]]
First version: columns
======================
[[mastensg]] started by implementing a [simple formatter](https://gist.github.com/mastensg/6500982) that would display things in columns [screenshot](http://www.ping.uio.no/~mastensg/whereis.png)
Second version: Xs
==================
After some suggestions from [[joey]], [[mastensg]] changed the format slightly ([screenshot](http://www.ping.uio.no/~mastensg/whereis2.png)):
[[!format txt """
17:01:34 <joeyh> foo
17:01:34 <joeyh> |bar
17:01:34 <joeyh> ||baz (untrusted)
17:01:34 <joeyh> |||
17:01:34 <joeyh> XXx 3? img.png
17:01:36 <joeyh> _X_ 1! bigfile
17:01:37 <joeyh> XX_ 2 zort
17:01:39 <joeyh> __x 1?! maybemissing
17:02:09 * joeyh does a s/\?/+/ in the above
17:02:24 <joeyh> and decrements the counters for untrusted
17:03:37 <joeyh> __x 0+! maybemissing
"""]]
Third version: incremental
==========================
Finally, [[anarcat]] worked on making it run faster on large repositories, in a [fork](https://gist.github.com/anarcat/6502988) of that first gist. Then paging was added (so headers are repeated).
Fourth version: tuning and blocked
==================================
[[TobiasTheViking]] provided some bugfixes, and the next step was to implement the trusted/untrusted detection, and have a counter.
This required more advanced parsing of the remotes, and rather than starting to do some JSON parsing, [[anarcat]] figured it was time to learn some Haskell instead.
Current status: needs merge
===========================
So right now, the most recent version of the python script is in [anarcat's gist](https://gist.github.com/anarcat/6502988) and works reasonably well. However, it doesn't distinguish between trusted and untrusted repos and so on.
Furthermore, we'd like to see this factored into the `whereis` command directly. A [raw.hs](http://codepad.org/miVJb5oK) file has been programmed by `mastensg`, and is now available in the above gist. It fits the desired output and prototypes, and has been `haskellized` thanks to [[guilhem]].
Now we just need to merge those marvelous functions in `Whereis.hs` - but I can't quite figure out where to throw that code, so I'll leave it to someone more familiar with the internals of git-annex. The most recent version is still in [anarcat's gist](https://gist.github.com/anarcat/6502988). --[[anarcat]]
Desired output
--------------
The output we're aiming for is:
foo
|bar
||baz (untrusted)
|||
XXx 2+ img.png
_X_ 1! bigfile
XX_ 2 zort
__x 0+! maybemissing
Legend:
* `_` - file missing from repo
* `x` - file may be present in untrusted repo
* `X` - file is present in trusted repo
* `[0-9]` - number of copies present in trusted repos
* `+` - indicates there may be more copies present
* `!` - indicates only one copy is left
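A row formatter implementing this legend could be sketched in Python (resembling the gist's approach; the Haskell version would follow the `formatWhereis` signature below):

```python
def format_whereis(presence, filename):
    """Format one output row. presence is a list of
    (is_present, is_trusted) tuples, one per remote, in column order."""
    cols = ""
    trusted_copies = 0
    untrusted_copies = 0
    for present, trusted in presence:
        if not present:
            cols += "_"
        elif trusted:
            cols += "X"
            trusted_copies += 1
        else:
            cols += "x"
            untrusted_copies += 1
    flags = str(trusted_copies)
    if untrusted_copies:
        flags += "+"  # there may be more copies, in untrusted repos
    if trusted_copies + untrusted_copies == 1:
        flags += "!"  # only one copy left
    return "%s %s %s" % (cols, flags, filename)

# Matches the desired output above:
# format_whereis([(True, True), (True, True), (True, False)], "img.png")
#   -> "XXx 2+ img.png"
```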
Implementation notes
--------------------
[[!format txt """
20:48:18 <joeyh> if someone writes me a headerWhereis :: [(RemoteName, TrustLevel)] -> String and a formatWhereis :: [(RemoteName, TrustLevel, UUID)] -> [UUD] -> FileName -> String , I can do the rest ;)
20:49:22 <joeyh> make that second one formatWhereis :: [(RemoteName, TrueLevel, Bool)] -> FileName -> String
20:49:37 <joeyh> gah, typos
20:49:45 <joeyh> suppose you don't need the RemoteName either
"""]]
> So, I incorporated this, in a new remotes command.
> Showing all known repositories seemed a bit much
> (I have 30-some known repositories in some cases),
> so just showing configured remotes seems a good simplification.
> [[done]]
> --[[Joey]]
> > I would have preferred this to be optional since I don't explicitly configure all remotes in git, especially if I can't reach them all the time (e.g. my laptop). It seems to me this should at least be an option, but I am confused as to why `Remote.List.remoteList` doesn't list all remotes the same way `Remote.remote_list` does... Also, it's unfortunate that the +/!/count flags have been dropped, they would have been useful... Thanks for the merge anyways! --[[done]]
> >
> > The more I look at this, the more i think there are a few things wrong with the new `remotes` command.
> >
> > 1. the name is confusing: being a git addict, I would expect the `git annex remote` command to behave like the `git remote` command: list remotes, add remotes, remove remotes and so on. It would actually be useful to have such a command (which would replace `initremote`, I guess). I recommend replacing the current `whereis` command, even if enabled through a special flag
> >
> > 2. its behavior is inconsistent with other git annex commands: `git annex status`, for example, lists information about all remotes, regardless of whether they are configured in git. `remotes` (whatever it's called) should do the same, or at least provide an option to allow the user to list files on all remotes. The way things stand, there is no way to list files on non-git remotes, even if they are added explicitly as a remote, if the remote is not actually reachable: the files are just marked as absent (even though `whereis` actually finds them). I recommend showing all remotes regardless, either opt-in or opt-out using a flag.
> >
> > 3. having the `!` flag, at least, would be useful because it would allow users to intuitively grep for problematic files without having to learn extra syntax. Same with `+` and having an explicit count.
> >
> > thanks. --[[anarcat]]

View file

@ -1,25 +0,0 @@
Several things suggest now would be a good time to reorganize the object
directory. This would be annex.version=2. It will be slightly painful for
all users, so this should be the *last* reorg in the foreseeable future.
1. Remove colons from filenames, for [[bugs/fat_support]]
2. Add hashing, since some filesystems do suck (like er, fat at least :)
[[forum/hashing_objects_directories]]
(Also, may as well hash .git-annex/* while at it -- that's what
really gets big.)
3. Add filesize metadata for [[bugs/free_space_checking]]. (Currently only
present in WORM, and in an ad-hoc way.)
4. Perhaps use a generic format that will allow further metadata to be
added later. For example,
"bSHA1,s101111,kf3101c30bb23467deaec5d78c6daa71d395d1879"
(Probably everything after ",k" should be part of the key, even if it
contains the "," separator character. Otherwise an escaping mechanism
would be needed.)
[[done]] now!
Although [[bugs/free_space_checking]] is not quite there --[[Joey]]
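The generic format from point 4 could be parsed like this Python sketch; the point being that everything after `,k` belongs to the key, so no escaping is needed. (The format git-annex actually shipped differs, e.g. `SHA1-s101111--f3101c...`; this only illustrates the scheme described above.)

```python
def parse_key(serialized):
    """Parse e.g. 'bSHA1,s101111,kf3101c...' into its fields.
    The 'k' field consumes the rest of the string, commas included,
    so keys containing ',' need no escaping."""
    fields = {}
    rest = serialized
    while rest:
        tag = rest[0]
        if tag == "k":
            fields["key"] = rest[1:]  # everything after ',k' is the key
            break
        value, _, rest = rest[1:].partition(",")
        if tag == "b":
            fields["backend"] = value
        elif tag == "s":
            fields["size"] = int(value)
    return fields
```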

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
nickname="Richard"
subject="comment 1"
date="2011-03-16T01:16:48Z"
content="""
If you support generic meta-data, keep in mind that you will need to do conflict resolution. Timestamps may not be synched across all systems, so keeping a log of old metadata could be used, sorting by history and using the latest. Which leaves the situation of two incompatible changes. This would probably mean manual conflict resolution. You will probably have thought of this already, but I still wanted to make sure this is recorded. -- RichiH
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawl9sYlePmv1xK-VvjBdN-5doOa_Xw-jH4U"
nickname="Richard"
subject="comment 2"
date="2011-03-16T01:19:25Z"
content="""
Hmm, I added quite a few comments at work, but they are stuck in moderation. Maybe I forgot to log in before adding them. I am surprised this one appeared immediately. -- RichiH
"""]]

Some files were not shown because too many files have changed in this diff