August 6, 2007
After about a week of lazy effort, I have managed to produce a preliminary version of AbiScan on top of OCRopus. I recorded a screencast of direct OCR import into an AbiWord frame. It is very buggy, but very exciting too :).
I must thank the #abiword people, especially Dominic Lachowicz, Marc Mauer, Martin Sevior, jean, sum1 and Hubert Figuière. Thanks also go to the OCRopus and Gegl people for their work and advice.
I provide an AbiScan patch against abiword-plugins SVN. The plugin does not work if AbiWord uses the G_MODULE_BIND_LAZY flag; this is a bug in AbiScan, not in AbiWord. I also provide a patch against abiword SVN that removes the g_module_open flags, but it will hopefully never be merged.
If you want to try it, follow these steps:
- Install tesseract-ocr from SVN, with the patch I provided in the tesseract BTS;
- Install ocropus;
- Install Gegl SVN;
- Install Gnome Scan SVN;
- Install abiword SVN with the g-module-open-flags.diff patch;
- Install abiword-plugins SVN with the abiscan.diff patch;
- Launch AbiWord;
- Choose Insert > Import from scanner and follow the steps.
Warning: this really is buggy.
- Gnome Scan does not handle the device list very well if you open the dialog several times.
- OCRopus does not provide any API, so the plugin uses system() and is not able to monitor progress. OCRopus can take a very long time.
- Sometimes, it eats tons of memory.
- Currently, it loses formatting, due to a bug in pasteFromBuffer() in the HTML import. I had to choose between pasting into the existing document and losing formatting, or opening the temporary OCRopus HTML file directly.
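On the system() limitation noted above: system() only returns once the child has finished, so the caller sees nothing in between. A possible way around it, sketched here and not what the plugin currently does, is popen(), which lets the caller read the command's output line by line while it is still running. The names run_with_progress, line_callback and count_line are made up for this illustration, and whether OCRopus prints anything useful while it works is an assumption.

```c
/* popen()/pclose() are POSIX, not ISO C. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical callback type: called once per chunk of output read
 * from the child (one line, unless the line exceeds the buffer). */
typedef void (*line_callback)(const char *line, void *user_data);

/*
 * Run `command` through the shell and hand its standard output to
 * `on_line` as it arrives, instead of blocking blindly like system().
 * Returns the command's exit status, or -1 if it could not be started
 * or did not exit normally.
 */
int run_with_progress(const char *command, line_callback on_line,
                      void *user_data)
{
    char line[4096];
    FILE *out;
    int status;

    assert(command != NULL && on_line != NULL);

    out = popen(command, "r");
    if (out == NULL)
        return -1;

    /* fgets() returns as soon as a full line (or a full buffer) is read. */
    while (fgets(line, sizeof line, out) != NULL)
        on_line(line, user_data);

    status = pclose(out);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Example callback: count the lines seen so far. A real plugin would
 * more likely update a progress bar here. */
void count_line(const char *line, void *user_data)
{
    (void)line;
    ++*(int *)user_data;
}
```

A caller would pass whatever command line it currently gives to system(), plus a callback and a pointer to its own state, for example `run_with_progress(cmd, count_line, &n)` with an `int n = 0`. The call still blocks until the command exits, so in a GUI it would have to run outside the main loop.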
Bug reports are very welcome; please file them against the gnome-scan product in Gnome Bugzilla, under the abiscan component. Note that OCRopus prefers 150dpi images.
Anyway, this is a rough draft with the key features provided by Gnome Scan and OCRopus: tight integration into the application and advanced OCR.
E Ultreïa !