AbiScan Preview

August 6, 2007

Hi all,

After about a week of lazy effort, I managed to produce a preliminary version of AbiScan on top of OCRopus. I recorded a screencast of direct OCR import into an Abiword frame. It is very buggy, but very exciting too :).

I must thank the #abiword people, especially Dominic Lachowicz, Marc Maurer, Martin Sevior, jean, sum1 and Hubert Figuière. Thanks also go to the OCRopus and Gegl people for their work and advice.

I provide the AbiScan patch against abiword-plugins SVN. The plugin does not work if Abiword uses the G_MODULE_BIND_LAZY flag; this is a bug in AbiScan, not in Abiword. I also provide a patch against abiword SVN that removes the g_module_open() flags, but hopefully it will never be merged.
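For readers unfamiliar with the flag: on Unix-like systems, GLib's GModule backend calls dlopen() with RTLD_LAZY when G_MODULE_BIND_LAZY is set and RTLD_NOW otherwise, so the Abiword patch effectively switches plugin loading from lazy to immediate symbol binding. The sketch below reproduces that choice with plain dlopen(); open_plugin() is a hypothetical helper, not code from Abiword or AbiScan, and it does not diagnose the underlying AbiScan bug. On older glibc versions, link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a plugin the way GModule does on POSIX.
 * lazy != 0 mirrors G_MODULE_BIND_LAZY (RTLD_LAZY: functions are resolved
 * on first call); lazy == 0 mirrors flags = 0 (RTLD_NOW: every undefined
 * symbol must resolve at load time, or dlopen() fails right here).
 * RTLD_GLOBAL matches GModule's default when G_MODULE_BIND_LOCAL is unset.
 * A NULL path returns a handle for the main program itself. */
static void *open_plugin(const char *path, int lazy)
{
    int mode = RTLD_GLOBAL | (lazy ? RTLD_LAZY : RTLD_NOW);
    void *handle = dlopen(path, mode);

    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "cannot load %s: %s\n",
                path != NULL ? path : "(main program)",
                err != NULL ? err : "unknown error");
    }
    return handle;
}
```

With lazy binding, a plugin that calls a missing function can still load and only fail when that function is first called; with RTLD_NOW the same problem is reported by dlopen() itself.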

If you want to try it, follow these steps:

  1. Install tesseract-ocr from SVN, with the patch I provided in the tesseract BTS;
  2. Install ocropus;
  3. Install Gegl SVN;
  4. Install Gnome Scan SVN;
  5. Install abiword SVN with the g-module-open-flags.diff patch;
  6. Install abiword-plugins SVN with the abiscan.diff patch;
  7. Launch Abiword;
  8. Choose Insert > Import from scanner and follow the steps.

Warning: it’s really buggy.

  • Gnome Scan does not handle the device list very well if you open the dialog several times.
  • OCRopus does not provide any API, so the plugin calls it through system() and cannot monitor progress. OCRopus may take a very long time.
  • Sometimes, it eats tons of memory.
  • Currently, it loses formatting because of a bug in pasteFromBuffer() for HTML import. I had to choose between pasting into the existing document (losing formatting) or directly opening the temporary HTML file produced by OCRopus.
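Because system() blocks until the child exits, the caller never gets a chance to update a progress indicator. One common workaround, sketched below under the assumption that OCRopus is still driven as an external command, is to fork() and exec the command yourself and poll it with waitpid(..., WNOHANG). run_with_progress() and its callback are hypothetical names, not AbiScan code, and without an OCRopus API the callback can only report elapsed time, not real OCR progress.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Called periodically while the child is still running. */
typedef void (*progress_cb)(double elapsed_seconds, void *user_data);

/* Hypothetical replacement for system(): run the NULL-terminated argv
 * in a child process and poll it every interval_ms milliseconds,
 * calling cb (if non-NULL) with the elapsed time between polls.
 * Returns the child's exit status, 127 if the command could not be
 * executed, or -1 on fork/wait failure or death by a signal. */
static int run_with_progress(char *const argv[], long interval_ms,
                             progress_cb cb, void *user_data)
{
    assert(argv != NULL && argv[0] != NULL && interval_ms > 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {              /* child: replace ourselves with the command */
        execvp(argv[0], argv);
        _exit(127);              /* only reached if execvp() failed */
    }

    struct timespec start, now;
    struct timespec delay = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)            /* child finished */
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0 && errno != EINTR)
            return -1;
        /* r == 0 (or interrupted): still running, report and wait */
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (cb != NULL)
            cb((double)(now.tv_sec - start.tv_sec)
                   + (now.tv_nsec - start.tv_nsec) / 1e9,
               user_data);
        nanosleep(&delay, NULL);
    }
}
```

Here argv would hold whatever OCRopus command line the plugin currently passes to system(). In a GTK application such as Abiword, g_spawn_async() combined with g_child_watch_add() would fit better, since it hooks into the main loop instead of sleeping.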

Bug reports are very welcome; please file them against the gnome-scan product in Gnome Bugzilla, under the abiscan component. Note that OCRopus prefers 150 dpi images.
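To see what the 150 dpi recommendation means for a scanned page: the pixel size is the physical size in inches times the resolution, with 1 inch = 25.4 mm. The helper below is a hypothetical illustration, not part of Gnome Scan or OCRopus; the A4 figures in its comments follow directly from that formula.

```c
#include <assert.h>

/* Hypothetical helper: number of pixels covering `mm` millimetres at
 * `dpi` dots per inch, rounded to the nearest pixel (1 inch = 25.4 mm). */
static int mm_to_pixels(double mm, int dpi)
{
    assert(mm >= 0.0 && dpi > 0);
    return (int)(mm / 25.4 * dpi + 0.5);
}

/* An A4 page (210 x 297 mm):
 *   at 150 dpi: mm_to_pixels(210.0, 150) == 1240, mm_to_pixels(297.0, 150) == 1754
 *   at 300 dpi: mm_to_pixels(210.0, 300) == 2480, mm_to_pixels(297.0, 300) == 3508
 * so a 150 dpi scan has a quarter of the pixels of a 300 dpi one. */
```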

Anyway, it’s a rough draft showing the key features provided by Gnome Scan and OCRopus: tight integration into applications and advanced OCR.



Regards,
Étienne


E Ultreïa !

3 Responses to “AbiScan Preview”

  1. Mariot Chauvin Says:

    Great work!

    Not directly related to the subject: can you briefly explain, or give a link to, the method you followed to create the screencast?

  2. Étienne Bersac Says:

    Mariot: See recordMyDesktop. Istanbul might help you too.

    Regards

  3. Anonymous Says:

    Super :p

    Does the OCR software recognize accented Latin characters such as “éèà”?

