


Add more options to the HTML export. My duplex scanner can OCR after scanning, but the OCR technology in Acrobat is more accurate in my opinion. Add an option to exclude the graphics contents (so only text is considered when generating a document).

Manual in German from Ubuntu Users' wiki:

Please visit the link below to read OCRFeeder's README:

You can check out the latest sources by doing the following: git clone

OCRFeeder's category on the author's blog

OCRFeeder was developed as the project of Joaquim Rocha's Master's thesis in Computer Science. It features a complete GTK graphical user interface that allows users to correct any unrecognized characters, define or correct bounding boxes, set paragraph styles, clean the input images, import PDFs, save and load the project, export everything to multiple formats, etc. It generates multiple formats, its main one being ODT.

There are a number of helpful utilities for preparing document files for use in Tesseract. Many standard image manipulation tools (Adobe, for example) can be used. The tools listed below are open source and work well on Mac environments. Many of them can be installed from the command line.

OCRFeeder is a document layout analysis and optical character recognition system. Given the images, it will automatically outline their contents, distinguish between graphics and text, and perform OCR over the latter.
