Apache OpenOffice (AOO) Bugzilla – Issue 29855
lack of OCR
Last modified: 2013-02-07 22:37:27 UTC
The lack of any OCR interface means that OCR has to be done by way of cut-and-paste from some other application.
Couldn't OOo connect to/interface with gocr in some way?
reassigned to BH:
Lots of utility companies now offer online billing, usually provided in PDF format. To do any analysis of that information, a facility to import it into an OOo spreadsheet would be a powerful tool.
The lack of integrated OCR in OOo (Writer, specifically) continues to be a deficiency. Hardly a day goes by without me using the scanner to glean text or data for re-use in other documents. Integration with Calc would be useful. Once upon a time GOCR was the best of a poor lot of (GNU/Linux) OCR engines. Today TESSERACT is a much better engine - but it still has limitations. OCUBE is a little script which integrates tesseract and SANE (would probably work with kooka for KDE users). Reformatting scanned text imported into OOo documents is problematic because of all the inherent line breaks. It requires either special incantations involving regular expressions (another issue?) - something the average user doesn't know or care about - or reformatting before inserting the scanned text into the OOo document. For that task alone I still run TextPad.exe with WINE - there are no GNU/Linux equivalents.
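To illustrate the line-break problem described above, here is a minimal Python sketch of the kind of pre-processing that could be run on OCR output before pasting it into a document. It is not part of OOo, tesseract or OCUBE; the function name `unwrap_ocr_text` is made up for this example, and it assumes that paragraphs in the scanned text are separated by blank lines.

```python
import re


def unwrap_ocr_text(text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs (hypothetical helper).

    Assumption: a blank line marks a paragraph boundary; every other
    line break is an artefact of the scanned page layout.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split by a hyphen at the end of a line
    # ("inte-\ngrated" -> "integrated"). Caveat: this also merges
    # genuine hyphenated compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split into paragraphs on blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text)

    # Inside each paragraph, collapse any run of whitespace,
    # including the remaining line breaks, into a single space.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]

    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = (
        "The lack of inte-\n"
        "grated OCR is a\n"
        "deficiency.\n"
        "\n"
        "Second para-\n"
        "graph here.\n"
    )
    print(unwrap_ocr_text(sample))
```

The result can then be pasted into Writer as ordinary paragraphs. Scans that do not separate paragraphs with blank lines would need a different heuristic, for example treating a short line ending in a full stop as the end of a paragraph.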
To make these issues easier to find by searching for "requirements", I have reassigned the issues currently owned by me to the owner "requirements".