There are a few popular OCR command-line tools you can use (I'm not sure whether they have a GUI):

Tesseract
An OCR engine that was developed at HP Labs between 19… and is now at Google. Tesseract is probably the most accurate open source OCR engine. Also available for: Tesseract .NET, Tesseract iOS.

Tessnet2
Tesseract is a C++ open source OCR engine. Tessnet2 is a .NET assembly that exposes very simple methods to do OCR. Tessnet2 is under the Apache 2 license (like Tesseract), meaning you can use it however you want, including in commercial products.

GOCR
It converts scanned images of text back to text files. GOCR can be used with different front-ends, which makes it very easy to port to different OSes and architectures. It can open many different image formats, and its quality has been improving.

OCRopus™ (FAQ) (written in Python, NumPy, and SciPy)
OCRopus applies large-scale machine learning to problems in document analysis, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-90s and deployed by the US Census Bureau, and novel high-performance layout analysis. OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications.

A few others: ABBYY CLI OCR for Linux, Asprise OCR.

For a more complete list, check: List of optical character recognition software at Wikipedia.

See also: wanghaisheng/awesome-ocr - A curated list of promising OCR resources at GitHub.
Tesseract is a command-line utility and is very simple to use. You can also install the language package tesseract-ocr-eng. To run Tesseract, open a terminal and type the following:

tesseract imagefile.tif outputfile

Tesseract appends .txt to the output name itself, so the recognized text ends up in outputfile.txt. Older versions of Tesseract can only read TIFF files, so if you've got a JPEG or PDF or whatever, you'll have to convert it first. Newer versions also read common image formats such as PNG and JPEG directly, but a PDF still has to be converted to an image.

OCRopus is a document layout analysis and optical character recognition system.

Kooka is a KDE application, but it works fine. In addition, you have to install actual OCR programs such as GOCR and OCRAD. After installing Kooka and the OCR programs, you have to point Kooka to the OCR install location so that it can convert the JPEG to text.

OCRAD is an OCR program that can be used as a stand-alone console application or as a backend to other programs.
GOCR is an OCR (Optical Character Recognition) program. It converts scanned images of text back to text files.
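If you want to script the Tesseract invocation described above instead of typing it by hand, here is a minimal sketch in Python. It assumes the tesseract binary is on your PATH and uses only the two positional arguments from the command line above (input image and output name). The function names build_tesseract_command and ocr_to_text are made up for this example and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argument list for `tesseract <image> <output base>`.

    Hypothetical helper for illustration; not part of Tesseract itself.
    """
    # Tesseract appends ".txt" to the output name on its own,
    # so drop the extension if the caller already included it.
    base = Path(output_base)
    if base.suffix == ".txt":
        base = base.with_suffix("")
    return ["tesseract", str(image_path), str(base)]


def ocr_to_text(image_path, output_base):
    """Run Tesseract on one image and return the recognized text.

    Assumes the `tesseract` executable is installed and on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    command = build_tesseract_command(image_path, output_base)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    # Read back the file Tesseract wrote: <output base>.txt
    return Path(command[2] + ".txt").read_text(encoding="utf-8")
```

Calling ocr_to_text("imagefile.tif", "outputfile") should then run the same command as above and hand back the contents of outputfile.txt. Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.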