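Lately I've been wondering whether there is a tool for recognizing text in images. OCR came to mind; in China, Hanwang OCR is one of the better-known products. Could the same thing be done with Python? After a lot of searching for material on Python and OCR, I came across PyTesser, a fun little program. Here it is for sharing and discussion:
PyTesser is an optical character recognition (OCR) module for Python. It works together with the Tesseract OCR engine to extract the text from an image or image file and return it as a string.
To use PyTesser you do not need to install the Tesseract OCR engine separately (a Windows executable ships with the scripts), but you must first install PIL (Python Imaging Library, Python's imaging library).
The official description: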
PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.
PyTesser uses the Tesseract OCR engine, converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work in other operating systems as well.
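To make the "calling the Tesseract executable as an external script" idea concrete, here is a minimal sketch of that approach. It is not PyTesser's own code: the function names build_tesseract_command and ocr_file are made up for illustration, and it assumes a tesseract executable is on the PATH and that the image is already in a format Tesseract accepts.

```python
import os
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Build the command line: tesseract <image> <output base name>.

    Tesseract writes the recognized text to <output base name>.txt.
    (Hypothetical helper, not part of PyTesser.)
    """
    return [exe, image_path, output_base]


def ocr_file(image_path, exe="tesseract"):
    """Run Tesseract as an external program and return the recognized text.

    Assumes `exe` is installed and can read the image at `image_path`.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_base = os.path.join(tmp_dir, "out")
        # check=True raises CalledProcessError if Tesseract exits with an error
        subprocess.run(
            build_tesseract_command(image_path, output_base, exe),
            check=True,
        )
        with open(output_base + ".txt", encoding="utf-8") as result:
            return result.read()
```

Calling ocr_file("scan.tif") (a hypothetical file name) would then return whatever text Tesseract recognized. Writing to a temporary directory keeps the intermediate .txt file from cluttering the working folder.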
Official PyTesser download page: http://code.google.com/p/pytesser/downloads/list
PIL library resources: http://www.pythonware.com/products/pil/