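What you need to install:
PIL
pytesseract
Tesseract-OCR
Open a command prompt and enter:
PIL: pip install pillow
pytesseract: pip install pytesseract
Tesseract-OCR: run the installer tesseract-ocr-setup-3.02.02.exe
Remember your install path (mine is C:\Program Files (x86)\Tesseract-OCR); you will need it shortly.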
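Since everything that follows depends on that install path, it can help to check that the executable is really there before using pytesseract. The snippet below is only a minimal sketch using the standard library: `find_tesseract` and `DEFAULT_TESSERACT` are hypothetical names made up for this illustration, and the default path is the author's install location, which may differ on your machine.

```python
import os
import shutil

# Assumption: the install path mentioned above; replace it with your own.
DEFAULT_TESSERACT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidate=DEFAULT_TESSERACT):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer the explicitly given path if it points to an existing file.
    if os.path.isfile(candidate):
        return candidate
    # Otherwise fall back to whatever "tesseract" resolves to on PATH.
    return shutil.which("tesseract")


path = find_tesseract()
print(path if path else "tesseract not found; check the install path")
```

If this prints a path, that same string is what you would assign to pytesseract.pytesseract.tesseract_cmd.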
Once all of the above is done, let's get hands-on!
First, test with a simple image made in Paint.
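import pytesseract
from PIL import Image

# Path to the Tesseract-OCR executable (adjust to your own install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

# Open the image to recognize; the r prefix keeps the backslashes literal
image = Image.open(r"C:\Users\user\Desktop\Myimgtest\test_1.png")

# Run OCR and print the recognized text
text = pytesseract.image_to_string(image)
print(text)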
Output:
Hello word !
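What each call does
pytesseract.pytesseract.tesseract_cmd: the path to the tesseract.exe executable inside the Tesseract-OCR install directory
Image.open: opens the image you want to recognize, given its file path
pytesseract.image_to_string: converts the text in the image into a string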
If you get a SyntaxError like this one:
image = Image.open("C:\Users\user\Desktop\Myimgtest\test_1.png")
                   ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
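Remember to put an r in front of the opening quote: r""
The r prefix tells Python that the string is a raw string, so backslashes are kept literally instead of being read as escape sequences (here, \U was being read as the start of a Unicode escape).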
image = Image.open(r"C:\Users\user\Desktop\Myimgtest\test_1.png")
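To see why the r prefix matters, here is a small sketch that compares a few ways of writing the same Windows path. It uses only the standard library; the variable names are made up for this illustration, and the path is the example path from above.

```python
from pathlib import PureWindowsPath

# Three spellings of the same example path
raw = r"C:\Users\user\Desktop\Myimgtest\test_1.png"           # raw string
escaped = "C:\\Users\\user\\Desktop\\Myimgtest\\test_1.png"   # doubled backslashes
forward = "C:/Users/user/Desktop/Myimgtest/test_1.png"        # forward slashes

# The raw string and the escaped string are the same text
print(raw == escaped)

# As Windows paths, the forward-slash spelling refers to the same location
print(PureWindowsPath(forward) == PureWindowsPath(raw))

# Without the r prefix, "\t" is read as a single tab character,
# so the non-raw string is one character shorter
print(len("\test_1.png"), len(r"\test_1.png"))
```

Any of the three spellings avoids the error; the raw string is simply the least typing when you paste a path copied from Explorer.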
Next, test with a different image.
Output:
This translation was prepared by Lloyd Kramer. Kramer graduated from the
University of California, Berkeley, with a major in Russian. He is also a graduate of the U.S.
Navy Foreign Language School in Boulder, Colorado. While a student at Berkeley he was
president of Dobro Slovo, the Slavic language honor society. As a naval officer during World
War H he served as both interpreter and translator in Russian for the U.S. Navy. After the
war, Kramer worked for a year as an analyst in Washington, DC. Subsequent to this
assignment, he joined the staff of the Hoover Institute and Library, Stanford University,
where he helped organize and catalog the Institute's large collection of Slavic language nonvbook materials.
Mr. Kramer now resides, with his Wife Martha, in Twain Harte, California
February 23, 2000