Welcome to Nekoto's Brain Bash!
Let's make hard concepts easier for newbies!
OCR is super useful. Why?
- Play a Discord text game with a program? Not an issue
- Need the text from an image where you can't select it? No problem
- Automatically write notes from a presentation to a text file? You got it!
Those are just a few examples; OCR has many uses when paired with a language you love writing in. Python, of course, is an easy-to-learn and popular programming language, and it makes creating our OCR module very easy.
We will not be writing the OCR engine itself, just a module that makes it easier to use in our own programs.
Why make a module?
OCR is really powerful, so I know that you're going to be using it a lot in your programs. However, rewriting the basic OCR functionality for each program is going to become really tiresome. I'm sure you just want a "read the screen from here to here" function that gives you the text on screen.
That's what we're going to create
Prerequisites
- Python 3
- You should know some basic Python programming
- You should know how to install external modules
Getting what we need
I'm going to be using Windows for this tutorial.
We're going to need the OCR engine itself, Tesseract, which is written in C++, but we won't be using C++ to interact with it. You can download it from the links below.
For Windows:
github.com/UB-Mannheim/tesseract/wiki
OR
digi.bib.uni-mannheim.de/tesseract
For Linux/Mac:
tesseract-ocr.github.io/tessdoc/Installatio..
Cool! Now all we need is a way to get Python to interact with it.
Luckily, there's a really cool module on PyPi that allows us to do this! We can install it using pip.
For Windows:
pip install pytesseract
One last module we need is "Pillow", although I think most people already have this. If not, use this command on Windows:
pip install Pillow
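If you want to make sure everything installed correctly, a quick sanity check like the one below should do it. It's just a sketch, and it assumes Tesseract was added to your PATH during installation; if it wasn't, don't worry, our module sets the path explicitly later on:
import pytesseract
from PIL import Image  # just to confirm Pillow imports fine

print(pytesseract.get_tesseract_version())  # should print something like 5.x.x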
Actually making our module
Fire up your favorite Python IDE or text editor and let's get to work!
First, we'll import the modules we need:
import numpy as nm # cv2 uses numpy ndarray so we need this
import pytesseract # used for OCR duh
import cv2 # used to make image into grayscale for easy reading by Tesseract
from PIL import ImageGrab # Used to take picture of screen for OCR
import os # Used to get Tesseract's path
Ok here's the last annoying thing in this post, I promise.
You'll need to get the installation path of Tesseract, which should be in your "localappdata" folder. You can check by opening File Explorer, typing in %localappdata%\Programs, and looking for a folder called Tesseract-OCR. If it's not there, find the installation folder and move it here.
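If you'd rather check from Python instead of File Explorer, a tiny throwaway script like this (just a sketch, assuming the default install location) will tell you whether the executable is where we expect it:
import os

expected = os.path.join(str(os.getenv('LOCALAPPDATA')), 'Programs', 'Tesseract-OCR', 'tesseract.exe')
print(expected, os.path.exists(expected))  # True means we're good to go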
Now, on with the code.
We need to tell our program where Tesseract is located, so let's put that into a variable:
path = str(os.getenv('LOCALAPPDATA')) + '/Programs/Tesseract-OCR/tesseract.exe'
Let's also make a function to change the path, in case our main program finds that Tesseract is installed elsewhere and needs to tell our module where it is:
def change_path(new_path):
    global path
    path = new_path
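For example, once the module is saved and imported (we get to that at the end of this post, where I save it as ocr_module.py), your main program could point it at a different install. The path below is just a hypothetical example:
ocr_module.change_path(r'C:\Program Files\Tesseract-OCR\tesseract.exe')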
Now for the most important function, the one we'll call to capture the screen and convert it to text.
I think it's better to show you this function first and then break it down together. So here it is:
def screen_to_text(arr):
    global path
    pytesseract.pytesseract.tesseract_cmd = path
    # ImageGrab captures the screen;
    # bbox limits the capture to a specific area
    cap = ImageGrab.grab(bbox=(arr[0], arr[1], arr[2], arr[3]))
    # Save the capture for debugging only
    # cap.save("grab.png")
    # Convert to grayscale so Tesseract can read it easily,
    # then run OCR on it to get a string of text back
    # Save the grayscale image for debugging purposes only
    # cv2.imwrite("processed.png", cv2.cvtColor(nm.array(cap), cv2.COLOR_RGB2GRAY))
    tesstr = pytesseract.image_to_string(
        cv2.cvtColor(nm.array(cap), cv2.COLOR_RGB2GRAY),  # ImageGrab gives RGB, so RGB2GRAY
        lang='eng')
    return tesstr
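By the way, if you want to try the function on its own before importing it anywhere, you could tack a small optional self-test onto the bottom of the file. This is just a sketch, and the box below assumes a 1080p screen:
if __name__ == '__main__':
    # Only runs when the file is executed directly, not when it's imported
    # Grab the whole (assumed 1080p) screen and print whatever Tesseract finds
    print(screen_to_text([0, 0, 1920, 1080]))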
Phew! That's a lot! But it's really not that complex.
- global path grabs the path variable we defined earlier.
- pytesseract.pytesseract.tesseract_cmd = path tells the pytesseract module where to find Tesseract.
- cap = ImageGrab.grab(bbox=(arr[0], arr[1], arr[2], arr[3])) takes a picture of our screen using the coordinates we passed to the function.
- cap.save("grab.png") is commented out, but you can uncomment it to see exactly what the program took a picture of.
- tesstr = pytesseract.image_to_string(...) takes the image passed to it and turns it into a string.
However, the issue is that Tesseract doesn't work very well with colored images, so we use OpenCV to convert the capture to a monochrome image before passing it to pytesseract. That's what cv2.cvtColor(nm.array(cap), cv2.COLOR_RGB2GRAY) does: it converts the image to grayscale.
The second argument, lang='eng', tells pytesseract what language we expect the image to be in.
Finally, we send the text back to the caller of the function using return tesstr.
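As an aside, Tesseract can read other languages too, as long as you installed their trained data with the Tesseract installer. Purely as a hypothetical example, if the German data were installed, the call inside screen_to_text would just change its lang argument:
tesstr = pytesseract.image_to_string(
    cv2.cvtColor(nm.array(cap), cv2.COLOR_RGB2GRAY),
    lang='deu')  # 'deu' is German; this only works if that trained data is installed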
And there you go! You just made an OCR module using Python that you can use in any of your other programs! Now simply save the file as ocr_module.py or whatever you like. Just stick to letters, numbers, and underscores, because a hyphenated name like ocr-module can't be imported.
Usage
Move the Python module we just made into the same folder as the Python script you are writing.
In your Python script, import the module
import ocr_module
Replace "ocr_module" with whatever you named the file, but don't include .py in the import statement.
Now let's say we wanted to read all the text on a 1080p screen. Here's what we do:
import ocr_module
text = ocr_module.screen_to_text([0, 0, 1920, 1080])
print(text)
The list/array we passed to the function defines what part of the screen we want it to read. The format is like this:
[
top_left_x_coord,
top_left_y_coord,
bottom_right_x_coord,
bottom_right_y_coord
]
So, when we pass [0, 0, 1920, 1080], it captures everything from coordinate (0, 0) to (1920, 1080), which is the entire 1080p screen.
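For instance, if you only wanted the top-left quarter of that same 1080p screen, you'd pass something like this (the coordinates are just an example):
text = ocr_module.screen_to_text([0, 0, 960, 540])  # top-left quarter of a 1080p display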
Hope you enjoyed this first post of Nekoto's Brain Bash! Hopefully you learned something new or maybe found something cool you could create with OCR!
Enjoy!