Make your own Python OCR module - read text from your screen!

Welcome to Nekoto's Brain Bash!

Let's make hard concepts easier for newbies!

OCR is super useful. Why?

  • Play a Discord text game with a program? Not an issue
  • Copied images that have text and can't be selected? No problem
  • Automatically write notes from a presentation to a text file? You got it!

These are just a few examples; OCR has many uses when paired with a language you love writing in. Python is popular and easy to learn, which makes it a great fit for building our OCR module.

We will not be writing the OCR engine itself, just a module that makes it easier to use in our own programs.

Why make a module?

OCR is really powerful, so I know that you're going to be using it a lot in your programs. However, rewriting the basic OCR functionality for each program is going to become really tiresome. I'm sure you just want a "read the screen from here to here" function that gives you the text on screen.

That's what we're going to create.

Prerequisites

  • Python 3
  • You should know some basic Python programming
  • You should know how to install external modules

Getting what we need

I'm going to be using Windows for this tutorial.

We're going to need the OCR engine itself, Tesseract, which is written in C++, but we're not going to be using C++ to interact with it. You can download it from the links below.

For Windows:

github.com/UB-Mannheim/tesseract/wiki

OR

digi.bib.uni-mannheim.de/tesseract

For Linux/Mac:

tesseract-ocr.github.io/tessdoc/Installatio..

Cool! Now all we need is a way to get Python to interact with it.

Luckily, there's a really cool module on PyPI called pytesseract that allows us to do this! We can install it using pip.

For Windows:

pip install pytesseract

We also need Pillow, NumPy and OpenCV, since the code below imports all three. Many people already have some of these installed. If not, then use this command on Windows:

pip install Pillow numpy opencv-python
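Before moving on, it can help to confirm that Python can actually see everything the module imports. The snippet below is a minimal sketch, not part of the module itself: missing_modules and REQUIRED are hypothetical names made up for this check. Note that the import names differ from the pip package names (Pillow is imported as PIL, opencv-python as cv2).

```python
import importlib.util

# Import names (not pip package names) used by the module in this post.
REQUIRED = ["pytesseract", "PIL", "cv2", "numpy"]


def missing_modules(names=REQUIRED):
    """Return the import names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported as missing, install it with pip and run the check again.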

Actually making our module

Fire up your favorite Python IDE or text editor and let's get to work!

First, we'll import the modules we need:

import numpy as nm # cv2 uses numpy ndarray so we need this
import pytesseract # used for OCR duh
import cv2 # used to make image into grayscale for easy reading by Tesseract
from PIL import ImageGrab # Used to take picture of screen for OCR
import os # Used to get Tesseract's path

Ok here's the last annoying thing in this post, I promise.

You'll need the installation path of Tesseract, which should be in your "localappdata" folder. You can check by opening File Explorer, typing %localappdata%\Programs into the address bar, and checking whether there is a folder called Tesseract-OCR. If not, find the installation folder and move it there.

Now, on with the code.

We need to tell our program where Tesseract is located, so let's put that into a variable:

path = str(os.getenv('LOCALAPPDATA')) + '/Programs/Tesseract-OCR/tesseract.exe'

Let's also make a function to change the path, so that our main program can tell our module where Tesseract is if it turns out to be installed elsewhere:

def change_path(new_path):
    global path
    path = new_path
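If you are not sure where Tesseract ended up, the main program could look for it before calling change_path. Here is a rough sketch of that idea. The helper name find_tesseract and its extra_candidates parameter are hypothetical; it simply checks the default location used in this post, then falls back to whatever "tesseract" resolves to on your PATH.

```python
import os
import shutil


def find_tesseract(extra_candidates=()):
    """Return the first Tesseract executable found, or None if there is none."""
    candidates = list(extra_candidates)

    # Default location assumed in this post (Windows only).
    local_appdata = os.getenv("LOCALAPPDATA")
    if local_appdata:
        candidates.append(
            os.path.join(local_appdata, "Programs", "Tesseract-OCR", "tesseract.exe")
        )

    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate

    # Fall back to the PATH lookup; shutil.which returns None if nothing is found.
    return shutil.which("tesseract")
```

A main program could then do something like found = find_tesseract() and, if found is not None, pass it to change_path(found).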

Now for the most important function, the one we will call to capture the screen and convert it to text.

I think it's better to show you this function first and then break it down together. So here it is:

def screen_to_text(arr):
    global path
    # Tell pytesseract where the Tesseract executable lives
    pytesseract.pytesseract.tesseract_cmd = path

    # Use ImageGrab to capture the screen;
    # bbox limits the capture to a specific area
    cap = ImageGrab.grab(bbox=(arr[0], arr[1], arr[2], arr[3]))

    # Save the raw capture (for debugging only)
    # cap.save("grab.png")

    # Convert to grayscale so Tesseract can read it more easily.
    # Pillow images are in RGB order, so we use COLOR_RGB2GRAY.
    gray = cv2.cvtColor(nm.array(cap), cv2.COLOR_RGB2GRAY)

    # Save the grayscale image (for debugging only)
    # cv2.imwrite("processed.png", gray)

    # Read the image with OCR and build a string from it
    tesstr = pytesseract.image_to_string(gray, lang='eng')
    return tesstr

Phew! That's a lot! But it's really not that complex.

  • global path lets the function use the path variable we defined earlier.
  • pytesseract.pytesseract.tesseract_cmd = path tells the pytesseract module where to find Tesseract.
  • cap = ImageGrab.grab(bbox=(arr[0], arr[1], arr[2], arr[3])) takes an image of our screen using the coordinates we sent to the function.
  • cap.save("grab.png") is commented out, but you can uncomment it to see exactly what the program took a photo of.
  • tesstr = pytesseract.image_to_string(...) takes the image passed to it and turns it into a string.

However, the issue is that Tesseract doesn't work very well with colored images. So we use OpenCV to convert the capture to a grayscale image and then pass that to pytesseract. That's what gray = cv2.cvtColor(nm.array(cap), cv2.COLOR_RGB2GRAY) does. Pillow gives us the pixels in RGB order, which is why we use COLOR_RGB2GRAY rather than COLOR_BGR2GRAY.
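If you're curious what the grayscale conversion actually computes, here is a tiny pure-Python sketch for a single pixel. It uses the weighted sum that OpenCV documents for its RGB-to-gray conversion; the function name rgb_to_gray is made up for this illustration, and the real conversion is done by OpenCV on the whole image at once.

```python
def rgb_to_gray(r, g, b):
    """Weighted sum of the three channels, rounded to a whole gray level."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(rgb_to_gray(255, 0, 0))  # pure red
print(rgb_to_gray(0, 0, 255))  # pure blue
```

Red and blue contribute very differently to the result, so the channel order of the input matters when converting.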

The second argument, lang='eng', tells pytesseract what language we expect the text in the image to be in.

Finally, we send the text back to the caller of the function using return tesstr.
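The string that comes back may contain blank lines, stray spaces, or a trailing form-feed character, depending on your Tesseract version and what was on screen. If that gets in the way, a small clean-up step can help. This is only a sketch, and clean_ocr_text is a hypothetical helper name, not something pytesseract provides.

```python
def clean_ocr_text(raw):
    """Strip whitespace from each line and drop lines that end up empty."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# Example with a made-up string standing in for real OCR output
print(clean_ocr_text("  Hello \n\n World\n\x0c"))
```

You could call it on the result of screen_to_text in your main program, or leave the raw text alone if you need the original spacing.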

And there you go! You just made an OCR module using Python that you can use in any of your other programs! Now simply save the file as ocr_module.py or whatever you like, as long as the name is a valid Python identifier (letters, digits and underscores only; a name with a hyphen such as ocr-module cannot be imported with a plain import statement).

Usage

Move the Python module we just made into the same folder as the Python script you are writing.

In your Python script, import the module:

import ocr_module

Replace "ocr_module" with whatever you named the file, but don't include .py in the import statement.

Now let's say we wanted to read all the text on a 1080p screen. Here's what we do:

import ocr_module

text = ocr_module.screen_to_text([0, 0, 1920, 1080])

print(text)

The list/array we passed to the function defines what part of the screen we want it to read. The format is like this:

[
    top_left_x_coord,
    top_left_y_coord,
    bottom_right_x_coord,
    bottom_right_y_coord
]

So, when we pass [0, 0, 1920, 1080], it captures the region from the coordinates (0, 0) to (1920, 1080), which is the entire 1080p screen.
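If you find it easier to think in terms of "start here, this wide, this tall", you could convert that into the four-number format with a small helper. This is just a sketch; region_to_bbox is a hypothetical name and not part of the module above.

```python
def region_to_bbox(x, y, width, height):
    """Convert a top-left corner plus a size into [left, top, right, bottom]."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [x, y, x + width, y + height]


print(region_to_bbox(0, 0, 1920, 1080))   # [0, 0, 1920, 1080]
print(region_to_bbox(100, 200, 300, 50))  # [100, 200, 400, 250]
```

The returned list can be passed straight to screen_to_text.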

Hope you enjoyed this first post of Nekoto's Brain Bash! Hopefully you learned something new or maybe found something cool you could create with OCR!

Enjoy!