How to verify if the image contains noise in background before ‘OCR’ing

Time:01-22

I have several types of images that I need to extract text from. I can manually classify the images into 3 categories based on the noise on the background:

  1. Images with no noise.
  2. Images with some light noise in the background.
  3. Images with heavy noise in the background.

For category 1 images, OCR works fine without any preprocessing. This is the basic case.

For category 2 images and some of the category 3 images, I managed to extract the text by applying the following steps:

  • Grayscale, Gaussian blur, Otsu’s threshold
  • Morph open to remove noise and invert the image → then perform text extraction.

For the OCR task, a single noise-removal method obviously does not work for all images. So, is there a method for classifying the level of background noise in an image?

All suggestions are welcome. Thanks in advance.

Update (2022 Jan 21): With the answer that I got from @B200011011, I can now separate Category 1 images from Category 2 and 3 images. Examples:

Category 2: [image]

Category 3: [image]

Here is the code which I am using:

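import math

import cv2
import numpy as np
from google.colab.patches import cv2_imshow  # assumes Google Colab (the "/content" path suggests it)
from imutils import paths
from skimage import exposure

for imagePath in paths.list_images("/content"):
    txt_block_img = cv2.imread(imagePath)
    img = cv2.imread(imagePath, cv2.IMREAD_GRAYSCALE)
    cv2_imshow(img)

    img_pixel_count = img.shape[0] * img.shape[1]
    # For integer images, exposure.histogram should return one bin per gray
    # level between the image minimum and maximum, so 256 bins means that
    # both pure black (0) and pure white (255) occur in the image.
    h = np.array(exposure.histogram(img, nbins=256))

    if len(h[0]) == 256:
        # Count pixels that are exactly black or exactly white
        bw_count = h[0][0] + h[0][255]
        other_count = img_pixel_count - bw_count

        bw_percentage = (bw_count * 100.0) / img_pixel_count
        other_percentage = (other_count * 100.0) / img_pixel_count

        # print('BW PIXEL PERCENTAGE: ',  bw_percentage)
        print('OTHER PIXEL PERCENTAGE: ',  math.ceil(other_percentage))

        # Images dominated by intermediate gray levels are treated as noisy
        differentiate_threshold = 30.0
        if other_percentage > differentiate_threshold:
            print('TYPE 2 or TYPE 3')
        else:
            print('TYPE 1: BLACK AND WHITE')
    else:
        # Fewer than 256 bins: pure black and/or pure white is missing
        print("the image has no black color")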
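
The histogram check above could also be packaged as a small function, which makes it easier to try different thresholds. This is a sketch under my own assumptions: the names `other_pixel_percentage` and `is_type_1` are hypothetical, the input is assumed to be an 8-bit grayscale array, and `np.bincount` is swapped in for `exposure.histogram` so that there are always exactly 256 bins and no length check is needed.

```python
import numpy as np


def other_pixel_percentage(gray):
    """Percentage of pixels that are neither pure black (0) nor pure white (255)."""
    if gray.ndim != 2 or gray.dtype != np.uint8:
        raise ValueError("expected a 2-D uint8 grayscale image")
    # minlength=256 guarantees one bin per gray level, even for levels
    # that do not occur in the image.
    hist = np.bincount(gray.ravel(), minlength=256)
    bw_count = int(hist[0]) + int(hist[255])
    return 100.0 * (gray.size - bw_count) / gray.size


def is_type_1(gray, threshold=30.0):
    """True when the image is (almost) purely black and white."""
    return other_pixel_percentage(gray) <= threshold


if __name__ == "__main__":
    # Synthetic examples: a clean two-tone image and a mostly gray one.
    clean = np.full((10, 10), 255, dtype=np.uint8)
    clean[:5] = 0
    grayish = clean.copy()
    grayish[:, :8] = 128
    print(is_type_1(clean), is_type_1(grayish))
```

The default of 30.0 is the same `differentiate_threshold` used in the loop above. Unlike that loop, this version also classifies images that contain no pure black or pure white pixels at all.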

I got the following results:

[image]

However, this solution is not complete, since I cannot find a good threshold to separate Type 2 and Type 3 images. So, are there any image processing methods that I could try to separate Type 2 and Type 3 images?

User response:

Following up on your comment from [image] Image from, [image] Image from, [image]
