I'm trying to read mostly numbers and some letters from the screen. There are plenty of topics online, but 99% of them are dated, and all of them ask how to get text from an image (which I don't need). In my case, I simply want a transparent PictureBox that scans whatever text is underneath it. I found a fairly old YouTube tutorial,
and the code given in its description is:
Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure
Public Class Form1
    Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)
    Dim pic As Bitmap = New Bitmap(270, 100)
    Dim gfx As Graphics = Graphics.FromImage(pic)
    Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick
        'If Windows XP
        gfx.CopyFromScreen(New Point(Me.Location.X + PictureBox1.Location.X + 4, Me.Location.Y + PictureBox1.Location.Y + 30), New Point(0, 0), pic.Size)
        PictureBox1.Image = pic
        'If Windows 7
        'gfx.CopyFromScreen(MousePosition, New Point(0, 0), pic.Size)
    End Sub
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
        RichTextBox1.Text = OCRz.GetText
    End Sub
End Class
This code is 8 years old, so I think there must be something better. I installed Emgu.CV from NuGet (the first and most downloaded package), but at runtime I get the error "Unable to create ocr model using Path 'tessdata' and language 'eng'." and at compile time I get the error "'Tesseract' is not defined." This is really frustrating because I've looked everywhere, including C# forums, and found nothing that helps. Do you have any solution? I would appreciate your help. Thanks.
CodePudding user response:
That code is quite old, and I doubt it would still work properly. Have you considered Windows.Media.Ocr? It requires the Windows 10 SDK.
Controls used:
- 1 transparent PictureBox (set its BackColor to the same color as the form's TransparencyKey; I'm using grey)
- 1 RichTextBox
- 1 Button
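The transparency setup for the PictureBox can also be done in code instead of the designer. This is only a minimal sketch: the names OverlayForm and scanArea are hypothetical, the 270 x 100 size is borrowed from the bitmap in the question, and Color.Gray stands in for "grey".

```vb
Imports System.Drawing
Imports System.Windows.Forms

' Hypothetical form used only to illustrate the transparency setup.
Public Class OverlayForm
    Inherits Form

    ' Hypothetical name for the transparent PictureBox from the list above.
    Private ReadOnly scanArea As New PictureBox()

    Public Sub New()
        ' Assumption: Color.Gray is the "grey" key color.
        Dim keyColor As Color = Color.Gray

        ' Every pixel of the form painted in the key color becomes see-through.
        Me.TransparencyKey = keyColor

        ' Painting the PictureBox in the key color makes its whole area transparent.
        scanArea.BackColor = keyColor
        scanArea.Bounds = New Rectangle(20, 20, 270, 100)
        Me.Controls.Add(scanArea)
    End Sub
End Class
```

Any other control that uses exactly the same color will become see-through as well, so it helps to pick a key color that nothing else on the form uses.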
Add as a reference:
"C:\Program Files (x86)\Windows Kits\10\UnionMetadata\Windows.winmd"
"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETCore\v4.5\System.Runtime.WindowsRuntime.dll"
Imports Windows.Media.Ocr
Imports System.IO
Imports System.Runtime.InteropServices.WindowsRuntime
Public Class Form1
    Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim softwareBmp As Windows.Graphics.Imaging.SoftwareBitmap
        'Capture the screen area that lies under the PictureBox
        Using bmp As Bitmap = New Bitmap(PictureBox1.Width, PictureBox1.Height)
            Using g As Graphics = Graphics.FromImage(bmp)
                Dim pt As Point = Me.PointToScreen(New Point(PictureBox1.Left, PictureBox1.Top))
                g.CopyFromScreen(pt.X, pt.Y, 0, 0, bmp.Size, CopyPixelOperation.SourceCopy)
                'Convert the GDI+ bitmap to a SoftwareBitmap through an in-memory stream
                Using memStream = New Windows.Storage.Streams.InMemoryRandomAccessStream()
                    bmp.Save(memStream.AsStream(), System.Drawing.Imaging.ImageFormat.Bmp)
                    Dim decoder As Windows.Graphics.Imaging.BitmapDecoder = Await Windows.Graphics.Imaging.BitmapDecoder.CreateAsync(memStream)
                    softwareBmp = Await decoder.GetSoftwareBitmapAsync()
                End Using
            End Using
        End Using
        Dim ocrEng = OcrEngine.TryCreateFromUserProfileLanguages()
        'To recognize one specific language only, create the engine for that language instead
        'Dim ocrEng = OcrEngine.TryCreateFromLanguage(New Windows.Globalization.Language("en-US"))
        'Both TryCreate methods return Nothing when no matching OCR language is installed
        If ocrEng Is Nothing Then
            RichTextBox1.Text = "No OCR language is available."
            Return
        End If
        'List the installed recognizer languages (Shared property, so it is accessed through the type)
        Dim languages As IReadOnlyList(Of Windows.Globalization.Language) = OcrEngine.AvailableRecognizerLanguages
        For Each language In languages
            Console.WriteLine(language.LanguageTag)
        Next
        'Diagnostics only: the language in use and the largest image dimension the engine accepts
        Dim r = ocrEng.RecognizerLanguage
        Dim n = OcrEngine.MaxImageDimension
        Dim ocrResult = Await ocrEng.RecognizeAsync(softwareBmp)
        RichTextBox1.Text = ocrResult.Text
        'The following lines only show how the OCR engine split the text into lines
        Dim lines As IReadOnlyList(Of OcrLine) = ocrResult.Lines
        For Each line In lines
            Console.WriteLine(line.Text)
        Next
    End Sub
End Class
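Since you mostly want numbers and a few letters, one option is to filter the recognized text before showing it. This is a rough sketch, not something the OCR engine provides: OcrTextFilter and KeepDigitsAndLetters are hypothetical names, and the choice to keep only ASCII digits, ASCII letters, spaces and line breaks is an assumption you may need to adjust.

```vb
Imports System.Text.RegularExpressions

' Hypothetical helper module; not part of Windows.Media.Ocr.
Public Module OcrTextFilter
    ' Removes every character that is not an ASCII digit, an ASCII letter,
    ' a space, or a line break (CR/LF).
    Public Function KeepDigitsAndLetters(text As String) As String
        If String.IsNullOrEmpty(text) Then Return String.Empty
        Return Regex.Replace(text, "[^0-9A-Za-z \r\n]", String.Empty)
    End Function
End Module
```

In the button handler, this would be used as RichTextBox1.Text = OcrTextFilter.KeepDigitsAndLetters(ocrResult.Text). Filtering after recognition only hides unwanted characters; it does not make the recognition itself more accurate.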
