Does anyone know how one might go about calculating the x,y coordinates of, say, a button or an image on a monitor by analysis of a picture of that monitor?
My laptop's webcam faces the work computer's monitors, so that a photograph through the webcam will capture the database entry as well as the phone system. I need to compute the coordinates of the elements on those monitors from the picture of the monitors so that I can send the coordinates to a microprocessor, which in turn would programmatically control keyboard and mouse.
Does anyone know how to do this in Python?
CodePudding user response:
This type of problem is called "object detection", and it is frequently solved by training an ML model to draw a bounding box around the objects you're interested in. That training usually involves feeding the model example images in which you have drawn the bounding boxes manually, plus some negative examples where the monitor and/or the button are not present.
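In your case, you would want to detect the monitor and then detect the button you're looking for. By subtracting the coordinates of the monitor's top-left corner from the button's coordinates, and then scaling by the ratio of the screen's resolution to the monitor's size in the photo, you would get a very approximate x,y location of the button on the screen.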
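As a rough sketch of that subtraction step, assuming you already have bounding boxes for the monitor and the button from whatever detector you use, and assuming the webcam sees the monitor roughly head-on (an angled view would need a perspective correction first). The names `Box` and `photo_to_screen` are made up for illustration:

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box in photo pixels (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float


def photo_to_screen(monitor: Box, button: Box,
                    screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Map the centre of `button` from photo pixels to screen pixels.

    Assumes the monitor appears as an upright rectangle in the photo,
    i.e. no rotation or perspective distortion.
    """
    # Centre of the button in photo pixels.
    cx = (button.left + button.right) / 2
    cy = (button.top + button.bottom) / 2

    # Offset from the monitor's top-left corner, expressed as a
    # fraction of the monitor's width and height in the photo.
    fx = (cx - monitor.left) / (monitor.right - monitor.left)
    fy = (cy - monitor.top) / (monitor.bottom - monitor.top)

    # Scale that fraction up to the real screen resolution.
    return round(fx * screen_w), round(fy * screen_h)


if __name__ == "__main__":
    monitor = Box(left=100, top=50, right=900, bottom=500)
    button = Box(left=480, top=260, right=520, bottom=290)
    print(photo_to_screen(monitor, button, 1920, 1080))
```

The result is the button's centre in screen pixels, which is what you would send on to the device driving the mouse. Any error in either bounding box carries straight through to the click position.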
That said, this approach is likely to be very brittle and error-prone. If it were possible to get the video output being sent to the monitor directly (using a splitter, perhaps), that would give you much better data to work with.
