I'm trying to write a web scraper with Selenium in Python for a website used to look up medical school statistics; https://mec.aamc.org/msar-ui/#/medSchoolDetails/102 is an example page. I've been able to scrape most of the data, but some of it, such as the matriculant demographics (which you should be able to view without a subscription), is in Highcharts bar graphs. This is proving very difficult, as I had only scraped data from static websites before.
I initially tried looking up the text that hovers over each bar by CSS selector, but a couple of the characters at the beginning of the selector change every time I access the site, so I can't do it that way. I tried looking up ways to search for an element by CSS selector with wildcards in place of those characters, but everything I found had explanations that were way too high-level for me to understand. I also tried searching for how to scrape data from Highcharts in general, but again I could not understand what I read.
Any help you guys could give (or an explanation if it's not possible) would be greatly appreciated. Thanks!
CodePudding user response:
The "easiest" way seems to be as follows.
Start with this element:
(//*[@class='highcharts-plot-background'])[1]
It has an attribute named height, which is 310 here. That height seems to correspond to the full Y axis, 0-100, so 310 represents 100.
Then the bars, which are a bit more complicated: I cannot find any identifier for them except the colour class, and that is not unique.
Under the header Matriculant Demographics there is a chart with 2 blue bars.
So you are looking for something like this:
(//*[@class='highcharts-plot-background'])[1]/..//*[@class='highcharts-point highcharts-color-0 ']
This matches 2 elements: the 2 blue bars within the first chart. Take them one at a time, work out which bar is which, and read the height attribute from each.
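The lookups described above could be sketched roughly like this. It is only a sketch under some assumptions: `read_heights` is a name made up for illustration, the XPath strings are copied verbatim from this answer (including the trailing space in the class value), and `driver` is assumed to be a Selenium WebDriver that has already loaded the page and finished rendering the chart. The locator is passed as the plain string `"xpath"`, which is the value of Selenium's `By.XPATH`.

```python
# The string "xpath" is the value of selenium.webdriver.common.by.By.XPATH,
# so this sketch does not need to import selenium itself.
XPATH = "xpath"

# XPath expressions copied from the answer; the trailing space in the class
# value is kept exactly as written there.
PLOT_BACKGROUND = "(//*[@class='highcharts-plot-background'])[1]"
BLUE_BARS = PLOT_BACKGROUND + "/..//*[@class='highcharts-point highcharts-color-0 ']"


def read_heights(driver):
    """Return (plot_height, [bar_heights]) for the first chart on the page.

    `driver` is assumed to be a Selenium WebDriver with the page loaded.
    """
    plot = driver.find_element(XPATH, PLOT_BACKGROUND)
    bars = driver.find_elements(XPATH, BLUE_BARS)
    plot_height = float(plot.get_attribute("height"))
    bar_heights = [float(bar.get_attribute("height")) for bar in bars]
    return plot_height, bar_heights


# Possible usage with a real browser (needs selenium and a browser driver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://mec.aamc.org/msar-ui/#/medSchoolDetails/102")
# plot_height, bar_heights = read_heights(driver)
```

Since the chart is drawn by JavaScript, you will probably need to wait for it to appear before calling this, otherwise the element lookups may fail.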
Then you can easily calculate each value by dividing the bar's height by the plot background's height. In this example, 186 divided by 310 gives 0.6, so the bar's value is 60.
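As a small worked version of that arithmetic (the function name `bar_value` and the `axis_max` parameter are made up for illustration, and it assumes the Y axis is linear and starts at 0):

```python
def bar_value(bar_height, plot_height, axis_max=100.0):
    """Convert a bar's height into a Y-axis value.

    Assumes a linear axis running from 0 to axis_max, with plot_height
    being the height of the whole plot background.
    """
    if plot_height <= 0:
        raise ValueError("plot_height must be positive")
    return bar_height * axis_max / plot_height


# The example from above: a bar of height 186 in a plot of height 310.
print(bar_value(186, 310))
```

If the axis of a particular chart does not run from 0 to 100, the result will be off, so it is worth checking one bar against the value shown in its tooltip.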
Hope it helps! I got it working this way :)
