I am working on a statistical analysis of some Python libraries whose source code is available on GitHub. Is there a way to find out how many times a given library has been used in other applications? The GitHub Insights page provides information for one month only, which is not enough in my case to compare the popularity of the libraries.
Thanks in advance.
CodePudding user response:
Yes, there is. I have recently performed research on this topic. First and foremost, I would recommend 
This plot can be interacted with to see the number of times each section was used, including the root itself.
To answer your question concisely, you can use the following:
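from module_dependencies import Module
# Replace with the import name of the library you want to measure
mod_name = "mymodule"
module = Module(mod_name, count="all")
# The entry for the root module holds its total number of occurrences
print(f"{mod_name} was used {module.nested_usage()[mod_name]['occurrences']} times")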
This provides a clear, verifiable number of uses in real projects hosted on GitHub (or GitLab). module_dependencies also extracts the links to the repositories and files that use your module of interest, and tracks how many stars each of those repositories has, in case that is interesting for your analysis.
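Since the question is about comparing several libraries, the same call can be wrapped in a small loop. The following is only a sketch: `rank_libraries` and `count_with_module_dependencies` are hypothetical helper names, and the only module_dependencies calls used are the ones from the snippet above. The counter is passed in as a parameter so the ranking logic can be tried without network access.

```python
from typing import Callable, Iterable, List, Tuple


def count_with_module_dependencies(name: str) -> int:
    """Return the occurrence count for `name`, using the call shown above."""
    # Imported lazily so the ranking helper works even if the package is missing.
    from module_dependencies import Module

    module = Module(name, count="all")
    return module.nested_usage()[name]["occurrences"]


def rank_libraries(
    names: Iterable[str],
    counter: Callable[[str], int] = count_with_module_dependencies,
) -> List[Tuple[str, int]]:
    """Count each library once and sort from most used to least used."""
    counts = {name: counter(name) for name in names}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder names: substitute the libraries you want to compare.
    for name, occurrences in rank_libraries(["mymodule", "othermodule"]):
        print(f"{name}: {occurrences} occurrences")
```

Each call may take a while because it queries real repositories, so it is probably worth saving the counts to disk rather than recomputing them for every run of the analysis.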
See https://tomaarsen.github.io/module_dependencies/ for the documentation of module_dependencies. Once again: I am the author of this module.
CodePudding user response:
You can use githunt.
However, you will have to write some code to extract the information from the HTML page, for example with Python's Beautiful Soup library.
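A rough sketch of that extraction step is shown below. To keep it dependency-free it uses the standard library's `html.parser` in place of Beautiful Soup, and the markup in `SAMPLE_HTML` (the `repo-name` and `stars` class names included) is entirely made up for illustration. The real page will have a different structure, so the tag and class checks would need to be adapted after inspecting it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical markup, NOT the real page layout.
SAMPLE_HTML = """
<div class="repo"><a class="repo-name" href="/octo/alpha">octo/alpha</a><span class="stars">1,204</span></div>
<div class="repo"><a class="repo-name" href="/octo/beta">octo/beta</a><span class="stars">87</span></div>
"""


class RepoStarParser(HTMLParser):
    """Collect (repository name, star count) pairs from the assumed markup."""

    def __init__(self) -> None:
        super().__init__()
        self.repos: List[Tuple[str, int]] = []
        self._field: Optional[str] = None  # which element we are currently inside
        self._name: Optional[str] = None   # last repository name seen

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "a" and css_class == "repo-name":
            self._field = "name"
        elif tag == "span" and css_class == "stars":
            self._field = "stars"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._field == "stars" and self._name is not None:
            # Star counts are assumed to be written like "1,204".
            self.repos.append((self._name, int(text.replace(",", ""))))
            self._name = None


def parse_repo_stars(html: str) -> List[Tuple[str, int]]:
    parser = RepoStarParser()
    parser.feed(html)
    parser.close()
    return parser.repos


if __name__ == "__main__":
    for name, stars in parse_repo_stars(SAMPLE_HTML):
        print(name, stars)
```

With Beautiful Soup the same idea would be expressed through selectors instead of a hand-written state machine, which is usually shorter once the real class names are known.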
There is also a Kaggle dataset, but it is not kept up to date and is limited to a specific domain.
