minet
made by the médialab
Webmining library and command line tool written in Python
Tools – Code
Guillaume Plique, Jules Farjas
Minet is a Python library and command line tool that helps its users perform a variety of typical webmining tasks.
Minet can for instance be used to:
- Download large numbers of URLs very quickly
- Scrape using a custom DSL
- Crawl using a custom DSL
- Extract content from HTML pages
- Transform and parse batches of URLs
- Collect data through APIs such as Crowdtangle or Media Cloud
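To give a rough idea of what "parsing a batch of URLs" can mean in practice, here is a minimal sketch that relies only on the Python standard library. It does not use minet's own API; the function name `parse_urls` and the output fields are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def parse_urls(urls):
    """Split each URL of a batch into a few components (hypothetical helper, stdlib only)."""
    rows = []
    for url in urls:
        cleaned = url.strip()
        parts = urlsplit(cleaned)
        rows.append({
            "url": cleaned,
            "scheme": parts.scheme,
            # `hostname` is already lowercased by urlsplit; it is None when absent
            "domain": parts.hostname or "",
            # Treat an empty path as the site root
            "path": parts.path or "/",
        })
    return rows


if __name__ == "__main__":
    for row in parse_urls(["https://Example.org/a/b?x=1 ", "http://example.com"]):
        print(row)
```

A dedicated tool goes much further than this (normalization, deduplication, large-scale batching), but the overall shape, a list of URLs in and a table of parsed fields out, is the same.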
Minet is the result of the lab's long-standing experience in webmining and is now used daily in many projects that rely on web data collection.
harvesting and processing