
minet, made by the médialab

Webmining library and command-line tool written in Python

Tools – Code

Guillaume Plique, Jules Farjas

Minet is a Python library and command-line tool that helps its users perform typical webmining tasks.

Downloading URLs from the shell using minet

Minet can for instance be used to:

  • Download large numbers of URLs very quickly
  • Scrape using a custom DSL
  • Crawl using a custom DSL
  • Extract content from HTML pages
  • Transform and parse batches of URLs
  • Collect data through APIs such as CrowdTangle or Media Cloud
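
To give a rough idea of what parsing a batch of URLs involves, here is a minimal sketch that uses only the Python standard library. It does not use minet's own API; the function name `parse_urls`, the output fields, and the sample URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def parse_urls(urls):
    """Split each URL into a few components, skipping blank entries.

    Hypothetical helper for illustration only; not part of minet.
    """
    rows = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # ignore empty lines in the batch
        parts = urlsplit(url)
        rows.append({
            "url": url,
            "scheme": parts.scheme,
            "hostname": parts.hostname or "",
            "path": parts.path,
        })
    return rows


# Sample batch (placeholder URLs), including stray whitespace and a blank entry.
batch = [
    "https://example.org/tools/",
    "  http://example.com/a/b?x=1  ",
    "",
]

for row in parse_urls(batch):
    print(row["hostname"], row["path"])
```

In practice, a batch like this would usually come from a column of a CSV file rather than from a hard-coded list.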

Minet is the result of the lab's long-standing experience in webmining and is now used daily in many projects that rely on web data collection.

harvesting and processing

developers

usable

2019