A research-driven web-crawler aimed at building, curating and categorizing a corpus of web actors and the network graph of hyperlinks connecting them

Mathieu Jacomy, Benjamin Ooghe-Tabanou, Paul Girard

Hyphe is an open-source web-crawler that allows researchers to build corpora of hyperlinked webpages about a specific topic (for instance, palm oil or coronavirus).

Researchers select these webpages and can group them into "webentities". A webentity can be a single page, a whole website, a subdomain or a section of a site, or even a combination of these. Each webentity represents an actor involved in the issue at hand (for instance, a person or an organization).

By crawling these webentities, Hyphe iteratively builds, and helps visualize, a network graph of the relationships between the actors, based on the hyperlinks connecting the webentities.
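To make the page-to-actor aggregation concrete, here is a minimal sketch, not Hyphe's actual implementation. It assumes that each webentity is defined by a list of plain URL prefixes, and that a page belongs to the webentity with the longest matching prefix. Hyphe's own prefix handling may differ. The names `WEBENTITIES`, `entity_of` and `entity_graph`, as well as the example URLs, are hypothetical. Page-level hyperlinks are collapsed into weighted, directed edges between webentities, and links internal to a single webentity are dropped.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical corpus: each webentity is defined by one or more URL prefixes
# (a single page, a whole site, a subdomain, a section, or a combination).
WEBENTITIES = {
    "Example NGO": ["https://ngo.example.org/"],
    "Example Blog Post": ["https://blog.example.com/2020/palm-oil"],
    "Example Company": ["https://www.company.example/", "https://shop.company.example/"],
}


def entity_of(url: str) -> Optional[str]:
    """Return the webentity whose longest prefix matches the URL, or None."""
    best_name, best_len = None, -1
    for name, prefixes in WEBENTITIES.items():
        for prefix in prefixes:
            if url.startswith(prefix) and len(prefix) > best_len:
                best_name, best_len = name, len(prefix)
    return best_name


def entity_graph(page_links: Iterable[Tuple[str, str]]) -> Counter:
    """Aggregate page-to-page hyperlinks into weighted webentity-to-webentity edges."""
    edges: Counter = Counter()
    for source_url, target_url in page_links:
        source, target = entity_of(source_url), entity_of(target_url)
        # Keep only links between two distinct webentities of the corpus.
        if source and target and source != target:
            edges[(source, target)] += 1
    return edges


links = [
    ("https://ngo.example.org/report", "https://www.company.example/about"),
    ("https://ngo.example.org/news", "https://shop.company.example/product"),
    ("https://blog.example.com/2020/palm-oil", "https://ngo.example.org/"),
    ("https://ngo.example.org/a", "https://ngo.example.org/b"),  # internal link, dropped
]
print(entity_graph(links))
```

Both links from the NGO pages reach the same webentity "Example Company", even though they target two different hosts. This illustrates why grouping pages into actors yields a more readable graph than a raw page-level link graph.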

New webentities are automatically suggested once they are discovered through the hyperlinks of each crawled webentity. Researchers then review these suggestions in an iterative, qualitative process.
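The suggestion step can be illustrated with the sketch below. It is a hypothetical illustration, not Hyphe's actual suggestion logic. It assumes that candidates are URLs linked from corpus webentities but matching none of them, that candidates are grouped by host name, and that they are ranked by how many distinct corpus webentities link to them. The names `CORPUS`, `in_corpus` and `suggest_candidates`, as well as the example URLs, are hypothetical. The ranked list is only an input to the researchers' review: accepting or rejecting a candidate remains a manual decision.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple
from urllib.parse import urlsplit

# Hypothetical corpus: webentity name -> URL prefixes (a simplification).
CORPUS = {
    "Example NGO": ["https://ngo.example.org/"],
    "Example Company": ["https://www.company.example/"],
}


def in_corpus(url: str) -> Optional[str]:
    """Return the corpus webentity whose prefix matches the URL, or None."""
    for name, prefixes in CORPUS.items():
        if any(url.startswith(prefix) for prefix in prefixes):
            return name
    return None


def suggest_candidates(page_links: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank hosts linked from the corpus but not yet part of it.

    Assumption (hypothetical): candidates are grouped by host name and ranked
    by the number of distinct corpus webentities linking to them.
    """
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for source_url, target_url in page_links:
        source = in_corpus(source_url)
        # Only outgoing links from the corpus to pages outside it are relevant.
        if source is None or in_corpus(target_url) is not None:
            continue
        host = urlsplit(target_url).hostname
        if host:
            cited_by[host].add(source)
    # Most-cited first; ties broken alphabetically for a stable review order.
    ranked = sorted(cited_by.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(host, len(sources)) for host, sources in ranked]


links = [
    ("https://ngo.example.org/report", "https://news.example.net/article"),
    ("https://www.company.example/press", "https://news.example.net/other"),
    ("https://ngo.example.org/links", "https://forum.example.io/thread"),
]
for host, count in suggest_candidates(links):
    print(f"{host}: linked from {count} corpus webentity(ies)")
```

Counting distinct citing webentities, rather than raw links, keeps a single heavily-linking site from dominating the suggestions. This is one plausible design choice among several.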

Because researchers manually choose which actors to add to their corpus and then tag them, Hyphe should be considered a quali-quantitative tool.

