
artoo.js, made by the médialab

A bookmarklet that injects JavaScript code into a web page to provide scraping utilities

Tools – Code

Guillaume Plique

artoo.js is a bookmarklet that injects JavaScript scraping utilities into any web page.

artoo has been injected into the web page!

This tool stems from the following observation: web technologies have grown more and more complex since the beginning of the Internet. As a consequence, websites have become harder and harder to scrape, especially when they rely heavily on JavaScript to function. Scraper authors have therefore come up with plenty of tricks to "emulate" how modern web browsers work. But if you only want to retrieve small quantities of data from the web, why bother emulating a browser at all when you can parasitize the browser itself? In a way, artoo does exactly this by injecting its code into the target web pages.
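To make the idea of injection concrete, here is a minimal sketch of the general mechanism a bookmarklet can rely on: adding a `<script>` element to the current page. It is not artoo's actual code. The `injectScript` helper and the `SCRIPT_URL` address are hypothetical names introduced for illustration only.

```javascript
// Minimal sketch of script injection (an assumption about the general
// mechanism, not artoo's own implementation).
// SCRIPT_URL is a placeholder, not the real address of artoo.
const SCRIPT_URL = 'https://example.org/scraper.js';

// `doc` is any DOM-like document; in a browser, pass `document`.
function injectScript(doc, url, onLoad) {
  const script = doc.createElement('script');
  script.src = url;
  script.onload = onLoad; // called by the browser once the script has loaded
  doc.body.appendChild(script); // the browser fetches and runs the script
  return script;
}

// In a browser console, one could try:
// injectScript(document, SCRIPT_URL, () => console.log('injected'));
```

Once the script runs inside the page, it sees the same DOM as the page's own code, which is why no browser emulation is needed.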

Thus, artoo enables its users to scrape more easily, to force the browser to download the results of their extractions, to automatically unroll infinite lists, to spawn Ajax spiders, to watch requests fired by JavaScript, etc.

Finally, thanks to its bookmarklet generator, one can easily create bookmarklets injecting one's own code alongside artoo so that anyone may use it without requiring any programming skills.
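As an illustration of what a bookmarklet is under the hood, the sketch below turns a snippet of code into a `javascript:` URL that can be saved as a bookmark. This is not artoo's generator, only the generic technique; `toBookmarklet` is a hypothetical helper written for this example.

```javascript
// Hypothetical helper (not part of artoo): wraps a snippet in an
// immediately invoked function and encodes it as a javascript: URL.
function toBookmarklet(code) {
  // The wrapper function keeps the snippet's variables out of the page's globals.
  const wrapped = '(function(){' + code + '})();';
  return 'javascript:' + encodeURIComponent(wrapped);
}

// Example: a bookmarklet that shows the title of the current page.
const href = toBookmarklet('alert(document.title);');
console.log(href);
```

The resulting string can be used as the address of a bookmark; clicking it runs the code in the page currently displayed.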

It is then easy to use artoo to create custom tools that automate data collection on the web directly from the user's browser. You could, for instance, create a bookmarklet that downloads the results of a Google query as a CSV file.
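The last step of such a tool, turning extracted records into CSV text, could look like the sketch below. It deliberately does not use artoo's own API: `escapeCsvCell` and `toCsv` are hypothetical helpers, and the `results` array is made-up sample data standing in for scraped search results.

```javascript
// Hypothetical helpers, written for illustration (not artoo's API).

// Quote a cell when it contains a comma, a quote or a line break,
// doubling any embedded quotes as CSV requires.
function escapeCsvCell(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Serialize an array of objects into CSV, one column per requested key.
function toCsv(rows, columns) {
  const header = columns.map(escapeCsvCell).join(',');
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvCell(row[column])).join(',')
  );
  return [header, ...lines].join('\n');
}

// Made-up sample data standing in for scraped results.
const results = [
  { title: 'First "result"', url: 'https://example.org/a' },
  { title: 'Second, result', url: 'https://example.org/b' }
];

const csv = toCsv(results, ['title', 'url']);
console.log(csv);
```

In a real bookmarklet, the `results` array would come from the page's DOM and the CSV string would be handed to the browser as a file download.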

harvesting

developers

usable

2014