DMI Tools

Recommended by the médialab

A collection of useful digital instruments for sociologists, built by the Digital Methods Initiative at the University of Amsterdam.

Tools – Software

The Digital Methods Initiative

The Digital Methods Initiative (DMI) is one of Europe's leading Internet Studies research groups. Composed of new media researchers and PhD candidates, it designs methods and tools for repurposing online devices and platforms (such as Twitter, Facebook and Google) to research social and political issues.

The DMI tools page offers a collection of tools for gathering, exploring and visualizing internet data within social and political science research processes. The tools available on that page are:

  • 4CAT: Capture and Analysis Toolkit: Create datasets from a variety of web forums and analyze them.
  • Amazon Book Explorer: Provides different analytics for Amazon.com's book search
  • Amazon Related Product Graph: This PHP script allows you to enter one or more ASINs and crawl their recommendations up to a user-specified depth.
  • App Tracker explorer: DMI App Tracker Tracker detects predefined fingerprints of known tracking technologies or other software libraries in a set of APK files.
  • Bubble Lines: Input tags and values to produce relatively sized bubbles. Output is an SVG.
  • Censorship Explorer: Check whether a URL is censored in a particular country by using proxies located around the world.
  • Colors For Data Scientists: Generate and refine palettes of optimally distinct colors. (by Médialab Sciences-Po)
  • Compare Lists: Compare two lists of URLs for their commonalities and differences.
  • Compare Networks Over Time: Compares Issue Crawler networks over time, and displays ranked actor lists. The over time module is best used in tandem with the Issue Crawler scheduler. The results may be plotted to line g...
  • Convert Issuecrawler to Navicrawler: Convert an Issuecrawler XML file into the WXSF format used by Navicrawler. For Navicrawler, see http://webatlas.fr/wp/navicrawler/.
  • Deduplicate: Replicates each tag in a tag cloud according to its value
  • Discus Comment Scraper: This tool scrapes threads and comments from websites implementing the Disqus commenting system.
  • Dorling Map Generator: Input tags and values to produce a Dorling Map (i.e. bubbles). Output is an SVG.
  • Expand Tiny Urls: Expands URLs that have been shortened by services like tinyurl.com or bit.ly, which are common on social media such as Twitter or Facebook.
  • Extract URLs: Extracts URLs from an Issuecrawler result file (.xml). Useful for retrieving starting points as well as a clean list of the actors in the network.
  • Geo IP: Translates URLs or IP addresses into geographical locations
  • Github organizations meta-data lookup: Extract the metadata of organizations on GitHub
  • Github repositories meta-data lookup: Extract the metadata of GitHub repositories
  • Github repositories scraper: Scrape GitHub for forks of projects
  • Github scraper: Scrape GitHub for user interactions and user-to-repository relations
  • Github user meta-data lookup: Extract metadata about users on GitHub
  • GithubContributorsScraper: Find out which users contributed source code to GitHub repositories
  • Google Autocomplete: Retrieves autocomplete suggestions from Google
  • Google Image Scraper: Query images.google.com with one or more keywords, and/or use images.google.com to query specific sites for images.
  • Google Play Similar Apps: DMI Google Play Similar Apps is a simple tool to extract the details of individual apps, collect ‘Similar’ apps, and extract their details.
  • Google Reverse Image scraper: Scrape Google for occurrences of images
  • Googlescraper (Lippmannian Device): Batch queries Google. Query the resonance of a particular term, or a series of terms, in a set of Websites.
  • Harvester: Extract URLs from text, source code or search engine results. Produces a clean list of URLs.
  • Image Scraper: Scrape images from a single page.
  • Instagram Scraper: Retrieves Instagram images for hashtags, locations, or user names.
  • Internet Archive Wayback Machine Link Ripper: Scrapes links from the Wayback Machine
  • Internet Archive Wayback Machine Network Per Year: Enter a set of URLs and the archived versions closest to 1 July for a specific year are retrieved. Thereafter links are extracted and a network file is output.
  • Issue Dramaturg: Enter up to 3 URLs as well as a keyword. The Issuedramaturg queries Google for the keyword and shows the PageRanks of the URLs over time. The output is a graph of the PageRank of the URLs...
  • Issue Geographer: Geo-locates the organizations on an Issue Crawler map, using whois information, and visualizes the organizations' registered locations on a geographical map.
  • Issuecrawler: Enter URLs and the Issue Crawler performs co-link analysis in one, two or three iterations, and outputs a cluster graph. The Issue Crawler also has modules for snowball crawling (up to 3 deg...
  • Itunes Store: Queries the iTunes Store
  • Language Detection: Detects the language of the given URLs. The first 1000 characters of each Web page are extracted, and the language of each page is detected.
  • Link Ripper: Capture all internal links and/or outlinks from a page.
  • Lippmannian Device: The Lippmannian device is named after Walter Lippmann, and provides a coarse means of showing actor partisanship.
  • Lippmannian Device To Gephi: This tool allows one to visualize the output of the Lippmannian device as a network with Gephi.
  • Netvizz: Extracts various datasets from Facebook.
  • News Agencies Scraper: Scrape various news agencies for particular keywords and extract titles, images, dates and full text.
  • Ranked Deep Pages from Core Issue Crawler Network: Enter an Issuecrawler XML file and this script will extract all pages from the core network and rank those pages by inlink count.
  • Raw Text to Tag Cloud Engine: Takes raw text, counts the words and returns an ordered, unordered or alphabetically ordered tag cloud.
  • Rip Sentences: Rip text from a specified page and force line breaks between sentences.
  • Robots.txt Discovery: Display a site's robots exclusion policy.
  • Screenshot generator: Produce screenshots for a list of URLs
  • Search Engine Scraper: Search Engine Scraper
  • Source Code Search: loads a URL and searches for patterns in the page's source code
  • TLD counts: Enter URLs and count their top-level domains.
  • Table to Net: Extract a network from a table. Set a column for nodes and a column for edges. It deals with multiple items per cell. (by Médialab Sciences-Po)
  • Tag Cloud Combinator: Enter two or more tag clouds and the values of each tag will be summed.
  • Tag Cloud Generator: Input tags and values to produce a tag cloud. Output is in SVG.
  • Tag Cloud HTML Generator: Input tags and values in Wordle format to produce an HTML tag cloud or tag list.
  • Tag Cloud To Wordle: This tool allows one to transform a normal tag cloud into a fancy Wordle one.
  • Text Ripper: Rip all non-HTML content (i.e. text) from a specified page.
  • Timestamp Ripper: Rips and displays a web page's last modification date (using the page's HTML header). Beware of dynamically generated pages, where the date stamps will be the time of retrieval.
  • Tracker Tracker: DMI Tracker Tracker detects predefined fingerprints of known web tracking technologies in a set of URLs.
  • Triangulation: Enter two or more lists of URLs or other items to discover commonalities among them. Possible visualizations include a Venn Diagram.
  • Tumblr: a simple co-hashtag and post data tool for Tumblr
  • Twitter Capture and Analysis Toolset (DMI-TCAT): Captures tweets and allows for multiple analyses (hashtags, mentions, users, search, ...)
  • Wikipedia Cross-Lingual Image Analysis: Makes the images of all language versions of a Wikipedia article comparable.
  • Wikipedia Edits Scraper and IP Localizer: Scrapes Wikipedia history and geolocates the IP addresses of anonymous edits
  • Wikipedia Entry Check: This tool checks whether each issue exists as a Wikipedia page, i.e., an article, and, if it does, whether the organization is mentioned on that page.
  • Wikipedia History Flow Companion: This script allows you to specify a range of Wikipedia revisions for use with the History Flow visualization.
  • Wikipedia TOC Scraper: Scrape the Table of Contents for revisions of a Wikipedia page and explore the results by moving a slider to browse across chronologically ordered TOCs.
  • Wikipedia categories scraper: Scrape Wikipedia for the categories of articles and the categories of related articles in different languages.
  • YouTube Data Tools: A collection of simple tools for extracting data from the YouTube platform via the YouTube API v3.
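Several of the tools above, such as Compare Lists and Triangulation, rest on simple set operations over lists of URLs. The snippet below is a minimal sketch of that idea, not the DMI implementation. It assumes plain Python lists of URL strings compared by exact string match, whereas the actual tools may normalize URLs differently. The `triangulate` function is a hypothetical name chosen for illustration.

```python
from collections import Counter


def triangulate(*lists):
    """Hypothetical helper: return the items shared by all lists, plus how
    many lists contain each item. Uses exact string matching, with no URL
    normalization (an assumption, not the DMI tools' behaviour)."""
    sets = [set(lst) for lst in lists]  # drop duplicates within each list
    common = set.intersection(*sets) if sets else set()
    # Count, for every item, the number of lists it appears in
    counts = Counter(item for s in sets for item in s)
    return common, counts


a = ["example.org", "wikipedia.org", "twitter.com"]
b = ["wikipedia.org", "twitter.com", "facebook.com"]
c = ["twitter.com", "google.com"]
common, counts = triangulate(a, b, c)
print(sorted(common))           # ['twitter.com']
print(counts["wikipedia.org"])  # 2
```

Items whose count equals the number of lists form the overlap that a Venn diagram would show in its centre. Items with a count of 1 are unique to a single list.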

harvesting, curation, processing, exploration and visualization

all audiences

usable