
xsv fork

made by the médialab

command line tool to efficiently process CSV files

Tools – Code

Andrew Gallant, Guillaume Plique, Laura Miguel, Béatrice Mazoyer, César Pichon

xsv is a command line tool for processing large CSV files efficiently. It was originally written in Rust by Andrew Gallant (aka @BurntSushi) and later forked by the lab.

The lab's engineers then heavily rewrote and improved the tool to fit our daily use cases.

Among many other features, we added a dynamic scripting language that can be evaluated for each row of a file, external sorting, efficient reverse reading, and k-way merging of already sorted files.
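To give an idea of what k-way merging of already sorted files involves, here is a minimal Python sketch. It is not the fork's actual Rust implementation, and it does not show xsv's command line syntax. The function name `merge_sorted_csvs` is hypothetical, and the sketch assumes every input has an identical header and is already sorted lexicographically (as plain strings) on the chosen column.

```python
import csv
import heapq
import io
from typing import Iterable, Iterator, List


def merge_sorted_csvs(sources: List[Iterable[str]], column: str) -> Iterator[List[str]]:
    """Hypothetical helper: yield the shared header, then all rows of the
    inputs in sorted order. Each input must already be sorted on `column`."""
    readers = [csv.reader(src) for src in sources]
    headers = [next(reader) for reader in readers]
    header = headers[0]
    if any(h != header for h in headers):
        raise ValueError("all inputs must share the same header")
    key_index = header.index(column)
    yield header
    # heapq.merge lazily performs a k-way merge: it only holds one pending
    # row per input, so memory use does not grow with file size.
    # Keys are compared as plain strings (lexicographic order).
    yield from heapq.merge(*readers, key=lambda row: row[key_index])


# Usage with two small in-memory "files", each sorted on the name column.
first = io.StringIO("name,count\nalice,1\ncarol,3\n")
second = io.StringIO("name,count\nbob,2\ndave,4\n")
rows = list(merge_sorted_csvs([first, second], "name"))
for row in rows:
    print(",".join(row))
```

Because each input is consumed as a stream, this approach scales to files larger than memory, which is also why it pairs naturally with external sorting: sort chunks to disk, then merge them.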

As many of our other tools produce and consume CSV files, it was only natural for us to look for ways to process those files faster and without resorting to ad-hoc scripts.

We therefore encourage anyone dealing with large CSV files to try our fork.

curation and processing