xan
made by the médialab
command line tool to efficiently process CSV files
Tools – Code
Andrew Gallant, Guillaume Plique, Laura Miguel, Béatrice Mazoyer, César Pichon, Anna Charles
xan is a command line tool that can be used to process large CSV files efficiently.
Originally forked from Andrew Gallant's xsv, the tool was heavily rewritten and improved by the lab's engineers to fit our daily use cases.
Among many other features, we added a dynamic scripting language that can be evaluated for each row of a file, external sorting, efficient reverse reading, and k-way merging of already sorted files.
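To give a rough idea of what k-way merging of already sorted files means, here is a minimal Python sketch. It is not xan's implementation (xan is written in Rust), and the helper name `merge_sorted_csv` and the sample data are hypothetical. It assumes every input is already sorted on the chosen column and compares values as plain strings.

```python
import csv
import heapq
import io


def merge_sorted_csv(sources, column):
    """Lazily merge CSV streams that are each already sorted on `column`.

    Hypothetical helper for illustration only, not part of xan.
    """
    readers = [csv.DictReader(src) for src in sources]
    # heapq.merge holds one pending row per input at a time,
    # so none of the inputs has to be loaded into memory in full.
    return heapq.merge(*readers, key=lambda row: row[column])


# Two small in-memory "files", each sorted by name (made-up sample data).
a = io.StringIO("name,count\nalice,3\ncarol,1\n")
b = io.StringIO("name,count\nbob,7\ndave,2\n")

merged_names = [row["name"] for row in merge_sorted_csv([a, b], "name")]
print(merged_names)  # ['alice', 'bob', 'carol', 'dave']
```

Because only one row per input is held at a time, the same idea scales to files far larger than memory, which is also what makes it a natural building block for external sorting.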
Since many of our other tools produce and consume CSV files, we naturally wanted ways to manipulate those files faster, without resorting to ad hoc scripts.
We therefore encourage anyone dealing with large CSV files to try our fork.
curation and processing