
Talisman: a JavaScript archive of fuzzy matching, information retrieval and record linkage building blocks

Guillaume Plique

Information retrieval and record linkage have always relied on crafty, heuristic routines that implement what is often called fuzzy matching. Indeed, even if fuzzy logic feels natural to humans, one needs to find various strategies to coerce computers into acknowledging that strings, for instance, are not always strictly delimited. While some of those techniques, such as the Soundex phonetic algorithm invented at the beginning of the 20th century, are still well known and used, many others have unfortunately been lost to time. The Talisman JavaScript library therefore aims to be an archive of the wide variety of techniques that have been used throughout the history of computer science to perform fuzzy comparisons between words, names, sentences, etc. Thus, even though Talisman provides state-of-the-art functions that are still used in industrial contexts, it also aims to be a safe harbor for lesser-known or clunkier techniques, for historical and archival purposes.
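
To make the idea of a phonetic algorithm concrete, here is a minimal sketch of the classic American Soundex encoding in plain JavaScript. It is a standalone illustration written for this page, not Talisman's own implementation, and the function name `soundex` and the `CODES` table are assumptions chosen for the example. Names that sound alike are mapped to the same four-character key, so they can be matched even when their spellings differ.

```javascript
// Consonant classes of American Soundex (illustrative table, not Talisman's code).
const CODES = {
  B: '1', F: '1', P: '1', V: '1',
  C: '2', G: '2', J: '2', K: '2', Q: '2', S: '2', X: '2', Z: '2',
  D: '3', T: '3',
  L: '4',
  M: '5', N: '5',
  R: '6'
};

function soundex(name) {
  // Keep only unaccented Latin letters; this sketch ignores diacritics.
  const letters = name.toUpperCase().replace(/[^A-Z]/g, '');
  if (letters.length === 0) return '';

  // The key starts with the first letter, kept as is.
  let code = letters[0];
  let previous = CODES[letters[0]] || '';

  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const letter = letters[i];

    // H and W are transparent: they do not separate letters sharing a code.
    if (letter === 'H' || letter === 'W') continue;

    const digit = CODES[letter] || '';

    // Emit a digit only when it differs from the previous letter's code.
    if (digit !== '' && digit !== previous) code += digit;

    // Vowels reset `previous`, so a repeated code after a vowel is emitted again.
    previous = digit;
  }

  // Pad short keys with zeros to reach four characters.
  return code.padEnd(4, '0');
}

console.log(soundex('Robert'), soundex('Rupert')); // R163 R163
```

Comparing two names then amounts to comparing their keys: `soundex('Robert') === soundex('Rupert')` holds, whereas a strict string comparison would treat the two names as unrelated.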