Speech Corpus Tools

Easier analysis of large speech corpora

This project is maintained by MontrealCorpusTools

Speech Corpus Tools is a desktop application for representing and querying large-scale speech corpora. It uses PolyglotDB to interact with multiple databases, each suited to a different kind of speech corpus data. Neo4j stores the discourses and the speech in time as a directed acyclic graph. SQL databases store the lexical, phonological, and speaker information in tables, along with all of the calculated acoustic information. Please see the manual for more information and tutorials (http://speech-corpus-tools.readthedocs.io/).
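To make the split between graph and table storage more concrete, here is a minimal toy sketch in plain Python. It is not the PolyglotDB API and does not use Neo4j: the token data, the `precedes` mapping, the `lexicon` table, and both helper functions are hypothetical names invented for illustration, with an in-memory SQLite database standing in for the SQL side.

```python
import sqlite3

# Hypothetical toy data: word tokens of one discourse, each with a time span.
tokens = [
    {"id": 0, "label": "the", "begin": 0.00, "end": 0.12},
    {"id": 1, "label": "cat", "begin": 0.12, "end": 0.45},
    {"id": 2, "label": "sat", "begin": 0.45, "end": 0.80},
]
by_id = {t["id"]: t for t in tokens}

# Graph side (stand-in for Neo4j): each token points to the token that
# follows it in time. Edges only run forward in time, so the graph is
# directed and acyclic.
precedes = {a["id"]: b["id"] for a, b in zip(tokens, tokens[1:])}

# Table side (stand-in for the SQL databases): lexical properties keyed by
# word label. The transcriptions are made-up illustration values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lexicon (label TEXT PRIMARY KEY, transcription TEXT)")
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?)",
    [("the", "DH AH"), ("cat", "K AE T"), ("sat", "S AE T")],
)


def following_word(token_id):
    """Follow one graph edge: the token after token_id, or None at the end."""
    next_id = precedes.get(token_id)
    return None if next_id is None else by_id[next_id]


def transcription(label):
    """Look up a word's transcription in the lexicon table, or None."""
    row = conn.execute(
        "SELECT transcription FROM lexicon WHERE label = ?", (label,)
    ).fetchone()
    return row[0] if row else None
```

A query that combines both sides first walks the graph and then consults the table, for example `transcription(following_word(0)["label"])` to get the transcription of the word that follows the first token.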

Executable downloads

Please visit the release page (https://github.com/MontrealCorpusTools/speechcorpustools/releases) and download the latest release.

Authors and Contributors


McAuliffe, M., Stengel-Eskin, E., Socolof, M. and Sonderegger, M. (2016). Speech Corpus Tools [Computer program]. Version 0.5, retrieved 13 July 2016 from http://montrealcorpustools.github.io/speechcorpustools.

If you cannot cite a computer program, please cite:

McAuliffe, M., Stengel-Eskin, E., Socolof, M. and Sonderegger, M. (2017). Polyglot and Speech Corpus Tools: a system for representing, integrating, and querying speech corpora. In Proceedings of the 18th Conference of the International Speech Communication Association.