Auditory Training

2022  ·  Programming  ·  Machine Learning  ·  Video  ·  Installation

This program relies on a pre-trained model, which in turn depends on a database. There are two ways to build such a database: either by creating one’s own data or by collecting it online.

The latter method is the one used by Google, which, unsurprisingly, became interested in sound recognition as early as 2006, after acquiring the YouTube platform a year after its creation. In 2017, Google released AudioSet, an online database comprising 2,084,320 sound event samples totaling 5,800 hours of audio, extracted from YouTube videos and organized, according to an ontology, into 527 sound categories. It is thus YouTube users’ data, their personal videos, that ends up being exploited in this database to train sound recognition algorithms, without our explicit consent.
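To give a concrete sense of that structure: the ontology behind these categories is published by Google as a JSON file (ontology.json, in the github.com/audioset/ontology repository). A minimal Python sketch for inspecting it, assuming a local copy of the file, could look like this:

    import json

    # Load a local copy of the AudioSet ontology
    # (ontology.json, published at github.com/audioset/ontology).
    with open("ontology.json") as f:
        ontology = json.load(f)

    print(len(ontology), "entries in the ontology")

    # Each entry carries an id, a human-readable name, and child_ids
    # pointing to more specific categories. Entries that no other entry
    # lists as a child are the top-level families of the ontology.
    children = {c for e in ontology for c in e.get("child_ids", [])}
    for entry in ontology:
        if entry["id"] not in children:
            print(entry["name"])

Running it prints the handful of broad families (human sounds, animal sounds, music, and so on) under which the dataset’s categories are nested.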

For the development of this piece, I used a real-time sound recognition neural network: the sounds it recognizes in the room where it is installed are displayed live. I decided to intervene directly in the software’s visual interface in order to appropriate both it and its code. I then worked on that interface with Stéphane Blocquaux, a contributor at Labomedia, who helped me bridge the recognition software (written in Python) and Processing, so that the visual interface could be built in Java. My aim here is to show the videos from which the sounds in Google’s database originate, since they form part of the algorithmic “brain” of the program I use. This approach reveals the database’s composition, offering insight into how sounds from these videos shape the software’s learning. For each recognized category, a corresponding video is drawn at random from the Google database, which I have reconstructed locally, in miniature, on an external hard drive.
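As an illustration of this pipeline, and not the installation’s actual code, here is a minimal sketch assuming a pre-trained AudioSet classifier (YAMNet, loaded from TensorFlow Hub) as the recognition network: it labels short recordings from the microphone, sends each recognized category name over UDP to the Processing interface, and picks a matching video at random from a local mirror of the database. The folder layout, port number, and choice of model are all assumptions made for the example.

    import csv
    import random
    import socket
    from pathlib import Path

    import numpy as np
    import sounddevice as sd
    import tensorflow_hub as hub

    # Pre-trained AudioSet classifier (YAMNet); the installation's actual
    # model may differ -- this is an illustrative stand-in.
    model = hub.load("https://tfhub.dev/google/yamnet/1")

    # YAMNet ships its class map as a CSV: index, mid, display_name.
    class_map_path = model.class_map_path().numpy().decode("utf-8")
    with open(class_map_path) as f:
        class_names = [row["display_name"] for row in csv.DictReader(f)]

    # UDP link to the Processing interface (the port is an arbitrary choice).
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    PROCESSING_ADDR = ("127.0.0.1", 12000)

    # Hypothetical local mirror of the database: one folder per category.
    DB = Path("/media/audioset_videos")

    SR = 16000  # YAMNet expects 16 kHz mono float32 audio

    while True:
        # Record one second from the external microphone.
        audio = sd.rec(SR, samplerate=SR, channels=1, dtype="float32")
        sd.wait()
        scores, embeddings, spectrogram = model(np.squeeze(audio, axis=1))
        label = class_names[int(scores.numpy().mean(axis=0).argmax())]

        # Tell the Processing sketch what was heard...
        sock.sendto(label.encode("utf-8"), PROCESSING_ADDR)

        # ...and pick a random video of that category from the local mirror.
        videos = list((DB / label).glob("*.mp4"))
        if videos:
            print(label, "->", random.choice(videos).name)

On the Processing side, the labels can be read with Java’s standard DatagramSocket (or a library such as oscP5, if one prefers OSC messages) and used to drive the display.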
A video essay by artist Sean Dockray, tracing the connections between YouTube, automated listening, and predictive policing, had a profound impact on me while I was working on this project. He writes: “On YouTube, videos live a double life: on one hand, they are entertainment for a human audience; on the other, they serve as data for an algorithmic audience. The continuous invention of new algorithms that observe in new ways transforms old videos into new ones. [...] Data will be stored in server farms for years before being exploited for maximum profit. [...] Videos become memories for an algorithm with unknown policies.”

The videos we upload to YouTube take on a second life that escapes our control entirely. This software serves as a critical tool that lets me bring these hidden mechanisms to light. I delved into the databases behind these systems, documenting and appropriating them, before presenting my findings to the audience so that they, too, can understand.
The software runs on a computer connected to an external microphone and to a hard drive containing the database. The sound of the videos from the database, projected onto the wall, can be heard through headphones.

Photo: Paul de Lanzac

Photo: Paul de Lanzac

Photo: Paul de Lanzac

Video captures of the software running in real time.

Live installation.

First test with the basic software, before redesigning the interface for the installation.