Julien Bloit
music & gesture
Statistical models of audio descriptors
Category : projects

During my PhD work at Ircam I was interested in creating models for machine listening of musical sounds. The specificity of my work was its musical non-specificity: I was not focused on building an automatic music-to-score transcription machine, but rather on anticipating how a listening machine could deal with music made out of noises, recorded samples or extended playing techniques on an instrument (a bow bouncing on a violin string, or slap sounds on a flute…). Such complex sounds can be heard in electronic music today, as well as in free improvisation or contemporary orchestral music.

A more technical description:

I work on probabilistic modeling of audio events in a monophonic audio stream. I’m mostly interested in describing sound morphologies, in the sense of sound “gestures” rather than notes defined by a static {pitch, intensity, duration} triplet. One of the underlying motivations derives from the context of musical interaction with complex instrumental sounds from contemporary playing techniques.

One of the challenges consists in defining the latent layer of the model (the symbolic units) when no definitive taxonomy exists to describe all possible instrumental sounds. My work studies how a complex musical vocabulary can be factorized with a simpler set of elementary profiles.

For this purpose, I have been working with hidden Markov models, and studying specific extensions such as segmental models (where a single state emits an observation sequence), or state-space factorizing techniques.
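To make the HMM machinery concrete, here is a minimal Viterbi decoding sketch. The two states and their probabilities are a hypothetical toy model (a coarse "attack"/"sustain" vocabulary with loudness symbols), not the actual acoustic models trained in this work:

```python
# Minimal Viterbi decoding for a discrete HMM.
# States, observations and probabilities below are illustrative only.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for an observation sequence."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
            V[t][s] = V[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            back[t][s] = best_prev
    # Backtrack from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path

# Hypothetical two-state sound-gesture model
states = ["attack", "sustain"]
start_p = {"attack": 0.8, "sustain": 0.2}
trans_p = {"attack": {"attack": 0.3, "sustain": 0.7},
           "sustain": {"attack": 0.1, "sustain": 0.9}}
emit_p = {"attack": {"loud": 0.9, "soft": 0.1},
          "sustain": {"loud": 0.2, "soft": 0.8}}

print(viterbi(["loud", "soft", "soft"], states, start_p, trans_p, emit_p))
# → ['attack', 'sustain', 'sustain']
```

A segmental model would replace the per-frame emission `emit_p[s][obs[t]]` with the likelihood of a whole observation segment under state `s`, which is what lets a single state describe an entire sound morphology.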

In the longer term, one of my goals is to automatically derive such a set of elementary profiles, which could also be useful on musicological grounds. For this purpose, I'm interested in automatic model selection problems.

For the sake of real-time interaction, I have studied under which conditions the Viterbi algorithm can output the optimal path in an online context. I identified necessary conditions for a short-time version of the algorithm to work, and studied the relation between an HMM's topology and the decoding latency.

Here are two example videos illustrating this last aspect:

This prototype patch implements ideas described in my 2008 ICASSP publication. The trained acoustic models and the testing sentences are in French. The main interest of the underlying decoding algorithm is the real-time adaptation of the Viterbi algorithm. Some errors occur, but keep in mind that no language model is used (i.e. the model only knows how French sounds, but doesn't know any words).
This patch heavily relies on the FTM library for Max/MSP.

This shows the basic idea of the algorithm. The yellow rectangle is the current decoding window. Its right edge moves linearly with time, advancing with each new observation frame. Its left edge moves only when a fusion point is detected among all possible local paths (thin black curves), at which point a decision is output. Notice how the offline Viterbi path matches the locally fusing paths.
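The fusion-point idea above can be sketched in a few lines. This is a hypothetical simplification, assuming the decoder stores one backpointer per state and per frame: a fusion point exists when the survivor paths of all current states have merged into a single ancestor, so everything before that ancestor can be output without waiting for the end of the stream:

```python
# Sketch of fusion-point detection for online Viterbi decoding.
# `back` is a list of dicts: back[t][s] = best predecessor of state s at frame t.

def fusion_point(back, states, t):
    """Return the latest frame <= t - 1 up to which all survivor paths
    ending at frame t agree, or None if they have not merged yet."""
    survivors = set(states)
    for tau in range(t, 0, -1):
        # Follow every survivor path one step back in time
        survivors = {back[tau][s] for s in survivors}
        if len(survivors) == 1:
            return tau - 1  # all paths share a single ancestor here
    return None
```

When `fusion_point` returns a frame index, the path up to that frame is already optimal regardless of future observations, which is what bounds the decoding latency; how soon paths fuse depends on the HMM's topology.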

Resources:









In 2010, Nicolas Rasamimanana and I founded Phonotonic, a nonprofit spun off from IRCAM's Real Time Musical Interaction team. Phonotonic's purpose was to lead art-science projects at the frontier between academic research and applied systems for broad audiences. It gathers researchers and artists to imagine and design interactive systems that combine the body and novel numeric […]

Citiplay is an interactive, digital version of the iconic street game hopscotch that seeks to foster a sense of urban play. The game functions much like the game "Simon", asking participants to remember and repeat patterns by stepping on hopscotch tiles that light up in sequence. Aimed at both the passerby walking by on […]

This is a collaboration with artist-designer Marguerite Humeau, who is fascinated by scientific experiments and the narrative opportunities they open. A presentation of the project from the exhibition in St-Etienne: "Proposal for Resuscitating Prehistoric Creatures" sets up the rebirth of cloned creatures, their wandering and their sound epic. They are seeking to evolve […]

At Music Hack Day, I met Warren Stringer and Matt Howell and teamed up with the idea of creating a dynamic graphic environment reacting to facial expressions, as well as voice or instrument sounds. This is what it looks like at the end of the weekend: The face and sound tracking happen on the […]

Urban Musical Game : playground

The idea for this project was to control music and sounds with a simple school ball. The motion of the ball bouncing or flying around on the playground is captured live and repurposed as giant musical gestures that mash up pre-recorded tracks and sound effects. Various game scenarios were tried: direct sonification of existing games […]