Introduction

Dissociated Studio is an attempt to do for audio something similar to what Dissociated Press does for text. It takes a piece of audio (about 4-5 minutes is practical on my 500 MHz PIII with 96 MB of memory), cuts it into segments (0.1 second each by default), and computes a matrix indicating how similar each segment is to every other segment. It then plays the piece through, occasionally skipping from one segment to another similar segment, while showing displays that are informative, spiffy-looking, or both. It requires Gtk-- 1.2 and audiofile to run, and FFTW if you want to recompile. Click here to download the source, or here for a gzipped Linux x86 binary. (Current version is 0.1.) To run it, give the name of a file in a format audiofile supports (I use .wav) as a command-line argument.
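As a rough illustration of the "play through, occasionally skip" idea, here is a minimal Python sketch. It is not the program's actual code: the names dissociated_order, neighbors and jump_probability are hypothetical, and the page does not say how often jumps happen or how the target is picked, so a fixed jump probability, a uniform choice among the similar segments, and wrapping around at the end of the piece are all assumptions.

```python
import random

def dissociated_order(num_segments, neighbors, jump_probability=0.05,
                      steps=None, rng=None):
    """Return a list of segment indices in playback order.

    neighbors maps a segment index to a list of similar segment indices.
    jump_probability is an assumed tuning knob, not a documented value.
    """
    rng = rng or random.Random()
    steps = num_segments if steps is None else steps
    current = 0
    order = []
    for _ in range(steps):
        order.append(current)
        candidates = neighbors.get(current, [])
        if candidates and rng.random() < jump_probability:
            # Skip to a similar segment (uniform choice is an assumption).
            current = rng.choice(candidates)
        else:
            # Otherwise play straight on; wrapping at the end is an assumption.
            current = (current + 1) % num_segments
    return order
```

With jump_probability set to 0 this simply plays the piece in order; raising it makes the output wander more.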

To be more technical, for each segment, I add in the previous 3 segments, applying a scaling factor of 0.3^(distance in segments). (This echoing is done to get some sense of memory into the calculations; it might well be better just to use overlapping windows.) I then apply a Hanning window and do an FFT to calculate the log of the power spectrum. After that, I merge some of the close-together higher-frequency bins, mostly to reduce processing time. (I should do a psychoacoustically more realistic reduction here -- see this paper for some references on the frequency sensitivity of the human ear.) The similarity metric is the sum of the squares of the per-consolidated-frequency-bin differences, so a smaller value means more similar. For each segment, I sort the list of similar segments and throw away all but the 255 most similar.
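The steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the program's code: a naive DFT stands in for FFTW to keep the example self-contained, the bin-merging scheme (keep the lowest bins, then average groups of doubling width) is a guess since the page does not say which bins are merged, the small constant added before taking the log is only there to avoid log(0), and all function names are hypothetical.

```python
import cmath
import math

ECHO_SCALE = 0.3   # from the text: scale by 0.3 ** (distance in segments)
ECHO_DEPTH = 3     # from the text: add in the previous 3 segments

def echoed(segments, i):
    """Segment i plus up to ECHO_DEPTH earlier segments, scaled down."""
    out = list(segments[i])
    for d in range(1, ECHO_DEPTH + 1):
        if i - d < 0:
            break
        w = ECHO_SCALE ** d
        for k, s in enumerate(segments[i - d]):
            out[k] += w * s
    return out

def hann(n):
    """Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def log_power_spectrum(frame):
    """Window the frame, then take the log power of each DFT bin (naive DFT)."""
    n = len(frame)
    x = [a * w for a, w in zip(frame, hann(n))]
    spectrum = []
    for f in range(n // 2 + 1):
        acc = sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        spectrum.append(math.log(abs(acc) ** 2 + 1e-12))
    return spectrum

def merge_bins(spectrum, keep=8):
    """Assumed reduction: keep the lowest bins, average wider and wider groups."""
    merged = list(spectrum[:keep])
    width, start = 2, keep
    while start < len(spectrum):
        group = spectrum[start:start + width]
        merged.append(sum(group) / len(group))
        start += width
        width *= 2
    return merged

def distance(a, b):
    """Sum of squared per-bin differences; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(features, keep=255):
    """For each segment, the indices of the `keep` most similar other segments."""
    table = {}
    for i, fi in enumerate(features):
        ranked = sorted((distance(fi, fj), j)
                        for j, fj in enumerate(features) if j != i)
        table[i] = [j for _, j in ranked[:keep]]
    return table
```

A full pass would compute merge_bins(log_power_spectrum(echoed(segments, i))) for every i and hand the resulting list to nearest. All segments are assumed to have the same length.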

Future ideas: I'd like to make the data reduction method configurable, and incorporate ideas from music fingerprinting, feature extraction, and compression research. Fingerprinting algorithms attempt to find computationally cheap methods of telling when two pieces are the same (for appropriate values of same); feature extraction is more about extracting traditional music-theoretical descriptions; and audio compression is about discarding perceptually less relevant data. (If I supported reading data compressed with lossy compression methods directly, I could avoid the fidelity loss of decompression and reanalysis, as well as take advantage of the knowledge of whoever encoded the file in the first place.) ISMIR 2002 (and, presumably, future ISMIRs) has some papers on fingerprinting.

I'd also like to be able to run the software in real time on the last n minutes of output. Last, it'd be nice to have some less ad hoc way of incorporating the memory of the last n segments, but the most obvious (at least to me) ways would take a lot more memory and processing time, and would be more difficult to visualize.

Dotplot View

dotplot

The above display is a dotplot of track 4 of _Javanese Court Gamelan_, "Bubaran Hudan Mas", Elektra Nonesuch 9 72044-2. Basically, a dotplot is a false-color image of the similarity matrix computed above, with green being most similar and blue being least. Note the diagonal lines indicating repeated phrases, the tilted lines indicating repeats at different tempos, and the curved lines indicating accelerating or decelerating repeats. See here for a very brief discussion of the structure of Javanese gamelan, or the _New Grove Dictionary_ (Indonesia entry) or the _Garland Encyclopedia of World Music_ (SE Asia volume) for somewhat more detail.
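To make the false-color idea concrete, here is a small Python sketch that turns a matrix of distances into a plain-text PPM image. The page only says that green is most similar and blue least; the linear blend between the two, the scaling by the largest distance in the matrix, and the function names are assumptions for illustration.

```python
def similarity_color(d, d_max):
    """Map a distance to (r, g, b): green at distance 0, blue at d_max."""
    t = 0.0 if d_max <= 0 else min(max(d / d_max, 0.0), 1.0)
    return (0, round(255 * (1 - t)), round(255 * t))

def dotplot_ppm(dist):
    """Render a square distance matrix as an ASCII PPM (P3) image string."""
    n = len(dist)
    d_max = max(max(row) for row in dist)
    lines = ["P3", f"{n} {n}", "255"]
    for row in dist:
        lines.append(" ".join("%d %d %d" % similarity_color(d, d_max)
                              for d in row))
    return "\n".join(lines) + "\n"
```

In such an image the main diagonal is always pure green, since every segment is at distance 0 from itself.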

The red in the image is an indication of where in the piece we currently are (and have been). You can left-click in this window, and playing will jump to the segment corresponding to the x-position.

Time View

linvis

This window displays which part of the piece is currently playing, which parts have recently been playing, and which parts have never been played. Each segment is mapped onto a piece of the window, left to right and top to bottom, with top to bottom more significant (in the standard way for Latin script and Western music notation). The height of the image at a piece of the window indicates how recently the corresponding segment was played, with the thickest being the most recent. If a segment has never been played, nothing is drawn. You can left-click in this window, and playing will jump to the corresponding segment.
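The layout described above can be sketched in Python like this. The number of columns per row, the linear fade, the maximum thickness and the minimum thickness of 1 for segments played long ago are all assumptions, as are the function names; the page only specifies the reading order, that thicker means more recent, and that never-played segments are not drawn.

```python
def cell_for_segment(index, columns):
    """Segment index -> (row, column), left to right, then top to bottom."""
    return divmod(index, columns)

def segment_for_cell(row, col, columns):
    """Inverse mapping, e.g. for turning a left-click into a segment index."""
    return row * columns + col

def bar_thickness(last_played, now, max_thickness=10, fade_steps=200):
    """Thickness to draw for a segment last played at step `last_played`.

    Returns 0 for a segment that has never been played (last_played is None).
    """
    if last_played is None:
        return 0
    age = now - last_played
    remaining = max(fade_steps - age, 0) / fade_steps
    # Played segments never vanish completely (assumed minimum of 1).
    return max(1, round(max_thickness * remaining))
```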

Controls

control window

Miscellaneous

Dissociated Studio is distributed under the GNU General Public License.

Back to Aaron's Home Page