More Tricks for Playing With Audio

Last month I introduced an octave inversion algorithm that can take any sound recording and flip its octave upside down.  I’d now like to review a few more audio processing tricks I’ve been experimenting with lately: melodization, split-band cross-modulation, time blur, frequency blur, and window-reversal.  I’m not sure each of these techniques is entirely original, but what they have in common, both with each other and with my octave inversion algorithm, is that they’re designed to recast existing recordings into new forms that exaggerate, conceal, or scramble their basic parameters.  The idea is to expand our options for playing with bits of audio in a spirit of creative discovery, drawing out potentialities that lurk hidden within them, rather than merely playing them in the usual way.  I’ve coded each of the algorithms described below in MATLAB.  There might be other software out there that could achieve similar effects out of the box, but if so, I don’t know about it.

First up is melodization.  I need to tread carefully in explaining what this does.  It would be naïve of me to claim that it can “turn noise into music,” given multiple definitions of noise and nuanced “noise aesthetics” within music itself.  What melodization can do, however, is turn relatively unpitched sounds into clearly pitched ones, making them sound “more musical” in terms of a traditional music-to-noise continuum.  It resembles conventional “noise reduction” in a technical sense, except that it passes signal based on melodic and harmonic criteria rather than on an octave-agnostic noise print.

My melodization algorithm starts by dividing a source recording into a number of frequency bands that repeat at the octave, such that any given frequency and its octave multiples (2×, 4×, 8×, and so on) all fall into the same band.  It then goes through the whole recording segment by segment based on a chosen window size expressed in samples, identifies which frequency bands contain the strongest signal during each segment, and passes them selectively into the output file with a fade-in and fade-out to avoid sharp transitions.  The start and end points of the windows can be immutably fixed (say, at precisely every 10,000 samples), or they can be shifted to correspond to the strongest impulse within each window, with the option to set a threshold for triggering a change.  We can also vary what I call the “passwidth” by passing the whole of each selected band, or half of it, or a tenth, and so on.  The narrower the passwidth, the more strongly sounds become pitched at the expense of timbral detail.  The number of bands to be passed can be adjusted as well; right now, my software allows a range from one to five “notes.”  The scale options I’ve set up are chromatic (12 intervals), pentatonic (5 intervals), quarter tone (24 intervals), eighth tone (48 intervals), major, and minor.  If I choose a major or minor scale, the software rules out the notes of the chromatic scale that fall outside F major or F minor.  I also provide an optional setting for skipping a specified number of adjacent intervals; for instance, a setting of 1 in a chromatic scale means that if one “note” is selected during a given analysis window, one semitone to either side of it is automatically deselected in the interest of minimizing dissonances.
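
The note-selection bookkeeping described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the MATLAB code discussed in this post: the names band_index, pick_notes, C4_HZ, and F_MAJOR are hypothetical, band 0 is assumed to be centered on C, and neighbors are assumed to wrap around the octave (so the top band counts as adjacent to the bottom one).  The band filtering, windowing, passwidth, and fades are not shown.

```python
import math

# Assumption: band 0 is centred on C (C4 derived from A4 = 440 Hz).
C4_HZ = 440.0 * 2 ** (-9 / 12)
# Pitch classes of F major counted in semitones from C: C D E F G A Bb.
F_MAJOR = {0, 2, 4, 5, 7, 9, 10}


def band_index(freq_hz, divisions=12, ref_hz=C4_HZ):
    """Map a frequency to an octave-repeating band number (0 .. divisions-1)."""
    return round(divisions * math.log2(freq_hz / ref_hz)) % divisions


def pick_notes(band_energy, count=2, skip=0, allowed=None):
    """Pick up to `count` strongest bands for one analysis window.

    band_energy: one energy value per band for the current window.
    skip: bands within this many steps of an already chosen band are rejected
          (distance is measured circularly, an assumption of this sketch).
    allowed: optional set of band numbers (a scale); None means every band.
    """
    n = len(band_energy)
    candidates = sorted(range(n) if allowed is None else allowed,
                        key=lambda b: band_energy[b], reverse=True)
    chosen = []
    for b in candidates:
        if all(min((b - c) % n, (c - b) % n) > skip for c in chosen):
            chosen.append(b)
        if len(chosen) == count:
            break
    return chosen


# Hypothetical energies for the twelve chromatic bands in one window.
energy = [0.1, 0.0, 0.3, 0.0, 0.9, 0.8, 0.0, 0.7, 0.0, 0.2, 0.0, 0.0]
top_two = pick_notes(energy, count=2, skip=2, allowed=F_MAJOR)
```

With these made-up energies, the strongest band (4, or E) is chosen first, the runner-up (5, or F) is rejected because it lies within two semitones of it, and band 7 (G) is taken instead.  Under the same mapping, 110 Hz, 440 Hz, and 880 Hz all land in band 9.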

Let’s listen to a few examples.  Here’s an excerpt from an original unmelodized recording I made of water splashing over the rocks of a stream in the arboretum on the Bloomington campus of Indiana University:

Next, here’s that same excerpt melodized with a 60,000-sample fixed-length analysis window passing the top two notes in an F major scale with a passwidth of 0.2 and with adjacent intervals up to two semitones blocked:

That makes for a respectable piece of New Age music, wouldn’t you say?  Bear in mind that the specific harmonies heard here are drawn from the source recording itself and not imposed by me as a composer, even though I defined the parameters for identifying them.  By way of comparison, here’s the opening of a performance of Bach’s Brandenburg Concerto No. 3 in G Major by the Freiburger Barockorchester, melodized with a 2000-sample impulse-responsive window with a threshold of 0.02 applying a passwidth of 0.2 to the top five notes in a quarter-tone scale, which has an effect reminiscent of extreme conventional noise reduction:

The harmonies in my melodized recording of splashing water are derived from the source audio in fundamentally the same way as the harmonies in this melodized recording of a Bach performance, which my algorithm obviously detected rather than dictated (unless you want to credit it with independently re-composing the concerto).  We might think of melodization as melodic amplification, or as a boosting of contrast among frequencies, rather than as the imposition on recordings of melodies not native to them.  It’s arguably a form of eduction, insofar as it can help make occulted information sensorily perceptible.  If the Bach case is analogous to boosting the contrast of a photograph until it becomes distorted and stylized, I’d suggest that my splashing-water example is more like boosting the contrast of an exceptionally low-contrast image until chance patterns begin to emerge out of the noise.

Recordings of water have been my favorite subject for melodization so far, and I can’t resist sharing just one more example.  Here’s a sound effect clip I found online called “Water lapping 2” slowed to half speed—by a fortuitous mistake on my part—and processed with the same settings as the last example except that I used an analysis window of 30,000 samples:

Other sounds of the natural world lend themselves to such treatment too.  For another example, I borrowed a thunder sound effect recorded by Grant Evans and melodized it using an 8000-sample impulse-variable window with a trigger threshold of 0.5 and a passwidth of 0.2 applied to the top three notes in an F minor scale with adjacent intervals of one semitone blocked:

Next, how about some music that comes straight from the heart?  This time my raw source material is a heartbeat sound posted on YouTube by justsoundfx.  I’m not absolutely sure this isn’t just a loop made from a shorter snippet, but my melodized results vary over time in ways that suggest ongoing changes in the source, as well as a cyclical pattern that might correspond to respiration.  I used an impulse-variable window of 10,000 samples with no threshold to find the top five notes in an eighth-tone scale and then mixed two different results: one with the source pitch-shifted up one octave and processed with a passwidth of 0.4, and another with the source pitch-shifted up two octaves and processed with a passwidth of 0.01.  I’ve taken a more intrusive approach in this case because I’ve determined that melodizing a heartbeat “as is,” without shifting it upwards, yields pitches that are too low to be comfortably perceived.

I find myself imagining the beating of the heart-mechanism of some kind of automaton built from metal gears and springs.  How about you?

And what about signals from outer space as source material?  Here’s a recording of pulsar PSR B1055-52 credited to the Parkes radio telescope in Australia and melodized with an impulse-responsive 5,000-sample window with a threshold of 0.2, applying a passwidth of 0.02 to the top two notes of a pentatonic scale:

The result reminds me vaguely of Central Asian string instrument music, but your mileage may vary.

Next, let’s consider split-band cross-modulation.  Mixing two dissimilar recordings in the usual way tends to produce a kind of messy cacophony, but the technique I have in mind here instead uses cross-modulation: multiplying recordings rather than adding them.  Cross-modulation is often an unintended and undesirable phenomenon—think of a radio receiving two competing signals of equal strength from different stations—but it’s also been applied to oscillators for sound synthesis, and it’s been used historically for speech scrambling (multiplying a speech signal by a given recording for transmission and then dividing by the same recording after reception).  So what do I mean by split-band cross-modulation?  This entails filtering two or more recordings into narrow frequency bands for cross-modulation with each other, re-filtering the modulated results to those same narrow frequency bands, and then combining the bands back together.  The result contains only those elements of the source recordings that coincide with and reinforce each other.  By way of illustration, here’s a cross-modulation of the first movement of Beethoven’s Fifth Symphony with the finale of the William Tell Overture, processed with 192 divisions per octave:
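
To make the difference between adding and multiplying concrete, here is a small Python sketch.  It shows only the multiplication step, not the band splitting or re-filtering, and the helper names sine and bin_magnitude are hypothetical.  Two test tones at 3 and 5 cycles per frame are mixed both ways, and the strength of each result is measured at a few frequencies.

```python
import math


def sine(cycles, n):
    """One frame of n samples containing `cycles` full periods of a sine."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


def bin_magnitude(signal, cycles):
    """Amplitude of the component at `cycles` periods per frame (one DFT bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * cycles * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * cycles * i / n) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


n = 64
a, b = sine(5, n), sine(3, n)

added = [x + y for x, y in zip(a, b)]       # ordinary mixing
multiplied = [x * y for x, y in zip(a, b)]  # cross-modulation

# Ordinary mixing keeps the two original frequencies (3 and 5).
# Multiplication replaces them with their difference and sum (2 and 8).
report = {k: (round(bin_magnitude(added, k), 6), round(bin_magnitude(multiplied, k), 6))
          for k in (2, 3, 5, 8)}
```

The added signal has energy only at the original 3 and 5 cycles, while the multiplied signal has energy only at 2 and 8 cycles, which is the familiar product-to-sum identity at work.  In a split-band scheme, whatever filter follows the multiplication decides which of those products survive.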

If we apply split-band cross-modulation to a pair of recordings where one contains speaking and the other contains some other sound of reasonably consistent volume, this produces a Sonovox-like effect, superimposing the articulations of speech more or less intelligibly onto the other sound.  To demonstrate, here’s a brief excerpt from Donald Trump’s inauguration speech cross-modulated with a recording of pigs squealing and snorting (at 36 divisions per octave):

Next, my time blur algorithm is designed to blur recordings in the time domain while preserving their other characteristics as much as possible.  The strategy it uses is mixing together multiple copies of the source offset by different numbers of samples, much as is done to simulate echo or reverb digitally, but with no change in amplitude from copy to copy—no fade-out.  When doing this, I’ve also limited the offsets to prime values in an effort to avoid cumulative phase interference.  Below is a recording of the first movement of Beethoven’s Fifth Symphony time-blurred twice in a row with twenty-five displacements offset at intervals of a hundred primes staggered by one per iteration (thus, the first iteration offsets the source by 541, 1223, 1987, 2741, 3571, etc. samples; and then the second iteration offsets the output of the first by 547, 1229, 1993, 2749, 3581, etc. samples):
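
The offset scheme just described can be reproduced with a short Python sketch.  It is an illustration rather than the MATLAB implementation: the function names are hypothetical, the copies are averaged to keep the output from clipping, and the undelayed source is left out of the mix, both of which are assumptions of this sketch.

```python
def first_primes(count):
    """Return the first `count` primes, growing a sieve until it is large enough."""
    limit = 64
    while True:
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"
        for n in range(2, int(limit ** 0.5) + 1):
            if sieve[n]:
                sieve[n * n::n] = bytearray(len(range(n * n, limit + 1, n)))
        primes = [n for n, flag in enumerate(sieve) if flag]
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


def blur_offsets(displacements=25, spacing=100, stagger=0):
    """Offsets taken from every `spacing`-th prime, shifted along by `stagger` primes."""
    primes = first_primes(spacing * displacements + stagger)
    return [primes[spacing * k + stagger - 1] for k in range(1, displacements + 1)]


def time_blur(samples, offsets):
    """Average equal-amplitude copies of `samples`, each delayed by one offset."""
    out = [0.0] * (len(samples) + max(offsets))
    for off in offsets:
        for i, s in enumerate(samples):
            out[i + off] += s
    scale = 1.0 / len(offsets)   # assumption: normalise so the mix cannot clip
    return [v * scale for v in out]


first_pass = blur_offsets(displacements=5)              # 541, 1223, 1987, 2741, 3571
second_pass = blur_offsets(displacements=5, stagger=1)  # 547, 1229, 1993, 2749, 3581
```

A two-iteration blur would then be time_blur(time_blur(x, blur_offsets()), blur_offsets(stagger=1)) for a list of samples x.  The pure-Python loops are slow on full-length recordings, so this is best tried on short excerpts.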

The notes and timbres are still there, but the famous DIT-DIT-DIT-DAH has been pretty well obscured.  Piano music comes out similarly shorn of its attacks and short-term rhythms, as with this excerpt of Chopin’s Nocturne op. 9 No. 2 (time-blurred with two iterations of twenty-five displacements at hundred-prime intervals staggered by one prime per iteration):

With different settings, the same algorithm can transform drums into ratchets or creaky buzzing sounds that can be downright comical—witness this reworked opening of the Beatles’ “I Want to Hold Your Hand,” time-blurred once with twenty displacements at fifty-prime intervals:

We can combine this effect with others covered above.  Here’s what we get if we take the source recording of Chopin’s Nocturne op. 9 No. 2 “as is” and cross-modulate it with my field recording of the babbling brook in the Indiana University Bloomington arboretum with twelve band divisions per octave:

And here’s what we get if we substitute the time-blurred version of the Chopin recording:

Both versions suggest to me something like the Chopin music bubbling up from the depths of a watery amphitheater-beneath-the-sea, but the effects are also somewhat different.

I’ve also experimented with an alternative time-blurring strategy: staggering the output of different frequencies in my melodizer by randomized numbers of samples, either individually or by harmonic sequence.  However, I’m not quite happy with the results yet.

Much as my time blur algorithm blurs recordings in the time domain, frequency blur blurs them in the frequency domain by mixing together multiple copies of the source with incrementally shifted frequencies instead of time offsets.  We can carry out the pitch-shifting with a phase vocoder (if we want to blur frequencies faithfully around their center points), or with my octave inversion algorithm (if we just want to obscure them), or with a combination of approaches.  Here, for example, is “Rock Around the Clock” by Bill Haley and His Comets processed two times with seven shifts spread across half an octave—first linear, then exponential—and then inverted with fourteen shifts spread across a whole octave:
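
For readers who want to experiment, here is one possible reading of “linear” versus “exponential” spacing, sketched in Python.  It is a guess at the arithmetic, not a description of the MATLAB code: the function name shift_ratios is hypothetical, and centering the spread on the original pitch is an assumption.  Only the pitch-shift ratios are computed; the shifting itself (phase vocoder or otherwise) and the mixing are not shown.

```python
def shift_ratios(shifts=7, span_octaves=0.5, spacing="exponential"):
    """Frequency ratios for `shifts` copies spread across `span_octaves`.

    Assumption: the spread is centred on the source pitch, so the lowest copy
    sits span/2 octaves below it and the highest span/2 octaves above it.
    """
    low = 2 ** (-span_octaves / 2)
    high = 2 ** (span_octaves / 2)
    if spacing == "linear":
        # Equal steps in frequency ratio (equal numbers of hertz per step).
        step = (high - low) / (shifts - 1)
        return [low + k * step for k in range(shifts)]
    if spacing == "exponential":
        # Equal steps in pitch (equal fractions of an octave per step).
        step = span_octaves / (shifts - 1)
        return [2 ** (-span_octaves / 2 + k * step) for k in range(shifts)]
    raise ValueError("spacing must be 'linear' or 'exponential'")


linear = shift_ratios(spacing="linear")
exponential = shift_ratios(spacing="exponential")
```

Under these assumptions, both lists run from about 0.84 to about 1.19 times the original frequency.  The middle copy of the exponential set sits at the original pitch, while the middle copy of the linear set sits slightly above it.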

This algorithm basically does the opposite of my melodizer algorithm by taking pitched sounds and making them unpitched.  I’ve tried “demelodizing” recordings in a few other ways besides, but so far this approach seems to be the most effective.

Window-reversal is pretty simple; it just refers to reversing every successive group of x samples throughout a source recording.  Extremely short windows have little effect, while longer ones sound like exactly what they are: periodic reversals.  But when the window falls into a certain intermediate range—in the ballpark of 75-125 samples—reversal can skew frequencies in interesting ways.  Picture what would happen to a sine wave and you’ll start to understand why.
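
Here is a minimal Python sketch of the operation, for anyone who wants to try it on a list of samples.  The function name window_reverse is hypothetical, and reversing a short leftover group at the end of the recording is an assumption; it could just as well be left untouched.

```python
def window_reverse(samples, window):
    """Reverse each successive group of `window` samples.

    A final group shorter than `window` is reversed as well (an assumption).
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    out = []
    for start in range(0, len(samples), window):
        out.extend(reversed(samples[start:start + window]))
    return out


example = window_reverse([1, 2, 3, 4, 5, 6, 7], 3)
```

In this example the result is [3, 2, 1, 6, 5, 4, 7].  Applying the function a second time with the same window restores the original order, so the effect is fully reversible.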

As a comparable illustration for your ear, here’s Chuck Berry’s “Johnny B. Goode” with each successive group of 110 samples reversed, producing very weird-sounding results:

I’ve also continued to tweak my octave inversion algorithm, which now delivers better results than before, especially when it comes to double inversions with their cumulative distortions and artifacts.

  • Previously, I resampled the final composite output to 75% of its length to compensate for the initial resampling to 150%.  Now I instead resample each individual band to 75% of its length after inversion, but before compositing, to take advantage of the way this shears off the upper third of the frequency range post-inversion.
  • I also substituted a high-order Butterworth filter for the elliptic filter used in my earlier experiments, hoping to get flatter results near the band-splitting frequency.  I found that this reduced the resonance-like reinforcement of the band-splitting frequency, but spectrograms still displayed conspicuous “banding,” in which each split band looked like a distinct stripe.
  • I thought this continued “banding” might be due to the relationship between frequency and velocity amplitude, so I started playing around with different sequences of taking derivatives and re-integrating.  Through trial and error, I came up with a pair of steps that pretty well eliminates the “banding”: take the difference of the difference of the signal before inversion and then the cumulative sum of the cumulative sum of the signal afterwards, band by band.
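
The last pair of steps is easy to try out in isolation.  The Python sketch below is only an illustration, with hypothetical helper names modeled on MATLAB’s diff and cumsum, and with the inversion step between the two halves left out.  Taking the difference twice tilts a signal toward its high frequencies, and taking the cumulative sum twice tilts it back, so without anything in between the two steps nearly cancel.

```python
from itertools import accumulate


def diff(x):
    """Differences between neighbouring samples (one sample shorter than x)."""
    return [b - a for a, b in zip(x, x[1:])]


def cumsum(x):
    """Running total of the samples."""
    return list(accumulate(x))


def double_diff(x):
    return diff(diff(x))


def double_cumsum(x):
    return cumsum(cumsum(x))


signal = [3, 5, 4, 8, 7]
sharpened = double_diff(signal)        # [-3, 5, -5]
restored = double_cumsum(sharpened)    # [-3, -1, -4]

# What the round trip loses is only a straight-line ramp:
# signal[2:] minus restored is [7, 9, 11].
ramp = [s - r for s, r in zip(signal[2:], restored)]
```

The round trip returns the original samples minus a constant offset and a constant slope, which carry no audible pitch content.  When an inversion sits between the two halves, each frequency is boosted at its position before inversion and cut at its position afterwards, which is presumably why the pair evens out the level from band to band.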

Here’s an updated double inversion of “The Stars and Stripes Forever” (split first at 80,000 Hz, and then at 60,000 Hz) that takes advantage of these latest developments:

Most of what I’ve described above has come from me wondering “what would it sound like” if I tried doing X, Y, or Z, and you might reasonably ask what the point of it is, or whether there is any point to it.  I’d like to offer a few possible answers.  First, techniques such as melodization could offer real insights into recorded subject matter (thunderstorms, heartbeats, etc.) if they can magnify distinctions that are imperceptible in their original form.  Second, the ability to strip away or isolate the rhythmic or melodic features in a given recording could help us better understand the significance of different parameters in recorded sound, or sound in general.  What does it tell us, for instance, if recordings of popular music are still easily recognizable even when they’ve been “demelodized”?  And third, I’d like to think these techniques are not only fun to play around with, but aesthetically and artistically interesting as well.  With that in mind, I hope you’ve enjoyed listening to the examples I’ve shared here, and I’d welcome your thoughts and reactions if you have any.

