Thinking about ordering some wall art or custom jewelry based on a sound wave, or maybe even getting a sound wave tattooed on your arm? Or just curious about what options exist for converting sounds into tangible art objects—and whether, in each case, the audio would really be there in a meaningful way?
If so, I hope you’ll find this blog post interesting and informative. In preparation for it, I’ve reviewed a good deal of existing sound wave art, ranging from the wares offered by independent artists on a limited scale through Etsy to the products of bigger operations armed with patent filings, trademark registrations, software apps, and complex business models. I’ve also been brainstorming about alternative forms sound wave art could take and experimenting with a few of them myself.
Sound wave art has been promoted as a wedding gift, among other things, so I guess it’s appropriate that we’ll have a mix here of something old, something new, something borrowed, something blue. Borrowed because I’ve quoted a number of pictures others have posted online to advertise, publicize, or document the kinds of sound wave art they’ve been making—otherwise it wouldn’t be feasible for me to characterize or critique them, and fair use is only fair. Blue because waveform art displays all the colors of the rainbow, including that one. New in the sense that I haven’t seen anything much like my own experiments out there. And old in the sense that I’ll be summarizing what I have seen out there.
But I think even the “old” part should have something new about it, since I haven’t yet found anyone else trying to present a general overview of contemporary sound wave art. Instead, the reviews I’ve turned up overwhelmingly deal with the work of some individual artist or company and present it as a thing of solitary genius (if laudatory) or novel deceit (if disparaging). Practitioners tend to cultivate an aura of uniqueness in their self-descriptive publicity as well. But some of them have actually been following convergent paths, so if we want to be able to distinguish common trends from distinctive innovations, we need to take the initiative to compare and contrast. And when it comes to the basic idea of sound wave art—well, nobody alive today should be taking credit for that. Way back in 1857, when Édouard-Léon Scott de Martinville patented his phonautograph—the world’s first device for picking up sound waves out of the air and tracing them onto glass or paper—one of the uses he described for it was “to produce industrial designs for embroideries, filigrees, jewelry, shades, illustrations of books of an entirely new kind.” Sound wave art is embedded in a tradition which we should acknowledge and embrace.
At the same time, I bring an unusual perspective to the table because I’ve spent lots of time—maybe more than anyone else—figuring out how to pull playable audio from different kinds of graphical representation of sound (see here, here, and here). For me, the relationship between visible “pictures of sound” and playable audio isn’t some hypothetical teaser for the imagination, but a serious technical reality. When I see an image of a sound wave in an unexpected place or an unconventional form, I don’t just marvel at its appearance; almost as an unconscious reflex, I find myself pondering whether I could extract meaningful sound from it and what would be practically involved in giving it a go. Granted, that’s not something the average person is likely to want to try. Still, I think the perceived potential for playback—the idea that the necessary data is present, even if there’s no mechanism to make it happen—is a big part of the popular allure of sound wave art. And yet most twenty-first-century sound wave art turns out to be far less playable than sound wave images created in the 1860s and 1870s. Even if we don’t value playability per se, which I’ll concede usually isn’t the point of sound wave art, recent examples also tend to present less real audio information in any form than their nineteenth-century precursors did. I can’t help but feel that’s a step backwards. Judging from online discussions of sound wave art, there also seems to be some real underlying confusion and uncertainty out there about visualizations of sound waves, what they show, and how they relate to sounds we can hear, and I’d like to help clear it up.
The Sound Oscillogram or Waveform
In order to evaluate sound wave art in an informed way, you’ll need a basic understanding of how sound wave pictures work in general. Maybe you already have it, but just in case you don’t, I’d like to start by providing some technical background.
Sound is physically made up of pressure waves in a medium such as the air that cause it to compress and rarefy or—in other words—that make its particles move alternately forwards and backwards from a resting position, a bit like tiny pendulums. Louder sounds correspond to bigger motions (higher amplitudes), while higher-pitched sounds correspond to faster back-and-forth motions (higher frequencies), with the audible range for humans extending from around twenty to twenty thousand cycles per second. Sound waves in the real world generally combine multiple amplitudes and frequencies with varied phase relationships (how the back-and-forth motions line up with each other), and the effect on any given particle is the sum of all of them added together, analogous to the motion of a fishing bobber as ripples pass across the surface of a lake from different directions. Recording sound is like recording the motion of the bobber over time, while playing back a recorded sound is like making the bobber move again in the same way as before to create new ripples that resemble the old ones. Your eardrums are like bobbers for sound waves. So are the diaphragms in microphones and loudspeakers.
Strategies for visualizing sound vibrations over time—a spoken phrase, a snippet of music, a sequence of noises—generally fall into one of two main categories, both of which involve plotting information along a time axis. One of them is oscillographic, which involves graphing an actual back-and-forth movement via moment-by-moment measurements of amplitude. The result is sometimes called a waveform. The other strategy is spectrographic, which involves separating out the different frequencies that make up a sound wave and graphing their individual amplitudes. Most sound wave art takes an oscillographic approach, so I’m going to address that first.
Sound oscillograms can take several different forms. The main ones, as illustrated from top to bottom below, are (1) the wavy line, such as a record groove; (2) the band of varying size, such as a “variable area” optical motion picture sound track; and (3) the band of varying intensity, such as a “variable density” optical motion picture sound track.
All three of these forms have been used historically for analog sound recording and playback, but they can also be used for interfacing between digital images and digital audio. By the same token, waveform pictures can be usefully analyzed in terms much like those commonly used to evaluate the resolution or quality of digital audio files: namely, bit depth and sample rate.
The bit depth of a digital audio file (I’ll assume for purposes of argument that we’re dealing with linear PCM, as in a typical WAV) refers to how many bits are used to encode each sample, and hence how precisely amplitude values can be expressed, which in turn affects the dynamic range and noise floor. One-bit audio would offer just two options: 0 and 1, corresponding to “off” and “on.” Two-bit audio would offer four options: 00, 01, 10, and 11. CD-quality audio is encoded at 16 bits, with 65,536 (2^16) possible values ranging from 0000000000000000 to 1111111111111111, while the archival preservation standard is 24 bits, with 16,777,216 (2^24) possible values. That’s all for monophonic sound, by the way; stereo files have two sets of samples, one for the left channel and one for the right channel. When it comes to waveform pictures, the equivalent to audio bit depth is the number of meaningful graphical distinctions available at each point along the time axis. In the examples shown above, that means the number of possible gradations in (1) the vertical position of the line, (2) the size of the band, or (3) the brightness of the strip. For example, a wavy line in an image 256 dots or pixels high could easily take any one of 256 (2^8) possible positions. That’s equivalent to 8-bit audio, which will have more “hiss” in quiet sections than 16-bit audio but will otherwise sound pretty similar. In my opinion, achieving defensible audio bit depth isn’t a major hurdle for most sound wave art.
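To make the bit-depth arithmetic concrete, here’s a minimal Python sketch. The function names, and the choice to put the highest amplitude on the top pixel row, are assumptions made purely for illustration.

```python
def levels(bits):
    """Number of distinct amplitude values available at a given bit depth."""
    return 2 ** bits


def sample_to_row(sample, height=256):
    """Map an amplitude in [-1.0, 1.0] to a pixel row (row 0 is the top).

    Assumption: +1.0 lands on the top row and -1.0 on the bottom row.
    """
    sample = max(-1.0, min(1.0, sample))  # clip out-of-range values
    return round((1.0 - sample) / 2.0 * (height - 1))


print(levels(8), levels(16), levels(24))         # 256 65536 16777216
print(sample_to_row(1.0), sample_to_row(-1.0))   # 0 255
```

A wavy line drawn this way in a 256-row image can take 256 distinct positions per column, which is the sense in which it matches 8-bit audio.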
That said, there’s no one single waveform that corresponds to your favorite song because sound vibrations have more than one kind of graphable amplitude. There’s displacement amplitude, or the distance by which air particles are displaced from their rest positions, and there’s velocity amplitude: how fast air particles are moving, calculated as change in position over time. The former is associated with the position of an air particle, the position of a diaphragm, the position of the tip of a record stylus, and the position of a record groove, while the latter is associated with the velocity of the tip of a record stylus, the voltage it generates in an electromagnetic cartridge, the sample values in a digital sound file, the waveform display in a piece of sound editing software, and the voltage sent to a loudspeaker. Actually, it’s more complicated than that because of the RIAA curve and ceramic cartridges and— well, let’s not go there. Suffice it to say that the shape of a waveform displayed in audio editing software isn’t the same as the shape of a record groove. From here on out, you can assume we’re dealing with velocity amplitudes unless I say otherwise, but you should be aware that this isn’t our only possible choice.
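As a rough illustration of how the two kinds of amplitude relate, the sketch below approximates velocity as the change in displacement from one sample to the next. The function name and the toy numbers are hypothetical, and a first difference is only a crude stand-in for a true derivative.

```python
def velocity_from_displacement(displacement, sample_rate):
    """Approximate velocity as change in position over time.

    Each output value is the difference between neighboring displacement
    samples, scaled by the sample rate (so the unit is "per second").
    """
    return [(b - a) * sample_rate
            for a, b in zip(displacement, displacement[1:])]


# Toy displacement curve sampled twice per second (made-up values).
positions = [0.0, 1.0, 3.0, 6.0]
print(velocity_from_displacement(positions, sample_rate=2))  # [2.0, 4.0, 6.0]
```

The two lists describe the same motion but trace different shapes, which is the point being made about record grooves versus waveform displays.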
Meanwhile, audio sample rates correspond to image resolution in the time dimension, which runs from left to right in the examples shown above. To match a CD-quality sample rate of 44,100 samples per second, a waveform picture needs 44,100 distinct pixels or dots in whatever amount of space represents one second of time. If it were printed with a resolution of 300 dots per inch, for example, every second’s worth of audio would need to stretch out over 147 inches (44,100/300), or 12.25 feet. A mile-long print could hold approximately seven minutes and eleven seconds of audio (5280/12.25). Chances are good that your favorite song would span about half a mile, and that you wouldn’t be able to run fast enough alongside it to keep up with the speed of playback.
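The print-length arithmetic can be checked with a few lines of Python. This is just a sketch of the calculation described above; the function names are my own, and it assumes one printed dot per audio sample.

```python
def inches_per_second(sample_rate, dpi):
    """Print length needed for one second of audio, one dot per sample."""
    return sample_rate / dpi


def seconds_per_mile(sample_rate, dpi):
    """How much audio fits on a print one mile (5280 feet) long."""
    feet_per_second = inches_per_second(sample_rate, dpi) / 12
    return 5280 / feet_per_second


print(inches_per_second(44100, 300))           # 147.0
minutes, seconds = divmod(seconds_per_mile(44100, 300), 60)
print(int(minutes), "min", int(seconds), "s")  # 7 min 11 s
```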
On the other hand, reducing the number of dots or pixels used to represent a given span of time means lowering the sample rate, which impacts the integrity of the audio. Much as lowering the bit depth makes audio sound noisier, lowering the sample rate makes it sound duller. That’s because the highest frequency a digital sound file can theoretically accommodate is its Nyquist frequency, or half its sample rate—the case where successive samples will simply alternate between high and low values. Thus, a sound file with a sample rate of 44,100 samples per second can handle sounds up to 22,050 Hz (near the upper limit of human hearing), while a sound file with a sample rate of 441 samples per second can only handle sounds up to 220.5 Hz (roughly A below Middle C).
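Here’s a small sketch of the Nyquist limit in Python. The helper names are assumptions for illustration; the second function samples a cosine at a chosen frequency so you can see the alternating high and low values that occur exactly at half the sample rate.

```python
import math


def nyquist(sample_rate):
    """Highest frequency a given sample rate can theoretically represent."""
    return sample_rate / 2


def sample_cosine(freq, sample_rate, count):
    """Sample a unit-amplitude cosine, rounded to hide float noise."""
    return [round(math.cos(2 * math.pi * freq * n / sample_rate), 6)
            for n in range(count)]


print(nyquist(44100))                 # 22050.0
print(nyquist(441))                   # 220.5
print(sample_cosine(220.5, 441, 4))   # [1.0, -1.0, 1.0, -1.0]
```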
In other words, a second’s worth of waveform squeezed into 1.47 inches at 300 dots per inch wouldn’t be able to show Middle C. By downsampling a sound file with a decent anti-aliasing filter, you could still get an accurate representation of the sound below 220.5 Hz under such conditions, if you don’t mind knowing that your favorite song would come out sounding something like this:
But there’s a big difference between this kind of careful downsampling and the ways in which audio editing programs condense sound wave displays when all you’re doing is zooming out to view a longer segment onscreen. The views generated in that case, like the ones that typically augment audio player apps, are intended mainly to support navigation—that is, finding particular positions in a recording so that you can “go” to them and listen to them, edit them, or do whatever else it is you want to do.
Some of these images, at least—and possibly all of them—are supposed to be understood as highly condensed renderings of wavy lines. But they no longer show any of the back-and-forth between individual amplitudes, which is the sort of information a loudspeaker can practically transduce into sound: move forward at a velocity of 3, then forward at a velocity of 1, then backward at a velocity of 2, etc. Instead, each column has become a vertical line representing the whole range of amplitude values present over a span of time. If someone were to try to use this data as a set of instructions for a loudspeaker, it would come out like this: sometime during the next tenth of a second, please move forward at a velocity of 37 and backward at a velocity of 29, but in no particular order, and move back and forth at whatever intermediate velocities you like in the meantime. Good luck with that! A helpful point of analogy, albeit an imperfect one, is the thumbnail image of a printed page:
These thumbnails give a high-level view of the shape of things, and you can even make out a few of the most prominent details, depending on whether a page contains a masthead or an illustration. But you couldn’t possibly read the text itself, no matter how hard you squint. The information just isn’t there.
The same is true of zoomed-out waveform displays, which we can call waveform thumbnails (a term I’m by no means the first to use; see here and here). They’re practically useful for indexing, but the most crucial information just isn’t there. And the reason it’s not there isn’t because the information can’t be displayed graphically due to something inherent in its nature. It’s just a matter of resolution.
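A short Python sketch may help show why the ordering information is gone. It assumes a thumbnail column simply stores the lowest and highest sample in its time span, which is one common way of drawing zoomed-out views but not necessarily what any particular program does; the function name and sample values are made up.

```python
def thumbnail_columns(samples, columns):
    """Reduce samples to one (min, max) pair per column.

    Assumes len(samples) >= columns; leftover samples at the end are ignored.
    """
    size = len(samples) // columns
    chunks = [samples[i * size:(i + 1) * size] for i in range(columns)]
    return [(min(chunk), max(chunk)) for chunk in chunks]


# Two different "recordings" containing the same values in a different order.
a = [0.5, -0.3, 0.1, 0.2]
b = [0.2, 0.1, -0.3, 0.5]
print(thumbnail_columns(a, 1))  # [(-0.3, 0.5)]
print(thumbnail_columns(b, 1))  # [(-0.3, 0.5)]
```

Both inputs collapse to the same column, so nothing in the thumbnail could tell a loudspeaker which motion came first.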
However, waveform thumbnails also seem to be the only kind of sound wave visualization many people know about—the only point of reference they have for what a sound wave ought to look like. If your only exposure to waveform displays comes from audio player apps, and you’ve never had a reason to zoom in for a closer look, you could easily assume the thumbnails are all there is to see. That may explain why so much contemporary sound wave art is based on them. A couple do-it-yourself guides I’ve found for creating sound wave art recommend copying thumbnail displays from audio editing software, either via screenshot or by hand. A free online tool that offers to let you “generate a waveform image from an audio file” will give you the same thing. And nearly everyone out there who’s creating sound wave art seems to be working from this kind of image, using it either “as is” or as a starting point for further elaboration. I admire some of this work very much for what it does with the thumbnails. But I also can’t help but wonder what the same artists could be accomplishing with full-fledged waveforms. After all, even if we were to ignore issues of meaningfulness (or, if you prefer, “audio geekery”), a waveform at viable scale can be a marvel of balanced grace and geometric beauty—
One complication, I’ll admit, is that viable waveforms of any significant duration tend to be much larger in the time dimension than they are in the amplitude dimension. This can make them rather unwieldy. But the people who developed historical sound recording formats faced the same dilemma, and they hit upon four different designs for solving it:
- A long, narrow strip wound onto a reel, as with magnetic audiotape.
- A spiral, as with the LP.
- A helix, as with the phonograph cylinder.
- A helix unwound and flattened into parallel strips, as with the phonautogram.
Sound wave artists who want to use viable waveforms in their work could always pursue the same tried-and-true strategies, or similar ones. There’s no need to reinvent the wheel as long as you know about wheels.
The Wall of Sound
Much of the sound wave art being created today is designed for hanging on a wall. A lot of it is also available to buy online, sometimes made to order based on a song request or on something customers have recorded themselves—the words “I love you,” wedding vows, that sort of thing—but sometimes as a ready-made item based on some recording with widespread appeal.
Now, we shouldn’t assume that a sound wave image marketed in this way actually corresponds to the recording it purports to represent any more than we should assume that everything touted during the Middle Ages as a fragment of the True Cross was the real deal. After all, if someone were to substitute something else, who would know the difference? Well, you might, if you were to do a little detective work. As of this writing, eBay seller “zugartzwang” is offering dozens of posters with sound waves captioned with the names of popular songs, but every single one features the exact same picture, which turns out to be simply the fourth of four generic sound wave designs licensed by Microvector.
But this is a rare exception. As far as I can tell, most practitioners of waveform art are creating images in good faith from the recordings they claim they’re using, although nearly all the examples I’ve seen to date have been based on waveform thumbnails.
So what does the landscape look like out there? If you’d like to see and experiment with a variety of display options, you might start by checking out SoundViz, an enterprise that lets you upload a sound file and then choose your own parameters for making a picture out of it, which you can then buy from them—as either a digital download or a print—with prices starting at $30. Even without registering, you can click on the TRY IT FOR FREE button and play around with a sample sound file they provide, walking through the following steps:
- Choose a color scheme from a menu.
- Choose a linear or radial “style” (i.e., whether your waveform thumbnail runs left-to-right or is looped into a circle) with an option to adjust settings such as the width and spacing of lines representing successive time snippets.
- Add some text, with a choice among fonts, colors, and so on.
The resulting waveform thumbnails are symmetrical about the time axis, with each line representing amplitude as a peak absolute value. That turns out to be a pretty common approach among sound wave artists. But SoundViz also uses “a special, proprietary process” to add color to its waveform thumbnails, which is more unusual. A quick study suggests that the process must consist of assigning a color spectrum to the amplitude range and then displaying each line along the time axis in the color corresponding to its amplitude. This makes the results a lot more visually appealing than they would otherwise be.
Another out-of-the-box option for those who want some help in trying their own hands at sound wave art is Waveform Artist by Design Rocket, an extension for Photoshop CC2015+ priced at $8. Among other things, it lets you use the waveform thumbnail shape to vignette some other image, as well as offering some rather interesting options for stylization. The “technical” style seems to involve using multiple lines to connect samples staggered at different intervals, while the “simple” style seems to apply some kind of smoothing average to the top and bottom of the waveform thumbnail.
Those who prefer to enlist the creative skills of a live human artist might instead turn to Artsy Voiceprint, Rindle Waves, or Bespoken Art (not a complete list). Any of these concerns can make you something comparable to what SoundViz or Waveform Artist would generate, but they can also transcend the limitations of those do-it-yourself systems to produce pieces that are more complex and more conceptually interesting, such as composites that overlay two or more waveform thumbnails, each in a different color.
Whenever we find artists overlaying multiple waveforms, it seems they mean to reflect some connection among them: the voices of a couple, say, or of the members of a family. But when all they want to do is squeeze in more content than they could comfortably fit into a single pass, they tend instead to arrange a few separate waveform thumbnails parallel to each other, oriented from left to right and top to bottom like “ordinary” writing.
It’s not uncommon to find a waveform looped into a circular or “radial” layout, but I have yet to find anyone coiling one into a spiral, even though—oddly enough—I have seen this approach taken to lyrics accompanying a waveform (see the example on the right, from MangoPrintArt).
Sound wave art destined for hanging on walls is mostly printed on paper or canvas, but examples can also be found printed on wood or encased in glass.
One conspicuous outlier in the world of sound waveform wall art is the work of Tim Wakefield and the Soundwaves Art Foundation. As we’ve seen, nearly everyone else out there is using waveform thumbnails. By contrast, Wakefield takes nicely spread-out waveforms as his raw material, overlays them in brilliant colors, and then warps them to introduce a dizzying array of new twists and curves and angles. The waveforms are said to come from the recording sessions for iconic popular songs, and the prints are (mostly) signed by the musicians, with proceeds from their sale benefiting charity.
Wakefield’s art diverges in two ways from what most others have been creating, and I want to distinguish between them because either direction can also be explored separately on its own—and I’m more interested in exploring one than the other.
I haven’t seen any detailed technical account of Wakefield’s methods, but it’s clear he spends more time manipulating his source material than the other artists whose work we’ve been considering. It might be more accurate to say that he’s crafting new structures of his own out of pieces of sound recordings than that he’s exposing structures already latent within the recordings themselves—more act of creation, less voyage of discovery. The results are visually captivating, but they don’t seem to follow as inexorably from the source material as other waveform art does.
The other difference, as I’ve already mentioned, is Wakefield’s use of waveforms scaled so that their shapes can actually be seen—but it’s not only that. Complex waveforms display cyclical patterns that vary gradually in shape and size as they repeat, and Wakefield often seems to be overlaying cycles in order to exploit these patterns. In fact, if I had to guess at his compositional routine, I’d bet it starts with him searching recordings for bits of waveform that can be overlaid cyclically to good effect.
But cyclical layering can also deliver compelling results on its own, without the kind of extra manipulation Wakefield throws into the mix. To illustrate the point, let’s take the audio clip from which I previously excerpted this waveform—
—but which, in its entirety, covers just under one second’s worth of me slowly pronouncing the diphthong “ow” in something close to a monotone. Instead of laying it out in a single strip, let’s now loop it in cycles of 383 samples and plot them all overlaid on top of one another. To make the change in shape over time easier to see, we’ll gradually shift the color of the trace from green at the beginning to red at the end. And we’ll give each trace a consistent amplitude scale spanning 256 pixel rows, but for the image below at upper left we’ll hold it in a steady vertical position, for the image at lower left we’ll shift it down by one pixel row per cycle, and for the image on the right we’ll shift it down by five pixel rows per cycle. The effect in each case is somewhat different.
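For anyone who wants to try this kind of cyclical overlay, here’s a minimal Python sketch of the bookkeeping involved. It only computes where each sample would be plotted and in what color; the function name, the top-row convention, and the straight-line blend from green to red are all assumptions for illustration rather than a record of how the images here were made.

```python
def overlay_points(samples, cycle_length, height=256, shift_per_cycle=0):
    """Return (x, y, (r, g, b)) plot points for samples in [-1.0, 1.0].

    Each cycle restarts at x = 0, is shifted down by shift_per_cycle rows,
    and is colored along a green-to-red blend from first cycle to last.
    """
    cycles = max(1, -(-len(samples) // cycle_length))  # ceiling division
    points = []
    for i, s in enumerate(samples):
        cycle, x = divmod(i, cycle_length)
        y = round((1.0 - s) / 2.0 * (height - 1)) + cycle * shift_per_cycle
        t = cycle / max(1, cycles - 1)  # 0.0 for first cycle, 1.0 for last
        points.append((x, y, (round(255 * t), round(255 * (1 - t)), 0)))
    return points


# Three cycles of 383 samples at full positive amplitude, shifted 5 rows each.
pts = overlay_points([1.0] * (383 * 3), 383, shift_per_cycle=5)
print(pts[0])    # (0, 0, (0, 255, 0))
print(pts[-1])   # (382, 10, (255, 0, 0))
```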
By incrementally varying the positions of the traces from left to right as well as their vertical positions, we can also produce striking stereoscopic effects, as illustrated by the anaglyph below—get out your red-cyan glasses if you’ve got ’em!
One challenge involved in making images like these is choosing an appropriate cycle length—one that’s close enough to the length of actual cyclical patterns in a particular waveform to make the details of the repetitions reinforce one another. This can be hard to do on the level of individual sound vibrations in a voice recording, since the cycle length in that case will vary continuously with the pitch of speaking or singing. The above examples only worked as well as they did because I was trying to speak in a monotone when I made the recording. One strategy for applying this technique to a wider range of spoken-word recordings might be to vary the cycle length with the fundamental pitch of the voice.
But there are other cyclical patterns in sound waves that we should be able to harness more easily and more reliably. I’m thinking here of the rhythmic patterns found in music with a consistent tempo. Some time ago, I wrote a program that takes any sound recording and uses autocorrelation to try to find the strongest repeating cycle within it—expressed in samples—between 1.4 and 2.6 seconds in length. My goal at the time was to average musical recordings based on these cycles, distilling them down to a few composite bars that would typify them better than any one real excerpt. It works pretty well for that purpose much of the time, although I wouldn’t say I’ve got the process quite perfected yet. Here are a few listening examples of cyclical song averages made by superimposing whole recordings looped into cycles at eight times their algorithmically detected cycle lengths.
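The general idea of autocorrelation-based cycle detection can be sketched in a few lines of Python. To be clear, this is not the program described above, just a naive illustration under my own assumptions: it scores every candidate lag by the average product of the signal with a shifted copy of itself and returns the best one. It would be far too slow for a full song at 44.1 kHz, where an FFT-based approach would be the usual choice.

```python
def strongest_cycle(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest mean autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break  # lag is longer than the recording
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy "music": one click every 50 samples. At a pretend rate of 25 samples
# per second, lags 35 to 65 correspond to the 1.4 to 2.6 second search range.
clicks = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
print(strongest_cycle(clicks, 35, 65))  # 50
```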
We can apply these same cycle lengths when creating visualizations. To illustrate, let’s take the original German-language version of Falco’s “Der Kommissar.” My algorithm detected a cycle length in it of 89703 samples, and I’ve used this figure to generate an image following the same basic technique as above—shifting the position of the trace down three pixel rows per cycle—except that I’ve taken a different approach to color, making the trace redder where its amplitude is higher and greener where it’s lower. (If you want to know the specifics, I’ve set its RGB values proportional to [√x, 1-x, y], where x is the average absolute value of amplitude between 0 and 1 in a 100-sample window and y is a constant.) The full-size image is a whopping 89703 pixels wide, since every sample in the cycle corresponds to a pixel column. That’s far too big for me to post here, but here’s an excerpt at full resolution you can open in a new window if you like.
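The color rule in the parenthesis can be written out as a short Python sketch. The value of the constant y isn’t stated above, so the 0.2 used here is a placeholder assumption, as are the function name and the scaling to 0 through 255.

```python
import math


def window_color(window, y=0.2):
    """Map a window of samples in [-1.0, 1.0] to an (r, g, b) triple.

    x is the average absolute amplitude in the window; the channels are
    proportional to [sqrt(x), 1 - x, y]. y = 0.2 is a made-up placeholder.
    """
    x = min(1.0, sum(abs(s) for s in window) / len(window))
    return tuple(round(255 * c) for c in (math.sqrt(x), 1.0 - x, y))


print(window_color([0.0] * 100))    # (0, 255, 51)   quiet: green
print(window_color([-1.0] * 100))   # (255, 0, 51)   loud: red
```

Taking the square root of x pushes the red channel up quickly at low amplitudes, so even fairly quiet passages pick up some warmth.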
Viewed at this scale, the result gives me the impression of an overgrown garden choked with thorn bushes and cobwebs and dusted with a light snowfall. But if we take a step back, more definite patterns emerge out of the chaos. Here’s a version of the whole image with the time dimension downsampled to 2000 pixels (which comes out to around 983 pixels per second), although the same effect could be produced by stretching the full-resolution image vertically.
From this distance, the image looks to me more like a fine gossamer weave of impossible intricacy that I imagine might be fluffy to the touch or sticky like cotton candy. As you may already have guessed, the evenly-spaced “tufts” correspond to beats in the music, and they’re staggered as they are because of a rhythmically anomalous interlude around the 2:30 mark (which explains in turn why my cyclical song average for this recording didn’t turn out very well). If we want another vantage point on the image’s rhythmic symmetries, we can coil it into a spiral.
As we’ve seen, most sound wave artists take a waveform that’s been squished down into a thumbnail along its time axis and maybe split it into four, six, ten, or a dozen parallel strips. But I think it can be more illuminating to take an unsquished waveform and split it into many more parallel strips like this, provided we do our splitting in such a way as to reinforce and expose repeating patterns. Sound waves are full of wonderfully intricate patterns, and the success of sound wave art can be gauged in part by how successfully it draws them out.
There isn’t enough information in a waveform thumbnail for us to “play” it and get something audibly recognizable out of it. However, people have sometimes made arrangements for triggering a sound in connection with a piece of waveform thumbnail art. For example, an online tutorial entitled “How to Make an Interactive Sound Wave Print” gives instructions for installing a proximity sensor behind a framed print that will cause a song to play whenever you touch it. A more elaborate variant on this theme is the “Soundwave Art™ App” promoted through Jaxon LaTour Designs’ soundwaveart.com, another concern that sells custom art based on waveform thumbnails. If you have this app installed on your smartphone and scan a waveform image—or any image, for that matter—it will check the image against its database, and if it finds a match it will play an associated sound or video. The business model for this “augmented reality experience” centers on customers paying to register their images, sounds, and videos in the database. Of course, if the database were to go offline at some point in the future, the linkage between the images and sounds would be lost. On the other hand, if it were possible to play the sounds by processing the information present in the waveform images themselves, there wouldn’t be any need to store the sounds in a database of this kind in the first place.
Could we make waveform images like that?
Obviously we could—to a point—since it’s already been done, and with technology far more primitive than what we have available to us today. We can take phonautograms (sound waveforms scratched onto soot-covered paper starting in the 1850s and 1860s), scan them on ordinary flatbed scanners, and extract meaningful audio from them. Surely we should be able to create waveform art today that’s at least equally playable.
Now, you couldn’t play back “Der Kommissar” from the images I presented above because the successive passes of the waveform overlap each other, and there’s no way to untangle them. But if we wanted to make a playable waveform picture of it, we could instead display it as a succession of clearly separated parallel wavy lines, as illustrated by this tiny excerpt:
That’s what phonautograms look like, and we know those are playable. But presenting the whole of “Der Kommissar” like this in 8-bit, 44.1 kHz resolution would require strips 256 pixels high and an overall image roughly 50957 pixels square (about 2.6 billion pixels in all). That’s awfully big. Fortunately, we have another option that doesn’t take up nearly as much space. Of the three waveform formats I listed earlier (the wavy line, the band of varying size, and the band of varying intensity), the third can be a single pixel high and still carry all the information it needs to sustain playback. Below is an 8-bit, 11.025 kHz recording of “Der Kommissar” formatted as parallel bands of varying intensity, which requires only a 2804×905-pixel grayscale image (for a sample rate of 44.1 kHz we’d need to go up to 5607×1819, which still isn’t bad).
The picture shown above can be played back as sound by reading the pixels from left to right, and then from top to bottom, and converting their intensities (0 to 255) into successive audio samples (-1 to +1). I know it can because I’ve done it. Here’s an excerpt of audio extracted from the actual JPG shown above, but to be clear, the picture contains the whole song and not just this brief clip.
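The conversion itself is simple enough to sketch in Python. This version works on plain nested lists of pixel values rather than a real image file, and it assumes a straight linear mapping between 0 to 255 and -1 to +1; the function names are hypothetical, and loading or saving an actual image would need an imaging library on top of this.

```python
def pixels_to_samples(rows):
    """Read pixels left to right, then top to bottom; map 0..255 to -1..+1."""
    return [p / 127.5 - 1.0 for row in rows for p in row]


def samples_to_pixels(samples, width):
    """Inverse mapping: clip to [-1, +1], scale to 0..255, wrap into rows."""
    pixels = [round((max(-1.0, min(1.0, s)) + 1.0) * 127.5) for s in samples]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = [[0, 128, 255],
         [64, 192, 32]]
audio = pixels_to_samples(image)
print(audio[0], audio[2])                      # -1.0 1.0
print(samples_to_pixels(audio, 3) == image)    # True
```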
The sound may not be exactly high fidelity, but there’s no inherent limit to the audio quality we could encode according to this strategy. To increase the sample rate by any amount we want, we’d just need to increase the number of pixels. And we have some options for boosting bit depth too. The above image is 8-bit grayscale, a common standard that uses eight bits to represent the intensity of each pixel, which can correspond in turn to an 8-bit audio sample. A 16-bit grayscale image could accommodate 16-bit audio samples in the same way, although that standard isn’t as widely supported. A more versatile possibility would be 24-bit color, another common standard that uses eight bits to represent the intensity of each pixel in each of three color channels—red, green, and blue—for 24 bits total per pixel, which could be used to represent a 16-bit or 24-bit audio sample. Archival quality audio could be stored as 24-bit color images at the rate of one 310×310-pixel square per second.
But what I’ve just been describing wouldn’t necessarily qualify as sound wave art. For that, I believe we’d want something that not only presents lots of audio data, but also does so in a visually engaging way—something, in other words, that will transform cool audible patterns into cool visible patterns. And I believe we’d also want something that can have a life beyond the computer screen.
This would prompt me to choose a different approach to color, for a start. The usual formula for translating a 24-bit integer into an RGB color value is (r×2^16)+(b×2^8)+g. If all we care about is encoding playable audio information in an image file format, that will work just fine. However, it’s not suitable if we care about what our results look like, since only the red channel will vary proportionally with amplitude, while the blue and green channels will come out as visual noise. Unfortunately, I haven’t figured out any way of encoding 24-bit audio samples in 24-bit RGB color space that results in visually meaningful color distinctions. I also doubt that if we were to print out a 24-bit color image and then scan it back into computer memory again, we could get 24 meaningful bits of audio back out of it. It would probably even be tough just to get a sufficiently squared scan to avoid crosstalk between successive strips unless we increased the height of the strips.
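To make the packing concrete, here is a small sketch that splits an unsigned 24-bit sample into three bytes and joins them again. The function names are hypothetical, and the byte order follows the formula as written in the text (red high, blue middle, green low); the more common 0xRRGGBB layout would swap the roles of green and blue.

```python
def sample_to_rgb(value):
    """Split an unsigned 24-bit integer into (r, g, b) bytes, with
    value = r * 2**16 + b * 2**8 + g as in the text."""
    if not 0 <= value < 2 ** 24:
        raise ValueError("expected an unsigned 24-bit integer")
    r = value >> 16          # most significant byte: tracks amplitude
    b = (value >> 8) & 0xFF  # middle byte
    g = value & 0xFF         # least significant byte: looks like noise
    return r, g, b


def rgb_to_sample(r, g, b):
    """Rebuild the 24-bit integer from its three bytes."""
    return (r << 16) + (b << 8) + g


# Two nearly identical amplitudes share a red value but differ wildly
# in the low byte, which is why that channel reads as visual noise.
close_pair = (sample_to_rgb(0x7FFF00), sample_to_rgb(0x7FFFFF))
```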
Here’s how I propose we think about this. The 8-bit grayscale and 24-bit RGB techniques described above demonstrate that playable sound waveforms can be encoded compactly and efficiently as images to any desired level of precision. They provide us with a benchmark for what’s technically possible. The challenge now is to refine this approach to make its results more visually engaging. The refinements might well come at the expense of playability, but I believe that’s a fair trade-off as long as the outcome is to make audible patterns more clearly visible.
Let’s start with vertical synchronization, which has no impact whatsoever on playback but has a huge impact on visible patterning. The grayscale image shown above looks a lot like a messed-up video image, and there’s a good reason for that. The cycle length for “Der Kommissar” is 89703 samples at 44.1 kHz. At 11.025 kHz, however, that cycle length translates into 22,425.75, and we can’t split a cycle in mid-sample or mid-pixel. I chose the dimensions 2804×905 because 2804 is close to 2803.21875 (89703 divided by 32) and yields a decently scaled rectangle. But this still leaves a fairly large proportional error in cycle-looping, so the beats in the music gradually drift along the horizontal axis, producing distortion analogous to that seen in the first five examples below.
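The drift can be put into numbers. The sketch below redoes the arithmetic with exact fractions; the helper names are invented for illustration, and the figure of eight image rows per musical cycle is simply what dividing 89703 by 32 implies at a quarter of the original sample rate.

```python
from fractions import Fraction


def cycle_in_pixels(cycle_samples, downsample_factor):
    """Cycle length in pixels after reducing the sample rate by a factor."""
    return Fraction(cycle_samples, downsample_factor)


def drift_per_cycle(cycle_pixels, image_width, rows_per_cycle):
    """Pixels by which the beat slips sideways each time the music cycles."""
    return image_width * rows_per_cycle - cycle_pixels


cycle = cycle_in_pixels(89703, 4)        # 44.1 kHz down to 11.025 kHz
ideal_width = cycle / 8                  # same as 89703 / 32
drift = drift_per_cycle(cycle, 2804, 8)  # with the 2804-pixel width
print(float(cycle), float(ideal_width), float(drift))
# prints 22425.75 2803.21875 6.25
```

A slip of 6.25 pixels per cycle sounds small, but it accumulates over the hundred-odd cycles in the song, which is why the beats wander across the picture.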
To take full advantage of the 89703-sample cycle using the same format, we’d need an image 89703×228 pixels in size, which would make for a really long, really narrow strip. I suppose we might try resampling the audio so that it repeats at some more amenable cycle length. But it’s simpler just to go ahead and create the 89703×228 strip and then resize it to whatever dimensions we like, which is the provisional approach I’m going to take here. Of course, the actual cycle probably isn’t exactly 89703 samples in length either. If we were to upsample the audio to 96 kHz and analyze it again, we might find it’s actually closer to 89702.5 or 89703.5.
Another drawback of the earlier grayscale image is that it doesn’t include any stereo information. Here my thought is to alternate between the left and right channels in successive horizontal strips. The left and right channels will usually look similar to one another, so placing them side by side will both reinforce their common features and highlight their differences. And if we’re going to end up stretching the image vertically, I’ve found that it’s advantageous for each strip in the timeline to be at least two pixels high anyway.
In the grayscale image shown above, the brightest and darkest pixel values correspond to peaks and troughs of greatest amplitude, while the zero point on the amplitude scale corresponds to gray. This practice of defaulting to gray leads to a somewhat washed-out appearance. But if we instead use one color channel to show all the positive amplitudes and another color channel to show all the negative amplitudes, we can (1) link greater amplitudes consistently to brightness, and silences to darkness, which presents a more vivid contrast; (2) introduce some attractive and meaningful patterns of color; and (3) increase our bit depth by one bit, since 2^8×2=2^9. This strategy also leaves us with one out of three RGB color channels unassigned and available for some other use. For the moment, I’ve decided to use the extra channel to encode displacement amplitudes, since these are related to the velocity amplitudes but also differ from them in phase and in other ways, such that I thought the combination would produce even more interesting visual patterns. Adding it in also means we don’t need to choose between the physical shape of a digital waveform display and the physical shape of a record groove: we can have them both at once! Here, then, is a provisional strategy for using color:
- The green channel represents positive velocity amplitudes (0=0, +1=255).
- The blue channel represents negative velocity amplitudes (0=0, -1=255).
- The red channel represents displacement amplitudes (-1=0, +1=255). [Correction, 11/16/2021: actually, in the examples shown I assigned it to the absolute value of displacement amplitudes (0=0, ±1=255)]
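The channel assignments listed above could be sketched roughly as follows. This is an illustration rather than the code behind the pictures: the function name is hypothetical, both inputs are assumed to be already normalized to the range -1 to +1, how displacement gets derived from the audio is left open, and the red channel follows the correction note (absolute value of displacement). The values are linear, before any gamma encoding.

```python
def soundweft_pixel(velocity, displacement):
    """Map one velocity sample and one displacement sample (each -1..+1)
    to a linear (red, green, blue) triple in the range 0-255."""
    v = max(-1.0, min(1.0, velocity))
    d = max(-1.0, min(1.0, displacement))
    green = round(255 * v) if v > 0 else 0   # positive velocity
    blue = round(255 * -v) if v < 0 else 0   # negative velocity
    red = round(255 * abs(d))                # |displacement|
    return red, green, blue
```

Silence comes out black, and a sample can never light up green and blue at once, which is where the extra bit of depth comes from.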
One dilemma is whether to apply gamma encoding to the results. The numerical values used for sRGB pixel intensity values aren’t linear, so if we start with linear PCM data and map it in a linear fashion onto pixel intensities in the range 0-255 (as I did for the grayscale image above), the shades in the resulting image won’t end up displayed proportionally to the audio values. On the other hand, if we convert our linear PCM values into sRGB values, and then try to convert those back into linear PCM values, we’ll lose resolution. Since our priority here is to optimize things visually, I’ve decided that yes, we should apply gamma encoding.
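To show what that trade-off looks like in practice, here is a sketch using the standard piecewise sRGB transfer function. Whether the soundwefts below used exactly this curve is an assumption on my part, and the helper names are invented.

```python
def linear_to_srgb(c):
    """Encode a linear intensity in [0, 1] as an sRGB value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055


def srgb_to_linear(s):
    """Decode an sRGB value in [0, 1] back to linear intensity."""
    if s <= 0.04045:
        return s / 12.92
    return ((s + 0.055) / 1.055) ** 2.4


def encode_8bit(c):
    """Gamma-encode a linear intensity and quantize it to 0-255."""
    return round(255 * linear_to_srgb(c))


# Count how many of the 256 linear 8-bit levels stay distinct once
# they are gamma-encoded and stored as 8-bit pixel values.
distinct = len({encode_8bit(level / 255) for level in range(256)})
print(distinct)
```

The count comes out below 256 because neighboring bright levels collapse onto the same encoded value, which is the loss of resolution mentioned above.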
When we do everything I’ve just described, we end up with pictures that are richly patterned and give a meaningful overview of structures within the music. For convenience, I’ll refer to this kind of picture as a soundweft to reflect its juxtaposition of horizontal and vertical “threads” to form patterns (acknowledging that “Soundweft” is also the title of an unrelated clarinet composition by Edward McGuire). The soundwefts shown below have each been resized to 3000 pixels square (open them in a new window to view at that resolution).
“Der Kommissar” (Falco), cycle = 2.034 sec, orig height = 228:
“Blurred Lines” (Robin Thicke), cycle = 2.000 sec, orig height = 264:
“Gangnam Style” (Psy), cycle = 1.818 sec, orig height = 278:
Maurice Ravel, “Boléro” (London Symphony Orchestra), cycle = 2.743 sec, orig height = 694:
“Royals” (Lorde), cycle = 1.418 sec, orig height = 272:
“Topsy, Part Two” (Cozy Cole), cycle = 2.389 sec, orig height = 166:
Different color schemes could be used, of course, and there are countless other possible variations besides.
To play back audio from a soundweft at the correct speed, you need to know the cycle duration in seconds (e.g., 89703/44100 ≈ 2.034 seconds for “Der Kommissar”) and to set an output sample rate accordingly (e.g., 3000 pixels / 2.034 seconds ≈ 1475 Hz). It’s also helpful to know the original image height (e.g., 228 pixels for “Der Kommissar”) for working out the number of pixel rows you need to advance to catch the center of each strip after resizing: 3000/228 ≈ 13.158, so we start at the pixel row closest to 13.158/2 = 6.579 ≈ 7 and then advance by adding 13.158 per rotation (19.737 ≈ 20, 32.895 ≈ 33, 46.053 ≈ 46, etc.), remembering to alternate between left and right stereo samples. I’m sure all this could be worked out from the image itself if needed, though.
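The same bookkeeping can be written out as a short sketch. The function name is hypothetical, and two details are assumptions: rows are numbered the way the worked figures above suggest, and the left channel is taken to occupy the first strip of each pair.

```python
def playback_plan(cycle_samples, source_rate, image_size, orig_height):
    """Return (cycle duration in seconds, output sample rate in Hz,
    list of pixel rows at the centers of the resized strips)."""
    cycle_seconds = cycle_samples / source_rate
    output_rate = image_size / cycle_seconds
    step = image_size / orig_height  # resized height of one strip
    rows = [round(step / 2 + i * step) for i in range(orig_height)]
    return cycle_seconds, output_rate, rows


seconds, rate, rows = playback_plan(89703, 44100, 3000, 228)
left_rows = rows[0::2]   # assumed: left channel comes first
right_rows = rows[1::2]
print(round(seconds, 3), round(rate), rows[:4])
# prints 2.034 1475 [7, 20, 33, 46]
```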
When soundwefts are scaled to 3000 pixels per side, as they are here, their sample rate will tend to be around 1500 Hz, with a Nyquist frequency around 750 Hz. But if we scale up to 6000 pixels, we boost the Nyquist frequency to 1500 Hz; at 12000 pixels, it’s 3000 Hz. If we’re okay with a rectangular image rather than a square one, the latter could be something like 12000×6000 pixels; printed at 300 dots per inch, that would come out 40×20 inches, or 3 1/3 × 1 2/3 feet, which seems perfectly feasible for commercial sound wave wall art. If someone were to print out one of these images and scan it back in, there would admittedly be some generation loss that would reduce audio quality by an unpredictable amount. But as far as the digital side goes, here’s an excerpt of “Der Kommissar” played back as audio from (1) the actual 3000-pixel-wide JPG displayed above; (2) a 6000-pixel-wide TIF; (3) a 12000-pixel-wide TIF; and (4) a full-scale 89703-pixel-wide TIF.
So the soundweft scales well. The audio quality we can retrieve from it varies across an enormous range depending on how much we shrink its dimensions. If we don’t shrink the time axis at all, we can extract respectably high-quality audio. On the other hand, if we shrink it down to the dimensions of typical sound wave wall art, we can still retrieve something recognizable. That’s not true of most waveform thumbnails. Here’s a thumbnail of “Der Kommissar” generated using a free online tool.
Each vertical line seems to cover the distance between the peak positive amplitude and peak negative amplitude within a given time window. Below you’ll find an attempt to play back the same part of the song heard in the previous set of listening examples, my strategy this time being to translate each pixel column into a pair of sequential audio samples: one corresponding to its top and one corresponding to its bottom. What you hear is probably the most faithful playback it’s possible to extract from this picture. With some reverb added, I think it would sound like a pretty convincing helicopter.
Meanwhile, soundwefts also work well as images: they’re interesting, varied, and meaningful. Equivalent patterns can sometimes be glimpsed less clearly by looking at the groove of an LP with a steady and intense beat, and if an LP were cut to make one revolution per cycle length, the patterns would come similarly into focus. I’m not making them up. My code isn’t pulling them out of thin air.
So if you like the idea of hanging a picture of a musical recording on your wall, would you consider going with a soundweft instead of a waveform thumbnail? I suppose soundwefts don’t have much in common with what most people expect a sound recording to look like. But isn’t it all the better when art encourages us to view things in new ways?
Those who really want to get into sound wave art will be glad to hear that much of it is wearable. For example, you can find clothing with sound waves printed on it, including custom T-shirts with personalized messages (MixPixie, below left); predesigned T-shirts based on recordings with broad appeal (Rindle Waves, below middle); and a rather stylized sound wave found on a swimsuit (Miraclesuit, below right).
Sound wave patterns have also been knit into garments such as personalized sound wave scarves (KellyWoveIt, below top; Martina Sestakova, below left) and sound wave beanies based on a pattern by The Crochet Zombie (below right).
But the most common type of wearable sound wave art is sound wave jewelry—especially rings, necklaces, bracelets, and key-chains (although I guess the last of these might not always count as “wearable”). We can find pieces of sound wave jewelry made from many different materials: various metals, leather, plastic, and wood, as well as images enclosed under glass. The waveforms might be incised, upraised, cut out, or formed in a different color, but they usually comprise a single strip with a beginning and end, maybe bent into an arc (but probably not a closed loop). And they’re pretty much always thumbnails, but the formal requirements of jewelry are a bit different from those of wall art, so we run into some interesting variations in approach. Often the waveform is presented as a continuous filled band (see examples below by yhtanaff and AnLGiftsCo):
But sometimes creators try to make the shape look more like a sound wave that sometimes returns to zero by introducing gaps at regular intervals, resulting either in spiky chunks (below left, by FJ4LifeCreations) or in simpler bands of uniform height (below right, by Braceletshomme).
—and showing samples as 3D columns that zig-zag perpendicularly to the dimensions used for time and amplitude, as in this piece by Custom3DSoundWaves:
Another strategy, followed by David Bizer and AudioBracelets (latter shown below left), is to string together circular beads of different diameters representing successive samples in a sound wave (something similar has also been done with vinyl LPs). Or sometimes Bizer instead uses a 3D printer to create a continuous axisymmetrical shape with a diameter that represents the sound wave in a similarly stylized form, but with variation in taper and size along the time axis (shown below right).
This 3D axisymmetrical approach isn’t limited to jewelry. For example, it also turns up in NOCC’s “objects of sound”: utilitarian objects, such as the candle-holder and light shown below, whose shapes are based on sound waves corresponding to their spoken names.
And sound waves can be made to produce other utilitarian forms besides, as illustrated by Matthew Plummer Fernandez’s Sound/Chair: “A sound that when plotted on a volume, frequency time graph resembles a chair.” Judging from the description, Fernandez actually thinks of the synthesized sound recording itself as the “work.” However, to create a chair from it, its amplitude values need to be translated into height in segments running from front to back and then stacked left to right (or back to front, or right to left—I’m not sure about the directions).
The appeal of sound wave jewelry is based at least in part on a belief that the audio is meaningfully there. Thus, JoyComplex promotes its offerings explicitly in terms of accuracy: “One of the most accurate representations of audio wave forms you can currently achieve. Using state of the art metal 3D printers I apply your custom message to my design to create an ultra accurate representation of your voice in metal.” This doesn’t necessarily imply the sound can be played, though, and Etsy seller RosiesDesignStudio includes a disclaimer to forestall any misunderstanding: “Note: this item does not make sound.” Danielle Crampsie’s “Soundwave Jewellery” blog has a nice post in which she describes various situations where people (including me) have been able to decipher pictures of sound waves. “After working with sound waves and people in love for over 15 years,” she writes, “I can identify the the words ‘I love you’ from looking at the image of the sound wave.” However, her own pieces still come with an audio CD, while MyKosmima provides a QR code linked to a sound file and ArtbloxShop supplies a “playback barcode.” I haven’t yet seen any piece of sound wave jewelry that looks as though speech could be played back intelligibly from the jewelry itself.
Well, except for one. In 2005, Luke Jerram recorded a twenty-second proposal onto a silver engagement ring (below left) which can be played on a custom, battery-operated phonograph (below right). He describes the process of making it as follows:
Sneaking out of the house at strange times of night I worked with the record manufacturer in his basement. Spinning a wax ring beneath a diamond stylus we cut a message 20 seconds long into its surface. We cast the ring into silver and played it back by spinning it under a stylus. There was nothing.
We tried to cut directly into a silver ring and still nothing. As a last ditch attempt we cranked up the weight forcing the diamond into the silver, using 100lbs of pressure per square inch to make our mark. Finally as the ring turned beneath the stylus a thin trail of silver anti-sound poured out of the rings surface.
Jerram reportedly gets frequent requests to make similar rings for other people, but he turns them down: “I’m afraid this ring was a one off, made for my wife. I’m afraid you’ll have to make your own unique ring for your own unique partner!” I’m surprised nobody has stepped in to fill the demand. Let’s consider some of the technical details. Jerram’s ring appears to be modulated vertically (up and down), like a phonograph cylinder, rather than laterally (side to side), like the groove on a gramophone record. Indeed, it looks remarkably similar in format to recordings cut into metal rings during the nineteenth century, including one well-known tin cylinder of “Twinkle, twinkle little star” made in the fall of 1888 to fit inside a prototype talking doll.
The groove on Jerram’s ring coils around it in a helix, but its groove pitch—how far the stylus advances with each rotation—is visibly irregular, suggesting that the cutter was guided by hand rather than by a feedscrew as was usual back in the day. Unlike Jerram’s custom playback machine, the Edison phonograph also used a feedscrew during playback, which meant that the groove didn’t need to guide the stylus, as the groove of an LP does, and could instead take the form of shallow, unconnected pits. Jerram may have had a tough time cutting deep enough into metal with delicate modern recording equipment to form a continuously connected groove that could guide a stylus during playback, but people long ago seem to have had no trouble making and playing shallower vertical-cut records with cruder acoustic (non-electric) apparatus that harnessed the brute mechanical force of sound waves. I’d think someone could build a viable machine for cutting metal “phonograph rings” using nothing but 1870s technology (including a feedscrew), and with no more effort than would have gone into constructing a typical experimental phonograph of the time. Gold would cut better than silver, but would wear out faster; a tighter, more regular groove pitch would allow for a longer recording time (or a faster recording speed). The playback instrument could likewise be made strictly mechanical, working entirely without electricity. Maybe it could be manufactured in India using some of the skills, facilities, and production capacity currently devoted to making reproduction wind-up gramophones. This all seems perfectly feasible to me.
But even if nobody wants to take things quite that far, some other kind of waveform image could be coiled into a helix too, so that it could be spread out over multiple passes around a ring. That would free artists from the need to squish waveforms into thumbnails, or at least to squish them as much as they usually do. The audio might still not be conveniently recoverable, but it would be there. A helix probably wouldn’t work as well for something like a necklace or a bracelet, but multiple passes of the same waveform could also be stacked parallel to one another, for example by using multiple beaded strings. Beads could be made narrower to increase the sample rate, and bit depth could be boosted by using color as well as diameter to represent amplitude: say, by having eight sizes and eight colors and encoding 6-bit samples as (size × 8) + color, since 8 × 8 gives 64 distinct beads. Axisymmetric shapes could be lathed from full-resolution waveforms rather than stylized thumbnails. Soundwefts could be knit into scarves and sweaters, or laid out in colored tiles.
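As a toy illustration of the bead idea, here is how a sample could be split into a size index and a color index and put back together. The names are hypothetical. Note that eight sizes times eight colors gives 64 distinct beads, which is six bits per sample; a full 8-bit sample would need something like sixteen of each.

```python
def encode_bead(sample, sizes=8, colors=8):
    """Split an integer sample into (size index, color index)."""
    if not 0 <= sample < sizes * colors:
        raise ValueError("sample does not fit the available beads")
    return divmod(sample, colors)


def decode_bead(size, color, colors=8):
    """Recover the sample as (size x number of colors) + color."""
    return size * colors + color


necklace = [encode_bead(s) for s in (0, 37, 63)]
recovered = [decode_bead(size, color) for size, color in necklace]
```

Putting size in the high digits means that a viewer who ignores color still sees the coarse shape of the wave in the bead diameters.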
Sounding the Tattoo
Back in 1878, William Henry Preece related the following anecdote:
The phonograph was discovered, like many other things, by pure chance. Edison himself was experimenting with the telephone, trying all kinds of experiments, as all of us have been doing, to improve the telephone; in doing so, he pricked his finger, and, drawing it rapidly away, a line was made on his finger. This gave him the notion that if the diaphragm of a telephone could mark his finger, why should it not mark paper, and if it marked paper, why could the sound not be reproduced?
If Preece’s account is accurate, Edison made his first-ever sound recording not on wax-coated paper or tinfoil, but on his own skin. This trail of interrupted blood-spots wouldn’t have been permanent, but Edison also decorated himself with some more lasting dots in the form of a quincunx tattooed on his forearm. And his invention of the electric pen—a forerunner of the ones used by today’s tattoo artists—might seem to clinch the case that sound waveform tattoos were meant to be from the start. They’ve certainly received a hefty chunk of the media attention paid to sound wave art in more recent years. Something about them seems to have captured the popular imagination in a big way.
For some sound wave art companies, such as SoundViz, the sound wave tattoo is a mere sideline. But for one company, Nate Siggard’s Skin Motion, it’s the central focus. Here’s how their process works. You upload a recording on their website, and their software generates a waveform “stencil” as a downloadable PDF—that part’s free. Next you’re supposed to take the stencil to a “certified approved” tattoo artist for inking. Then, if you pay a $39.99 activation fee and a $9.99 fee annually thereafter, you can log into an app on your smartphone and scan your tattoo—though currently not anyone else’s—and it will match the waveform against its database in the cloud and play the audio associated with it, with the playback position highlighted onscreen by a glowing stripe passing along the waveform shape. It’s a clever process, and you can read Siggard’s application for a U. S. patent on it here—although there’s some obvious conceptual overlap with the “Soundwave Art™ App” I described earlier.
Nowadays there seem to be a fair number of actual Skin Motion sound wave tattoos out there, judging from tweets and such. Sometimes the tattoo is just the waveform by itself, but sometimes the waveform has been made part of a larger design. Perhaps the most famous example is the waveform tattoo Sakyrah Angelique of Chicago got of a voicemail from her late grandmother, which made international news. Clearly Skin Motion’s work has struck a chord. But it has also attracted its share of critics, including Rain Noe, who writes:
Obviously it would be cheaper to simply get the tattoo and play the clip whenever you want, for free, on your phone’s music app. Yet the fact that Soundwave Tattoos are a going concern indicates that there are willing customers. Why do you suppose this is? Do you think people actually believe the app is scanning the soundwave and turning those scribbles into audio, or do they just not care but enjoy the novelty of “scanning” one’s arm?
I wouldn’t underestimate the coolness of seeing the glowing stripe follow your waveform tattoo (as viewed through your phone) while the sound is playing. But the Skin Motion app is indeed designed only to recognize a pattern inked on your skin, much like an arbitrary QR code, and to pull any recording from the database that happens to be associated with it. Writes James Hennesy: “It’s not actually ‘playing back’ the waveform, which would be impossible.” But of course there’s nothing inherently impossible about “playing back” a waveform picture; the pertinent question is whether the specific waveform pictures used by Skin Motion could be played back.
Here’s a specimen of an actual stencil generated by Skin Motion from a recording just over 1 2/3 second long:
Skin Motion provides stencils like this one as bitmaps embedded in PDF files. I imported this one into Photoshop at 2400 dpi and then downsampled its height and width by half to get to approximately the resolution I found I could obtain by zooming in on the PDF itself. At this scale, the waveform image is about 6225 pixels wide, which comes out to about 3735 pixels (or samples) per second for a 1 2/3 second recording, although the company actually recommends a length between five seconds (1245 samples per second) and twenty seconds (311.25 samples per second), with an upper limit of thirty seconds (207.5 samples per second; my calculations assume the width of the images doesn’t vary with duration). Of course, the transfer to skin and inking of the tattoo would also entail some loss of precision, not to mention the further loss of sharpness to be expected with the aging of any tattoo over time. In short, the waveform in the stencil shown above should be more suitable for playback than any actual Skin Motion tattoo. It represents a kind of ideal case. So can we play it back?
Well, sort of. The waveform image is exactly symmetrical along the time axis, and the height of each vertical line appears to represent a peak absolute value of amplitude: that is, the maximum velocity recorded during a particular span of time, regardless of whether it’s positive or negative. That’s not quite the sort of information we’d need for conventional audio playback, but we can still educe the data as sound after a fashion if we want to get creative about it. The sound file below presents (1) the original source recording; (2) the waveform played back with the height of each pixel column translated into a single audio sample; and (3) the waveform played back with the height of each pixel column translated into a pair of sequential audio samples, one positive, one negative.
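The two playback strategies can be sketched in a few lines. This is only an approximation of the idea, with a made-up function name, and it assumes the column heights have already been measured from the stencil as pixel counts.

```python
def heights_to_samples(heights, max_height, paired=True):
    """Turn per-column envelope heights into audio samples.

    paired=False: one sample per column, equal to the scaled height.
    paired=True:  two samples per column, one positive, one negative.
    """
    samples = []
    for height in heights:
        amplitude = height / max_height
        if paired:
            samples.extend([amplitude, -amplitude])
        else:
            samples.append(amplitude)
    return samples


single = heights_to_samples([0, 50, 100], max_height=100, paired=False)
double = heights_to_samples([0, 50, 100], max_height=100)
```

The paired version doubles the number of samples, so it needs to be played at twice the rate to keep the original duration.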
That actually turned out better than I was expecting it to. But remember that it’s very much an ideal case. A five-second recording wouldn’t come out nearly as well from a stencil, much less a twenty or thirty-second recording. I’m less sure how to judge one comment I saw on the Skin Motion Facebook page: “Those of us not so gullible would know that there is no possibility of obtaining a tattoo with enough detail to read a sound wave pattern and produce the audio faithfully if at all.” For whatever reason, there doesn’t seem to be much discussion out there about the maximum potential resolution and accuracy of tattoos. And so it’s hard to say what inherent limits there may or may not be on the playability of tattoos of sound waveforms.
But I think a different approach might have a better chance of success. There’s another type of sound wave picture out there that can already be scanned by smartphone using an existing app and played back from the data in the image itself, without needing to access an outside database. I’m thinking of the PhonoPaper system developed by Alexander Zolotov. Here’s a sample picture from the project website:
To use the PhonoPaper app, you focus your phone’s camera on the picture, and as you move it along the picture from left to right, it will play the corresponding audio. Or if you move your phone backwards, it will play the audio backwards.
The technique is so straightforward and transparent that I was able to write my own code to play PhonoPaper pictures, and the sound I get seems to be identical to the sound the official PhonoPaper app plays (as heard in the promotional video from which I grabbed the above still).
One person has asked the PhonoPaper developers to create a simplified version of their app that could be used specifically for tattoos. But I’m not sure that’s needed. If a regular PhonoPaper picture were used as a tattoo stencil, there’s no doubt in my mind that the PhonoPaper app—in its current state—would be able to play the result. It might admittedly be hard for a tattoo artist to ink some of the finer grayscale details. But I’ve experimented with boosting the contrast on PhonoPaper pictures until there’s nothing left but stark, black lines; the sound isn’t quite as clear afterwards, but it’s still recognizable and intelligible. Overall, my sense is that the kind of playback the Skin Motion system simulates is something the PhonoPaper system could actually do.
The main reason why the PhonoPaper system can do what it does is that its sound wave pictures aren’t oscillograms. They’re spectrograms, which follow a different set of rules and open up a different world of possibilities.
The Sound Spectrogram
While a sound waveform or oscillogram shows back-and-forth oscillations over time, a sound spectrogram instead shows frequencies and their intensities over time. Each strip along its time axis is analogous to the rainbow-colored pattern you get by passing light through a dispersive prism.
Bell Laboratories developed the first sound spectrograph in the 1940s as a piece of analog apparatus. A looped recording was played repeatedly through a band-pass filter, which is designed to let through only the signal in a particular frequency band and to block anything else. As this filter was tuned incrementally to different frequency bands covering a range from (say) 50 Hz to 3500 Hz, the output was made to leave marks on electrically sensitive paper, darker where there was more signal, lighter where there was less signal. Sometimes more elaborate color scales or three-dimensional contours were used for displaying intensity instead.
We can emulate the same process digitally today. If we take a recording, apply a sequence of digitally implemented band-pass filters to it at incrementally lower frequencies (following a logarithmic scale), and display the results as successive rows of pixels representing amplitude (using red for positive values and green for negative values), we get something that looks like this:
And then if we drastically compress the time axis, we get something that looks like this:
The lowest frequencies remain safely below the Nyquist frequency, so we continue to see red and green striations there. Meanwhile, the higher frequencies have all blurred to yellow; we can’t make out their individual oscillations any more. But that doesn’t matter: we still know what frequencies the yellow marks represent because of where they fall along the vertical axis. This is why a low-resolution spectrogram can be used to show high frequencies, even though a low-resolution oscillogram can’t. And if we then want to play back the sound, we can synthesize pure tones at the frequencies and intensities indicated by the picture and add them all together, a process known as additive synthesis. Even if the phase relationships don’t end up reconstructed correctly (and there’s no reason they would), the resulting audio will still sound a lot like the original.
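Additive synthesis is simple enough to sketch with nothing but the standard library. The version below is a bare-bones illustration, not the code behind the listening examples: the function name and parameters are invented, every oscillator starts at zero phase, and intensities are treated as linear amplitudes, which is an assumption.

```python
import math


def additive_synthesis(columns, freqs, column_seconds, sample_rate):
    """Sum sine tones whose amplitudes come from spectrogram columns.

    columns: time slices, each a list of intensities (one per frequency)
    freqs:   frequency in Hz for each position within a column
    """
    samples_per_column = int(round(column_seconds * sample_rate))
    out = []
    n = 0  # running sample index keeps phase continuous across columns
    for column in columns:
        for _ in range(samples_per_column):
            t = n / sample_rate
            out.append(sum(a * math.sin(2 * math.pi * f * t)
                           for a, f in zip(column, freqs)))
            n += 1
    peak = max((abs(x) for x in out), default=0.0)
    return [x / peak for x in out] if peak > 0 else out


tone = additive_synthesis([[1.0, 0.0]], [100, 200], 0.01, 8000)
```

Amplitudes jump abruptly from one column to the next in this sketch; a more careful version would interpolate between columns to avoid clicks.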
To illustrate, the audio file below contains (1) the source recording for the image shown above; (2) the image played back strictly as an oscillogram, based on oscillations between green and red; and (3) the image played back as a spectrogram, using additive synthesis.
The method I’ve just demonstrated isn’t how most people create sound spectrograms these days. In today’s digital processing environments, they usually turn instead to mathematical transforms: either a Fast Fourier transform (FFT), where time resolution is equal for all frequencies, or a wavelet transform, where time resolution increases with frequency. It’s common for sound editing programs to let you toggle back and forth between a waveform view and a “spectral” view based on FFT. Spectrographic representations of sound aren’t limited to visual display, but are also the basis for many compressed sound file formats, including the mp3. However, when they are used for visual display, they have a big advantage over oscillograms in that they’re much more susceptible to being deciphered by eye. People can learn to identify specific speech sounds, musical chords, and other things by looking at spectrograms, but not by looking at oscillograms. This makes sense, because the spectrogram decomposes sound waves in much the same terms as our inner ears do. You might think of the cochlea as nature’s sound spectrograph. The human sense of hearing is mainly spectrographic. Thus, sound spectrograms arguably display sound waves much more “as we hear them” than sound oscillograms do.
But sound spectrograms turn up much less frequently than sound oscillograms in sound wave art. A company called Vapor Sky used to offer spectrogram prints under the brand name “Spectrum Decor” (shown below at top) as an alternative to their waveform thumbnail prints (“Resonant Decor”), but they no longer seem to be in business, and I can’t find anyone else offering anything comparable—although there’s a Tumblr blog called “Beautiful Spectrograms” dedicated to sharing images like these online. Etsy seller ArtFromAudio takes a more creative approach by using a waveform thumbnail to vignette a vertically mirrored spectrogram (shown below at bottom). The visible patterns in the two types of display complement each other nicely, although much of the spectrogram ends up missing.
But the best-known spectrographic wall art is probably the work of Mark Fischer, who operates under the name Aguasonic. Fischer’s process entails creating a wavelet-based spectral graph from a recording of an animal vocalization and then transforming it from rectangular to polar coordinates such that the lowest frequencies, with lowest time resolution, get squeezed into the middle while the highest frequencies, with highest time resolution, can stretch out comfortably around the periphery. This makes for a sensible and effective use of the polar format. And the results are gorgeous.
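The rectangular-to-polar step can be sketched in a few lines of Python. This is only my guess at the geometry, with time wrapped around the circle as an angle and frequency running outward as a radius. The function names, the starting angle, and the direction of rotation are assumptions, and Fischer’s own software may well work differently.

```python
import math

def polar_position(time_frac, freq_frac, max_radius=1.0):
    """Map a spectrogram point onto a disc.

    time_frac: 0..1 position along the time axis, used as the angle.
    freq_frac: 0..1 position along the frequency axis, used as the radius,
               so low frequencies land near the center.
    """
    angle = 2 * math.pi * time_frac
    radius = max_radius * freq_frac
    return radius * math.cos(angle), radius * math.sin(angle)

def arc_length_per_time_step(freq_frac, n_time_steps, max_radius=1.0):
    """Length of circumference available to each time step at a given radius."""
    return 2 * math.pi * max_radius * freq_frac / n_time_steps
```

The second function shows why the layout suits wavelets: the room available per time step grows in proportion to the radius, so the frequencies with the finest time resolution get the most space.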
Some journalistic writing about Fischer’s work contrasts his wavelet approach with “spectrograms,” but I think that’s a misunderstanding of the terminology. Fischer’s mandalas plainly have a time dimension and a frequency dimension, and in my book that makes them spectrograms by definition. For me, the real contrast lies between wavelets and FFT as different methods of creating spectrograms.
The beautiful symmetries in Fischer’s work result from him choosing segments of audio where the frequency spectrum is holding relatively constant and looping them with consummate skill. I haven’t had as much luck getting this to work, although the results can still be respectably psychedelic even when it doesn’t—see the example on the right.
Spectrograms are less well attested in jewelry than in wall art, but there’s at least one example out there: Gilles Azzaro has designed a 3D-printed spectrographic pendant of the spoken phrase, “Love is the answer” (below left), in which intensity is translated into height. He has also extended the same principle to a larger 3D-printed representation (below right) of an excerpt from Barack Obama’s 2013 State of the Union address, mounted in a display case with a START button and a laser beam that traverses the sculpture while the audio plays. I don’t think the laser actually plays back the sculpture, but I’m pretty sure something like that could be arranged.
Azzaro has even proposed “a new architectural concept whereby the shapes of buildings are determined by voice prints: words and phrases can design buildings, districts and towns.” That I’d like to see—with NOCC lights hanging from every ceiling, Fernandez Sound/Chairs in every living room, and everyone wearing a Crochet Zombie sound wave beanie!
Compared to these examples, the sound spectrograms used in the PhonoPaper system are pretty straightforward. According to the official specification, the frequency scale starts at 65.4 Hz (C2) and then goes up eight octaves from there, but I’ve found by experiment that I need to set the range a semitone higher than that to match the results of the PhonoPaper app, from 69.3 Hz to 17740.8 Hz. The scale is logarithmic, rather than linear as it is in some spectrograms, so that musical tones and octaves end up evenly spaced; a blank template is provided in case you want to try drawing musical notes freehand. The default duration is said to be ten seconds, but the timing of playback is controlled wholly by moving the smartphone: the app continuously synthesizes a combination of tones corresponding to whatever frequencies the “cursor” happens to be covering from moment to moment.
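A logarithmic scale of this kind is easy to express in code. The sketch below assumes only what is stated above: a lowest frequency and a span of eight octaves. The function name and the convention of measuring position as a fraction from the low-frequency edge are mine, not part of the PhonoPaper specification.

```python
def row_frequency(row_frac, lowest_hz=65.4, octaves=8):
    """Frequency at a given height on a logarithmic scale.

    row_frac runs from 0.0 at the low-frequency edge to 1.0 at the
    high-frequency edge; equal steps in row_frac are equal musical intervals.
    """
    return lowest_hz * 2 ** (octaves * row_frac)

# The range that matched the app in my own experiments:
low = row_frequency(0.0, lowest_hz=69.3)
high = row_frequency(1.0, lowest_hz=69.3)
```

Because each octave takes up the same height, moving one eighth of the way up the scale always doubles the frequency.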
I’d now like to share just a few other ideas I’ve had for using spectrograms in sound wave art, although there are so many possibilities that it’s hard to know even where to start.
One option we have is to split a spectrogram into strips based on longer-term cycles in a recording and to stack them as a soundweft. We can also display a complete spectrogram and a complete oscillogram simultaneously by overlapping them in different colors. In the example below—which shows the opening segment of “Gangnam Style”—a spectrogram occupies the blue and green channels, while an oscillogram occupies the red channel, resampled to fit into the same space as the spectrogram, with each column running first from top to bottom before advancing from left to right.
One thing neither oscillograms nor spectrograms ordinarily show very well is the connection between musical notes in the same pitch class (such as “C”) but in different octaves. However, there are methods available for bringing that connection out. Take Spiral, for example: an ingenious music analysis plugin that works, as its creator explains, “by coiling the spectrum into a spiral framed by a chromatic circle.” In this way, frequencies corresponding to the same pitch class get plotted along the same radius, but further out from the center the higher they are. But Spiral represents each successive moment in time as a separate video frame, and I don’t see any way to apply the same strategy for displaying a whole recording all at once as a single still image. On the other hand, a promising variant on the sound spectrogram is the chromagram, which has a logarithmic frequency scale that loops at the octave. In other words, frequencies corresponding to the same pitch class—such as 100 Hz, 200 Hz, 400 Hz, 800 Hz, and 1600 Hz—all get plotted together on the same line.
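The octave looping that defines a chromagram comes down to taking a base-2 logarithm and keeping only the fractional part. Here is a minimal Python sketch of that idea. The reference frequency of 100 Hz, the choice of twelve bins, and the function names are assumptions for illustration, not a description of any particular chromagram software.

```python
import math

def pitch_class_position(freq_hz, reference_hz=100.0):
    """Position within the octave, from 0.0 up to (but not including) 1.0.

    Frequencies that differ by whole octaves return the same value.
    """
    return math.log2(freq_hz / reference_hz) % 1.0

def fold_to_chroma(freqs_hz, magnitudes, bins_per_octave=12, reference_hz=100.0):
    """Add up spectrum magnitudes by pitch class, ignoring the octave."""
    chroma = [0.0] * bins_per_octave
    for freq, mag in zip(freqs_hz, magnitudes):
        if freq <= 0:
            continue  # a logarithm needs a positive frequency
        pos = pitch_class_position(freq, reference_hz)
        # Round to the nearest bin so each bin is centered on its pitch class.
        index = int(round(pos * bins_per_octave)) % bins_per_octave
        chroma[index] += mag
    return chroma
```

Running `fold_to_chroma` on one column of a spectrogram gives one column of a chromagram, with all the octave-related frequencies summed into the same row.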
Meanwhile, there’s also a tradition of assigning pitch classes to different colors; see, for example, the Scriabin circle, based on a circle of fifths. A circle of fifths isn’t compatible with a frequency continuum, but we can take a similarly looped color scale—one that passes through all proportional combinations of R+B, B+G, and G+R in turn—and apply it to the frequencies of an octave arranged in regular order. So I’ve given this a try. In the image at right, I’ve assigned the Nyquist frequency to 12 o’clock (pure red); from there, the color found at each hour around the clock face can then represent a different tone in an ascending twelve-tone scale, with intermediate colors representing frequencies in between.
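A looped color scale like this one can be written as a small Python function. The sketch assumes that the loop starts at pure red, fades from red into blue, then from blue into green, and then from green back to red. The order of those three legs and the function name are my assumptions for the sake of the example.

```python
def wheel_color(position):
    """Return an (R, G, B) color, each channel 0..1, for a point on the loop.

    position 0.0 is pure red (12 o'clock); 1.0 wraps back around to red.
    Each third of the loop mixes exactly two primaries in varying proportion.
    """
    p = (position % 1.0) * 3
    segment = int(p)
    mix = p - segment
    segment %= 3                      # guards the wrap-around edge case
    if segment == 0:                  # red fading into blue
        return (1 - mix, 0.0, mix)
    if segment == 1:                  # blue fading into green
        return (0.0, mix, 1 - mix)
    return (mix, 1 - mix, 0.0)        # green fading back into red

# One color per hour on the clock face, i.e. per step of a twelve-tone scale.
hour_colors = [wheel_color(hour / 12) for hour in range(12)]
```

Feeding in a fractional position gives the intermediate colors for frequencies that fall between the twelve tones.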
So now let’s create a chromagram in which each strip has the color corresponding to its frequency, but a varying intensity corresponding to the absolute value of amplitude. And let’s lay out this chromagram as a soundweft so that repeating cycles are split up and stacked vertically, with the height of each “pass” determined by the number of tone classes.
At this point, we have an image that looks like this (for “Gangnam Style”):
Close-up examination reveals some interesting and intricate shapes within each “pass,” but viewed from a distance the colors end up looking like tiny islands of light in an ocean of black. If we want to draw out the colors more vividly, we can average each “pass” into a single row of pixels that mixes all the colors present proportionally to their intensities. A single strong note will then come out as one discrete color on the scale, while a chord will blend two or more of these colors and unpitched noise will display as white. Here’s “Gangnam Style” displayed that way:
Let’s try out this technique on another couple of recordings, just for good measure. Here’s “Topsy, Part Two” again, processed in the first way:
And here’s Trio’s “Da Da Da (ich lieb dich nicht, du liebst mich nicht),” processed in the second way:
Maybe this will turn out to be a better strategy for assigning color to soundwefts than the one I demonstrated earlier. What do you think?
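For the second way of processing, where each “pass” is averaged into a single row of pixels, the mixing step might be sketched like this in Python. It takes the colors and intensities found in one column of one pass and blends them into a single pixel. Scaling the result so that the strongest channel is at full brightness is an assumption of this sketch (it throws away overall loudness), and the function name is made up for the example.

```python
def blend_colors(colors, intensities):
    """Mix (R, G, B) colors in proportion to their intensities.

    colors: list of (R, G, B) tuples with channels in 0..1.
    intensities: one non-negative weight per color.
    Returns one (R, G, B) pixel, or black if nothing is sounding.
    """
    total = [0.0, 0.0, 0.0]
    for (r, g, b), weight in zip(colors, intensities):
        total[0] += weight * r
        total[1] += weight * g
        total[2] += weight * b
    peak = max(total)
    if peak == 0:
        return (0.0, 0.0, 0.0)
    # Assumed normalization: brightest channel goes to 1.0.
    return tuple(channel / peak for channel in total)
```

With this rule a single note keeps its own color, two notes give a blend weighted toward the louder one, and equal energy spread across the whole color loop comes out white.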
And Now For Something Completely Different
Another approach to making sound vibrations visible—sometimes known as Cymatics—is entirely different from everything I’ve described here so far, both in how it works and in what it shows. Its oldest and most basic form is the Chladni figure, a geometric pattern formed by sprinkling sand on a plate and causing the plate to resonate so that the sand will cluster at the nodes—that is, the points where the plate stays still while the surrounding points are moving back and forth. Different nodal patterns appear at different resonant frequencies, each of which corresponds to a different “mode” in which the plate can divide itself into parts. For a circular plate, these patterns can be characterized in terms of the number of circles and/or diameters they contain, with higher frequencies bringing forth more of these, and with circles being “worth” more than diameters are. Say a plate with one circle (the outer edge) vibrates at 100 Hz; it might have one circle and one diameter at 159.3 Hz, one circle and two diameters at 213.5 Hz, two circles at 229.5 Hz, one circle and three diameters at 265.3 Hz, and two circles and one diameter at 291.7 Hz. Yeah, the numbers are complicated.
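As for where complicated numbers like these can come from: the example figures above roughly line up with the mode ratios of an ideal circular membrane (think of a drumhead fixed at its rim), where each ratio is set by a zero of a Bessel function. A real plate will generally give different numbers, so the Python sketch below is an illustration under that idealized assumption, not a model of any particular plate. The function names are mine, and the Bessel functions are computed from their standard integral form so that nothing beyond the standard library is needed.

```python
import math

def bessel_j(order, x, steps=400):
    """Bessel function J_order(x) from its integral form, by the trapezoid rule."""
    h = math.pi / steps

    def f(tau):
        return math.cos(order * tau - x * math.sin(tau))

    total = 0.5 * (f(0.0) + f(math.pi))
    total += sum(f(k * h) for k in range(1, steps))
    return total * h / math.pi

def positive_zeros(order, count, step=0.05):
    """First `count` positive zeros of J_order: scan for sign changes, then bisect."""
    zeros = []
    x = 0.5
    prev = bessel_j(order, x)
    while len(zeros) < count:
        nxt = bessel_j(order, x + step)
        if prev * nxt < 0:
            lo, hi = x, x + step
            for _ in range(50):
                mid = (lo + hi) / 2
                if bessel_j(order, lo) * bessel_j(order, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            zeros.append((lo + hi) / 2)
        x += step
        prev = nxt
    return zeros

def mode_frequency(circles, diameters, fundamental_hz=100.0):
    """Frequency of an ideal-membrane mode with the given nodal circles
    (counting the outer edge) and nodal diameters."""
    base = positive_zeros(0, 1)[0]
    zero = positive_zeros(diameters, circles)[-1]
    return fundamental_hz * zero / base
```

Calling `mode_frequency(1, 1)`, `mode_frequency(2, 0)`, and so on steps through the same sequence of patterns described above.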
Chladni figures expose the distinctive nodal patterns and resonant frequencies of plates. But there have also been attempts to elicit distinctive nodal patterns from complex sounds. The main cases in point are the Eidophone of Megan Watts Hughes, the Tonoscope of Hans Jenny, the CymaScope of John Stuart Reid, and the Wasser-Klang-Bilder of Alexander Lauterwasser, with later work moving away from plates and membranes towards the imaging of vibrations in liquid.
The CymaScope produces round images that are known as CymaGlyphs and bear a striking (but misleading) resemblance to Mark Fischer’s Aguasonic pictures, with which they actually have nothing in common. If I understand things right, a CymaGlyph depicts a single moment frozen in time, equivalent to an individual point along the time axis of a sound spectrogram. At cymascope.com, you can order a personalized Voice Mandala to hang on your wall, but this would presumably represent something like a vowel sound rather than a whole spoken phrase. A special product they offer for couples is a double Voice Mandala made up of two overlapping Voice Mandalas, “allowing you to visualize your voices as you say ‘I Love You’ to each other”; maybe the CymaGlyphs are created from the “oo” in “you”? The cymascope.com home page states further:
The CymaScope is the first scientific instrument that can give a visual image of sound and vibration in ways previously hidden from view. When the microscope and telescope were invented they opened vistas on realms that were not even suspected to exist. The CymaScope holds the same potential as the microscope and telescope and its applications are beginning to touch a broad range of human endeavor.
However, I’ve searched in vain for any practical discussion of how to interpret CymaGlyphs. I imagine the appearance of a specific CymaGlyph must derive somehow from the nodal patterns associated with resonances of the water surface that happen to correspond with frequencies in the instigating sound. Based on that, I’m not sure we stand to learn anything about a particular sound from its CymaGlyph that we couldn’t determine more readily from a sound spectrogram (at least when working from a recorded sound rather than a three-dimensionally complex “live” sound). CymaGlyphs are certainly attractive to look at, and they illustrate another way in which different sounds can be made to cause distinctive visible patterns of wonderful complexity. But I suspect their patterns may be much like the ones seen in a kaleidoscope, where every different configuration of objects is likewise made to produce a beautifully symmetrical view.
I’ve touched on the CymaScope here because it’s being promoted similarly to the other sound wave art I’ve been describing, but the fact that it doesn’t have a time base arguably puts it out of scope. In a sense, it has more in common with real-time audio visualization, where video sequences are generated to accompany, correlate with, and be responsive to a sound recording while it’s playing, but using time itself to represent time. A CymaGlyph seems to be analogous to an individual video frame.
There’s also one other oscillographic approach to making sound vibrations visible that does have a time base of sorts, but one that isn’t linear as it usually is in a “regular” oscillogram (or a spectrogram). This entails tracing the combination of two different vibrations as they act at right angles to one another, such that one vibration establishes an irregular time base for the other vibration, and vice versa. The best-known case is the Lissajous figure, which presents the relationship between two vibrations at one particular peak amplitude. An instrument called the Harmonograph does much the same thing, except that it continues to trace the vibrations as they gradually fade away.
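Both kinds of trace are simple to generate numerically from two sine waves. In the Python sketch below, a decay of zero gives a Lissajous figure at constant amplitude, while a positive decay gives a Harmonograph-style trace that spirals inward as the vibrations fade. Using one shared exponential decay for both axes is a simplification I’ve assumed here (a physical Harmonograph has separately damped pendulums), and the function and parameter names are illustrative.

```python
import math

def lissajous_points(freq_x, freq_y, phase=math.pi / 2, decay=0.0,
                     duration=1.0, n_points=1000):
    """Trace two vibrations acting at right angles to one another.

    freq_x, freq_y: frequencies of the horizontal and vertical vibrations.
    phase: head start of the horizontal vibration, in radians.
    decay: 0.0 for a steady Lissajous figure; larger values fade faster.
    Returns a list of (x, y) points ready for plotting.
    """
    points = []
    for i in range(n_points):
        t = duration * i / (n_points - 1)
        fade = math.exp(-decay * t)
        x = fade * math.sin(2 * math.pi * freq_x * t + phase)
        y = fade * math.sin(2 * math.pi * freq_y * t)
        points.append((x, y))
    return points
```

For instance, `lissajous_points(3, 2)` traces the closed figure for a 3:2 frequency ratio, and adding `decay=0.5, duration=10` turns it into a slowly shrinking spiral of the same shape.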
I don’t see any obvious way of making a Lissajous figure or a Harmonograph trace from your voice or your favorite song, but no account of sound wave art would be complete without them.
And that’s where I’ll leave my overview of sound wave art for now, although I feel as though I could continue exploring indefinitely. What did I miss? What did I get wrong? Let me know in the comments section below.