What is Paleospectrophony?

I coined the word paleospectrophony in the fall of 2008 to refer to a new method I’d worked out for educing or “playing” historical representations of sound.  After first introducing it on my Phonozoic website (here), I used it during a presentation I gave at the 2011 ARSC conference (here) and again in my book/CD Pictures of Sound, published in 2012 (here), and the last time I checked, googling it turned up just over three hundred hits, including a tweet (dead link) proposing it as the “coolest bullshit-y science word ever,” which I’ll take as a compliment.  But the concept behind paleospectrophony is somewhat tricky, so I thought I’d take some time here to explain just what I mean by it.  By way of illustrating the technique, I’ll also share a few new paleospectrophonic sound files, as well as a couple past favorites that didn’t make the cut for Pictures of Sound.

The first thing you need to understand in order to grasp what paleospectrophony is all about is that audio data can be displayed in two different ways: as an oscillogram or as a spectrogram.  These forms of display are largely interchangeable, and many pieces of audio editing software will let you toggle back and forth between the two, but there’s also a fundamental distinction to be drawn between them.  The sound oscillogram, often referred to as a “waveform,” is a graph of amplitude as a function of time.  At each point along the time axis, it shows a single value corresponding to distance from a rest position: how far the diaphragm in a microphone had moved back or forth under the influence of sound waves passing through the air, for example, or how far the diaphragm in a loudspeaker is supposed to move back or forth in the course of producing a desired sound wave.  By contrast, the spectrogram is a graph of frequency as a function of time.  According to the Fourier theorem, any periodic sound wave—no matter how complex—can be represented as a combination of simple sine waves with particular frequencies and amplitudes.  A sound spectrogram separates out the amplitudes of each of the component frequencies of a sound at each successive point in time—collectively, these make up the sound spectrum—and displays them with frequency tied to spatial position from high to low and amplitude tied to another parameter such as brightness.  Sound spectrograms can be generated mathematically from sound oscillograms, and vice versa, albeit generally with some loss of information (about phase, for example).

oscillogram-vs-spectrogram
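To make the relationship between the two displays concrete, here is a minimal Python sketch (using NumPy and SciPy purely for illustration; this is not any of the software discussed below) that computes a spectrogram from a waveform with a short-time Fourier transform and then converts it back:

```python
# A minimal sketch: an oscillogram (amplitude vs. time) is turned into a
# spectrogram (frequency vs. time) by a short-time Fourier transform, and the
# transform can be inverted to get the waveform back.
import numpy as np
from scipy import signal

rate = 8000                                   # samples per second
t = np.arange(rate) / rate                    # one second of time values
wave = np.sin(2 * np.pi * 440 * t)            # oscillogram of a 440 Hz tone

# Forward transform: each column of the result is the spectrum of a short slice.
freqs, times, stft_data = signal.stft(wave, fs=rate, nperseg=512)
spectrogram = np.abs(stft_data)               # keeping magnitudes discards phase

# Inverse transform: with the phase still in hand, the waveform comes back intact.
_, reconstructed = signal.istft(stft_data, fs=rate, nperseg=512)
print(np.allclose(wave, reconstructed[:len(wave)]))   # should print True
```

The magnitudes are what the picture shows; the phase is what generally gets lost when we work from the picture alone, which is the loss of information mentioned above.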

The next thing you need to understand is that we’re not limited to looking at such displays of audio data; we can also educe or “play” them as sound.  This is particularly obvious in the case of sound oscillograms or “waveforms.”  After all, if you look closely enough at the groove of a mono LP, you’ll see that it’s nothing more than a graph of this kind, stretched out and coiled into a long spiral, so that a stylus can mechanically pick up its deflections as the disc revolves, and the membrane in a speaker can be made in turn to carry out the same deflections to impart an audible sound wave to the air.  An audio CD or WAV file contains the same kind of information, but encoded digitally—a difference no more or less momentous than the difference between analog and digital images.  LPs, CDs, and WAVs happen to be particularly convenient when it comes to playback, since they’ve been intentionally optimized for that purpose, but even audio waveforms on paper can be educed as sound as long as we can find some way of converting them into a practically playable format.

But sound spectrograms are “playable” too: a reverse Fourier transform can convert them mathematically into sound oscillograms, which can be educed in turn as sound.  Indeed, this is what happens every time you play an mp3, since the mp3 format represents audio internally in terms of frequencies (which invite compression) rather than aggregate amplitudes (which don’t).  More to my immediate point, even spectrograms on paper can be educed as long as we can find a way to convert them into a practically playable format.  By way of analogy, we might say that spectrograms are to mp3s as phonautograms are to WAVs.  I’ve been using AudioPaint myself to convert digital scans of spectrographic images into playable sound files (Photosounder looks promising too, but I haven’t spent much time with it yet).

paleospectrophony-logic
AudioPaint interprets each column of pixels as a span of time, each row of pixels as a frequency, and the brightness (or some other parameter) of the pixels as amplitude.  At the center of the simplified illustration shown above, we see two successive columns of pixels in a hypothetical source image; at the sides, we see the sine waves that would be generated for these pixels and then added together mathematically to produce a composite wave.  In a real source image, there would typically be a lot more columns and rows and often a lot more shades of gray as well.
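That logic is easy to sketch in code.  The following is not AudioPaint’s own implementation, just a rough Python rendering of the same additive-synthesis idea, with a linear frequency scale assumed:

```python
# A rough sketch of the additive-synthesis logic (not AudioPaint's own code):
# each pixel row is a sine-wave frequency, each column a slice of time, and
# pixel brightness sets the amplitude of that sine wave during that slice.
import numpy as np
from PIL import Image

def image_to_wave(path, duration=5.0, f_low=70.0, f_high=3500.0, rate=44100):
    img = np.asarray(Image.open(path).convert("L"), dtype=float) / 255.0
    n_rows, n_cols = img.shape
    freqs = np.linspace(f_high, f_low, n_rows)     # top row = highest frequency
    samples_per_col = int(duration * rate / n_cols)
    t = np.arange(samples_per_col) / rate
    chunks = []
    phase = np.zeros(n_rows)                       # keep each sine continuous
    for col in range(n_cols):
        sines = np.sin(phase[:, None] + 2 * np.pi * freqs[:, None] * t)
        chunks.append((img[:, col, None] * sines).sum(axis=0))
        phase = (phase + 2 * np.pi * freqs * samples_per_col / rate) % (2 * np.pi)
    wave = np.concatenate(chunks)
    return wave / (np.abs(wave).max() + 1e-9)      # normalize to avoid clipping
```

Calling it on a scanned image (any filename here would be hypothetical) yields an array of samples that can then be written out as a WAV file.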

AudioPaint software is actually designed to convert any digital image into a WAV file as though it were a sound spectrogram, regardless of the nature of the image itself.  So, for instance, we can take a signature—

john-hancock
—and generate a sound file from it through additive synthesis by interpreting its pixels as values in a graph of time versus frequency, as described above.

 

Of course, John Hancock’s signature wasn’t originally intended to represent sound in this way.  As cool as this kind of thing is as an innovative source for sound art—which is how it’s usually pitched—it’s also pretty arbitrary, much like “playing” a cross-section of a tree by spinning it on a turntable and letting the stylus run amok.  Not that I have anything against it; there’s a lot of room for conceptual creativity here, and I’ve dabbled myself in using AudioPaint to “play” fingerprints, geometric patterns, and so forth.

But it occurred to me back in 2008 that there were many historical inscriptions out there that had originally been intended to represent sound as graphs of time versus frequency, and that we could use this same technology to “play” them in terms more or less consistent with their intended meanings.  That’s the approach I coined the term paleospectrophony to describe: spectrophony because it entails generating sounds from spectrally interpreted data, and paleo because it deals with old, historical inscriptions.  If I’d written up a formal definition at the time, it might have run something like this: paleospectrophony is the use of a reverse Fourier transform to educe historical inscriptions that represent sound graphically in terms of a time axis and a frequency axis, such that the resulting sound bears an audible resemblance to the originally intended content.  As far as I’m aware, nobody else had ever tried to actualize historical inscriptions of sound in this particular way before—although of course I can’t be 100% certain of that, and it’s always hazardous to lay claim to a “first.”  Note that paleospectrophony refers specifically to work with spectrographic or spectrogram-like representations of sound; similar work with waveform representations of sound might instead be called paleokymophony or paleooscillophony.

Old sound spectrograms are perhaps the most obvious subject matter for paleospectrophony; in this case, AudioPaint becomes simply a “playback” technology for recorded sounds.  The practice of sound spectrography originated at Bell Laboratories in the 1940s (the earliest patent seems to be 2,403,997, issued to Ralph K. Potter; it was filed on April 14, 1942, but not granted until 1946), and sound spectrograms themselves can readily be found dating back seventy years or so.  In the early days, creating a sound spectrogram typically involved recording a short sound clip and then playing it back repeatedly through a narrow band-pass filter that could be tuned by increments to each frequency range in a given scale, while the resulting amplitudes were inscribed as variations of darkness on successive passes across a “chart” to build up a larger pattern (imagine an ink-jet printer spitting out a photograph line by line and you’ll have the right idea).  The goal was to produce automatic inscriptions of speech and other sounds that were also visually legible; and, in fact, people can learn to decipher sound spectrograms by eye with sufficient practice.  Here are a few specimens of early sound spectrograms excerpted from Ralph K. Potter, George A. Kopp, and Harriet C. Green, Visible Speech (New York: D. Van Nostrand, 1947):

spectrograms-p268
To turn these pictures into sound, I start in the graphic domain by cropping them, inverting them so that greater amplitude corresponds to greater brightness, running a “dust and scratches” filter to conceal the artifacts of halftone printing, and boosting the contrast.
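That preparation was done in an ordinary image editor, but the same steps can be sketched in a few lines of Python with Pillow (the filenames here are hypothetical, and the median filter is only a rough stand-in for the “dust and scratches” filter):

```python
# A sketch of the graphic-domain preparation described above, using Pillow.
from PIL import Image, ImageFilter, ImageOps

img = Image.open("spectrogram_scan.png").convert("L")   # hypothetical filename
img = ImageOps.invert(img)                      # dark ink becomes bright = loud
img = img.filter(ImageFilter.MedianFilter(3))   # rough stand-in for "dust and scratches"
img = ImageOps.autocontrast(img)                # boost the contrast
img.save("spectrogram_prepared.png")
```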

spectrograms-p268-inverted-contrast
Then I import the images into AudioPaint and adjust the settings as needed: in this case, the accompanying text specifies that the frequency scale is linear and runs from 70 to 3500 Hz, and the average duration seems to be 1.6 seconds.  And that’s it!  A few moments later, we have audio:

 

Spectrograms created with other settings can sound somewhat different when we run them through the process I’ve just described.  Take this example, also from Visible Speech:

she-was-waiting
This time the scale is exponential rather than linear, and it runs from 250 to 7500 Hz rather than from 70 to 3500 Hz, so we need to adjust the settings in AudioPaint accordingly.  If you listen to the results, you’ll notice that the speaker says “she was waiting at my lawn” rather than “she was waiting on my lawn” as in the caption.
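On the eduction side, the only thing that changes is how the rows of pixels are mapped onto frequencies.  Here is a sketch of the two mappings, assuming the top row is the highest frequency:

```python
# Row-to-frequency mappings for the two scale types mentioned here (a sketch).
import numpy as np

def row_frequencies(n_rows, f_low, f_high, scale="linear"):
    if scale == "linear":
        return np.linspace(f_high, f_low, n_rows)   # e.g. 70-3500 Hz: equal Hz per pixel
    # "Exponential" (logarithmic) spacing: equal pixel distances span equal
    # frequency ratios, e.g. 250-7500 Hz in the example above.
    return np.geomspace(f_high, f_low, n_rows)
```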

she-was-waiting-inverted-contrast
These examples may not represent the height of audio fidelity, but they still show that sound spectrograms are viable “sound recordings,” capable of intelligibly mediating spoken language; and that running them through AudioPaint is a legitimate way of “playing back sound recordings,” just as playing a vinyl LP or an mp3 is.

Automatically generated sound spectrograms date back only to the 1940s, as I mentioned earlier, but much older inscriptions can be found that share the same basic time-frequency graph format, and we can educe these in precisely the same way we educed the examples of speech spectrograms presented above.  That’s what I mean when I say we can play such inscriptions “just as though they were sound recordings.”  Before I go into the implications of the fact that we can do this, I’d like to illustrate the variety and age of materials we can access in this way.

Here’s one example that predates the automatic sound spectrograph by a decade or so.  It represents a painstaking effort to measure and graph out by hand the overtones of a phonographically recorded snippet of spoken language: the phrase “Joe took father’s shoe bench out.”

jasa-3a
The source of this plate is John C. Steinberg, “Application of Sound Measuring Instruments to the Study of Phonetic Problems,” Journal of the Acoustical Society of America 6:1 (July 1934), 16-24, at page 23.  This time, in order to get the data into a form suitable for eduction, we need not only to invert the image from black-on-white to white-on-black, but also to erase some extraneous lines, a time-consuming process I call degridding.  Here’s the result:

jasa-3-dg
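The degridding here was done by hand, but part of the job could in principle be roughed out automatically, for instance by blanking any row of pixels that is almost entirely ink.  This is a crude heuristic; the filenames and threshold are assumptions, and vertical lines and touch-up would still need manual attention:

```python
# A crude sketch of partially automated degridding: blank out rows that are
# almost entirely dark, on the assumption that they are grid lines, not data.
import numpy as np
from PIL import Image

img = np.array(Image.open("jasa_plate.png").convert("L"), dtype=float)  # hypothetical scan
dark = img < 128                          # boolean mask of inked pixels
grid_rows = dark.mean(axis=1) > 0.8       # rows that are more than 80% ink
img[grid_rows, :] = 255                   # blank those rows (white = no mark)
Image.fromarray(img.astype(np.uint8)).save("jasa_degridded_rough.png")
```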

Once we start applying paleospectrophony to manual inscriptions in addition to automatically generated ones, we expand the universe of possibilities considerably.  For example, many of the programs contrived by hand for automatic musical instruments can be interpreted as time-frequency graphs.  Consider this illustration accompanying a United States patent issued to Adoniram F. Hunt and James S. Bradish of Warren, Trumbull County, Ohio, on January 9, 1848:

hunt-bradish-original
Here’s the same data inverted from black-on-white to white-on-black, cropped so that the frequency scale spans exactly two octaves, and spliced together so that the last bar line of the top row overlaps the first bar line of the bottom row:

hunt-and-bradish-processed-for-eduction
Now we need to infer some appropriate audio settings.  Since the frequency range spans two octaves, we need to set the top value numerically to four times the bottom value.  The sound file below presents two possibilities in turn—125 Hz to 500 Hz, and then 250 Hz to 1000 Hz—and then concludes with both combined together in stereo, with the duration in each case set to 30 seconds.  (I’ll follow this same presentational approach with some other examples that follow.)  The top half of “Auld Lang Syne” seems to be a polished final draft, but the bottom half appears more rough, tentative, and experimental; it looks as though whoever was plotting out the notes made a couple minor mistakes in length and, instead of erasing them, simply repeated the passages with the errors corrected.  The result sounds a bit like someone practicing a piece of music and repeating flubbed parts.
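The arithmetic behind the four-times rule mentioned above is just a doubling of frequency per octave:

```python
# Top frequency for a spectrogram spanning a given number of octaves (a sketch).
def top_frequency(bottom_hz, octaves):
    return bottom_hz * 2 ** octaves        # each octave doubles the frequency

print(top_frequency(125, 2), top_frequency(250, 2))    # 500 1000
```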


Sometimes we need to manipulate images a little more to get acceptable sound out of them.  Take the sample program illustrated in Claude-Félix Seytre’s French patent of 1842.  (To view the whole patent online, go to this URL, enter the word “autopanphone” in the search box, and click RECHERCHER; then, on the result this brings up, click on the image of the camera under “Voir le dossier.”)

seytre-original
The time scale (1, 2, 3, 4, etc., at top) runs from right to left, and the frequency scale (ut, ut#, re, re#, mi, etc.) runs from top to bottom.  To get the data into an educible spectrographic format, we need to rotate it 180 degrees, invert from black-on-white to white-on-black, and remove the grid lines, as before.  But there’s one additional step we have to take.  When Seytre was filling in the labels for his frequency scale, he forgot one instance of la#, which threw off the rest of his scale in turn.  Fortunately, that’s easy to fix; we just insert a blank space where the la# should have gone.  You can see the extra line marked in red in the corrected version of the image shown on the bottom below.

seytre-processed-for-eduction
Now we just need to choose a duration and frequency range.  I picked 20 seconds and tried two different frequency ranges: 116-464 Hz and 232-928 Hz.  The sound file presents the two frequency ranges separately and then both of them together in stereo.
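As for the missing la#, the fix amounts to splicing one blank (silent) row into the prepared image at the right height.  Here is a sketch, with the filename, row height, and position all assumed for illustration:

```python
# A sketch of the la# correction: splice a blank band into the prepared image
# at the height where the missing semitone row should have been.
import numpy as np
from PIL import Image

img = np.array(Image.open("seytre_prepared.png").convert("L"))   # hypothetical filename
row_height = 12     # assumed pixel height of one semitone row in this scan
insert_at = 200     # assumed pixel position of the missing la# row
blank = np.zeros((row_height, img.shape[1]), dtype=img.dtype)    # black row = silence
fixed = np.vstack([img[:insert_at], blank, img[insert_at:]])
Image.fromarray(fixed).save("seytre_la_sharp_inserted.png")
```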

seytre-notation
The notation Seytre furnished for his sample tune shows that he made a few other mistakes when drawing up his program, in addition to skipping la# in the scale itself—if I’m analyzing and counting things right, there are discrepancies with note 12 (up an octave), an “extra” note added between 20 and 21, note 23 (flatted), and note 36 (sharped).  But what we hear is what Seytre actually programmed, warts and all, rather than what he meant to program.  Indeed, the mistakes are probably easier to hear than they are to see.

In the same spirit, let’s consider this sample of barrel organ programming from Elysium Britannicum, a manuscript by John Evelyn held by the British Library:

elysium
As it happens, Evelyn had copied this illustration from a plate in Athanasius Kircher’s Musurgia Universalis (1650), but he hadn’t done a very careful job of it and ended up entering a number of notes incorrectly.  If we educe it paleospectrophonically, the result sounds like an automatic musical instrument in a state of disrepair:

elysium-processed-for-eduction
For comparison, you can hear Athanasius Kircher’s original version of the same barrel organ program at the start of “Magia Phonotactica” (track 17 on Pictures of Sound), which is also freely available for listening on SoundCloud (dead link).  Of course, Evelyn’s garbling of Kircher’s work could also be detected by analyzing the illustrations visually, but that would be more difficult, and also a lot less entertaining.

But this brings up an important question about the ethics of paleospectrophony.  How far is it desirable or acceptable to go in “correcting” source material, and at what point does this cross a line to become “cheating”?  If I hadn’t inserted a blank line for the missing la# in Seytre’s patent plate, the relationships between all the other notes would have been thrown off, and the results wouldn’t have sounded very nice.  There’s also no question that we’re dealing with a mistake, and that Seytre hadn’t intended to leave out a step in his scale; he even includes la# in other octaves.  Overall, I feel pretty comfortable with that adjustment.  But what about Seytre’s other apparent mistakes?  Would it be OK to “fix” errant notes to conform to the music in the accompanying conventional notation?  Should I have snipped out the flubbed passages in the second half of Hunt and Bradish’s “Auld Lang Syne”?  Would it have been better for me to nudge the sour notes in Evelyn’s barrel organ program discreetly into their proper places?  Personally, I don’t think I want to go there.  Correcting a scale is one thing; correcting individual notes strikes me as more intrusive and subjective.  I’d prefer to hear the glitches as glitches.  After all, what’s attractive about paleospectrophony in the first place is the indexical, causal relationship it creates between the sounds we hear and the original inscriptions on which they’re based.  If we tamper too much with the inscriptions, we jeopardize that relationship.

I’m less sure how far I’m comfortable taking the manipulation of frequency scales.  The dilemma becomes especially acute when we extend paleospectrophony to another type of source material, which I’d like to discuss next: medieval musical notation.  Much medieval musical notation was built on a graph-like framework in which vertical coordinates represented pitch and horizontal coordinates represented time.  This is also true of modern staff notation, but medieval notation tended to come closer to a “pure” graph, with fewer extraneous conventions overlaid on it (such as sharps and flats, or note durations indicated by shapes, beams, and flags: whole note, half note, quarter note, eighth note, sixteenth note, etc.).  Paleospectrophony enables us to treat these inscriptions exactly like graphs and to play them just as though they were sound spectrograms.

Here’s an excerpt from St. Gallen, Stiftsbibliothek, Cod. Sang. 383, a musical manuscript of the thirteenth or fourteenth century, available digitally here.

kyrie-eleison-original
The distance separating two bar lines—corresponding to four notes—is 168 pixels, so an octave (spatially divided into seven gradations) corresponds to roughly 294 pixels, and two octaves correspond to 588 pixels.  With that in mind, let’s arrange the notation into a single line in an image file 588 pixels high, with the “C” line positioned exactly halfway between the top and bottom of the image, like this:

kyrie-lined-up
Now we invert the image from black-on-white to white-on-black and erase the bar lines as well as other extraneous markings (such as words and the lines that show relationships between notes but don’t themselves represent notes).  We’re then left mostly with a sequence of squares that resembles the programs for automatic musical instruments we examined earlier, both in what it looks like and in what it signifies.  One complication, though, is the “liquescent” neumes that sometimes appear above the ei of eleison and look a bit like arches or hooks.  These neumes apparently indicate a certain fluidity of tonal movement in connecting sung syllables, but I gather that there’s some uncertainty over just what that means soundwise, so for this experiment we’ll simply leave them “as is,” with all their parts intact.  We end up with this:

kyrie-eleison-version1
Here’s that image educed into audio with arbitrarily chosen frequency scales of 150 Hz to 600 Hz and 300 Hz to 1200 Hz and the duration set to 90 seconds:


The direction of the melody heard here—both in pitch and sequence—is the same as that of the melody the creator of the inscription originally had in mind.  On the other hand, I’ll concede that the scale for both parameters is problematic.  Let’s start by considering the frequency scale.  First of all, the lines in the manuscript aren’t exactly straight, and the distances between them aren’t precisely equal, and the notes aren’t neatly centered on or between them.  Depending on scribal sloppiness, the “same” note may not always fall in precisely the same vertical position, and these discrepancies come through in the eduction.  But say the spacing were equal and consistent; that still wouldn’t match the conventional intervals of Western music, where the distances between B and C and between E and F are both a half step rather than a full step.  The octave is divided here spatially into seven segments, so the sound file presented above features a (more or less) equally tempered heptatonic scale.  (That’s actually also true of the “Auld Lang Syne” example presented earlier.)  To approximate the “right” scale it would instead need to be divided into twelve segments, which for an equal-tempered scale would all be equal in height, with the seven notes of the scale staggered among them.
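The difference between the two scales is easy to see numerically.  Here is one octave above an arbitrary 150 Hz base, first divided into seven equal steps (what the equal row spacing produces) and then laid out as an equal-tempered major scale:

```python
# A sketch of why the spacing matters: equally spaced rows give seven equal
# divisions of the octave, not the whole/half-step pattern of the major scale.
import numpy as np

base = 150.0
heptatonic = base * 2 ** (np.arange(8) / 7)                        # seven equal steps
major = base * 2 ** (np.array([0, 2, 4, 5, 7, 9, 11, 12]) / 12)    # C D E F G A B C
print(np.round(heptatonic, 1))   # what the equal row spacing yields
print(np.round(major, 1))        # the conventional equal-tempered major scale
```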

We can try to fix the scale in various ways.  I held back from doing this in Pictures of Sound because I felt it would be “cheating,” but since then some of the medieval music people I’ve spoken with have assumed this is something that should be done, so I’ve relented and will demonstrate a method here which I find relatively unintrusive.  In the original scanned image, each of the seven gradations into which the octave is divided corresponds to around 42 pixels of height.  Most of the intervals are a full step, so let’s say that 42 pixels equal a full step and that 21 pixels equal a half step.  Now let’s split the notation into sections divided at each of the half-step intervals (between B and C, and between E and F): in the top image below, the sections are tinted yellow, blue, white, and orange.  And then let’s lower these sections by 21 pixels at each juncture, corresponding to half a step, as shown in the bottom image below.

kyrie-eleison-version2
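In code terms, the adjustment amounts to pushing each band of the image 21 pixels closer to the band below it, so that whole steps stay 42 pixels tall and half steps shrink to 21.  Here is a sketch, with the filename and band boundaries assumed for illustration:

```python
# A sketch of the half-step correction: each band of the image (bounded at the
# B-C and E-F junctures) is lowered 21 pixels relative to the band below it.
import numpy as np
from PIL import Image

img = np.array(Image.open("kyrie_prepared.png").convert("L"))   # hypothetical file
half_step = 21
# Assumed row boundaries of the four bands, listed from the top of the image
# down; in the real image these would fall at the B-C and E-F lines.
band_edges = [0, 168, 294, 462, 588]
out = np.zeros_like(img)
n_bands = len(band_edges) - 1
for i in range(n_bands):
    top, bottom = band_edges[i], band_edges[i + 1]
    shift = (n_bands - 1 - i) * half_step      # the topmost band moves down the most
    # Overlapping fringes where bands now meet are merged by keeping the brighter pixel.
    out[top + shift:bottom + shift, :] = np.maximum(
        out[top + shift:bottom + shift, :], img[top:bottom, :])
Image.fromarray(out).save("kyrie_corrected.png")
```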

That’s a bit better to my ear, musically speaking. But we’re still faced with the fact that, although the horizontal axis represents time, it doesn’t necessarily do so at a consistent rate as it does in the case of the programs for automatic musical instruments.  In some cases, for instance, the spacing of notes is determined mainly by the desire to coordinate them with words written underneath.  As a result, my automatic eduction sounds somewhat like a person noodling around absent-mindedly on a keyboard.  Gary Galo quite reasonably took me to task for this in his review of Pictures of Sound in the Fall 2013 ARSC Journal:

Where Feaster has gone awry, in treating these musical scores as recordings, is in “moving” these scores at a linear speed.  Groups of notes that are closely spaced proceed more rapidly than notes that are spread apart on the page, and the physical spacing between the notes is used to determine the length of time between the end of one note and beginning of the next.  This simply isn’t the way musical notation works: the spaces between the notes are unrelated to the musical pulse.

I don’t contest any of this.  But my goal with paleospectrophony isn’t really to produce aesthetically acceptable renditions of the music per se; live modern performances are quite capable of doing that.  Rather, I’m interested in actualizing inscriptions as far as it’s possible to do so in any given case.  The “musical pulse” isn’t explicitly encoded in inscriptions such as these (as far as I’m aware), and my understanding is that rhythm is one of the most controversial facets of medieval music.  I still doubt the rhythm heard here is “right” in terms of matching the traditions of the time, and its specifics may often be accidental, based on irrelevant scribal choices; but it at least has the virtue of not being influenced by any subjective modern notions of what the music ought to sound like.  Moreover, the arbitrariness of the time dimension might make it less objectionable for us to play around with it by adding time-based effects such as echo:


We can push things yet further back in time to the Daseian notation of the ninth century.  Here’s a specimen from a manuscript at the Staatsbibliothek Bamberg dated to the end of the tenth century, available digitally here:

tu-patris-original
We’re on more solid ground here with rhythm because the notation involves writing the words to be sung on lines corresponding to the notes on which they’re to be sung; longer words contain more letters and so take up more space along the time axis than shorter words.  The example shown here spans exactly one octave in height, so we crop it, invert it from black-on-white to white-on-black, set the duration to 10 seconds, and specify frequency ranges of one octave (175-350 Hz and 350-700 Hz in stereo, and then 350-700 Hz and 700-1400 Hz):


Before I continue, I just want to point out that what you’re listening to here is an inscription that’s over one thousand years old, converted into sound just as automatically as though it were an album of modern electronica: the medieval monk meets the synthesizer.

Without further adjustment, however, we once again get an equal-tempered heptatonic scale, which isn’t what the notation was designed to represent.  Daseian notation has its own peculiar system of full-step and half-step intervals (a repeating pattern of four notes that doesn’t repeat at the octave), but for the range of notes actually used in this example, it happens to match the familiar major scale:

daseian-key
The image we started with is 744 pixels high, so each of the seven gradations making up the octave occupies 106 pixels.  Half of that is 53 pixels.  So we can lower the marks corresponding to F, G, A, and B (tinted blue in the top image below) by 53 pixels and trim 106 pixels (two half steps) from the overall height of the image to give the intervals their approximate intended values.

tu-patris-adjusted

And here for convenient comparison are the uncorrected and corrected versions played once apiece in quick succession, with all the frequency ranges mixed together and some echo added:


Frankly, I prefer the uncorrected version, particularly at the word “es,” even if it’s technically wrong.  Indeed, I find that paleospectrophony often yields more interesting and (to me) aesthetically pleasing results when subject matter doesn’t conform in some respect to the principles of the “sound graph,” even at the point where it’s fed into the software for eduction.  For example, here’s another specimen of Daseian notation, this one from another manuscript at the Staatsbibliothek Bamberg circa 1000 AD, available online here:

scande-templa-celi-uirgo-original
This time, the scribe seems to have made no effort to line up the singing parts along the horizontal axis, even though the words “Scande celi templa uirgo digna tanto foedere” were presumably to be sung simultaneously by all voices.  Some successive syllables are even shown one right above the other, as with the final two syllables of foedere.  In other words, we’re dealing with quite a mess.  I suppose I could try to fix it by lining up the words horizontally.  But let’s instead educe it just as it is, duration set to 20 seconds, played once in mono with the range set to 200-800 Hz, and then twice in stereo: first with a channel down one octave (100-400 Hz), and then with a channel up one octave (400-1600 Hz):

scande-templa-celi-uirgo-processed


I doubt that’s much like what the scribe intended to encode a thousand years ago, but it’s still consistent with the logic of the inscription with its loose time and frequency axes, and it happens to sound extremely cool.  If I didn’t know what it was, I might have guessed it was a heavily noise-reduced field recording of a vernacular American musical saw duet of the 1930s.

A quick note about timbre is in order.  One of the less sympathetic reviewers of Pictures of Sound had this to say about the paleospectrophonic examples: “Feaster’s decision to render all of the music, regardless of its source, in the same Wurlitzer-like keyboard voice makes it sound like snippets heard in an organ showroom.”  To be clear: the timbre (apart from octave-doubling in stereo channels) doesn’t represent any subjective decision on my part, any more than the sound of the voice of the person saying “She was waiting at my lawn” does.  Rather, it’s based strictly on the details of the inscriptions themselves, and mainly on the width and shape of the written marks.  The “Wurlitzer-like keyboard voice” is essentially a pure tone without much timbral complexity—the result of educing an inscription that doesn’t really include that kind of data.  What we’re hearing is what’s there on the page to hear, without any arbitrarily chosen synthesizer voice added in.

The last several inscriptions we’ve examined represent sound, and specifically musical sound, but were drawn manually rather than “recorded” from life.  On the other hand, we can also use a similar technique to educe as sound inscriptions that don’t represent sound per se, but that still represent “recorded” rhythms or pulses that can be rendered meaningfully as sound.  For example, I’ve applied AudioPaint to the record produced by Samuel Morse’s famous telegraphic transmission of the words “What Hath God Wrought” from Washington to Baltimore on May 24, 1844.  (For more details about this episode in telecommunications history, see here.)  To give the effect of the transmission being received by a 1000 Hz oscillator, as might be done today, I simply set the frequency range from 1000 Hz to 1000 Hz, with a duration based on the average transmission speed reported at the time (only part of the source inscription is shown here).

what-hath-god-wrought1
what-hath-god-wrought3
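A toy sketch of that single-frequency setting: a row of on/off marks simply gates a 1000 Hz tone, roughly as a modern oscillator would render the register’s dots and dashes (the mark values below are made up for illustration):

```python
# A toy sketch of the 1000-Hz-to-1000-Hz setting: on/off marks gate a single tone.
import numpy as np

def keyed_tone(marks, seconds_per_column=0.05, freq=1000.0, rate=44100):
    """marks: sequence of 0/1 values, one per image column (1 = ink present)."""
    n = int(seconds_per_column * rate)
    envelope = np.repeat(np.asarray(marks, dtype=float), n)
    t = np.arange(len(envelope)) / rate
    return envelope * np.sin(2 * np.pi * freq * t)

wave = keyed_tone([1, 0, 1, 1, 1, 0, 0])    # illustrative values only
```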

AudioPaint also gives us the option of using a sound sample other than a simple sine wave as the basis for its additive synthesis.  Here, for instance, is a record of the breath impulses accompanying the spoken words “Peter Piper picked a peck of pickled pepper” as recorded by William Henry Barlow on his logograph and published in 1874, educed using a white noise sample (to convey an impression of fluctuations in air pressure) instead of a sine wave as the sound sample:

barlow-logogram
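Here is a rough sketch of that substitution (not Barlow’s data and not AudioPaint’s code): the brightness values along a row now shape band-limited white noise rather than a sine wave:

```python
# One way to approximate synthesis from a noise sample rather than a sine wave:
# the brightness envelope of an image row gates band-passed white noise.
import numpy as np
from scipy import signal

def row_to_noise_band(envelope, f_low, f_high, rate=44100, cols_per_second=20):
    """envelope: one brightness value (0-1) per image column for a single row."""
    samples_per_col = int(rate / cols_per_second)
    amp = np.repeat(np.asarray(envelope, dtype=float), samples_per_col)
    noise = np.random.default_rng(0).standard_normal(len(amp))
    sos = signal.butter(4, [f_low, f_high], btype="bandpass", fs=rate, output="sos")
    return amp * signal.sosfilt(sos, noise)

# e.g. a made-up pulse pattern rendered as noise bursts between 100 and 400 Hz:
burst = row_to_noise_band([0, 1, 0, 1, 1, 0], f_low=100, f_high=400)
```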

I believe that covers all the main technical strategies I’ve attempted so far in connection with paleospectrophony.  In fact, I’m not sure whether the last two cases (the telegraph message and the breath-impulse record) should even count as paleospectrophony proper.  They’re examples of a closely related technical process, and they serve a similar purpose in bringing mute inscriptions to life, but I think they’re better classified as cases of sonification, or cases of using sound to represent things other than sound; even if Morse code and variations in air pressure often manifest themselves audibly, the data we have here isn’t really about that aspect of them.  By contrast, paleospectrophony uses sound to represent sound.

And that’s why I find it so conceptually provocative.  Many of the examples I’ve shared here don’t involve sounds “recorded” from life.  But “recording and reproduction” is only one facet of the culture of phonography, as a little reflection will show: after all, many so-called “recordings” these days are actually synthetic works created through a leisurely programming process.  If there’s a common denominator to it all, it seems in practice to have more to do with conventions for inscribing and actualizing data about sound than it does with recordedness from life.  And those conventions are rooted largely in the use of graph-like coordinates to represent sound as a function of time, whether in terms of amplitude (on a vinyl LP or in a WAV file) or in terms of frequency (in a spectrogram or mp3).  The playback of recorded sounds dates back only to 1877, but our graphically coordinated conventions for representing sound (e.g., time runs from left to right; “up” on the page means “up” in pitch) date back much further, and these are part of pre-phonography just as surely as the convention of the image sequence is part of pre-cinema.  I don’t claim that “playing” barrel organ programs or medieval musical notation just as though they were modern sound recordings is a superior way of rendering them audible, and my goal isn’t to enter into competition with historically informed performance practices or lovingly restored automatic musical instruments.  But by showing that we can treat the inscriptions in this way, and that the results bear an audible resemblance (however tenuous) to the intended content, I hope to draw attention to some long-term continuities in the ways we represent sound on paper, in language, and in our minds, and to do so more vividly than if I were merely to write about them.  How better to make the argument that thousand-year-old Daseian notation has fundamental similarities with the mp3 format than to play it as though there were no difference at all between the two?

Further Reading and Listening

  • Patrick Feaster, “Phonogram Images on Paper and the Frontiers of Early Recorded Sound, 1250-1950,” Association for Recorded Sound Collections annual conference, Los Angeles, California, May 12, 2011, on YouTube.
  • Patrick Feaster, Pictures of Sound: One Thousand Years of Educed Audio (Atlanta: Dust-To-Digital, 2012), available here; see also reviews here and here.
  • Patrick Feaster, “On Sound-Graphs: A Coordinated Look at Sonic Artifacts,” in Art or Sound, ed. Germano Celant and Chiara Costa (Venice: Fondazione Prada, 2014), 81-86, available here.  Also translated into Italian as “I grafici del suono. Una panoramica dei manufatti sonori,” Ibid., 468-471.
  • Glenn Fleishman, “Ancient audio: The written sound,” The Economist, 12 July 2011.
  • Dietmar Ostermann, “Bismarcks Stimme ist nur der Anfang,” Badische Zeitung, 4 February 2012.