You probably know that the phonautograms of Édouard-Léon Scott de Martinville are the world’s oldest records of airborne sound, and that we can play them back today as audio. But we can reconstruct another kind of motion from them as well. Many phonautograms contain all the information we need to assemble movies with sound, in which voices and other physical movements documented together in the year 1860 can be reproduced synchronously in real time.
Now, when I say “movies,” I don’t mean to suggest there’s some way to recover actual photographic image sequences of people talking and singing into the phonautograph. That would be crazy talk, and this is Griffonage-Dot-Com, not a White House press conference. What we can do, though, is create video displays of the rate at which the cylinder of the phonautograph turned during recording. This is possible because many phonautograms have tuning-fork traces that vary in width based on the original angular velocity of the cylinder: bunched up where it was slower, spread out where it was faster. Once we’ve extracted velocity data from those traces, we can then use it to control an animated model. This is no different in principle from any other use of motion capture in modern cinema: we can see the motion of a phonautograph cylinder “reproduced” onscreen in precisely the same sense in which we see the animated Na’vi in Avatar “reproduce” the recorded movements of actors through CGI. Of course, the movement is a lot simpler in the case of the phonautograph.
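To make that relationship concrete: since the fork vibrates at an essentially constant rate, the spacing between successive zero crossings of its trace is directly proportional to how fast the cylinder’s surface was moving at that instant. Here’s a minimal Python sketch of the underlying arithmetic; the fork frequency and crossing positions are invented for illustration, and this isn’t the First Sounds pipeline, just the principle.

```python
# Sketch: recover relative cylinder surface velocity from a tuning-fork
# trace. Assumptions (not from the original post): the fork vibrates at a
# known, constant frequency, and the digitized trace gives us the
# positions (in mm along the paper) of successive zero crossings.

FORK_FREQ_HZ = 250.0  # hypothetical fork frequency

def velocities_from_crossings(crossing_positions_mm, fork_freq_hz=FORK_FREQ_HZ):
    """Each half-cycle of the fork lasts 1/(2*f) seconds, so the paper
    distance between adjacent zero crossings, divided by that time,
    gives the instantaneous surface velocity in mm/s."""
    half_period = 1.0 / (2.0 * fork_freq_hz)
    return [
        (b - a) / half_period
        for a, b in zip(crossing_positions_mm, crossing_positions_mm[1:])
    ]

# Crossings bunch up (small gaps) where the cylinder was slow and
# spread out (large gaps) where it was fast:
crossings = [0.0, 0.4, 0.8, 1.4, 2.0, 2.4]   # mm, fabricated
print(velocities_from_crossings(crossings))   # mm/s per half-cycle
```

Everything downstream — speed correction, the velocity meter, the animated crank — is built on velocity estimates of this general kind.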
When Scott carried out his experiments, it’s likely that he personally voiced his records and cranked his machine without help from anyone else. If that’s the case, we can now listen to him sing and talk while watching how he simultaneously turned a handle around in a circle with his hand. But even if the two tasks were performed by two different people, they would at least have been carried out at the same time: we’d see how one person was turning a cylinder while another person was talking or singing a few feet away. Either way, we’d have a complex audiovisual reproduction of a moment in time from the year 1860.
And that puts us in rarefied company. The Dickson Experimental Sound Film from late 1894 or early 1895 is the oldest known photographically captured motion picture with simultaneously recorded audio. Through it, we can see and hear W. K. L. Dickson play a violin while two men dance nearby to the music. For many cinema scholars and aficionados, I’m sure the element of photographic motion capture is essential to the Dickson Experimental Sound Film’s status as a historical first. But if what’s at stake is seeing and hearing a synchronized “reproduction” of complementary phenomena recorded together from life, then my phonautographic video beats the Dickson Experimental Sound Film by some thirty-five years.
In its best-known form, Scott’s phonautograph concentrated sounds onto a membrane; a stylus attached to the membrane traced its movements as a wavy line onto a sheet of soot-blackened paper wrapped around a rotating cylinder. Because Scott turned the cylinder by hand, its rotational speed was distinctly irregular. Starting in 1859, however, his phonautograph was equipped with a tuning fork that could trace its vibrations alongside the “main” records of airborne sound as a speed reference. When we play back these tuning-fork traces today, they furnish us with pilot tones we can use to correct for speed irregularities in the records of speech and song they accompany. That’s how my colleagues and I in the First Sounds initiative have been able to take raw phonautograms like this one—
—and speed-correct them into something audibly intelligible, like this:
In the past, we tried using off-the-shelf software for the speed-correction step, but for a few reasons—which I may spell out in a future post—this has never been completely satisfactory. Our results were okay, but not optimal.
More recently, I’ve started doing some coding of my own in GNU Octave, building up a little suite of tools for doing things with sound I can’t find any other way to do. One of the functions I’ve written will speed-correct a recording automatically to a pilot tone, and the corrected example I shared above—from Scott #44, “Au Clair de la Lune,” dated April 20, 1860—is an example of what it can do. Meanwhile, I’ve also created another piece of code to make visual animations from the speed-correction data. Here’s the result of my first experiment along those lines: an animated GIF based on the same phonautogram as the above sound files.
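My actual tools are written in GNU Octave, but the underlying idea of pilot-tone speed correction can be sketched in a few lines of Python. This is my generic reconstruction of the technique, not the Octave code itself: wherever the pilot tone reads above its nominal frequency, that stretch of the transfer is playing too fast and gets stretched back out; where it reads low, it gets compressed.

```python
import numpy as np

# Sketch of pilot-tone speed correction (a generic approach, not the
# author's Octave implementation). We build a warped time axis whose
# local rate is proportional to the measured pilot frequency, then
# resample the audio onto a uniform grid with linear interpolation.

def speed_correct(audio, pilot_freq, nominal_freq):
    """audio: 1-D samples; pilot_freq: per-sample estimate of the pilot
    tone's apparent frequency; nominal_freq: the fork's true constant
    frequency. Returns speed-corrected samples."""
    rate = np.asarray(pilot_freq) / nominal_freq  # >1 where playing too fast
    warped_t = np.cumsum(rate)                    # nonuniform time axis
    warped_t -= warped_t[0]
    uniform_t = np.arange(warped_t[-1])           # uniform output grid
    return np.interp(uniform_t, warped_t, np.asarray(audio))
```

A real pipeline also has to estimate `pilot_freq` in the first place (for instance from zero-crossing spacing, as sketched earlier) and smooth it before warping, but the warp itself is no more complicated than this.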
The meter at top center indicates surface velocity; the upper right display shows a frequency curve for the tuning fork pilot tone; and the bottom presents an oscillogram of the same tone. But it’s the rotating crank at upper left that I’d like to focus on here. It shows exactly how the phonautograph cylinder must have been turned in order to account for the variations in the tuning fork trace.
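The crank animation itself takes almost no machinery: given the cylinder’s angular velocity at each video frame, the crank’s on-screen angle is just the running integral of that velocity, wrapped to one rotation. A hypothetical Python sketch (not the code behind the GIF above):

```python
import math

# Sketch (my own reconstruction, not the author's animation code): the
# crank angle at each video frame is the accumulated rotation so far,
# wrapped to [0, 2*pi). Each frame then just draws the handle at theta.

def crank_angles(angular_velocity, frame_dt):
    """angular_velocity: rad/s sampled once per video frame;
    frame_dt: seconds per frame. Returns the crank angle (radians)
    at each frame."""
    theta, angles = 0.0, []
    for omega in angular_velocity:
        angles.append(theta % (2.0 * math.pi))
        theta += omega * frame_dt
    return angles

# e.g. a steady one-rotation-per-second crank rendered at 25 fps:
omega = [2.0 * math.pi] * 25
angles = crank_angles(omega, 1.0 / 25.0)
```

Feeding in the irregular velocity curve recovered from a real phonautogram, instead of this steady example, is what makes the animated crank lurch and surge the way the original cranking did.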
By combining animations like the one shown above with the corresponding audio, we can reproduce voices and cranking movements in perfect audiovisual synchronization. And that lets us observe something very cool. In a presentation I gave about “Phonogram Images on Paper” at the ARSC conference in Los Angeles on May 12, 2011, I pointed out that the tuning fork traces of Scott’s phonautograms of “Au Clair de la Lune” showed that the cylinder had been cranked, at least in part, to the beat of the singing (check out the video here, roughly nine minutes in). We can now see that correlation a lot more vividly. Here’s the April 20, 1860 phonautogram of “Au Clair de la Lune”:
For comparison, here’s a similar video of Scott’s earlier April 9, 1860 phonautogram of the same song.
I’d mentioned in my earlier presentation that the rotation of the cylinder had slowed down in the earlier case—maybe to a complete stop—between strophes (“Au clair de la lune, mon ami Pierrot / prête-moi….”), and speculated that Scott might have paused to see how much space he had left for recording.
Now that we have the rotational videos available to watch, it looks to me as though the manual rotation on April 20th was smoother and more proficient in general than it had been on April 9th. So had Scott (or his designated phonautograph-cranker) grown more comfortable with the process by the 20th? Or were different people turning the cylinder in the two cases? Could the cranker have been standing at a different angle relative to the machine on the 9th, making the motions more awkward? Or did the axle need oil? Does the rough correlation between the “pulses” of cranking and singing point to Scott himself having done both things at once, getting into a rhythm as he did so? Which direction is “up” on the cylinder, and which is “down”?
I can’t yet answer these questions, but it’s nice to be in a position to ask them.
Incidentally, these aren’t the first videos to have been based on phonautograms. In a previous Griffonage-Dot-Com post, I showed some actual images from another phonautogram as though they were being displayed on an oscillograph screen:
The width of the image in this example was even inversely proportional to rotational velocity: the image got narrower when the cylinder turned more quickly and wider when it turned more slowly. But I’d still argue that by animating the rotation of the cylinder, we’re entering into provocative new territory. Before, we were just displaying our audio data in a novel way by tapping the convention of the oscilloscope. Now we’re using it indirectly to reconstruct something else altogether.
As a next step, we could perhaps attach our reconstruction of cylinder rotation to a physical model of the human hand, arm, and shoulder, and see how patterns of speed fluctuation correlate with the flexion and extension of specific muscles.
Technical postscript. The cyclical analysis of phonautograms is inevitably a bit imprecise due to irregular stretching and cutting of the paper. For these examples, I’ve simply applied the average of the lengths of three typical rotations to the un-speed-corrected audio from Pictures of Sound. Another dilemma has been what to do with the very beginning and end of the trace, where the vibrations are either missing, too weak, or too bunched together for accurate analysis. For the animated GIF, I just took the auto-detected peak frequencies “as is”; for the videos with sound, I manually reset the beginnings and ends to the first “good” frequency detected. I’m still not quite sure how best to handle these segments, but for most of the trace, there’s no problem on this score.
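The end-of-trace fix can be expressed as a simple clamping rule. This Python sketch is my own formulation of that logic, not the code I actually used; the `good` flag marking which detections are reliable is a hypothetical input.

```python
# Sketch of the end-of-trace fix described above (assumed logic, my own
# formulation): frequency estimates at the head and tail of the trace
# that come from weak or bunched-up vibrations are replaced with the
# first/last "good" value, rather than trusted as detected.

def clamp_trace_ends(freqs, good):
    """freqs: detected peak frequencies along the trace;
    good: parallel list of booleans marking reliable detections.
    Returns a copy with unreliable leading/trailing values pinned to
    the nearest reliable one; interior values are left untouched."""
    if not any(good):
        return list(freqs)
    first = next(i for i, g in enumerate(good) if g)
    last = len(good) - 1 - next(i for i, g in enumerate(reversed(good)) if g)
    out = list(freqs)
    out[:first] = [freqs[first]] * first
    out[last + 1:] = [freqs[last]] * (len(freqs) - last - 1)
    return out
```

For example, `clamp_trace_ends([5, 250, 260, 9], [False, True, True, False])` pins the spurious edge readings of 5 and 9 to the neighboring reliable values, 250 and 260.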