By far my most popular post here to date has been “How to ‘play back’ a picture of a sound wave,” an article about converting images of sound waves on paper into playable sound files. But I arguably left out an important final step which I’d now like to discuss.
In my previous article, I took for granted that the shape of a record groove or phonautogram trace means the exact same thing as the shape of a digital waveform display, and that if the two shapes match after conversion, then we’ve done our job. But it’s since come to my attention that this isn’t quite true. The waveform display and the record groove are both graphs of amplitude over time, but they’re not technically graphs of the same kind of amplitude. To create a playable digital file which corresponds as closely as possible to the audio represented by a given waveform, then, copying the shape of the wavy line isn’t necessarily enough; we may also need to convert the one kind of amplitude into the other. I’ll describe and demonstrate a simple method for accomplishing this below. Afterwards, I’ll also explore the question of whether this is something we should always feel we need to do, or whether we might ever want to skip this step and stick with the more straightforward kind of results I’ve generated before.
But first, what are these two kinds of waveform that represent two different kinds of amplitude, and how do they relate to and differ from one another?
On one hand, a record groove represents the displacement amplitude of a stylus. What this corresponds to acoustically became somewhat complicated with the rise of electrical recording technologies in the 1920s (more on that below), but with acoustic gramophone discs of the sort commonly made up until that point, it corresponds in principle to the displacement amplitude of a sound wave: that is, how far removed a vibrating particle is from its rest position at each moment in time. During the acoustic recording process, a membrane gets displaced along with the surrounding air particles (acting something like a piston) and conveys those displacements to the recording stylus, and in playback it’s caused to move back and forth by the displacements of the playback stylus and conveys those displacements to surrounding air particles (again, acting something like a piston). There are limits to the fidelity with which this all happens, but it’s only the underlying principle I’m interested in pinning down at the moment.
By contrast, a digital waveform display represents the voltage amplitude of an electrical signal, and loudspeakers are conventionally designed to transduce voltage amplitudes as sound pressure amplitudes. That’s why we test loudspeakers to check whether the sound they generate from the same voltage consistently yields the same reading on a sound pressure level meter. Sound pressure is proportional not to particle displacement itself, but rather to the rate of change of particle displacement: the particle’s velocity, that is, or distance moved divided by time. In an electromagnetic cartridge such as you’d find in a better modern turntable, the movement of a stylus in a record groove causes a magnet and coils to move relative to each other, inducing a current with a voltage proportional to the transverse velocity of the stylus: how fast it’s moving back and forth perpendicular to the direction of rotation. If the lateral excursions of the record groove represent particle displacement, then the stylus velocity and output voltage sent to a loudspeaker should both correspond in turn to sound pressure, so this all works out.
The amplitude axes of these two kinds of waveform represent different variables, but at the same time those variables are closely related and interconvertible. If v equals sound pressure amplitude, or the voltage of an audio signal, or the velocity of a stylus; and d equals particle displacement amplitude or stylus displacement amplitude; and f equals frequency; then v=df and d=v/f. Note that sound itself is neither one nor the other of these variables: it’s a pressure wave with sound pressure amplitudes and particle displacement amplitudes, and both of those parameters can be measured and represented graphically with equal legitimacy.
There’s some standard terminology available for distinguishing the two kinds of waveforms I’ve been discussing. It’s usually framed in terms of electrical audio signals, but at the risk of unduly conflating electrical and acoustic parameters, I’ll try to explain it here in terms that also make sense in the context of acoustic recording and playback. The acoustic gramophone record groove has what’s called a constant-velocity characteristic. What this means is that for any given sound pressure level, the stylus will have the same velocity, just as a vibrating particle would—it will move the same total distance during a given span of time. But because d=v/f, as frequency increases, displacement decreases. If the sound pressure level holds steady, the displacement of a vibrating particle is reduced by half (-6 dB) each time the frequency doubles, and the same holds true of the amplitude of the excursions of an acoustic record groove: the particle or the stylus will move only half as far away from its rest position as before, but it will do this twice as many times within a given time span, covering the same total distance. Think of it this way: the faster something vibrates, the less far it has to go each time to impart the same sound pressure to the surrounding air. By contrast, the digital waveform display has what’s known as a constant-amplitude characteristic: for any given sound pressure level (understood in this case as equivalent to voltage), the waveform will show the same amplitude, regardless of frequency. If you want to know the sound pressure level at any given point in time, just look at how far the waveform is from the zero crossing and you’ll know.
I can also summarize the distinction a little more simply. In a constant-velocity audio waveform, the amplitude axis represents the amplitude of particle displacement at each moment in time. In a constant-amplitude audio waveform, it instead represents the amplitude of particle velocity at each moment in time, which corresponds to sound pressure. Got it?
Below is a graph showing how the two characteristics would differ for the same sound pressure level over a rise in frequency of one octave, with the top corresponding to an ideal acoustic gramophone record groove and the bottom corresponding to a digital waveform display. Note that there ought to be a consistent ninety degree phase shift between the two, which I’m not sure comes through clearly in my graphic: where one waveform crosses the zero point, the other should be at a peak or trough, and vice versa. That’s because rate of change is greatest where displacement is least, and displacement is greatest where rate of change is least. This issue shouldn’t affect relative phase, and it shouldn’t entail any audible difference.
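For a single idealized sine tone, the relationship between the two kinds of amplitude can be written out as a short worked derivation. This is only a sketch: the symbols D (peak displacement) and t (time) are my own additions for illustration, and real recordings are of course sums of many such tones.

```latex
\begin{aligned}
d(t) &= D \sin(2\pi f t) \\
v(t) &= \frac{\mathrm{d}}{\mathrm{d}t}\, d(t)
      = 2\pi f D \cos(2\pi f t)
      = 2\pi f D \sin\!\left(2\pi f t + \tfrac{\pi}{2}\right)
\end{aligned}
```

The peak velocity comes out as 2πfD, which is the v=df relationship given above up to the constant factor 2π, and the cosine in place of the sine is the ninety-degree shift: velocity peaks exactly where displacement crosses zero.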
In my book/CD publication Pictures of Sound: One Thousand Years of Educed Audio, 980-1980, and in all my other past work converting waveforms on paper into playable sound files, I didn’t take into account the factors I’ve just outlined because, to be honest, I wasn’t yet aware of them. I simply created WAV files that had the same shapes as the waveforms on paper did, whether those were Scott phonautograms or ink prints of gramophone discs. The voltage amplitudes in my WAV files corresponded to the displacement amplitudes in my source waveforms, and for gramophone discs, phonautograms, and so forth, those should correspond in turn to particle displacement amplitudes.
But loudspeakers are conventionally designed to transduce voltage amplitudes as sound pressure amplitudes, not particle displacement amplitudes. Thus, if we want to convert the shape of a phonautogram trace or an acoustic record groove into a WAV file according to the logic which loudspeakers actually use to generate sound, we need to convert the displacement amplitudes into velocity amplitudes. I first learned of the importance of this step from a presentation by Carl Haber in which he described carrying it out in connection with the IRENE system for recovering audio from grooved carriers by means of 2D or 3D optical imaging (for a published account of the reasoning behind the step as applied to acoustic recordings, see Appendix A2 here). Below is a pair of before-and-after examples from their informative and interesting project website; you’ll hear the displacement-based WAV first and the velocity-based WAV second. The selection is “Goodnight Irene” as performed by the Weavers, the recording that originally inspired their choice of IRENE as the project acronym.
In essence, this step entails creating a WAV file in which the value of each sample equals the difference in displacement amplitude between two successive samples in the source data, corresponding to the stylus velocity between two successive measurements (as long as we understand the time between samples to be constant). On the face of it, this is a pretty simple calculation to carry out. But how can we go about accomplishing it practically using readily available audio-editing tools? If we were simply concerned about equalization, we could just apply a +6 dB per octave slope to a displacement-based WAV and get something that sounds like a velocity-based WAV. But FFT filters aren’t as precise as a straightforward calculation of differences would be, and as we’ve seen, there should technically also be a ninety-degree phase shift. With that in mind, here’s a more defensible technique I’ve come up with for converting a constant-amplitude WAV into a constant-velocity WAV using ordinary sound-editing software—one that effectively does end up calculating actual differences in successive measurements of displacement amplitude.
- Copy the mono source onto the clipboard.
- Create a stereo target file.
- Paste the source into the left channel of the target file.
- Place the cursor into the right channel of the target file, one sample in, and paste the source there too. The right channel will now contain the same data as the left channel, but offset forward along the time axis by a single sample.
- Subtract the left channel from the right channel. (Or invert the left channel, and then add the left and right channels.) Now each sample in the target file will represent the difference between two successive samples in the source file. Velocity is distance traveled divided by time, and the time between each pair of samples should be the same (disregarding any variations in recording speed, which lie outside the scope of this process), so the difference between two successive samples amounts to velocity.
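For anyone who would rather script it, the offset-and-subtract steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `displacement_to_velocity` is hypothetical, the samples are assumed to be a plain list of numbers taken at a constant sampling rate, and each output value is taken as the later sample minus the earlier one (subtracting in the opposite order would only invert the polarity of the result).

```python
def displacement_to_velocity(samples):
    """Return the differences between successive displacement samples.

    Assumes a constant time step between samples, so each difference is
    proportional to the average velocity over that step.
    """
    # Pair each sample with its successor, just as the two channels pair up
    # once one of them has been shifted by a single sample.
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


if __name__ == "__main__":
    displacement = [0, 1, 3, 6, 6, 4]
    print(displacement_to_velocity(displacement))  # [1, 2, 3, 0, -2]
```

The output is one sample shorter than the input, since the first sample has no predecessor to be compared with.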
As with most of my techniques, this is a roundabout way of doing something that anyone with programming chops could surely handle a lot more simply. But it works, and I’m only chagrined that it took me so long to think of trying it.
So let’s try it out. Below is a waveform of the phrase “with my little eye I saw him die,” traced onto paper by Edward Wheeler Scripture from a commercial gramophone disc and published in his Researches in Experimental Phonetics (1906). In the accompanying sound file, you’ll first hear the version of the recording included in Pictures of Sound, at the beginning of track three: a WAV file with samples representing the displacement amplitudes graphed visibly on the paper. Then you’ll hear a second version of the recording with the displacement data converted into velocities using the technique outlined above.
As we’d expect, the audible difference here is comparable to the one we heard between the two versions of “Goodnight Irene.”
So why do the two versions sound so different? And why do they have the specific differences they do?
Comparing them visually offers some useful insight. Here’s a graph comparing displacement and velocity spectra for a sound file based on an early ink print of a Berliner gramophone disc. The darker purple shows the velocity spectrum, while the area shown in periwinkle (or light purple) shows the difference between that and the displacement spectrum.
The periwinkle represents the data in its original form (displacements), while the darker purple represents data derived secondarily from it (velocities). The vertical (amplitude) scale is the same in both cases, and the levels weren’t adjusted after conversion. We can readily see that overall amplitude values are lower in the derived velocity data than in the original displacement data, as is the overall range of possible values, and hence the overall amplitude resolution, corresponding to dynamic range and PCM bit depth. As this example illustrates, converting from displacements to velocities will entail some loss of dynamic range and a commensurate worsening of the noise floor. More importantly, the amplitude of the very highest frequencies remains more or less unchanged while the amplitude of lower frequencies drops at a rate of -6 dB per octave. This means that in relative terms the upper frequencies end up “boosted,” resulting in a much flatter noise characteristic than before, although in absolute terms it’s actually the lower frequencies that have been attenuated. Overall, the raw results will thus tend to sound brighter (more treble), thinner (less bass), noisier (increased hiss plus worse noise floor), and quieter (pending any normalization we choose to carry out).
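The shape of that spectral change can also be estimated numerically. For an ideal sine tone, subtracting each sample from its neighbor scales the amplitude by 2·sin(πf/fs), where fs is the sampling rate; the sketch below simply tabulates that factor in decibels. The function names and the default 44.1 kHz rate are assumptions chosen for illustration, and the formula describes pure tones rather than any particular recording.

```python
import math


def first_difference_gain(freq_hz, sample_rate_hz=44100):
    """Amplitude gain applied to a sine tone by subtracting adjacent samples."""
    return 2 * math.sin(math.pi * freq_hz / sample_rate_hz)


def gain_db(freq_hz, sample_rate_hz=44100):
    """The same gain expressed in decibels."""
    return 20 * math.log10(first_difference_gain(freq_hz, sample_rate_hz))


if __name__ == "__main__":
    # Step up in octaves from 100 Hz, then finish at the Nyquist frequency.
    for freq in (100, 200, 400, 800, 1600, 3200, 6400, 12800, 22050):
        print(f"{freq:>6} Hz  {gain_db(freq):7.2f} dB")
```

On this model the gain is exactly 1 (no change) at one sixth of the sampling rate, or 7350 Hz at 44.1 kHz; everything below that point is attenuated by roughly 6 dB more for each octave we descend, which is why the raw velocity data comes out quieter overall.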
There’s a further trade-off between amplitude resolution and sampling rate which warrants our attention. The higher the sampling rate, the smaller the differences in displacement amplitude are between successive samples, and the smaller the corresponding velocity amplitude values become. The native sampling rate of the displacement-based WAV files I’ve been creating from Scott phonautograms, where one sample corresponds to one pixel column in the source image, is 44.1 kHz (before speed correction, but it’s in that ballpark after speed correction as well). If we convert one of them into a velocity-based WAV file at that sampling rate, the resulting amplitude values are comparatively low. But if we downsample the source WAV before conversion, the resulting amplitude values increase (at the expense of time resolution). I’ve illustrated the point below using the speed-corrected voice track from Scott 49, a phonautogram of the “Chanson de l’Abeille” dated September 15, 1860. In both the image and the audio file, the amplitude levels generated through calculation have been preserved, so the scale remains consistent throughout.
At higher sampling rates than 44.1 kHz, the amplitude values of the velocity data would decrease even further. Bottom line: whatever we start out with for the amplitude resolution of our displacement data, the amplitude resolution of the resulting velocity data will be significantly lower, and the higher our sampling rate, the greater this discrepancy becomes.
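The dependence on sampling rate is easy to check with a synthetic test tone. The sketch below generates a sine wave at several sampling rates, takes differences between successive samples, and reports the largest one. The 440 Hz tone, the list of rates, and the name `peak_velocity` are arbitrary choices made for illustration; none of this is drawn from the Scott 49 data.

```python
import math


def peak_velocity(freq_hz, sample_rate_hz, amplitude=1.0, duration_s=0.05):
    """Largest absolute difference between successive samples of a sine tone."""
    count = int(sample_rate_hz * duration_s)
    samples = [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]
    return max(abs(later - earlier) for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    for rate in (11025, 22050, 44100, 96000):
        print(rate, round(peak_velocity(440, rate), 4))
```

Halving the sampling rate roughly doubles the peak difference value for the same tone, and raising it to 96 kHz shrinks the value further, in line with the trade-off described above.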
Next, to try to understand what’s going on here a little further, let’s consider what would happen if we were to offset our two channels by more than one sample before subtracting the one from the other. If we were to offset the channels by two samples, we would end up with a target file in which each successive sample represents the difference between the pair of samples to either side of each successive sample in the source file, like this:
Each sample in the target file would then represent a distance traveled at the same velocity during twice the length of time as before (1/22050 of a second instead of 1/44100), potentially doubling the peak amplitude level. And increasing the offset between the channels could amplify this effect even further. Sound promising? Well, here’s Scott 49 again, processed this time at a consistent 44.1 kHz, but with increasing sample offsets (be cautioned that the audio is a bit monotonous to listen to):
Of course there’s no free lunch here, as you may have guessed. We’re again boosting amplitudes only at the expense of time resolution, and we’re simultaneously introducing some troubling distortions due to phase relationships. At ten samples offset, for example, we’re subtracting samples from samples positioned exactly 1/4410 of a second before (10 divided by 44100, presuming a 44.1 kHz sampling rate), which means that a 2205 Hz signal will be doubled (subtracting a trough from a peak) and a 4410 Hz signal will cancel itself out (subtracting a peak from a peak), with the pattern repeating on up the frequency axis from there. At lower sample offsets, this distortion at least gets displaced up into higher frequencies, where it ends up mainly affecting the noise signature. Thus, with a two-sample offset we should expect a doubling at 11025 Hz and a canceling-out at 22050 Hz (the Nyquist frequency).
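The doubling and canceling frequencies quoted above follow from a simple formula: for an ideal sine tone, subtracting a copy delayed by N samples scales the amplitude by 2·|sin(πfN/fs)|. The sketch below evaluates that expression for the cases just mentioned. The function name and the default 44.1 kHz rate are assumptions for illustration, and the formula applies to pure tones only.

```python
import math


def offset_difference_gain(freq_hz, offset_samples, sample_rate_hz=44100):
    """Gain applied to a sine tone when the sample `offset_samples`
    earlier is subtracted from each sample."""
    return 2 * abs(math.sin(math.pi * freq_hz * offset_samples / sample_rate_hz))


if __name__ == "__main__":
    cases = ((10, (2205, 4410)), (2, (11025, 22050)), (1, (22050,)))
    for offset, freqs in cases:
        for freq in freqs:
            gain = offset_difference_gain(freq, offset)
            print(f"offset {offset:>2}, {freq:>5} Hz: gain {gain:.3f}")
```

A gain of 2 corresponds to the doubling (a trough subtracted from a peak) and a gain of 0 to the cancellation (a peak subtracted from a peak); in between, the response rises and falls smoothly, which is what gives larger offsets their comb-like coloration.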
Indeed, this phase problem—now that we realize it exists—actually appears to be inherent in the conversion process itself. Even with the minimum one-sample offset, we should still expect a doubling of amplitude at the Nyquist frequency. Since the magnitude of the increase can be predictably calculated, we could always try to counteract it through reequalization. Or if we wanted to address the effect more directly, we could try upsampling our source to (say) 96 kHz before differentiating the displacements into velocities. The bottom frequency doubled by phase interactions would then be raised to 48 kHz, well above the highest frequency represented in the source data. Unfortunately, this approach would also give us even lower velocity amplitude values, and hence a worse dynamic range and noise floor, than before. The information we have available to us for calculating velocities is finite, and information theory is an unforgiving beast.
We can identify three complementary factors in the process of conversion into velocities that will predictably result in exaggerated high-frequency noise:
- The +6 dB per octave boost, which this approach shares with the raw output of electromagnetic cartridges and traditional “flat” transfers.
- Another gradual increase in amplitude as we ascend the upper part of the frequency scale, culminating in a doubling at the Nyquist frequency, due to phase issues.
- The reduction in overall range of absolute amplitude levels, with commensurate worsening of the noise floor. Note that the upper frequencies with the least displacement amplitude resolution in the source tend to end up with the most velocity amplitude in the target. A signal at 48 kHz in a target file would be based on just 1/512 of the original displacement amplitude of a signal with the same velocity amplitude at 93.75 Hz (a span of nine octaves). Depending on the resolution of a source waveform image, that could correspond to only a pixel or two.
Sometimes the results of the conversion into velocities sound positively “brighter.” But if we’re dealing with a file that’s already conspicuously noisy before conversion into velocities, the process will yield a result that’s even noisier. Here’s Emile Berliner singing “Reiters Morgenlied,” excerpted from a paper print of a gramophone recording made on November 11, 1889, included as track nine on Pictures of Sound, starting with the displacement-based version, and concluding with the velocity-based version.
Do you hear an improvement or not?
In my opening, I said I had arguably left out an important step in my previous article and my earlier work. But I can think of a few counterarguments to be made as well, and I’ll share them here (with the caveat that I’m unsure how compelling I find each of them, and how far I’ll just be playing devil’s advocate).
First, there are some situations where the new step just isn’t relevant. Sometimes waveforms on paper already represent electrical signals, as was the case with the 1870s record of a discharge of an electrical fish I recently blogged about, and there’s no sense in trying to apply this step to those.
But let’s focus on acoustic recordings in which a waveform represents the displacements of a stylus and membrane. In those cases, what we get when we convert displacements into velocities corresponds to what’s traditionally known as a flat transfer. This is the preferred archival standard for digital preservation copies of acoustic sound recordings played with electromagnetic cartridges. If you digitize a cylinder recording, for instance, you want to preserve as faithfully as possible the raw signal just as it comes from the cartridge. That’s because this constitutes the least tinkered-with digital representation of the source. Information isn’t gained through any subsequent tweaking; it can only be lost. Under these circumstances, a traditional flat transfer is the most reliable, least compromised digital version of the data.
The same can’t be said, though, of optically imaged sound recordings in which we’re starting out with displacement data, and not velocity data output by a cartridge. In these other cases, it’s the displacement-based constant-amplitude WAV that represents the least tinkered-with digital representation of the source, unless we step back into the image domain. The velocity-based WAV corresponding to a flat transfer is generated secondarily from it; it can never add information, but it may well lose some through (say) normalization. In terms of process, the displacement-based WAV seems more philosophically justifiable here as our audio-domain preservation copy.
Someone might object that differentiation into velocities is needed to make our data properly usable, arguing that because the amplitude axis in a WAV file conventionally represents velocity amplitudes, a displacement-based WAV isn’t yet even an acceptable sound file, but only an intermediate form it’s necessary for the data to pass through.
And yet something comparable could be said of traditional flat transfers. People rarely listen to them directly, since they tend to couple objectionable amounts of noise with weak bass (although I’m personally used to this!). Instead, sound archives generally produce subjectively reequalized “listening copies,” with treble attenuation and often a bass boost, while commercial record labels carry out more elaborate restorations. To a large extent, this work would entail undoing the effects of conversion from displacements to velocities on relative frequency levels: we’d be elevating high-frequency noise, for example, only to need to reduce it later for optimal listening.
There are some good reasons why flat transfers as such might not do justice to grooved sound recordings from different historical periods, and why reequalization might be needed to make them “listenable.”
In a constant-velocity waveform, as we’ve seen, very high frequencies at middling sound pressure should correspond to tiny undulations, while very low frequencies should correspond to enormous ones. That creates serious dilemmas when it comes to inscribing data, whether in analog or digital form. However, the practical implications of this factor were mitigated at first because the acoustic recording process as actually implemented was imperfect, incapable of delivering the results we might expect in theory from an ideal system based on its principles. On the recording side, the most conspicuous factor was a limited frequency range: the National Jukebox cites this as “approximately 100-2500 Hz,” while other sources give other figures, such as a lower limit of 200 Hz or an upper limit of 3 or 4 kHz. Below is a comparison of the spectra for an acoustic cylinder recording made by my friend Martin Fisher (in red) and for a “modern” digital recording made simultaneously of the same musical performance (in purple). I don’t know enough about how he made or processed the cylinder transfer to comment on relative amplitude levels, but I’ve highlighted the segment of the frequency range in which it seems to display roughly the same recognizable peak patterns as the digital recording.
In a strict constant-velocity system, the waveform excursions at 200 Hz should theoretically be sixteen times larger for a given sound pressure level than they are at 3200 Hz—just a little larger than the frequency range highlighted above. That’s a big discrepancy, but still a finite one. And even within that range, the acoustic recording process didn’t achieve a perfect, transparent constant-velocity transduction, thanks to the effects of horn geometry, membrane resonance and stiffness, diameters relative to wavelengths, and so on. Playback on period equipment would again have brought those same variables into play while also introducing other ones, such as tracing errors (where a stylus runs into problems following the excursions of a groove) and horn resonances (which would also have impacted recording, of course, but would then have authentically colored the sound prior to its interception at the membrane). Acoustic recording and playback systems may have been constant-velocity in principle, but in practice there were a lot of other things going on with them.
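The sixteen-to-one figure follows directly from d=v/f. As a short worked check, with the subscripts denoting frequency in Hz (my own notation, added for illustration):

```latex
\frac{d_{200}}{d_{3200}}
  = \frac{v/200}{v/3200}
  = \frac{3200}{200}
  = 16 = 2^{4},
\qquad
20 \log_{10} 16 \approx 24.1\ \text{dB}
```

That is four octaves at roughly 6 dB each.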
So theory and principle suggest that we interpret acoustic record grooves as constant-velocity rather than constant-amplitude. But which of these two characteristics is a better overall match for acoustic processes in practice, warts and all? I’m not going to try to present a definitive answer to that question here, but let’s see where a couple quick-and-dirty experiments point us.
Below is an animated graph I created to compare the spectrum of the playback of an Edison Diamond Disc (“Entr’acte and Barcarolle from ‘The Tales of Hoffmann,’” Edison Concert Orchestra, 52403-L) on period equipment, as recorded by microphone (in red), with the spectra of the electric playback of the same disc, alternating between a velocity response and a displacement response (both in blue). This represents one disc played on one period machine and with one readily available stylus in one modern electromagnetic cartridge. I’m not aiming at any broad and rigorously defensible generalizations here, but just want to get a ballpark sense of the kinds of frequency response we’re dealing with.
The velocity response looks like a closer fit for the relative frequency levels within the 300-3000 Hz range, but the displacement response looks like a closer fit for the ratio between the peak frequencies and the level of high-frequency noise. For frequencies below 300 Hz, the closest fit to the “shape” seems at first glance to be the velocity response attenuated by a certain consistent amount. Based on this very rudimentary investigation, it looks like acoustic playback departed here from a strict constant velocity characteristic in two main areas. First, there was a steep roll-off above around 3000 Hz. Second, frequencies below 300 Hz were also attenuated, although seemingly all to a fairly constant degree. Here’s an excerpt from each of the three versions if you’d like to listen and compare by ear: first velocity, then acoustic, then displacement.
Now, I think this type of comparison offers some insights into the historical logic and aesthetics of acoustic recording, but I certainly don’t mean to suggest that playback on period equipment should be the gold standard for assessing frequency response. On one hand, it’s generally accepted that the acoustic recording process captured more information than period equipment could “reproduce,” and we want to harness all the information that’s there. On the other hand, there’s also a lot of variation within stages of the acoustic playback process itself. Consider the following spectral comparison of the acoustic playback on an Edison Standard of a phonograph cylinder (“Simple Confession” by the Indestructible Orchestra, Indestructible 1349) without a horn fitted to the reproducer (in red) and with a horn in place (in purple). Which of these, if either, should we identify as the curve achieved in this instance by acoustic playback? I’m not sure.
But if we wanted for the purposes of argument to reequalize a flat transfer to approximate acoustic playback for our Diamond Disc, including the effects of the horn, it seems we’d be looking at a curve something like this:
Now, we might hypothesize further that the frequencies where amplitude drops here are roughly those where an acoustic signal chain resists vibration more generally. If that’s the case, a similar curve might also have been imposed during recording, in which case we could try to reverse it in playback to boost frequencies attenuated by the acoustic recording process. Now, this isn’t a very good idea for the upper frequencies, where the rolloff lies near the top of the frequency range typically cited for acoustic recording anyway. Indeed, by reversing that upper part of the curve we’d mostly end up boosting noise in which any good signal would be buried; probably better instead to reduce the noise to something resembling its levels in acoustic playback so that it isn’t as much of a distraction. But for the lower frequencies a reversal of the curve does make good sense; even if we’d end up boosting rumble along with any good bass signal, the noise in that case is far less intrusive. The compensatory curve might then look vaguely like this (with all specific magnitudes and transition points being negotiable and contingent):
This is admittedly all a very crude analysis. But again, I’m only trying to get a ballpark idea of what the meaningful frequency response of an acoustic recording might look like relative to a flat transfer. And whereas I’ve done this based on a smattering of empirical observation, Carl Haber and his coauthors have pursued the matter far more rigorously from a theoretical standpoint here, in Appendix A2, concluding among other things (on page 35):
Above about 5000 Hz, the mass of the diaphragm limits the response of the system and the transmission of sound becomes strongly attenuated. Below some hundreds-of-Hz, due to horn geometry, sound propagation is again attenuated.
They note that exponential horns should attenuate lower frequencies with a sharper slope than conical horns, and they also discuss phase shifts that should occur with overdamped membranes outside the “main” frequency range. Their analysis confirms that there are points at which an acoustic recording system should be expected to break down as a transducer of plane waves, such that constant-velocity conditions no longer apply.
Overall, it’s looking as though the record groove in acoustic recording and playback systems had approximately a constant-velocity characteristic in the midrange, but with loosely predictable exceptions that make a bass boost and treble rolloff attractive and defensible (nudging us towards the overall contour of a constant-amplitude characteristic). And that’s exactly what tends to go into making “listening copies” today.
Meanwhile, we shouldn’t overlook the basic fact that with digitization we’re introducing natively acoustic recordings into an electric audio environment. And if we examine the history of the ground rules governing electric recording and playback, we find some reason to question the absoluteness of the equation in which voltage = stylus velocity = sound pressure.
With the rise of electric sound recording in the 1920s, former constraints on frequency range receded while engineers simultaneously gained new control over how audio signals were processed before reaching the recording stylus. Trying to cut a groove with a strict constant-velocity characteristic now became problematic because low frequencies would have required very large excursions that would either have taken up too much space or risked cutting into adjacent grooves. On the other hand, cutting a groove with a strict constant-amplitude characteristic—which might have been another option—would also have been problematic because high frequencies would then have entailed unmanageably high stylus velocities. Instead, a compromise was struck between a constant-velocity characteristic (for higher frequencies) and a constant-amplitude characteristic (for lower frequencies), like this:
However, there wasn’t initially any agreement about what to use as the turnover point, or the frequency at which the transition from constant amplitude to constant velocity took place: one system might use 250 Hz, another 500 Hz, and yet another 1000 Hz. Moreover, additional subjective manipulation of the frequency spectrum soon began to occur, notably pre-emphasis: a treble boost to enhance brightness, with a transition at whatever frequency suited an engineer’s fancy. After a period in which inconsistency was the rule, these tweaks coalesced into the standardized RIAA curve which has been in pretty general use for the cutting of discs since 1954: a specific hybrid of constant-amplitude and constant-velocity characteristics imposed during cutting and reversed during playback. The graph below shows the RIAA curve (in dark green) compared with a consistent constant-velocity characteristic where excursions represent particle velocity amplitudes (in blue) and a consistent constant-amplitude characteristic where excursions represent particle displacement amplitudes (in red). As can be seen, the RIAA curve is actually a little closer overall to the constant-amplitude characteristic than it is to the constant-velocity one. The flashing light-green line shows a slope exactly midway between the two characteristics (-3 dB per octave), which comes closer yet to the RIAA curve.
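For anyone who'd like to check these slopes numerically, here's a minimal Python sketch. It assumes the standard RIAA time constants of 3180, 318, and 75 microseconds (values this post doesn't otherwise spell out) and expresses everything as the playback gain applied to a velocity-proportional signal, normalized to 0 dB at 1 kHz. On that footing a constant-velocity characteristic is flat at 0 dB, a constant-amplitude characteristic falls 6 dB per octave, and the midway line falls 3 dB per octave. The function names are purely illustrative.

```python
import math

# Standard RIAA time constants in seconds (an assumption; not stated in the post).
T1, T2, T3 = 3180e-6, 318e-6, 75e-6


def _riaa_raw_db(freq_hz):
    """Un-normalized RIAA playback gain in dB at freq_hz."""
    w = 2 * math.pi * freq_hz
    num = 1 + (w * T2) ** 2
    den = (1 + (w * T1) ** 2) * (1 + (w * T3) ** 2)
    return 10 * math.log10(num / den)


def riaa_playback_db(freq_hz, ref_hz=1000.0):
    """RIAA playback gain in dB, normalized to 0 dB at ref_hz."""
    return _riaa_raw_db(freq_hz) - _riaa_raw_db(ref_hz)


def slope_db(freq_hz, db_per_octave, ref_hz=1000.0):
    """Gain in dB of a straight-line slope that passes through 0 dB at ref_hz."""
    return db_per_octave * math.log2(freq_hz / ref_hz)


if __name__ == "__main__":
    # Constant velocity is 0 dB everywhere on this scale, so it isn't tabulated.
    print("      Hz     RIAA   midway  const-amp")
    for f in (31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000, 16000):
        print(f"{f:8.2f} {riaa_playback_db(f):8.2f} "
              f"{slope_db(f, -3):8.2f} {slope_db(f, -6):10.2f}")
```

Run as a script, this prints the three curves side by side at octave intervals. Between 20 Hz and 20 kHz the RIAA playback gain spans roughly 39 dB over about ten octaves, which averages out to a little under 4 dB per octave: between the midway slope and the constant-amplitude slope, and nearer the former.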
If we’re asked whether the amplitude axis of an LP record groove represents particle displacement or sound pressure, then, the answer is that it’s really both mixed together in nearly equal proportions, and not just one or the other. We can either transduce velocities and then adjust the results roughly halfway towards displacements or transduce displacements and then adjust the results roughly halfway towards velocities. Six of one, half a dozen of the other. With a modern higher-end electromagnetic cartridge, we happen to transduce velocities and then apply the necessary curve to adjust the results in the right direction through reequalization.
But many older and cheaper cartridges don’t work that way. They’re designed in principle to output voltages proportional to the amplitude of stylus displacement rather than to stylus velocity. With a common piezoelectric cartridge, the back-and-forth motion of the stylus causes a crystal or ceramic rod to bend and to generate a voltage proportional to how far it’s bent. If you were to run the output of a piezoelectric cartridge playing a 78 rpm record into an analog-to-digital converter, you’d get a WAV file in which voltage amplitudes correspond to displacement amplitudes, much as they do in the WAV files I’ve been creating directly from phonautograms, ink prints of gramophone discs, and so forth.
It’s often said that when you’re using a piezoelectric cartridge, you don’t need to apply any kind of inverse RIAA curve to its output. That’s not because displacement transduction is inherently “right” for grooves cut with an RIAA curve, but rather because piezoelectric cartridges aren’t perfectly transparent displacement transducers. On this point, allow me to quote an article from 1945 by Theodore Lindenberg, Jr.:
Theoretically, an ideal crystal pickup would produce a constant voltage at all frequencies from a constant-amplitude recording, and all that would be necessary to achieve the desired results would be to introduce the proper electrical network to equalize the pickup output on the constant-velocity portion of the recording up to the output on the constant-amplitude portion of the disc. Practically, however, the mass of the stylus, stylus bearing, and drive fork, together with the considerable mass of the crystal, usually introduces a resonance peak in the neighborhood of 2,500 to 4,000 cycles per second. This peak is generally controlled to some degree by damping pads on both sides of the crystal and the result is that the rising characteristic up to this peak compensates for the reduced output of the constant-velocity portion of the recording.
The result would, of course, have been only a rough approximation of the RIAA curve, or any other curve from before standardization. Historically, though, this is how most electric playback of gramophone discs took place in the 1940s and 1950s, with voltages being generated proportionally to stylus displacements, and not stylus velocities. In other words, there’s a strong historical and cultural precedent for interpreting a record groove in this way.
If we go back yet further in time, the first electric records were marketed in the mid-1920s for use mostly on the same machines that had hitherto played acoustic records, so the two types of record needed to be compatible, even as the former were supposed to be an improvement on the latter. The constant-amplitude portions of equalization curves, the effects of pre-emphasis, and so forth, obviously departed from a strict constant-velocity approach, but they couldn’t have departed too radically from the actual frequency profile of acoustic records without creating unacceptable incompatibilities. And as we’ve seen, they didn’t: in particular, the horn geometry of acoustic recording had imposed a bass attenuation that loosely mirrored the effects of applying a constant-amplitude characteristic below the turnover point in electric recording. The transition from acoustic playback to electric playback would also have needed to be minimally disruptive in turn to ensure an acceptable degree of backwards compatibility. And it was, to the point that a typical piezoelectric cartridge should naturally generate a signal from an acoustic recording that somewhat resembles the “listening copies” people go out of their way to create from such recordings today.
Even today, electromagnetic cartridges have triumphed over piezoelectric ones not because it’s more appropriate to generate voltages proportionally to velocities than to displacements, but because of better compliance and lower tracking weights that cause less groove wear. For record grooves cut with the RIAA curve, or a similar curve, there’s no compelling philosophical reason to prefer a transduction of velocities into voltages (followed by reequalization) over a transduction of displacements into voltages (followed by reequalization). Either way, we start out using one logic of transduction and then adjust relative frequency levels about halfway towards the other. Indeed, we could experimentally impose a 3 dB per octave slope on either kind of transduction (positive for displacement-based, negative for velocity-based) to obtain relative frequency levels exactly halfway between the two, corresponding to the flashing light-green line in the graph shown above, although that would ignore the differential treatment of particular frequency ranges by historical equalization curves.
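The "exactly halfway" slope is easy to put into numbers. For a pure tone, velocity amplitude equals displacement amplitude multiplied by 2πf, which is the 6 dB per octave difference between the two characteristics; splitting that difference means scaling by the square root of the frequency ratio. The sketch below is only an illustration under those assumptions: the function names and the 1 kHz reference point are arbitrary choices, not anything dictated by a historical curve.

```python
import math

REF_HZ = 1000.0  # arbitrary reference frequency at which all versions agree


def to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


def velocity_from_displacement(freq_hz):
    """Relative gain that turns displacement amplitude into velocity amplitude."""
    return freq_hz / REF_HZ


def halfway_from_displacement(freq_hz):
    """+3 dB per octave tilt applied to a displacement-based transfer."""
    return math.sqrt(freq_hz / REF_HZ)


def halfway_from_velocity(freq_hz):
    """-3 dB per octave tilt applied to a velocity-based transfer."""
    return 1 / math.sqrt(freq_hz / REF_HZ)


if __name__ == "__main__":
    for f in (250, 500, 1000, 2000, 4000):
        d = 1.0                                # displacement-based level (flat)
        v = d * velocity_from_displacement(f)  # velocity-based level
        a = d * halfway_from_displacement(f)   # halfway, starting from displacement
        b = v * halfway_from_velocity(f)       # halfway, starting from velocity
        print(f"{f:5d} Hz  velocity {to_db(v):+6.2f} dB  "
              f"halfway {to_db(a):+6.2f} dB (from displacement), "
              f"{to_db(b):+6.2f} dB (from velocity)")
```

Both routes land on the same levels at every frequency, which is the point: it makes no difference which kind of transduction we start from so long as the tilt has the right sign.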
Here’s a sound file for comparison by ear: a fifteen-second excerpt from “The Green Hornet Theme” played by Al Hirt, presented four times in a row as transduced from an LP cut with the RIAA curve, with equalization reflecting constant velocity (variant 1), constant amplitude (variant 2), a fifty-fifty compromise (variant 3), and the official RIAA curve (variant 4). Note that the officially correct playback is much closer to the fifty-fifty compromise than to either of the two “pure” characteristics.
With acoustic recordings, it strikes me that the situation isn’t so very different if we think in terms of the characteristics of a culturally acceptable “listening copy” or restoration. If we transduce velocities into voltages to obtain a traditional flat transfer, the midrange ends up “right,” but with weak bass (which needs to be boosted) and exaggerated, intrusive high-frequency noise (which needs to be rolled off). If we instead transduce displacements into voltages, we get robust bass (though not necessarily “correct” bass) and much less high-frequency noise, but with a skewed midrange (at -6 dB per octave) and a “duller” sound. These are our two extremes, if you will, with the target lying somewhere in between them. Which extreme we start from is perhaps not so critical.
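To see where that 6 dB per octave comes from, here's a rough sketch that treats velocity as the sample-to-sample rate of change of displacement. It's a bare first-difference approximation using only Python's standard library, with an assumed sample rate and made-up test tones; it isn't offered as the conversion procedure itself, just as a way to watch the tilt between the two extremes appear.

```python
import math

SAMPLE_RATE = 44100  # Hz, assumed for illustration


def sine(freq_hz, n_samples, amplitude=1.0):
    """A displacement waveform: a pure tone with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def first_difference(displacement):
    """Approximate velocity as the change in displacement per second."""
    return [(b - a) * SAMPLE_RATE
            for a, b in zip(displacement, displacement[1:])]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def velocity_gain_db(freq_hz):
    """Level of the differentiated tone relative to the original, in dB."""
    tone = sine(freq_hz, SAMPLE_RATE // 10)  # a tenth of a second
    return 20 * math.log10(peak(first_difference(tone)) / peak(tone))


if __name__ == "__main__":
    previous = None
    for f in (250, 500, 1000, 2000):
        gain = velocity_gain_db(f)
        step = "" if previous is None else f"  ({gain - previous:+.2f} dB vs. octave below)"
        print(f"{f:5d} Hz: {gain:7.2f} dB{step}")
        previous = gain
```

Each doubling of frequency raises the differentiated tone by very nearly 6 dB relative to the original. The same tilt applies to whatever noise is present, which is why the velocity-based version carries so much more high-frequency hiss.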
Here’s another sound file for comparison by ear: a snippet of “The Stars and Stripes Forever” performed by the Columbia Band, presented four times in a row as transduced from an acoustically recorded gramophone disc with equalization reflecting constant velocity (variant 1), constant amplitude (variant 2), a fifty-fifty compromise (variant 3), and the official—but in this case anachronistic—RIAA curve (variant 4).
In terms of the last two listening examples, the processing step I outlined earlier in this blog post is intended to convert variant 2 into variant 1. There’s a lot to be said for this step as applied to acoustically recorded “pictures of sound,” most notably that it should restore relative frequency levels over a critical swath of the midrange. At the same time, it has some drawbacks: the heightened noise, the weakened bass, the loss of amplitude resolution (which often isn’t great to begin with), and so on. It makes for an important addition to the audio archeologist’s toolkit. But I think we’ll sometimes get more popularly listenable “raw” results if we give it a pass. And if we’re planning to do any further reequalization or restoration, we’re likely to end up undoing much of the effect of this step anyway. In short, it’s worth knowing how to do, but you may not always want to do it.