Among audio preservationists, Digital Audio Tape (DAT) is a notoriously glitchy format, routinely plagued by intrusive digital dropouts. But its glitches are patterned in ways that reflect the technical details of its structure—the interplay between the organization of the digital data and the physical design of the tapes and machines. I believe these patterns could be harnessed for audio restoration purposes more effectively than they have been, and I’d like to describe a few experiments I’ve tried with that goal in mind.
Let’s begin with a little context. In the spring of 2020, when the COVID-19 pandemic brought on-site digitization work to a halt at Indiana University’s Media Digitization and Preservation Initiative (MDPI), the audio engineering staff turned to a work-at-home project to keep busy. Many of the DAT recordings that had been reformatted earlier in the project existed in pairs of one tape originally earmarked for patron use and one “archive” tape. Sometimes, we were told, these tapes had been recorded on two separate machines fed by the same input during concerts or recitals, while sometimes one had instead been copied immediately from the other as a backup. Either way, degradation had since taken its toll, such that by the time the tapes were reformatted to digital files, one or both copies often suffered from severe digital glitches. As Mike Casey describes in an MDPI blog post entitled “From Violence to Violins: Debugging DAT,” the work-at-home project consisted of audio engineers comparing pairs of compromised transfers and splicing together composite files by hand, selecting whichever of the two copies seemed to be in better shape, segment by tedious segment. The project served its purpose: the engineering staff was kept productively busy during the peak of the pandemic, and a lot of formerly “unusable” content was successfully salvaged.
While all that was going on, I also spent some time investigating whether some or all of this same process could be automated. I wasn’t able to get my experimental compositing algorithm working perfectly by the time on-site work resumed, and no version of it was ever put to official practical use. But even so, I think an account of my efforts may be of some interest. The specific scenario for which I tried to design the algorithm is probably not a common one, but the general approach I took may be suitable for a wider range of applications. At the same time, I had to figure out some aspects of the inner workings of the DAT format that I haven’t seen discussed elsewhere from an audio preservation or restoration standpoint, such as the glitch patterns and the reasons behind them. And I’d hate for anyone to need to reinvent that wheel.
One thing that stands out about glitches in DAT transfers is that they almost always repeat at regular intervals, most prominently every 104 samples. It’s easy to see this simply by looking at the waveforms. It’s also easy to see that the left and right channels contain different glitch patterns rather than the same ones.
There’s also a weaker pattern of recurrence at half that distance, i.e., at an interval of 52 samples. Glitches separated by this shorter interval tend not to resemble each other as closely as glitches separated by the longer interval, but they seem to be of roughly the same length.
The glitches themselves take a few different internal forms. Sometimes alternate samples seem to belong to separate signals, resulting in a distinctive zig-zag pattern. In these cases, the difference between successive samples will be far greater than usual and will also change direction (positive or negative) each time. Regardless of whether there’s much difference between alternate samples specifically, glitches also tend to contain jumps, especially at the beginning and end, that are larger on average than the differences between good samples.
Other glitches consist of stretches of nearly identical sample values, usually varying by only 1/1000 of the amplitude range. These presumably result from an error-masking process in which the last good value before a glitch is held until another good value is detected. The difference between successive samples will accordingly be much less than usual during most of the glitch, except for the likelihood of a big jump right at the end.
For identifying glitches, the absolute value of the difference between samples seems to be a promising metric.
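As a minimal sketch of that metric in Python (the examples throughout this post are illustrative reconstructions, not code from my actual tool):

```python
def abs_differences(samples):
    """Absolute value of the difference between each pair of successive samples."""
    return [abs(b - a) for a, b in zip(samples, samples[1:])]

# A held-sample glitch yields a run of near-zero differences, typically
# ending in one big jump; a zig-zag or spike yields unusually large ones.
held = [0.2, 0.2, 0.2, 0.2, -0.5]
print(abs_differences(held))
```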
Above are two overlaid excerpts from the left channels of transfers of two DATs of the same performance. The blue waveform represents a file with glitches, while the green waveform represents a good file without errors. The blue waveform displays a zig-zag glitch pattern on the left and a held sample glitch pattern on the right.
Our challenge is to come up with an algorithm (or combination of algorithms) that will reject both the zig-zags and the held samples.
I’ve tried a number of different strategies.
STRATEGY #1: FIND THE SHORTEST PATH
This algorithm is good at avoiding spikes, jumps, and other outliers. As long as we run it twice, once forwards and once backwards, and then average the results, it also rejects many held samples. It’s the most versatile single algorithm I’ve worked out for DAT glitches.
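Although my implementation differed in its details, the core idea can be sketched as a small dynamic program (this simplified version runs in one direction only; as noted, the full treatment runs forwards and backwards and averages the two):

```python
def shortest_path_verdicts(src1, src2):
    """
    Choose, sample by sample, whichever source keeps the waveform's total
    absolute sample-to-sample movement (its "path length") smallest.
    Returns +1 where source two is favored and -1 where source one is.
    """
    n = len(src1)
    INF = float("inf")
    srcs = (src1, src2)
    # cost[i][s] = cheapest path up to sample i that ends on source s
    cost = [[0.0, 0.0]] + [[INF, INF] for _ in range(n - 1)]
    back = [[0, 0] for _ in range(n)]
    for i in range(1, n):
        for s in (0, 1):
            for p in (0, 1):
                c = cost[i - 1][p] + abs(srcs[s][i] - srcs[p][i - 1])
                if c < cost[i][s]:
                    cost[i][s] = c
                    back[i][s] = p
    # trace the cheapest assignment backwards
    choice = [0] * n
    choice[-1] = 0 if cost[-1][0] <= cost[-1][1] else 1
    for i in range(n - 1, 0, -1):
        choice[i - 1] = back[i][choice[i]]
    return [1 if c else -1 for c in choice]
```

A spike in one source makes any path through it expensive, so the cheap path routes around it through the other source.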
STRATEGY #2: IDENTIFY AND REJECT ZIG-ZAGS
This algorithm is good at minimizing zig-zags, but it doesn’t accomplish anything else.
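A sketch of the kind of test involved, assuming a hypothetical threshold value (a real one would need tuning against actual transfers):

```python
def zigzag_score(samples, i, threshold=0.01):
    """
    Score how 'zig-zaggy' the signal is around sample i: successive
    differences that are both large and sign-alternating.
    """
    d1 = samples[i] - samples[i - 1]
    d2 = samples[i + 1] - samples[i]
    if abs(d1) > threshold and abs(d2) > threshold and d1 * d2 < 0:
        return min(abs(d1), abs(d2))
    return 0.0

def zigzag_verdicts(src1, src2):
    """+1 favors source two, -1 favors source one, 0 means no preference."""
    out = [0] * len(src1)
    for i in range(1, len(src1) - 1):
        s1, s2 = zigzag_score(src1, i), zigzag_score(src2, i)
        if s1 > s2:
            out[i] = 1   # source one looks zig-zaggy here: favor source two
        elif s2 > s1:
            out[i] = -1
    return out
```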
STRATEGY #3: IDENTIFY AND REJECT HELD SAMPLES (PLUS SUBSEQUENT JUMPS)
This algorithm is good at avoiding segments of held samples and the subsequent jumps, but it doesn’t accomplish anything else.
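Something along these lines, where the tolerance matches the roughly 1/1000-of-full-scale variation described earlier and the minimum run length is an arbitrary cutoff of my own to avoid flagging brief coincidences:

```python
def held_runs(samples, tolerance=0.001, min_run=4):
    """
    Flag stretches where successive samples barely change, as expected when
    an error-masking circuit holds the last good value until good data
    returns. Returns one boolean flag per sample.
    """
    n = len(samples)
    flags = [False] * n
    start = 0
    for i in range(1, n + 1):
        if i == n or abs(samples[i] - samples[i - 1]) > tolerance:
            if i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags
```

Note that genuinely static signal (e.g., digital silence) would also be flagged, which is one reason this test works better in combination with others than alone.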
Each of these three algorithms outputs a verdict as to which source is more likely to be correct for each sample. A positive value favors source two, while a negative value favors source one. By adding the results of different algorithms, we can easily use them in combination with each other rather than just singly. It’s also easy to weight the algorithms differently just by multiplying their results by different factors. (It should even be possible to factor in the output of some third-party restoration tool in this way, such as a more sophisticated commercially available declicker.)
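The combination step itself is trivial; a minimal sketch:

```python
def combine_verdicts(verdict_lists, weights):
    """
    Sum per-sample verdicts from several algorithms, each scaled by a weight.
    Positive favors source two, negative favors source one, as above.
    """
    combined = [0.0] * len(verdict_lists[0])
    for verdicts, w in zip(verdict_lists, weights):
        for i, v in enumerate(verdicts):
            combined[i] += w * v
    return combined

def choose_sources(combined):
    """Turn combined scores into a per-sample source choice (1 or 2)."""
    return [2 if c > 0 else 1 for c in combined]
```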
But there’s probably a limit to what can be accomplished by comparing samples in a linear fashion as I’ve described. Whatever criteria we use to try to identify zig-zags, held samples, and so forth, chances are good that the “good” source will occasionally display them more strongly than the “bad” source. (For example, a legitimate tone at the Nyquist frequency will closely resemble a zig-zag.)
Fortunately, we have another piece of information we can bring to bear on the matter: namely, those 104-sample and 52-sample glitch intervals I mentioned earlier. If we could somehow pool our verdicts about samples across those intervals, we might be able to pinpoint glitches more reliably than we can by examining parts of the waveform independently from one another.
However, there’s a second cycle we’ll need to bear in mind if we want to attempt this – namely, the one that governs how many repetitions of the 104-sample glitch cycle we should expect. To figure out how this other cycle works, we’ll need to examine some technical details about the DAT format itself.
The DAT format physically interleaves audio data in such a way that groups of samples at 52- and 104-sample intervals will occupy the same two-block unit and be checked for errors together.
These samples are further organized into larger groups of 662 odd and 661 even samples (1323 samples total) separately by stereo channel. This larger unit is called a frame, consisting of two tracks written diagonally across the width of the tape, with opposing azimuths, read by two different heads.
Within any two-block unit, if one sample is wrong, it turns out that the other samples in the same unit are likely to be wrong too. Thus, samples will tend to be consistently correct or incorrect at 104-sample intervals (between successive four-symbol groupings) and at 52-sample intervals (between pairs of samples with their MSBs and LSBs interleaved) – but only within one 1,323-sample frame.
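One way to make that bookkeeping concrete is a little helper that maps a sample index to its frame and its position within the 104-sample cycle (a sketch that assumes we already know the offset of the first full frame, a problem I return to later):

```python
FRAME = 1323   # samples per frame, per channel
CYCLE = 104    # strong glitch-recurrence interval

def glitch_coordinates(index, offset=0):
    """
    Map an absolute sample index to (frame number, position within the
    104-sample cycle). Samples in the same frame that share a cycle
    position, or sit 52 samples apart, tend to be correct or incorrect
    together. 'offset' is the index where the first full frame begins.
    """
    rel = index - offset
    frame = rel // FRAME
    pos = (rel % FRAME) % CYCLE
    return frame, pos
```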
The fact that alternate samples are written to (and read from) different locations on the tape probably explains why we so often see a “zig-zag” pattern in glitches.
I suspect such glitches may result from a warped tape causing a head to read parts of more than one track during a single rotation, so that samples from two different tracks end up mistakenly shuffled together.
A really sophisticated approach might be able to restore wrongly read samples to their right locations. For now, I’m just trying to detect that they’re wrong.
Here it has been helpful to work out ways of viewing DAT transfers that are customized to draw out the cyclical glitch patterns we want to expose. The approach I’ve taken builds on my earlier development of a visualization technique for audio files I call the soundweft. This involves identifying the duration of a cycle (e.g., one bar in a popular song) and dividing the recording into a sequence of vertically stacked lines based on that length, much like the use of video scan lines to build up a coherent picture.
The soundweft on the right was created from Psy’s “Gangnam Style,” with an algorithmically detected cycle length of 1.818 seconds. Variations in amplitude are shown as differences in brightness, with positive velocity amplitudes, negative velocity amplitudes, and displacement amplitudes assigned to different color channels, and left and right channels alternating.
Soundwefts can be aesthetically appealing, but they’re also a powerful analytical tool for visualizing cyclical phenomena in sound recordings that would otherwise be difficult to see.
The same principle can be used to display glitch cycles in DAT transfers, which – if we map brightness to absolute value of difference between samples – appear as vertical lines or rectangles. Held samples are dark, while jumps between samples are bright. I’ve experimented with some different ways of organizing samples for visual display that have different advantages and disadvantages when it comes to exposing and characterizing issues with DAT transfers.
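As a rough illustration of how such a display can be generated (a pure-Python sketch; my real images also used multiple color channels and other refinements):

```python
def glitch_image_rows(samples, width=104):
    """
    Wrap the absolute sample-to-sample differences into rows of 'width'
    samples, soundweft-style. Glitches recurring every 'width' samples line
    up as vertical features: held samples dark, jumps and spikes bright.
    """
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return [diffs[i:i + width] for i in range(0, len(diffs) - width + 1, width)]

def write_pgm(rows, path, scale=255):
    """Dump the rows as a plain-text greyscale PGM image (stdlib only)."""
    h, w = len(rows), len(rows[0])
    with open(path, "w") as f:
        f.write(f"P2\n{w} {h}\n255\n")
        for row in rows:
            f.write(" ".join(str(min(255, int(v * scale))) for v in row) + "\n")
```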
One option splits up each successive group of 104 left and right samples (208 total) into eight 26-sample subgroups. The glitch profiles of these subgroups often resemble adjacent ones but often go their own distinct ways.
In this case, the differences in position of “bright” glitches (spikes and jumps) between the odd and even samples are particularly noticeable, implying that these arose during the reading of individual block pairs.
On the other hand, some “dark” glitches (held samples) are consistent across the odd and even samples within a channel (but not across channels), hinting that these represent error concealment at the frame level.
Here’s an alternative display option in which the left and right channels are shown as continuous 104-sample strips, but with differences still calculated separately for odd and even samples, as before. The glitch profiles specific to odd and even samples can no longer be singled out for separate analysis.
However, we now see some vertical banding in lighter areas that points to problems that weren’t apparent in the previous view. These represent “zig-zag” stretches in which odd and even samples seem to be drawn from different sources, such that there’s little difference between successive odd samples or successive even samples, but alternate samples differ significantly. If we calculate the difference between all sequential samples rather than doing this separately for the odd and even samples, the zig-zag areas come out more clearly and brightly, but at the expense of a somewhat blurrier overall appearance.
Another option is to display 52-sample strips rather than 104-sample strips. In this arrangement, pairings of samples with their MSBs and LSBs interleaved are represented by pairings of adjacent rows.
This approach obscures differences in glitch profile between the 52- and 104-sample intervals, as well as between odd and even samples. However, it gives us a clearer look at phenomena that are consistent across these subgroupings, such as those “dark” glitches.
On the left, the difference has been calculated separately for odd and even samples; on the right, it’s been calculated for all samples together in sequence.
Any of these display options could be helpful for quality control or assessment purposes.
At the same zoom level, a soundweft can provide a more informative look. In the case shown here, for example, it’s easy to see from the soundweft that the left channel contains a lot of held samples. This isn’t nearly as apparent in the waveform view.
The fact that the soundweft “background” appears uninterrupted around the “bright” glitch suggests further that the errors are narrowly localized and that surrounding samples are probably correct.
The “background” consists of the differences between samples of good signal. Like glitch cycles, these will often form coherent patterns, but they should hardly ever contain vertical lines, much less lines exactly twelve or thirteen pixels high. The human eye is pretty adept at the necessary kind of pattern discrimination, even in cases when vertical lines have relatively low contrast.
If we overlay aligned soundwefts of two different DAT transfer files, assigning one to the green color channel and one to the blue color channel, distinctions of color can be used to identify which of two files has a problem at any given point. In the example shown below, bright green or dark blue indicates a problem with the file assigned to the GREEN channel.
A problem with a file assigned to the BLUE channel reveals itself in bright blue or dark yellow-green.
Our diagnostic color key ends up looking something like this:
Note that the neutral color just means both files are behaving similarly and isn’t a guarantee of good signal – both files could be wrong simultaneously. In that case, vertical banding should still provide a clue that something is amiss. If one file has a jump where the other has a held sample, the color will end up reflecting whichever anomaly is more extreme, but vertical banding is still likely to make the other “masked” error noticeable.
All nice in theory, but in practice it can get pretty confusing!
Still, if people can learn to read QCTools reports, perhaps they could learn to read these too.
A blink comparator method—similar to the one used by astronomers for detecting planets that move against a background of stars that don’t—would be another way to compare two aligned files without having to distinguish among colors, if that proves to be too difficult.
Or soundwefts for two files could just be placed side by side.
I’ve even experimented with 3D viewing (using red-cyan anaglyphs) but didn’t find it helpful, at least for this purpose.
If we just want some help in knowing where two aligned DAT transfer files differ significantly from each other, without an indication of which file is correct, we can simply plot the absolute values of discrepancies between the two files.
In the two examples seen here, the red channel shows the magnitude of discrepancies between samples; the green channel shows the magnitude of discrepancies between sample-to-sample differences; and the blue channel shows the magnitude of discrepancies between the differences of the differences.
The example on the left shows a case where both files display constant major problems. The example on the right shows a case where one of the two files contains sporadic errors.
Different color channels can also be used to display different aspects of a single DAT transfer.
In the example seen here, the green channel shows differences calculated separately for odd and even samples and the blue channel shows differences calculated continuously for all samples. The former will contain low values during zig-zag stretches, while the latter will contain high values. As a result, areas with zig-zag errors “pop out” in vivid blue from the green background.
In short, there are many ways in which soundwefts could be used to help assess DAT transfers and diagnose issues with them. They might save time for an engineer who needs to identify where problems exist. This all depends on whether it’s faster to listen to a pair of files or to scroll down a long, narrow image file looking for telltale cases of vertical banding.
But can we get a computer to use these same cyclical patterns to choose automatically between sources?
One complication is that a 1323-sample frame isn’t evenly divisible into 104-sample cycles. Each frame will contain seventy-five thirteen-sample columns and twenty-nine twelve-sample columns. Thus, vertical banding structures will be sometimes thirteen pixels high and sometimes twelve pixels high. A frame will start at any given position within the 104-sample cycle only once every 104 frames, or once every 137,592 samples, which comes out to once every 3.12 seconds.
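The arithmetic behind those figures can be double-checked in a few lines:

```python
FRAME, CYCLE, RATE = 1323, 104, 44100

# 1323 = 12 * 104 + 75, so wrapping one frame at 104 samples produces
# 75 columns that are 13 samples tall and 29 columns that are 12 samples tall.
tall = FRAME - 12 * CYCLE
short = CYCLE - tall
assert (tall, short) == (75, 29)
assert 13 * tall + 12 * short == FRAME

# gcd(1323, 104) = 1, so frame starts realign with the 104-sample cycle
# only once every 104 frames:
period = FRAME * CYCLE
print(period, round(period / RATE, 2))  # 137592 samples = 3.12 seconds
```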
The bottom line is that in order to know which samples should “go together,” we need to identify the starting sample of the frame.
This would be difficult or even impossible to figure out precisely for most frames taken in isolation. However, if we can identify the starting sample of any frame in a given file, we should then know the starting samples of all the frames, since these should be exactly 1323 samples apart (unless their spacing is affected by dropouts).
I’ve tried to design an algorithm to identify frame starting points automatically, and I suspect this would be possible, but I haven’t yet succeeded in getting anything to work. Still, I did build a tool that lets a user work this out by hand, shown below.
Sample differences for the left and right channels are overlaid in blue and green channels, looped into 104-sample strips, with an arbitrary 1323-sample grouping highlighted in red (which can be toggled on or off). The user can nudge the display backwards or forwards in intervals of 1, 104, or 1323 samples until the red highlighting covers all of the vertical bands within it and none of the vertical bands outside it (this can be checked or fine-tuned across multiple frames by repeatedly advancing +1323). The goal is to decide on an offset, displayed above as 980, meaning that the first full frame begins at sample 980+1 = 981.
Once we have offsets for a pair of files in hand, we can average the results of our algorithms across 1,323-sample frames and apply the averaged results instead of (or in addition to) the raw results. Even if we don’t get the offset exactly right, getting close will still be advantageous, since we’re dealing here with probabilities anyway. Of course, it isn’t possible to infer an offset for files that don’t contain any glitches, nor would it be practically useful to do so. Here’s a comparison of two analyses, one with and one without the cyclical treatment:
The linear analysis was carried out by applying the “held sample” algorithm I described earlier to two DAT files strictly as linear sequences of samples. The cyclical analysis was carried out by adding to that result a second result obtained by averaging the first result in a 104-sample cycle across each 1,323-sample frame. In practice, the “held sample” algorithm seems to benefit most from cyclical processing, largely (I think) by reducing the likelihood of false positives.
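The cyclical treatment can be sketched like this (again, illustrative Python rather than my actual code):

```python
FRAME, CYCLE = 1323, 104

def cyclical_average(verdicts, offset=0):
    """
    Within each 1323-sample frame, average the verdicts that share a
    position in the 104-sample cycle, then report that average as an
    additional per-sample verdict. 'offset' is the index where the first
    full frame begins.
    """
    out = [0.0] * len(verdicts)
    for f_start in range(offset, len(verdicts), FRAME):
        frame = verdicts[f_start:f_start + FRAME]
        sums, counts = [0.0] * CYCLE, [0] * CYCLE
        for i, v in enumerate(frame):
            sums[i % CYCLE] += v
            counts[i % CYCLE] += 1
        for i in range(len(frame)):
            out[f_start + i] = sums[i % CYCLE] / counts[i % CYCLE]
    return out
```

An isolated verdict gets diluted by its cycle-mates, while verdicts that recur at the same cycle position reinforce one another, which is how the averaging suppresses false positives.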
The following thus appears to be a promising combination of algorithms:
- Linear “Shortest Path” × 1
- Linear “Zig-Zag” × 1
- Cyclical “Held Samples” × 1
There’s also a fourth algorithm that isn’t very useful on its own, but that I’ve found advantageous to apply as a final step to the output of the other algorithms.
STRATEGY #4: REJECT SINGLE-SAMPLE OUTLIERS
TECHNICAL DETAILS: For every sample but the first and last, my algorithm compares its value to the average of the samples to either side of it, and if it exceeds a threshold of 0.001 on a scale of -1 to +1, it checks the sample in both source files and chooses the one that comes closest to the average.
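In Python, that final pass might look something like this (comparing against already-corrected neighbours, which is my own guess at one reasonable reading of the procedure):

```python
def reject_single_sample_outliers(composite, src1, src2, threshold=0.001):
    """
    For every sample but the first and last: if the composite value strays
    from the average of its two neighbours by more than the threshold
    (on a -1..+1 scale), replace it with whichever source's value comes
    closest to that average.
    """
    out = list(composite)
    for i in range(1, len(out) - 1):
        avg = (out[i - 1] + out[i + 1]) / 2
        if abs(out[i] - avg) > threshold:
            out[i] = min(src1[i], src2[i], key=lambda v: abs(v - avg))
    return out
```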
So how does this combination of algorithms fare in practice?
Here’s an audio example that illustrates the combination of measures just described. This will be a highly unforgiving test, since it compares one file with no errors against another file with multiple errors of different kinds. The composite output should simply be identical to the first source file and should always reject the second source file whenever the two files disagree. Any audible problems in the output represent flaws in processing.
Good archive: segment of transfer from DAT with no noticeable errors
Distortion and glitches DAT: corresponding segment from DAT with numerous glitches
The results of automated compositing are not perfect: there’s some “rattly” distortion on loud, high notes and in other places. I suspect this may represent points where the algorithm is alternating between sources that were sampled at slightly different moments, producing noise.
Even so, the automated results are still more listenable than the file with glitches. And of course there would be no reason to use the application in a case like this one, where one of the two files appears to be error-free by itself. If both files had contained similar errors, the automated result would presumably be more listenable than either of the source files and would have taken no more work to obtain. Preparing this example required the following steps:
- Choosing a frame starting-point offset value for each file with the separate “DAT Glitch Aligner” application.
- Loading the two files into the main application.
- Zooming in and adjusting alignment. The application auto-aligns the two files with each other using cross-correlation, but this isn’t always accurate. In this case, the green source had to be nudged two samples to the right by clicking twice on a button.
I tried another example that may be closer to a real use scenario: a pair of files about 6:45 in duration that are both riddled with intrusive errors, although one is marginally better than the other.
First, I determined the frame starting-point offsets, which took maybe one minute per file, or two minutes in all.
Finally, Autosplice took one minute and fifty seconds to run, and Export just a couple seconds.
Here’s roughly the first minute of both source files. In many cases their errors overlap, such that neither source for a given sample is correct.
Better file, but still bad
Many of the errors have a distinct pitch to them, which comes from glitches recurring at 52-sample intervals ~848 times per second. For comparison, here’s an 848 Hz tone:
When I processed this pair of files using the same settings as before, the composite result seemed to be an improvement on either source by itself, but I noticed that the algorithms had sometimes chosen spikes and “tones” even when these weren’t present in both files. But it also looked as though there were no cases of held samples in either file, and since I thought the “held samples” algorithm was probably at fault, I shut it off and ran Autosplice again. (Note that I was no longer using the frame starting-point offsets at this point, so I suppose I could have skipped the step of working them out.) Here’s the result:
Automated composite #2
I’m not sure whether this audio would be considered “usable” or not, and my impression is that this pair of files was deemed to be beyond salvage in the first place. However, I think the application has done a good job of extracting whatever good signal is present, leaving audible glitches only—or at least mostly—where there are simultaneous errors in both files. I can’t imagine how many hours it would have taken to pull together a comparable composite file by hand.
The automated results could also be used as a starting point for manual editing if we’re not satisfied with them “as is.” I’ve built an interface into the same application that lets a user select a part of the waveform in the display and toggle back and forth between sources for it. Even if the algorithm doesn’t always choose correctly, then, a tool built around it (probably with further modifications) might still be more efficient or convenient than using ordinary sound-editing software as part of a manual workflow.
I haven’t provided for certain situations that have been reported, such as instances in which two files have opposite polarity or left and right channels swapped. It wouldn’t be too difficult to detect these automatically by checking each possibility and seeing which one provides the best match. But then that check would add some time to the processing of every file, whether it needed it or not, so I’m not sure whether or not it’s a good idea.
There are other issues too that I haven’t yet tried to address. In the example shown below, the DC offset differs from source to source. To compare samples effectively in such a case, we’d need to bring the two sources into alignment with each other along the vertical amplitude axis. Differences in overall amplitude exist too and are similarly troublesome.
My approach to time-axis alignment in the experiments I’ve been describing was pretty simple, and I’m surprised it worked as well as it did. I just used cross-correlation to find the best alignment between the first 100,000 samples of the left channels of both files and then shifted the two files relative to each other based on that.
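A brute-force version of that alignment step looks like this (at 100,000 samples, a real tool would want an FFT-based correlation rather than this O(n × lags) loop; here a positive lag means the second file is delayed relative to the first):

```python
def best_lag(ref, other, max_lag=50):
    """
    Find the shift of 'other', in samples, that best matches 'ref',
    by brute-force cross-correlation over a small range of candidate lags.
    """
    def score(lag):
        return sum(ref[i] * other[i + lag]
                   for i in range(len(ref))
                   if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=score)
```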
The problem is that we know, based on empirical reports from the MDPI “DAT Project” team, that the number of samples between corresponding samples in two files tends to vary over time. Thus, if we choose an alignment that’s correct for the start of a recording, the files will typically be off by some number of samples from each other somewhere further along. When long dropouts are present, these shifts in alignment can be particularly extreme.
So if we wanted to build an automated tool to create optimal composites from two longer DAT transfers, it couldn’t just align pairs of files once at the beginning and be done with it. It would need to be able to adjust the alignment between them continuously.
To do this, we’d need to cross-correlate the files using a sliding window. That is, we’d find the best correlation between these parts, and then these parts, and then these parts, maybe with overlap between them, maybe not. Beyond that, though, the specific practical measures we’d need to compensate for these timing discrepancies would depend on what’s causing them and what form they take as a result.
- Does some drift occur because of minuscule clocking differences between two DAT recorders, associated with the accuracy of their crystal oscillator frequencies (supposed to be 9.408 MHz)? That is, could one recorder have been capturing 44,100 samples per second while the other was capturing 44,100.01? In that scenario, one file would contain 4,410,000 samples in one minute forty seconds and the other would contain 4,410,001. There would be a 36-sample discrepancy after an hour. If this is what’s happening, then the rate of drift ought to be gradual, regular, and maybe even predictable if the same two machines were used consistently. Various strategies suggest themselves for keeping the files in sync – maybe even something as crude as just disregarding every 4,410,001th sample in one of the files. The effects of clocking differences could be expected to vary depending on whether two recorders were capturing the same analog source or one recorder was capturing the digital output of the other, but the same strategy could probably be applied to both cases.
- In other cases with dropouts present, changes in relative alignment seem to be more abrupt. This is probably because motor speed is controlled in playback by sync patterns on the tape that would be prone to dropping out concurrently with the audio data. In such cases, we can expect abrupt and irregular “jumps” in alignment between segments with sharp boundaries between them.
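Whatever the cause turns out to be, the sliding-window idea itself might be sketched like this (window and hop sizes are arbitrary illustrative choices, and the brute-force correlation inside would need replacing with something faster in practice):

```python
def windowed_lags(ref, other, window=44100, hop=44100, max_lag=50):
    """
    Estimate a best alignment lag per window so that the alignment can
    follow both gradual clock drift and abrupt jumps around dropouts.
    Returns a list of (window start, best lag) pairs; a positive lag means
    'other' is delayed relative to 'ref' in that window.
    """
    def score(start, lag):
        return sum(ref[i] * other[i + lag]
                   for i in range(start, min(start + window, len(ref)))
                   if 0 <= i + lag < len(other))

    lags = []
    for start in range(0, len(ref) - window + 1, hop):
        best = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: score(start, lag))
        lags.append((start, best))
    return lags
```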
That’s where I left things in June 2020. To move the project further in its original direction, my next step would have been to move on from excerpts to analyze the sample offsets in matching pairs of complete DAT transfers. Alternatively, I could imagine using the existing tool in another scenario that may actually be more common than the one we faced at MDPI. One more-or-less standard strategy for reformatting glitchy DATs is to put a playback machine intentionally out of alignment, with a skewed azimuth, in an effort to match the presumed azimuth of a misaligned recording device. The goal is usually to obtain one single transfer that’s free of glitches. But if someone were to obtain two imperfect transfers made of the same DAT with different azimuth settings (or even without any change), the tool I’ve described might already be able to crank out a usable “best-of” composite file, following a strategy similar to one I’ve described for audio CDs.
Whether anything practical ever comes out of all this or not, I’ve found it a fun challenge, and I’ve learned a lot about the DAT format in the process of trying to meet it. I hope you’ve enjoyed reading about it. Serious proposals for further development will receive due consideration.