Over the past few centuries, there have been many attempts to write the English language phonographically—that is, in ways that reflect the actual sound of the language more holistically or less arbitrarily than usual. None of these systems has supplanted traditional spelling in widespread use, although the International Phonetic Alphabet (IPA) is now pretty well entrenched among specialists. Each also uses a set of characters or conventions that is somewhat unfamiliar to general audiences, making texts written in it challenging for them to read. But we can also try to access such inscriptions in another way: as long as each character represents one sound or sequence of sounds with reasonable consistency, we should be able to use modern speech synthesis software to actualize them meaningfully as audio. And I recently found a piece of software that allows us to do that, with certain limitations: eSpeak.
The main limitation (from my standpoint) is that eSpeak is designed to be language-specific. Its main purpose is text-to-speech conversion using ordinary text as its input, and the rules for how to do this vary greatly from language to language (imagine what’s involved in articulating character strings like nought, pizza, beaucoup, etc.). The software also allows the user to enter phoneme mnemonics based loosely on IPA as rendered in ASCII characters, which is what enables us to actualize phonographic inscriptions more or less as written, but it does this using a language-specific “voice” with its own restricted selection of phonemes. Since our goal here is to synthesize English-language speech, we’ll start by using the default English voice. The catch is that many sounds that don’t occur in contemporary English aren’t available in the off-the-shelf English voices. One example is [[A]], which did exist in older versions of the language (more on that later).
So here’s the process, in three steps:
- Associate each character in the historical script with a sound or combination of sounds available to English voices in eSpeak, trying to find the one that most closely approximates what the creator of the script intended the character to represent.
- Transcode the text from its original script into eSpeak-compatible phoneme mnemonics on this basis.
- Use eSpeak to execute the inscription automatically as audible speech.
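The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling actually used for the audio in this post: the three-entry table is made up (it is not any historical script's real table), the function names are hypothetical, and the command is built but not run. The parts that come from eSpeak itself are the [[ ]] wrapper for phoneme input, with spaces separating words, and the -v and -w command-line options.

```python
import shlex

# Step 1 (hypothetical): a made-up table from source characters to eSpeak
# phoneme mnemonics. A real table would have one entry per character of the
# historical script.
CHAR_TO_PHONEME = {
    "S": "S",   # hypothetical character for the "sh" sound
    "I": "i:",  # hypothetical character for the vowel of "see"
    "P": "p",
}


def transcode(text, table):
    """Step 2: replace each source character with its mnemonic.

    Words are kept apart by spaces, and the whole string is wrapped in
    [[ ]] so that eSpeak reads it as phonemes rather than ordinary text.
    """
    words = ["".join(table[ch] for ch in word) for word in text.split()]
    return "[[" + " ".join(words) + "]]"


def espeak_command(phoneme_text, voice="en", wav_path=None):
    """Step 3: build (but do not run) an espeak command line."""
    cmd = ["espeak", "-v", voice]
    if wav_path is not None:
        cmd += ["-w", wav_path]  # write a WAV file instead of playing audio
    cmd.append(phoneme_text)
    return cmd


phonemes = transcode("SIP", CHAR_TO_PHONEME)
print(phonemes)  # [[Si:p]]
print(shlex.join(espeak_command(phonemes, wav_path="out.wav")))
```

Passing the returned list to subprocess.run would carry out the synthesis on a machine where espeak is installed.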
I should emphasize that my purpose here is not to produce an “accurate” or even “acceptable” rendering of English as it was actually spoken in any particular place and time. Instead, I want to actualize the phonographic scripts as written, making the distinctions they make and ignoring the distinctions they don’t make (at least, as much as is possible given the assumptions and rules of the software itself). In effect, I’m trying to give people who lived in past centuries the opportunity to program a modern speech synthesis engine based on their (perhaps limited) understandings of the language. That said, the results posted here are intended mainly as a proof of concept. An audio rendering of the phonographic scripts exactly as written, meaning a one-to-one match of symbol to sound with no substitutions by the software whatsoever, might take a bit more work and study. These are the best results I could obtain quickly, with little or no under-the-hood tweaking of eSpeak itself. Even so, I think they’re pretty interesting.
EXPERIMENT #1: PHONOTYPY (1844)
If we trace the IPA back to its point of origin, we land upon the Phonotypy of the 1840s: a print equivalent of Isaac Pitman’s handwritten “phonographic” system of stenography. Here’s the first version of Phonotypy used experimentally in print at the beginning of 1844, together with the equivalent eSpeak phoneme mnemonics:
I’ve transcoded [[e:]] rather than [[eI]] and [[o:]] rather than [[oU]], because these vowels weren’t explicitly identified by the system as diphthongs, in contrast to [[aI]], [[OI]], [[aU]], and [[ju:]]. According to the stated rules of accentuation, a polysyllabic word was supposed to be stressed on the penultimate syllable unless some other syllable was marked (with a ‘ following it), so I’ve encoded these stresses throughout, which eSpeak factors into its synthesis of intonation. Here are the opening paragraphs of an address by Isaac Pitman that appeared in the Phonotypic Journal for January 1844 as the first substantial text ever published in Phonotypy:
TO THE MEMBERS OF THE “PHONOGRAPHIC CORRESPONDING SOCIETY,” AND THE SUBSCRIBERS TO THE PHONETIC FOUNT.
Dear Friends,—It is with pleasurable feelings of no ordinary kind that I address you in Phonotypy, and thus offer you the result of the first experiment made with the fount which your liberality has enabled me to provide. To you will future ages look, as being, under Divine Providence, the introducers of a correct mode of writing and printing: the instructors of the civilized world in the true principles of that art which is the mainspring of civilization: the emancipators of the infant mind from the galling change of the present system of orthography: and the elevators of the great mass of mankind from the lowest depths of ignorance and superstition to the pleasures of science, and the delights of virtue.
Allow me to congratulate you, as I do most sincerely, on the establishment, and rapid growth of the “Phonographic Corresponding Society”: it is, unquestionably, one of the most useful associations that characterize the present day. Notwithstanding the Society has been in existence but ten months, it already bears such high promise, and manifests so much energy, talent, and aptitude for the work which it has undertaken, the reformation of our written and printed language, that I hesitate not to express my firm belief, that it will prove effectual for the salvation of the literary world from the bondage under which it groans.
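The accentuation rule described above for Phonotypy (stress the penultimate syllable of a polysyllabic word unless another syllable is marked) lends itself to a small sketch. The assumptions here are mine: words arrive already split into syllables of eSpeak mnemonics, the set of characters that can begin a vowel mnemonic is a rough guess for the English voice, and monosyllables are left unmarked because the stated rule only covers polysyllabic words. The stress mark itself (an apostrophe placed just before the vowel) follows eSpeak's phoneme-input convention.

```python
# Characters that can begin a vowel mnemonic in eSpeak's English voice.
# This set is an assumption for illustration, not an official list.
VOWEL_STARTS = set("aAeEiIoOuUV@03")


def stress_syllable(syllable):
    """Insert eSpeak's primary-stress mark (') just before the first vowel."""
    for pos, ch in enumerate(syllable):
        if ch in VOWEL_STARTS:
            return syllable[:pos] + "'" + syllable[pos:]
    return syllable  # no vowel found: leave the syllable untouched


def mark_stress(syllables, marked=None):
    """Join syllables into a word, stressing one of them.

    syllables: list of mnemonic strings, one per syllable.
    marked: index of an explicitly marked syllable, or None to fall back
            on the penultimate-syllable default.
    """
    if len(syllables) < 2:
        return "".join(syllables)  # monosyllables left unmarked (assumption)
    target = len(syllables) - 2 if marked is None else marked
    return "".join(
        stress_syllable(s) if i == target else s
        for i, s in enumerate(syllables)
    )


print(mark_stress(["f@", "nE", "tIk"]))            # f@n'EtIk
print(mark_stress(["f@", "nE", "tIk"], marked=2))  # f@nEt'Ik
```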
It’s worth noting here that speech synthesis is very much in the original spirit of Phonotypy, and not at all a violation of its original intentions. Alexander Ellis, one of its primary developers, was simultaneously eager to use “speaking machines” modeled on the human speech organs as a means of placing the study of pronunciation on a scientific basis. As he wrote in his Alphabet of Nature (1845):
It is easy to say, “make a string vibrate in air, 512 times a second, and you will experience the sensation represented by the musical note C, on the third space, treble clef.” If we could say, “draw out a pipe to the length of 4 7/10 inches, and vocalize it, you will experience the sound represented by o,” we should have the same certainty as to spoken sounds…. These remarks will tend to shew the great importance of constructing successful speaking machines, raising them far above the grade of simple curiosities, and placing them in the rank of necessaries for human improvement.
In October 1844, when the Phonotypic Journal obtained its first lower-case font (which looks much more like modern IPA than the original all-capital font introduced in January does), the first substantial text published in it was a piece by Charles Wheatstone about “speaking machines.” If somebody had formulated a script for a “speaking machine” to convert automatically into sound in 1844, it would likely have been written in something very much like Phonotypy.
EXPERIMENT #2: BENJAMIN FRANKLIN’S “REFORMED MODE OF SPELLING” (1768)
Benjamin Franklin also devised a “reformed mode of spelling” for the English language which is described and illustrated here, and which can be transcoded into eSpeak phoneme mnemonics as follows:
Here’s the opening of a letter which Franklin wrote to Miss Mary Stevenson in response to a letter she had written to him in his new spelling to raise certain objections to it. I interpret Franklin’s occasional unexplained <ê> as interchangeable with <ee> and hence [[EE]], and I ignore the “[viz.]” which he inserts into his text at one point:
Dear Madam, The objection you make to rectifying our alphabet, “that it will be attended with inconveniences and difficulties,” is a natural one; for it always occurs when any reformation is proposed; whether in religion, government, laws, and even down as low as roads and wheel carriages.—The true question then, is not whether there will be no difficulties or inconveniences; but whether the difficulties may not be surmounted; and whether the conveniences will not, on the whole, be greater than the inconveniences. In this case, the difficulties are only in the beginning of the practice: when they are once overcome, the advantages are lasting.—To either you or me, who spell well in the present mode, I imagine the difficulty of changing that mode for the new, is not so great, but that we might perfectly get over it in a week’s writing.—As to those who do not spell well, if the two difficulties are compared, [viz.] that of teaching them true spelling in the present mode, and that of teaching them the new alphabet and the new spelling according to it; I am confident that the latter would be by far the least. They naturally fall into the new method already, as much as the imperfection of their alphabet will admit of; their present bad spelling is only bad, because contrary to the present bad rules; under the new rules it would be good.—The difficulty of learning to spell well in the old way is so great, that few attain it; thousands and thousands writing on to old age, without ever being able to acquire it. ’Tis, besides, a difficulty continually increasing; as the sound gradually varies more and more from the spelling: and to foreigners it makes the learning to pronounce our language, as written in our books, almost impossible.
In March 1844, the Phonotypic Journal published a transcription of Franklin’s phonographic texts into Phonotypy, which it represented as showing “the Doctor’s pronunciation, according to his own scale of sounds.” Indeed, the editor judged the texts to be a good enough representation of Franklin’s own speech as to remark: “The Doctor’s pronunciation was, perhaps, not so faulty in his own day as it would be considered now.” H. L. Mencken similarly tapped Franklin’s phonographic writings in The American Language to gain insight into his habits of pronunciation, suggesting that these “were presumably those of the best circles in the London of his time, and it seems likely that they also prevailed in Philadelphia, then the center of American culture.” Franklin didn’t mark stressed syllables, so I haven’t either, with the result that eSpeak has handled the intonations more “monotonously” here than in my Experiment #1. But if he’d had access to a modern speech synthesizer, this is still probably pretty close to what the results would have sounded like, giving us arguably the closest thing we have to a recording of his own speech.
EXPERIMENT #3: JOHN HART’S ORTHOGRAPHIE (1569)
As we go back a couple centuries further in time beyond 1768, the results of our process begin to sound less familiar, as they should—after all, we’re now listening to a version of English considerably further removed from our own, predating even the language of Shakespeare’s plays. John Hart (d. 1574) put forward a reformed spelling in his Orthographie (1569) which is said to have been “the first truly phonological scheme of the 16th century.” Here are his characters, together with my best effort at assigning each to a phoneme mnemonic in eSpeak:
I’ve assumed that the vowels of “that” and “name” have together undergone the vowel shift associated with the latter half of the sixteenth century, if only for the practical reason that the older vowel in “that,” [[A]], isn’t available to English voices in eSpeak; the first two characters would otherwise have had the values [[A]] and [[A:]]. This, I think, is the detail most open to criticism in my Experiment #3, since Hart himself didn’t note a qualitative distinction between the two sounds. I’ve also rendered <c> as [[k]], <x> as [[ks]], <ph> as [[f]], and syllabic <n> as [[@n]]; transcoded Hart’s convention of using an acute over a vowel to show a following “hardened” or lengthened consonant by doubling the consonant (e.g., [[kommon]] for <kómon>); and ignored doubling of consonants at line breaks (e.g., <let- / ters>) as extraneous to Hart’s system. So with those technical remarks out of the way, let’s listen to the opening of the sample text Hart provided in his Orthographie (and in his orthography):
AN EXERCISE OF THAT WHICH IS SAID: WHEREIN IS DECLARED, HOW THE REST OF THE CONSONANTS ARE MADE BY TH’ INSTRUMENTS OF THE MOUTH: WHICH WAS OMITTED IN THE PREMISES, FOR THAT WE DID NOT MUCH ABUSE THEM. In this title above-written, I consider of the i, in exercise, and of the u, in instruments: the like of the i, in title, which the common man, and many learned, do sound in the diphthongs ei, and iu: yet I would not think it meet to write them, in those and like words, where the sound of the vowel only, may be as well allowed in our speech, as that of the diphthong used of the rude: and so far I allow observation for derivations. Hereby you may perceive, that our single sounding and use of letters, may in process of time, bring our whole nation to one certain, perfect and general speaking. Wherein she must be ruled by the learned from time to time. And I can not blame any man to think this manner of new writing strange, for I do confess it is strange to my self, though before I have ended the writing, and you the reading of this book, I doubt not both you and I shall think our labors well bestowed.
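One of the conventions noted above for Hart, an acute over a vowel being rendered by doubling the following consonant (<kómon> becoming [[kommon]]), is mechanical enough to sketch. The function name is hypothetical, and the sketch assumes that whatever letter follows the accented vowel is a single-letter consonant; it works on the spelling only and leaves the mapping of letters to mnemonics as a separate step.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, as produced by NFD decomposition


def expand_acute(word):
    """Drop each acute accent and double the letter after the accented vowel."""
    decomposed = unicodedata.normalize("NFD", word)
    out = []
    double_next = False
    for ch in decomposed:
        if ch == ACUTE:
            double_next = True  # discard the accent, remember to double
            continue
        out.append(ch)
        if double_next:
            out.append(ch)  # the letter right after the accented vowel
            double_next = False
    return unicodedata.normalize("NFC", "".join(out))


print(expand_acute("k\u00f3mon"))  # kommon
```

Decomposing first means the function behaves the same whether the accented vowel arrives as one precomposed character or as a vowel plus a combining mark.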
EXPERIMENT #4: BULLOKAR ON INTERJECTIONS (1586)
William Bullokar wrote the first published grammar of the English language and was also an advocate of a new phonographic alphabet of his own devising, as follows:
There are also long forms of some vowels, mostly absent from the above table, which I’ve rendered as follows: <á> = [[E:]]; <ó> = [[O:]]; <é> or <æ> = [[e:]]; and <ý> = [[@I]] (although Bullokar doesn’t describe it as a diphthong, he does distinguish it from <e’> or [[i:]]). I invert [[hw]] to [[wh]] at the end of a syllable (where Bullokar uses it in place of conventional <gh>), and I’ve ignored “grammar notes” designed to show declensions and conjugations, as well as certain diacritics that seem to have served a purely etymological purpose (as with the mark over <ǒ> in Bullokar’s later publications, apparently used to show cases where standard orthography had a preceding silent <h>).
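For readers who want to experiment, the long-vowel renderings and the syllable-final inversion just described could be written down roughly as follows. The dictionary values are the ones listed above; the dictionary and function names, the idea of passing words around as lists of syllables, and the sample syllables are all hypothetical conveniences of this sketch.

```python
# Long-vowel renderings as listed in the text above.
BULLOKAR_LONG_VOWELS = {
    "\u00e1": "E:",  # <á>
    "\u00f3": "O:",  # <ó>
    "\u00e9": "e:",  # <é>
    "\u00e6": "e:",  # <æ>
    "\u00fd": "@I",  # <ý>
}


def invert_final_hw(syllables):
    """Turn a syllable-final 'hw' into 'wh'; leave every other 'hw' alone.

    syllables: list of mnemonic strings, one per syllable (an assumption of
    this sketch; the source does not say how syllables were delimited).
    """
    return [s[:-2] + "wh" if s.endswith("hw") else s for s in syllables]


# Made-up syllables, purely to show which 'hw' is affected:
print(invert_final_hw(["hwat", "Tohw"]))  # ['hwat', 'Towh']
```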
Here’s the passage from Bullokar’s Bref Grammar for English (1586), pages 51-52, that I find most aurally striking. The phrase “some be of” isn’t repeated in print for each item in the list but is tied to all of them, so I believe my treatment of it here is defensible.
An interjection is a part of speech that betokeneth a sudden passion of the mind: the signification or meaning of which speech must be understanded by the gesture, countenance, or passion of the speaker, and some time with regard of the person spoken to, or of the thing spoken of, as is shewed by the titles following, or such like. Some be of Sorrow, as “alas”; “how.” Some be of Fear, as “oh”; “O Lord.” Some be of Wonder, as “uhouh”; “good Lord.” Some be of Disdain, as “waw.” Some be of Shunning, as “hence”; “away”; “fie.” Some be of Praising, as “oh excellent.” Some be of Scorning, as “O brave.” Some be of Lamenting, as “oh, ho, ho.” Some be of Crying-out, as “O good Lord.” Some be of Cursing, as “woe, woe”; “what a mischief.” Some be of Laughing, as “ah, hah, ha.” Some be of Calling, as “how”; “whoop”; “how-sir-a.” Some be of Silence, as “peace”; “hush”; “tst.” Some be of Threatening, as “well, well”; “go to, go to.” Some be of Stopping, as “ho”; “p’htrowh.” Some be of Forcing, as “gep”; “on”; “hop”; “het, ay-horsens.” Some be of Fraying, as “huh”; “showh.” And so of all other voices un-perfectly uttered, yet signifying some sudden passion of the mind, in what manner soever the same be uttered, as “O abominable act; away with him,” mixed in sentence thus: “Fie fie for shame, what world is this? Good Lord, what shall we say? Woe, woe to them; alas the while alas and well away.”
EXPERIMENT #5: ORMULUM (12th century)
The Ormulum is an early Middle English poem written in a distinctively phonetic orthography that suits it well for the purposes of this project. This time there’s no question: the Great Vowel Shift still lies several centuries in the future. Because of that, I’ve had to go “under the hood” of eSpeak to create the [[A]] phoneme. (I’ve taken the simplest possible approach to this, cutting and pasting the corresponding German-language entry into the ph_english file, adding [[A]] to the English phoneme list in dict_phonemes, and then recompiling phoneme data in espeakedit.) The correspondences between Ormulum characters and eSpeak phoneme mnemonics are fairly straightforward, the only significant peculiarity being that short vowels are shown by doubling the following consonant. So here’s a representative passage, with stressed syllables inferred from the poetic structure:
Beforen that the Laferd [=Lord] Christ
Was comen here to manne
Was all this middle world full
Of sinnes thesternesse [=darkness],
Forthi that [=because] Christ, the worldes light,
Nas [=wasn’t] nought yet comen thanne [=then]
For to begripen [=seize hold of] all mankind
Of [=from] heathendom and dwilde [=error],
And for to shewen what was good
And what was evil deede,
And how man mighte quemen [=please] God
And adlen [=earn] heavenes blisse
And standen yain [=against] the lathe [=loathsome] ghost
And all forbowyen [=avoid] helle.
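The one peculiarity mentioned above for the Ormulum, short vowels being shown by a doubled following consonant, can be sketched as a small pre-processing pass. This is a simplification under my own assumptions: the five-letter vowel set is a stand-in, the function name is hypothetical, the sample spellings are only illustrative, and vowels that aren't followed by a doubled consonant are labelled "unmarked" rather than long, since the doubling rule by itself says nothing more about them.

```python
VOWELS = set("aeiou")  # stand-in vowel set, an assumption of this sketch


def orm_segments(word):
    """Split a spelling into (letter, length) pairs.

    A vowel followed by a doubled consonant is tagged "short" and the
    doubling is collapsed to a single consonant. Other vowels are tagged
    "unmarked"; consonants get None.
    """
    segments = []
    i = 0
    while i < len(word):
        ch = word[i]
        following = word[i + 1:i + 3]
        doubled = (
            len(following) == 2
            and following[0] == following[1]
            and following[0] not in VOWELS
        )
        if ch in VOWELS and doubled:
            segments.append((ch, "short"))
            segments.append((following[0], None))  # collapsed consonant
            i += 3
        elif ch in VOWELS:
            segments.append((ch, "unmarked"))
            i += 1
        else:
            segments.append((ch, None))
            i += 1
    return segments


print(orm_segments("\u00featt"))  # [('þ', None), ('a', 'short'), ('t', None)]
print(orm_segments("god"))        # [('g', None), ('o', 'unmarked'), ('d', None)]
```

Each (letter, length) pair would then still need to be mapped to an eSpeak mnemonic, which is where a table like the one described for the other scripts comes in.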
I think those five experiments are enough to serve as a proof of concept for now. They show that historical phonographic scripts can be converted meaningfully into sound by modern speech synthesis software, either using the software “as is” or with modifications as in the last case. Many additional scripts could easily be played in the same way, such as the Deseret alphabet, while others would take more work, such as Melville Bell’s “Visible Speech.” A more rigorous application of this idea would, I think, also require “undoing” some of the built-in rules of the English voices in eSpeak (maybe just by removing the “IF” statements, though I’d need to study this further). More soon, perhaps!
(Post updated August 25, 2014, to include facsimiles of the original texts.)