Rightward and Downward in the Voynich Manuscript

Some words, glyphs, and glyph combinations in the Voynich Manuscript prefer earlier or later positions in lines or paragraphs than others.  A few of these patterns are conspicuous and frequently commented upon, such as the tendency of the glyphs m and g to appear at the ends of lines and the tendency of the glyphs p and f to appear in the first lines of paragraphs, including at the first position in the first line.  Other patterns are subtler and only reveal themselves through statistical analysis, such as glyphs appearing more or less often at the beginnings of lines than we’d predict from the overall proportions of words that begin with them in the text as a whole.

In this post, I’d like to explore some of these subtler patterns of positional preference, the kind that aren’t easy to notice just through casual browsing.  Some of the patterns I’ve detected seem to behave similarly to the ways in which p, f, m, and g are supposed to behave, having a perceptible effect on only one or two fixed locations.  But I’m especially interested in developing a hypothesis that other positional preferences can best be understood in terms of a continuum that permeates the whole text, and not just in terms of absolutes that implicate only discrete bits and pieces of it (the first line of a paragraph, the first word of a line, the last word of a line, etc.).  If all positionally variable phenomena could be segregated into a few narrow contexts, then it might be defensible to treat them as superficial modifications made to text created via some method that otherwise functions independently from them.  But if some of them can be shown to extend their influence throughout the whole text, that would point instead towards them reflecting more fundamental properties of the text creation process itself.  I’ll spend some time below trying various methods to get at this question of “continuousness” in patterns, as well as the question of whether the patterns are statistically significant in the first place (since if they’re not, then all the other questions about them are moot).

My statistics are based on René Zandbergen’s transcription (version 1r)—which uses the popular Extensible Voynich Alphabet (EVA) for its representation of Voynichese glyphs with the Latin alphabet—and were generated using a variety of custom Python scripts with further analysis of the output in Microsoft Excel.  I’ve made an effort to scrutinize my methods and coding and to double-check my findings and graphs, but I trust that anyone who’s sufficiently invested in these matters will try to recreate my results independently before running too far with any of them.


Rightwardness

Let’s begin by considering the placement of words within lines, and specifically their appearance earlier or later in lines—a metric I’ll refer to here as rightwardness (its opposite being leftwardness, much as we could refer to the metric of “heat” with its opposite being “coldness”).

My usual method for calculating rightwardness tendencies for words has been to number the words in each line, starting at zero; then to divide these numbers by the quantity of words in the line (minus one), so that each word ends up assigned a value between 0 (first word in line) and 1 (last word in line); and finally to take the mean average of the values for all tokens of a particular word, or of some group of words sharing a common characteristic, so that higher values will correspond to greater overall rightwardness much as higher numerical temperatures correspond to greater heat.  This approach ignores the absolute positions of words, in the sense that it treats the second word of a five-word line the same as the fourth word of a ten-word line.  I admit that absolute word numberings might be significant too—indeed, there are plenty of reasons to suppose they might be—but at the moment it’s the proportional positions I’m opting to investigate, and when I’ve experimented with averaging the absolute numberings of words instead of their fractional positions within lines, I haven’t noticed this making any big difference.  For this set of experiments I’m ignoring all comma breaks in the Zandbergen transcription, so that, for example, I treat “cThar,dan” as the single word “cThardan”; and when Zandbergen lists multiple possible readings for a glyph, my script always accepts the first non-“?” reading for it.
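The scoring just described can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the script used for the figures in this post: the function name and the toy lines are hypothetical, each line is assumed to be already split into words (with comma breaks joined), and one-word lines are simply skipped because dividing by the word count minus one is undefined for them.

```python
from collections import defaultdict

def rightwardness_by_word(lines):
    """Return {word: mean fractional line position}.

    Each line is a list of word strings.  The word at index i in a line of
    n words scores i / (n - 1): 0.0 is line-initial, 1.0 is line-final.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for words in lines:
        n = len(words)
        if n < 2:
            # Assumption: a one-word line has no defined position, so skip it.
            continue
        for i, word in enumerate(words):
            totals[word] += i / (n - 1)
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

# Toy input for illustration only, not real transcription data.
toy_lines = [
    ["Shol", "daiin", "chol"],
    ["Shol", "chol", "daiin", "chedy", "chol"],
]
scores = rightwardness_by_word(toy_lines)
print(scores["Shol"], scores["chol"])
```

In the toy input, Shol only ever opens a line and so scores 0, while chol averages its three positions (1, 0.25, and 1).  Scoring a group of words rather than a single word would mean pooling their tokens before averaging.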

Some results of the type of calculation I’ve just described show strikingly consistent patterns, as I previously reported here (in section six).  For example, words beginning with Sh tend to yield more leftward average scores than words beginning with ch.  J. K. Petersen makes a similar observation about on, oin, oiin, oiiin appearing more leftward in lines than an, ain, aiin, aiiin, a pattern which a consideration of numerical scores not only confirms but also extends to the o / a distinction more generally.  Words starting with qo yield more leftward scores, on average, in lines than equivalent words starting with o.  The differences in each of these cases aren’t necessarily very large, but they’re remarkably consistent as to direction.

One important possibility to which I didn’t give enough attention before is that differences in the average rightwardness of words might simply result from known preferences of particular glyphs to appear line-initially or line-finally, and that they don’t indicate anything about what’s happening deeper within lines.  Thus, words starting Sh might appear relatively more leftward in lines than words starting ch only because words starting Sh are comparatively more likely to appear as the first words of lines (which they are, even though both categories of word are better known for avoiding that position altogether).  If this were the case, it could make rightwardness statistics for words a mere secondary effect of other known phenomena.

With this possibility in mind, I revisited and expanded my earlier analysis of minimal pairs beginning with Sh and ch, this time also counting the quantity of each word in line-initial position and its quantity in line-final position so that I could get a sense for how great a factor these two specific positions are in producing differences in average overall leftwardness or rightwardness.  I aimed to cover every word pair where both words have at least fourteen tokens apiece, yielding twenty-four pairs.  Parentheses below enclose average rightwardness scores followed by counts of line-initial tokens, line-final tokens, and total quantities of tokens.

  • Shaiin (0.290; 2 vs 0 of 19), chaiin (0.542; 0 vs 8 of 42)
  • Shar (0.279; 3 vs 1 of 25), char (0.477; 2 vs 0 of 62)
  • ShcKhy (0.434; 0 vs 0 of 56), chcKhy (0.551; 1 vs 10 of 129)
  • ShcThy (0.520; 0 vs 1 of 32), chcThy (0.591; 0 vs 7 of 76)
  • Shdy (0.491; 3 vs 3 of 43), chdy (0.581; 0 vs 17 of 127)
  • Sheal (0.296; 0 vs 0 of 14), cheal (0.410; 0 vs 1 of 25)
  • Shear (0.493; 0 vs 0 of 22), chear (0.522; 0 vs 2 of 42)
  • ShecKhy (0.417; 0 vs 1 of 30), checKhy (0.513; 0 vs 1 of 47)
  • ShecThy (0.563; 0 vs 1 of 19), checThy (0.601; 0 vs 3 of 27)
  • Shedy (0.481; 7 vs 16 of 371), chedy (0.555; 6 vs 33 of 436)
  • Sheedy (0.411; 3 vs 0 of 73), cheedy (0.582; 0 vs 4 of 54)
  • Sheey (0.425; 4 vs 3 of 124), cheey (0.447; 2 vs 4 of 145)
  • Sheky (0.491; 0 vs 1 of 31), cheky (0.522; 0 vs 6 of 55)
  • Sheo (0.446; 0 vs 0 of 26), cheo (0.444; 1 vs 0 of 33)
  • Sheody (0.466; 1 vs 2 of 43), cheody (0.574; 1 vs 12 of 78)
  • Sheol (0.378; 4 vs 4 of 97), cheol (0.421; 6 vs 4 of 140)
  • Sheor (0.361; 5 vs 0 of 42), cheor (0.444; 3 vs 3 of 74)
  • Shey (0.451; 5 vs 11 of 230), chey (0.466; 4 vs 17 of 283)
  • Sho (0.328; 28 vs 0 of 98), cho (0.438; 2 vs 1 of 40)
  • Shodaiin (0.424; 5 vs 4 of 24), chodaiin (0.533; 5 vs 4 of 44)
  • Shody (0.463; 5 vs 10 of 52), chody (0.580; 1 vs 9 of 85)
  • Shol (0.373; 16 vs 4 of 165), chol (0.414; 16 vs 4 of 321)
  • Shor (0.318; 19 vs 2 of 88), chor (0.421; 8 vs 5 of 192)
  • Shy (0.489; 2 vs 8 of 76), chy (0.504; 2 vs 7 of 117)

Only in the single case of Sheo / cheo, marked above in red (which I’ll use here as a default convention for highlighting anomalies), does the Sh variant have a higher rightwardness score than the ch variant, and that happens also to be the case with the closest pair of scores, differing only by about 0.00175, with the third-lowest total token count in the group.  Thus, this expanded dataset continues to support a claim that, for minimal pairs beginning Sh and ch, the Sh variant consistently favors a more leftward position within a line.  But a preponderance of line-initial or line-final tokens doesn’t appear to be responsible for driving the overall pattern in any obvious way.  It’s true that in some individual cases the variant with Sh is strongly line-initial, as we see with Sho; but sometimes the Sh variant is more common line-finally than line-initially, and sometimes the Sh and ch variants are more or less equally balanced as far as line-initial and line-final token quantities.  This suggests that Sh variants differ from ch variants not merely in their prevalence at the beginnings of lines, but in their preference for a more leftward line position in a continuum that implicates points further into the line as well.

It’s important to note that the contrast between scores is relative within each minimal pair rather than absolute.  For example, Sheody tends to appear more rightward in a line (0.466) than cheol (0.421), so it’s not just that any word beginning with Sh prefers a more leftward position than any word beginning with ch.  Rather, we need to compare Sheody specifically with cheody and Sheol specifically with cheol.  This suggests to me that individual words might be nudged in different directions by each of their component elements, such that in the last-mentioned case, -dy nudges a word rightward, while -l nudges it leftward, in addition to the nudges received from Sh- and ch-.  In other words, the rightwardness tendency of a given word, in the sense of a glyph sequence separated by spaces, might represent a compromise among the tendencies of all its component parts.

Another way to assess the contribution of line-initial and line-final words to the pattern is, of course, just to exclude all such words from the averages themselves.  Below I show the results of this adjustment in boldface, followed by the earlier results for ease of comparison.

  • Shaiin (0.325, 0.290), chaiin (0.435, 0.542)
  • Shar (0.284, 0.279), char (0.495, 0.477)
  • ShcKhy (0.434, 0.434), chcKhy (0.518, 0.551)
  • ShcThy (0.505, 0.520), chcThy (0.550, 0.591)
  • Shdy (0.489, 0.491), chdy (0.516, 0.581)
  • Sheal (0.296, 0.296), cheal (0.386, 0.410)
  • Shear (0.493, 0.493), chear (0.498, 0.522)
  • ShecKhy (0.397, 0.417), checKhy (0.502, 0.513)
  • ShecThy (0.538, 0.563), checThy (0.552, 0.601)
  • Shedy (0.467, 0.481), chedy (0.526, 0.555)
  • Sheedy (0.429, 0.411), cheedy (0.549, 0.582)
  • Sheey (0.424, 0.425), cheey (0.438, 0.447)
  • Sheky (0.474, 0.491), cheky (0.463, 0.522)
  • Sheo (0.446, 0.446), cheo (0.457, 0.444)
  • Sheody (0.451, 0.466), cheody (0.504, 0.574)
  • Sheol (0.367, 0.378), cheol (0.423, 0.421)
  • Sheor (0.409, 0.361), cheor (0.439, 0.444)
  • Shey (0.433, 0.451), chey (0.438, 0.466)
  • Sho (0.450, 0.328), cho (0.446, 0.438)
  • Shodaiin (0.412, 0.424), chodaiin (0.527, 0.533)
  • Shody (0.381, 0.463), chody (0.537, 0.580)
  • Shol (0.397, 0.373), chol (0.428, 0.414)
  • Shor (0.388, 0.318), chor (0.424, 0.421)
  • Shy (0.442, 0.489), chy (0.481, 0.504)

The previous Sheo / cheo exception disappears, but in its place we turn up two other nonconforming pairs: Sheky / cheky and Sho / cho.  As we reduce our dataset by excluding words, I believe it’s only to be expected that random noise will produce a few more exceptions.  But by and large, the pattern holds, with twenty-two out of twenty-four word pairs (~92%) still conforming to it.  Indeed, for many individual word pairs the contrast in average rightwardness has increased this time around.  This finding, I think, is even more persuasive than my earlier count of line-initial and line-final tokens in countering any hypothesis that the first and last words of lines are primarily responsible for driving the rightwardness pattern.
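For readers who want to try the exclusion step themselves, here is a hedged Python sketch.  It assumes that a word’s fractional position is still measured against the full line, so that excluding a slot drops the tokens found there without renumbering the words that remain; the function name, the slot labels, and the toy lines are all hypothetical.

```python
def mean_rightwardness(lines, target, skip_positions=("first", "last")):
    """Mean fractional position of `target`, ignoring tokens in skipped slots.

    Assumption: positions are computed on the whole line, so skipping a slot
    removes tokens from the average but does not renumber the other words.
    """
    values = []
    for words in lines:
        n = len(words)
        if n < 2:
            continue  # assumption: one-word lines carry no position
        skipped = set()
        if "first" in skip_positions:
            skipped.add(0)
        if "second" in skip_positions:
            skipped.add(1)
        if "last" in skip_positions:
            skipped.add(n - 1)
        for i, word in enumerate(words):
            if word == target and i not in skipped:
                values.append(i / (n - 1))
    return sum(values) / len(values) if values else None

# Toy input for illustration only, not real transcription data.
toy = [
    ["Sho", "chol", "Sho", "daiin", "chedy"],
    ["chol", "Sho", "Sho", "Shey"],
]
all_positions = mean_rightwardness(toy, "Sho", skip_positions=())
inner = mean_rightwardness(toy, "Sho")
deep = mean_rightwardness(toy, "Sho", ("first", "second", "last"))
print(all_positions, inner, deep)
```

Even in this toy case, a word with a line-initial token sees its average move rightward once that token is dropped, which is the same kind of shift seen with Sho above.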

But it has been suggested that the second word in a line can have statistically distinctive properties too (see, e.g., Elmar Vogt’s “second word effect” and Emma May Smith and Marco Ponzi’s findings about second-word statistics in Quire 20).  Not, mind you, that the second word simply recapitulates the same tendencies as the first word; the properties are reportedly distinct from those of adjacent words, such that the proposed first-word and second-word tendencies shouldn’t mutually reinforce each other.  But even so, could some distinctive behavior on the part of second words of lines account for residual rightwardness patterns after the removal of the first and last words?  To find out, let’s further exclude all second words in lines and run our analysis yet again, recognizing that we may be setting the bar impossibly high through this maneuver.  Here’s what we get, with the quantity of tokens in square brackets and the two previous results shown for easy comparison.

  • Shaiin (0.448 [10], 0.325, 0.290), chaiin (0.502 [26], 0.435, 0.542)
  • Shar (0.434 [10], 0.284, 0.279), char (0.572 [47], 0.495, 0.477)
  • ShcKhy (0.547 [41], 0.434, 0.434), chcKhy (0.552 [107], 0.518, 0.551)
  • ShcThy (0.541 [28], 0.505, 0.520), chcThy (0.599 [59], 0.550, 0.591)
  • Shdy (0.537 [33], 0.489, 0.491), chdy (0.544 [102], 0.516, 0.581)
  • Sheal (0.412 [8], 0.296, 0.296), cheal (0.485 [17], 0.386, 0.410)
  • Shear (0.604 [17], 0.493, 0.493), chear (0.581 [30], 0.498, 0.522)
  • ShecKhy (0.574 [18], 0.397, 0.417), checKhy (0.537 [42], 0.502, 0.513)
  • ShecThy (0.564 [17], 0.538, 0.563), checThy (0.659 [19], 0.552, 0.601)
  • Shedy (0.552 [276], 0.467, 0.481), chedy (0.583 [346], 0.526, 0.555)
  • Sheedy (0.541 [49], 0.429, 0.411), cheedy (0.627 [42], 0.549, 0.582)
  • Sheey (0.503 [89], 0.424, 0.425), cheey (0.540 [103], 0.438, 0.447)
  • Sheky (0.545 [25], 0.474, 0.491), cheky (0.533 [39], 0.463, 0.522)
  • Sheo (0.570 [17], 0.446, 0.446), cheo (0.550 [23], 0.457, 0.444)
  • Sheody (0.543 [29], 0.451, 0.466), cheody (0.569 [55], 0.504, 0.574)
  • Sheol (0.488 [59], 0.367, 0.378), cheol (0.516 [94], 0.423, 0.421)
  • Sheor (0.592 [21], 0.409, 0.361), cheor (0.578 [45], 0.439, 0.444)
  • Shey (0.548 [153], 0.433, 0.451), chey (0.527 [198], 0.438, 0.466)
  • Sho (0.550 [51], 0.450, 0.328), cho (0.542 [26], 0.446, 0.438)
  • Shodaiin (0.617 [7], 0.412, 0.424), chodaiin (0.634 [26], 0.527, 0.533)
  • Shody (0.528 [22], 0.381, 0.463), chody (0.583 [67], 0.537, 0.580)
  • Shol (0.489 [102], 0.397, 0.373), chol (0.513 [224], 0.428, 0.414)
  • Shor (0.502 [44], 0.388, 0.318), chor (0.533 [118], 0.424, 0.421)
  • Shy (0.536 [48], 0.442, 0.489), chy (0.551 [88], 0.481, 0.504)

This time there are seven exceptions, and one of them, Shey / chey, has a comparatively high token count.  Even so, seventeen out of twenty-four word pairs (~71%) continue to conform to the pattern.
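The tally for this last round can be checked directly against the first figure in each pair of parentheses above.  The Python sketch below copies those figures and counts the pairs in which the Sh variant scores lower than the ch variant; the variable names and the practice of keying each pair by its shared ending are just conveniences for illustration.

```python
# (Sh score, ch score) with first, second, and last words of lines excluded,
# copied from the list above and keyed by the ending the two words share.
pairs = {
    "aiin": (0.448, 0.502), "ar": (0.434, 0.572), "cKhy": (0.547, 0.552),
    "cThy": (0.541, 0.599), "dy": (0.537, 0.544), "eal": (0.412, 0.485),
    "ear": (0.604, 0.581), "ecKhy": (0.574, 0.537), "ecThy": (0.564, 0.659),
    "edy": (0.552, 0.583), "eedy": (0.541, 0.627), "eey": (0.503, 0.540),
    "eky": (0.545, 0.533), "eo": (0.570, 0.550), "eody": (0.543, 0.569),
    "eol": (0.488, 0.516), "eor": (0.592, 0.578), "ey": (0.548, 0.527),
    "o": (0.550, 0.542), "odaiin": (0.617, 0.634), "ody": (0.528, 0.583),
    "ol": (0.489, 0.513), "or": (0.502, 0.533), "y": (0.536, 0.551),
}

# A pair conforms when the Sh variant is more leftward (lower score).
exceptions = [ending for ending, (sh, ch) in pairs.items() if sh >= ch]
conforming = len(pairs) - len(exceptions)
rate = conforming / len(pairs)
print(conforming, len(pairs), round(rate * 100), exceptions)
```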

Here are the same three sets of data presented in the form of a scatterplot with each dot representing a minimal pair, its position along the x axis representing the average rightwardness of its Sh variant, and its position along the y axis representing the average rightwardness of its ch variant.  Red dots represent sets including all line positions, blue dots represent sets with first and last words excluded, and green dots represent sets with first, last, and second words excluded.  The black line marks the point along which the Sh and ch variants would display equal rightwardness.

The 71% conformity rate we found in our last experiment (associated with the green dots in the above graph) is something, but it’s not overwhelmingly conclusive, so let’s try another, complementary tactic to help us assess the relative contributions of first, second, and last words of lines to overall differences in average rightwardness.  If we calculate scores for the two sets of all words beginning with Sh and ch—without inserted gallows, i.e., omitting words starting cTh and so forth, but including the comparatively few reported cases of Sh and ch as self-standing words—the contrastive values are:

  • 0.439 versus 0.538 for all line positions (difference: 0.099)
  • 0.433 versus 0.488 excluding first and last words of lines (difference: 0.055)
  • 0.532 versus 0.560 excluding first, second, and last words of lines (difference: 0.028)
  • 0.470 versus 0.546 excluding first words of lines (difference: 0.076)
  • 0.522 versus 0.602 excluding second words of lines (difference: 0.080)
  • 0.458 versus 0.563 excluding third words of lines (difference: 0.105)
  • 0.402 versus 0.473 excluding last words of lines (difference: 0.071)
  • 0.399 versus 0.488 excluding second-to-last words of lines (difference: 0.089)
  • 0.411 versus 0.513 excluding third-to-last words of lines (difference: 0.102)

Some caveats are in order.  First, the set of words starting with Sh and the set of words starting with ch may not be fully equivalent, since we’re no longer limiting ourselves to minimal pairs in which the alternation between Sh and ch is the only difference, such that other formal factors might conceivably also vary.  Meanwhile, excluding the second, third, second-to-last, or third-to-last words also affects words in other categories we’re trying to study here; in a two-word line, for instance, removing the second word will also remove the last word, which obviously muddies the water.  Finally, the numerical differences aren’t directly comparable with each other, since removing all words in particular line positions also decreases the total range of possible values.

Nevertheless, even after excluding first, second, and last words, we’re still left with a difference of 0.028.  Someone might object that this difference looks too small to be meaningful.  To put it in perspective, it corresponds to a deflection of 2.8% of the length of a line, and each word is in a line that contains 8.9 words on average, so that 2.8% of that corresponds to roughly one quarter of one word.  By contrast, the deflection when we factor in all positions is 9.9% of the length of the line, which corresponds to roughly nine tenths of a word.  In an effort to weigh the statistical significance of these figures, I measured the average relative rightwardness of ten pairs of groups of word tokens drawn from the manuscript at random, but of the same sizes as the whole sets of words beginning with ch and Sh (5325 and 2860 word tokens respectively).  Here’s what I got:

  • 0.502 vs 0.516 (difference: 0.014)
  • 0.505 vs 0.493 (difference: 0.012)
  • 0.500 vs 0.493 (difference: 0.007)
  • 0.504 vs 0.498 (difference: 0.006)
  • 0.498 vs 0.494 (difference: 0.004)
  • 0.496 vs 0.499 (difference: 0.003)
  • 0.505 vs 0.502 (difference: 0.003)
  • 0.487 vs 0.490 (difference: 0.003)
  • 0.504 vs 0.505 (difference: 0.001)
  • 0.499 vs 0.498 (difference: 0.001)

The maximum difference among the paired results is 0.014.  By pairing the two most extreme single values, 0.487 and 0.516, we would get a difference of 0.029—which is to say that we should expect a difference of that magnitude to arise randomly, with token sets of this size, only once in a hundred times, if we assume that the results in the two columns are effectively independent of one another and therefore interchangeable.  A difference of 0.099 for all line positions is far in excess of that, so I don’t see any grounds for doubting its statistical significance.
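A random baseline of this kind could be sketched in Python as follows.  Several details here are assumptions rather than a description of the scripts behind the figures above: the two groups in each trial are drawn independently and without replacement from the same pool, the pool is a synthetic stand-in for the real per-token scores, and the function name and seed are hypothetical.

```python
import random

def random_pair_differences(values, size_a, size_b, trials=10, seed=0):
    """Absolute difference in mean score between two random groups, per trial.

    Assumption: each group is sampled without replacement from `values`,
    independently of the other group, so the two groups may overlap.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        group_a = rng.sample(values, size_a)
        group_b = rng.sample(values, size_b)
        mean_a = sum(group_a) / size_a
        mean_b = sum(group_b) / size_b
        diffs.append(abs(mean_a - mean_b))
    return diffs

# Synthetic pool of evenly spread scores, standing in for real token scores.
pool = [i / 999 for i in range(1000)] * 30
baseline = random_pair_differences(pool, 5325, 2860)
print(max(baseline))
```

An observed difference can then be compared against the largest value in the baseline, or the number of trials raised well beyond ten for a steadier estimate of how often chance alone produces a gap of a given size.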

I also tried a second, complementary experiment in which I took only the sets of word tokens beginning ch and Sh, combined them, randomized the order of the combined set five times, and each time calculated average rightwardness for the first 5325 values versus the remainder; and also for the last 5325 values versus the remainder.  The results were pretty much the same as in the first experiment.

  • 0.500 vs 0.499 (difference: 0.001); 0.500 vs 0.500 (difference: 0.000)
  • 0.496 vs 0.508 (difference: 0.012); 0.500 vs 0.499 (difference: 0.001)
  • 0.502 vs 0.497 (difference: 0.005); 0.497 vs 0.506 (difference: 0.009)
  • 0.495 vs 0.509 (difference: 0.014); 0.504 vs 0.493 (difference: 0.011)
  • 0.497 vs 0.506 (difference: 0.009); 0.500 vs 0.499 (difference: 0.001)
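This second experiment amounts to a small shuffle test, which might be sketched in Python like this.  The function name and the synthetic scores are hypothetical, and only the “first N values versus the remainder” split is shown; the “last N values” variant would slice from the other end of the shuffled list.

```python
import random

def shuffle_split_differences(values_a, values_b, trials=5, seed=0):
    """Shuffle the combined scores and compare the first len(values_a)
    of them with the remainder, returning one absolute difference per trial."""
    rng = random.Random(seed)
    combined = list(values_a) + list(values_b)
    cut = len(values_a)
    diffs = []
    for _ in range(trials):
        rng.shuffle(combined)
        first, rest = combined[:cut], combined[cut:]
        diffs.append(abs(sum(first) / len(first) - sum(rest) / len(rest)))
    return diffs

# Synthetic scores for illustration: group A averages 0.6, group B 0.4.
group_a = [0.2 + 0.8 * i / 499 for i in range(500)]
group_b = [0.8 * i / 299 for i in range(300)]
observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
shuffled = shuffle_split_differences(group_a, group_b)
print(observed, max(shuffled))
```

If the labels mattered no more than chance, the observed difference should look like the shuffled ones; a gap far above every shuffled value, as in this contrived example, suggests the grouping itself carries the effect.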

Once we remove the first, second, and last words of lines from consideration, the sizes of the sets of word tokens beginning ch and Sh drop respectively to 3670 and 1822 tokens. By repeating the first experiment above with groups of word tokens of those sizes—drawn at random from the manuscript, except now with all the first, second, and final words of lines removed from the pool—I got:

  • 0.570 vs 0.560 (difference: 0.010)
  • 0.560 vs 0.566 (difference: 0.006)
  • 0.560 vs 0.555 (difference: 0.005)
  • 0.565 vs 0.568 (difference: 0.003)
  • 0.563 vs 0.566 (difference: 0.003)
  • 0.566 vs 0.564 (difference: 0.002)
  • 0.568 vs 0.566 (difference: 0.002)
  • 0.567 vs 0.566 (difference: 0.001)
  • 0.565 vs 0.565 (difference: 0.000)

This time, the maximum difference among the paired results is 0.010.  Meanwhile, if we contrast the maximum and minimum single values, 0.570 versus 0.555, we get a difference of 0.015.  Judging from this dataset, we should expect a difference of that magnitude to arise randomly only once in a hundred times, with the same caveat as before.  A difference of 0.028 for “internal” line positions is far in excess of that.  Granted, based on the first experiment we’d expect a difference of that magnitude to arise randomly once in a hundred times, but, as I observed earlier, the numerical differences in the two cases aren’t fully comparable (and even if they were, I’d still think that odds of 1:100 would put us on reasonably firm ground as far as statistical significance goes).

I also took the reduced groups of words beginning with ch and Sh, combined the sets, randomized their order five times, and each time calculated average rightwardness for the first 3670 values and the remainder; and also for the last 3670 values and the remainder.  The individual figures came out a bit lower than in my other experiment limited to these “internal” words, but the differences between them—which is what we care about here—are comparable and don’t rise any higher than before.

  • 0.550 vs 0.550 (difference: 0.000); 0.551 vs 0.550 (difference: 0.001)
  • 0.548 vs 0.557 (difference: 0.009); 0.550 vs 0.552 (difference: 0.002)
  • 0.551 vs 0.549 (difference: 0.002); 0.548 vs 0.556 (difference: 0.008)
  • 0.549 vs 0.553 (difference: 0.004); 0.553 vs 0.546 (difference: 0.007)
  • 0.548 vs 0.555 (difference: 0.007); 0.552 vs 0.547 (difference: 0.005)

From all this, I conclude that the pattern in question has a statistically significant effect on every part of the line we’ve examined.  Its impact may be strongest near the extremities, but it can be detected everywhere at a strength that rises perceptibly above random background noise.

Another pattern I’ve noted previously, building on the observation by J. K. Petersen, is that when a minimal pair of words varies only in the presence of o or a, the o variant consistently favors a more leftward position than the a variant.  I’ve examined a number of minimal pairs of this kind (choosing a subjectively representative sample this time rather than applying an objective token-quantity threshold as I perhaps should have), and I’ve analyzed each pair twice, once factoring in all line positions, and then again while excluding the first, second, and last words of lines (i.e., thoroughly “internal” words).  The results are shown below, with the “internal” figure given first, in boldface, and with token counts given in square brackets.

  • ol (0.566 [246], 0.504 [363]), al (0.590 [102], 0.589 [137])
  • or (0.538 [175], 0.455 [250]), ar (0.624 [176], 0.566 [233])
  • chol (0.513 [224], 0.414 [321]), chal (0.541 [27], 0.557 [40])
  • chor (0.533 [118], 0.421 [192]), char (0.572 [47], 0.477 [62])
  • choiin (0.550 [6], 0.446 [11]), chaiin (0.502 [26], 0.542 [42])
  • dol (0.563 [66], 0.488 [86]), dal (0.582 [114], 0.642 [171])
  • dor (0.623 [30], 0.408 [58]), dar (0.584 [162], 0.538 [248])
  • doiin (0.540 [6], 0.312 [12]), daiin (0.591 [424], 0.498 [765])
  • kol (0.539 [13], 0.481 [20]), kal (0.518 [13], 0.552 [14])
  • kor (0.604 [14], 0.503 [17]), kar (0.619 [29], 0.565 [39])
  • tol (0.612 [11], 0.232 [29]), tal (0.675 [9], 0.591 [12])
  • tor (0.513 [4], 0.199 [17]), tar (0.686 [12], 0.368 [29])
  • okol (0.559 [42], 0.542 [54]), okal (0.556 [91], 0.557 [117])
  • okoly (0.476 [2], 0.650 [3]), okaly (0.569 [9], 0.703 [16])
  • okoldy (0.557 [6], 0.620 [7]), okaldy (0.619 [8], 0.695 [10])
  • okor (0.513 [16], 0.384 [26]), okar (0.597 [89], 0.528 [110])
  • otol (0.592 [47], 0.515 [61]), otal (0.603 [93], 0.625 [108])
  • otor (0.614 [24], 0.508 [32]), otar (0.631 [92], 0.611 [105])
  • ykol (0.546 [9], 0.493 [12]), ykal (0.472 [10], 0.486 [12])
  • ykor (0.444 [2], 0.274 [8]), ykar (0.668 [24], 0.537 [32])
  • ytol (0.663 [12], 0.530 [15]), ytal (0.471 [8], 0.521 [13])
  • ytor (0.705 [6], 0.363 [12]), ytar (0.639 [15], 0.486 [22])

Of 22 minimal pairs, 20 (91%) conform to the pattern when we factor in all positions, but only 15 (68%) conform when we limit our analysis to the “interior,” suggesting that the pattern is, again, less strongly operative there.  Here’s a scatterplot of the same data with the same color-coding as before, using red for scores based on all line positions and green for scores based on strictly “internal” positions; with the o variants assigned to the x axis and the a variants assigned to the y axis; and with a black line once again showing the point along which the two variants would display equal rightwardness.

Meanwhile, if we compare the aggregated sets of all words ending in or and ar (including or and ar as self-standing words), we get:

  • 0.403 versus 0.501 for all line positions (difference: 0.098)
  • 0.475 versus 0.533 excluding first and last words of lines (difference: 0.058)
  • 0.559 versus 0.594 excluding first, second, and last words of lines (difference: 0.035)
  • 0.518 versus 0.579 excluding first words of lines (difference: 0.061)
  • 0.444 versus 0.542 excluding second words of lines (difference: 0.098)
  • 0.363 versus 0.455 excluding last words of lines (difference: 0.092)
  • 0.355 versus 0.446 excluding second-to-last words of lines (difference: 0.091)

And here’s a similar analysis for words ending in ol versus al:

  • 0.440 versus 0.548 for all line positions (difference: 0.108)
  • 0.473 versus 0.519 excluding first and last words of lines (difference: 0.046)
  • 0.543 versus 0.571 excluding first, second, and last words of lines (difference: 0.028)
  • 0.512 versus 0.594 excluding first words of lines (difference: 0.082)
  • 0.486 versus 0.589 excluding second words of lines (difference: 0.103)
  • 0.401 versus 0.472 excluding last words of lines (difference: 0.071)
  • 0.396 versus 0.503 excluding second-to-last words of lines (difference: 0.107)

The relative impact of removing particular word positions from consideration seems to vary from case to case, but the pattern appears once again to be present even when we limit our analysis to thoroughly “internal” words, and at roughly the same magnitude we saw with Sh / ch (~0.03 each time), in spite of the greater proportion of exceptions we found among specific o / a minimal pairs.  Similarly, when all positions are factored in, the magnitude seems consistently to be ~0.1 each time.  The fact that the “weaker” scores differ in the same direction as the “stronger” ones each time tells us something too; if that ~0.03 were random noise, we wouldn’t expect such consistency.  The figures calculated when excluding different word positions aren’t directly comparable, as I’ve pointed out, but I assume the figures within each category should be.

A third pattern I’ve noted previously is that words beginning with qo favor a more leftward line position than otherwise identical words beginning with o.  Here are rightwardness figures for some minimal pairs of that kind—once again just a subjectively representative selection—calculated and presented in the same way as above.

  • qokain (0.559 [214], 0.483 [278]), okain (0.569 [100], 0.521 [133])
  • qotain (0.553 [50], 0.541 [59]), otain (0.636 [77], 0.628 [92])
  • qodain (0.554 [6], 0.389 [11]), odain (0.515 [9], 0.610 [16])
  • qokaiin (0.535 [194], 0.457 [266]), okaiin (0.534 [149], 0.482 [203]) 
  • qotaiin (0.554 [50], 0.515 [78]), otaiin (0.606 [113], 0.598 [137]) 
  • qodaiin (0.501 [34], 0.451 [43]), odaiin (0.620 [36], 0.517 [53])
  • qokair (0.474 [13], 0.385 [17]), okair (0.571 [13], 0.433 [18])
  • qoeey (0.504 [12], 0.462 [13]), oeey (0.675 [4], 0.562 [5])
  • qoeedy (0.464 [11], 0.318 [18]), oeedy (0.690 [5], 0.589 [6])
  • qokal (0.563 [136], 0.498 [182]), okal (0.556 [91], 0.557 [117])
  • qotal (0.593 [46], 0.589 [56]), otal (0.603 [93], 0.625 [108])
  • qodal (0.457 [5], 0.422 [7]), odal (0.460 [7], 0.439 [10])
  • qokam (0.560 [4], 0.930 [25]), okam (0.604 [8], 0.856 [28])
  • qotam (0.762 [2], 0.957 [11]), otam (0.702 [14], 0.865 [44])
  • qokar (0.523 [122], 0.470 [147]), okar (0.597 [89], 0.528 [110])
  • qotar (0.597 [50], 0.593 [61]), otar (0.631 [92], 0.611 [105])
  • qodar (0.563 [7], 0.494 [10]), odar (0.603 [12], 0.595 [19])
  • qokol (0.522 [68], 0.429 [88]), okol (0.559 [42], 0.542 [54])
  • qotol (0.552 [28], 0.408 [40]), otol (0.592 [47], 0.515 [61])
  • qokor (0.491 [23], 0.430 [31]), okor (0.513 [16], 0.384 [26])
  • qotor (0.573 [13], 0.352 [24]), otor (0.614 [24], 0.508 [32])
  • qokedy (0.530 [213], 0.470 [271]), okedy (0.545 [93], 0.531 [107])
  • qotedy (0.592 [69], 0.542 [86]), otedy (0.570 [121], 0.554 [137])
  • qokeedy (0.546 [228], 0.443 [304]), okeedy (0.601 [75], 0.550 [101])
  • qoteedy (0.566 [55], 0.485 [74]), oteedy (0.583 [74], 0.545 [89])
  • qokeey (0.512 [224], 0.435 [298]), okeey (0.543 [114], 0.457 [154])
  • qoteey (0.482 [30], 0.380 [42]), oteey (0.606 [79], 0.520 [103])
  • qokey (0.491 [75], 0.425 [102]), okey (0.520 [39], 0.474 [54])
  • qotey (0.583 [16], 0.450 [22]), otey (0.634 [34], 0.601 [39])
  • qotchdy (0.551 [17], 0.573 [20]), otchdy (0.663 [17], 0.676 [23])
  • qokchdy (0.556 [43], 0.515 [49]), okchdy (0.656 [15], 0.593 [17])
  • qopchdy (0.528 [15], 0.502 [16]), opchdy (0.577 [13], 0.676 [17])
  • qotchedy (0.587 [20], 0.556 [24]), otchedy (0.542 [24], 0.499 [31])
  • qokchedy (0.527 [29], 0.458 [40]), okchedy (0.599 [14], 0.520 [21])
  • qopchedy (0.560 [27], 0.571 [30]), opchedy (0.570 [40], 0.551 [46])
  • qokchol (0.475 [12], 0.358 [17]), okchol (0.682 [9], 0.611 [11])
  • qotchol (0.531 [9], 0.509 [12]), otchol (0.617 [17], 0.403 [27])
  • qokchor (0.500 [1], 0.129 [8]), okchor (0.626 [6], 0.422 [15])
  • qotchor (0.550 [2], 0.092 [12]), otchor (0.616 [7], 0.355 [16])
  • qokchy (0.574 [36], 0.408 [63]), okchy (0.491 [14], 0.448 [25])
  • qotchy (0.513 [31], 0.350 [63]), otchy (0.664 [31], 0.550 [40])
  • qopchy (0.568 [10], 0.557 [12]), opchy (0.729 [5], 0.655 [9])
  • qotchey (0.472 [16], 0.408 [19]), otchey (0.581 [17], 0.381 [30])
  • qokchey (0.534 [18], 0.381 [27]), okchey (0.555 [20], 0.432 [32])
  • qopchey (0.514 [8], 0.471 [9]), opchey (0.575 [20], 0.509 [27])
  • qoky (0.579 [91], 0.605 [133]), oky (0.575 [60], 0.656 [85])
  • qoty (0.596 [50], 0.604 [76]), oty (0.637 [67], 0.699 [99])
  • qody (0.449 [8], 0.410 [14]), ody (0.553 [20], 0.615 [30])
  • qol (0.516 [75], 0.437 [110]), ol (0.566 [246], 0.504 [363])
  • qor (0.440 [8], 0.276 [19]), or (0.538 [175], 0.455 [250])

Of these 50 word pairs, 43 pairs (86%) conform to the pattern when we analyze all words together, while 42 pairs (84%) conform when we exclude the first, second, and last words of lines.  Here’s an equivalent scatterplot with qo assigned to the x axis and o assigned to the y axis.

If we further compare the aggregate sets of all words beginning with qo and o, we get the following:

  • 0.475 versus 0.546 for all line positions (difference: 0.071)
  • 0.486 versus 0.518 excluding first and last words of lines (difference: 0.032)
  • 0.539 versus 0.576 excluding first, second, and last words of lines (difference: 0.037)
  • 0.527 versus 0.593 excluding first words of lines (difference: 0.066)
  • 0.515 versus 0.592 excluding second words of lines (difference: 0.077)
  • 0.433 versus 0.470 excluding last words of lines (difference: 0.037)
  • 0.430 versus 0.502 excluding second-to-last words of lines (difference: 0.072)

Curiously, the pattern turns out to be a little stronger “internally” than in previous cases (at 0.037, contrasted with the usual ~0.03), while at the same time it’s significantly weaker than in previous cases when all positions are factored in (at 0.071, contrasted with the usual ~0.10).

I’ve tried one further method of testing whether rightwardness tendencies apply continuously throughout a line or are limited to a few specific points in it: namely, measuring the ratio between tokens of two contrastive phenomena at each successive line position to see if it changes steadily or abruptly.  Of course, organizing words into discrete categories by position is problematic because of differences in line length.  However, we have a couple different options for handling this.

One option is to take our fractional measures of line position in a range from zero to one, multiply them by some factor, and then round each of them to the nearest integer.  The resulting groups will vary in size, but we can normalize for that in our subsequent calculations.  If we divide the line into ten groups as I’ve described, we find that they contain 4134, 2489, 3294, 3152, 2223, 4244, 3152, 3294, 2489, and 4107 words respectively.  We can then count the tokens of words beginning with ch and Sh that appear in each of these positional groups and divide the sums by their groups’ token counts to determine what fraction of total words they comprise within each position.  And we can also calculate the ratio between the values obtained for ch and Sh to find out how common words beginning with these glyphs are relative to each other in each position.  Here are the results I got from doing this.

In the graph on the right we can see that, on average, the ratio of ch words to Sh words increases progressively throughout lines, although not at a uniform rate.  In the graph on the left, we can see what’s happening in more detail.  Words beginning with both glyphs are disproportionately rare at the beginnings of lines, but they both ascend to a peak immediately afterwards.  Sh words then drop to a lower level than ch words, after which both sets of words become gradually less prevalent as the line progresses rightward.  During this phase, Sh and ch lose their shares of total word count at what looks like about the same absolute rate, but the effect on Sh is proportionally greater, and the ratio of ch words to Sh words increases accordingly, a trend that accelerates towards the end of the line.
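For anyone who wants to try the binning step, here is a minimal Python sketch of the procedure described above. It rests on several assumptions that are not spelled out in the text: lines arrive as lists of word strings (a hypothetical input format), fractional position is taken as the word's index divided by one less than the line length, the multiplying factor is nine (which yields ten groups numbered 0 to 9), and halves are rounded upward. The function names are made up for illustration.

```python
from collections import Counter

PREFIXES = ("ch", "Sh")

def bin_tokens(lines, n_bins=10):
    """Tally all tokens, and ch-/Sh-initial tokens, by rounded line position."""
    totals = Counter()
    by_prefix = {p: Counter() for p in PREFIXES}
    for words in lines:
        if len(words) < 2:
            continue  # a one-word line has no defined fractional position
        for i, word in enumerate(words):
            frac = i / (len(words) - 1)              # assumed: 0.0 = first word, 1.0 = last
            group = int(frac * (n_bins - 1) + 0.5)   # multiply, then round half up
            totals[group] += 1
            for p in PREFIXES:
                if word.startswith(p):
                    by_prefix[p][group] += 1
    return totals, by_prefix

def shares_and_ratio(totals, by_prefix, n_bins=10):
    """Per group: (group, ch share, Sh share, ch/Sh ratio); None where undefined."""
    rows = []
    for g in range(n_bins):
        n = totals[g]
        ch = by_prefix["ch"][g] / n if n else None
        sh = by_prefix["Sh"][g] / n if n else None
        ratio = ch / sh if ch is not None and sh else None
        rows.append((g, ch, sh, ratio))
    return rows

# Tiny made-up input, just to show the shape of the output.
demo_lines = [
    ["Shol", "chol", "daiin", "chor"],
    ["daiin", "Shey", "okal", "chey"],
]
totals, by_prefix = bin_tokens(demo_lines)
rows = shares_and_ratio(totals, by_prefix)
for row in rows:
    print(row)
```

Dividing each count by its group's total is what normalizes for the unequal group sizes; the ratio column is then independent of how many words happen to fall in each group.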

Another option we have here is to assess lines containing different quantities of word tokens separately from one another, and then to compare the outcomes.  That is, we can compare the first, second, third, fourth, and fifth words in five-word lines; and then, separately, the first, second, third, fourth, fifth, and sixth words in six-word lines; and so on and so forth; and then see how the outcomes match up with each other.  Here are results I obtained in that way by counting all words beginning with Sh and ch in the sets of lines ranging in length from five words to ten words, formatted in the same way as above.

From this second display we can see that the leftward “peak” affects the second word in the line specifically, and also that its average effect on Sh words varies noticeably with line length: slight at five words, moderate at six and seven words, and equal to that of ch words at eight words and up.  Of course, when Sh doesn’t rise to as high a peak to begin with, it also has less far to come down afterwards.  The ratio of ch words to Sh words trends upwards overall with increasing rightwardness for all line lengths, although not as consistently as it did above when we drew on a greater amount of data.  The ratio curves for most line lengths feature occasional dips that take them temporarily in the “wrong” direction, including a particularly strong dip at the fifth word in eight-word lines that would be worth exploring further.  But out of twenty-seven individual adjacencies between “internal” word positions shown here, twenty (74%) show a point-to-point increase in the ratio of ch word tokens to Sh word tokens rather than a decrease.  For every line length, the ratio also increases specifically from the second word to the third word, from the second word to the second-to-last word, from the third word to the second-to-last word, and from the third word to the third-to-last word (where applicable); you can verify all of these observations by comparing these points visually in the above graphs.  Meanwhile, in one case (five-word lines) the ratio actually decreases from the second-to-last to the last position, while the gradient of the ratio curve decreases there in another case (six-word lines).  These curves are doubtless “noisier” than the one in my previous experiment, but I believe they still show ample evidence of a continuous trend across the line rather than one limited to two or three fixed points.
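The per-line-length comparison can be sketched in Python roughly as follows. This is only an illustration under assumptions: lines are lists of word strings, the helper names are hypothetical, and "internal" is taken to mean every position except the first and last word. Under that reading, line lengths five through ten give 2 + 3 + 4 + 5 + 6 + 7 = 27 adjacent internal comparisons, which matches the count quoted above.

```python
def ratio_by_position(lines, length, a="ch", b="Sh"):
    """Ratio of a-initial to b-initial tokens at each word position,
    using only lines that contain exactly `length` words."""
    a_counts = [0] * length
    b_counts = [0] * length
    for words in lines:
        if len(words) != length:
            continue
        for i, word in enumerate(words):
            if word.startswith(a):
                a_counts[i] += 1
            elif word.startswith(b):
                b_counts[i] += 1
    # None marks positions where the ratio is undefined (no b tokens).
    return [x / y if y else None for x, y in zip(a_counts, b_counts)]

def internal_increases(ratios):
    """Count rises between adjacent internal positions.
    Returns (number of rises, number of comparisons made)."""
    internal = ratios[1:-1]  # drop first and last word positions
    pairs = [(p, q) for p, q in zip(internal, internal[1:])
             if p is not None and q is not None]
    return sum(q > p for p, q in pairs), len(pairs)

# Made-up ratio curve for a six-word line, purely for illustration.
demo_ratios = [None, 1.0, 1.5, 1.2, 2.0, 3.0]
demo_result = internal_increases(demo_ratios)
print(demo_result)

# Number of internal adjacencies available across line lengths 5 to 10.
total_adjacencies = sum(n - 3 for n in range(5, 11))
print(total_adjacencies)
```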

The distinctive effects on the first and second words which we see here are interesting, but they’ve been anticipated to a degree by past observations.  Prescott Currier commented at an early date on the disproportionate line-initial rarity of words starting with ch and Sh, and eight of the ten words Emma May Smith and Marco Ponzi found to be disproportionately prevalent in second position in Quire 20 begin with ch or Sh, which seems compatible with a second-word “peak” for such words in general.

Thus, it’s the apparently continuous change in statistics over the latter part of the line to which I most want to draw attention.  With each and every step rightward, it seems that the relative probability of ch words goes a little up and the relative probability of Sh words goes a little down.  Unless I’m mistaken, this pattern is tied not to any one discrete position in the line, but to a continuous progression rightward.

And what about the other contrastive pairs of features we’ve examined?  Here’s a quick graphical analysis along similar lines comparing the sets of all words beginning with o and qo.

And here’s another for the sets of all words ending -al and -ol.

These look to me very much like further cases of trends developing continuously and progressively over a long stretch of the line.  And once again, their continuous character doesn’t appear to be an illusory artifact of the process of combining data from lines of multiple lengths, since we see similar “shapes” implicating multiple points when we analyze nine-word lines in isolation for the same contrastive pairs of word categories.

For what it’s worth, the rightwardness patterns I’ve described so far also appear to affect both Currier A and Currier B to about the same extent, judging from a check of just a few representative test pairs run separately against pages in both “languages.”  The figures below represent average relative rightwardness based on all positions (including line-initial and line-final).  The individual figures vary considerably from “language” to “language,” but the direction of the difference stays the same.

  • Sh- (A: 0.438, B: 0.439), ch- (A: 0.529, B: 0.534)
  • or (A: 0.415, B: 0.473), ar (A: 0.545, B: 0.570)
  • dol (A: 0.512, B: 0.451), dal (A: 0.674, B: 0.623)
  • qo- (A: 0.426, B: 0.487), o- (A: 0.513, B: 0.557)

The patterns also seem to persist if we limit ourselves just to the notoriously homogeneous Quire 13:

  • Sh- (0.467), ch- (0.541)
  • or (0.501), ar (0.575)
  • dol (0.459), dal (0.608)
  • qo- (0.465), o- (0.579)

Or to just the first three quires:

  • Sh- (0.430), ch- (0.515)
  • or (0.382), ar (0.604)
  • dol (0.620), dal (0.749)
  • qo- (0.401), o- (0.492)

In this light, we might also revisit the contrast Sheo / cheo, which was the sole exception in the first set of examples I presented above.  If we limit our analysis just to Currier A, it isn’t actually an exception: Sheo scores 0.438 for average relative rightwardness with 15 tokens, while cheo scores 0.462 with 14 tokens.  The exception arises only in Currier B, where Sheo scores 0.455 with 11 tokens and cheo scores 0.431 with 19 tokens.  It turns out that we can trace this particular anomaly back to several far-leftward appearances of cheo on folios 113-115.


Endwardness

So far we’ve been examining the positions of words within lines.  But what happens if we study their sequential placement within paragraphs in similar terms?  We can refer to this other metric as endwardness, with startward and endward as opposites, to distinguish it clearly from rightwardness and leftwardness.  Of course, word positions within lines are also commonly discussed in reference to “beginnings” and “ends,” but since lines—or at least the ones I’m examining here—are invariably also parts of paragraphs that extend beyond them in one or another direction, I believe the startward / endward terminology is better reserved for the matter of sequencing within whole paragraphs.

I’ve tried a few different methods for calculating endwardness.  One is to calculate the number of each word within a paragraph (minus one) divided by the number of words in the paragraph (minus one), and then to average the results for all tokens of a word or of words with some shared characteristic, yielding values in a range from zero to one, like our rightwardness values.  The result in this case ignores the absolute number of words in a paragraph, treating the thirtieth word in a 300-word paragraph the same as the third word in a thirty-word paragraph.  The “expected” default value is 0.500, so a value below this threshold will indicate a preference for a more relatively startward position within a paragraph, while a value above this threshold will indicate a preference for a more relatively endward position.
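As a rough illustration of this first method, here is a short Python sketch. It assumes each paragraph is available as a flat list of word strings in reading order (a hypothetical input format), and the function name and the `matches` callback are invented for the example.

```python
def relative_endwardness(paragraphs, matches):
    """Average of (word number - 1) / (paragraph length - 1) over every
    token for which matches(word) is true; None if there are no tokens."""
    scores = []
    for words in paragraphs:
        if len(words) < 2:
            continue  # position is undefined in a one-word paragraph
        for i, word in enumerate(words):  # i already equals word number - 1
            if matches(word):
                scores.append(i / (len(words) - 1))
    return sum(scores) / len(scores) if scores else None

# Made-up five-word paragraph: the Sh word is second, the ch word fourth.
demo = [["pol", "Shol", "daiin", "chol", "dam"]]
sh_score = relative_endwardness(demo, lambda w: w.startswith("Sh"))
ch_score = relative_endwardness(demo, lambda w: w.startswith("ch"))
print(sh_score, ch_score)
```

Passing a predicate rather than a fixed string makes it easy to score a single word, a prefix, or a suffix with the same function.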

Another method I’ve tried entails calculating the average absolute word number within a paragraph; this way, if a given word turns up once as the twentieth word in a paragraph and once as the fortieth word in a paragraph, the result will always be (20+40)/2=30, regardless of the lengths of the paragraphs.  The thirtieth word in a 300-word paragraph ends up treated the same as the thirtieth word in a 30-word paragraph.  The average absolute numerical position of a word within a paragraph is 35.2, such that a value below this threshold will indicate a tendency towards greater absolute startwardness and a value above this threshold will indicate a tendency towards greater absolute endwardness.

We shouldn’t necessarily expect the results of the two methods I’ve described to match each other, since they target different phenomena (relative distance from beginning of paragraph versus absolute distance from beginning of paragraph).  Still, we should take note of where and how they diverge, so let’s give some thought as to how we can do that.  If we set the first method’s 0.500 (midway through a paragraph) equal to the second method’s 35.2 (the average absolute word number within a paragraph), we can try to predict the second result based on the first by multiplying the first value by 70.4 (35.2 times two).  We can then calculate how much the actual value differs from the predicted value.  Although I’m using the first result as the point of reference here, I don’t necessarily mean to privilege it over the second result (although I will in fact go on to conclude that it’s more reliable as a measure of positional preference per se).  My goal at this point is just to expose any differences between the two measures.
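The second method and the prediction step can be sketched together in Python. The function names are hypothetical, paragraphs are again assumed to be flat lists of words, and 35.2 is the corpus-wide average quoted above. One caveat: recomputing predictions from the rounded three-decimal scores printed in the lists may differ in the last digit from the listed values, which were presumably derived from unrounded figures.

```python
MEAN_ABSOLUTE_POSITION = 35.2  # average word number within a paragraph, from the text

def absolute_endwardness(paragraphs, matches):
    """Average 1-based word number over all tokens satisfying matches(word)."""
    positions = [i + 1
                 for words in paragraphs
                 for i, word in enumerate(words)
                 if matches(word)]
    return sum(positions) / len(positions) if positions else None

def predicted_absolute(relative_score, mean_position=MEAN_ABSOLUTE_POSITION):
    """Scale a relative score so that 0.500 maps onto the mean position
    (a factor of 70.4 when the mean is 35.2)."""
    return relative_score * 2 * mean_position

def deviation(relative_score, absolute_score):
    """Actual absolute score minus the score predicted from the relative one."""
    return absolute_score - predicted_absolute(relative_score)

# The (20 + 40) / 2 = 30 example: same word as 20th and 40th word of two
# paragraphs of very different lengths (made-up filler words).
para_short = ["w"] * 19 + ["x"] + ["w"] * 5
para_long = ["w"] * 39 + ["x"] + ["w"] * 260
avg_position = absolute_endwardness([para_short, para_long], lambda w: w == "x")
print(avg_position)

# Two relative/absolute pairs quoted in the text: 0.136 with 11.1, 0.183 with 12.6.
p_pred, p_dev = round(predicted_absolute(0.136), 1), round(deviation(0.136, 11.1), 1)
f_pred, f_dev = round(predicted_absolute(0.183), 1), round(deviation(0.183, 12.6), 1)
print(p_pred, p_dev, f_pred, f_dev)
```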

Either approach ends up confirming certain common observations, for a start.  Thus, words beginning with p and f yield low values regardless of which method we use, reflecting the oft-noted preference of such words for the first lines of paragraphs.  But words starting in other ways also display weaker preferences for more startward or endward positions within paragraphs, as the following results show (with the more discrepant pairs of results highlighted in red).

  • p- = 0.136 [11.1, predicted 9.6, deviation +1.5]
  • f- = 0.183 [12.6, predicted 12.9, deviation -0.3]
  • t- = 0.409 [30.0, predicted 28.8, deviation +1.2]
  • Sh- = 0.459 [36.6, predicted 32.3, deviation +4.3]
  • k- = 0.460 [28.8, predicted 32.4, deviation -3.6]
  • qo- = 0.478 [37.4, predicted 33.7, deviation +3.7]
  • o- = 0.506 [34.1, predicted 35.6, deviation -1.5]
  • d- = 0.522 [33.7, predicted 36.7, deviation -3.0]
  • r- = 0.523 [38.2, predicted 36.8, deviation +1.4]
  • y- = 0.529 [32.3, predicted 37.2, deviation -4.9]
  • l- = 0.532 [45.3, predicted 37.4, deviation +7.9]
  • s- = 0.546 [38.8, predicted 38.4, deviation +0.4]
  • ch- (no inserted gallows) = 0.549 [36.9, predicted 38.6, deviation -1.7]
  • a- = 0.552 [40.1, predicted 38.9, deviation +1.2]

Of all the entries in this list, only o- seems reasonably neutral according to the first method, yielding a score close to the “expected” 0.500; and it’s also the most neutral according to the second method, with its result of 34.1 differing from the “expected” 35.2 by only 1.1.  (I note in passing that o is also overwhelmingly the most common first glyph of “labels” which lack a paragraphic context altogether.)   We can likewise run calculations for words ending with particular glyphs:

  • -p = 0.195 [10.7, predicted 13.7, deviation -3.0]
  • -f = 0.320 [23.7, predicted 22.5, deviation +1.2]
  • -d = 0.469 [30.3, predicted 33.0, deviation -2.7]
  • -k = 0.476 [29.3, predicted 33.5, deviation -4.2]
  • -y = 0.487 [35.6, predicted 34.3, deviation +1.3]
  • -r = 0.489 [33.4, predicted 34.4, deviation -1.0]
  • -o = 0.498 [30.9, predicted 35.1, deviation -4.2]
  • -s = 0.501 [32.0, predicted 35.3, deviation -3.3]
  • -l = 0.508 [37.0, predicted 35.8, deviation +1.2]
  • -t = 0.523 [35.9, predicted 36.8, deviation -0.9]
  • -n = 0.529 [35.7, predicted 37.2, deviation -1.5]
  • -g = 0.562 [32.9, predicted 39.6, deviation -6.7]
  • -m = 0.564 [38.0, predicted 39.7, deviation -1.7]

Is the spread in the statistics for glyphs other than p and f likely to be meaningful?  We could try to calculate the probability of this, but to my eye, many of the differences appear so large as to render any formal investigation of statistical significance rather moot.  Consider: if we factor in every single word in every paragraph for the whole Voynich Manuscript, the average word beginning with l appears more than ten words further from the start of a paragraph than the average word beginning with o does, and fifteen words further from the start than the average word beginning with t.  Words ending with g and m (which are notorious for ending lines) also appear on average at later relative positions in paragraphs than words ending with any other common glyphs by over 3% of total paragraph length, and later than the average word ending with d by over 9%.  Meanwhile, the deviations between the results of the two methods of calculation suggest that the relationship between absolute and relative distances from the beginnings of paragraphs may not be straightforward: some phenomena might correlate with the former, others with the latter, and yet others with both in tandem.

Let’s take a closer look at one pair of glyphs which we’ve already associated with a significant contrast in rightwardness.  The aggregate measures of relative and absolute endwardness for words beginning with Sh are respectively 0.459 and 36.6, while for words beginning with ch they’re 0.549 and 36.9.  The difference in their relative endwardness is comparatively large with respect to the overall spread of scores, while the difference in their absolute endwardness is comparatively small.

If we analyze the same minimal pairs beginning with Sh and ch for endwardness which we examined previously for rightwardness, we find a similar pattern.  After each word below, I give its average relative endwardness score, its average absolute endwardness score (in square brackets), and the total number of tokens.  I’ve marked anomalous values based on absolute position in blue to make it easier to tell them apart at a glance from anomalous values based on relative position.

  • Shaiin (0.376 [16.4]; 19), chaiin (0.594 [28.5]; 42)
  • Shar (0.545 [55.9]; 25), char (0.476 [33.4]; 62)
  • ShcKhy (0.509 [53.8]; 56), chcKhy (0.557 [44.7]; 129)
  • ShcThy (0.546 [59.5]; 32), chcThy (0.572 [42.5]; 76)
  • Shdy (0.446 [23.7]; 43), chdy (0.500 [30.4];  127)
  • Sheal (0.599 [62.4]; 14), cheal (0.560 [67.9]; 25)
  • Shear (0.460 [54.0]; 22), chear (0.629 [53.7]; 42)
  • ShecKhy (0.500 [50.6]; 30), checKhy (0.541 [44.6]; 47)
  • ShecThy (0.542 [73.9]; 19), checThy (0.573 [70.6]; 27)
  • Shedy (0.459 [45.7]; 371), chedy (0.564 [47.9]; 436)
  • Sheedy (0.482 [38.8]; 73), cheedy (0.574 [44.7]; 54)
  • Sheey (0.488 [39.7]; 124), cheey (0.567 [59.2]; 145)
  • Sheky (0.473 [38.4]; 31), cheky (0.591 [39.0]; 55)
  • Sheo (0.395 [28.2]; 26), cheo (0.506 [35.3]; 33)
  • Sheody (0.445 [20.9]; 43), cheody (0.622 [28.2]; 78)
  • Sheol (0.471 [39.0]; 97), cheol (0.587 [50.1]; 140)
  • Sheor (0.463 [33.6]; 42), cheor (0.591 [43.4]; 74)
  • Shey (0.453 [44.1]; 230), chey (0.571 [44.0]; 283)
  • Sho (0.537 [29.9]; 98), cho (0.540 [31.5]; 40)
  • Shodaiin (0.425 [28.6]; 24), chodaiin (0.604 [39.4]; 44)
  • Shody (0.515 [31.0]; 52), chody (0.589 [35.3]; 85)
  • Shol (0.427 [24.6]; 165), chol (0.551 [32.0]; 321)
  • Shor (0.359 [20.1]; 88), chor (0.510 [25.3]; 192)
  • Shy (0.485 [34.3]; 76), chy (0.517 [32.2]; 117)

These data reinforce the conclusion that ch words categorically tend to display greater endwardness than Sh words, but the pattern is much stronger with the results of the first method than with the results of the second method, showing that it tracks better with relative endwardness than absolute endwardness, just as the aggregate figures would predict.  With the second method, there are eight exceptions, such that only ~67% of cases conform; but with the first method, there are only two exceptions in twenty-four word pairs, and ~92% of cases conform to the pattern.
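The tallies can be re-checked mechanically from the list above. The Python sketch below simply transcribes the twenty-four pairs (keyed by the part of the word that follows Sh or ch) and counts how often the ch variant scores higher than the Sh variant under each method; the variable names are arbitrary.

```python
# (shared ending, Sh relative, Sh absolute, ch relative, ch absolute),
# transcribed from the list above.
PAIRS = [
    ("aiin", 0.376, 16.4, 0.594, 28.5), ("ar", 0.545, 55.9, 0.476, 33.4),
    ("cKhy", 0.509, 53.8, 0.557, 44.7), ("cThy", 0.546, 59.5, 0.572, 42.5),
    ("dy", 0.446, 23.7, 0.500, 30.4), ("eal", 0.599, 62.4, 0.560, 67.9),
    ("ear", 0.460, 54.0, 0.629, 53.7), ("ecKhy", 0.500, 50.6, 0.541, 44.6),
    ("ecThy", 0.542, 73.9, 0.573, 70.6), ("edy", 0.459, 45.7, 0.564, 47.9),
    ("eedy", 0.482, 38.8, 0.574, 44.7), ("eey", 0.488, 39.7, 0.567, 59.2),
    ("eky", 0.473, 38.4, 0.591, 39.0), ("eo", 0.395, 28.2, 0.506, 35.3),
    ("eody", 0.445, 20.9, 0.622, 28.2), ("eol", 0.471, 39.0, 0.587, 50.1),
    ("eor", 0.463, 33.6, 0.591, 43.4), ("ey", 0.453, 44.1, 0.571, 44.0),
    ("o", 0.537, 29.9, 0.540, 31.5), ("odaiin", 0.425, 28.6, 0.604, 39.4),
    ("ody", 0.515, 31.0, 0.589, 35.3), ("ol", 0.427, 24.6, 0.551, 32.0),
    ("or", 0.359, 20.1, 0.510, 25.3), ("y", 0.485, 34.3, 0.517, 32.2),
]

# A pair "conforms" when the ch variant has the higher score.
relative_conforming = sum(ch_rel > sh_rel for _, sh_rel, _, ch_rel, _ in PAIRS)
absolute_conforming = sum(ch_abs > sh_abs for _, _, sh_abs, _, ch_abs in PAIRS)

print(relative_conforming, round(100 * relative_conforming / len(PAIRS)))
print(absolute_conforming, round(100 * absolute_conforming / len(PAIRS)))
```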

So it’s looking very much as though words beginning with Sh tend to appear “earlier” than corresponding words beginning with ch both within lines and within paragraphs, as long as we evaluate these positions in relative terms rather than absolute ones.

Next, here’s an analysis of the same kind carried out on my earlier list of minimal pairs of words contrasting o and a:

  • ol (0.528 [41.0]; 363), al (0.566 [41.1]; 137)
  • or (0.520 [35.4]; 250), ar (0.543 [41.5]; 233)
  • chol (0.551 [32.0]; 321); chal (0.550 [50.6]; 40)
  • chor (0.510 [25.3]; 192); char (0.476 [33.4]; 62)
  • choiin (0.378 [13.1]; 11); chaiin (0.594 [28.5]; 42)
  • dol (0.540 [38.7]; 86); dal (0.517 [39.1]; 171)
  • dor (0.553 [33.0]; 58); dar (0.498 [37.3]; 248)
  • doiin (0.466 [23.9]; 12); daiin (0.528 [30.1]; 765)
  • kol (0.482 [33.6]; 20), kal (0.522 [54.8]; 14)
  • kor (0.607 [32.5]; 17), kar (0.561 [45.3]; 39)
  • tol (0.450 [26.8]; 29), tal (0.478 [38.2]; 12)
  • tor (0.585 [34.8]; 17), tar (0.437 [37.4]; 29)
  • okol (0.519 [30.1]; 54), okal (0.500 [34.2]; 117)
  • okoly (0.362 [11.3]; 3), okaly (0.557 [47.3]; 16)
  • okoldy (0.580 [32.6]; 7), okaldy (0.460 [33.7]; 10)
  • okor (0.544 [30.3]; 26), okar (0.521 [37.9]; 110)
  • otol (0.535 [31.3]; 61), otal (0.558 [37.6]; 108)
  • otor (0.433 [26.5]; 32), otar (0.524 [36.3]; 105)
  • ykol (0.639 [36.1]; 12), ykal (0.520 [50.3]; 12)
  • ykor (0.589 [28.1]; 8), ykar (0.654 [42.0]; 32)
  • ytol (0.483 [25.7]; 15), ytal (0.579 [55.6]; 13)
  • ytor (0.442 [27.1]; 12), ytar (0.574 [37.0]; 22)

With the first method, measuring relative endwardness, no pattern emerges at all this time: out of twenty-two pairs, eleven go one way and eleven go the other way.  But with the second method, measuring absolute endwardness, there’s only a single exception to the rule that a variants display greater average endwardness than o variants, with 21 pairs (95%) conforming to this pattern, and even the exception is actually almost a tie (41.0 versus 41.1).  The differential pattern for o / a turns out to be even more consistent in paragraphs than we found it to be in lines.  However, it emerges only using a method of calculation that, in the case of Sh / ch, gave us a weaker result and that—for reasons I’ll get into below—I consider of dubious reliability.

Finally, here’s an equivalent set of results for my earlier list of minimal pairs beginning with qo and o.

  • qokain (0.511 [52.7]; 278), okain (0.550 [43.5]; 133)
  • qotain (0.460 [49.9]; 59), otain (0.620 [46.9]; 92)
  • qodain (0.578 [23.5]; 11), odain (0.604 [29.9]; 16)
  • qokaiin (0.530 [37.7]; 264), okaiin (0.573 [35.7]; 203)
  • qotaiin (0.458 [28.8]; 78), otaiin (0.553 [29.0]; 137)
  • qodaiin (0.604 [28.6]; 43), odaiin (0.541 [39.2]; 53)
  • qokair (0.471 [43.9]; 17), okair (0.578 [33.7]; 18)
  • qoeey (0.612 [33.6]; 13), oeey (0.404 [26.6]; 5)
  • qoeedy (0.389 [25.2]; 18), oeedy (0.356 [14.0]; 6)
  • qokal (0.541 [49.1]; 182), okal (0.500 [34.2]; 117)
  • qotal (0.468 [37.5]; 56), otal (0.558 [37.6]; 108)
  • qodal (0.651 [43.0]; 7), odal (0.542 [23.6]; 10)
  • qokam (0.554 [40.9]; 25), okam (0.610 [35.7]; 28)
  • qotam (0.564 [30.8]; 11), otam (0.583 [40.4]; 44)
  • qokar (0.518 [44.1]; 147), okar (0.521 [37.9]; 110)
  • qotar (0.429 [25.9]; 61), otar (0.524 [36.3]; 105)
  • qodar (0.591 [43.1]; 10), odar (0.378 [24.7]; 19)
  • qokol (0.504 [30.2]; 88), okol (0.519 [30.1]; 54)
  • qotol (0.495 [27.6]; 40), otol (0.535 [31.3]; 61)
  • qokor (0.548 [38.7]; 31), okor (0.544 [30.3]; 26)
  • qotor (0.395 [19.0]; 24), otor (0.433 [26.5]; 32)
  • qokedy (0.446 [40.4]; 271); okedy (0.511 [38.5]; 107)
  • qotedy (0.391 [29.9]; 86); otedy (0.456 [32.2]; 137)
  • qokeedy (0.480 [47.6]; 304); okeedy (0.500 [35.7]; 101)
  • qoteedy (0.459 [39.3]; 74); oteedy (0.477 [30.3]; 89)
  • qokeey (0.496 [44.1]; 298); okeey (0.534 [38.3]; 154)
  • qoteey (0.498 [33.9]; 42); oteey (0.498 [37.4]; 103)
  • qokey (0.534 [49.4]; 102); okey (0.552 [46.5]; 54)
  • qotey (0.492 [35.5]; 22); otey (0.492 [43.1]; 39)
  • qotchdy (0.393 [19.2]; 20); otchdy (0.458 [25.9]; 23)
  • qokchdy (0.442 [27.9]; 49); okchdy (0.432 [34.2]; 17)
  • qopchdy (0.233 [8.1]; 16), opchdy (0.299 [10.4]; 17)
  • qotchedy (0.382 [22.9]; 24), otchedy (0.465 [20.5]; 31)
  • qokchedy (0.412 [18.1]; 40), okchedy (0.515 [23.7]; 21)
  • qopchedy (0.182 [10.3]; 30), opchedy (0.218 [12.9]; 46)
  • qokchol (0.468 [18.5]; 17), okchol (0.643 [32.4]; 11)
  • qotchol (0.465 [22.7]; 12), otchol (0.482 [19.6]; 27)
  • qokchor (0.425 [16.5]; 8), okchor (0.583 [26.2]; 15)
  • qotchor (0.555 [28.3]; 12), otchor (0.391 [17.1]; 16)
  • qokchy (0.487 [24.7]; 63), okchy (0.614 [34.3]; 25)
  • qotchy (0.516 [27.6]; 63), otchy (0.461 [20.8]; 40)
  • qopchy (0.189 [9.3]; 12), opchy (0.218 [8.8]; 9)
  • qotchey (0.462 [30.9]; 19), otchey (0.399 [21.4]; 30)
  • qokchey (0.426 [20.0]; 27), okchey (0.588 [29.0]; 32)
  • qopchey (0.197 [35.8]; 9), opchey (0.225 [16.3]; 27)
  • qoky (0.503 [39.2]; 133), oky (0.499 [31.3]; 85)
  • qoty (0.493 [28.8]; 76), oty (0.493 [30.8]; 99)
  • qody (0.447 [19.4]; 14), ody (0.536 [26.7]; 30)
  • qol (0.581 [59.8]; 110), ol (0.528 [41.0]; 363)
  • qor (0.511 [40.2]; 19), or (0.520 [35.4]; 250)

This time, it’s the second method that doesn’t turn up any pattern at all: twenty-six pairs go one way and twenty-four pairs go the other way.  The first method shows a tendency to deflect in the expected direction, but only weakly and inconsistently compared to the other cases we’ve examined: 31 pairs (62%) conform, 16 pairs (32%) go the other way, and three pairs are exact ties, marked in green.  I present this as an example of results that I’d say teeter on the brink of inconclusiveness, to help put the other cases in perspective.

Based on the foregoing, I believe the following three claims are defensible:

  • For minimal pairs of words beginning with Sh and ch, tokens of the ch variant are, on average, both more relatively rightward and more relatively endward than tokens of the Sh variant.
  • For minimal pairs of words contrasting o and a, tokens of the a variant are, on average, both more relatively rightward and more absolutely endward than tokens of the o variant.
  • For minimal pairs of words beginning with o and qo, tokens of the o variant are, on average, more relatively rightward than tokens of the qo variant.

Downwardness

How surprised should we be to find correlations between rightwardness and endwardness?  In itself, it might not seem as though the fact that phenomena appear earlier in lines should cause them also to appear earlier within paragraphs, since the first word of each new line appears later in its paragraph than the last word of the previous line.  Nevertheless, it’s conceivable that sufficiently strong differential rightwardness patterns could cause differential endwardness patterns as a secondary effect, something like this:

This that this this this those this
this that this this this this those
that this this this this those this
this that this this those this this

If we want to exclude all traces of rightwardness from our assessment of vertical positioning so as to avoid any risk of contamination between parameters, we can substitute a metric of downwardness for endwardness, basing our calculations not on sequential word numbers within paragraphs but exclusively on line numbers, either divided by the total number of lines in a paragraph (relative downwardness) or taken without modification (absolute downwardness).  To see how much of a difference this makes, let’s measure our usual minimal pairs of words beginning with Sh and ch for average relative downwardness, reckoned for each token as one less than its line number divided by one less than the total lines in the paragraph.

  • Shaiin (0.359), chaiin (0.542)
  • Shar (0.543), char (0.440)
  • ShcKhy (0.486), chcKhy (0.525)
  • ShcThy (0.518), chcThy (0.532)
  • Shdy (0.383), chdy (0.450)
  • Sheal (0.575), cheal (0.544)
  • Shear (0.390), chear (0.615)
  • ShecKhy (0.482), checKhy (0.521)
  • ShecThy (0.519), checThy (0.507)
  • Shedy (0.418), chedy (0.532)
  • Sheedy (0.442), cheedy (0.530)
  • Sheey (0.461), cheey (0.567)
  • Sheky (0.432), cheky (0.584)
  • Sheo (0.331), cheo (0.492)
  • Sheody (0.397), cheody (0.598)
  • Sheol (0.446), cheol (0.581)
  • Sheor (0.435), cheor (0.582)
  • Shey (0.424), chey (0.560)
  • Sho (0.526), cho (0.502)
  • Shodaiin (0.367), chodaiin (0.564)
  • Shody (0.480), chody (0.565)
  • Shol (0.405), chol (0.543)
  • Shor (0.318), chor (0.483)
  • Shy (0.450), chy (0.482)
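To see how downwardness differs from endwardness, it may help to run both measures on the first toy paragraph shown earlier in this section. The Python sketch below is only an illustration: it treats the four rows as one paragraph, takes "that" and "those" as the two contrasting words, and uses hypothetical function names.

```python
TOY = """This that this this this those this
this that this this this this those
that this this this this those this
this that this this those this this"""

toy_lines = [row.split() for row in TOY.splitlines()]

def relative_downwardness(lines, target):
    """Mean of (line number - 1) / (line count - 1) over tokens of target."""
    scores = [row / (len(lines) - 1)
              for row, words in enumerate(lines)  # row equals line number - 1
              for word in words
              if word == target]
    return sum(scores) / len(scores)

def relative_endwardness(lines, target):
    """Mean of (word number - 1) / (word count - 1), ignoring line breaks."""
    words = [w for line in lines for w in line]
    scores = [i / (len(words) - 1) for i, w in enumerate(words) if w == target]
    return sum(scores) / len(scores)

down_that = relative_downwardness(toy_lines, "that")
down_those = relative_downwardness(toy_lines, "those")
end_that = relative_endwardness(toy_lines, "that")
end_those = relative_endwardness(toy_lines, "those")

print(round(down_that, 3), round(down_those, 3))
print(round(end_that, 3), round(end_those, 3))
```

In this toy case both words occur once per line, so their downwardness comes out identical, while their endwardness differs purely because one sits further right than the other. That is the kind of contamination the line-based metric is meant to remove.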

Shar / char and Sheal / cheal were also exceptions for average relative endwardness, while ShecThy / checThy and Sho / cho were not.  Thus, when we factor out the sequencing of words within lines, we get a couple more exceptions, but we still end up with 20 conforming pairs out of 24, or 83% conformity.  Note, incidentally, that a neutral value for average relative downwardness seems to be around 0.461, not 0.5, probably because the last lines of paragraphs typically contain fewer words than other lines.  Here’s a scatterplot of the same downwardness data for minimal pairs with values for Sh assigned to one axis and values for ch assigned to the other; as usual, the black line shows the line along which the two values would be equal.

Meanwhile, for the set of all word tokens starting with Sh, average relative downwardness is 0.424, while for all word tokens starting with ch, it’s 0.514—a difference of 0.090.  Is this statistically significant?  In an effort to find out, I used the same methodology as before by calculating average relative downwardness for ten pairs of groupings of the same quantities of word tokens drawn at random from the whole text and got this:

  • 0.464 vs 0.447 (difference: 0.017)
  • 0.458 vs 0.471 (difference: 0.013)
  • 0.451 vs 0.461 (difference: 0.010)
  • 0.463 vs 0.453 (difference: 0.010)
  • 0.454 vs 0.463 (difference: 0.009)
  • 0.460 vs 0.464 (difference: 0.004)
  • 0.466 vs 0.468 (difference: 0.002)
  • 0.458 vs 0.459 (difference: 0.001)
  • 0.464 vs 0.464 (difference: 0.000)
  • 0.460 vs 0.460 (difference: 0.000)

The maximum difference among paired results is 0.017.  The greatest difference between any pair of figures drawn from the two columns (0.451 versus 0.471) would be 0.020, which is to say that we should expect a difference of that magnitude to arise randomly, with token sets of this size, only once in a hundred times.  That’s much lower than our figure of 0.090.
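The arithmetic here can be re-checked from the ten pairs listed above. The short Python sketch below transcribes those figures and computes the largest within-pair difference, the largest difference across all one hundred left-column and right-column combinations, and the observed Sh / ch gap; the variable names are arbitrary.

```python
# The ten random-grouping pairs, transcribed from the list above.
RANDOM_PAIRS = [
    (0.464, 0.447), (0.458, 0.471), (0.451, 0.461), (0.463, 0.453),
    (0.454, 0.463), (0.460, 0.464), (0.466, 0.468), (0.458, 0.459),
    (0.464, 0.464), (0.460, 0.460),
]

left = [a for a, _ in RANDOM_PAIRS]
right = [b for _, b in RANDOM_PAIRS]

# Largest difference within any single listed pair.
max_paired = max(abs(a - b) for a, b in RANDOM_PAIRS)

# Largest difference over every left/right combination (10 x 10 = 100).
max_cross = max(abs(a - b) for a in left for b in right)
n_combinations = len(left) * len(right)

# Observed gap between all ch-initial and all Sh-initial tokens.
observed = 0.514 - 0.424

print(round(max_paired, 3), round(max_cross, 3), n_combinations, round(observed, 3))
```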

I also repeated my other, complementary type of experiment by taking only the sets of word tokens beginning ch and Sh, combining them, randomizing their order five times, and each time calculating average downwardness for the first 5325 values and the remainder; and also for the last 5325 values and the remainder.  I got a higher maximum difference of 0.021 this way, but that still falls far below our figure of 0.090.

  • 0.487 vs 0.474 (difference: 0.013); 0.482 vs 0.484 (difference: 0.002)
  • 0.482 vs 0.483 (difference: 0.001); 0.480 vs 0.486 (difference: 0.006)
  • 0.484 vs 0.479 (difference: 0.005); 0.481 vs 0.484 (difference: 0.003)
  • 0.475 vs 0.496 (difference: 0.021); 0.486 vs 0.474 (difference: 0.012)
  • 0.481 vs 0.484 (difference: 0.003); 0.485 vs 0.478 (difference: 0.007)
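A shuffle-and-split check of this kind might look like the following Python sketch. It is written under assumptions: `scores` stands for the pooled per-token downwardness values of the two word sets, `split` is the size of one set (5325 in the experiment above), and the function name, the fixed seed, and the demo data are all invented for illustration.

```python
import random

def shuffle_split_differences(scores, split, trials=5, seed=0):
    """Shuffle the pooled scores `trials` times; each time, compare the mean
    of the first `split` values with the rest, and the mean of the last
    `split` values with the rest. Returns a list of (first_diff, last_diff)."""
    if not 0 < split < len(scores):
        raise ValueError("split must be between 1 and len(scores) - 1")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pool = list(scores)

    def mean(values):
        return sum(values) / len(values)

    results = []
    for _ in range(trials):
        rng.shuffle(pool)
        first_diff = abs(mean(pool[:split]) - mean(pool[split:]))
        last_diff = abs(mean(pool[-split:]) - mean(pool[:-split]))
        results.append((first_diff, last_diff))
    return results

# Made-up scores spread evenly between 0 and 1, standing in for real data.
demo_scores = [i / 999 for i in range(1000)]
demo_results = shuffle_split_differences(demo_scores, split=400)
largest = max(max(pair) for pair in demo_results)
print(largest)
```

If the largest difference produced by random relabeling stays far below the observed gap, the observed gap is unlikely to be an accident of how the tokens happen to be divided.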

So I conclude that the pattern holds meaningfully even after we shift our metric from endwardness to downwardness.  The ch variants really do occur simultaneously further towards the right of lines and further towards the bottom of paragraphs on average than the Sh variants, even though we’d have had no reason to assume they would do both things at once.  We must, I think, be dealing with an average contrastive distribution something like this:

This that this this this this this
this that this this this this those
that this this this this those this
this this this this those this this

In practice, we never see any one word with a distribution quite so neat across the whole manuscript, much less an actual page that looks like this; but sometimes real aggregate distributions approach the model, as we see in the following plot of the relative downwardness and rightwardness of all tokens of chaiin (in blue) and Shaiin (in orange).

I introduced the gray frame above to make it visually easy to factor out the first lines of paragraphs and the first and last words of lines, in case we want to focus on the “internal” situation.  That said, I haven’t experimented with excluding the first lines of paragraphs from the kind of downwardness statistics I’ve presented so far.  This would probably be worth trying, and the main reason I haven’t done so is that the only arguments I’ve seen made about first lines in paragraphs involve the gallows p and f, and those aren’t two of the glyphs I’m investigating here.

But I have applied to successive points in paragraphs the same technique I used above to track the average ratio between token counts at successive points in lines.  In the case of paragraphs, I’ve divided relative vertical positions into six categories containing 6821, 4630, 4528, 7770, 4280, and 4548 tokens respectively.  Here’s what I found for counts of word tokens beginning with Sh and ch.

It looks as though the very tops and bottoms of paragraphs are responsible for the lion’s share of the downwardness differential for these two categories of word.  Words starting with Sh are apparently just as common as words starting with ch at the very tops of paragraphs, but in other vertical positions ch words are about twice as common as Sh words, with the imbalance peaking at the very bottom.   The only other detail I notice is a symmetrical “hump” on the right spanning positions 2-5, which indicates that the ratio of ch words to Sh words also increases slightly as we approach the vertical center of a paragraph.  I haven’t done a lot of this last kind of analysis yet, but my impression from what little I’ve tried is that the last result is fairly typical—that the most significant variation within paragraphs comes at their very tops and bottoms.

When we revisit the list of minimal pairs beginning qo / o, average relative downwardness yields the same lackluster conformity rate as average relative endwardness—31 pairs of 50 conform to the predicted deflection (62%)—although the specific anomalous pairs are different.

  • qokain (0.481), okain (0.517)
  • qotain (0.400), otain (0.579)
  • qodain (0.602), odain (0.558)
  • qokaiin (0.509), okaiin (0.555)
  • qotaiin (0.384), otaiin (0.489)
  • qodaiin (0.600), odaiin (0.514)
  • qokair (0.481), okair (0.540)
  • qoeey (0.599), oeey (0.317)
  • qoeedy (0.383), oeedy (0.242)
  • qokal (0.511), okal (0.452)
  • qotal (0.429), otal (0.490)
  • qodal (0.606), odal (0.508)
  • qokam (0.415), okam (0.519)
  • qotam (0.367), otam (0.475)
  • qokar (0.486), okar (0.482)
  • qotar (0.352), otar (0.459)
  • qodar (0.560), odar (0.322)
  • qokol (0.479), okol (0.485)
  • qotol (0.459), otol (0.476)
  • qokor (0.506), okor (0.545)
  • qotor (0.345), otor (0.347)
  • qokedy (0.412); okedy (0.477)
  • qotedy (0.332); otedy (0.405)
  • qokeedy (0.453); okeedy (0.457)
  • qoteedy (0.423); oteedy (0.418)
  • qokeey (0.479); okeey (0.508)
  • qoteey (0.477); oteey (0.458)
  • qokey (0.522); okey (0.523)
  • qotey (0.485); otey (0.452)
  • qotchdy (0.340); otchdy (0.391)
  • qokchdy (0.361); okchdy (0.363)
  • qopchdy (0.083), opchdy (0.088)
  • qotchedy (0.287), otchedy (0.414)
  • qokchedy (0.361), okchedy (0.453)
  • qopchedy (0.045), opchedy (0.088)
  • qokchol (0.466), okchol (0.586)
  • qotchol (0.434), otchol (0.431)
  • qokchor (0.443), okchor (0.590)
  • qotchor (0.565), otchor (0.364)
  • qokchy (0.464), okchy (0.588)
  • qotchy (0.502), otchy (0.378)
  • qopchy (0.068), opchy (0.077)
  • qotchey (0.400), otchey (0.357)
  • qokchey (0.373), okchey (0.562)
  • qopchey (0.095), opchey (0.116)
  • qoky (0.451), oky (0.433)
  • qoty (0.423), oty (0.409)
  • qody (0.400), ody (0.477)
  • qol (0.573), ol (0.499)
  • qor (0.518), or (0.497)

Judging from this, it doesn’t look as though the qo / o contrast at the beginning of words affects relative downwardness to any great extent, even though it seems to affect rightwardness, as we’ve seen.  If this is the case, it would be noteworthy for showing that tendencies towards rightwardness and downwardness don’t always coincide in such situations as they appear to do with Sh / ch.  The average relative downwardness scores for the whole sets of words starting with qo and o are 0.440 and 0.459, which isn’t much of a difference either.
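To put a rough number on how "lackluster" a conformity rate is, one option is a simple sign test, sketched below in Python.  This is a technique I'm swapping in for illustration and not the method used elsewhere in this post: it treats every pair as an independent coin flip and ignores the size of each difference, and both of those are simplifying assumptions.  The function names are hypothetical.

```python
from math import comb

def conformity_count(pairs):
    """Count pairs (qo_score, o_score) in which the o word scores higher."""
    return sum(1 for qo_score, o_score in pairs if o_score > qo_score)

def sign_test_p_value(successes, trials):
    """One-sided probability of at least `successes` hits in `trials` fair coin flips."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials

# First five pairs from the list above: (qo word, o word) downwardness scores.
sample_pairs = [(0.481, 0.517), (0.400, 0.579), (0.602, 0.558),
                (0.509, 0.555), (0.384, 0.489)]
print(conformity_count(sample_pairs), "of", len(sample_pairs), "conform")

# The figure reported above: 31 conforming pairs out of 50.
print(sign_test_p_value(31, 50))
```

The smaller the printed probability, the harder it is to attribute the conformity rate to chance alone under these assumptions.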

Meanwhile, the nicely consistent patterning of o / a we found earlier with average absolute endwardness disappears when we switch our metric to average absolute downwardness:

  • ol (5.4), al (4.5)
  • or (4.5), ar (4.7)
  • chol (4.7); chal (5.7)
  • chor (3.9); char (4.3)
  • choiin (2.6); chaiin (4.4)
  • dol (5.5); dal (4.9)
  • dor (4.8); dar (4.5)
  • doiin (5.3); daiin (4.3)
  • kol (5.2), kal (5.9)
  • kor (4.2), kar (5.2)
  • tol (4.1), tal (4.8)
  • tor (4.6), tar (4.7)
  • okol (4.24), okal (4.21)
  • okoly (1.7), okaly (5.25)
  • okoldy (4.3), okaldy (4.2)
  • okor (4.4), okar (4.3)
  • otol (4.3), otal (4.4)
  • otor (3.8), otar (4.0)
  • ykol (5.4), ykal (5.7)
  • ykor (5.4), ykar (4.8)
  • ytol (4.3), ytal (6.7)
  • ytor (5.1), ytar (4.5)

There’s some curious clustering of “exceptions” here (note the consistency within the d group, for example), but the bottom line is that only 13 of 22 pairs (59%) conform to the predicted pattern now, as opposed to 21 of 22 (95%) before.  In this case, the differential pattern seems to affect absolute endwardness but not absolute downwardness (or, at least, not much).  Perhaps the sequencing of words within lines could somehow be responsible for the difference in the endwardness scores.  Before jumping to conclusions, though, we should also consider some implications of the fact that absolute measures of endwardness and downwardness aren’t normalized for variation in paragraph length, as the relative measures are.  If certain words tend to turn up preferentially in longer or shorter paragraphs, that will affect their average absolute scores.  And it so happens that the overall average absolute endwardness for all words is 29.6 in Currier A but 37.7 in Currier B, due presumably to greater average paragraph length in the latter.  All other things being equal, then, a word that simply happens to be more common in Currier A than in Currier B (and we know that there are plenty of those) would receive a lower absolute endwardness score for that reason alone.  Absolute downwardness is instead based on line numbers, and there’s less variation on that front: the average comes out to 4.43 for Currier A and 4.49 for Currier B, a difference of just 0.06.  The differences in measurements between words tend to be rather greater than that, and I don’t usually even bother showing two decimal places for the figures I’ve given.  Still, more granular chunks of the Voynich Manuscript deviate further in this regard (for Quire 13, the average is 4.18; for Quires 1-3 inclusive, it’s 3.78), so differences in distribution purely by section could still “contaminate” a result.
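The contamination effect can be illustrated with a little arithmetic.  The Python sketch below takes the two section averages quoted above (29.6 and 37.7) and asks what average absolute endwardness we'd expect for a word with no positional preference of its own, purely as a function of how its tokens are split between Currier A and Currier B.  The two example splits are hypothetical, and the function name is my own.

```python
def expected_absolute_score(share_in_a, mean_a=29.6, mean_b=37.7):
    """Expected average absolute endwardness for a positionally neutral word,
    given the fraction of its tokens that fall in Currier A."""
    return share_in_a * mean_a + (1.0 - share_in_a) * mean_b

mostly_a = expected_absolute_score(0.9)  # hypothetical word, 90% of tokens in A
mostly_b = expected_absolute_score(0.1)  # hypothetical word, 10% of tokens in A
print(round(mostly_a, 2), round(mostly_b, 2), round(mostly_b - mostly_a, 2))
```

Under these assumptions the two neutral words would differ by several points in absolute endwardness without either of them having any positional preference at all.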

On this basis, absolute endwardness and downwardness strike me as much less reliable as evidence of positional preference per se than measures of relative endwardness and downwardness are.  Among other things, this observation suggests that we shouldn’t hold the comparatively low difference in absolute endwardness scores associated with the distinction between word-initial Sh and ch too much against it.  Maybe some phenomena tend to occur only once a paragraph extends to an unusually great number of lines, and if so, that would be interesting to know; but it would be challenging to disentangle this factor from others without taking steps beyond what I’ve described.  So I incline more and more towards relative downwardness as the most promising complementary metric to relative rightwardness.

In case this isn’t obvious, I should point out that the fact that a given pair of words differ significantly in average rightwardness or downwardness doesn’t mean that individual tokens of those words won’t be found haphazardly mixed together.  Quite the contrary.  When we plot the rightwardness and downwardness of all individual tokens of minimal pairs of words such as tol / tal, chor / Shor, and qokeedy / okeedy

—it’s sometimes easy to spot a trend just by looking, but often enough it’s not.  The differences are there (unless I’ve made some really embarrassing mistake), and I’ve tried to make a case that they’re statistically significant, but they certainly wouldn’t leap out at the reader through casual observation.  They emerge only from subtle imbalances in the relative distribution of words over long stretches of text.  And I submit that they must also have been produced by means of some equally large-scale, long-term, and pervasive mechanism.


Glyphs and Adjacent Glyph Pairs

Words aren’t the only entities in the Voynich Manuscript that permit calculations of rightwardness and downwardness.  We can calculate them for individual glyphs as well, as long as we make a few working assumptions about what counts as a single glyph.  For myself, I’m going to treat ch, Sh, cTh, and so forth as single glyphs, but not ai, aii, aiii, and the like, which could always be a step in the wrong direction but will at least get us started.

It turns out that individual glyphs show much the same magnitude of spread of values as words do.  Here, for instance, are average rightwardness and downwardness scores (given in that order) for four of the same glyphs we’ve examined above as components of words.

  • Sh = 0.385, 0.416
  • ch = 0.468, 0.460
  • o = 0.477, 0.449
  • a = 0.552, 0.468

The average line contains ~38 glyphs, so the differences between the rightwardness scores for Sh and ch and for o and a should both correspond loosely to ~3 glyph slots, while the difference between Sh and a should correspond to ~6.3 glyph slots.  Since the first glyph of each new word in a line is more rightward than the last glyph of the preceding word, I think that any positional tendencies within words as such ought mostly to cancel themselves out here (though I suppose there could be some minor “contamination” from first and last words).
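The conversion from score differences to glyph slots is just a multiplication by the average line length, as the short Python sketch below shows.  The scores and the figure of 38 glyphs per line come from the text above; the function name is hypothetical.

```python
AVERAGE_GLYPHS_PER_LINE = 38  # approximate figure quoted above

# Average rightwardness scores for the four glyphs discussed above.
rightwardness = {"Sh": 0.385, "ch": 0.468, "o": 0.477, "a": 0.552}

def slots_between(glyph_a, glyph_b):
    """Approximate number of glyph slots by which glyph_b sits rightward of glyph_a."""
    return (rightwardness[glyph_b] - rightwardness[glyph_a]) * AVERAGE_GLYPHS_PER_LINE

for first, second in [("Sh", "ch"), ("o", "a"), ("Sh", "a")]:
    print(first, "->", second, round(slots_between(first, second), 2))
```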

One nice thing about these pairs of measurements is that we can plot them as x and y coordinates consistently with the spatial parameters they actually represent, allowing us to visualize them and the contrasts between them in a format that’s intuitively easy to grasp.

In the above graph, we can readily see that the glyph ch appears on average more rightward and downward than the glyph Sh, and that the glyph a appears on average more rightward and downward than the glyph o.  That observation should perhaps come as no surprise based on what we’ve seen above, but the fact that these glyphs’ total distribution throughout paragraphs mirrors the behavior of words that feature them in specific contexts is still noteworthy.  Below, for reference, is a graph plotting average rightwardness against average downwardness for all individual glyphs with at least 120 tokens (and both exclude ch, Sh, cTh, etc.).

We can also exclude the first and last glyphs of lines from our rightwardness statistics to gauge how much of an influence those two positions exert.  Below I’ve plotted average rightwardness for all positions along the x axis and average rightwardness for positions excluding the first and last along the y axis, covering the same set of glyphs as before, and with the midline plotted for context.

The glyphs most affected are p (favoring the first position), m, and g (both of these latter strongly favoring the last position).  Some other glyphs deflect as well, but not to nearly the same degree.  Notably, a, o, ch, and Sh all hew close to the midline, confirming that their rightwardness relative to each other isn’t primarily a function of line-initial and line-final phenomena.
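Here is a minimal Python sketch of this comparison.  It assumes that a glyph's relative rightwardness is its 1-based position divided by the number of glyphs in the line (by analogy with the glyph-pair measure), and that excluded positions are simply skipped while still being measured against the full line length.  Both are assumptions on my part, and the toy lines are invented.

```python
from collections import defaultdict

def glyph_rightwardness(lines, skip_edges=False):
    """Average relative position (index / line length, 1-based) of each glyph.

    lines: list of lines, each a list of glyph strings.
    With skip_edges=True, the first and last glyph of every line are ignored.
    """
    positions = defaultdict(list)
    for glyphs in lines:
        total = len(glyphs)
        for i, glyph in enumerate(glyphs, start=1):
            if skip_edges and i in (1, total):
                continue
            positions[glyph].append(i / total)
    return {g: sum(v) / len(v) for g, v in positions.items()}

# Toy lines, invented for illustration.
toy_lines = [["p", "o", "l", "m"], ["o", "p", "a", "m"]]
all_positions = glyph_rightwardness(toy_lines)
interior_only = glyph_rightwardness(toy_lines, skip_edges=True)
for glyph in sorted(all_positions):
    print(glyph, all_positions[glyph], interior_only.get(glyph))
```

A glyph whose two values differ sharply, or which vanishes from the second result altogether, is one whose score depends heavily on line-initial or line-final positions.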

Below is an equivalent graph for downwardness, in which I’ve plotted averages with all lines considered against averages with first and last lines of paragraphs excluded.

I haven’t drawn in the midline this time because the shorter length of final lines effectively skews it away from the expected spot.  Nevertheless, take note of the difference in the scales of the x and y axes: when first and last lines are factored out, not much of a spread remains among scores for most glyphs.  One noteworthy outlier is cPh, which has the third lowest downwardness score when all lines are factored in but the highest downwardness score of all, by a lot, when first and last lines are excluded.

Between words and glyphs there’s also another level of analysis I’d be remiss not to consider here: namely, pairs of adjacent glyphs.

Average downwardness is as easy to calculate for adjacent glyph pairs as it is for words; we can simply divide the sum of line numbers by the quantity of tokens.  Calculating rightwardness is a little less straightforward, but for this I decided to take the number of the first glyph in a pair divided by the total count of glyphs in the line.  Thus, the first adjacency in a line containing 100 glyphs, between glyphs 1 and 2, would score 1/100, while the last adjacency, between glyphs 99 and 100, would score 99/100.  I can then take the average of scores for each adjacent glyph pair to gauge whether it tends to appear more or less rightward.

When working out the following statistics, I’ve ignored all spaces within lines, and I’ve also disregarded adjacencies across line breaks (although it might have made sense to factor these latter into downwardness).
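The rightwardness calculation for adjacent glyph pairs can be sketched in Python as follows.  The sketch assumes each line has already been split into a list of glyphs with spaces removed (how to segment glyphs is a separate question), and the function name and toy lines are hypothetical.

```python
from collections import defaultdict

def pair_rightwardness(lines):
    """Average relative rightwardness of each adjacent glyph pair.

    lines: list of lines, each a list of glyph strings with spaces removed.
    """
    scores = defaultdict(list)
    for glyphs in lines:
        total = len(glyphs)
        # Pairs never span a line break; the pair whose first glyph is
        # number i (1-based) scores i / total.
        for i in range(1, total):
            pair = (glyphs[i - 1], glyphs[i])
            scores[pair].append(i / total)
    return {pair: sum(vals) / len(vals) for pair, vals in scores.items()}

# Toy lines, invented for illustration.
toy_lines = [["d", "a", "i", "i", "n"], ["o", "l", "d", "a"]]
averages = pair_rightwardness(toy_lines)
print(averages[("d", "a")])
```

In a line of 100 glyphs, this gives 1/100 for the first adjacency and 99/100 for the last, matching the worked example above.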

First, there are twenty-eight pairs of adjacencies of Sh and ch with other glyphs that have at least five tokens apiece.  We’ve come to expect ch to score rightward and downward from Sh.  Does that pattern continue to hold?

  • Sha (0.406, 0.417, 137 tokens); cha (0.535, 0.463, 459 tokens)
  • ShcKh (0.449, 0.506, 95 tokens); chcKh (0.529, 0.497, 243 tokens)
  • ShcPh (0.459, 0.156, 5 tokens), chcPh (0.571, 0.227, 27 tokens)
  • ShcTh (0.485, 0.536, 50 tokens), chcTh (0.598, 0.521, 134 tokens)
  • Shd (0.464, 0.306, 175 tokens), chd (0.548, 0.373, 745 tokens)
  • She (0.383, 0.428, 2474 tokens), che (0.455, 0.474, 4707 tokens)
  • Shk (0.443, 0.464, 51 tokens), chk (0.506, 0.509, 156 tokens)
  • Shl (0.492, 0.466, 9 tokens), chl (0.511, 0.522, 54 tokens)
  • Sho (0.355, 0.391, 944 tokens); cho (0.433, 0.476, 2440 tokens)
  • Shs (0.326, 0.392, 16 tokens), chs (0.504, 0.386, 77 tokens)
  • Sht (0.612, 0.276, 19 tokens); cht (0.544, 0.405, 73 tokens)
  • Shy (0.459, 0.421, 259 tokens); chy (0.519, 0.419, 918 tokens)
  • aSh (0.400, 0.150, 5 tokens); ach (0.366, 0.403, 11 tokens)
  • dSh (0.215, 0.431, 193 tokens); dch (0.328, 0.493, 373 tokens)
  • eSh (0.398, 0.407, 55 tokens); ech (0.429, 0.454, 154 tokens)
  • fSh (0.342, 0.000, 19 tokens); fch (0.509, 0.114, 176 tokens)
  • iSh (0.597, 0.794, 5 tokens); ich (0.465, 0.506, 9 tokens)
  • kSh (0.348, 0.356, 215 tokens); kch (0.449, 0.448, 1000 tokens)
  • lSh (0.393, 0.438, 907 tokens); lch (0.477, 0.521, 1801 tokens)
  • mSh (0.506, 0.397, 31 tokens); mch (0.618, 0.446, 76 tokens)
  • nSh (0.417, 0.460, 662 tokens); nch (0.492, 0.532, 1287 tokens)
  • oSh (0.280, 0.423, 101 tokens); och (0.344, 0.480, 246 tokens)
  • pSh (0.295, 0.067, 76 tokens); pch (0.445, 0.098, 714 tokens)
  • rSh (0.388, 0.386, 657 tokens); rch (0.464, 0.498, 1116 tokens)
  • sSh (0.345, 0.399, 124 tokens); sch (0.448, 0.508, 231 tokens)
  • tSh (0.324, 0.324, 178 tokens); tch (0.417, 0.400, 946 tokens)
  • ySh (0.410, 0.416, 830 tokens); ych (0.451, 0.511, 1835 tokens)
  • chSh (0.360, 0.290, 11 tokens); Shch (0.374, 0.546, 13 tokens) – but this one could be classified either way!

Of the 27 pairs it seems meaningful to assess contrastively, 24 (~89%) conform to expectations for rightwardness, while 22 (~81.5%) conform to expectations for downwardness.  Note also that some of the exceptions have low token counts.  Below are a couple of different scatterplot representations of the same data (excluding chSh and Shch).  In the scatterplot on the right, the red “cloud” of Sh looks to me rather as though it has shifted downward and rightward as a whole to produce the blue “cloud” of ch (except for a single anomalous red outlier way down at the bottom, the point for iSh / ich, which I think can safely be disregarded).

There are also thirty-seven pairs of adjacencies of o and a with other glyphs that have at least five tokens apiece.  Judging from the individual glyph scores, a should be both rightward and downward from o, although in our earlier study of words containing these glyphs only the tendency towards rightwardness was clearly apparent.

  • ao (0.495, 0.334, 14 tokens); aa (0.581, 0.556, 6 tokens)
  • cFho (0.540, 0.116, 13 tokens); cFha (0.546, 0.308, 6 tokens)
  • cho (0.433, 0.476, 2440 tokens); cha (0.535, 0.463, 459 tokens)
  • cKho (0.472, 0.439, 93 tokens); cKha (0.579, 0.595, 33 tokens)
  • cPho (0.454, 0.228, 50 tokens); cPha (0.630, 0.270, 20 tokens)
  • cTho (0.556, 0.470, 221 tokens); cTha (0.574, 0.452, 64 tokens)
  • do (0.509, 0.442, 534 tokens); da (0.520, 0.448, 3755 tokens)
  • eo (0.431, 0.469, 2918 tokens); ea (0.502, 0.482, 416 tokens)
  • fo (0.448, 0.059, 54 tokens); fa (0.598, 0.203, 61 tokens)
  • go (0.675, 0.418, 8 tokens); ga (0.777, 0.756, 6 tokens)
  • ho (0.379, 0.335, 30 tokens); ha (0.559, 0.509, 13 tokens) = cases other than cho, Sho, cKho, etc.
  • ko (0.481, 0.454, 680 tokens); ka (0.532, 0.503, 2891 tokens)
  • lo (0.558, 0.480, 1444 tokens); la (0.573, 0.476, 472 tokens)
  • mo (0.664, 0.423, 83 tokens); ma (0.863, 0.624, 11 tokens)
  • no (0.547, 0.475, 1516 tokens); na (0.638, 0.534, 257 tokens)
  • oo (0.393, 0.440, 149 tokens); oa (0.351, 0.456, 280 tokens)
  • po (0.237, 0.089, 226 tokens); pa (0.529, 0.155, 166 tokens)
  • qo (0.442, 0.439, 5230 tokens); qa (0.430, 0.393, 7 tokens)
  • ro (0.542, 0.436, 1591 tokens); ra (0.587, 0.457, 1453 tokens)
  • so (0.371, 0.504, 580 tokens); sa (0.460, 0.517, 801 tokens)
  • Sho (0.355, 0.391, 944 tokens); Sha (0.406, 0.417, 137 tokens)
  • to (0.443, 0.417, 629 tokens); ta (0.591, 0.465, 1449 tokens)
  • yo (0.520, 0.439, 2390 tokens); ya (0.517, 0.473, 127 tokens)
  • och (0.344, 0.480, 246 tokens); ach (0.366, 0.403, 11 tokens)
  • od (0.495, 0.446, 2115 tokens); ad (0.565, 0.425, 41 tokens)
  • og (0.819, 0.453, 20 tokens); ag (0.897, 0.570, 18 tokens)
  • oi (0.392, 0.469, 294 tokens); ai (0.506, 0.485, 6315 tokens)
  • ok (0.471, 0.478, 5682 tokens); ak (0.534, 0.449, 37 tokens)
  • ol (0.484, 0.467, 5354 tokens); al (0.594, 0.459, 2654 tokens)
  • om (0.773, 0.443, 166 tokens); am (0.875, 0.451, 707 tokens)
  • on (0.543, 0.317, 5 tokens); an (0.712, 0.567, 109 tokens)
  • op (0.554, 0.085, 527 tokens); ap (0.668, 0.122, 7 tokens)
  • or (0.475, 0.462, 2588 tokens); ar (0.564, 0.440, 2875 tokens)
  • os (0.477, 0.495, 416 tokens); as (0.618, 0.401, 72 tokens)
  • oSh (0.280, 0.423, 101 tokens); aSh (0.400, 0.150, 5 tokens)
  • ot (0.529, 0.426, 3270 tokens); at (0.562, 0.356, 9 tokens)
  • oy (0.412, 0.449, 143 tokens); ay (0.668, 0.437, 12 tokens)

For rightwardness, an impressive 34 cases out of 37 (~92%) conform to expectations, the exceptions being oo / oa, qo / qa, and yo / ya.  For downwardness, only 26 cases out of 37 (~70%) conform, consistent with the dubious results we achieved for words with this distinction and parameter.  On the other hand, the scatterplot below on the left shows many of the exceptions for downwardness (in blue) clustering together close to the diagonal midline, and the aggregate point “cloud” on the right seems to have shifted downward as well as rightward, so perhaps there is some weak tendency towards downwardness here after all.

Judging from these two examples, it looks as though adjacent glyph pairs are behaving on the whole much like individual glyphs, as well as like specific words that begin with them or contain them, in both cases for rightwardness, and at least in the case of Sh / ch for downwardness too.

One popular line of speculation holds that Voynichese is made up mostly of bigrams such as or and ol that function as discrete units whose significance has nothing to do with any meaning attached to the individual glyphs that make them up.  If bigrams were to display positional tendencies distinctly unlike those of their elements, that might be taken as a point in favor of this possibility.  On the other hand, if bigrams were to conform closely to the positional tendencies of their component glyphs, that might be taken as a point against it.  As far as we can judge so far from the relative distributions of bigrams containing Sh, ch, o, and a, I’d say they seem to conform in a majority of cases.  But when it comes to addressing the behavior of bigrams more generally, it’s less clear how we ought to try predicting their scores based on the scores of their component glyphs, and hence to judge whether they’re conforming to expectations or not.

Should we predict that the rightwardness and downwardness scores for a bigram will simply be the averages of the individual scores for its two glyphs (for example, that the scores for dy will be the averages of the individual scores for d and y)?  Well, it never hurts to try.  Limiting my analysis to bigrams that occur at least a hundred times, I calculated averages in this way for each one and plotted these predicted scores along the y axis against the actual scores along the x axis.
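As a concrete illustration of the averaging prediction, here is a small Python sketch using only figures already given in this post: the individual scores for Sh, ch, o, and a, and the actual scores for four bigrams built from them.  The function name is hypothetical, and four bigrams are of course far too few to stand in for the full comparison.

```python
# (rightwardness, downwardness) for individual glyphs, from the list above.
glyph_scores = {"Sh": (0.385, 0.416), "ch": (0.468, 0.460),
                "o": (0.477, 0.449), "a": (0.552, 0.468)}

# Actual (rightwardness, downwardness) for four bigrams, from the lists above.
actual_bigram_scores = {("Sh", "o"): (0.355, 0.391), ("ch", "o"): (0.433, 0.476),
                        ("Sh", "a"): (0.406, 0.417), ("ch", "a"): (0.535, 0.463)}

def predict_by_averaging(first, second):
    """Predict a bigram's scores as the plain average of its two glyphs' scores."""
    (r1, d1), (r2, d2) = glyph_scores[first], glyph_scores[second]
    return ((r1 + r2) / 2, (d1 + d2) / 2)

for (first, second), actual in actual_bigram_scores.items():
    predicted = predict_by_averaging(first, second)
    print(first + second,
          "predicted", tuple(round(v, 3) for v in predicted),
          "actual", actual)
```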

If the predictions were correct, we’d see diagonal lines running from lower left to upper right; but we see nothing of the kind.  So if there’s any broad correlation—which, of course, there may not be—it’s apparently not as simple as this.  Nor should we really have expected it to be.  After all, even if we look at specific cases that definitely fit and reinforce the patterns we’ve been examining, the finer details of their interrelationships can still be opaque, as illustrated by the following graph of scores for the glyphs Sh, ch, o, a, and some of their common bigrams, with red lines marking the Sh / ch contrasts and green lines marking the o / a contrasts.  (The numbers in parentheses are token counts.)

Each contrast here fits the expected pattern (a variants rightward and downward from o variants, ch variants rightward from Sh variants), but it’s hard to fathom why—for example—the o-a line would end up transposed specifically to the Sho-Sha line under the influence of the location of Sh.  Random behavior?  Or maybe some operation of vector geometry?  For now, I’m going to leave this as an unresolved puzzle.  But as fodder for speculation, here’s a graph of average relative rightwardness and downwardness for all bigrams with more than a thousand tokens.


Spacing and Repetitions

So far, I’ve been considering all glyph adjacencies within lines, regardless of whether they occur within a word or have a space between them.  But it’s also worth investigating whether the phenomenon of spacing itself varies with rightwardness and/or downwardness.

A little background: in an earlier post, at section four, I observed

  • that some glyph pairings never (or almost never) have a space between them, while others always (or almost always) do;
  • that specific pairings responsible for most of the unpredictability in spacing are consistently inconsistent;
  • that their inconsistency extends to words or word pairs containing them, such that if l-ch is inconsistent (sometimes lch, sometimes l.ch), then we can predict that words containing l-ch will be inconsistent as well, such as qol-chedy (sometimes qolchedy, sometimes qol.chedy);
  • that these inconsistent pairings tend to be the same ones we find written ambiguously, such that it’s difficult to decide whether there’s a space present or not (a situation indicated in the Zandbergen transcription with a comma, e.g., l,ch).

From this I inferred that spacing probably reflects consistent relationships between glyph pairs and not “word divisions” in the ordinary sense: some glyph pairs require a space between them, others resist a space between them, and yet others can go either way, seemingly at random.  But of course “seemingly at random” means there may be a pattern we just haven’t noticed yet.  So I’ve since decided to check whether the distinction between spaced and unspaced adjacencies in this last category correlates at all with differences in rightwardness.  The absolute counts I came up with differ a little from the ones in my earlier post for some reason I haven’t pinned down, but not by a great amount.  I’ve tried to consider all pairs with at least fifteen tokens of both spaced and unspaced variants.
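The comparison can be sketched in Python along the following lines.  To keep the sketch short, it assumes a simplified transcription in which "." marks a space and every other character is a single glyph, so multi-character glyphs such as ch and Sh and ambiguous comma breaks are not handled.  The function name and toy lines are hypothetical.

```python
from collections import defaultdict

def spacing_rightwardness(lines, min_tokens=15):
    """Average rightwardness of each glyph pair, split by spacing.

    Returns {pair: (together_average, apart_average)} for pairs with at
    least min_tokens tokens of both the unspaced and the spaced variant.
    """
    scores = defaultdict(lambda: {"together": [], "apart": []})
    for line in lines:
        glyphs = []       # glyphs in order, spaces removed
        space_after = []  # space_after[k] is True if a "." follows glyphs[k]
        for char in line:
            if char == ".":
                if space_after:
                    space_after[-1] = True
            else:
                glyphs.append(char)
                space_after.append(False)
        total = len(glyphs)
        for i in range(1, total):
            kind = "apart" if space_after[i - 1] else "together"
            scores[(glyphs[i - 1], glyphs[i])][kind].append(i / total)
    result = {}
    for pair, groups in scores.items():
        if all(len(vals) >= min_tokens for vals in groups.values()):
            result[pair] = (sum(groups["together"]) / len(groups["together"]),
                            sum(groups["apart"]) / len(groups["apart"]))
    return result

# Toy lines, invented for illustration; the threshold is lowered to 1 to suit them.
toy_lines = ["olaro", "r.olal"]
print(spacing_rightwardness(toy_lines, min_tokens=1))
```

Spaces are dropped before positions are counted, so a spaced and an unspaced token of the same pair are scored on the same scale.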

Here are figures for the adjacencies beginning with r.

  • ra (0.619, 573 tokens) / r.a (0.539, 632 tokens)
  • rch (0.516, 119 tokens) / r.ch (0.451, 944 tokens)
  • rd (0.747, 35 tokens) / r.d (0.580, 196 tokens)
  • ro (0.623, 324 tokens) / r.o (0.508, 1193 tokens)
  • rSh (0.386, 46 tokens) / r.Sh (0.383, 567 tokens)
  • ry (0.791, 206 tokens) / r.y (0.455, 189 tokens)

In each case but one, the “together” variant favors a distinctly more rightward position than the “apart” variant.  The exception contains Sh, which seems to be drawn strongly leftward in its own right, yielding the two least rightward averages in this whole group, as well as the two closest values, differing only by 0.003, which is effectively a tie.  Perhaps whatever factor tugs Sh towards a more leftward line position has overwhelmed any subtler difference between “together” and “apart” spacings.  For adjacencies beginning with s, by contrast, we see:

  • sa (0.411, 515 tokens) / s.a (0.547, 189 tokens)
  • sch (0.329, 64 tokens) / s.ch (0.483, 150 tokens)
  • sd (0.419, 17 tokens) / s.d (0.660, 29 tokens)
  • so (0.275, 336 tokens) / s.o (0.500, 215 tokens)
  • sSh (0.159, 30 tokens) / s.Sh (0.391, 85 tokens)
  • sy (0.624, 95 tokens) / s.y (0.487, 47 tokens)

This time, the pattern is reversed: the “together” variants favor a more leftward line position than the “apart” variants.  The exception now concerns y, which also gave us the most extreme difference with r, between ry (0.791) and r.y (0.455), hinting that a distinction between -y and -.y might have overshadowed the distinction between s- and s.-.  The “apart” values seem to be fairly similar for r and s, as though it’s the “together” values that deviate, r in one direction, s in the other.  For example, r.o and s.o both score right around 0.5 (the middle of the line), but ro scores around 0.62 (more rightward) while so scores around 0.28 (less rightward).

Moving on, adjacencies after l follow a similar pattern to r, but with further complications (which may emerge in part because there are a greater number of adjacencies that rise above my “15 tokens each” threshold).

  • la (0.579, 326 tokens) / l.a (0.545, 99 tokens)
  • lch (0.482, 710 tokens) / l.ch (0.455, 959 tokens)
  • ld (0.681, 415 tokens) / l.d (0.572, 534 tokens)
  • le (0.452, 38 tokens) / l.e (0.423, 18 tokens)
  • lk (0.482, 1019 tokens) / l.k (0.496, 194 tokens)
  • ll (0.676, 25 tokens) / l.l (0.533, 178 tokens)
  • lo (0.612, 499 tokens) / l.o (0.516, 873 tokens)
  • lr (0.629, 26 tokens) / l.r (0.652, 57 tokens)
  • ls (0.681, 131 tokens) / l.s (0.559, 120 tokens)
  • lSh (0.367, 293 tokens) / l.Sh (0.402, 519 tokens)
  • lt (0.590, 96 tokens) / l.t (0.470, 90 tokens)
  • ly (0.728, 384 tokens) / l.y (0.506, 116 tokens)

The exception for lSh / l.Sh mirrors the exception for rSh / r.Sh.  The two other exceptions (lr / l.r, lk / l.k) mirror potential exceptions for r that fall below the significance threshold: rr (0.26, 2 tokens) versus r.r (0.574, 17 tokens) and rk (0.419, 9 tokens) versus r.k (0.489, 47 tokens).  Thus, there’s consistency among the exceptions, insofar as there appear to be no significant cases in which adjacencies of l and r with the same second glyph deflect in opposite directions.  Followed by a, ch, d, e, l, o, s, t, or y, spaced variants deflect leftward; followed by k, r, or Sh, they deflect rightward.

Meanwhile, d follows the same pattern as s, although often just barely:

  • da (0.511, 3709 tokens) / d.a (0.518, 25 tokens)
  • dch (0.287, 308 tokens) / d.ch (0.509, 61 tokens)
  • dd (0.590, 20 tokens) / d.d (0.595, 23 tokens)
  • dl (0.596, 69 tokens) / d.l (0.539, 25 tokens)
  • do (0.491, 446 tokens) / d.o (0.545, 83 tokens)
  • dSh (0.155, 156 tokens) / d.Sh (0.436, 34 tokens)
  • dy (0.548, 6419 tokens) / d.y (0.510, 33 tokens)

The exception for dl / d.l is mirrored by an exception between sl (0.716, 5 tokens) and s.l (0.344, 15 tokens) that falls below my arbitrary significance threshold.  So once again there appears to be consistency among the exceptions.  When s or d is followed by a, ch, d, o, or Sh, spaced variants deflect rightward; when followed by l or y, they deflect leftward.

Even if there are legitimate patterns to be found here, it might be difficult to work out a comprehensive set of rules for them because so many adjacencies have too few tokens for their statistics to be persuasively meaningful.  But one possible hypothesis consistent with the foregoing builds on a distinction I made here, in section one, between minimars (glyphs formed with a minim plus a flourish, e.g., r, l) and curveletars (glyphs formed with a “c”-curve plus a flourish, e.g., s, d), and runs as follows:

  • Minimars are more likely to have a space after them the more leftward they are, unless followed by one of a limited set of glyphs (e.g., Sh).
  • Curveletars are more likely to have a space after them the more rightward they are, unless followed by one of another limited set of glyphs (e.g., l).

I decided to test this hypothesis on y, which is another curveletar and so would be predicted to follow the same pattern as s and d.  I realize that you’ll need to take my word for it that I came up with the hypothesis before I carried out the experiment.  But I did.  And here’s what I found:

  • ya (0.185, 16 tokens) / y.a (0.550, 105 tokens)
  • ych (0.143, 229 tokens) / y.ch (0.497, 1554 tokens)
  • yd (0.569, 152 tokens) / y.d (0.572, 1172 tokens)
  • yf (0.560, 22 tokens) / y.f (0.472, 36 tokens)
  • yk (0.410, 635 tokens) / y.k (0.458, 431 tokens)
  • yl (0.650, 26 tokens) / y.l (0.546, 753 tokens)
  • yo (0.379, 41 tokens) / y.o (0.515, 2336 tokens)
  • yp (0.506, 78 tokens) / y.p (0.528, 89 tokens)
  • yr (0.730, 18 tokens) / y.r (0.620, 249 tokens)
  • ys (0.509, 25 tokens) / y.s (0.596, 357 tokens)
  • ySh (0.122, 92 tokens) / y.Sh (0.450, 714 tokens)
  • yt (0.438, 530 tokens) / y.t (0.543, 290 tokens)

Thus, my hypothesis had correctly predicted both the direction of the deflection and the fact that yl / y.l would be an exception to it.  There are, admittedly, two additional exceptions (yf / y.f and yr / y.r) involving subsequent glyphs that fell below the threshold when paired with s and d.  The combination yy appears only twice and is line-final both times, which likewise fits the pattern as an “exception” (a “together” adjacency for a curveletar that deflects rightward).
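The check itself is easy to reproduce from the figures in the list above.  The Python sketch below encodes the curveletar prediction (spaced variants should be more rightward than unspaced ones) and reports which second glyphs fail it.  The function name is hypothetical, and the token-count threshold is assumed to have been applied already.

```python
# (together, apart) average rightwardness for adjacencies beginning with y,
# keyed by the second glyph, copied from the list above.
y_pairs = {
    "a": (0.185, 0.550), "ch": (0.143, 0.497), "d": (0.569, 0.572),
    "f": (0.560, 0.472), "k": (0.410, 0.458), "l": (0.650, 0.546),
    "o": (0.379, 0.515), "p": (0.506, 0.528), "r": (0.730, 0.620),
    "s": (0.509, 0.596), "Sh": (0.122, 0.450), "t": (0.438, 0.543),
}

def exceptions_for_curveletar(pairs):
    """Second glyphs whose spaced variant is NOT more rightward than the unspaced one."""
    return sorted(glyph for glyph, (together, apart) in pairs.items()
                  if apart <= together)

exceptions = exceptions_for_curveletar(y_pairs)
print(len(y_pairs) - len(exceptions), "of", len(y_pairs), "conform; exceptions:", exceptions)
```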

Next, let’s take another look at some glyph pairs that contrast a and o, this time monitoring the distinction between spaced and unspaced adjacencies (and excluding ambiguous “comma breaks”).  We’ve already established that a-pairs tend in general to display greater average rightwardness than o-pairs, with the three exceptions being oo / oa, qo / qa, and yo / ya, so our objective now is to see how that pattern plays itself out in conjunction with spacing factors.

  • ai (0.498, 6314 tokens) / oi (0.387, 292 tokens)
  • al (0.585, 2648 tokens) / ol (0.478, 5158 tokens)
  • ar (0.554, 2866 tokens) / or (0.464, 2479 tokens)
  • ad (0.569, 36 tokens) / od (0.488, 1994 tokens)
  • an (0.698, 109 tokens) / on (0.543, 5 tokens)
  • am (0.862, 707 tokens) / om (0.759, 164 tokens)
  • ak (0.526, 35 tokens) / ok (0.466, 5565 tokens)
  • d.a (0.518, 25 tokens) / d.o (0.545, 83 tokens)
  • da (0.511, 3709 tokens) / do (0.491, 446 tokens)
  • s.a (0.547, 189 tokens) / s.o (0.500, 215 tokens)
  • sa (0.411, 515 tokens) / so (0.275, 336 tokens)
  • cha (0.529, 457 tokens) / cho (0.436, 2435 tokens)
  • Sha (0.398, 137 tokens) / Sho (0.350, 936 tokens)
  • cKha (0.571, 33 tokens) / cKho (0.464, 92 tokens)
  • cPha (0.621, 20 tokens) / cPho (0.450, 50 tokens)
  • cTha (0.567, 64 tokens) / cTho (0.549, 220 tokens)
  • fa (0.592, 59 tokens) / fo (0.442, 54 tokens)
  • ka (0.524, 2886 tokens) / ko (0.476, 663 tokens)
  • pa (0.521, 163 tokens) / po (0.229, 222 tokens)
  • ta (0.583, 1444 tokens) / to (0.433, 619 tokens)
  • n.a (0.592, 221 tokens) / n.o (0.532, 1470 tokens)
  • l.a (0.545, 99 tokens) / l.o (0.516, 873 tokens)
  • la (0.579, 326 tokens) / lo (0.612, 499 tokens)
  • r.a (0.539, 632 tokens) / r.o (0.508, 1193 tokens)
  • ra (0.619, 573 tokens) / ro (0.623, 324 tokens)
  • o.a (0.450, 22 tokens) / o.o (0.366, 77 tokens)
  • oa (0.335, 232 tokens) / oo (0.403, 66 tokens)
  • y.a (0.550, 105 tokens) / y.o (0.515, 2336 tokens)
  • ya (0.185, 16 tokens) / yo (0.379, 41 tokens)

The exceptions are notable but limited: to wit, lo, ro, oo, yo written “together” prefer more rightward positions than la, ra, oa, ya; while d.o written “apart” prefers a more rightward position than d.a.  These exceptions happen to impact five of the six cases in which we find significant spacing inconsistencies, e.g., we find numerous cases of both r.a and ra, and of both r.o and ro.  The sixth such case is s.a / sa / s.o / so, but that follows the expected rightwardness pattern across both the “together” and “apart” variants.  Note also that of lo, ro, oo, and yo, it’s the two exceptional cases with the greatest differences in score (oo / oa at 0.068, yo / ya at 0.194) that also constitute general exceptions even when spacing isn’t factored in.  As for the third general exception, qo / qa, there aren’t any tokens of q.o or q.a to be considered in the first place.  Thus, all three of the general exceptions among o / a bigrams which we noted earlier are actually exceptions only when these glyphs occur “together” rather than “apart.”

I’ve argued previously that spacing is largely predictable based on glyph adjacencies, and that it’s accordingly unlikely that word boundaries carry any meaning beyond insight they might offer into the logic of the system or the process of composition.  But certain exceptions—glyph adjacencies that are consistently treated inconsistently—remained unpredictable before.  Now it appears that those cases aren’t quite as unpredictable as they seemed, and that the presence or absence of spacing correlates at least somewhat with rightwardness.

Let me give just one example of the kind of puzzle we may now be in a position to unriddle.  As can be seen in the graph of common bigrams I shared above, y_k falls decisively to the left of the pack (0.441) while y_d falls decisively to the right (0.587).  But if we look at the only significant minimal pair of words beginning with those two bigrams, ydaiin and ykaiin, the direction of the contrast is oddly reversed: ydaiin (18 tokens) scores 0.336, while ykaiin (45 tokens) scores 0.423.  If we were to limit ourselves to studying discrete words with spaces around them, we might not be able to account for the discrepancy.  But there is an explanation—or, if not exactly an explanation, at least an interpretation that relates this phenomenon to other phenomena.  I’ve shown previously that the y_d glyph pair strongly prefers to be separated by a break (by a ratio of 1172 to 152), while the y_k glyph pair moderately prefers to be together (by a ratio of 635 to 431).  And we’ve now also discovered that both glyph pairs are more likely to appear together, without a space, the further leftward they are in a line.  When the two glyph pairs are spaced out, we end up with *y.daiin and *y.kaiin, where *y can be either y as a self-standing word or a longer word ending in y.  Of course, these alternate forms are excluded from our usual statistics for the words ydaiin and ykaiin.  It turns out that there are just 14 tokens of *y.kaiin to the 45 tokens of ykaiin, and, predictably, the spaced set has a higher average rightwardness score (0.554) than the unspaced set.  Meanwhile, there are a whopping 355 tokens of *y.daiin to the 18 tokens of ydaiin, consistent with the stronger tendency of y_d towards “apartness”; and here too the spaced set predictably has a higher average rightwardness score (0.528) than the unspaced set.
If we lump together all spaced and unspaced tokens of the glyph sequences ydaiin and ykaiin, we find that ydaiin scores 0.519 and ykaiin scores 0.453, which is much more consistent with what we’d have expected based on the individual scores of the two bigrams.  I conclude that the word ydaiin probably scores leftward of the word ykaiin only because regular spacing and rightwardness tendencies have conspired to transform all but a small and leftward-biased group of tokens of ydaiin into *y.daiin.
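The lumped scores can be approximated from the figures already quoted by taking a token-weighted mean of the spaced and unspaced sets. The sketch below assumes that "lumping together" means exactly that; the function name is hypothetical.

```python
def pooled_score(groups):
    """Token-weighted mean of (average score, token count) groups."""
    total_tokens = sum(count for _, count in groups)
    return sum(score * count for score, count in groups) / total_tokens

# (score, tokens) for the unspaced word and for its spaced counterpart.
ydaiin = pooled_score([(0.336, 18), (0.528, 355)])   # ydaiin + *y.daiin
ykaiin = pooled_score([(0.423, 45), (0.554, 14)])    # ykaiin + *y.kaiin

print(f"ydaiin pooled: {ydaiin:.3f}")
print(f"ykaiin pooled: {ykaiin:.3f}")
```

Working from the rounded inputs, this gives roughly 0.519 for ydaiin and 0.454 for ykaiin. The latter differs from the reported 0.453 only in the third decimal place, which is presumably an effect of rounding in the quoted scores.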

I haven’t yet studied whether spacing also correlates with downwardness, since I did the bulk of the work I’ve just described before I hit upon “downwardness” as a concept.  But if rightwardness affects words, glyphs, bigrams, and spacing, it would seem to be casting its shadow pretty much everywhere.

And what about repetitions of words or parts of words?  Most specific repetitions are, I think, rare enough to defy meaningful statistical analysis for rightwardness, as are repetitions that follow specific patterns such as o~.qo~.  But some abstract patterns of repetition seem common enough to be viable subjects.  Here are calculations of average rightwardness and downwardness for repetitions in which ~ is the repeated element and x and y are unrepeated elements, with token quantities in parentheses.  To clarify: an example of ~.~ is chor.chor, while an example of x~.~ is ychor.chor and an example of ~.~y is chor.chorcheey.

  • ~.~ (258): 0.561, 0.490
  • x~.~ (327): 0.489, 0.485
  • ~.y~ (332): 0.519, 0.493
  • ~x.~ (90): 0.509, 0.529
  • ~.~y (128): 0.603, 0.491
  • x~.y~ (2574): 0.512, 0.459
  • x~.~y (276): 0.500, 0.508
  • ~x.y~ (312): 0.412, 0.539
  • ~x.~y (1936): 0.567, 0.482
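For readers who want to experiment with these categories, here is one possible way of sorting a pair of adjacent words into the patterns listed above. It is only a sketch under stated assumptions: the matching rules, the tie-breaking order, and the `min_len` threshold for the repeated element are all hypothetical choices of this example and are not necessarily the rules behind the counts shown.

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

def overlap_len(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def classify(w1, w2, min_len=2):
    """Label the adjacent words w1.w2 with a repetition pattern, or None."""
    if w1 == w2:
        return "~.~" if len(w1) >= min_len else None
    # One word is wholly repeated inside the other.
    if len(w1) > len(w2) >= min_len:
        if w1.endswith(w2):
            return "x~.~"
        if w1.startswith(w2):
            return "~x.~"
    if len(w2) > len(w1) >= min_len:
        if w2.endswith(w1):
            return "~.y~"
        if w2.startswith(w1):
            return "~.~y"
    # Both words carry an unrepeated part: pick the longest shared piece.
    # Ties go to whichever pattern is listed first here.
    candidates = {
        "x~.y~": common_prefix_len(w1[::-1], w2[::-1]),  # shared ending
        "~x.~y": common_prefix_len(w1, w2),              # shared beginning
        "x~.~y": overlap_len(w1, w2),   # end of w1 = start of w2
        "~x.y~": overlap_len(w2, w1),   # start of w1 = end of w2
    }
    label, best = max(candidates.items(), key=lambda kv: kv[1])
    return label if best >= min_len else None

print(classify("chor", "chor"))        # ~.~
print(classify("ychor", "chor"))       # x~.~
print(classify("chor", "chorcheey"))   # ~.~y
```

The three printed cases are the examples given in the text. How short a repeated element may be, and how ambiguous pairs are resolved, would change the token counts considerably, so any real tally depends on those choices.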

The results differ, often by what looks like quite a lot, even for patterns with many tokens.  Repetitions following the model of chor.chorcheey are more rightward on average by 11.4% of a line than repetitions following the model of ychor.chor.  And this behavior may not be independent from the spacing patterns we considered a moment ago.  Consider that the “additions” to repeated elements often introduce glyph adjacencies that fall into the “consistently inconsistent” category, e.g.,

  • in ychor.chor, the adjacency ych (0.143, 229 tokens) / y.ch (0.497, 1554 tokens)
  • in chor.chorcheey, the adjacency rch (0.516, 119 tokens) / r.ch (0.451, 944 tokens)

For that particular pair of examples (which I chose arbitrarily), each repetition belongs to a category that deflects for rightwardness in the same direction its unspaced adjacency does: leftward for ych, rightward for rch.  Deciding whether this is a typical pair of examples would require further reflection.  But if it is, we might be able to trace the two patterns back to a common cause.


Conclusion

In this post, I’ve examined several features of Voynichese that appear to vary slightly but consistently by average rightwardness (in lines) and average downwardness (in paragraphs).  These have included the prevalence of words that begin with, end with, or contain particular glyphs; the prevalence of individual glyphs; the prevalence of adjacent glyph pairs without regard to spacing; the prevalence of spacing between particular glyph pairs; and the prevalence of repetitions following different abstract forms.  I haven’t yet tried to scope out the full extent of differences within any one of these categories, and my goal so far has been primarily to establish whether any meaningful patterns of each kind exist.  But if they do, to even the limited extent I’ve already tried to show, then I want to suggest that we should no longer regard the widely recognized patterns by which p and f appear mostly at the beginnings and in the first lines of paragraphs, and by which m and g appear mostly at the ends of lines, as isolated problems (for a recent articulation of which see pages 9-10 in René Zandbergen’s thoughtful paper, “The Cardan grille approach to the Voynich MS taken to the next level”).  Rather, they would stand revealed as just the most conspicuous manifestations of a kind of positional variability that also constrains numerous other characteristics of the text.

Some of the most basic morphological building-blocks of Voynichese have turned out to display subtly different average positions within lines and paragraphs—positions which we can represent by pairs of coordinates, one for average rightwardness, one for average downwardness.  Maybe we could call these pairs of numbers “qoordinates” in the same playful spirit that has led some people to call Voynichese words “vords.”  I rest my case for their statistical significance on two pillars: (1) a comparison of differences among actual scores with the distribution under a null hypothesis, which suggests that they rise above the level of background noise; and (2) the apparent consistency of the patterns among themselves.  In support of the latter, I might cite the similarity in contrastive scores between (1) minimal word pairs beginning with Sh and ch, analyzed in connection with other words; (2) the glyphs Sh and ch themselves, analyzed in connection with other glyphs; and (3) bigrams containing Sh and ch, analyzed in connection with other bigrams.
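The first pillar can be illustrated with a generic permutation test. The sketch below is not necessarily the procedure used in this post. It assumes that a word's rightwardness is its index in the line divided by the index of the last word (0 at the line start, 1 at the line end), that a feature's score is the mean over its tokens, and that the null hypothesis is modeled by shuffling word order within each line. The function names and the toy corpus are invented for illustration.

```python
import random

def positions(lines, predicate):
    """Relative positions (0 = line start, 1 = line end) of matching words."""
    found = []
    for words in lines:
        if len(words) < 2:
            continue  # position is undefined for one-word lines
        for i, word in enumerate(words):
            if predicate(word):
                found.append(i / (len(words) - 1))
    return found

def score_gap(lines, pred_a, pred_b):
    """Mean position of feature A minus mean position of feature B."""
    a = positions(lines, pred_a)
    b = positions(lines, pred_b)
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_p_value(lines, pred_a, pred_b, trials=2000, seed=0):
    """Share of within-line shuffles whose gap is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(score_gap(lines, pred_a, pred_b))
    hits = 0
    for _ in range(trials):
        shuffled = [rng.sample(words, len(words)) for words in lines]
        if abs(score_gap(shuffled, pred_a, pred_b)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

# Made-up toy corpus: Sh- words always first, ch- words always last.
toy_lines = [["Shol", "daiin", "ol", "ar", "chol"] for _ in range(12)]
starts_ch = lambda w: w.startswith("ch")
starts_sh = lambda w: w.startswith("Sh")

p = permutation_p_value(toy_lines, starts_ch, starts_sh)
print(f"observed gap: {score_gap(toy_lines, starts_ch, starts_sh):.3f}")
print(f"permutation p-value: {p:.4f}")
```

With real data the observed gaps are far smaller than in this exaggerated toy case, so the number of trials and the choice of what gets shuffled (within lines, within paragraphs, within sections) would matter a good deal more.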

That consistency also strengthens my belief that words (i.e., “vords”; strings of glyphs with spaces to either side) aren’t the most promising long-term object for studies of rightwardness and downwardness.  So far, the best predictors of the rightwardness tendencies of words seem to be the rightwardness tendencies of their component parts.  Bearing that in mind, it seems more likely to me that these parts were ordered as they were one by one, according to their individual probabilities, than that someone continuously chose statistically appropriate words “whole” from an existing list, each time potentially needing to weigh multiple factors at once.  Furthermore, one of the phenomena I’ve linked to positional variation is the presence or absence of spacing, and it’s spacing that defines words in the first place.  I don’t mean to suggest that what’s generally called Voynichese “word structure” isn’t an important piece of the puzzle, but I suspect it might be more fruitfully applied to whole lines (or paragraphs) as continuous streams of alternating cyclical elements in the manner of Jorge Stolfi’s OKOKOKO.  My above analysis of the glyph sequences ydaiin and ykaiin hints at the avenues this could open up, as well as the roadblocks to understanding which might be erected by deference to word boundaries.  It’s possible that decomposing or categorizing the script itself differently than EVA does would better suit the study of rightwardness and downwardness as well.  In support of this latter idea, I point to my finding that differences in spacing associated with rightwardness vary according to the graphical distinction between what I call “minimars” and “curveletars,” which EVA tends to obscure.

Most of the positional patterns I’ve focused on here are, to the best of my knowledge, my own original discoveries (but please correct me if I’m wrong).  One exception is the rightward tendency of a relative to o, which J. K. Petersen pointed out for ain/oin, aiin/oiin, and aiiin/oiiin in 2020, speculating that aiin and aiiin might have been used as filler for padding out lines.  Petersen also notes that aiiin trends more decisively rightward than aiin, which my statistics bear out (the whole set of word tokens ending in aiiin scores 0.586; for aiin it’s 0.487, and for ain it’s 0.489).  Smith and Ponzi also commented on the second-word “peak” of Sh and ch within lines, although they didn’t generalize it beyond Quire 20.  Otherwise, apart from the proclivities of p, f, m, and g—which would be hard to miss—and observations about word length varying by position, I haven’t seen any prior discussion of rightwardness or downwardness as broadly influential factors in Voynichese.  But I started finding rightwardness patterns myself only when I went out of my way to search for them, inspired at first by wanting to double-check Petersen’s discovery.  I don’t think these are a kind of pattern anyone would otherwise have had any reason to expect, and what we find depends largely on what we look for.

If this were only a matter of turning up other patterns similar to those already associated with p, f, m, and g, tied to the first lines of paragraphs and the last glyphs in lines, it might not have any radically new implications.  It is, rather, the apparently continuous nature of some rightwardness patterns across long stretches of line that I believe most stands to impact fundamental notions about how Voynichese might function, if further review bears it out.

Several past studies have already provided compelling evidence that Voynichese words aren’t distributed randomly relative to each other but are mutually “entangled” (my term) in various ways.  I have in mind particularly the work of Torsten Timm on similarities among nearby words (here and here) and an article by Emma May Smith and Marco Ponzi on glyph combinations across word breaks.  But these lack the element of directionality which I believe these latest findings introduce.

If my results hold up to scrutiny, then it would appear that some mechanism must have existed to change the relative probabilities of particular morphological features continuously over the course of lines.  So what could that have been?

We might turn for an answer to Timm’s “self-citation” hypothesis, according to which new text was generated by copying earlier text while making various changes to it.  However, I believe we’d need to modify it to specify that these changes were biased in a particular direction, towards phenomena with higher rightwardness scores, such that these would tend to accrue over time.

Another possibility I find attractive is a cumulative differential cipher—one in which information is encoded in the differences between successive units of ciphertext, and in which those differences are biased subtly towards phenomena with higher rightwardness scores, such that those phenomena naturally become more probable with increasing rightwardness.  For a crude example of the dynamic I have in mind, imagine a cipher using Roman numerals in which each plaintext letter is encoded by adding a quantity to the previous sum, ranging from A=I to Z=XXVI.

VIII XIII XXV XXXVII LII LXXII LXXX LXXXV CIII CVIII CXXXI CXLVI CLXIV CLXXVI CLXXX

In this scenario, it would take time to build up to the Roman numerals L for 50 and C for 100, which would accordingly skew rightward.  Thus, it’s easy to imagine at least one fairly simple kind of cipher that would naturally display statistical differences linked to rightwardness, if not necessarily differences of precisely the form we see in the Voynich Manuscript.
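The Roman-numeral example can be reproduced in a few lines of Python. The sketch below implements the scheme as described (each letter adds A=1 through Z=26 to a running total, and the total is written as a Roman numeral); the function names are illustrative. Decoding the numerals above by successive differences yields HELLOTHEREWORLD, which is used here as the test plaintext.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """Write a positive integer as a standard Roman numeral."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def from_roman(numeral):
    """Read a standard-form Roman numeral back into an integer."""
    total, i = 0, 0
    for value, symbol in ROMAN:
        while numeral.startswith(symbol, i):
            total += value
            i += len(symbol)
    return total

def encode(plaintext):
    """Cumulative cipher: each letter adds A=1..Z=26 to a running sum."""
    total, out = 0, []
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            total += ord(ch) - ord("A") + 1
            out.append(to_roman(total))
    return out

def decode(numerals):
    """Recover letters from the differences between successive sums."""
    previous, letters = 0, []
    for numeral in numerals:
        value = from_roman(numeral)
        letters.append(chr(value - previous + ord("A") - 1))
        previous = value
    return "".join(letters)

cipher = encode("HELLO THERE WORLD")
print(" ".join(cipher))
print(decode(cipher))

# Where in the sequence does each "late" symbol first show up?
first_L = next(i for i, numeral in enumerate(cipher) if "L" in numeral)
first_C = next(i for i, numeral in enumerate(cipher) if "C" in numeral)
print(first_L, first_C)
```

In this sequence L first appears in the fifth numeral and C in the ninth, which is the rightward skew described above. The effect would reset wherever the running sum resets, for instance at the start of each line.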

Other explanations might suggest themselves too.  As unexpected as differences tied to a rightwardness continuum might be, then, their mere existence shouldn’t be taken as baffling in and of itself.  Of course, the fact that plausible explanations can be imagined doesn’t necessarily bring us any closer to finding a correct explanation.  But hopefully finding that there’s something else to explain will have been a good start.

5 thoughts on “Rightward and Downward in the Voynich Manuscript”

  1. Amazing work, Patrick. I tried very hard to follow as I am not a statistician. I took a break for a while from Voynich but now I’m back to the imagery and made some decent breakthroughs just in the last few days. It’s very likely we have a Christian hermetist/alchemist behind this with a particular interest in quintessence, who is referencing Rupescissa and his analogies of distilled quintessence to the blood of Jesus, the Philosopher’s Stone to resurrection/immortality, etc. This was not uncommon. I’m calling my article Distilling Jesus. I’ve even got great support for my idea that the law of correspondences is what changes glyph meaning in the text, i.e., in the balneological section, plants and animal parts that correspond to body parts (so “o” as head, flower head, god/spirit, aries; or c- as feet, roots, matter, pisces). Stigmata! Spikes through a root instead of feet, bloody flower (crown of thorns) instead of head, a beaten dead animal as sacrifice instead of the 7th stigmata of bruised and battered Jesus body, and a nymph spread out like Jesus with both hands pinned in branches, and the dots below her in three wound/mandorla shapes that symbolize quintessence! Just missing the side wound but don’t really expect to find it because as quintessence or aqua vita itself, it is a purposeful null for most of the tables. Or rather, becomes part of another glyph would be more accurate. That upside down v, which is quintessence, Leo, the heart, the seed, etc. I just knew that 5 meant something! Quintessence, not pentagram!

    Your last paragraph interests me very, very much because with such a prominent theme of distillation emerging from the symbolism, that is exactly what I think might be worth looking at. I’m keeping at these tables of correspondence not only because they’re proving useful for analyzing the symbolism, but because I think we’ve either got an exceptionally complicated cipher based on the glyphs as symbols, or a simple one with a simple cipher like that Polybius square I once mentioned, but we need to rule glyphs out and can only do so when we know what they are.

    If so, only 5- 7 glyphs actually might hold meaning, and in some way represent the essence of the text.

    The most obvious “essence” glyphs are actually, as I’m sure you’ve guessed, o, c-, and c. O for spirit, c- for anima, another c for quintessence. But one c has very possibly been converted into “a” because quintessence is an element and attached or “hidden” in water so add the “i”. So o, c-, a.

    Not enough, is it.

    I would try, then, 8airox. Meaning respectively: Homo, anima, element, form, spirit, matter, and add a c or c- for quintessence as the ringer. The title of 57v, the alphabet page. Our author’s a tricky bastard!

    I’m going to try a poly square right now with and without the c. Maybe without the ‘i’. Why not? You’ve inspired me with your conclusions!

    Great work, Patrick! An exhaustive amount of work, presented beautifully, and inspiring to our cryptologists and image analysts alike!

  2. I had no idea how truly regimented the VMS is. The vast majority of those combos are rarely if ever seen. They’re usually all mediated in some way. Tells you something in itself.

  3. Did you remove label text, radial text, circular text, etc, before running your tests? Your yk- stats etc sound like label text is in the mix.

    I’d also strongly advise partitioning data into sections: A vs B is a good first step, but Q20 and Q13 are really good sections with very different behaviours. Grouping everything together will very often give misleading results.

    Finally, bigram stats only make sense post-parsing, i.e. once a given parsing has gone through the text. You can’t get stats for ( da, ai, ii, in ) all at the same time without falling into some parsing trap, if that makes sense.

    • Nick– Thanks for reaching out. I did remove anything without a “P” metadata flag in the Zandbergen transcription, so there shouldn’t be any labels, etc., in the mix. I agree that limiting the analysis to specific sections would be advantageous, and I can only plead that this was intended more as an initial experiment than a thorough working-through of anything. The one seemingly strong pattern I found linked to absolute endwardness was probably an example of the “misleading results” you mention, although I think the risk with the relative parameters is more just that the results will be blurry, so to speak. (I’d cheerfully entertain any arguments to the contrary.) Your point about bigram stats is well taken. I’m doing some follow-up work right now on probabilities of glyph adjacency by line position, and calculating the probability of an “i” appearing after another “i” turns into a nice paradox, given that “ii” is more common than “i.” I think I found a solution that does what I want. But in the meantime, I’ll admit I thought the dots on my one graph for “i_i” and such looked a little silly myself.

  4. Pingback: Transitional Probabilities in the Voynich Manuscript | Griffonage-Dot-Com
