Time-Based Averaging of Indiana University Yearbook Portraits

To help celebrate Indiana University’s upcoming bicentennial in 2020, I’ve been applying time-based averaging techniques to university-related images that document long-term change on a scale far beyond that of ordinary time-lapse efforts.  These techniques can be used to create either still images or video animations, and I’m aiming to do both, maybe leading to a film and a companion book (though it’s a bit early to tell).  My working title for the project as a whole is Time Passes at Indiana University.  In this blog article, I’ll describe one of several sub-projects I hope to fold into that larger one: namely, the time-based averaging of student portraits from Indiana University’s yearbook, the Arbutus.

Let me illustrate what I have in mind.  First, here’s a pair of averages from the Arbutus for 2016, the most recent volume available to me at the time of writing.

Next, here are similar averages from ten years earlier (2006).

And here are a couple more averages from the Arbutus of nine years before that (1997).

Each of these image pairs looks distinctly different from the others, reflecting changes that took place gradually over the spans of time that separate them—changes in fashion, in ethnic mix, and in the style of the portraits themselves.  Not only can we create averages for widely separated points in time like this, but we can also assemble timelines made up of averages for successive years in which the differences between individual images are more subtle.

From there, it’s only a short step to creating animations in which long-term change can be viewed unfolding smoothly and seamlessly over time.

The examples I’ve shown above are just a proof of concept: a teaser, if you will, of what’s possible.  Granted, the averages for 2012-2016 alone already incorporate a total of 3,442 source images, which is nothing to sneeze at.  But there are over a hundred more years’ worth of Arbutus student portraits just waiting to be processed, and they should allow us to make a continuous set of averages—and continuous animations—running back to at least 1906.  (The Arbutus actually began publication in 1894, but for the first dozen volumes it was inconsistent about printing individual student photographs.)

I came up with the idea for this project back in the summer of 2014 and even published a few preliminary results at the time, including this average of senior female portraits from 1919—

—but it’s only recently that I’ve streamlined my processes enough to make it feasible to tackle face averaging on the kind of scale that will be needed to carry it out.

Although it’s only tangential to my current plans, we can also separate out subsets of yearbook portraits for averaging based on criteria other than date and gender if there are specific communities or characteristics we’d like to follow.  For instance, we can easily contrast the average residence-hall female portrait of 2006 (below left) with the average sorority portrait of the same year (below right), thanks to the way in which portraits were divided into categories in that year’s Arbutus—and they do look different.

We can also sort out portraits based on any criteria that can be recognized by eye, although this is rarely a straightforward call and ought to be attempted with caution, particularly where complex issues of identity are involved.  With that caveat, below are my attempts at averaging (a) Black female portraits from 2016; (b) blonde female portraits from 2016; and (c) portraits of males with facial hair from 1986.

Of course, we can also create time-based averages for any of these subsets as long as we can find enough source portraits from multiple years in a row.


Background

I’ve covered the principles behind this project in several past blog articles.  Time-based face averaging can be traced back at least to Nancy Burson’s First Beauty Composite and Second Beauty Composite (1982) and Jason Salavon’s The Class of 1988 & The Class of 1967 (1998), and Larry Chait may have been the first person to apply it to more than two contrastive dates in his six-image video slideshow The Changing Face of Crime (2013).  As far as I’m aware, however, the idea of creating a smooth, continuous animation in this way was original with me, and I don’t know of anyone else who has yet tried doing this.  Some of my past efforts show the evolving face of Miss America (“Face Averaging as a Historical Technique,” July 1, 2014), stylistic change in Fayum mummy portraits and Western European paintings of women during the transition from late medieval to Renaissance art (“Face Averaging and Art History,” July 14, 2014), the rise of the smile in high school yearbook photos (“‘Say Cheese!’: Using Face Averaging to Track the Rise of the Photo Smile,” December 18, 2014), and the aging of Abraham Lincoln and Rembrandt van Rijn (“Time-Based Image Averaging,” October 31, 2016).

Meanwhile, a few other researchers have begun carrying out more ambitious time-based face-averaging projects, the most noteworthy example to date being “A Century of Portraits” by Shiry Ginosar, Kate Rakelly, Sarah Sachs, Brian Yin, and Alexei A. Efros (2015).  Like me, Ginosar et al. attempted to trace the rise of the smile in high school yearbook photos over time, and their results received some nice media attention that suggests this line of research can have broad popular appeal.  That said, their focus was a little different from mine.  They framed their work mainly as a case study in mining useful information of various kinds from large datasets of images—in this case, not only about the smile, but also about trends in hairstyle.  Although they published still images of male and female portraits averaged by decade, they seem to have understood these as just one among multiple options for sharing patterns they had discovered, alongside others such as smile-intensity graphs.  It was left to Aymann Ismail at Slate to create a video from their dataset to accompany an article about their findings (and it was more a rapid slideshow than an animation, much like The Changing Face of Crime).  For their own part, they were seeking to explore innovative methods of extracting data, but not necessarily innovative methods of displaying data.

For me, by contrast, the display’s the thing.  I see this project as belonging to the same branch of the digital humanities as my efforts to translate graphic representations of sound on paper into audible form (“How to ‘play back’ a picture of a sound wave,” November 27, 2014; “What is Paleospectrophony?,” October 15, 2014).  Under some circumstances, audio, images, and video can convey vast quantities of information to our senses more effectively and appropriately than other modes of presentation, and emerging computational approaches are just as well suited to translating large-scale data into audio, images, and video as they are to extracting it from audio, images, and video.  From the perspective I’m taking, time-based image averaging straddles the boundaries between research, art, and media production.  The results should enhance our understanding, but aesthetics matter too.  And that can mean taking extra measures that might not be justified from a strict research standpoint, such as retaining color information whenever it’s present rather than reducing everything to grayscale for consistency, processing at a higher resolution than needed for sheer pattern recognition, and cropping source images to a frame size that significantly exceeds the analytical region of interest.

That brings me to the developments I’ll be describing here.  Ginosar et al. automated their image processing in ways that made it practical for them to review 154,976 portraits and to accept 37,921 of them into their study.  By contrast, my methods have depended in the past on lots of manual work, ranging from the selection and cropping of faces to the laborious fine-tuning of control points and one-by-one export of individual warped images in Abrosoft FaceMixer.  Those methods don’t scale well, which has kept me from tackling seriously big datasets.  Recently, however, I’ve cobbled together a set of algorithms that seem to be producing results just as good as the ones I was able to obtain before, but with a lot less human intervention.  Averaging dozens upon dozens of volumes’ worth of Arbutus portraits without compromising on quality suddenly seems quite doable.


Securing the Source Material

Arbutus portraits have undergone a number of changes over the years that make it challenging to take a consistent approach to averaging them.  They’ve been printed in color since 2012, but they were monochrome before that, so the transition from 2011 to 2012 could end up looking like Dorothy entering Oz.  In the most recent volumes, all the individual portraits have been lumped together into a single alphabetical sequence, but as we go back in time we often find them separated out into different categories, sometimes by school or college, and sometimes by residential affiliation: Greeks, Residence Halls, Off-Campus.  At other times, some of these groupings—especially fraternities and sororities—have instead been represented by group photos separately from the individual student portraits, and every now and then we find a mixed approach.  Sometimes individual portraits have been organized by class (Freshman, Sophomore, Junior, Senior), while sometimes there has only been a Seniors section, and sometimes there have been Senior portraits in one size and individual portraits of fraternity and sorority members in another size.  My inclination right now is to include all individual student portraits in the project, but for most years prior to 1906, there weren’t any sections of individual student portraits, so we might have occasional grounds for resorting to group portraits as well.  Or not—I still haven’t decided.  Nor am I sure whether the project should be limited just to the Bloomington campus or include other campuses when they were represented (as with the Indianapolis section in the 1946 Arbutus).

Some pesky practical obstacles have cropped up as well.  Digital images of scattered earlier Arbutuses are available through Archive.org, and iuyearbook.com offers scans of the volumes from 1960 through 2002, but the latter at least aren’t of high enough resolution for my purposes.  Fortunately, the reference room in Wells Library holds a set of Arbutuses in convenient proximity to the scanners in the Scholars’ Commons Digitization Lab, but the library catalog notes that it’s missing some years (1894, 1907, 2009, 2010), and at least one that’s supposed to be there isn’t actually on the shelf (1998).  I’ve also run into cases there already where a binding is too tight for flatbed scanning (1976) or where some portraits have been razored out (1992).  I can probably fill in the gaps using the set of Arbutuses at the Monroe County Public Library, or in some cases by requesting duplicate copies held at Indiana University’s own Auxiliary Library Facility, but the process of gathering source material has proven less straightforward than I thought it should be at first glance.  Purdue University yearbooks would be lower-hanging fruit.

Whenever possible, I’ve been scanning portrait pages in pairs at 400 dpi, beginning with the 2016 Arbutus, which makes for a nice example:

There are 620 portraits here in all, and 611 of them—if I’m counting correctly—are portraits of undergraduate seniors, but this number comprises only a fraction of the entire graduating senior class.  A total of 7,316 people reportedly received baccalaureate degrees from Indiana University Bloomington during the fiscal year 2015-16, only 8.35% of whom had their portraits in this section of the Arbutus.  Women and international students also seem to have comprised a higher proportion of that 8.35% than they did of the student body as a whole.  That said, we’re dealing here with a legitimately self-selected group of students who opted to put their faces on record, and our dataset is plainly “representative” of that group, however we may choose to understand its relationship to Indiana University students more generally.  The quantity of these portraits fluctuates as we go further back in time, and it’s not uncommon to find close to two thousand per volume.  We might be looking at something in the neighborhood of 150,000 portraits in all.  Picking out all these yearbook portraits by hand would be awfully time-consuming, but fortunately there are shortcuts we can take.


Viola-Jones Face Detection

One of the most popular and effective tools for programmatic face detection is the Viola-Jones object detection framework.  There’s a sample MATLAB script for implementing it here, under the heading “Detect Faces in an Image Using the Frontal Face Classification Model.”  On its face, this algorithm is concerned only with face detection, and not with aligning the positions of facial features as needed for averaging.  However, it’s designed to define “bounding boxes” around the faces it detects, and the eyes, nose, mouth, and so forth appear in reasonably consistent positions relative to these boxes.  I wasn’t sure whether simply cropping and overlaying these bounding boxes would align faces well enough for averaging purposes, but I decided to find out.  With that in mind, I tweaked the sample MATLAB script so that it would triple the horizontal and vertical dimensions of the selected area around the faces and export the results as separate image files after detection, consistently resized to 1000×1000 pixels:

faceDetector = vision.CascadeObjectDetector;
[A,B] = uigetfile({'*.gif;*.bmp;*.tif;*.tiff;*.jpg;*.jpeg;*.png', ...
    'Select MULTIPLE image files'},'Select MULTIPLE image files.', ...
    'MultiSelect','on');
if ~iscell(A)
    fprintf('Requires multiple image selection.');
    return;
end
for q = 1:size(A,2)
    I = imread(strcat(B,A{1,q}));
    bboxes = step(faceDetector,I);
    for i = 1:size(bboxes,1)
        % Expand each detection to a square three times the detected height,
        % centered horizontally on the original bounding box.
        bboxesexp(1) = round(bboxes(i,1)+(bboxes(i,3)/2)-(bboxes(i,4)*1.5));
        bboxesexp(2) = bboxes(i,2)-bboxes(i,4);
        bboxesexp(3) = 3*bboxes(i,4);
        bboxesexp(4) = 3*bboxes(i,4);
        J = imcrop(I,bboxesexp);
        J = imresize(J,[1000 1000]);
        imwrite(J,strcat(A{1,q},'_',num2str(i),'.jpg'));
    end
end

This script found 601 of the 620 portraits in the scans I’d made from the 2016 Arbutus, or about 97% of the actual total.  That’s not perfect, but the missing 3% should be statistically insignificant for purposes of face-averaging on this scale.  The same script also returns an annoying number of false positives, but at least these tend to cluster at the beginning and end of the results for each image file, making them easy to weed out by hand.

Next, I sorted the 601 images manually into 221 male portraits and 380 female portraits.  It’s true that a good gender-recognition algorithm might be able to automate this step, and I’ve experimented with the MATLAB code for “gender recognition from face images with trainable COSFIRE filters” by George Azzopardi and Antonio Greco, available here.  The authors claim a success rate of 93.7%, and I was able to achieve 92.2% accuracy by substituting a dataset I drew from the 1996 Arbutus (100 training and 100 test images for both males and females, or 400 images total).  That’s not bad, but not quite good enough for me to want to rely on it either.  Ginosar et al. took a hybrid approach to gender recognition in their work, combining machine learning with crowdsourcing to resolve “difficult cases,” but they don’t report what proportion of cases were “difficult.”  Anyhow, I’m still doing all my gender-sorting by hand at the moment.  It’s usually pretty easy, but not always; sometimes I’ve ended up reviewing a student’s name when I’ve been unsure, and I’ve occasionally been stumped even then.  I’ll also concede that this approach forces all portraits into binary male and female categories, which won’t reflect any nuances of gender fluidity.

In any case, the results of averaging the overlaid bounding boxes have turned out surprisingly well.  Below is a composite of median averages I created from male and female portraits for the years 2011-2016, organized chronologically from left to right and then from top to bottom.

The original bounding box extends beyond the edges of the yearbook portraits in most cases, but we can always crop our results to match the original dimensions more closely if we like.  Below is another sample sequence covering the years 1999 through 2008 (incorporating a total of 9,374 source images).  The portraits for 1999 were cropped more closely in print than usual, but for the most part the printed source images are compatible with the chosen window.

These results are already pretty decent, considering how little we’ve done to achieve them at this point.  But with a few more automated processing steps, we can make them even better.


Compensating for Asymmetry

The process I’ve described so far assumes that yearbook portraits will be more or less symmetrical about a vertical axis, and it works well as long as subjects are mostly forward-facing.  Even though faces will inevitably be turned a little to the right or the left—these aren’t passport photos, after all—the divergences from vertical symmetry will either balance each other out if they’re evenly scattered or reinforce one orientation if they’re not.  Any actual facial asymmetry will be factored out as well.

Problems can arise, however, when subjects are consistently posed at an angle away from the camera, approaching a half profile, but in inconsistent directions.  This aspect of head position is known technically as yaw.  One solution would be to throw out any faces that deviate significantly from a forward orientation.  Thus, Ginosar et al. applied a pose estimation algorithm called IntraFace to their dataset and excluded any images it determined weren’t forward-facing.  Fewer than one in four source images passed this test, reducing an initial dataset of 154,976 portraits down to only 37,921.  But I don’t want to throw out any of my source images if I can help it, so I’ve been trying a couple of other strategies for mitigating the effects of asymmetry.  The first is to force an artificial symmetry on each of the individual source images.  The other is to harmonize poses by flipping or rotating images as needed to make them mutually consistent.  I prefer method #2, but I’ll describe both methods here for good measure.


Forcing Artificial Symmetry

The first method, forcing artificial symmetry, involves mirroring a source image left to right (flipping it about its vertical axis) and superimposing its flipped and unflipped versions (let’s call this “flip-mixing” for short).  Five variants are illustrated below as applied to a median average of 2016 Arbutus portraits.  These were produced as follows:

  • By adding the image pairs (variant 1, counting from left to right).
  • By multiplying the image pairs.  This can reinforce the similarities between them more strongly than addition can, using a logic analogous to that of the audio cross-modulation technique I described here, but how we go about doing it makes a big difference.  Pixel intensity values range from 0 (darkest) to 255 (brightest).  If we multiply two images “as is” (variant 2), the resulting values will range from 0 to 255 squared, or 65025; and if we then divide those values by 255, we return to the desired 0 to 255 range.  In this case, the default tendency of pixels is to be dark, and output pixels are conspicuously bright only when they’re bright in both source images.  Multiplying the negatives of the two images instead (variant 3) gives a result in which the default tendency of pixels is to be bright and output pixels are conspicuously dark only when they’re dark in both source images.  If we convert the 0-255 range into a range from -128 to +127 before multiplication and adjust the sign of the product to match the sign of the sum, we get a result in which the default tendency of pixels is to be gray and output pixels are conspicuously dark or bright only when they’re dark or bright in both source images (variant 4).
  • By factoring out any pixels where the difference in intensity between the two images exceeds a given amount.  Specifically, I’ve tried adding the image pairs as above but then reassigning the value NaN (“not a number”) to each pixel whose source pixels differ in intensity by more than x so that it will be disregarded when we calculate averages across multiple images, and then inpainting any missing pixels in the final result (variant 5, x=20).  This approach has one annoying drawback: a faint vertical line tends to appear down the center of the image where the source pixels have been compared either with themselves or with immediately adjacent pixels.
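For readers who’d like to experiment with flip-mixing themselves, here’s a minimal Python/NumPy sketch of the five variants.  My own processing is done in MATLAB, so the function name, threshold default, and exact arithmetic below are just illustrative:

```python
import numpy as np

def flip_mix(img, mode="add", thresh=20):
    """Blend a grayscale uint8 image with its left-right mirror image."""
    a = img.astype(float)
    b = np.fliplr(a)  # mirror about the vertical axis
    if mode == "add":             # variant 1: simple average
        out = (a + b) / 2.0
    elif mode == "multiply":      # variant 2: dark by default
        out = (a * b) / 255.0
    elif mode == "multiply_neg":  # variant 3: bright by default
        out = 255.0 - ((255.0 - a) * (255.0 - b)) / 255.0
    elif mode == "signed":        # variant 4: gray by default
        sa, sb = a - 128.0, b - 128.0
        out = 128.0 + np.sign(sa + sb) * np.abs(sa * sb) / 128.0
    elif mode == "mask":          # variant 5: drop big disagreements
        out = (a + b) / 2.0
        out[np.abs(a - b) > thresh] = np.nan
        return out                # float array; NaNs skipped by np.nanmean
    return np.clip(out, 0, 255).astype(np.uint8)
```

Variant 5 returns a float array containing NaNs so that a later np.nanmean or np.nanmedian across the whole image set will disregard the masked pixels, as described above.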

Flip-mixing isn’t my preferred method of dealing with asymmetry for face averaging, but it may have other useful applications.  One popular technique for exploring facial symmetry and asymmetry more generally—used by Julian Wolkenstein and Alex John Beck, among others—is to create pairs of separate mirror images from the left and right sides of a portrait.  I’ve demonstrated this technique below on a photograph of Alphonse Bertillon, inventor of the mug shot.

The left-mirrored and right-mirrored results can look strikingly different from each other, as my Bertillon example illustrates.  However, flip-mixing offers an alternative approach in which the two sides of the face are averaged to create a single symmetrical composite.

If the question is “what would this person’s face look like if it were perfectly symmetrical?”, then I’d argue that flip-mixing can provide another, equally valid answer.


Harmonizing Poses

Another strategy for handling asymmetry in face averaging—and the one I prefer—is to harmonize poses by making sure any asymmetries are consistent as to direction.

For example, we can harmonize the direction in which subjects are facing—in other words, the yaw—by mirroring any portraits that face the “wrong” way.  Here the challenge lies in choosing which images need to be flipped.  We can do the sorting by hand, although it can be surprisingly hard to tell by eye which way a subject is facing; the face might be oriented differently from the torso, for instance, and factors such as side lighting or the arrangement of hair—especially long hair—can be misleading.  Alternatively, we can try to automate this step.  The IntraFace software used by Ginosar et al. is no longer supported or available for download, and other pose estimation algorithms I’ve seen, such as this one, don’t seem to do quite what I want, so I came up with my own simple yaw-harmonization algorithm.  It uses the Viola-Jones object detection framework to locate the nose, mouth, and eye pairs—there are two pre-trained models for “big” and “small” eye pairs—and to calculate the horizontal centers of all the bounding boxes; then, if the center of the nose box falls to the right of the centers of the mouth and eye boxes, the subject is identified as facing rightward, while if it falls to the left, the subject is identified as facing leftward.  I found that artifacts of half-tone printing were sometimes causing the detection algorithms to fail, so I’ve arranged to apply Gaussian filters with standard deviations of 1, 2, 3, etc., until a specified number of “hits” is reached.  Raising the merge threshold also seems to improve accuracy.
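In outline, the comparison rule at the heart of my yaw-harmonization algorithm can be sketched like this.  This is a simplified Python rendering, not my actual MATLAB code; the (x, y, width, height) box format matches what vision.CascadeObjectDetector returns:

```python
def yaw_direction(nose_box, mouth_box, eyepair_box):
    """Guess which way a face is turned by comparing the horizontal centers
    of Viola-Jones bounding boxes, each given as (x, y, width, height)."""
    center = lambda box: box[0] + box[2] / 2.0
    nose = center(nose_box)
    reference = (center(mouth_box) + center(eyepair_box)) / 2.0
    if nose > reference:
        return "right"   # nose center lies right of the mouth/eye centers
    if nose < reference:
        return "left"
    return "forward"
```

Any portrait that comes back “right” (or “left,” whichever we designate as the wrong way) would then simply be mirrored before averaging.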

With recent Arbutuses, harmonizing facial yaw mostly just improves consistency across years.  For earlier volumes, though, this step is more crucial because we encounter a greater variety of poses.  Below are three mean averages of the male portraits in the 1894 Arbutus, the one on the left created from the unaltered source images, the one in the middle produced from images harmonized by one version of my algorithm, and the one on the right made after further manual adjustment of eighteen images out of 136.  The unadjusted average on the left looks much as though someone has taken an impression of a face by rolling it back and forth in clay—not so good.  In this case, harmonizing yaw makes a big difference, yielding much more acceptable results.

In addition to yaw, we can also try to mitigate inconsistencies in roll, or side-to-side tilt.  Here a promising strategy is to locate the positions of the left and right eyes and then to level out the angle between them by rotation.  Since roll can affect the results of my yaw-harmonization algorithm, we should ideally adjust for roll before we adjust for yaw if we’re going to do both, although we’d also have the option of adjusting only one or the other.
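The leveling calculation itself is simple trigonometry.  Here’s an illustrative Python sketch (the function name is my own, and whether you then rotate by this angle or its negative depends on your imaging library’s sign convention):

```python
import math

def roll_angle(left_eye, right_eye):
    """Return the angle (in degrees) of the line between two detected eye
    centers, each given as (x, y) with y increasing downward.  Rotating the
    image to cancel this angle levels the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))
```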

There isn’t much obvious variation in roll in recent Arbutus portraits, much as there isn’t much obvious variation in yaw; but even so, harmonizing roll produces marginally sharper results in these cases, with the effect on the eyebrows being most noticeable.  Once we get back into the 1990s, on the other hand, roll and yaw vary widely enough to make harmonizing both of them together more obviously worthwhile.  The animated GIF below displays averaged male and female portraits from the Arbutuses for 1997 (monochrome) and 2016 (color), alternating between unharmonized medians and medians with roll and yaw adjusted using the algorithms I’ve described.

I’ve found it hard to quantify the success rates of my algorithms for adjusting roll and yaw, mainly because it’s unclear what’s “off” enough to count as a failure.  However, the roll adjustment algorithm seems to be respectably accurate.  The yaw adjustment algorithm is less so; as a ballpark estimate, I’d say that around one portrait in fifteen comes out wrongly oriented on average, with the failure rate being higher than that with some image sets (e.g., 1996) and lower with others (e.g., 1986).  Practically speaking, it may be acceptable in its current state; after all, for purposes of averaging, we’re probably okay as long as most images get analyzed correctly.  But if we’re not satisfied, we could always adjust any wrongly-oriented faces by hand or try to tweak the algorithm itself.

For present purposes, I’m going to choose the second strategy—harmonizing poses—for further development below.


Sharpening the Focus

So far, we’ve limited our manipulation of images to the relatively straightforward processes of cropping, resizing, rotating, and translating, and we’ve made no effort to ensure that noses or mouths are aligned, beyond assuming that these features will always occupy a roughly similar position relative to the face as a whole.  Accuracy of alignment before averaging is proportional to sharpness of focus after averaging, so what’s at stake here is how sharp or blurry our averaged results will be.  To improve alignment, we need to warp images based on control points (e.g., find the nose and move it here, and find the mouth and move it here, and interpolate pixels around them as needed).  Although we could simply assign default fixed values to these control points, my feeling is that it’s better for us to use the average values of all detected points across whatever set of images we’re averaging.

In the past, I’ve used Abrosoft FaceMixer whenever I’ve needed to warp faces, first as part of the software’s own averaging system, and later as a tool for warping only so that I could process the results in other ways.  But FaceMixer doesn’t lend itself to truly large-scale work for a variety of reasons I don’t want to belabor; it just isn’t designed with that sort of task in mind.  So I’ve come up with an alternative approach to warping that’s less ambitious in its analysis, but more rugged.

Rather than writing my own image-warping code from scratch, I’ve been using warpImage.m (available here), although I had to modify it to get it to work by removing the {'QJ'} from griddata and substituting the line for i=1:numPoints in place of for i=1:1.  I believe this code was originally written by Ethan Meyers (see description here and zip file here), although he’s not credited in the version of it I borrowed, which has been altered to accept numeric arrays rather than cell arrays.  It takes three input arguments: (1) an image matrix, (2) an array of x,y coordinates for reference points within it, and (3) another array of x,y coordinates for the points to which you want to move them.  I’d been searching for a piece of code to do this for months and am grateful to have finally found one.

With this code in hand, we can detect facial landmarks with the Viola-Jones framework, define control points based on the coordinates of the bounding boxes, and warp them to their average positions across a given set of images.  After some experimenting, I found that I could get pretty good results by warping to the centers of the nose, mouth, and left and right eyes; the bottom center of the mouth; and four points 100 pixels to the left and right of the two detected mouth points—a total of nine control points, the last five of which serve mainly to sharpen the lower lip.  Through trial and error, I identified some of the most common mistakes in detection and built in protections against them, generally by defaulting to predefined coordinates whenever detected values fall outside a given range.  Occasionally a face still ends up strangely twisted or contorted, but this is rare enough that it shouldn’t impact averages on the scale we’re dealing with here.
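To give a sense of how little machinery this kind of control-point warping requires, here’s a rough Python analogue that interpolates a sampling map from scattered control points using SciPy’s griddata, much as warpImage.m relies on MATLAB’s griddata.  Treat it as an illustrative sketch of the general technique rather than a port of my actual code:

```python
import numpy as np
from scipy.interpolate import griddata

def warp_image(img, src_pts, dst_pts):
    """Warp a grayscale image so that control points src_pts land on dst_pts.
    The sampling map is interpolated linearly from the control points, with
    the four frame corners pinned so the whole image stays covered."""
    h, w = img.shape
    corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]], float)
    src = np.vstack([np.asarray(src_pts, float), corners])
    dst = np.vstack([np.asarray(dst_pts, float), corners])
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find where to sample the source.
    map_x = griddata(dst, src[:, 0], (xx, yy), method="linear", fill_value=0)
    map_y = griddata(dst, src[:, 1], (xx, yy), method="linear", fill_value=0)
    sx = np.clip(np.round(map_x).astype(int), 0, w - 1)
    sy = np.clip(np.round(map_y).astype(int), 0, h - 1)
    return img[sy, sx]
```

In the full pipeline, dst_pts would be the average positions of the detected control points across the whole image set, so every face gets pulled toward the same layout before averaging.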

Ready for a trip to the optometrist?  Here’s an animated GIF that shows the effect of the warping technique I just described on portraits from the 2016 Arbutus that have already been “harmonized” as described above.

The results of this warping technique compare favorably with what FaceMixer can deliver even after lots of meticulous manual adjustment of a far larger quantity of control points.  FaceMixer may yield more accurately aligned results with smaller quantities of portraits—say, three or five or a dozen—but when we’re working with hundreds of them, it doesn’t appear to offer much advantage.

I haven’t applied any sharpening filters to any of the examples featured so far in this post—except for a little contrast enhancement, what you see in each case is whatever came straight out of my algorithm.  But the option is there, and while it’s no substitute for aligning the source images, it could make the results of aligned averaging sharper yet.  I’m not sure this would necessarily be a good thing.  However, I’ve given some examples from the Arbutus for 1996 below to show the effect of several successive “smart sharpen” filters in Photoshop.


From Stills to Video

We have several options when it comes to creating animated averages rather than still ones.

One obvious way to create a time-based animation from yearbook portraits would be to generate separate averages for each year and then to line them up as successive video frames.  At a typical video frame rate of around thirty frames per second, however, the animation would then unfold at a rate of thirty years per second, or three and one third seconds per century.  That’s a lot faster than I want.  On the other hand, reducing the frame rate to something like five frames per second—corresponding to twenty seconds per century—would make the results look choppy and disjointed.  To overcome this problem, one strategy would be to interpolate (or “tween”) a number of intermediate frames between each annual average, and the simplest way of accomplishing that would be by successively fading each old image out while fading each new image in.  I see no reason not to do things in this way if we’re using unwarped images.  Once we start warping images based on control points, however, it’s unclear to me whether simple fading is still sufficient.  As long as the average positions of control points don’t vary too much from year to year, I think it should still work.  To illustrate, here’s a sample animation prepared simply by fading between independently warped annual averages for 2012-2016, with three frames interpolated between each adjacent pair.
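The fading itself is just a per-pixel linear blend between adjacent annual averages.  A minimal Python sketch (illustrative, not my production code):

```python
import numpy as np

def crossfade(frame_a, frame_b, n_tween=3):
    """Return frame_a, n_tween linearly blended intermediate frames, and
    frame_b, for use as successive video frames."""
    frames = [frame_a]
    for k in range(1, n_tween + 1):
        t = k / (n_tween + 1.0)  # blend weight rises from 0 toward 1
        mix = (1.0 - t) * frame_a.astype(float) + t * frame_b.astype(float)
        frames.append(mix.astype(np.uint8))
    frames.append(frame_b)
    return frames
```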

However, the average control point positions could vary significantly from average to average, in which case we might need to re-warp the averages to the average positions of their mutual control points—certainly doable, but more complicated.

An alternative approach would be to load all our source portraits into a folder arranged in chronological order, with all the images for (say) 1906 first, then 1907, then 1908, and so forth; and then to average successive groupings with a substantial overlap between them, like this:

For example, we might create a video where the first frame is the average of images 1-100, the next frame is the average of images 2-101, then 3-102, then 4-103, and so on.  In this way, we could achieve a smooth animation in which each year was spread out over several seconds.
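A minimal sketch of this overlapping-window scheme follows, again using synthetic gray frames as stand-ins for chronologically sorted portraits.  The window and step sizes here are illustrative, not the ones used for my animation:

```python
import numpy as np

def sliding_window_frames(images, window, step):
    """Average overlapping chronological groups of source images:
    frame k is the mean of images[k*step : k*step + window]."""
    frames = []
    for start in range(0, len(images) - window + 1, step):
        group = np.stack(images[start:start + window])
        frames.append(group.mean(axis=0).astype(np.uint8))
    return frames

# Stand-ins for 300 chronologically sorted portraits (flat gray ramp).
portraits = [np.full((64, 64), i % 256, dtype=np.uint8) for i in range(300)]

# First frame averages images 0-99, the next averages 1-100, and so on.
frames = sliding_window_frames(portraits, window=100, step=1)
print(len(frames))           # 300 - 100 + 1 = 201
print(int(frames[0][0, 0]))  # mean of 0..99 is 49.5 -> 49 as uint8
```

Because consecutive frames share 99 of their 100 source images, each frame differs from its neighbor only slightly, which is what produces the smooth, seamless motion.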

That’s more or less how I generated the sample animation I presented near the beginning of this article.  In a first pass, portraits were averaged in groups of 150, advancing forty source images per frame for females and twenty-five source images per frame for males (since there are fewer male portraits available).  In a second pass, the initial output was re-averaged in groups of five to enhance smoothing.  My goal in this case was to produce something that would work decently as an animated GIF, which is one reason I opted to reduce the number of output frames, but ordinarily I’d export the frames as a video file—so bear in mind that it would have been easy to create an equally smooth video that would run much longer.  I prefer this approach because it gives more of an impression of seamless continuity than fading between discrete annual averages does (with the latter technique, the animation appears to bounce from point to point regardless of how smooth the transitions are).

The video time base in the scheme I’ve just described will vary with the quantity of source images available for each year.  But we could adjust for that factor by defining the beginning and end points of the source selection for each frame based on percentages rather than absolute quantities: one frame might cover the span from the 95% point of one year to the 5% point of the following year, then 96%-6%, then 97%-7%, etc.
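Here’s one way the percentage-based bookkeeping might look.  The per-year portrait counts and start offsets below are made up purely for illustration:

```python
def pct_index(year_starts, year_counts, year, pct):
    """Absolute index of the source image at the `pct` (0.0-1.0)
    point of `year`, given per-year start offsets and image counts."""
    return year_starts[year] + round(pct * year_counts[year])

# Hypothetical uneven years: 1906 has 80 portraits, 1907 has 120.
year_counts = {1906: 80, 1907: 120}
year_starts = {1906: 0, 1907: 80}

# One frame spans the 95% point of 1906 through the 5% point of 1907;
# the next frame would span 96%-6%, and so on.
lo = pct_index(year_starts, year_counts, 1906, 0.95)
hi = pct_index(year_starts, year_counts, 1907, 0.05)
print(lo, hi)  # 76 86
```

Defining the window boundaries this way keeps each year occupying the same amount of screen time regardless of how many portraits it contributed.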

Here too we’d also need to decide whether to re-warp source images separately for each average or to assume that warp points for adjacent years are close enough.  For my sample animation illustrating the technique, I went with “close enough.”


Conclusion

Time-based face averaging gives us an attractive way of revealing patterns hidden in large and unwieldy historical collections of individual portraits.  To the best of my knowledge, nobody has ever tried to create a sequence of averaged yearbook photos from any one institution on the scale I’m envisioning here, so I believe an Arbutus project could break some worthwhile new ground.  Meanwhile, the techniques I’ve described would leave only three manual steps to manage: (a) scanning the yearbook pages, (b) weeding out false positives from automated face detection, and (c) sorting subjects into males and females, which I still wouldn’t trust an algorithm to do.  Those steps would take some time, but maybe not a prohibitively great amount of it, and everything else could be handled automatically, perhaps by a computer left running overnight.  I know there’s still some room for improvement; for example, I’d like to align ears, shoulders, and neckties in addition to eyes, noses, and mouths.  But that would be icing on the cake.

What else do I have up my sleeve for Time Passes at Indiana University?  Well, much as we can average images of faces, we can also average images of places—a subject I hope to explore soon in another post.
