Digital face averaging has been a perennial topic on this blog, and in this post I’d like to share some of my latest results, which I think far surpass what I was able to do just ten months ago. I’ll discuss techniques below, but let’s start by letting the images speak for themselves. Click on any image to view it in full resolution (usually 1000×1000 pixels per face).
First, here are some averages of photographs returned by Google image searches on the names of specific people.
Next, here are some averages of multiple paintings of specific people who lived before the birth of photography.
We can also generate averaged portraits of specific people from sequences of video frames.
Now let’s turn to some examples of face averaging defined by general categories rather than specific people as subjects.
We can also present face averages from multiple perspectives side by side.
Multi-perspective face averages can also be presented as animations—
—or as anaglyph 3D images (get out your red-cyan glasses if you’ve got ’em):
Sometimes the technique I use to generate multiple perspectives—which I’ll describe below—results in images that differ in more ways than perspective alone, for instance if we’re dealing with an artistic genre in which different orientations of the head are associated with different styles or treatments. Here are examples from a few of the “same” projects shown above.
The examples I’ve shared so far have all drawn on image quantities that are large enough to smooth out the distinctive quirks of individual sources (except perhaps around the edges, depending on the treatment of the background). The next few examples instead embrace the aesthetic effects of averaging a smaller quantity of sources.
And some more time-based averages.
If we average hundreds or even thousands of images of a generic subject all at once, the results have the distinctive “hyper-airbrushed” appearance exemplified by the following run-of-the-mill specimens.
When I look at other face/portrait averaging projects out there, I don’t sense that there’s been much overlap between “artistic” averaging and “scientific” averaging (think Jason Salavon versus Shiry Ginosar)—but I don’t see why there shouldn’t be. We could draw an analogy here with audio restoration, where accuracy and aesthetic impact can both be valued at the same time, and where optimal results can require a measured compromise between the two.
At first, the out-of-the-box face averaging software I had at my disposal (Abrosoft FaceMixer) didn’t give me much control over the process, but in 2016 I began intervening in it to try to get better results (e.g., by exporting warped face images to average in customized ways), and in 2017 I started writing my own face-averaging code. For landmark detection, I relied initially on some tools built into MATLAB that define four-point bounding boxes around faces, eyes, noses, and mouths. These worked for me up to a point, but they weren’t very precise: they would try to pinpoint where a whole eye was, for example, but wouldn’t reveal anything more about its shape or features.

I’ve since switched to using Yuval Nirkin’s Find_Face_Landmarks, which draws on the dlib library via OpenCV to identify far more points than those built-in tools: sixty-eight, to be precise. It still doesn’t define any points above the eyes or on the ears, which I wish it did. However, I’ve tried interpolating some points around the forehead area based on other points, with decent results. (Affected images above are described as having “augmented points” as opposed to “default points.”) Google’s ARCore will detect a whopping 468 control points covering the whole face, forehead included, which would be another big step forward, but it’s designed to work with the real-time output of a digital camera in Android, Unity, Unreal, or iOS, and I haven’t yet invested the time to figure out whether it could be repurposed to my ends. I’m also curious what kind of point detection the Face Lab at Liverpool John Moores University used for their impressively (but anachronistically?) sharp averages of photographs of nineteenth-century British and Tasmanian convicts.
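Incidentally, for anyone who’d like to experiment with the same sixty-eight points without setting up Find_Face_Landmarks, dlib can also be called directly from Python. A minimal sketch, assuming you’ve downloaded dlib’s pretrained shape_predictor_68_face_landmarks.dat model (the image filename is a placeholder):

```python
import dlib

# Pretrained models: dlib's built-in frontal face detector plus the
# 68-point landmark predictor (downloadable from dlib.net as a .dat file).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("portrait.jpg")  # placeholder filename
for rect in detector(img, 1):  # 1 = upsample once to catch smaller faces
    shape = predictor(img, rect)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # points[36:48] cover the eyes, points[48:68] the mouth, and so on;
    # note there are still no points above the eyebrows or on the ears.
```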
Even with just sixty-eight points, the improvement over what I was doing before has been great enough to make me feel a little embarrassed about the quality of some of my previous efforts. Witness the following old and new results drawn from my dataset of Valparaiso High School yearbook portraits, which I published for purposes of comparison in my hundredth Griffonage post.
I’m still using the old built-in tools for the initial step of detecting faces in source images, although I apply different thresholds depending on the project, balancing the importance of finding faces against the aggravation of having to weed out lots of false positives. One important strategy at the detection stage is rotation. I start with each source image “as is” and try to detect faces in it. If none were found, my earlier software would rotate the source slightly in one direction, then the other, and then a little further in each direction until there was at least one hit, up to a particular threshold (say, a forty-degree rotation, after which it would give up). With images containing multiple faces, of course, this approach risks missing some of them. The latest version of my code just rotates once forty degrees clockwise and once forty degrees counterclockwise to ensure that faces won’t be detected multiple times. One consequence is that the edges of the resulting averages can have conspicuous angles corresponding to the forty-degree rotations.
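Here’s a rough sketch of that earlier incremental strategy in Python, using OpenCV’s stock Haar cascade as a stand-in for the MATLAB detectors I actually use (the five-degree step is illustrative):

```python
import cv2

# Stock frontal-face Haar cascade that ships with opencv-python.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def rotate(img, angle):
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, M, (w, h))

def detect_with_rotation(img, step=5, limit=40):
    # Try the image "as is" first, then in widening swings to either side,
    # giving up once the rotation limit is reached.
    for mag in range(0, limit + 1, step):
        for angle in ([0] if mag == 0 else [mag, -mag]):
            gray = cv2.cvtColor(rotate(img, angle), cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray)
            if len(faces):
                return angle, faces  # stops at the first hit
    return None, []
```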
Some face averaging efforts take a mean average, others take a median average, and yet others take a mixture of the two. However, when we’re dealing with hundreds or even thousands of source images, a mean average renders outlying features much less distinct than a median average does, so it’s usually not a very attractive option.
If control points were added around the shoulders, outer hairline, and so forth, the mean average might be more competitive, but as it stands, I almost always choose to go with the median (the only exceptions above are “Face Mask” and “Construction Worker”). Even with a median, though, averages based on very large quantities of images can become blurry and indistinct. Philosophically, the more images the better; but practically, reducing the quantity of images can yield a sharper result—if that’s what we decide we want.
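In NumPy terms, by the way, the choice between the two averages is a one-line swap. A minimal sketch, assuming warped_faces is a hypothetical list of same-sized arrays holding the pre-warped, aligned sources:

```python
import numpy as np

# warped_faces: hypothetical list of H x W x 3 float arrays, one per
# source image, all already warped to the same control-point positions.
stack = np.stack(warped_faces)             # shape (n, H, W, 3)
mean_average = stack.mean(axis=0)          # outliers smear into a haze
median_average = np.median(stack, axis=0)  # outliers drop away more cleanly
```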
Smaller image quantities can also make the workings of the process more conspicuous or produce interesting abstract background effects (as seen above most notably in the Vogue cover examples). Below are a few further examples to illustrate, each involving the kind of generalized female portrait used to promote commercially available face-averaging apps such as this one. As long as we keep the number of sources below a certain level, a mean average can yield an effect vaguely reminiscent of mist or gossamer veils.
With comparable source quantities, a median average tends to produce vibrant geometric patterns that give an impression of something hastily sketched and slightly abstract.
When I’m working with generalized categories of face, I ordinarily flip faces horizontally as needed to make them all face in the same direction, so that leftward-facing and rightward-facing sources don’t cancel each other out. By default, my script flips images to face towards the viewer’s left. When I’m working with a specific individual, on the other hand, I tend not to do this, because I want to preserve personal asymmetries such as the wart on Abraham Lincoln’s right cheek. (I’ve been referring to this step in image captions as “harmonization.”)
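For illustration, here’s a rough sketch of what a harmonization step like this could look like in Python; the yaw test is a simplified hypothetical stand-in, and a full implementation would also need to swap the mirrored left/right landmark indices back into canonical order:

```python
import cv2

def harmonize(img, landmarks):
    # Flip the image so the face looks toward the viewer's left, using the
    # nose tip (point 30) versus the jaw midpoint (points 0 and 16) of the
    # 68-point scheme as a crude yaw test. Simplified: after mirroring, the
    # left/right landmark indices would also need swapping to stay canonical.
    jaw_mid_x = (landmarks[0][0] + landmarks[16][0]) / 2
    if landmarks[30][0] > jaw_mid_x:  # nose displaced rightward: face turned right
        img = cv2.flip(img, 1)        # mirror horizontally
        w = img.shape[1]
        landmarks = [(w - 1 - x, y) for (x, y) in landmarks]
    return img, landmarks
```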
I’ve also tried different approaches to handling the space beyond the borders of source images, which will typically come into a project cropped in different ways, sometimes even missing part of the face itself. Filling the space in with black is easiest, since the added pixels simply have a value of 0, but it produces a conspicuous black border around the resulting average. Filling it in with white pixels at a value of 255 has a similar effect, except that the border becomes white. Filling it in with an average color calculated from each image produces a subtler border in which the content fades gently into a neutral tone, usually a shade of brown. Finally, rendering the area beyond the border of each source image transparent, by assigning its pixels a non-numerical value for averaging purposes, will result in a background that becomes sharper and more vibrant towards the edges as fewer and fewer source images are factored into it, often with recognizable features such as lettering peeping through. My current code outputs all four background types by default so that I can choose whichever one I like best in any given case.
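That last, transparent option is easy to sketch in NumPy terms: pad each source onto the output canvas with NaNs and use the NaN-ignoring reducers. A minimal sketch, assuming a list of already-warped sources with hypothetical canvas placements (the black, white, and average-color fills would just replace np.nan with 0, 255, or the per-image mean):

```python
import numpy as np

def pad_to_canvas(img, canvas_hw=(1000, 1000), offset=(0, 0)):
    # Place one warped source on the output canvas; pixels beyond its
    # borders get NaN, a non-numerical value that drops out of the average.
    canvas = np.full(canvas_hw + (3,), np.nan, dtype=np.float32)
    h, w = img.shape[:2]
    y, x = offset
    canvas[y:y + h, x:x + w] = img
    return canvas

# placed_sources: hypothetical list of (warped image, canvas offset) pairs.
stack = np.stack([pad_to_canvas(im, offset=off) for im, off in placed_sources])
average = np.nanmedian(stack, axis=0)  # each pixel averages only the sources
                                       # that actually cover it, so the edges
                                       # grow sharper as coverage thins out
```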
A number of the averages I’ve presented here fall into the category I’ve been calling algorithmic autoportraits, based on averaging the results of Google image searches on particular search terms (either taking every returned image “as is” or curating the results manually to exclude irrelevant ones). When I first started making averages of this kind a couple of years ago, I used an inconvenient multi-step process that involved loading a page of image results into a browser, scrolling down until I hit the “Load More Results” button, extracting a list of image links, and then downloading the images separately. Later, I found a Python script written by Hardik Vasa called “Google Images Download” that seemed to do what I needed more efficiently, so I started using that. There was one drawback: without some further setup involving Selenium and Chromedriver (a rabbit hole down which I wasn’t eager to go), “Google Images Download” was limited to grabbing a hundred images per search. However, it also contained a handy provision for limiting a search by date range. By creating batch scripts, therefore, I could download a separate group of a hundred images for each non-overlapping date range I specified, gathering thousands upon thousands of images with ease. Meanwhile, this mechanism also provided a handy way of combining the algorithmic autoportrait with time-based image averaging. Instead of averaging all the Google image results for “Barack Obama” or “Miley Cyrus” into a single image at once, I could average them grouped separately by year, month, or day and then arrange the output frames into still sequences or even turn them into videos. I found out that Google Images can be searched by date back to February 12, 2008, which gives us over twelve years’ worth of image data to play around with.
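The batch mechanism itself is simple to sketch: generate consecutive, non-overlapping date ranges and run one hundred-image download per range. Here’s the idea in Python, with the date-limited search itself stubbed out:

```python
from datetime import date, timedelta

def date_ranges(start, end, days=100):
    # Yield consecutive, non-overlapping spans of the given length.
    cursor = start
    while cursor <= end:
        span_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, span_end
        cursor = span_end + timedelta(days=1)

# One hundred-image download per hundred-day span, from the earliest
# searchable date to today.
for lo, hi in date_ranges(date(2008, 2, 12), date.today()):
    pass  # run a date-limited search restricted to lo..hi here (stubbed out)
```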
It’s true that the date associated with an image appears only to reflect when Google indexed it, and that an old picture posted on a blog today will register as a “new” picture as far as dating goes. Many dates therefore don’t correspond to when images were actually made. However, as long as the majority of images returned for a given search were posted and indexed soon after they were created, the averages should still illustrate change over time. We should be on particularly safe ground in the case of celebrities who are constantly in the news, since the flood of current photos would tend to overwhelm the impact of any retrospective ones. As an early proof of concept, I chose the search term “Selena Gomez,” and the results presented below seemed to bear out my assumption. I curated the dataset manually to remove pictures of other people, but not in any other way.
Sometime in the spring of 2020, Google redesigned its format for image metadata in a way that caused the “Google Images Download” Python script to stop working. I found some efforts to create alternatives on GitHub, but none of them seemed to suit my purposes, so I wrote a new Python script of my own, which is simple but effective. Here it is if you want it, although I don’t know how long it will work (given the likelihood of future changes at Google), and it’s inclined to hang at the end of a project for reasons that are mysterious to me (as a Python neophyte). My default script specifies image type “face” among the search parameters, but I’m not sure whether or not that’s a good idea.
When I’m aiming to gather a lot of images without particularly caring when they’re from, I’ve mostly been searching hundred-day date ranges from sometime in 2008 or 2009 to the present, which pulls up between 3,500 and 3,750 image files for a well-represented search term. This is what captions mean above when I say “scraped in 100-day increments.” The number of sources given in each caption above corresponds to the number of extracted faces in a whole project, some of which may have failed to import into later processing stages.
The kind of face averaging I do involves warping images beforehand to bring facial features into their average positions. Standard operating procedure within the art has been to take the average positions across all the source images, but we can also take the averages of the positions in subsets of images, such as groups with faces oriented more or less obliquely towards the observer (the parameter known as yaw). Thus, we can generate one set of control points from the “leftwards” half of the images and another set from the “rightwards” half, and create two averages using them. Those averages can draw either on all the images at once, which will make the results more similar-looking and can be preferable with smaller datasets, or on just the images within each subset, which may do a better job with ears, noses, and other such features that differ a lot depending on perspective. (All examples shown in this post take the latter approach.) As long as our source base is large enough, the result should simulate the left and right halves of a stereoview. Combining the left and right views produces too extreme a contrast in perspectives for good stereoscopy, but combining left-and-center or right-and-center views can produce a fair illusion of depth.
More recently, I’ve revised the script to let me group source images for averaging into any desired number of categories based on yaw (the leftward or rightward orientation of the head). The groups are divided on the basis of equal quantities of images, not amounts of yaw. This also gives me more flexibility when it comes to choosing pairs of images for stereoscopy. I label the results as whole when using the complete image dataset, or 1/2 and 2/2 when divided into two perspectives, or 1/3, 2/3, 3/3 when divided into three perspectives, or 1/4, 2/4, 3/4, 4/4 when divided into four perspectives, and so on.
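Here’s a sketch of how that equal-count grouping can work, with a crude landmark-based heuristic standing in for however you might prefer to estimate yaw; each group’s mean landmark positions then become the warp target for that perspective:

```python
import numpy as np

def yaw_proxy(pts):
    # Rough yaw estimate from 68-point landmarks: the nose tip's horizontal
    # offset from the jaw midpoint, normalized by face width (a hypothetical
    # stand-in for a real pose estimator).
    pts = np.asarray(pts, dtype=float)
    jaw_mid = (pts[0] + pts[16]) / 2
    return (pts[30, 0] - jaw_mid[0]) / (pts[16, 0] - pts[0, 0])

def split_by_yaw(landmark_sets, n_groups=2):
    # Sort faces from most leftward to most rightward and cut the ordering
    # into equal-sized groups ("1/2, 2/2", "1/3, 2/3, 3/3", and so on).
    order = np.argsort([yaw_proxy(p) for p in landmark_sets])
    groups = np.array_split(order, n_groups)
    # Each group's mean landmark positions become that perspective's warp target.
    targets = [np.mean([landmark_sets[i] for i in g], axis=0) for g in groups]
    return groups, targets
```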
For most of the examples presented above, I’ve chosen some perspective other than the “whole.” I usually have far more source images than I’d need to produce a decent average, so I can afford to throw some of them away in the interest of securing a more compelling facial orientation. But sometimes different “perspectives” vary in other respects as well.
So there you have it! Comments, criticisms, commissions? The line is open.