Face Averaging and Google Image Search Results

“Okay Google, I’d like to connect you with this algorithm I’ve worked out to create pictures of faces by averaging your image search results.  First, show me what you think Man, Woman, Girl, and Boy look like.”

“Well, that gives us a baseline of sorts, but let’s see what you can do with something more specific.  How about ‘Boy with Braces’ and ‘Girl with Braces’?”

“Next, let’s have Blonde, Brunette, and Redhead, all in a row.”

“Now show me a ‘Dia de los Muertos’ face.”

“How about ‘Drag Queen’?”

“And ‘Bridal Veil’?”

“Neat—do I see hints of trees and sky in the background?  But portraits of specific people would be nice too.  What can you do with ‘Daniel Radcliffe’, ‘Emma Watson’, and ‘Rupert Grint’?”


Don’t stop here—there are more pictures to see below!  (Also, you can click on any image to view it in higher resolution.)

The trick, of course, is to think of search terms that will pull up groups of facial images for averaging that tend to share some visually distinctive features.  Images of some famous person will all tend to look like that person, although their common features will be the ones that have remained constant over time rather than any that have changed over the course of a life in the public eye.  Images of faces corresponding to some more abstract descriptive category will tend to display any visually distinctive characteristics of that category as it’s represented in Google’s image database, whether these are conspicuous or subtle.

Granted, not every returned image will perfectly match the subject of a search.  Some images won’t contain faces at all, which doesn’t matter, since the face-detection algorithm will simply skip over them.  Others will contain faces that fall outside our intended search parameters.  So, for instance, if we search on “Humphrey Bogart,” we’ll invariably also pull up some images that include the face of Lauren Bacall, and vice versa; and unless we do something about it, these will get thrown into the mix for averaging.  I suppose a face-recognition step could be introduced beforehand, but that would only work if we knew in advance what “face” we wanted to recognize, which would defeat my purpose (at least to some extent).  Fortunately, though, the subject of a search seems in practice to appear so much more frequently than anything else as to average out the mismatches.  With that in mind, I’ve been doing no vetting of images here at all.  Whatever each search returns gets fed straight into my algorithm, which is designed to detect and average faces from them indiscriminately.  The pictures you see represent the unscreened results of the boldfaced queries.  Judging from the likenesses I’ve been getting of specific people, this approach can in fact deliver a surprisingly recognizable averaged view of the subject of a search, and not just of the search results as such.  What we get, then, is a kind of visual archetype corresponding, according to Google’s image database, to a given search term—insofar as it can be represented by a face.  Of course, certain biases in Google data also come through loud and clear.  What race and ethnicity are Man, Woman, Girl, and Boy?

I generate each list of sources by running a Google Image Search, scrolling down until I hit the “Load More Results” button, and then exporting a CSV of linked URLs using a Chrome plugin called Link Klipper.  The number of face images actually incorporated into each average ends up ranging from roughly 150 to 550, depending on the nature of the search results and my algorithm’s ability to process them.
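
If you’re curious what that looks like in practice, here’s a minimal Python sketch of the fetching step.  I’m assuming Link Klipper’s CSV puts each captured URL in the first column; the folder and file names are just placeholders of my own.

```python
# A minimal sketch, not my actual pipeline: read a Link Klipper CSV and
# download whatever each linked URL points to, for later face detection.
# Assumption: the URL sits in the first column of each row.
import csv
import os
import requests

def download_sources(csv_path, out_dir="sources"):
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            if not row:
                continue
            url = row[0]
            try:
                resp = requests.get(url, timeout=10)
                resp.raise_for_status()
            except requests.RequestException:
                continue  # dead or slow links just get skipped
            # Everything gets saved blindly; non-face files will simply
            # fail face detection later, which is fine.
            with open(os.path.join(out_dir, f"img_{i:04d}.jpg"), "wb") as out:
                out.write(resp.content)

download_sources("search_results.csv")
```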


“Back to work, Google!  Can you paint Medusa for me?”

“And ‘Bisque Doll’?”

“And Neanderthal?”

“How about ‘Jesus Christ’?”

“Hmm… I wonder whether spelling makes a difference with these things.  Give me ‘Man with Mustache’ and ‘Man with Moustache’ side by side for comparison.”


There’s still more to see further down, but maybe you’d like to know a little more about how I’m doing this.

I’ve been using the same basic face-averaging approach here that I developed last year for yearbook portraits: applying the Viola-Jones object detection framework to identify and crop out faces from among a group of source images; to detect the eyes and rotate each face until they sit in a horizontal position; to detect orientation and flip subjects as needed to make them all face left; and to locate eyes, nose, and mouth so that these can be used as control points for averaging and warping.  However, I’ve made one modification: if no faces are detected in a given source image at first, I now rotate it clockwise and then counterclockwise in increments of twelve degrees until either at least one face is detected and successfully passes the “rotation” step or we reach fifty degrees of rotation.  The nice thing about this method is that it scales really well, producing respectable averages from hundreds upon hundreds of source images without the user needing to do anything except select them—and, in fact, I’ve been able to streamline things even further for this project than I had before.  That isn’t to say it’s not time-consuming.  Each 1000×1000-pixel average of a full complement of Google Image Search results—with the face itself occupying a 333×333-pixel square in the middle—takes around two hours and fifteen minutes to process when most results contain detectable faces.  If they don’t, the process runs even longer because of all the unsuccessful rotation attempts it needs to cycle through before giving up.  This sort of thing is best left batch-running overnight.  But it’s fun to imagine posing queries and getting results back instantly.
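
To make the rotation-retry idea concrete, here’s a simplified Python sketch using OpenCV’s Haar-cascade implementation of Viola-Jones.  This isn’t my production code, and the detector settings are just plausible defaults, but the twelve-degree step and fifty-degree limit are the ones described above.

```python
# A simplified sketch of the rotation-retry loop, using OpenCV's bundled
# Haar cascade (a Viola-Jones detector). Detector parameters are assumed.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def rotate(image, angle_deg):
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))

def detect_with_retries(image, step=12, limit=50):
    """Try 0, +12, -12, +24, -24, ... degrees until a face turns up,
    giving up once the next increment would exceed the limit."""
    angles = [0]
    a = step
    while a <= limit:
        angles += [a, -a]
        a += step
    for angle in angles:
        candidate = rotate(image, angle) if angle else image
        gray = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) > 0:
            return candidate, faces, angle
    return None, [], None  # no face found at any rotation
```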


“Let’s try a few more portraits of specific people, Google.  Can you show me ‘Jennifer Lawrence’?”

“‘Leonardo DiCaprio’?”

“‘Lupita Nyong’o’?”

“‘Rowan Atkinson’?”

“‘Kristin Kreuk’?”

“‘Kurt Vonnegut’?”

“‘Lauren Bacall’?”

“‘Kylie Jenner’?”

“Those are some pretty convincing likenesses, Google!  I guess we could keep going on and on with this.  Are there any requests from the audience?”


The results of averaging search results for a specific person’s name aren’t just subjectively recognizable.  They also fare pretty well when submitted to automated face-recognition sites such as CelebsLike.me or pictriev.com, even in some cases where I didn’t think the process had worked out all that successfully.  The fact that our averages of specific individuals are so easily recognizable implies that our averages of more abstract subjects ought to be equally accurate, at least as far as the averaging part goes.

The averaging process itself is neutral, I think, but it can be put to uses both whimsical and rhetorically serious.  It’s up to human beings to ascribe significance both to the results and to the fact that I chose to run a particular query in the first place.  (While I’ve playfully composed much of this post as a dialog with Google, asking it to “paint” things for me and so forth, I should clarify that I also think of these as my own works, like those of any other algorithmic artist—but that’s not to say I have a clear sense of what they mean.)


“We’re not quite done putting you through your paces, Google.  Give me….”

‘James Bond’

‘Fantasy Elf’

‘Pharaoh’

‘Down Syndrome’

‘On Death Row’

‘Mullet’


I set up an automated “action” in Photoshop to do some further processing of the raw results of my algorithm, which tend to look rather washed-out.  It overlays two copies as layers, auto-tones the top layer, applies the “Multiply” blend mode, flattens the image, applies auto-contrast, and finishes with an unsharp mask (at 80%, radius 5, threshold 10).  And then I carry out some final processing manually using the dodge tool with range set to “shadows,” an exposure of 100%, and a hardness of 0%.  First, I apply this once with a size of 120 centered on each eye; otherwise, the eyes tend to come out too dark relative to the rest of the face because of their unusually sharp alignment before averaging.  And then I use it as needed to mitigate one other problem that sometimes occurs, which we can call “tongue-teeth syndrome”: depending on the proportions of source images with closed mouths, teeth showing, and mouths wide open, the averaged result can contain a darkened gap between the lips that makes it look as though the subject’s teeth are dyed red or, more comically, as though the subject is sticking out his or her tongue.  Whenever a case of “tongue-teeth syndrome” strikes me as painfully conspicuous, I copy the image as a new layer, apply the dodge tool to the teeth area, and then carefully trace around it with an eraser tool to prune the lightened region into the correct shape before flattening.  This last step is by far the most subjective one in the whole process and the only one I wouldn’t know how to automate.  Fortunately, it only seems to be called for about 10-12% of the time.  I haven’t taken this step with any of the averages shown above, but the images in the next group all demonstrate its use.
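
For anyone who’d rather script this step than record a Photoshop action, here’s a rough Python/PIL approximation.  PIL’s autocontrast only stands in for Photoshop’s auto-tone and auto-contrast, so the output won’t match pixel for pixel, but the sequence of operations is the same.

```python
# An approximate, scriptable version of the Photoshop action (not a
# pixel-exact match): multiply the raw average against an auto-toned copy
# of itself, re-stretch the contrast, and sharpen.
from PIL import Image, ImageChops, ImageFilter, ImageOps

def punch_up(raw_path):
    base = Image.open(raw_path).convert("RGB")
    toned = ImageOps.autocontrast(base)        # stand-in for auto-toning the top layer
    merged = ImageChops.multiply(base, toned)  # "Multiply" blend mode, then flatten
    merged = ImageOps.autocontrast(merged)     # stand-in for auto-contrast
    # unsharp mask at 80%, radius 5, threshold 10, as in the action above
    return merged.filter(ImageFilter.UnsharpMask(radius=5, percent=80, threshold=10))

punch_up("average_raw.png").save("average_final.png")
```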


‘Taylor Swift’

‘Bernie Sanders’ and ‘Alexandria Ocasio-Cortez’

‘Lady Gaga’

‘Aretha Franklin’


As for the control points used for face-warping, I’ve settled on a group of ten for the moment, all of which are derived from four automatically detected “bounding boxes” (BB): rectangles around the left eye, the right eye, the nose, and the mouth.  Those control points are: (1) center of left eye BB; (2) center of right eye BB; (3) center of nose BB; (4) bottom center of nose BB; (5) center of mouth BB; (6) bottom center of mouth BB; (7-8) points 100 pixels to the left and right of #5; (9-10) points 100 pixels to the left and right of #6.  The final four points are intended mainly to anchor the lower part of the face.  I’m not sure this is the absolute best possible choice of control points, but when combined with appropriate “regions of interest” (ROI), it works surprisingly well for as little data as it requires.  Adding extra control points based on BB coordinates can actually produce worse results, because not all BB coordinates are equally reliable.  Even so, I’d still like to train a cascade object detector to recognize ears.  Left-right flipping is based on the positions of the mouth and eye pairs relative to the nose; this isn’t perfect, but it works maybe 85% of the time.  I’m sure this step is appropriate for categories of face, such as “redhead,” but less sure when it comes to faces of specific people, which aren’t perfectly symmetrical.  Even then, however, flipping still seems to mitigate more distortions than it causes.
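
Here’s how those ten points fall out of the four bounding boxes, sketched in Python with boxes given as (x, y, w, h) tuples, the way Viola-Jones detectors typically report them.  The flip test at the end is just one simple way of comparing the mouth and eye positions against the nose, not a transcription of my exact rule.

```python
# A sketch of deriving the ten control points from the four bounding boxes.
# Boxes are (x, y, w, h) tuples; coordinates grow rightward and downward.

def center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def bottom_center(box):
    x, y, w, h = box
    return (x + w / 2, y + h)

def control_points(left_eye, right_eye, nose, mouth):
    p5 = center(mouth)
    p6 = bottom_center(mouth)
    return [
        center(left_eye),       # 1: center of left eye BB
        center(right_eye),      # 2: center of right eye BB
        center(nose),           # 3: center of nose BB
        bottom_center(nose),    # 4: bottom center of nose BB
        p5,                     # 5: center of mouth BB
        p6,                     # 6: bottom center of mouth BB
        (p5[0] - 100, p5[1]),   # 7: 100 px left of #5 (anchors lower face)
        (p5[0] + 100, p5[1]),   # 8: 100 px right of #5
        (p6[0] - 100, p6[1]),   # 9: 100 px left of #6
        (p6[0] + 100, p6[1]),   # 10: 100 px right of #6
    ]

def should_flip(left_eye, right_eye, nose, mouth):
    # Assumed heuristic: if the eye pair and mouth sit to the right of the
    # nose center, treat the subject as facing right and mirror the image
    # so that every face ends up facing left.
    nose_x = center(nose)[0]
    feature_x = (center(left_eye)[0] + center(right_eye)[0] + center(mouth)[0]) / 3
    return feature_x > nose_x
```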

I haven’t seen anything else out there quite like the averages I’ve been presenting here, but for projects that resonate in various ways with mine you might check out Justin Greenough’s portraits based on averaging internet image search results, Steve Socha’s Overlay Art (with technique illustrated on YouTube), the work of Alexei Efros and others with AverageExplorer, and the Reddit posts of Osmutiar featuring averages of, for example, the “top 500 professional golfers.”

Of course, the same approach I’ve illustrated here could also be used with search engines other than Google, or with tags based on face recognition in, say, the Windows 10 Photo app, and it would easily lend itself to time-based averaging besides, since Google allows image searches to be limited by date range.

What average(s) would you like to see?
