It’s common practice to use a single still image to represent a motion picture (in the common sense of a “moving” picture presented on a screen over time). A “moving” picture can’t be printed as such on ordinary paper using ordinary methods to illustrate an academic journal article or the cover of a book or magazine, or to hang on the wall of a living room or cinema. But a still image can convey at least some of the information associated with it for promotion, explanation, appreciation, or whatever.
Somewhat confusingly, the terminology of the motion picture “still” is used to refer to two different things. Production stills, a.k.a. publicity stills, are still photographs of the subjects of a motion picture taken separately from it, by separate photographers who specialize in such things, for use in advertising and the like. Still frames, a.k.a. frame grabs, are individual frames excerpted from a motion picture itself. For my purposes, the key distinction is that the former requires something special to be done at the time the motion picture is being captured—a parallel creative endeavor in its own right—while the latter can be grabbed from the motion picture itself at any future time.
If all you have is a motion picture, and you want to represent it using a single still image, you might assume your only option is a frame grab. I’ll admit that a frame grab might often also be the best option, although it would be interesting to reflect on what kinds of substantive information such stills can and can’t contribute to (say) a journal article in film studies. But the usual reliance on frame grabs might, to some degree, also reflect a lack of imagination, so in this blog post I’d like to explore some alternatives including face averages, frame averages, and time-slice images.
Face averaging entails overlaying and averaging multiple images of faces. I’ve posted a number of articles about face averaging on this blog, but for an overview of my most recent techniques as of this writing, see here. One prior effort to detect and average large quantities of face images in popular movies is Portrait by Shinseungback Kimyonghun. They write: “A custom software detects faces from every 24 frames of a movie, and creates an average face of all found faces.” The results are supposed to reflect “the centric figure(s) and the visual mood of the movie”; in the words of Kevin Holmes, they “have a painterly quality to them along with an uncanny creepiness, as a haunting, ghostly amalgamation of the star of the movie is filtered through a blurry haze of all the other actors that appear on screen.” Blurriness can have a certain aesthetic appeal, but at the same time it’s optional—in this case, it results from choosing not to invest much programming or computational expense in aligning the faces before averaging. Faces can be aligned with accuracy and precision, but this takes time, and when you’re dealing with hundreds or thousands (or potentially hundreds of thousands) of sources, it can take a lot of time. Still, in my opinion, it’s worth it. I haven’t yet had the patience to tackle a whole feature-length film, but below is an average of faces I made with my usual methods from just the trailer for Harry Potter and the Sorcerer’s Stone. Does it capture the “feel” of the thing?
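For anyone curious to try the unaligned version at home, here’s a minimal sketch in Python with NumPy. The function names are my own, and it assumes the face crops have already been detected and extracted (by whatever face detector you prefer); this is the quick-and-blurry approach, not my landmark-aligned method:

```python
import numpy as np

def resize_nearest(img, h, w):
    """Nearest-neighbor resize so all face crops share one shape."""
    rows = (np.arange(h) * img.shape[0] / h).astype(int)
    cols = (np.arange(w) * img.shape[1] / w).astype(int)
    return img[rows][:, cols]

def average_faces(face_crops, size=(128, 128)):
    """Unaligned mean of face crops: the blurry, ghostly style of average.
    Aligning facial landmarks first would sharpen the result considerably."""
    h, w = size
    stack = np.stack([resize_nearest(f.astype(np.float64), h, w)
                      for f in face_crops])
    return stack.mean(axis=0).astype(np.uint8)
```

The blur in the result comes directly from the fact that eyes, noses, and mouths land in different places from crop to crop before the mean is taken.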
There may be some appeal in generating the average face for a whole movie (or just a trailer), but the fact that we’re employing averaging as a technique doesn’t necessarily mean we also have to average the faces of all the people represented in it. We can also sort sources by person to create average portraits of individual people as they appear in a given video or film. Below, for example, is a set of face averages generated from the three televised debates of the 2020 United States presidential campaign, as grouped by participant (and also by the participant’s orientation or facial “perspective”).
These images look more stylized than frame grabs, particularly when it comes to clothing (which wasn’t aligned prior to averaging as facial landmarks were). That could be either an asset or a liability; certainly there are contexts in which a slightly abstract image of one of these candidates or moderators might suit better than a fully photorealistic image, such as illustrations for editorials. But the facial expressions should arguably be more representative than they would be in a frame grab, presenting an average across the whole debate rather than the fleeting circumstances of a single moment (which might be chosen strategically to cast a subject in a positive or negative light). The above images could be said to distill facial expression rather than merely sampling it.
Video of talking heads is particularly well suited to the face-averaging approach, but so is any scene in which the face is a focus of attention.
If we don’t want to bring faces into focus, one option is simply to average a whole sequence of frames by overlaying them “as is.” In one project, a redditor known as vvdr12 even reduced entire movies to single blocks of color, but most such cases at least retain the original frame dimensions. We can find this approach tackled and conceptualized in multiple ways. Jim Campbell’s Illuminated Averages series—including works based on Psycho, The Wizard of Oz, and the breakfast table sequence of Citizen Kane—is explicitly presented in terms of digital averaging. On the other hand, Jason Shulman’s Photographs of Films series achieves similar results through time-lapse photography with the exposure time set to the duration of each film, with no mention of the word “average” (as far as I’ve seen). With the exception of Campbell’s treatment of Citizen Kane, works of this kind tend to be based on whole movies, maybe because that approach strikes artists as particularly impressive from a technical or conceptual standpoint. In my experience, it can yield visually coherent results if we’re dealing with a relatively short, single-scene film.
But I find that an average of a whole feature-length film often produces an image that’s too indistinct and unrecognizable for my taste. And there’s a practical obstacle besides. I like to experiment with both mean and median averages. It’s easy enough to generate a mean average from a fairly long video just by adding all the frames together and dividing the sum by the quantity. But generating a median average requires holding all the separate values in memory so that the middle value can be identified and selected. If I’ve run up against the limits of available memory, I’ve sometimes resorted to processing images row by row, or column by column, or color channel by color channel, but even that only gets one so far. So for both practical and aesthetic reasons, I’ve preferred to generate averages from specific scenes rather than whole movies—see some results below.
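To make the memory trade-off concrete, here’s a sketch of both kinds of average in Python with NumPy (the function names are my own). The mean accumulates a running sum in constant memory no matter how many frames stream past, while the median has to hold every value at once; processing one color channel at a time, as I described, cuts the peak footprint to roughly a third:

```python
import numpy as np

def mean_frame(frames):
    """Mean average via a running sum: constant memory however long the video."""
    it = iter(frames)
    total = next(it).astype(np.float64)
    n = 1
    for f in it:
        total += f
        n += 1
    return (total / n).astype(np.uint8)

def median_frame(frames, channel_at_a_time=True):
    """Median average: every value must be held so the middle one can be
    found. Doing one color channel at a time reduces the peak footprint."""
    if not channel_at_a_time:
        return np.median(np.stack(frames), axis=0).astype(np.uint8)
    channels = [np.median(np.stack([f[..., c] for f in frames]), axis=0)
                for c in range(frames[0].shape[-1])]
    return np.stack(channels, axis=-1).astype(np.uint8)
```

The same row-by-row or column-by-column trick follows the identical pattern, just slicing along a different axis.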
Much as we can align faces, we can also align backgrounds for scenes in which these are held relatively constant, one simple solution being the Auto-Align feature in Photoshop. This can be particularly advantageous when we’re dealing with jittery early films, since the jitter would otherwise blur our results.
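Auto-Align is a point-and-click feature, but the same sort of jitter correction can also be scripted. Here’s a minimal sketch (my own, not Photoshop’s algorithm) using phase correlation via NumPy’s FFT; it handles pure translation only, which is usually what early-film jitter amounts to:

```python
import numpy as np

def estimate_shift(ref, img):
    """Estimate the (row, col) translation of `img` relative to `ref`
    by phase correlation. Suitable for jitter (pure translation) only."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(img)
    # normalized cross-power spectrum; its inverse FFT peaks at the shift
    cross = F2 * np.conj(F1)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the halfway point wrap around to negative shifts
    H, W = corr.shape
    if dy > H // 2:
        dy -= H
    if dx > W // 2:
        dx -= W
    return int(dy), int(dx)
```

Once each frame’s offset against a reference frame is known, the frames can be rolled or cropped back into register before averaging, and the jitter no longer blurs the result.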
We can also layer frames in other ways to achieve other interesting effects. Here are a couple of examples using the readily available Photoshop Smart Object Stack Modes.
A Kickstarter project known as MovieDNA, launched by Rob Hansen and Garrick Dartnell back in 2014, used the strategy (if I understand their video explanation correctly) of resizing each frame of a film to a single column—or perhaps even less, with multiple frames averaged into single columns—and lining all the columns up in chronological order from left to right. Their technique is easy enough to imitate—see my own effort below—and it differs from the other approaches I’ve covered so far in that it preserves and highlights a movie’s time base.
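Here’s roughly how I’d imitate the technique in Python with NumPy. The function name and the optional column-grouping parameter are my own, since I’m only guessing at the details of Hansen and Dartnell’s implementation from their video:

```python
import numpy as np

def movie_barcode(frames, width=None):
    """Collapse each frame to its mean column, then line the columns up
    chronologically from left to right. If `width` is given (and is no
    greater than the frame count), groups of consecutive columns are
    further averaged down to that many."""
    cols = np.stack([f.astype(np.float64).mean(axis=1) for f in frames],
                    axis=1)
    if width is not None:
        # assign each column to one of `width` groups and average each group
        idx = (np.arange(cols.shape[1]) * width) // cols.shape[1]
        cols = np.stack([cols[:, idx == i].mean(axis=1)
                         for i in range(width)], axis=1)
    return cols.astype(np.uint8)
```

Swapping `mean(axis=1)` for `mean(axis=0)` (and stacking along the other axis) gives the top-to-bottom variant, with the frames’ vertical dimension collapsed instead.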
Hansen and Dartnell quickly followed this project up with another called MusicDNA, which fell into the category I’ve referred to elsewhere as waveform “thumbnails”: images that are based on sound recordings but compress the time axis so drastically as to render the results unintelligible as acoustic information. The MovieDNA approach seems to me to share part of this liability: it preserves the time base of a film at the expense of its horizontal dimension, and the content of a 2D frame collapsed to a 1D line is no longer recognizable. Still, this approach arguably does a nice job of laying out the structure of a whole film for the eye to take in at a glance, and the color patterns are, I think, more informative than the squished amplitude contours of a waveform thumbnail. Of course, there’s no reason why the time axis couldn’t run from top to bottom instead, compressing the frames’ vertical axis rather than their horizontal axis.
And the results might be even more visually compelling if we were to overlay the horizontal and vertical variants to produce a plaid-like pattern (although I suppose this tactic might be more topically suited to Brave, Braveheart, or Trainspotting).
So I like this approach and think more could be done with it than has been. But it’s not the only way we can generate images that preserve aspects of a film’s time base, and I’m not sure it’s necessarily the most meaningful way.
Here I’d like to turn for inspiration to a photographic technique sometimes known as time-slice photography, a couple of notable practitioners being Dan Marker-Moore and Eirik Solheim. This entails taking a series of images of the same subject, captured over time from the same perspective, dividing them up spatially into regular segments (such as columns), and creating composites in which each successive segment is drawn from a different successive image. By way of illustration, let’s look at a few examples I created from archived National Park Service air quality webcam images (which I’ve previously harnessed as the raw material for time-lapsed moving pictures). In the first two cases below, each column represents fifteen minutes in elapsed time, running from left to right.
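The procedure is simple enough to sketch in a few lines of Python with NumPy. The function name is mine, and this is just a translation of the technique as I’ve described it, restricted to column-wise or row-wise slicing (diagonal slicing would need a slightly fancier mask):

```python
import numpy as np

def time_slice(images, axis=1):
    """Composite in which each successive column (axis=1) or row (axis=0)
    is drawn from a different successive image in chronological order."""
    images = [np.asarray(im) for im in images]
    size = images[0].shape[axis]
    # map each column (or row) index to one of the source images, in order
    pick = (np.arange(size) * len(images)) // size
    out = np.empty_like(images[0])
    for i, im in enumerate(images):
        sel = pick == i
        if axis == 0:
            out[sel] = im[sel]
        else:
            out[:, sel] = im[:, sel]
    return out
```

With more images than columns, the mapping simply skips frames evenly; with fewer, each image contributes a band several columns wide.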
The repeating cycle of days and nights produces a conspicuous vertical banding. An alternative is to select only frames taken at one particular time of day, such as noon, which has the added benefit of letting us squeeze a lot more time into a single image, so that banding now corresponds to seasons rather than days.
Slicing can occur in different directions (vertical, horizontal, diagonal, etc.) to different effect. In the following examples, the camera didn’t stay completely still over the course of multiple years, which resulted in some unfortunate jaggedness along the visible horizon and elsewhere. I could have aligned the images before time-slicing, but that would have been a lot more time-consuming.
But what happens when we apply this same technique to a film or video in which subjects and/or cameras move? The results can be downright Daliesque.
I doubt any of the techniques I’ve demonstrated above will compete seriously with frame grabs when it comes to illustrating moving pictures in books, journal articles, and the like. They might fare better in art galleries or on Etsy. But by drawing together these alternatives, I’d like to encourage anyone who cares about such things to question just how “natural” the frame grab is as a representation of a moving picture. For comparison, the equivalent for a sound recording would be to display a single sample—one amplitude value representing a split second in time. I’ll concede that a frame grab is more meaningful and more recognizable than that would be. But to whatever extent we understand moving pictures and sound recordings alike as things in motion, the excerpting of a single fleeting moment, rendered static, to stand in for either strikes me as potentially misleading. In all their visual weirdness, these alternative techniques might do a more honest job of representing moving pictures as intrinsically dynamic—as things that ultimately can’t be pinned down on paper, but can only be hinted at or approximated.