Alternatives to the Moving Picture Frame Grab

It’s common practice to use a single still image to represent a motion picture (in the everyday sense of a “moving” picture presented on a screen over time).  A “moving” picture can’t be printed as such on ordinary paper using ordinary methods, whether to illustrate an academic journal article, grace the cover of a book or magazine, or hang on the wall of a living room or cinema.  But a still image can convey at least some of the associated information for purposes of promotion, explanation, appreciation, or whatever.

Somewhat confusingly, the terminology of the motion picture “still” refers to two different things.  Production stills, a.k.a. publicity stills, are still photographs of the subjects of a motion picture taken separately from it, by specialist photographers, for use in advertising and the like.  Still frames, a.k.a. frame grabs, are individual frames excerpted from the motion picture itself.  For my purposes, the key distinction is that the former requires something special to be done at the time the motion picture is being captured (a parallel creative endeavor in its own right), while the latter can be grabbed from the motion picture at any future time.

If all you have is a motion picture, and you want to represent it using a single still image, you might assume your only option is a frame grab.  I’ll admit that a frame grab might often also be the best option, although it would be interesting to reflect on what kinds of substantive information such stills can and can’t contribute to (say) a journal article in film studies.  But the usual reliance on frame grabs might, to some degree, also reflect a lack of imagination, so in this blog post I’d like to explore some alternatives including face averages, frame averages, and time-slice images.


Face Averages

Face averaging entails overlaying and averaging multiple images of faces.  I’ve posted a number of articles about face averaging on this blog, but for an overview of my most recent techniques as of this writing, see here.  One prior effort to detect and average large quantities of face images in popular movies is Portrait by Shinseungback Kimyonghun.  They write: “A custom software detects faces from every 24 frames of a movie, and creates an average face of all found faces.”  The results are supposed to reflect “the centric figure(s) and the visual mood of the movie”; in the words of Kevin Holmes, they “have a painterly quality to them along with an uncanny creepiness, as a haunting, ghostly amalgamation of the star of the movie is filtered through a blurry haze of all the other actors that appear on screen.”  Blurriness can have a certain aesthetic appeal, but at the same time it’s optional—in this case, it results from choosing not to invest much programming or computational expense in aligning the faces before averaging.  Faces can be aligned with accuracy and precision, but this takes time, and when you’re dealing with hundreds or thousands (or potentially hundreds of thousands) of sources, it can take a lot of time.  Still, in my opinion, it’s worth it.  I haven’t yet had the patience to tackle a whole feature-length film, but below is an average of faces I made with my usual methods from just the trailer for Harry Potter and the Sorcerer’s Stone.  Does it capture the “feel” of the thing?

Trailer for Harry Potter and the Sorcerer’s Stone, face average, all frames, no perspective (whole dataset), median, transparent background.
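For anyone curious about the mechanics, the final compositing step (after face detection and landmark alignment, which my actual pipeline handles with separate tools) can be sketched in a few lines.  This is a simplified illustration rather than my production code; `median_face` is a hypothetical helper that assumes its input crops have already been aligned.

```python
import numpy as np
from PIL import Image

def median_face(paths, size=(256, 256)):
    """Median-composite a list of face crops.

    Assumes the crops have already been detected, rotated, and scaled
    so that facial landmarks coincide; this sketch performs only the
    final averaging step.
    """
    stack = np.stack([
        np.asarray(Image.open(p).convert("RGB").resize(size), dtype=np.float64)
        for p in paths
    ])
    # The median is taken independently for each pixel and color channel.
    return Image.fromarray(np.median(stack, axis=0).astype(np.uint8))
```

Using the median rather than the mean, as in most of the examples here, keeps outlier frames (odd lighting, motion blur) from bleeding into the composite.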

There may be some appeal in generating the average face for a whole movie (or just a trailer), but the fact that we’re employing averaging as a technique doesn’t necessarily mean we also have to average the faces of all the people represented in it.  We can also sort sources by person to create average portraits of individual people as they appear in a given video or film.  Below, for example, is a set of face averages generated from the three televised debates of the 2020 United States presidential campaign, as grouped by participant (and also by the participant’s orientation or facial “perspective”).

The (First) Presidential Debate of September 29, 2020.  Joe Biden, perspective 3 of 3; Chris Wallace, perspective 2 of 4; Donald Trump, perspective 2 of 3; median face averages, every fiftieth frame, transparent background.

The Vice Presidential Debate of October 7, 2020.  Mike Pence, perspective 2 of 4; Susan Page, perspective 2 of 3; Kamala Harris, perspective 2 of 3; median face averages, every fiftieth frame, transparent background.

The (Final) Presidential Debate of October 22, 2020; Joe Biden, perspective 1 of 3; Kristen Welker, perspective 4 of 4; Donald Trump, perspective 4 of 4; median face averages, every fiftieth frame, transparent background.

These images look more stylized than frame grabs, particularly when it comes to clothing (which wasn’t aligned prior to averaging as facial landmarks were).  That could be either an asset or a liability; certainly there are contexts in which a slightly abstract image of one of these candidates or moderators might suit better than a fully photorealistic image, such as illustrations for editorials.  But the facial expressions should arguably be more representative than they would be in a frame grab, presenting an average across the whole debate rather than the fleeting circumstances of a single moment (which might be chosen strategically to cast a subject in a positive or negative light).  The above images could be said to distill facial expression rather than merely sampling it.

Video of talking heads is particularly well suited to the face-averaging approach, but so is any scene in which the face is a focus of attention.

Dorothy (Judy Garland) in the “Somewhere Over the Rainbow” sequence from The Wizard of Oz, perspectives 3/4, 1/4, and 4/4, median, average color background, face average, all frames.  Note the varying degrees to which we see the haystack against which she leans for part of the sequence.

Humphrey Bogart in Casablanca, “Play it Again, Sam” scene, perspectives 2 of 3, 3 of 3, and 2 of 5, face average, all frames.

Ingrid Bergman in Casablanca, “Play it Again, Sam” scene, perspectives 1 of 4, 3 of 4, and 4 of 4, median, face average, all frames.

Rose (Kate Winslet) in the “I’m Flying” scene from Titanic, perspective 4/4, mean, average color background, face average, all frames.  The brighter rectangle represents the framing of a close-up.  Note the ghostly head of Jack (Leonardo DiCaprio) to the left.

Vincent Vega (John Travolta) in restaurant dialog from Pulp Fiction, perspectives 3/3, 2/3, 1/3, median, face average, all frames, white background, cropped.

Mia Wallace (Uma Thurman) in restaurant dialog from Pulp Fiction, perspectives 1/4, 2/4, 4/4, median, face average, all frames, cropped.

The Great Train Robbery, “Realism” scene, all frames, whole dataset (no perspective), median, transparent background.

Helen Mirren advertising rubber gloves in Herostratus, all frames, perspectives 2/3, 4/4, median, transparent background.


Frame Averages

If we don’t want to bring faces into focus, one option is simply to average a whole sequence of frames by overlaying them “as is.”  In one project, a redditor known as vvdr12 even reduced entire movies to single blocks of color, but most such efforts at least retain the original frame dimensions.  We can find this approach tackled and conceptualized in multiple ways.  Jim Campbell’s Illuminated Averages series—including works based on Psycho, The Wizard of Oz, and the breakfast table sequence of Citizen Kane—is explicitly presented in terms of digital averaging.  On the other hand, Jason Shulman’s Photographs of Films series achieves similar results through time-lapse photography with the exposure time set to the duration of each film, with no mention of the word “average” (as far as I’ve seen).  With the exception of Campbell’s treatment of Citizen Kane, works of this kind tend to be based on whole movies, maybe because that approach strikes artists as particularly impressive from a technical or conceptual standpoint.  In my experience, it can yield visually coherent results if we’re dealing with a relatively short, single-scene film.

Annabelle Serpentine Dance (1895), median average of all frames, auto-contrast.

But I find that an average of a whole feature-length film often produces an image that’s too indistinct and unrecognizable for my taste.  And there’s a practical obstacle besides.  I like to experiment with both mean and median averages.  It’s easy enough to generate a mean average from a fairly long video just by adding all the frames together and dividing the sum by the quantity.  But generating a median average requires holding all the separate values in memory so that the middle value can be identified and selected.  If I’ve run up against the limits of available memory, I’ve sometimes resorted to processing images row by row, or column by column, or color channel by color channel, but even that only gets one so far.  So for both practical and aesthetic reasons, I’ve preferred to generate averages from specific scenes rather than whole movies—see some results below.
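To make the asymmetry concrete, here’s a sketch of the two strategies (the function names are my own, not from any particular library): a mean that streams frame by frame in constant memory, and a median that works around memory limits by processing one horizontal band at a time, much like the row-by-row workaround just described.  The band-wise median assumes the frames can be re-read once per band, e.g. by re-decoding the video.

```python
import numpy as np

def streaming_mean(frames):
    """Mean average in constant memory: just a running sum and a count."""
    total, count = None, 0
    for frame in frames:
        frame = frame.astype(np.float64)
        total = frame if total is None else total + frame
        count += 1
    return (total / count).astype(np.uint8)

def banded_median(frame_source, height, band=32):
    """Median average computed one horizontal band at a time.

    `frame_source` is a callable returning a fresh iterable of frames,
    so only `band` rows of the full frame stack are in memory at once.
    """
    rows = []
    for y0 in range(0, height, band):
        stack = np.stack([f[y0:y0 + band] for f in frame_source()])
        rows.append(np.median(stack, axis=0))
    return np.concatenate(rows).astype(np.uint8)
```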

Fight Club, ending scene, median average of all frames, auto-tone.

The Wizard of Oz, “Follow the Yellow Brick Road” scene, mean average of all frames, cropped, auto-contrast, auto-color.

Citizen Kane, window scene, median average of all frames, auto-contrast.

2001: A Space Odyssey, “Blue Danube” scene, mean average of all frames, auto-tone.

Much as we can align faces, we can also align backgrounds for scenes in which these are held relatively constant, one simple solution being the Auto-Align feature in Photoshop.  This can be particularly advantageous when we’re dealing with jittery early films, since the jitter would otherwise blur our results.
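For readers who’d rather not route everything through Photoshop, the translational component of jitter can be estimated with plain FFT phase correlation.  Here’s a rough numpy-only sketch (`estimate_shift` is my own illustrative name, and it ignores the rotation and scale drift that real film jitter can also involve):

```python
import numpy as np

def estimate_shift(ref, moving):
    """Estimate the integer (dy, dx) shift that aligns `moving` to `ref`.

    Uses FFT phase correlation: the normalized cross-power spectrum of
    two translated images inverse-transforms to a sharp peak at the
    offset between them.
    """
    f_ref = np.fft.fft2(ref.astype(np.float64))
    f_mov = np.fft.fft2(moving.astype(np.float64))
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # keep only phase information
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Offsets past the halfway point wrap around to negative shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)
```

Applying `np.roll(moving, (dy, dx), axis=(0, 1))` then registers the jittered frame against the reference before averaging.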

La sortie de l’usine Lumière à Lyon / Workers Leaving the Lumière Factory (1895), median averages of three different “takes” averaged separately using every tenth frame (auto-aligned).

The Astronomer’s Dream, first scene, mean average of every tenth frame (auto-aligned).

Scene from A Trip to the Moon, median average of every tenth frame (auto-aligned).

Another scene from A Trip to the Moon, median average of every tenth frame (auto-aligned).

The Great Train Robbery title frame, median average of all instances of the frame (auto-aligned)—considerably cleaner and sharper than any comparable “stills” of the same title frame I’ve seen online.

The Great Train Robbery, “At the Railroad Water Tank” scene, median average of every tenth frame (auto-aligned; cropped).

The Great Train Robbery, “Interior of a Dance Hall” scene, mean average of every tenth frame (auto-aligned).

The Great Train Robbery, “A Beautiful Scene in a Valley” with camera panning, median average of every tenth frame (auto-aligned).

We can also layer frames in other ways to achieve other interesting effects.  Here are a couple of examples using Photoshop’s readily available Smart Object Stack Modes.

The Great Train Robbery, “Off to the Mountains” scene, composite of every tenth frame (auto-aligned), Stack Mode = Range.

Scene from A Trip to the Moon, composite of every tenth frame (auto-aligned), Stack Mode = Skewness.
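Adobe describes these Stack Modes as per-pixel statistics computed over the layer stack; assuming those definitions, rough numpy equivalents of Range (maximum minus minimum) and Skewness might look like the following.  The rescaling of skewness into a displayable 0–255 range is my own guess at a reasonable mapping, not a documented formula.

```python
import numpy as np

def stack_range(stack):
    """Per-pixel range across a (frames, H, W[, channels]) stack:
    maximum minus minimum, as in Photoshop's 'Range' stack mode."""
    s = stack.astype(np.float64)
    return (s.max(axis=0) - s.min(axis=0)).astype(np.uint8)

def stack_skewness(stack, eps=1e-9):
    """Per-pixel skewness across the stack, rescaled for display."""
    s = stack.astype(np.float64)
    mu = s.mean(axis=0)
    sd = s.std(axis=0)
    skew = ((s - mu) ** 3).mean(axis=0) / (sd ** 3 + eps)
    # Map skewness values (roughly -2..2 for typical footage) into 0-255.
    return np.clip(skew * 64 + 128, 0, 255).astype(np.uint8)
```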


Time-Slice Images

A Kickstarter project known as MovieDNA, launched by Rob Hansen and Garrick Dartnell back in 2014, used the strategy (if I understand their video explanation correctly) of resizing each frame of a film to a single column—or perhaps even less, with multiple frames averaged into single columns—and lining all the columns up in chronological order from left to right.  Their technique is easy enough to imitate—see my own effort below—and it differs from the other approaches I’ve covered so far in that it preserves and highlights a movie’s time base.

“What Does the Fox Say” with video frames reduced to vertical strips.
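My imitation works roughly as follows (a numpy sketch, with a function name of my own invention): each frame collapses to a single column by averaging across its width, and the columns are then laid side by side in chronological order and resampled to a fixed output width.

```python
import numpy as np

def movie_barcode(frames, width=800):
    """Reduce each frame to one column (its mean color per row), then
    lay the columns out left to right and resample to `width`."""
    cols = [f.astype(np.float64).mean(axis=1) for f in frames]  # each (H, 3)
    barcode = np.stack(cols, axis=1)                            # (H, N, 3)
    # Nearest-neighbor resampling along the time axis.
    idx = np.linspace(0, barcode.shape[1] - 1, width).round().astype(int)
    return barcode[:, idx].astype(np.uint8)
```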

Hansen and Dartnell quickly followed this project up with another called MusicDNA, which fell into the category I’ve referred to elsewhere as waveform “thumbnails”: images that are based on sound recordings but compress the time axis so drastically as to render the results unintelligible as acoustic information.  The MovieDNA approach seems to me to share part of this liability: it preserves the time base of a film at the expense of its horizontal dimension, and the content of a 2D frame collapsed to a 1D line is no longer recognizable.  Still, this approach arguably does a nice job of laying out the structure of a whole film for the eye to take in at a glance, and the color patterns are, I think, more informative than the squished amplitude contours of a waveform thumbnail.  Of course, there’s no reason why the time axis couldn’t run from top to bottom instead, compressing the frames’ vertical axis rather than their horizontal axis.

“What Does the Fox Say” with video frames reduced to horizontal strips.

And the results might be even more visually compelling if we were to overlay the horizontal and vertical variants to produce a plaid-like pattern (although I suppose this tactic might be more topically suited to Brave, Braveheart, or Trainspotting).

“What Does the Fox Say” with video frames reduced to overlaid vertical and horizontal strips.

So I like this approach and think more could be done with it than has been.  But it’s not the only way we can generate images that preserve aspects of a film’s time base, and I’m not sure it’s necessarily the most meaningful way.

Here I’d like to turn for inspiration to a photographic technique sometimes known as time-slice photography, with a couple of notable practitioners being Dan Marker-Moore and Eirik Solheim.  This entails taking a series of images of the same subject from the same perspective over time, dividing them up spatially into regular segments (such as columns), and creating composites in which each successive segment is drawn from a different successive image.  By way of illustration, let’s look at a few examples I created from archived National Park Service air quality webcam images (which I’ve previously harnessed as the raw material for time-lapsed moving pictures).  In the first two cases below, each column represents fifteen minutes of elapsed time, running from left to right.

Joshua Tree National Park, View from Belle Mountain Weather Station, starting January 1, 2020, all times.  Note the “flares” corresponding to sunrises.
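The slicing procedure itself is simple to sketch in numpy (again with an illustrative name of my own): each column of the composite is copied from the source image corresponding to that fraction of the elapsed time.

```python
import numpy as np

def time_slice(frames, axis=1):
    """Composite in which each successive column (axis=1) or row
    (axis=0) is drawn from a successively later frame, with the
    frames spread evenly across the slices."""
    frames = [np.asarray(f) for f in frames]
    out = np.empty_like(frames[0])
    n = frames[0].shape[1] if axis == 1 else frames[0].shape[0]
    for i in range(n):
        # Pick the frame at the matching fraction of elapsed time.
        src = frames[(i * len(frames)) // n]
        if axis == 1:
            out[:, i] = src[:, i]
        else:
            out[i] = src[i]
    return out
```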

The repeating cycle of days and nights produces a conspicuous vertical banding.  Another alternative is to select only frames taken at one particular time of day, such as noon, which has the added benefit of letting us squeeze a lot more time into a single image, so that banding now corresponds to seasons rather than days.

Great Smoky Mountains National Park, View from Look Rock Observation Tower, starting December 23, 2012, images taken at noon over the course of roughly three years.

Slicing can occur in different directions (vertical, horizontal, diagonal, etc.) to different effect.  In the following examples, the camera didn’t stay completely still over the course of multiple years, which resulted in some unfortunate jaggedness along the visible horizon and elsewhere.  I could have aligned the images before time-slicing, but that would have been a lot more time-consuming.

Mammoth Cave National Park, View from Hiking Trail near Earth House, starting November 30, 2010, images taken at noon, vertical slices.

Mammoth Cave National Park, View from Hiking Trail near Earth House, starting January 5, 2013, images taken at noon, horizontal slices.

Mammoth Cave National Park, View from Hiking Trail near Earth House, starting October 1, 2010, images taken at noon, diagonal slices from upper left.

Mammoth Cave National Park, View from Hiking Trail near Earth House, starting October 1, 2010, images taken at noon, diagonal slices from upper right.

But what happens when we apply this same technique to a film or video in which subjects and/or cameras move?  The results can be downright Daliesque.

The Vice Presidential Debate of October 7, 2020.  Assorted vertical and horizontal time-slice images, cropped.

Citizen Kane, excerpt of window scene, horizontal time-slice.

2001: A Space Odyssey, excerpt of “Blue Danube” scene, vertical time-slice.

The Wizard of Oz, excerpt of “Follow the Yellow Brick Road” scene, horizontal time-slice.

The Wizard of Oz, excerpt of “Follow the Yellow Brick Road” scene, vertical time-slice.

PS. (November 7, 2020): There’s a good deal of conceptual overlap between this last technique and creative tricks using the TikTok “time warp scan” filter (see e.g. here or here), except that the above examples use preexisting movies as their source material.  We might even refer to my examples as “found” time warp scans, although I’ll admit they look more like some of the alleged TikTok “fails”!  Apparently these tricks began trending back in September, but I only just learned about them myself and liked them well enough that I couldn’t resist adding a nod to them here.

Conclusion

I doubt any of the techniques I’ve demonstrated above will compete seriously with frame grabs when it comes to illustrating moving pictures in books, journal articles, and the like.  They might fare better in art galleries or on Etsy.  But by drawing together these alternatives, I’d like to encourage anyone who cares about such things to question just how “natural” the frame grab is as a representation of a moving picture.  For comparison, the equivalent for a sound recording would be to display a single sample—one amplitude value representing a split second in time.  I’ll concede that a frame grab is more meaningful and more recognizable than that would be.  But to whatever extent we understand moving pictures and sound recordings alike as things in motion, the excerpting of a single fleeting moment, rendered static, to stand in for either strikes me as potentially deceptive and misleading.  In all their visual weirdness, these alternative techniques might do a more honest job of representing moving pictures as intrinsically dynamic—as things that ultimately can’t be pinned down on paper, but can only be hinted at or approximated.
