Imagine watching a time-lapse video covering more than a century that shows the evolution of a landscape, a magazine cover, trends in fashion, or automobile design. Or picture the same thing being done with the face of a politician or celebrity from youth to old age, or with artistic trends in painting spanning a millennium or longer. And then envision the results being displayed in 3D, maybe letting you change perspective at will—or else collapsed into a single still image that embodies years of history at a glance.
A number of seemingly unrelated image processing techniques have been converging on the set of possibilities I’ve just described, promising to give us vivid new ways of displaying historical imagery in a wide variety of contexts. Some developments in this direction have received individual attention, but I haven’t yet seen anyone try to scope out the future art of time-based averaging as a whole—an art for which these developments are gradually contributing the necessary toolkit. That’s what I’d like to do here.
Time-based averaging combines two techniques that have until now mostly been pursued separately, probably because each poses practical challenges that have taken time and effort to work through. Once you’ve thought of combining them, though, the idea seems obvious, and as both techniques mature, it ought to become increasingly easy to plug one into the other. One is image averaging (or image fusion), in which groups of images are superimposed and, if necessary, rescaled and warped to a common shape or perspective to create visual averages of subjects ranging from human faces to magazine cover layouts. The other is time-lapse imaging, in which visual data covering an extended span of time is either played back faster than real time (for example, one second of display time corresponds to one day’s worth of data) or collapsed into a single image (for example, a single year-long photographic exposure).
Time-based image averaging entails making averages from groups of source images that represent particular windows of time. It can be used to create single long-term time-lapse images, multiple stills representing different moments for comparison, or video animations in which long-term change unfolds dynamically before the viewer’s eyes.
For the past couple of years, I’ve been using commonly available software myself to create time-based stills and animations of source materials such as yearbook photos, images from annual pageant competitions, and pictures taken repeatedly of the same indoor or outdoor scene. I’ll be showing and discussing some of my own results below, as well as results other people have achieved in the same spirit. Meanwhile, there are several new tools on the horizon that I believe will revolutionize this kind of work if and when they become widely accessible, vastly expanding its potential subject matter and flexibility. With those in mind, I’ll also spend some time speculating about future possibilities, which is where things start to get really exciting. In quoting relevant projects, and occasionally reworking them, I’ve aimed to be respectful of others’ intellectual property while still drawing in enough of it to support a review of the past and present state of the art.
For convenience, I’m going to divide my subject into three parts: the time-based averaging of faces, the time-based averaging of scenes, and the time-based averaging of other things. At the same time, I want to stress that these are just three different facets of a single coherent approach.
Part One: FACES
Let’s start by considering images of the human face, first with regard to image averaging and time-lapse imaging separately, and then with these two approaches combined.
The first person to pursue image averaging in a serious way was Francis Galton (1822-1911), who applied it starting in the 1870s specifically to studies of the human face. Galton was into eugenics—he actually coined the term—and most recent accounts of his image-averaging work focus on his efforts to identify visually distinctive traits of criminality, disease, and ethnicity by averaging groups of forward-facing portraits of people with the pertinent characteristics. Below is one typical example, taken from the frontispiece of his 1883 book Inquiries Into Human Faculty and Its Development.
But that wasn’t all. Galton also applied image averaging to historical subject matter and to faces in profile, as illustrated below by three plates from an article he published about “generic images” in 1879.
Of course, Galton did all his image averaging in the analog domain, which was the only option available to him back in the nineteenth century. Sometimes he would line up the eyes of his source portraits, poke a pair of “register marks” on either side of them, and then hang them on pins by these marks while briefly exposing a photosensitive plate to each one in turn. The source images “must be similar in attitude and size,” he wrote, “but no exactness is necessary in either of these respects.”
The sort of camera setup Galton used for his experiments is no longer necessary. Since the pioneering work of Nancy Burson in the 1980s, face averaging has increasingly been done with computers, and a number of software applications are now available commercially to help people carry it out, making it the easiest and most convenient kind of averaging for the average person to attempt.
One advantage of the digital turn is that we can warp a group of source images to a common perspective or shape for averaging even if they don’t start out particularly “similar in attitude,” as Galton put it. Below I’ve shown an example of how this is done in Abrosoft FaceMixer by identifying a face that happens to be tilted upward at an idiosyncratic angle and then adjusting individual warp points, based on a built-in template, to bring it into alignment with the average of all the faces in a given project.
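FaceMixer’s own alignment internals aren’t documented here, so the following is only a rough illustration of the first step, normalizing position, size, and in-plane tilt. It is a minimal Python/NumPy sketch assuming every face has already been marked with the same landmark points in the same order; it fits a least-squares similarity transform (Umeyama’s method) that maps one face’s points onto a reference such as the project average. The name `similarity_align` is hypothetical, and an upward or downward tilt like the one described above still needs the per-point warp that FaceMixer applies afterward, which is not shown.

```python
import numpy as np

def similarity_align(src, dst):
    """Map landmark points `src` (N x 2) onto `dst` (N x 2) with the
    least-squares scale + rotation + translation (Umeyama's method)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_src, dst - mu_dst          # centre both point sets
    cov = b.T @ a / len(src)                   # 2 x 2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so the result is a rotation, not a mirror image
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / (a ** 2).sum(axis=1).mean()
    t = mu_dst - scale * R @ mu_src
    return scale * src @ R.T + t
```

Averaging the aligned point sets of all faces gives a new reference shape; repeating align-then-average a few times is the usual generalized Procrustes approach to finding a common template.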
As a result, we can take a heterogeneous group of face images, such as portraits of winners of the Miss America competition—shown below is a sample from a larger data set I collected through online searches—
—and produce compelling averages from them. I created the median average below from 180 source photos covering all Miss America competition winners from 1921-2017 (ordinarily two source images per person, but sometimes modified to fit special historical circumstances), using the warping and compositing techniques described here.
This is arguably a time-lapse image of sorts, since it collapses nearly a century’s worth of photographs into a single view, even if it’s not the sort of thing we might ordinarily think of as a time-lapse photograph. Much the same thing could be said of the average shown below, which I generated from eighty self-portraits by Rembrandt van Rijn dating from the 1620s through the 1660s.
Francis Galton anticipated this approach with his “Likenesses of Napoleon I. taken at different periods, and the composite of them.” It gives us a single idealized version of a subject’s appearance, whether that subject is an individual or a category of person, revealing only those details that held constant over a long span of time. In terms of technique, though, this is no different from using face averaging to create generic face images based on other criteria, such as nationality or attractiveness. Meanwhile, averageness is itself associated with greater attractiveness, as Wikipedia notes: “An averaged face is not unremarkable, but is, in fact, quite good looking.”
A better-known branch of time-lapse face imaging is the “photo-a-day” or “picture-a-day” project where people take photos of themselves every day for months or years and then display the results as videos or grids of stills. One person who has done a lot to popularize this technique is Noah Kalina, first with a six-year sequence and then more recently with an extended twelve-and-a-half-year sequence (see the video here). Other examples include a sixteen-year sequence here and an eight-year sequence here that cleverly varies its background from day to day to simulate movement through space. Check them out if you haven’t. And then if you’re inspired to make one of your own, a Watch Me Change app is now available to help people create photo-a-day projects by giving timely reminders and presenting the previous day’s photo as a template for alignment. The photo-a-day technique has also been artfully simulated, most notably for a Croatian domestic abuse spot.
Sequences like these “work” as time-lapse videos, or as hypnotically repetitive grids of still images, because the face is oriented similarly from image to image: the eyes, nose, mouth, and so forth line up roughly with each other from frame to frame. To sum this up conveniently in a single word, let’s say that the daily images of the face are congruent. In cases where they’re not congruent, the results don’t have the same impact, however impressive they might otherwise be. A prime example is the eighteen-year sequence here, where a video has simply been strung together from differently-posed daily photos.
But animations of even the most congruent picture-a-day projects, in their raw form, inevitably have conspicuous flicker or jerkiness. This may actually enhance their appeal, since seeing something like a graduation cap or Halloween mask pop into view for a split second and then vanish helps drive home the cool way in which the sequences were captured. However, that sort of thing also tends to distract the viewer from more interesting long-term processes of change. And by applying face-averaging techniques to time-based sequences such as these, we can eliminate the frenetic day-to-day jitter and draw out the more gradual evolutions it masks.
One of the best-known and best-executed picture-a-day projects was carried out for five and a half years starting in 2006 by someone known to the Internet as Clickflashwhirr (much of her material has since been taken offline, but see here for one surviving video of her results). In 2011, while the project was still underway, Tiemen Rapati famously averaged five hundred of the photos into a single composite, explaining: “Code’s really basic, I’m simply counting the individual rgb values for each pixel and for each portrait, and dividing those values by the number of portraits.”
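Rapati’s code isn’t reproduced here, but the approach he describes (sum each pixel’s RGB values across all portraits, then divide by the number of portraits) is a pixel-wise mean. Below is a minimal NumPy sketch of it under that reading, alongside the pixel-wise median used for several of the composites in this post. The function names are hypothetical, and all images are assumed to be already aligned and the same size.

```python
import numpy as np

def mean_composite(images):
    """Add up each pixel's RGB values across all portraits and divide by
    the number of portraits."""
    stack = np.stack([np.asarray(img, dtype=np.float64) for img in images])
    return stack.sum(axis=0) / len(stack)

def median_composite(images):
    """Per-pixel, per-channel median: less swayed by one-off outliers."""
    stack = np.stack([np.asarray(img, dtype=np.float64) for img in images])
    return np.median(stack, axis=0)

def to_uint8(composite):
    """Round and clip a float composite back to an 8-bit image."""
    return np.clip(np.rint(composite), 0, 255).astype(np.uint8)
```

With Pillow installed, `np.asarray(Image.open(path).convert("RGB"))` loads each portrait and `Image.fromarray(to_uint8(result))` turns the composite back into an image.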
This composite is a time-lapse image in the same sense as the composites I showed earlier of Miss America and Rembrandt. But we can also average sequential subsets from a photo-a-day project to take advantage of its time dimension more fully. In September 2014, for example, Than Tibbetts posted a grid of twelve averages generated from Noah Kalina’s photo-a-day sequence, each representing one year:
In effect, this reduces all the daily images to twelve, each representing the average of 365 or 366 of them, factoring out day-to-day “noise” to expose only those features that remained consistent over a long time span. (J. K. Keller did the same thing with his own photo-a-day project, although if you want to see it these days you need to resort to the Wayback Machine.)
The twelve frames Than Tibbetts created from Noah Kalina’s photo-a-day sequence could also be displayed as a very short video of their own. But if we’re aiming at a video rather than a grid of still images, it would make sense to go about the averaging itself rather differently. I assume Tibbetts has averaged Kalina’s images 1-365, 366-730, 731-1095, and so on, with no overlap of data between the results, but there’s no reason we couldn’t generate intermediate frames for images 2-366, 3-367, 4-368, 5-369, et cetera, with each one being a legitimate year-long average in its own right and the overall sequence showing a process of continuous gradual change. To illustrate the concept graphically, here’s a diagram showing how a seven-item selection window could progress through a data set of chronologically ordered images as the basis for averaging successive frames for an animation:
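The difference between the two schemes comes down to how far the selection window moves between output frames. Here is a minimal sketch (hypothetical function name, 0-based indices where the text counts from 1): with `step` equal to the window size you get non-overlapping blocks like 1-365, 366-730, and with `step=1` you get the overlapping windows 1-365, 2-366, 3-367, and so on.

```python
def sliding_windows(n_images, size, step=1):
    """Index ranges for windows of `size` chronologically ordered images,
    advancing `step` images per output frame. step == size gives
    non-overlapping blocks; a final partial window is dropped."""
    return [range(start, start + size)
            for start in range(0, n_images - size + 1, step)]
```

For the seven-item window in the diagram, `sliding_windows(10, 7)` yields four windows, the second covering images 1 through 7 and the last covering 3 through 9.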
Let’s take a look at how this works out in practice. I downloaded a five-and-a-half-year video of the Clickflashwhirr sequence here, announced as covering the period from 26 September 2006 through 29 May 2012; exported the source photos from it as JPGs; generated median averages from them in overlapping sequential groups of thirty; and then created mean averages from the result in groups of three to reduce flicker. Here’s what I ended up with:
And here’s the same thing done in groups of one hundred, advancing three source images per output frame, with the result again re-averaged by threes.
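The two-stage process described above (medians over a sliding window, then a mean of consecutive results) can be sketched in NumPy as follows. This is not the Photoshop script used for the animations here, just an illustration of the same idea; the function names are hypothetical, and the second stage is assumed to advance one frame at a time. Everything is held in memory, so thousands of full-resolution frames would need downscaling or window-by-window processing.

```python
import numpy as np

def sliding_reduce(frames, size, step=1, reducer=np.median):
    """Apply `reducer` pixel-wise to each window of `size` frames,
    moving the window forward `step` frames per output frame."""
    stack = np.asarray(frames, dtype=np.float64)
    return np.stack([reducer(stack[start:start + size], axis=0)
                     for start in range(0, len(stack) - size + 1, step)])

def median_then_mean(frames, size, step, smooth=3):
    """Sliding-window medians, then a sliding mean over `smooth`
    consecutive medians (assumed to advance one frame) to damp flicker."""
    medians = sliding_reduce(frames, size, step, np.median)
    return sliding_reduce(medians, smooth, 1, np.mean)
```

Under these assumptions the first animation corresponds roughly to `median_then_mean(frames, 30, 1)` and the second to `median_then_mean(frames, 100, 3)`.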
Instead of the frenetic pace of the raw photo-a-day sequence, the averaged animation presents a serene view of long-term change over time. And each frame simultaneously retains a lot more specific detail than Tiemen Rapati’s single composite image.
I was able to average the Clickflashwhirr source images “as is” because they were already remarkably congruent. Noah Kalina’s pictures don’t line up quite as consistently with each other, which is why the averages Than Tibbetts created from them are sometimes so blurry. But let’s see what his pictures look like processed into an animation using my method, just for good measure. I exported the frames of the twelve-and-a-half-year video using VLC and created mean averages from them in groups of two hundred (source images came out repeated inconsistently one, two, or three times each), advancing ten source frames per output frame. You can download the full result here (39.2 MB)—and it’s well worth seeing in that form—but here’s a pared-down, shorter version with the results re-averaged by fives, advancing three source frames per output frame:
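Because frames exported from a video can repeat one source photo one, two, or three times, some photos end up weighted more heavily than others in each average. One possible way to even this out (not something done for the animation above) is to drop frames that barely differ from the last frame kept. In this minimal sketch, the name `drop_repeated_frames` and the tolerance value are hypothetical; a tolerance is needed because video compression makes repeated frames nearly, but not exactly, identical.

```python
import numpy as np

def drop_repeated_frames(frames, tol=1.0):
    """Keep a frame only if its mean absolute pixel difference (0-255 scale)
    from the last kept frame exceeds `tol`."""
    kept = []
    for frame in frames:
        f = np.asarray(frame, dtype=np.float64)
        if not kept or np.abs(f - kept[-1]).mean() > tol:
            kept.append(f)
    return kept
```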
The image fades in and out of focus depending on how consistent the position of the face was in the frame during particular spans of time. It’s a pretty neat effect, as though we’re seeing someone peering at us from the other side of some very uncanny kind of webcam linkage. Makers of sci-fi and horror films, take note!
But we don’t necessarily have to content ourselves with such results. As I showed earlier, we can realign and warp incongruent images of faces to a common shape and perspective when creating averages from them, and if we can do this when we’re creating still averages, there’s no reason why we can’t also do it when creating animated averages. With a bit of work, we could align Noah Kalina’s pictures so that his animation would consistently be as clear and stable as Clickflashwhirr’s. More importantly, we could create similar time-based animations from source images that are even less congruent than Kalina’s, opening the technique up to a far wider range of source materials. We could make a video like the ones shown above of the face of anybody who has been photographed or portrayed regularly enough—or of any category of person, for that matter.
Now, the idea of mining miscellaneous still images of faces found in the wild as the raw material for motion pictures is nothing new in itself. Consider the work that led in 2010 to the introduction of Face Movies in Picasa, where the images of a particular person’s face in a sequence of photos are all automatically aligned with each other and each image is then cross-faded into the next to create a “moving portrait.” But that strategy more closely resembles the practice of morphing between successive single images in a sequence—as seen in 500 Years of Female Portraits in Art by Philip Scott Johnson—than it does the face-averaging strategy I’ve just illustrated. Sometimes people have also morphed or cross-faded between individual non-overlapping averages, as Larry Chait did in his video The Changing Face of Crime (2013), of which he writes:
This animation shows the way the “average” face of the FBI’s “10 Most Wanted” has changed over time. I made it by morphing together six composite images, one from each decade from the 1950’s through the 2000’s. Each composite image was made by averaging together the mugshots of all of the “10 Most Wanted” from that decade.
But morphing between individual images, whether “originals” or averages, is a very different thing from averaging sources with a sliding time window. The gap is as wide, I’d argue, as the one separating “time studies” that transition between a scene photographed at two or more widely separated moments from “true time lapse” based on images captured at intervals frequent enough to document the actual course of change.
Meanwhile, The Changing Face of Crime draws us into the territory of longer-term comparative time-based face averages, displaying multiple decades’ worth of change, which has a history of its own. The earliest experiments in this direction of which I’m aware are Nancy Burson’s First Beauty Composite and Second Beauty Composite (1982), published in Composites: Computer-Generated Portraits (1986):
The first average combines images of Bette Davis, Audrey Hepburn, Grace Kelly, Sophia Loren, and Marilyn Monroe, while the second combines images of Jane Fonda, Jacqueline Bisset, Diane Keaton, Brooke Shields, and Meryl Streep. Burson wrote: “This was for me about the passage of time in relation to the differences and changes in style. Note, for instance, the arched eyebrows and heavily made-up mouth of the fifties beauty and the more ‘generic’ look of the woman of the eighties” (Composites, p. 94). Burson may have been the first to attempt this sort of thing—it’s hard to know for sure—but others have since followed in her footsteps. In 2008, for example, Dienekes Pontikos posted a comparable pair of averages based on seven faces from the 1940s and eight from the 2000s, asking readers which they thought was “more attractive.” More recently, PearlsOnly generated ten averages of Hollywood actresses grouped by decade based on a total data set of over three hundred photos (see coverage at PetaPixel).
Similar work has been done with high school yearbook photos, ranging from a pair of averages twenty-one years apart in Jason Salavon’s The Class of 1988 & The Class of 1967 (1998) to a research project by Shiry Ginosar, Kate Rakelly, Sarah Sachs, Brian Yin, and Alexei A. Efros called “A Century of Portraits” (2015) based on 37,921 images, which they show in their paper averaged by decade, like this:
Aymann Ismail of Slate also prepared a video from the latter project with cross-fades between single-year averages, much like Larry Chait’s The Changing Face of Crime.
The results I’ve cited are all interesting and impressive. However, if we average comparable source materials as animations with sliding time windows, I think we can ratchet things up a notch or two. Let me illustrate how this works with one of my own projects involving the faces of winners of the Miss America competition. It’s easy enough to generate still averages from these by decade, much as we saw done above with Hollywood actresses and yearbook photos (bear in mind that the 2010s are still ongoing, and that there wasn’t a pageant for every year during the 1920s and 1930s):
But dividing things up by decade in this way is really pretty arbitrary. We could question both the assumption that these ten-year groupings are meaningful (among other things, it’s been argued that the “1960s” as a distinctive historical era really began in 1963 and ended in 1973) and the belief that cultural developments are best bracketed off into ten-year periods in general. One of the advantages of averaging faces with a sliding time window is that it doesn’t require us to impose any arbitrary periodization; all we need to do is pick a window size, recognizing that the narrower the window, the more sensitive the result will be to short-term fluctuations. The animation below shows the winners of the Miss America competition averaged in successive groups of twenty (that’s twenty competitions, and not twenty years):
This is an updated and improved version of an animation I first published here and here back in mid-2014, and it was created with the Photoshop CS5 script I discuss here. I’m currently working on a larger-scale project along similar lines based on pictures drawn from fashion and movie magazines, called The Fashionable Face, with a working data set currently totaling 4,649 prepared source images.
Of course, we can also apply the same technique to photographs of a single person. Below is an animation I generated from fifty-nine photographs of Abraham Lincoln—basically the content of the Wikipedia page “List of photographs of Abraham Lincoln,” minus the profiles. The alignment isn’t perfect and could certainly be improved upon, but even so, this does a good job of illustrating the technique I have in mind.
The process I’ve outlined lends new significance to discussions of who the most frequently photographed people of the nineteenth century were (see for example the consideration of Frederick Douglass here), and it could obviously be applied as well to the Instagram feeds of contemporary celebrities. Nor is it limited to photographs. Here’s an animation of the face of Rembrandt van Rijn based on eighty of his self-portraits, averaged in overlapping groups of twenty:
The tools for carrying out projects like these aren’t as convenient to use right now as they could be—I’ve suggested some desirable tweaks to Abrosoft FaceMixer here—but at least they’re available.
I’m sure it would be possible to automate much of this process in the future by auto-detecting faces in a corpus of photographs, applying cutting-edge face-recognition algorithms to them, auto-aligning all results identified as representing a particular person (or type of person), and then auto-averaging them with a sliding time window based on their digital time stamps. This work would center primarily on programming.
But I also foresee a key role here for expert practitioners who could achieve feats with the technique beyond what it can accomplish on autopilot. Rather than picking the low-hanging fruit, these users would go for the high-hanging stuff, scouring archives for relevant materials, assessing them for authenticity and date, and factoring in any margins of uncertainty. Consider the skill set someone would need to locate, obtain, and date every known likeness of (say) Queen Victoria, and you’ll have a sense of the kind of research effort that would be involved here. And that wouldn’t be the end of it. Expert practitioners would also need to understand the capabilities and limitations of the technique itself well enough to judge whether particular images were suitable for it and to choose among various technical options for averaging them, balancing aesthetic judgments against objective accuracy.
The time base for projects like these can be constant-data, based on a set number of source images, or constant-duration, based on a set time window. If images represent strictly set intervals, such as one photo per day, there won’t be any difference between the two. But Rembrandt, for example, clearly didn’t create his self-portraits at such regular intervals. I chose to tie the pace of my own Rembrandt animation to a particular rate of data display (twenty source images averaged at a time, advancing one source image per frame) rather than to the “original” flow of time, but I could instead have averaged the self-portraits based strictly on overlapping time periods (e.g., first everything from 1625-30, then 1626-31, then 1627-32, and so on), regardless of how many examples I had from each period. Or we might tie a time base to some other variable, such as a set number of Miss America competitions, since the pageant occasionally skipped one or more years.
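The two kinds of time base differ only in how each window is filled. Below is a minimal sketch with hypothetical function names, assuming each source is paired with a date and that dates are known only to the year (as is often the case with paintings). The constant-duration version uses half-open year spans, [start, start + span), which is one reading of ranges like “1625-30.”

```python
from datetime import date

def constant_data_windows(dated, size, step=1):
    """`dated`: (date, image) pairs sorted by date. Every window holds
    exactly `size` images, however much time they span."""
    return [dated[i:i + size] for i in range(0, len(dated) - size + 1, step)]

def constant_duration_windows(dated, first_year, last_year, span):
    """One window per start year, holding whatever is dated in
    [start, start + span) years, however many images that is."""
    return [[pair for pair in dated if start <= pair[0].year < start + span]
            for start in range(first_year, last_year - span + 2)]
```

With sources dated 1625, 1626, 1626, 1629, and 1631, three-image constant-data windows always hold three images, while five-year constant-duration windows starting in 1625, 1626, and 1627 hold four, three, and two.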
My Miss America, Lincoln, and Rembrandt animations above owe their stability in part to the fact that I’ve set them up to display time-based variations in features while holding the positions of the warp points—which control shape—constant at their averages for the entire project. But we could vary the positions of the warp points based on a sliding time window too, whether it’s the same window we use for the features or a different one. This isn’t as convenient to do given the tools I have to work with right now, but it can certainly be done. Below is an alternative animation of Miss America winners, again averaged in groups of twenty, but this time with time-variant features and warp points.
We could also hold the features static while varying the warp points to display only the changes over time in shape, angle, and expression. And there are other things we could do with the warp points as well. A lack of congruity between source images may be a liability when it comes to creating averages from a static perspective, but it can be advantageous if we want to create averages from a changing perspective because it gives us information about more parts of the face than would ordinarily be seen at one time. Below is an animated average of twenty paintings of George Washington (see here for more about the project to which these belong) which I created by grouping the sources by angle and then continuously varying how much each group contributes to the output shape:
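The shape side of this can be expressed as a weighted average of per-group mean landmark shapes, with the weights changing from frame to frame. The sketch below assumes the groups are ordered by angle (say, left-facing to right-facing) and uses a hypothetical schedule that glides from the first group to the last, blending only the two groups nearest the current position. The function names are illustrative, and warping each group’s average texture to the blended shape is not shown.

```python
import numpy as np

def blended_shape(group_shapes, weights):
    """Weighted average of per-group mean landmark shapes (K groups, N x 2 each)."""
    shapes = np.asarray(group_shapes, dtype=float)     # (K, N, 2)
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), shapes, axes=1)   # (N, 2)

def sweep_weights(k_groups, n_frames):
    """Hypothetical schedule: move from group 0 to group K-1 over n_frames
    (n_frames >= 2), linearly blending the two nearest groups."""
    weights = []
    for f in range(n_frames):
        pos = f * (k_groups - 1) / (n_frames - 1)
        weights.append(np.clip(1.0 - np.abs(np.arange(k_groups) - pos), 0.0, None))
    return weights
```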
It would be possible to pair this kind of motion in space with a display of change over time, although that would be prohibitively time-consuming to do with the tools I presently have. We can also generate pairs of averages that vary in horizontal perspective to create stereoviews (get out your red-cyan anaglyph-viewing glasses if you’ve got ’em).
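Once there is a left-eye and a right-eye average, combining them for red-cyan glasses can be as simple as taking the red channel from the left view and the green and blue channels from the right. This is the most basic color-anaglyph recipe; more refined recipes exist that reduce ghosting. The function name is hypothetical.

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Red channel from the left-eye image, green and blue from the
    right-eye image, for red (left) / cyan (right) glasses."""
    left = np.asarray(left_rgb)
    right = np.asarray(right_rgb)
    if left.shape != right.shape:
        raise ValueError("left and right views must be the same size")
    out = right.copy()
    out[..., 0] = left[..., 0]
    return out
```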
The multiple-shape approach could benefit further from recent work on 3D face modeling from images “in the wild,” described here, as well as the development of “collection flow,” described here, which is intended to harness gradations of expression in a corpus of photographs of the same person. Faces could be made to move and change expression in “accurate” ways while simultaneously changing over time as in the examples shown above; and of course the same thing could be done less “accurately” but more easily by animating single averages (e.g., with Motion Portrait). The evolving face of Miss America or Rembrandt could be tied to motion capture and even made to speak onscreen with moving lips (“Hello! I am an average….”) rather than holding still in place. Would that be cool or what?
Part Two: SCENES
Long-term time-lapse photography is more often applied to scenes than it is to people’s faces, since scenes stay conveniently wherever they are while people tend to move around (a “photo a day” project like Noah Kalina’s or Clickflashwhirr’s requires subjects to re-pose in the same place and position relative to a camera time and time again while presumably going about their daily business in between shots). Like facial time-lapse, long-term scenic time-lapse has two branches, one aimed at securing still images embodying longer-than-usual spans of time, the other aimed at securing motion pictures with an accelerated time base. It would be hard to draw precise boundaries between “ordinary” photography, time lapse, long-term time lapse, and super-long-term time lapse. After all, every photograph has a duration, however brief; and in the earliest history of photography all exposure times were longer by default than much of what is now regarded as time-lapse photography. But some time-lapse photography clearly stands out today for its unusually long durations.
On the still photography front, some long-term time-lapse images have been made using traditional analog photographic methods, including this six-month exposure by Matt Bigwood captured using a beer-can pinhole camera:
One frequent feature of such images—also visible in some work by Michael Wesely spanning up to three years—is the rainbow-like band of parallel streaks we see traced across the sky by the sun. Sun-trails attest to the fact that these images were created through continuous exposure, and they might be viewed as a kind of proof of authenticity. On the other hand, such trails aren’t typical of the long-term appearance of the scenes they depict, so they could be considered undesirable artifacts depending on what our goals are. Meanwhile, this approach poses some daunting technical obstacles; consider, for instance, the measures you’d need to take to avoid overexposure over a period of three whole years, or to fix a camera safely and securely in place overlooking a scene for that entire time.
A different method for creating still images of scenes over longer spans of time is to sample them regularly by taking ordinary-exposure photographs and then to stack those photographs and average them. For practical reasons, this seems to be how virtually all digital time-lapse photography is being carried out these days over durations above thirty seconds or so. Francisco J. Estrada writes in connection with one particular averaging algorithm (in Computer Vision, 2012):
Until now, HDR [high dynamic range] and image fusion have been concerned with the blending of multiply-exposed shots to increase the dynamic range of the scene. Here we propose that image fusion can also be used to blend images taken over a (possibly very long) interval of time, blending visual information from a dynamically changing scene while preserving detail and interesting structure…. Time-lapse fusion can, in this way, provide photographers with a tool to expand their ability to create depictions of the world that are beyond physical or practical limitations.
Estrada refers to “possibly very long” intervals, which this technique can certainly accommodate, although his own ambitions seem limited to a couple of hours at most, judging from the scenic examples he shows. If a scene is sampled frequently enough, time-lapse image fusion can create effects similar to sun-trails, one especially popular genre being the “star trail”:
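How a stack is combined determines whether moving lights leave trails. The sketch below is not Estrada’s fusion algorithm; it contrasts a plain mean with the per-pixel maximum (“lighten”) blend commonly used for star trails. With a mean, a star that occupies a given pixel in only one frame out of many is diluted to near-invisibility; with a maximum, every position the star passed through keeps its full brightness. The function name and mode strings are hypothetical.

```python
import numpy as np

def stack_frames(frames, mode="mean"):
    """Combine co-registered frames pixel by pixel.
    'mean'    -> average view; moving bright points are diluted.
    'lighten' -> each pixel's brightest value, so a moving star or sun
                 leaves a trail if sampled often enough."""
    stack = np.asarray(frames, dtype=np.float64)
    if mode == "mean":
        return stack.mean(axis=0)
    if mode == "lighten":
        return stack.max(axis=0)
    raise ValueError(f"unknown mode: {mode!r}")
```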
Or if we instead capture our images at longer intervals, such as once a day, the same technique can yield a typical or average view of a scene as it appeared from day to day, but not on any one particular day.
I’ve been carrying out some photo-a-day projects of my own, but instead of photographing myself à la Noah Kalina or Clickflashwhirr, I’ve been taking daily pictures of indoor and outdoor scenes (or twice-daily, or several times a week, depending on the scene and how often I can conveniently visit it). My trick is to find vantage points where I can put my camera back into roughly the same spot time and time again, such as the ninety-degree corner of a railing or bench. Below is a screenshot of thumbnails for part of one of my sequences, taken of a scene along a path on the Indiana University Bloomington campus from the top of a post:
Granted, I can’t get my camera back into exactly the same position from shot to shot, but I can usually auto-align the photos pretty well in Photoshop (which wouldn’t be necessary, of course, if I kept my camera fixed in place the whole time as most time-lapse artists do). With auto-alignment, my images lend themselves well to time-lapse image fusion on a scale of months or years. Below are the results I got from averaging 258 photos “as is” (left) and then after auto-aligning them (right):
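Photoshop’s auto-alignment also corrects rotation and perspective, but much of the drift in hand-placed shots is a simple offset. As an illustration of that simpler case only, here is a minimal phase-correlation sketch for grayscale images that estimates the whole-pixel shift between a reference shot and a later one. The function name is hypothetical; `np.roll` wraps pixels around the edges, so in practice the borders would be cropped after shifting.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Whole-pixel (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1))
    best matches `reference`, found by phase correlation."""
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12            # keep phase only
    corr = np.fft.ifft2(cross).real           # sharp peak at the offset
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:                           # map wrapped peaks to negative shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```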
In this case, I took the source photographs more or less regularly over the course of fifteen months—from July 27, 2015 through October 27, 2016—so what you see above is in effect a fifteen-month time-lapse photograph. Meanwhile, the same source images could naturally also be presented as an animation. An ordinary time-lapse video would just show them all in rapid succession, like this:
Unfortunately, this approach carries some conspicuous distractions with it. Day-to-day changes in light conditions produce an almost strobe-like flicker, while trees and flowers appear to twitch and vibrate, as though someone were violently shaking the change of the seasons out of them. And these characteristics are typical of super-long-term time lapse video in general. Although they’re not technically inaccurate as representations of what actually happened, they’re still somewhat jarring to watch, which may explain why so few people seem to be attempting time-lapse video on this scale.
But we can apply the technique of averaging source images with a sliding selection window here, just as we did earlier with images of faces—and to similar effect. Here’s the same sequence of images averaged first in groups of ten, and then with the result re-averaged by threes (see here for more on the technique itself, together with the Photoshop CS5 script I used):
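The sliding-window idea is simple enough to sketch outside Photoshop. Below is a minimal NumPy version. It is not the CS5 script itself, and it assumes every frame is an equally sized array that has already been aligned. `sliding_average` and `smooth_timelapse` are hypothetical helper names.

```python
import numpy as np

def sliding_average(frames, window, step=1, reducer=np.mean):
    """Average each run of `window` consecutive frames, advancing `step`
    frames between outputs. Returns a stacked array of averaged frames."""
    stack = np.asarray(frames, dtype=float)
    starts = range(0, len(stack) - window + 1, step)
    return np.stack([reducer(stack[s:s + window], axis=0) for s in starts])

def smooth_timelapse(frames):
    """Stage 1: groups of ten. Stage 2: re-average the results by threes."""
    return sliding_average(sliding_average(frames, 10), 3)
```

Passing `reducer=np.median` produces median averages instead. Changing `step` lets the window advance by more than one source image per frame, for example groups of thirty advancing seven. The function expects at least `window` frames.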
Depending on how we group our source images, this technique can reveal or conceal different cycles of change. Grouping images from multiple years based on calendar day (all images from January 1st, then January 2nd, then January 3rd, etc.) would highlight the cyclical pattern of the seasons, for instance by showing the average rhythm according to which leaves change color, fall, and regrow. On the other hand, grouping the same images with a year-long sliding window (e.g., January 1, 2014 through December 31, 2014; then January 2, 2014 through January 1, 2015; then January 3, 2014 through January 2, 2015, etc.) would factor out seasonal fluctuations and expose only longer-term changes, such as saplings growing into taller trees.
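The two groupings differ only in how dates are mapped to groups. Here’s a Python sketch that assumes each image comes paired with a `datetime.date`. The function names are hypothetical. The year-long window is approximated as 365 consecutive days, so a window that includes February 29 ends one calendar day early.

```python
from collections import defaultdict
from datetime import date, timedelta

def group_by_calendar_day(dated_items):
    """Seasonal view: pool items from every year that share a month and day."""
    groups = defaultdict(list)
    for d, item in dated_items:
        groups[(d.month, d.day)].append(item)
    return dict(sorted(groups.items()))

def year_long_windows(dated_items, first_start, last_start, length_days=365):
    """Long-term view: a window of `length_days` days whose start advances one
    day at a time. Returns (start, end, items) tuples."""
    items = sorted(dated_items, key=lambda pair: pair[0])
    windows = []
    start = first_start
    while start <= last_start:
        end = start + timedelta(days=length_days - 1)
        windows.append((start, end, [x for d, x in items if start <= d <= end]))
        start += timedelta(days=1)
    return windows
```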
Some other obvious sources of material for this kind of treatment would be webcams and security cameras, although the examples I’ve investigated so far tend to be less stable over the long term than you might expect. Take the National Park Service air quality webcam at Mammoth Cave National Park trained on a “View of Green River Valley, Looking North-Northwest.” Current images from this webcam are displayed and periodically refreshed online, but one image per day is also archived here (new images are still being added daily, but my cut-off date for this project was August 16, 2016). The archived photographs date back to January 1, 2002, and thus cover a span of nearly fifteen years. There are some gaps in coverage from periods when the webcam was down, most notably June 1 to July 20, 2006; October 22 to December 7, 2009; April 2 to June 10, 2010; and October 1 to 16, 2013. That still leaves 4,570 images, representing 85.5% of possible days. Below is an animation I created from the archived webcam images taken “as is,” with no realignment, averaged in groups of thirty, advancing seven images per frame, and with the result re-averaged by threes to reduce flicker.
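The coverage figure is easy to double-check with Python’s standard `datetime` module. The calculation below counts both the first and the last day of the archive as possible days.

```python
from datetime import date

first, last = date(2002, 1, 1), date(2016, 8, 16)
possible_days = (last - first).days + 1   # inclusive of both endpoints
archived = 4570
coverage = archived / possible_days
years = possible_days / 365.2425          # mean Gregorian year length
print(possible_days, f"{coverage:.1%}", f"{years:.1f}")  # 5342 85.5% 14.6
```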
And here’s a single still of all 4,570 webcam images auto-aligned with each other and averaged at once, giving us in effect a fourteen-and-a-half-year time-lapse photograph:
Fourteen and a half years is nothing to sneeze at, but there’s no reason we shouldn’t set our sights on time-lapse stills and videos of even longer duration—say, something comparable to my Miss America examples, which cover almost a century. It might be challenging to find images in the wild of the same scenes created from similarly congruent viewpoints over such a long time span. In fact, no suitable sets of images may even exist. But that isn’t necessarily an obstacle.
Some artists simply haven’t been concerned with congruence, as we see in Jason Salavon’s The Loop, Chicago, 1848-2007 (2007), a composite of images from different periods overlaid apparently without any attempt to line up positions or buildings:
It’s easy to imagine this same approach being extended into a time-based animation, with superimposed images from the 1840s gradually giving way to others from the 1850s, and then the 1860s, and so on up to the present, yielding a generalized visual impression of transition and growth. In fact, this would be trivially easy to do from a technical standpoint, although there would be a lot of room for artistic discretion.
Meanwhile, other artists have attempted a kind of rough congruence, as illustrated by Corinne Vionnet’s project “Photo Opportunities,” based on tourist photos of well-known landmarks. “I chose a single segment of the landscape that I found important as a meeting point to line the images up,” she explains. Below is one specimen of her work to show what she’s been able to accomplish by manually lining up certain parts of carefully chosen source images while leaving others incongruent:
Again, it’s easy to imagine a time-based animation based on Vionnet’s technique, where the World Trade Center fades into view in the early 1970s and then abruptly away after September 2001.
But there are more sophisticated tools out there for warping multiple images of a scene to a common perspective, based on the Structure from Motion technique. One popular implementation of this technique is found in Microsoft Photosynth, a piece of free software designed to take any group of photographs you upload and attempt to derive from them a 3D model that can then be navigated with a cursor at will. I’m not sure Photosynth itself could handle historical imagery well in its currently available form; the assumption seems to be that you’ll take a bunch of photographs yourself specifically with 3D analysis in mind, and the developers provide tips on how to go about doing this.
But other researchers have already succeeded in applying the Structure from Motion technique to time-based projects. In particular, Ricardo Martin-Brualla, David Gallup, and Steven M. Seitz have pioneered what they call time-lapse mining: creating seamless time-lapse videos from large numbers of time-stamped, geotagged Internet photos of popular landmarks. In their first paper on the subject, “Time-lapse Mining from Internet Photos” (SIGGRAPH 2015), they describe their main contribution as
an approach for producing extremely stable videos, in which viewpoint and transient appearance changes are almost imperceptible, allowing the viewer to focus on the more salient, longer time scale scene changes. We employ structure-from-motion and stereo algorithms to compensate for viewpoint variations, and a simple but effective new temporal filtering approach to stabilize appearance.
Their full accompanying video is well worth watching, but here’s a representative sample to illustrate the quality of their results:
As for what’s next, the researchers write: “Future work includes enabling interactive visualizations of these photorealistic 3D time-lapses.” So before long, time-lapse viewers might be able to change perspectives and positions at will, and maybe control the time base dynamically as well. (Judging from the movement of the timeline cursor, the group’s time-lapse animations currently use a constant-data time base rather than a constant-duration one, such that they “slow down” when representing times of year when more people were taking photos.)
The team responsible for this work has overcome a lot of daunting technical hurdles, but in terms of content they’ve targeted relatively low-hanging fruit: born-digital images that are conveniently time-stamped and geotagged. Granted, they’ve had to deal with the occasional incorrect timestamp. But what about the higher-hanging fruit? I’m thinking here of the vast archive of older photographs documenting the world’s visual history back to the middle of the nineteenth century. I don’t think older images, once they’ve been digitized, should pose any unusual technical challenges when it comes to warping and averaging them. It’s the curatorial work that I assume would be more demanding. Existing descriptive metadata might help get such a thing off the ground, but I doubt it would ever be fully sufficient. Programmatic image recognition and image registration techniques could certainly be leveraged. But archivists and historians would still probably need to get involved to identify locations and assign dates with the distinctive technical needs of the project in mind, based on both their specialized expertise and their more general human ability to solve CAPTCHA-like puzzles.
As proof that such work is feasible, we have the 4D Cities project led by Frank Dellaert at Georgia Tech, with its fully automated generation of sparse time-varying 3D models of cities from historical photographs, including a Lower Manhattan project spanning the years 1928-2010 (albeit with big temporal gaps).
The researchers have even been able to use these time-varying 3D models in turn as points of reference for estimating the dates when other photographs of the same places were taken. As the project website notes:
There is a growing need for novel ways to access the exponentially growing archives of historical imagery. It is imperative to go beyond cataloging, indexing, and keyword driven databases, to a paradigm where the computer at least partially understands the content of images. Pushing the state of the art in scene understanding and 3D modeling will enable radical new ways to view and experience historical and/or temporally varying imagery.
So now let’s imagine what would happen if we were to combine the historical depth of the 4D Cities project with the photorealistic results of time-lapse mining—a development that seems almost inevitable. I see no reason why we couldn’t generate photographic time-lapse video sequences spanning periods of up to 175 years if we cast our net back into the infancy of the daguerreotype. We might only be able to reconstruct a few scenes quite that far back, but there are plenty of locations that have been photographed regularly and continually since—say—the 1860s. How about all those views of government buildings in Washington? Or the pyramids in Egypt? Or the sights in Paris?
Just think: we could create an authentically photographic video of the evolution of Berlin since the middle of the nineteenth century, including its widespread destruction at the close of World War Two and the building and removal of the Berlin Wall. Or we could do the same thing with San Francisco, including the earthquake of 1906; or Chicago, including the fire of 1871. Or we could watch the trappings of the Oval Office change from year to year and presidency to presidency. Video along these lines from a single static viewpoint would be plenty impressive, and adding a moving camera effect could make it even more so, offering a brilliant new tool to the makers of historical documentaries. But what about photorealistic reconstructions in which the viewer could interactively change position and perspective? That too seems almost within reach, and we could throw in real stereoscopic 3D while we were at it.
Every major historical site might offer tourists an app that would let them explore how whatever they’re looking at has changed over the years. Historical imagery could even be displayed via smart eyewear such as Google Glass, translucently superimposed on the real-time view of the same scenes. For the past three years, the Street View feature of Google Maps has enabled people to “time travel” by toggling between imagery for different dates going back to 2007, but the approach we’re contemplating could supply it with another century and a half’s worth of data. Some places would be better documented than others, of course, but that’s already the case with Street View. Indeed, one outcome of such a project might be that it would more clearly reveal the geographic and temporal gaps in the historical photographic record—the missing years, the rarely-photographed neighborhoods.
Meanwhile, let’s not forget that we can also generate time-lapse stills from the same source images as long as we can warp them to a common perspective. Imagine a single time-lapse photograph of Times Square covering the whole of the past century, for example.
Some of what I’ve described might admittedly be a tall order for automated algorithms. But even if some things couldn’t be handled automatically—either at first or at all—maybe tools could still be created to enable historians and artists to guide the process manually and to make and implement judgment calls and educated guesses where needed. And let’s say the most compelling time-lapse reconstructions ended up taking a year of full-time work by a small team to complete. Wouldn’t it be worth it? After all, how much time, effort, and money goes into making the average documentary film as it is?
Incidentally, we wouldn’t need to limit ourselves to still photographs as source material. Historical motion pictures would be fair game too, and they might be particularly helpful in 3D scenic reconstruction. Paintings, drawings, and the like might not be accurate enough for automated analysis, but even if they aren’t, perhaps they could be drawn into the mix manually, pushing our temporal horizon back even before the 1840s. What were the most frequently depicted buildings or scenes of the eighteenth century?
Part Three: OTHER THINGS
After faces and scenes, the third most common subject category for time-based image averaging seems to be magazine covers. In theory, these ought to be easier to manage than faces or scenes because they’re strictly two-dimensional, although in practice I’ve found it can be surprisingly difficult to align them. As far as I’m aware, previous work with magazine covers has been limited to creating sets of still averages representing time spans such as decades or years. Thus, Lindsay King and Peter Leonard at Yale University Library have brought us “Robots Reading Vogue”: averages of all covers of Vogue for individual years at one-decade intervals, hand-aligned before averaging:
And Adrian Rosebrock has generated averages of complete groups of Time covers by decade:
Seb Przd has also created Time cover averages by individual year. But you can guess where I’m headed with this. As with faces and scenes, we can average magazine covers with a sliding time window to create smooth animations that display long-term change as a seamless continuum, rather than broken into arbitrary segments. Below is a time-lapse animation I created from covers of Time magazine for the years 1923-2006 with each frame representing a year’s worth of material and a time window sliding forward six weeks per frame:
I bulk-downloaded my source images from Coverbrowser.com and averaged them “as is,” since I’m not sure whether discrepancies in alignment are due to changes in actual cover layout or to vagaries of the digitization process. Either way, the effect looks a bit like someone adjusting zoom and focus on an ordinary video camera. I’m sure my time-lapse could be improved upon, but I don’t think it’s bad for a proof of concept.
The same approach would lend itself to plenty of other subjects as well. The front pages of newspapers would be an obvious choice—we could watch the typical layout of page one of the New York Times evolve over the decades, for example. And how about websites? It shouldn’t be hard to come up with an automated means of scraping the complete Wayback Machine archive of any URL, converting the HTML into images, and then creating an animation from the results. That would be worthwhile even without averaging.
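As a starting point for the scraping step, the Internet Archive exposes a public CDX API that lists the captures of a URL. The sketch below only builds the query and turns the JSON response into playback URLs. Collapsing to one capture per month is an arbitrary choice made for the sketch. Rendering each snapshot to an image (say, with a headless browser) is left out entirely. Verify the parameter names against the CDX server documentation before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(site, per="month"):
    """Build a CDX query listing successful captures of `site`, keeping at most
    one capture per year, month, or day (collapse on a timestamp prefix)."""
    digits = {"year": 4, "month": 6, "day": 8}[per]
    params = {"url": site, "output": "json",
              "filter": "statuscode:200", "collapse": f"timestamp:{digits}"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def snapshot_urls(cdx_json_text):
    """Turn a JSON CDX response (a header row, then one row per capture) into
    Wayback Machine playback URLs, in the order returned."""
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in records]
```

To get the snapshots, fetch the query URL (for instance with `urllib.request.urlopen`) and pass the response text to `snapshot_urls`. This yields roughly one page per month, ready to render and average.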
And here’s an animation I created from a database of United States postage stamps, averaged in overlapping ten-year groups from 1847 through 1964 based on initial release date.
The jerkiness and fluctuations in sharpness you see aren’t glitches but accurately reflect the pace of new stamp designs during the mid-nineteenth century, which was far less regular than the publication schedule for Time magazine. A constant-data approach (say, grouping stamps by thirties) would have yielded a smoother animation, but I don’t think it would have been as effective if we’re looking for an animated history of stamp design.
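The difference between the two time bases comes down to how groups are formed. Here’s a minimal Python sketch that assumes each item is a `(year, value)` pair. Both function names are hypothetical.

```python
def constant_duration_groups(items, span, step=1):
    """Every `span`-year window, advancing `step` years. Returns
    (start_year, values) pairs; group sizes vary with activity."""
    years = [y for y, _ in items]
    groups = []
    for start in range(min(years), max(years) - span + 2, step):
        groups.append((start, [v for y, v in items if start <= y < start + span]))
    return groups

def constant_data_groups(items, size, step=1):
    """Sort by date and take `size` consecutive items per group. Each group
    covers however much time those items happen to span."""
    ordered = sorted(items, key=lambda pair: pair[0])
    return [ordered[i:i + size] for i in range(0, len(ordered) - size + 1, step)]
```

With constant-duration windows, the group sizes follow the irregular pace of new designs, and that is what produces the visible jerkiness. Constant-data groups give up that information in exchange for a steadier image.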
This technique would probably yield even better results with coins. Just picture an animation showing changes in United States coin designs from 1792 to the present, year by year, with the different denominations laid out side by side, heads on top, tails on the bottom. However, aligning images of coins would require us not only to scale them consistently, but also to rotate them to a common angular orientation. I haven’t yet found any satisfactory way of doing that automatically, and doing it by hand would take some time.
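One brute-force possibility, untested here on real coins, is to rotate each coin through a range of angles and keep the angle at which it correlates best with a reference coin. The sketch below assumes grayscale NumPy arrays that are already cropped, centered, and scaled to the same square size. It uses nearest-neighbor rotation so that it needs nothing beyond NumPy. `rotate_nearest` and `best_rotation` are hypothetical names.

```python
import numpy as np

def rotate_nearest(img, degrees):
    """Rotate a square grayscale array about its centre (nearest-neighbour)."""
    n = img.shape[0]
    c = (n - 1) / 2
    theta = np.deg2rad(degrees)
    yy, xx = np.mgrid[0:n, 0:n]
    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    xs = np.cos(theta) * (xx - c) + np.sin(theta) * (yy - c) + c
    ys = -np.sin(theta) * (xx - c) + np.cos(theta) * (yy - c) + c
    xs, ys = np.rint(xs).astype(int), np.rint(ys).astype(int)
    inside = (xs >= 0) & (xs < n) & (ys >= 0) & (ys < n)
    out = np.zeros_like(img)
    out[inside] = img[ys[inside], xs[inside]]
    return out

def best_rotation(reference, coin, step=1.0):
    """Angle (degrees) at which `coin` best matches `reference` in a central disc,
    scored by normalized correlation."""
    n = reference.shape[0]
    c = (n - 1) / 2
    yy, xx = np.mgrid[0:n, 0:n]
    disc = (xx - c) ** 2 + (yy - c) ** 2 <= c ** 2
    ref = reference[disc] - reference[disc].mean()
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(0.0, 360.0, step):
        cand = rotate_nearest(coin, angle)[disc]
        cand = cand - cand.mean()
        score = np.dot(ref, cand) / (np.linalg.norm(ref) * np.linalg.norm(cand) + 1e-12)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

A single fixed reference will struggle whenever the design itself changes. A variation worth trying would match each coin against its predecessor in the chronological sequence instead.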
What about subjects that don’t line up as neatly with each other as magazine covers, stamps, and coins do, and which aren’t faces or scenes (since those have specialized averaging tools available for handling them)? A few past time-based averaging efforts have entered upon such territory, one notable example being Jason Salavon’s Every Playboy Centerfold, The Decades (2002):
As with The Loop, Chicago, 1848-2007, Salavon appears to have overlaid his source images without trying to align them, which may be why he was able to get G-rated results from Playboy centerfolds: any salacious detail ends up blurred away, leaving behind an unresolvably abstract form (which I assume is precisely the aesthetic effect he was aiming at). Once again, it’s easy to imagine the same technique being used to yield an animation in which the target image would transform gradually over time rather than being shown broken up by decade. Another relevant project is Alejandro Almaraz’s Portraits of Power, which averages official portraits of state leaders over particular time spans—not just their faces, mind you, but whole portraits. Here’s “All the Presidents from Argentina from 1826 to 1892”:
As we can see, Almaraz has scaled and rotated his source images so that the subjects’ eyes, mouths, and so forth line up approximately with each other. His process leaves behind conspicuous traces of layering, but those may be desirable in this case. As Jordan Teicher comments at Slate: “in places like North Korea, the small number of layers in the images over long periods of time speaks to the degree of control held by single individuals.” If the quantity of layers is meaningful as an index of the number of transfers of power in a given state, it’s not something Almaraz would have wanted to conceal. Again, the same process could obviously be extended to create time-based animations.
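Lining up two eye positions calls for a similarity transform: a uniform scale, a rotation, and a translation. Almaraz’s actual tools aren’t documented here, but the parameters can be computed directly from two pairs of points. This sketch treats (x, y) points as complex numbers. `eye_alignment` is a hypothetical name, and applying the transform to actual pixels is left to an image editor or imaging library.

```python
import cmath
import math

def eye_alignment(src_left, src_right, dst_left, dst_right):
    """Return (scale, angle_degrees, transform) for the similarity transform
    that carries the source eye positions onto the destination eye positions.
    Points are (x, y) tuples, treated as complex numbers x + iy."""
    s1, s2 = complex(*src_left), complex(*src_right)
    d1, d2 = complex(*dst_left), complex(*dst_right)
    a = (d2 - d1) / (s2 - s1)   # combined scale and rotation
    b = d1 - a * s1             # translation

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return abs(a), math.degrees(cmath.phase(a)), transform
```

In image coordinates, where y increases downward, a positive angle appears as a clockwise rotation on screen.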
If we want to align source images more precisely, though, one option we have is iterative morphing (which I’ve previously described here). We start by taking two source images and creating a morph sequence between them using standard morphing software, such as FotoMorph. All we have to do is manually assign pairs of corresponding warp points to both images, like this—
—and the software will create an animation in which the warp points move steadily between their positions in one image and their positions in the other image while the details of the two images are simultaneously cross-faded:
In the above example, I’ve strategically sacrificed a few less consistently positioned details, such as the positions of the Christ child’s hands, in favor of more consistently positioned ones that I’d expect to bring into focus across a larger number of sources.
Now we export the midpoint of the morph sequence, which is the average of the two source images; and we do the same thing a second time with two different source images. Next, we create a new morph sequence between the midpoints of the two earlier morph sequences. And then we export the midpoint of that morph sequence—which is the average of the four source images—and continue in this way until we’ve averaged together as many source images as we want.
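The pairing procedure is a binary-tree reduction. The sketch below uses a hypothetical `morph_frame(a, b, t)` as a stand-in for exporting frame position `t` (from 0 to 1) of a morph sequence running from `a` to `b`. If the exact midpoint (`t = 0.5`) is exported at every step, every source counts equally only when the number of sources is a power of two, such as eight. For other counts, this version departs from the plain midpoint: whenever one side of a pair already represents more sources than the other, it exports an off-center frame to compensate.

```python
def iterative_average(sources, morph_frame):
    """Pairwise tree reduction. `morph_frame(a, b, t)` stands in for exporting
    frame t (0..1) of a morph from a to b. Returns (result, morph_count)."""
    level = [(s, 1) for s in sources]  # (image, number of sources it represents)
    morphs = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (a, wa), (b, wb) = level[i], level[i + 1]
            # t = wb / (wa + wb) weights both sides by how many sources they hold.
            nxt.append((morph_frame(a, b, wb / (wa + wb)), wa + wb))
            morphs += 1
        if len(level) % 2:
            nxt.append(level[-1])  # odd one out waits for the next round
        level = nxt
    return level[0][0], morphs
```

For testing, plain numbers and linear interpolation can stand in for images and morphing. With `lambda a, b, t: (1 - t) * a + t * b`, five sources 1 through 5 should come out at their true mean, 3, after four morph sequences.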
Let’s see how the different approaches to alignment we’ve considered so far work for averaging eight “Madonna and Child” paintings of the thirteenth and fourteenth centuries. Here are our sources, which pair some basic structural similarities with significant differences in pose:
The average on the left shows the images stacked with no attempt whatsoever at alignment; the one in the middle has Mary’s eyes approximately lined up with each other (but with no rotation); and the one on the right was created through iterative morphing. Each option yields a different aesthetic impact. The version on the left reminds me of Salavon’s Every Playboy Centerfold, while the one in the middle reminds me of Almaraz’s Portraits of Power. But the version on the right plainly offers the sharpest results, and if our goal is to display historical imagery in informative ways, I think it’s the best choice. With the addition of even more source images, the other two versions would quickly become hazy blurs—witness the comparable average generated from one hundred source images by Justin Greenough:
Unfortunately, iterative morphing doesn’t scale very well as an averaging technique. To average my eight “Madonna and Child” images, I first had to average 1+2, 3+4, 5+6, and 7+8, and then (1+2)+(3+4) and (5+6)+(7+8), and finally ((1+2)+(3+4))+((5+6)+(7+8))—that’s seven independent morph sequences. For a sliding selection window moving forward one source image per frame, we’d need to create entirely new “trees” of averages for images 2-9, 3-10, 4-11, and so forth, and I don’t see any obvious strategies for improving efficiency over the whole sequence. Creating the sort of animation I’ve been describing in this way would be extremely time-consuming.
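The arithmetic behind that conclusion is easy to spell out. Any binary tree over W sources takes W - 1 pairwise morphs. If no intermediate averages are reused from one window to the next, every frame of the animation pays that cost again. A minimal sketch, with hypothetical function names:

```python
def morphs_per_window(window):
    """A binary tree over `window` sources always needs window - 1 merges."""
    return window - 1

def morphs_for_animation(n_sources, window, step=1):
    """Return (frames, total morph sequences) for a sliding window, assuming
    each frame's tree is built from scratch."""
    frames = (n_sources - window) // step + 1
    return frames, frames * morphs_per_window(window)
```

Eighty sources in windows of twenty, advancing one source per frame, would already mean 61 frames and 1,159 morph sequences.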
One alternative would be to use morphing software to warp all the source images in a project to a set template. The idea here would be to vary the “start” image while consistently using the template as the “end” image; to retain the detail of the “start” image rather than cross-fading as usual; and to stack the final frames of the sequences for averaging. In this way, we would end up with the details of all our source images warped to a common shape. I’ve confirmed that this can be done in the Pro or Deluxe versions of Abrosoft’s FantaMorph by setting the “feature curve” to a flat line:
This approach ought to work for a subject that doesn’t change shape significantly over time, or for a subject where change occurs but isn’t the focus of our interest (maybe the fashionably clothed figures shown in fashion plates). But it wouldn’t work for a subject that does change shape over time in ways we find meaningful. I don’t think it would be suitable for, say, creating an animated history of the average automobile from decade to decade or the average gravestone from century to century. And in any case, this would be immensely time-consuming work.
But there’s hope: a system called AverageExplorer under development by Jun-Yan Zhu, Yong Jae Lee, and Alexei Efros at the University of California, Berkeley, promises to make the creation of averaged images of miscellaneous subjects a lot more efficient. The research team writes:
Average images have been gaining popularity as a means of artistic expression and data visualization, but the creation of compelling examples is a surprisingly laborious and manual process. Our interactive, real-time system provides a way to summarize large amounts of visual data by weighted average(s) of an image collection, with the weights reflecting user-indicated importance. The aim is to capture not just the mean of the distribution, but a set of modes discovered via interactive exploration. We pose this exploration in terms of a user interactively “editing” the average image using various types of strokes, brushes and warps, similar to a normal image editor, with each user interaction providing a new constraint to update the average.
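The core operation they describe, a weighted average of an image collection, takes one line in NumPy. What AverageExplorer adds is the interactive machinery for choosing weights and warps, and this sketch doesn’t attempt that part. As one hypothetical way of bringing time into the weights, `time_weights` tapers each source’s contribution linearly with its distance from a window’s center date, instead of treating everything inside the window equally.

```python
import numpy as np

def weighted_average(images, weights):
    """Weighted mean of equally sized images. The weights need not sum to 1,
    but they must not all be zero."""
    stack = np.asarray(images, dtype=float)
    return np.average(stack, axis=0, weights=np.asarray(weights, dtype=float))

def time_weights(times, center, half_width):
    """Triangular weights: 1 at `center`, falling to 0 at +/- `half_width`."""
    t = np.asarray(times, dtype=float)
    return np.clip(1.0 - np.abs(t - center) / half_width, 0.0, None)
```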
Here’s an illustration from their paper that shows the results they’ve been achieving through user-guided alignment:
Unfortunately, AverageExplorer doesn’t yet seem available for outsiders like me to play with. But if source images could simultaneously be clustered by date, I believe it could be used to carry out the alignment we’d need to create well-defined time-based animations of the evolution of subjects such as automobiles, gravestones, churches, houses, “Madonna and Child” paintings, and so forth—which means that if and when it ever does become available, all those projects would suddenly become feasible. And if we could add in Structure from Motion techniques on top of that, we could conceivably even rotate the results in three dimensions.
Then, once we’d built up a robust time-variant catalog of “average” faces, scenes, and things such as automobiles and clothing, the number of clues available to us for automatically dating otherwise undated historical photographs would obviously increase; and once these had been auto-dated, they could be fed back into the process in turn as additional source material. The long-term prospects for harnessing the world’s archives of historical imagery in this way are stunning. Some collections that might have seemed until now to be of relatively low value could turn out to be centrally important from this new perspective.
Time-based image averaging is a technique that can be applied equally well to faces, scenes, and other subjects. It can be used to create still averages covering long time spans, multiple averages with sources grouped by time period, or time-lapse video animations in which time-based averages are displayed rapidly in chronological order. It shares areas of overlap with digital averaging in general, super-long-term photographic exposures, time-lapse cinema, the creative curation of “found” media, and the subversive practice of eduction against the grain—that is, making inscriptions sensorily accessible in ways their creators never intended. I think it has a bright future ahead of it, with lots of room for exciting collaborations among programmers, historians, archivists, and artists.
Appendix: Notes on Specific Examples
Here are a few additional technical details I left out above because I didn’t want to bog things down.
Miss America animation #1: All images were warped in Abrosoft FaceMixer to the average shape for the entire data set, and exported; then I created a staggered combination of mean and median averages from these in Photoshop (i.e., mean 1 + median 1, then mean 2 + median 1, then mean 2 + median 2) using my Averager 1.0 script, averaged in groups of forty pairs; and finally re-averaged the result by threes to reduce flicker.
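The staggered ordering can be generated mechanically. Here is a small sketch that assumes equal-length lists of already-computed mean and median frames. `staggered_pairs` and `blend_pairs` are hypothetical names. Blending the two members of each pair with equal weight is an assumption made for this sketch.

```python
import numpy as np

def staggered_pairs(means, medians):
    """(mean1, median1), (mean2, median1), (mean2, median2), (mean3, median2), ..."""
    pairs = []
    for i in range(len(means)):
        if i > 0:
            pairs.append((means[i], medians[i - 1]))
        pairs.append((means[i], medians[i]))
    return pairs

def blend_pairs(pairs):
    """Combine each (mean, median) pair with equal weight."""
    return [(np.asarray(a, dtype=float) + np.asarray(b, dtype=float)) / 2
            for a, b in pairs]
```

Staggering this way yields 2n - 1 frames from n means and n medians, and each step changes only one of the two components.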
Miss America animation #2: I exported the Abrosoft FaceMixer result for each sequential group of forty source images, re-averaged the results in groups of five to reduce jitter, and then contrast-enhanced each frame.
Clickflashwhirr: The announced time span should have been 2,073 days, corresponding to 2,073 photos at one photo per day; but when I used VLC and the command-line method described here to export every other frame from the video (since each source photo appeared to take up two frames), I ended up with only 632 images. I’m not sure why.
Abraham Lincoln: I loaded the source images into Abrosoft FaceMixer, arranged in roughly chronological order; manually adjusted the warp points and exported the individually warped results; opened them in Photoshop; blurred out Lincoln’s occluded left ear where it appeared, along with other problem areas; generated mean averages in successive overlapping groups of twenty source images; and then vignetted the result. Ordinarily I try to flip images as needed for the subject to face in the same direction, but I couldn’t do that here because of the mole that makes Lincoln’s face conspicuously asymmetrical.
Rembrandt animation: I averaged the eighty source images in overlapping groups of twenty, advancing one source image per frame, creating a median and mean average in both cases; and then I re-averaged the median and mean averages in groups of six (i.e., three pairs) to reduce flicker.
Time magazine: I downloaded my source images from Coverbrowser.com using DownThemAll, checking them for completeness and filling in a few gaps manually from the Time website. The latter website has an even more complete set of digitized covers, but I found I couldn’t use it as the principal source for my animation; it’s easy to scrape the images from the site for averaging by year or decade, but not for averaging them in their specific order by date, since neither the filenames nor the digital metadata seem to include sortable dates. As for the processing, I took a median average in groups of fifty-two, advancing two source images per frame; and then re-averaged the results in groups of five, advancing three source images per frame.
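Chained sliding windows compound: each pass widens the span of source images behind a frame and multiplies the step. The hypothetical `composite_span` helper below computes both quantities for any sequence of `(window, step)` passes. It shows how fifty-two covers advancing two per frame, followed by groups of five advancing three, works out to sixty weekly covers (a little over a year) advancing six weeks per frame.

```python
def composite_span(stages):
    """stages: list of (window, step) for successive sliding-window passes.
    Returns (source images behind each output frame, sources advanced per frame)."""
    span, stride = 1, 1
    for window, step in stages:
        span += (window - 1) * stride
        stride *= step
    return span, stride

print(composite_span([(52, 2), (5, 3)]))  # (60, 6)
```

The sixty covers behind each final frame don’t carry equal weight. Covers near the middle fall inside more first-pass windows than covers at the edges.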
Postage stamps: Averaged in ten-year groups based on release date, advancing one year per frame, with the results re-averaged by threes to reduce flicker. I also had to make a judgment call about scale: the database already had most images set to a height of 300 pixels, and I kept that dimension for everything but taller-than-usual stamps, which I resized to a width of 300 pixels instead (since these are less common, the “overhang” at top and bottom was effectively blacked out in the averaged result and ended up cropped out of the frame). However, there are other approaches I could have taken, such as rescaling all stamps to a common height and width (which would have been easy) or scaling them all to their original relative sizes and just centering them (which would have required research into what those sizes were).
Mammoth Cave National Park webcam: The URLs of the individual images take the form http://www.nature.nps.gov/air/WebCams/parks/macacam/archive/maca2002_123_1300.jpg, where macacam and maca represent a particular webcam, 2002 is the year, 123 denotes the numerical day of the year, and 1300 is the time of day, in this case 1 PM. I couldn’t find a public directory from which I could conveniently batch-download these files, so I tried to predict what the sequential URLs would be, concatenated these in Excel, and then used a download manager to grab whatever was at each of them. The main difficulty was that the time of day occasionally varied, such that a given webcam could have image URLs ending in, say, 1200.jpg, 1201.jpg, and 1300.jpg. I tried to work out the full range of possibilities for each webcam by manually spot-checking auto-generated URLs that had returned 404 (File Not Found) errors, to see in each case whether there was really a photo there that had been taken at a different time than I’d predicted. I think I ended up getting everything, but I can’t be sure.
The still average actually resulted from my efforts to create an animation from day-of-the-year averages, highlighting the annual cycle of the seasons. To do this, I used Bulk Rename Utility to move the first nine characters of each filename to the end of the filename with a hyphen separating the parts, so that (for example) maca2002_23_1300.jpg became 23_1300-maca2002_.jpg. This meant that the files would now sort by the numerical day of the year, irrespective of specific year. I then auto-aligned the images in groups of roughly thirty days apiece. I left out a handful of images that didn’t auto-align correctly for whatever reason (day 231 of 2002, day 25 of 2004, and days 44 and 295 of 2007), as well as the first eighteen days of the first year, 2002, since they conspicuously show a tree on the left that’s missing from all the later images; it must have come down sometime on January 18th or 19th.
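The URL prediction and renaming steps translate directly into code. This sketch follows the URL pattern given above and uses unpadded day-of-year numbers, as in the example filename. Several details are assumptions: the default `times`, the helper names, and whether these archive URLs still resolve at all. The sketch also adds a numeric sort key, because a plain string sort would put day 123 ahead of day 23.

```python
from datetime import date, timedelta

def candidate_urls(start, end, times=("1300",), camera="macacam", prefix="maca"):
    """Predict one archive URL per day and per candidate time of day."""
    base = f"http://www.nature.nps.gov/air/WebCams/parks/{camera}/archive/"
    urls = []
    d = start
    while d <= end:
        doy = d.timetuple().tm_yday  # day of year, 1..366, not zero-padded
        for t in times:
            urls.append(f"{base}{prefix}{d.year}_{doy}_{t}.jpg")
        d += timedelta(days=1)
    return urls

def day_first_name(filename, prefix_len=9):
    """Move the first nine characters to the end: 'maca2002_23_1300.jpg'
    becomes '23_1300-maca2002_.jpg'."""
    stem, ext = filename.rsplit(".", 1)
    return f"{stem[prefix_len:]}-{stem[:prefix_len]}.{ext}"

def day_sort_key(renamed):
    """Sort renamed files by day number numerically, not as text."""
    return int(renamed.split("_", 1)[0])
```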
Next, I manually selected all the images for each day of the year in turn (reckoned numerically from 1 to 366 rather than by month-and-day combinations), generated the median average for each of these daily data sets, and then re-auto-aligned the results for the entire annual cycle. As for leap years, I simply combined the images for days 365 and 366. Here’s the looped result:
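The per-day grouping, with days 365 and 366 merged, can also be sketched in Python. It assumes the original filenames (before renaming), and it assumes the images have already been loaded as equally sized, aligned NumPy arrays. The regular expression and function names are hypothetical. `annual_cycle` takes the median for each day and then a mean over all the daily medians for the composite still.

```python
import re
from collections import defaultdict

import numpy as np

NAME = re.compile(r"^[a-z]+(\d{4})_(\d{1,3})_(\d{4})\.jpg$")

def daily_groups(filenames):
    """Group filenames by day of year, folding leap-year day 366 into 365."""
    groups = defaultdict(list)
    for name in filenames:
        m = NAME.match(name)
        if not m:
            continue  # skip anything that isn't a webcam image
        day = int(m.group(2))
        groups[min(day, 365)].append(name)
    return dict(sorted(groups.items()))

def annual_cycle(images_by_day):
    """images_by_day: {day: list of arrays}. Returns the per-day medians and
    the mean of those medians (the composite still)."""
    daily = {day: np.median(np.stack(imgs), axis=0)
             for day, imgs in images_by_day.items()}
    still = np.mean(np.stack(list(daily.values())), axis=0)
    return daily, still
```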
For the composite still, I then generated a mean average from all the daily averages.