Displaying Historical Newspapers as Motion Pictures

The rise of digital newspaper archives has opened up many new possibilities for cultivating the periodical past.  Old newspapers have obviously become easier to search and more convenient to read than they were before, but here I’d like to explore another novel opportunity we have for experiencing them: as motion pictures made up of long runs of successive front pages.  By animating old newspapers as movies on these terms, we can see their long-term graphic evolution unfolding vividly before our eyes.  This wouldn’t have been feasible in the age of crumbling bound volumes, or in the age of microfilm that followed, but today it is, thanks to the wonderful malleability of digital data.

And the results are, in a sense, authentic motion picture actualities.  Granted, they won’t revivify any motion that could originally have been seen “live”—not even in the sense of a time-lapse video of a seed growing into a plant.  But we sometimes also speak of motion linked to more abstract forms: in 1872 the volume and issue number moved from the right to the left side of the masthead.  Moreover, the expression “motion picture” is itself something of a misnomer.  What “motion pictures” convey isn’t motion as such, but visual appearance as a function of time.  This might involve motion, but not necessarily.  Imagine a video of a traffic light switching from green to yellow and then to red, or of a chameleon changing color: these are changes of state, even if they’re brought about on some level through movement (of electrons and guanine nanocrystals, respectively).  Or consider security camera footage in which nothing at all happens.  If those records of visual appearance over time would count as legitimate “motion pictures,” then why shouldn’t the same be true of a comparable record of the appearance over time of a newspaper’s front page?

I first brought this idea up in a blog post called “Time-Based Image Averaging” back on October 31, 2016.  I’d just shared an animated GIF of the evolving front cover of Time magazine from 1923 through 2006, presented as a moving average, but I added:

The same approach would lend itself to plenty of other subjects as well.  The front pages of newspapers would be an obvious choice—we could watch the typical layout of page one of the New York Times evolve over the decades, for example.  And how about websites?  It shouldn’t be hard to come up with an automated means of scraping the complete Wayback Machine archive of any URL, converting the HTML into images, and then creating an animation from the results.  That would be worthwhile even without averaging.

At that point, I’d already tried some experiments with digitized historical newspaper front pages—both averaged and un-averaged—but downloading the images by hand was awfully time-consuming, and I’d run into difficulties with video stabilization besides, so I’d put the idea on hold.  But the world continued to turn in the meantime.  In February 2017, just four months after I’d published my post, Josh Begley uploaded a video to Vimeo entitled Every NYT front page since 1852.  He’d come up with an idea I hadn’t, which was to display multiple front pages at once in a nine-by-five grid.  The chronic misalignment of scanned pages from issue to issue is less conspicuous with a nine-by-five grid than it would be with a single page taking up the entire frame, and I assume Begley had also found some way to batch-download the front-page images themselves.  His strategy does a fine job of illustrating what one reviewer calls “The Rise of the Image”: namely, the gradual transition from an all-text front page to one routinely featuring photographs and other pictures.  On the other hand, the scale of the display is too small and the speed too fast for the viewer to grasp other interesting details, such as the evolution of the masthead, and the aspect ratio also seems a little off at points.  Begley’s video is wonderfully effective at illustrating the point about front-page illustrations, but it doesn’t do quite what I’d had in mind, which was less to make new movies out of front pages than to treat sequences of front pages themselves as existing or potential movies—movies in need of nothing but eduction.  They are, I’d argue, already defined as a “thing”: newspapers organized as successions of numbered issues with the front page of each serving as its visual icon.  Digital repositories often single them out for chronological display as it is.  And animating a chronological sequence of images is a well-established convention, right?

The process would ideally embrace three steps: (1) batch-downloading appropriate groups of online images and converting them to video; (2) aligning the images with each other; and (3) optionally averaging the aligned images using a sliding time window.  I’d already figured out step #3 some time ago, and I’ve now worked out step #1 as well, but step #2 is the stumbling block—a special case of the more general problem of rigid image registration, for which I can’t say I’ve yet found a fully satisfactory solution.  And we can’t carry out step #3 effectively without step #2.  So in the present post, I’ll be focusing on step #1.  The raw results of that step can be interesting and appealing in their own right, as Begley’s video shows.  They tend to be highly unstable, but that arguably just reinforces the continuity with other historical motion pictures, which sometimes look distressed in much the same way due to factors such as gate weave (which plugins have even been designed to mimic on purpose).  Working our way further through the analogy, it might be argued that step #2 resembles film restoration more closely than it does film preservation: a horse of another color, with no shame in leaving it to be tamed another day.
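To make step #3 a little more concrete, here is a minimal sketch in Python of averaging over a sliding window. It is not the code behind my own averaged videos: it assumes the frames are already aligned, that each frame is a grayscale image stored as a list of rows, and that the window is counted in frames, centered on the current one, and simply shortened at either end of the run. The function name moving_average_frames is made up for illustration.

```python
def moving_average_frames(frames, window):
    """Average each frame with its neighbours in a centered sliding window.

    frames: list of equally sized grayscale images, each a list of rows.
    window: number of frames to average (assumed odd; truncated at the ends).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    averaged = []
    for i in range(len(frames)):
        # Neighbouring frames that fall inside the window around frame i.
        group = frames[max(0, i - half):min(len(frames), i + half + 1)]
        n = len(group)
        rows, cols = len(group[0]), len(group[0][0])
        averaged.append([
            [sum(frame[r][c] for frame in group) / n for c in range(cols)]
            for r in range(rows)
        ])
    return averaged
```

With a window of one frame the input comes back unchanged (as floats), which is a handy sanity check before trying wider windows.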

The first script I wrote was designed to scrape the front pages of any newspaper archived in Chronicling America: Historic American Newspapers and assemble them into a video.  This archive contains over two thousand different newspapers, so the first thing we need to do is specify which one we want by entering the number that uniquely identifies it.  We can find the number for any given newspaper by clicking on the “More Info” link next to it in this list and scrolling down to the LCCN field (“Library of Congress Control Number”), where it appears prefixed by “sn”; or we could just hover over any of the links associated with the newspaper and copy the number from the URL.  If we want to create a video for, say, the Hillsborough (NC) Recorder, we just need to plug in its number, which happens to be 84026472, and set the date range from March 1, 1820 (the first available issue) to March 5, 1879 (the last available issue).  After letting the script run its course, we get this:

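The date-by-date lookup described above might be sketched in Python roughly as follows. This is not my MATLAB script. The host name and the "/thumbnail.jpg" path are assumptions about how the site lays out its image URLs (only the "sn" prefix and the "/ed-1/seq-1" segment come from the description in this post), and the function names are invented for the example. The generator just proposes one URL per calendar day; days with no issue have to be skipped at download time.

```python
from datetime import date, timedelta

# Assumed host; the site's address has changed in the past.
BASE_URL = "https://chroniclingamerica.loc.gov"


def front_page_url(lccn_number, issue_date, suffix="/thumbnail.jpg"):
    """Build a candidate URL for page one of the first edition on a date.

    suffix="/thumbnail.jpg" is an assumed thumbnail path; suffix=".jp2"
    would point at the large JP2 file instead.
    """
    return (f"{BASE_URL}/lccn/sn{lccn_number}/"
            f"{issue_date.isoformat()}/ed-1/seq-1{suffix}")


def daily_front_page_urls(lccn_number, first, last, suffix="/thumbnail.jpg"):
    """Yield (date, url) for every calendar day from first through last."""
    day = first
    while day <= last:
        yield day, front_page_url(lccn_number, day, suffix)
        day += timedelta(days=1)


# Hillsborough Recorder, first three days of the available run.
for day, url in daily_front_page_urls("84026472",
                                      date(1820, 3, 1), date(1820, 3, 3)):
    print(day, url)
```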

Here’s a link to a sample MATLAB script similar to the one I used to generate that video, except that this one is set up to create a movie from the front pages of all available issues of the Evening Star (Washington DC), newspaper number 83045462, from October 9, 1854 through April 20, 1924.  I chose the Evening Star because it’s the newspaper with the most issues available in Chronicling America: a whopping 22,426 of them.  In practice, I’ve found that MATLAB often hangs unaccountably somewhere in the 1890s when I run this script, so it might be wiser to tackle a run of this magnitude in smaller chunks and then stitch the separate video segments together afterwards.  The line defining pickDateFormatted is optional, but it displays each date as the process runs so that we can keep track of our progress.  The “try/catch” block accomplishes a few different things at once: it’s designed to skip missing image URLs, to wait out any protracted server outages, and to keep things going in the event of a corrupt image file.  The thumbnails vary slightly in size from issue to issue, but the video needs each frame to have the exact same dimensions, so I resize everything to a consistent height and then pad it out to a 4:3 aspect ratio.  The script also saves a MAT file with a name matching the output AVI file to preserve some data about each video frame for future reference: frameDate records its date, while frameWidth records its width after resizing but before padding.  This script harnesses the site’s low-resolution thumbnail images, but higher-resolution results could be obtained by substituting “/ed-1/seq-1.jp2” in the image URL to work from the large JP2 files instead, at the expense of longer processing times.
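For readers who don't use MATLAB, here is a rough Python sketch of two of the ideas just described: skipping or waiting out failed downloads, and working out how to fit each page into a fixed 4:3 frame. It is an illustration under assumptions, not a port of the script. The downloader is passed in as a function so the logic can be tried without touching the network, a 404 response is assumed to mean "no issue that day," the 480-pixel frame height is arbitrary, and centering the page within the padding is my choice for the example. Handling of corrupt image files is left out.

```python
import time
import urllib.error


def fetch_with_retry(fetch, url, max_attempts=5, wait_seconds=60):
    """Return whatever fetch(url) returns, or None if the image is skipped.

    A 404 is treated as a missing issue and skipped at once; other HTTP
    or network errors are retried after a pause, up to max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None  # assumed to mean there is no issue to fetch
        except (urllib.error.URLError, TimeoutError):
            pass  # server outage or dropped connection: fall through, retry
        if attempt < max_attempts:
            time.sleep(wait_seconds)
    return None


def fit_to_frame(width, height, frame_height=480):
    """Scale a page to frame_height, then pad its width out to 4:3.

    Returns (resized_width, frame_width, pad_left, pad_right).
    """
    frame_width = frame_height * 4 // 3
    resized_width = round(width * frame_height / height)
    if resized_width > frame_width:
        raise ValueError("page is wider than 4:3 after resizing")
    pad_total = frame_width - resized_width
    pad_left = pad_total // 2
    return resized_width, frame_width, pad_left, pad_total - pad_left
```

The resized width returned here is the quantity that would correspond to frameWidth in the MAT file: the width after resizing but before padding.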

I’ve verified that my script still works as of this writing, but changes to the website design have broken it in the past, including a switch from “http:” to “https:” sometime within the last year.  In any case, you don’t need to run the script yourself to enjoy some of its results: here’s the full run of Evening Star thumbnails from 1854 through 1924.


With a few modifications, the same approach can be used to scrape images from other repositories.  For example, I’ve adapted it to grab the front pages of newspapers from Gallica, first downloading the HTML and then extracting an arbitrary unique identifier from it that’s needed to form a URL for downloading the desired image file.  The string used to identify an individual newspaper is part of the URL of its main page, for example http://gallica.bnf.fr/ark:/12148/cb34448033b/date for La Presse, with identifier cb34448033b.  If we assemble a URL ending with a specific date, such as cb34448033b/date18360615, this redirects to another URL of the form bpt6k426719v.item, from which we can generate a link in turn to a downloadable thumbnail, bpt6k426719v.lowres.jpg.  (To boost resolution, we could instead opt for medres.jpg or highres.jpg.)  Sometimes the available run of a newspaper is split up among a few different identifiers for different date ranges or variant titles, in which case we might need to switch between them.  Take Le Figaro, a leading daily newspaper that began life as a satirical weekly back in 1826.  Early dates are split among identifiers cb344484501 (1826-1834), cb344482258 (1835-1838), and cb344551004 (1839-1840), reflecting a somewhat erratic publication history during those years.

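The Gallica chain of URLs can be sketched in Python along these lines. Again, this is an illustration rather than my actual code: the function names are invented, the full redirect and thumbnail URLs are assumed to share the same http://gallica.bnf.fr/ark:/12148/ prefix as the newspaper's main page, and the step that actually follows the redirect over the network is left out, so the example starts from a redirected URL that is already in hand.

```python
import re
from datetime import date

# Prefix taken from the La Presse example; assumed to apply to issues too.
GALLICA_ARK = "http://gallica.bnf.fr/ark:/12148"

# Early Le Figaro identifiers and the years each one covers.
FIGARO_EARLY_RUNS = [
    (1826, 1834, "cb344484501"),
    (1835, 1838, "cb344482258"),
    (1839, 1840, "cb344551004"),
]


def date_url(identifier, issue_date):
    """URL for one newspaper on one date; the site redirects it to an issue."""
    stamp = f"{issue_date.year:04d}{issue_date.month:02d}{issue_date.day:02d}"
    return f"{GALLICA_ARK}/{identifier}/date{stamp}"


def issue_ark(redirected_url):
    """Pull the issue identifier out of a redirected '....item' URL."""
    match = re.search(r"/ark:/12148/([a-z0-9]+)\.item", redirected_url)
    return match.group(1) if match else None


def thumbnail_url(ark_id, size="lowres"):
    """Image URL for an issue; size may be lowres, medres, or highres."""
    return f"{GALLICA_ARK}/{ark_id}.{size}.jpg"


def identifier_for_year(year, runs):
    """Pick the newspaper identifier whose date range covers the year."""
    for first, last, identifier in runs:
        if first <= year <= last:
            return identifier
    return None  # no coverage listed for that year


print(date_url("cb34448033b", date(1836, 6, 15)))
print(thumbnail_url(issue_ark(f"{GALLICA_ARK}/bpt6k426719v.item")))
```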

Then, after a fourteen-year hiatus, Gallica coverage resumes with a nicely straightforward run from 1856-1942, including the transition to daily publication in November 1866 (cb34355551z):


I’ve taken a similar approach to NewspaperArchive.com, harnessing publicly available thumbnail images that don’t require membership credentials to access.  My code first downloads the page associated with each day’s newspaper as HTML and then extracts the numerical string from it to form the correct URL for the image download.  Because of the website’s file structure, it’s also necessary to enter a few more variables than in the case of Chronicling America or Gallica: the newspaper name, the country, the state, and the city, all exactly as they appear on the site itself.  For this post, I’ve produced a movie from the front pages of my hometown newspaper, the Vidette-Messenger (Valparaiso, Indiana) for the whole fifty-year period of coverage, running from 1927 through 1977.  The necessary variables can all be found in the URL of the home page for the newspaper: https://newspaperarchive.com/us/indiana/valparaiso/valparaiso-vidette-messenger/.  In running this particular example, I found that the wrong page had occasionally been identified as page one, although not so often as to seriously mar the resulting effect.  Glitches of other kinds occur too; for some reason, every now and then a frame turns out to be the front page of an issue of the Salt Lake Tribune!

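Since the four variables all sit in the home-page URL, they can be read off mechanically. Here is a small Python sketch that assumes the path always has exactly the four segments seen in the Valparaiso example (country, state, city, newspaper name). The function name is made up, and the later steps (fetching each day's page and extracting the numerical string) are not shown because they depend on page details I haven't spelled out here.

```python
from urllib.parse import urlparse


def archive_variables(home_url):
    """Split a newspaper's home-page URL into the four path variables.

    Assumes a path of the form /country/state/city/newspaper-name/.
    """
    parts = [part for part in urlparse(home_url).path.split("/") if part]
    if len(parts) != 4:
        raise ValueError("expected /country/state/city/newspaper/ in the URL")
    country, state, city, newspaper = parts
    return {"country": country, "state": state,
            "city": city, "newspaper": newspaper}


print(archive_variables(
    "https://newspaperarchive.com/us/indiana/valparaiso/"
    "valparaiso-vidette-messenger/"))
```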

The Chicago Tribune maintains its own online archive of back issues.  It’s now restricted to paid subscribers, but I tapped it when it was still in “free” beta mode back in August 2017 and scraped all the front page thumbnails covering the years then available (1857-1991), which were nicely predictable in form—for example, http://archives.chicagotribune.com/1964/08/01/page/1/xsmall.jpg.  The following video might not be possible to create today, but it was easy to pull together at the time.

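Because the Chicago Tribune thumbnail addresses were built from nothing but the date, generating them took only a line or two. The Python sketch below reproduces the pattern from the example URL above. The page and size parameters are guesses at which parts of the address could vary, and the URLs may well no longer resolve now that the archive is restricted.

```python
from datetime import date


def tribune_thumbnail_url(issue_date, page=1, size="xsmall"):
    """Build a thumbnail URL following the 2017 beta archive's pattern."""
    return (f"http://archives.chicagotribune.com/"
            f"{issue_date.year:04d}/{issue_date.month:02d}/"
            f"{issue_date.day:02d}/page/{page}/{size}.jpg")


print(tribune_thumbnail_url(date(1964, 8, 1)))
```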

There are plenty of other online newspaper archives where these came from.  Pick one and see what you can do.
