As an economist, I have seen my research enriched by the growing body of digitized text (historical newspapers, magazines, books, legal briefs, sermons, even personal diaries) that is now searchable. I have long felt that historical information is of great importance for economic research; we shouldn’t bless models of historical economic events without listening to what people were saying while those events were happening (which does not mean that we believe everything they say).
For decades, I conducted research using microfilm of the New York Times and the annual printed volumes of the New York Times Index at the Yale Sterling Memorial Library. But microfilm research was a slow and limited process. The digitization of newspapers has enabled search engines to make researching historical perspectives faster and more productive, with more sources. This newfound availability of digitized data has been growing for several decades and is now a vast resource, for academic scholars as well as for the general public.
Google Books Ngram Viewer adds another dimension, enabling us to learn about the sources themselves, including how new books have changed through time. It is free and available to everyone, even on a mobile phone, and the site has a blanket statement that “Ngram Viewer graphs and data may be freely used for any purpose” with attribution.
To linguists, an “ngram” is a sequence of n words that recurs in a corpus of words: “digitization” and “metadata” are both 1-grams, while “search engine” is a 2-gram (these are displayed in the accompanying graph). For every year going back centuries, Google Books Ngrams reports how often any chosen ngram appears in the books published that year in its digitized corpus, expressed as a percent of all ngrams in the database. The percents are very low: for “search engine” in the English corpus, the peak is 0.000250%, that is, 1 in 400,000. But that is not a small number if one considers the vast number of ngrams out there. Phrases that we have all heard, like “delicious coffee” or “king of spades”, have even lower ngram percentages.
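For readers who want to see the arithmetic, here is a minimal sketch in Python of how such a percent could be computed. It is an illustration under stated assumptions, not Google's actual method: the toy corpus and the function names `ngrams`, `percent_share`, and `one_in` are hypothetical, and the share is taken against all ngrams of the same length in the toy text.

```python
from collections import Counter


def ngrams(text, n):
    """Return every n-word sequence in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def percent_share(text, phrase):
    """Percent of all same-length ngrams in text that match phrase.

    Assumption for this sketch: the denominator is the number of ngrams
    with the same n as the phrase.
    """
    n = len(phrase.split())
    grams = ngrams(text, n)
    if not grams:
        return 0.0
    return 100.0 * Counter(grams)[phrase.lower()] / len(grams)


def one_in(percent):
    """Convert a percent share into the N of '1 in N'."""
    return round(100.0 / percent)


# Hypothetical toy corpus: 12 words, hence 11 two-word sequences.
toy_corpus = "the search engine found the page and the search engine ranked it"
share = percent_share(toy_corpus, "search engine")  # 2 of 11 two-word sequences
peak = one_in(0.000250)  # the peak percent quoted above for "search engine"

print(f"toy share: {share:.2f}%")
print(f"0.000250% is about 1 in {peak:,}")
```

In a real corpus of books the denominator runs into the billions, which is why even a familiar phrase ends up with a share that looks vanishingly small.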
The ways in which people use words, and the interpretation of those words, have changed over time. For example, prompted by suspicions about bubble-thinking about home prices in recent years, I did a Google Ngram search for “home prices.” I was surprised to see that in the 1800s home prices were talked about frequently, but this 2-gram often referred to prices of commodities in the home country, as opposed to foreign prices of the same commodities. People thought differently about prices of real estate, preferring to talk of “land prices” rather than “home prices.” But they were just as lured by speculative excitement as we are today.
Google Books Ngram Viewer also allows one to see a selection of the books that were used to come up with the ngram counts. This is very helpful. Searching selectively for the 2-gram “land prices” during the severe recession of 1837-1840 that followed a speculative boom in U.S. land prices revealed that people were just as speculative about land prices then as we are today when discussing stock prices, crypto prices, or housing prices. To understand what they were thinking, it helps to hear them talking. I found through that search an 1839 historical novel about land speculation by Hannah Bowen Allen, Farmer Housten and the Speculator: A New England Tale. A search of digital news sources revealed that the book is quite forgotten today. However, I enjoyed reading it, for it showed me that human speculative impulses at the time were in some ways from a different world and in other ways completely familiar.
This newfound digitization has profoundly changed my access to historical economic research, with the help of search engines, tools such as Google Ngrams, and student research assistants who searched and read digitized historical news sources. Having the ability to search extensively in news reports, letters to the editor, and self-help advice columns, and then actually being able to read the original text, is a bit like going back in a time machine with my student assistants. In effect, it is like asking people living at previous turning points in history what they were doing and why and, in the process, understanding their perspectives on others’ thinking at the time.
This research method will be even better in the future as digitization and metadata become more inclusive and as search engines and machine learning techniques get even better.