Google Books Ngram Viewer
Google Books engineering manager Jon Orwant writes of the tool (http://googleblog.blogspot.com/2010/12/find-out-whats-in-word-or-five-with.html): "... we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time." The Viewer is a visualization tool that draws on a sample of over 5.2 million books (American English, British English, Chinese (simplified), English, English fiction, French, German, Hebrew, Russian, and Spanish) from the Google Books corpus of over 15 million books. It lets a user find and graph the appearance of words and phrases over time, suggesting the ebb and flow of ideas, changes in style and usage, and historical change. It does this by counting the occurrences of ngrams (words or phrases) in books, not by counting the books that contain a given ngram. Because it can display multiple data series on the same graph, it reveals, or at least suggests, correlations.
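To make the difference between counting occurrences and counting books concrete, here is a minimal Python sketch. It is not Google's implementation: the function names, the simple tokenizer (lowercase, split on whitespace), and the sample texts are all hypothetical. It treats an ngram as a sequence of words and returns both totals, so you can see that one book mentioning a phrase twice adds two occurrences but only one book.

```python
def word_ngrams(text, n):
    """Return every contiguous sequence of n words in text.

    Hypothetical tokenizer: lowercase the text and split on whitespace.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngram(books, ngram):
    """Return (occurrences, books_containing) for ngram across a list of book texts."""
    target = " ".join(ngram.lower().split())  # normalize spacing and case
    n = len(target.split())
    occurrences = 0        # every appearance of the ngram (what the Viewer counts)
    books_containing = 0   # books with at least one appearance (not what it counts)
    for text in books:
        hits = word_ngrams(text, n).count(target)
        occurrences += hits
        if hits:
            books_containing += 1
    return occurrences, books_containing


# Hypothetical three-"book" sample.
books = [
    "The war began and the war ended",
    "Peace at last",
    "After the war",
]
print(count_ngram(books, "the war"))  # (3, 2)
```

Here "the war" occurs three times but appears in only two books; the Viewer's graphs are built from the first kind of number.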
What is an Ngram? In general, an ngram is a contiguous sequence of n items, such as letters or words; in the Viewer the items are words, so a 1-gram is a single word and phrases of up to five words can be searched. The Ngram dataset comprises over 500 billion words. The chronological tables that appear below a graph do not lead directly to the dataset; instead, they represent a search across the entire Google Books corpus. There is no link to the books in the dataset. Even so, the tables offer excellent access to an almost unimaginable wealth of primary source material.
Use the smoothing option in the graphing controls to set the granularity of a search: low smoothing emphasizes individual years, while high smoothing emphasizes long-term trends. Changing this setting can dramatically change how a graph looks. Google explains smoothing as follows: "Often trends become more apparent when data is viewed as a moving average. A smoothing of 1 means that the data shown for 1950 will be an average of the raw count for 1950 plus 1 value on either side: ("count for 1949" + "count for 1950" + "count for 1951"), divided by 3. So a smoothing of 10 means that 21 values will be averaged: 10 on either side, plus the target value in the center of them."
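The moving average described in the quotation can be sketched in Python. This is an illustration of a centered moving average, not Google's code: the function name `smooth` and the sample counts are hypothetical, and because the quotation does not say how the first and last years of a range are handled, the sketch assumes it averages only the values that actually fall inside the window there.

```python
def smooth(counts, smoothing):
    """Centered moving average over yearly values.

    counts: yearly values in year order.
    smoothing: how many values on either side to include (0 = raw data),
    so each interior point averages 2 * smoothing + 1 values.
    Assumption: near either end, average whatever values lie in the window.
    """
    if smoothing < 0:
        raise ValueError("smoothing must be non-negative")
    n = len(counts)
    result = []
    for i in range(n):
        lo = max(0, i - smoothing)          # first index in the window
        hi = min(n, i + smoothing + 1)      # one past the last index
        window = counts[lo:hi]
        result.append(sum(window) / len(window))
    return result


raw = [10, 40, 70]  # hypothetical counts for 1949, 1950, 1951
print(smooth(raw, 1))  # [25.0, 40.0, 55.0]
```

The middle value, 40.0, is exactly ("count for 1949" + "count for 1950" + "count for 1951") / 3 from the quotation; with a smoothing of 10, an interior year averages 21 values. Larger settings flatten single-year spikes, which is why they bring long trends into focus.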
For background on the Viewer and the searches it makes possible, visit http://www.culturomics.org/. Here are links to pivotal articles. You will find the experiments of many Ngram creators by simply doing a Google search; also try the Twitter (http://twitter.com/) hashtags #ngram and #culturomics. And see Anthony Grafton's piece in the newsletter of the American Historical Association; Nature News' "Culturomics: Word Play" (http://www.nature.com/news/2011/110617/full/474436a.html); and the Harvard University Press Blog's "Culturomics, Close Reading, and Casaubon" (http://harvardpress.typepad.com/hup_publicity/2011/06/culturomics-close-reading-and-casaubon.html). For a reflection on why culturomics is not the same as doing history, see "Analyzing Culture with Google Books: Is It Social Science?" (http://www.miller-mccune.com/media/culturomics-an-idea-whose-time-has-come-34742/).
The Viewer covers books in American English, British English, Chinese (simplified), English, English fiction, French, German, Hebrew, Russian, and Spanish, published 1500-2008. All datasets may be downloaded from http://ngrams.googlelabs.com/datasets.
Other corpora:
- American National Corpus
- Corpus of Contemporary American English
- Corpus of Historical American English
- British National Corpus
- TIME Magazine Corpus