Some N-grams info
 

Guest
(@Anonymous)

The Google Ngram Viewer only includes n-grams that occur in at least 40 books. That means your search phrase might exist in the corpus but fall below that threshold, so it returns nothing. Other corpora may apply a similar 40-occurrence cutoff. It looks like a restriction meant to keep the dataset from growing too large.
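To make the threshold idea concrete, here is a rough sketch of how a "minimum number of books" cutoff could hide a phrase that really is in the corpus. This is only a toy illustration, not Google's actual pipeline: the function names, the whitespace tokenizer, and the made-up mini corpus are all my own assumptions.

```python
from collections import defaultdict

MIN_BOOKS = 40  # the cutoff discussed above


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def book_counts(books, n):
    """Count, for each n-gram, how many distinct books contain it."""
    counts = defaultdict(int)
    for text in books:
        # A set, so an n-gram repeated inside one book is counted once.
        for gram in set(ngrams(text.lower().split(), n)):
            counts[gram] += 1
    return dict(counts)


def apply_threshold(counts, min_books=MIN_BOOKS):
    """Keep only the n-grams found in at least min_books books."""
    return {gram: c for gram, c in counts.items() if c >= min_books}


# Hypothetical corpus: one phrase in 45 "books", another in only 3.
books = ["a soap opera"] * 45 + ["a rare phrase"] * 3
counts = book_counts(books, 2)
visible = apply_threshold(counts)

print(("soap", "opera") in visible)   # True
print(("rare", "phrase") in visible)  # False
```

The rare bigram is still present in `counts`; it only drops out after the threshold is applied, which matches the "exists but is not searchable" situation.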

BYU has an impressive number of resources that appear to augment, and in some respects surpass, Google's Ngram Viewer.
They detail the differences and strengths of most of their corpora in links on the corpus description page.
The soap opera corpus is unexpected and interesting.

http://corpus.byu.edu/
corpora, size, queries = better resources, more insight

http://corpus.byu.edu/coca/x.asp?r1=&w=600&h=1024
CORPUS OF CONTEMPORARY AMERICAN ENGLISH
The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English.
The corpus contains more than 520 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990-2015 and the corpus is also updated regularly (the most recent texts are from December 2015). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language.

http://corpus.byu.edu/coha/x.asp?r1=&w=600&h=1024
CORPUS OF HISTORICAL AMERICAN ENGLISH
The Corpus of Historical American English (COHA) is the largest structured corpus of historical English.
COHA allows you to quickly and easily search more than 400 million words of text of American English from 1810 to 2009. You can see how words, phrases and grammatical constructions have increased or decreased in frequency, how words have changed meaning over time, and how stylistic changes have taken place in the language. It's a lot more than just frequency charts for individual words and phrases (like with Google Books / Culturomics) -- although those types of searches can be done here as well, and yield essentially the same results as Google Books.

http://corpus.byu.edu/bnc/x.asp?r1=&w=600&h=1024
BYU-BNC: BRITISH NATIONAL CORPUS
This website allows you to quickly and easily search the 100 million word British National Corpus (1970s-1993). The BNC was originally created by Oxford University Press in the 1980s - early 1990s, and now exists in various versions on the web.

http://corpus.byu.edu/soap/
CORPUS OF AMERICAN SOAP OPERAS 100 MILLION WORDS, 1990-2012
(for very informal language)

