Grant is pleased as punch about BYU Professor Mark Davies’ new Google Books Corpus, which contains entries for every word in the entire Google Books database. In addition to parts of speech and definitions, the site provides contextual examples for each word. For example, the database has revealed that the word suitcase is often preceded by the adjective battered. Writers, teachers, English learners and language enthusiasts will love prospecting in this lexical goldmine. This is part of a complete episode.
Transcript of “Google Books Corpus”
You’re listening to A Way with Words. I’m Martha Barnette.
And I’m Grant Barrett, and I’m a very happy man.
Yeah! Why is that?
Well, it’s something called the Google Books Corpus.
Oh, that, yeah.
Mark Davies, who is a researcher at Brigham Young University,
Has a site where he’s made a lot of corpora.
Now, these are big collections of text that you can use to discover new things about words.
It’s what we call in the dictionary trade the pragmatics, how words are really used.
It’s like what words they keep company with.
Right.
So corpora from the Latin for body.
Exactly right.
Big body of…
One of them is a corpus and more than one are corpora.
And so what he’s done is taken Google Books, which is this massive online archive of text,
And he’s marked every word in it that he’s been given with a part of speech.
So the nouns are marked and the verbs are marked and so forth.
And then he’s put them into his program and his database, and we can search that in clever ways.
For example, if you type in the word suitcase, you’ll see that suitcase is often prefixed by the word battered.
Oh, really?
Battered suitcase is like a thing.
It’s actually so much of a thing that it’s kind of a cliche.
So maybe if you’re a writer and you’re using this corpus, you might want to avoid it.
On the other hand, if you’re a dictionary editor and you want to provide a good illustrative sentence for your readers, right,
If you want to show how the word is really used, you might say,
He picked up his battered suitcase and went to the bus station, right?
Yeah.
So that’s just one of a thousand examples of the kind of things you can find there.
But for the average person who’s not a dictionary editor, I think the best use of these corpora are kind of doing dictionary work yourself.
Look, let’s put it this way.
Who arranges your travel now?
You do, right?
Yeah, right, right.
20 or 30 years ago, you called an agent, right?
Right, right, right.
When you want to do research on the best computer to buy, what?
Do you go into a store now?
Most of us don’t.
We do it online, right?
Oh, yeah.
We do our own digging.
Yeah, buying a car.
So many different things that we do now.
We go online.
We figure it out for ourselves because the data are out there.
And it’s the same with language.
If you go to Mark Davies’ site, just look for Mark Davies, BYU.
It’ll be the first thing you find.
You can do the kind of dictionary digging that I do for a living and get paid for it.
You can do it for yourself for free, right?
So it’s not a dictionary itself.
It’s all the raw material that somebody would use to make a dictionary.
It’s kind of a, frankly, for our audience, it’s a black hole because I just know that they’re going to go there and go,
Oh, just one more search.
Oh, just one more search.
Because you’ll type in words.
Here’s a favorite thing to do.
Go to Mark Davies’ site and type in the word instantly, and then type in the word instantaneously,
And you will see immediately that although they seem to be synonyms, they have differences.
For example, instantaneously is often used after the word it modifies,
Where instantly is almost always used before the word it modifies.
Hours of fun for the whole family.
Well, no, but if you want to be a precise writer, and if you want to get, particularly if you’re not a native speaker of English, you can get closer to the native speaker intuition by getting in there and kind of just thinking about every word that you use.
The best writers think about every single word they put down in this way, and this is a tool to help you get there.
I mean, I don’t know that you should be looking up “and” and “the” and so forth, but for the most part, if you’re trying to decide between two words, this is a tool that can help you.
Interesting.
What’s the site again?
Just look for Mark Davies, that’s D-A-V-I-E-S, and B-Y-U in Google or Yahoo or wherever,
And it’ll be the first thing that comes up, I promise.
Cool.
It’s tremendous. It’s free.
You may have to register after a certain number of searches, but it’s still free.
And just think about using it to inform your own writing and your own speech in a way
Where you learn new things that you didn’t know about language.
So, Grant, are you saying that going to someplace like dictionary.com isn’t enough anymore?
I think in most cases going to an online dictionary is fine and will get you there.
But I also know from the email and the phone calls that we get, we have a lot of sophisticated language users who need a little more.
They need more example sentences, for example.
They need to find words that the dictionaries haven’t recorded yet, which is actually quite a large number.
Yeah, a lot of people don’t think of dictionaries as a work in progress.
Yeah, actually, dictionaries are usually far behind the language.
They’re slow to update, and they’re rather conservative in their inclusion policy.
So I think for a lot of our more advanced users, and frankly, it might be most of our listeners, there’s some use at Mark Davies’ site for them.
At the very least, it’s no different than just looking for a random page on Wikipedia and just seeing what you can see.
Type in your own name or the name of a town or your favorite verb and just see what comes up, and you might learn something new.
Interesting. Do-it-yourself.
Do-it-yourself lexicography.
Well, if you want to talk about dictionaries, grammar, language, slang, call us at 877-929-9673
Or send those emails to words@waywordradio.org.

