Twitter Data-Mining

The Library of Congress is archiving the entire content of Twitter. Grant explains why that’s a gold mine for language researchers like David Bamman at Tufts University. You can see some of the results Bamman’s compiled at Lexicalist.com. This is part of a complete episode.

Transcript of “Twitter Data-Mining”

You’re listening to A Way with Words. I’m Martha Barnette.

And I’m Grant Barrett.

Martha, I saw that you linked to something on our Facebook page the other day.

You did?

Yeah, it was that story by Randall Stross in the New York Times about Twitter.

And he’s talking about Twitter, this short messaging service, in terms of history and how things are being recorded there that we’ll want to know later.

Oh, right, right. Data mining.

Right, data mining. Well, the Library of Congress will keep a big file of it.

I’m sure that archive.org will keep a file of it.

Other people will keep it and mine it and do stuff with it.

And what’s interesting to me is I think it’s nice to see somebody in a big place like the New York Times get it.

Because as far as you and I are concerned, Twitter is an incredibly useful tool.

It’s not just about communicating with friends, right?

It’s not just about tuna sandwiches for lunch.

No, no, no.

I mean, obviously, it depends on who you’re following.

But if you’re following your friends, you’re getting personal messages.

If you’re following famous people, you get a little bit of marketing stuff.

But also, they’ll tell you some of their inside life.

But the thing is, there’s information stored in these apparently meaningless messages, right?

In bulk, Twitter is incredibly useful.

It’s a place where you can observe the brightest minds of our day having conversations with each other, right?

Yeah, 140-character conversations, right?

You know, and often they’re linking to their own blog or to news stories that they’ve written or places that they’ve appeared in radio and television where they’ll explain their 140-character ideas at length, right?

Let’s just turn it around.

Imagine that Isaac Newton was alive today, right?

And you could observe him working out the principles of physics in real time.

That’s kind of what we’re seeing here.

Oh, wow.

Tweeting about all that.

Yeah, imagine that.

That’s really cool.

And it doesn’t diminish the discoveries that he made, right?

Right.

You’re watching him in real time, though.

You’re watching him figure it out, make mistakes.

It’s everybody.

We’re watching media change before our eyes.

People like Jay Rosen at NYU, he’s constantly tweeting about new forms of media, right?

And I think that Randall Stross in the New York Times kind of got it in terms of linguistics as well.

There’s a fellow by the name of David Bamman at Tufts University who’s created a new website called Lexicalist.

L-E-X-I-C-A-L-I-S-T dot com.

Lexicalist dot com.

And what he’s done is use the information stored in Twitter, such as location (like when, you know, Martha’s in San Diego, so everything she sends out is kind of located in San Diego), to show that there are regional differences in things like the pronunciation or the spelling of the word bro, short for brother.

In parts of the country, they’re more likely to say bra.

And he can show on a map that Twitter has this evidence.

He can do linguistic research.

Oh, wow.

Based on the body of Twitter.

Based on the Twitter.

That kind of conversation is easy to pass off as irrelevant.

But in the real data, there’s evidence that people talk a different way in other parts of the country.

They use different words, different pronunciations.

They say different things about different subjects.

And it’s a great example of why I would never want this to go away.

How interesting.

So instead of linguists walking around with tape recorders and microphones or digital recorders.

There’ll still always be room for that.

Always a need for that.

Yeah, but they actually have this whole other thing to analyze.

That’s right.

That’s what you’re saying.

That’s right, a new database that’s kind of made on the fly by people who are unselfconsciously recording in text form their character, their dialect, their idiolect, their language.

They’re showing language change.

And in 10 or 20 years, when we look at this over time, we’ll find out even more about how language changes.

That’s very cool.

Well, if you want to let us know what you’re thinking, you can always communicate with us the old-fashioned way, 1-877-929-9673.

Or send an email to words@waywordradio.org.
