A weblog by Will Fitzgerald

Computational lexicography on the cheap using Twitter

Let’s say you want to investigate the use of “tweet” as a verb (see “Tweet this” at Language Log), and you want to collect, oh, 10,000 examples or so and do some concordance work, for example:

What iss the most popular question then? Tweet    the answer and hopefully u may only get asked 500 times?
what is there to                         tweet    about this morning?
What is your biggest food weakness?      Tweet    @Thintervention for motivation! #thinterventionG

This is simple to do with a bash command line, perl, Ruby, the Tweetstream gem, and a spreadsheet program (or just plain old grep).

To download 10,000 tweets containing “tweet,” “tweets”, or “tweeting” and save them in a file called “tweet.tweets”:

> @client = TweetStream::Client.new('user','pass')
> File.open("tweet.tweets", "w+") do |f|
        n = 0
        @client.track('tweet','tweets','tweeting') do |s|
            n+= 1
           @client.stop if n >= 10000
           f.puts "#{s.text}"

When these are finished downloading, you can tab separate the contexts using perl, and sort on the right context:

> cat tweet.tweets | perl -pe 's/\b(tweet|tweets|tweeting)\b/\t$1\t/gi' |sort -f -k2,3 -t\t    > tweets.txt

You can then import this file into your speadsheet program and slice and dice to your heart’s content.

Note: it took longer to write this blog post than it did to collect the data. Analysis to follow, though!


3 responses to “Computational lexicography on the cheap using Twitter

  1. Pingback: Tweet is a Radio Verb « Will.Whim

  2. Pingback: Text is a radio verb « Will.Whim

  3. Pingback: Computational Social Science on the cheap using Twitter « Will.Whim

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: