Twitter word clouds explained

Word Cloud for my Twitter Account (pbaumgartner)

In yesterday’s post, I experimented with R packages for generating Twitter word clouds. In this post, I will give some hints on how to proceed. I will also refer to my GitHub repository, where you can find the complete program code. I have added some examples there, generating Twitter clouds for all members of the IBM staff with a Twitter account, for the department, and for the university account.

Steps for generating twitter word clouds 

1. Generate Twitter API key

For authentication, you need a Twitter API key. To get one, you have to create an application in Twitter. Creating a Twitter application is free, and you don’t need to know all the details of programming against the Twitter API; that is handled by the R package twitteR.

There are several tutorials on how to get the Twitter API key: see, for instance, this YouTube video or read the article on R-bloggers.

2. Install R and copy the R word cloud program

If you haven’t installed R yet, read one of the many tutorials, for instance: How to install R and a Brief Introduction to R. I also recommend installing RStudio as THE interactive integrated development environment (IDE) for R. (You must install R first, and RStudio after that.) If you want to do more with R than just produce the word cloud, you should read the (in my opinion) best and still very gentle introductory book by Hadley Wickham: R for Data Science. It is freely available on the internet!

You have to fill in your authentication keys and the user account for the word cloud. For instance, the line with my account would be:

user = 'pbaumgartner'
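
To make the "fill in your keys" step more concrete, here is a minimal sketch of how the authentication and download could look with twitteR. It is not the code from my repository: the helper names and the environment variable names are assumptions chosen for illustration, and only setup_twitter_oauth() and userTimeline() come from the twitteR package.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables. The variable names are an assumption, not part of twitteR.
read_twitter_keys <- function() {
  keys <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                       "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))
  if (any(keys == "")) {
    stop("Please set all four Twitter credentials first.")
  }
  keys
}

# Hypothetical helper: authenticate and return the tweet texts of one account.
# Requires the twitteR package and network access when it is actually called.
fetch_tweet_texts <- function(user, n = 500) {
  keys <- read_twitter_keys()
  twitteR::setup_twitter_oauth(keys[[1]], keys[[2]], keys[[3]], keys[[4]])
  tweets <- twitteR::userTimeline(user, n = n)
  # every element is a status object with a getText() method
  vapply(tweets, function(tweet) tweet$getText(), character(1))
}

# Usage (not run here, because it needs valid keys):
# texts <- fetch_tweet_texts("pbaumgartner")
```

Keeping the keys in environment variables instead of in the script means you cannot publish them by accident when you share your code.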

3. Experiment with the different parameters

The last task before you run the program is to adapt the parameters for your word cloud.

Twitter Word clouds: Setting parameters

 # experiment with different settings of the parameters
 library(wordcloud)                # provides wordcloud()
 if (require(RColorBrewer)) {      # using color palette from RColorBrewer
     pal <- brewer.pal(9, "Blues") # sequential color palette
     pal <- pal[-(1:4)]            # drop the lightest shades for a one-color (shaded) appearance
     wordcloud(                    # call the essential function
        words,                     # words used by this account
        freqs,                     # frequency of every word in this account
        scale = c(4.5, .3),        # size range of the words (largest, smallest)
        min.freq = 6,              # words below this frequency are dropped;
                                   # use a high value (5+) for fewer words
        max.words = 200,           # use fewer (100) if the account is new
                                   # (< 500 tweets)
        random.order = FALSE,      # most important words in the center
        random.color = FALSE,      # color shades provided by RColorBrewer;
                                   # remove RColorBrewer and set to TRUE for random colors
        rot.per = .15,             # proportion of words rotated by 90 degrees
        colors = pal)              # use shaded color palette from RColorBrewer
 }

As you can see, this is a little bit complex, as there are many different parameters. The best and fastest way to experiment is to duplicate the program snippet above and run it as a separate program. For this, it is essential that the hard work (text mining and transforming the data from the Twitter account) is already done and that all the variables are still in R’s memory.
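
If you want to play with the parameters without running the whole program, you need the two vectors words and freqs. The following is a rough base-R sketch of that "hard work", not the text-mining code from my repository: the function name, the tiny stop word list, and the sample tweets are made up for illustration.

```r
# Hypothetical helper: turn raw tweet texts into the `words` and `freqs`
# vectors that wordcloud() expects.
word_frequencies <- function(texts, stopwords = c("the", "and", "for", "rt")) {
  txt <- tolower(texts)
  txt <- gsub("https?://[^[:space:]]+", " ", txt)  # drop URLs
  txt <- gsub("@[[:alnum:]_]+", " ", txt)          # drop @mentions
  txt <- gsub("[^[:alpha:][:space:]]", " ", txt)   # drop digits and punctuation
  tokens <- unlist(strsplit(txt, "[[:space:]]+"))
  # keep words with at least three letters that are not stop words
  tokens <- tokens[nchar(tokens) > 2 & !(tokens %in% stopwords)]
  counts <- sort(table(tokens), decreasing = TRUE)
  list(words = names(counts), freqs = as.integer(counts))
}

# Made-up sample tweets, only to show the shape of the result
sample_tweets <- c("Word clouds are fun http://t.co/abc",
                   "@someone word clouds with R",
                   "RT clouds!")
result <- word_frequencies(sample_tweets)
words <- result$words
freqs <- result$freqs
```

With words and freqs in memory, you can run the wordcloud() call again and again and change only one parameter at a time.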

Examples of Twitter word clouds

You can see a big difference compared with the clouds I published yesterday. This time I have adjusted the parameters so that all word clouds have a similar size and show more or less the same amount of information. You can see the parameters I have used on this page here.

The Twitter account of Wolfgang Rauter is very new, so it does not have many tweets yet (21). Therefore I had to tweak the parameters. Instead of a minimum frequency of 5 (yesterday), I had to use 1, and I limited the cloud to 100 words (yesterday: 200).
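
This kind of tweaking can be written down as a small rule. The helper below is only a sketch and is not part of my program: the name cloud_params is hypothetical, the 500-tweet threshold comes from the comment in the parameter snippet, and the values for small accounts are the ones I used for @wolfgangrauter.

```r
# Hypothetical helper: choose min.freq and max.words from the number of tweets.
cloud_params <- function(n_tweets) {
  if (n_tweets < 500) {
    # new account: show every word, but fewer words overall
    list(min.freq = 1, max.words = 100)
  } else {
    # established account: the values from the parameter snippet
    list(min.freq = 6, max.words = 200)
  }
}

p <- cloud_params(21)   # @wolfgangrauter had 21 tweets
# Usage (needs the wordcloud package plus words and freqs in memory):
# wordcloud(words, freqs, min.freq = p$min.freq, max.words = p$max.words)
```

Treat the threshold as a starting point; every account still needs a look at the resulting picture.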

Adjusted Word Cloud for @wolfgangrauter

Wordcloud @wolfgangrauter

Wordcloud not adjusted @wolfgangrauter

Another interesting tweaking example is the timeline of @donau_uni. In yesterday’s cloud, the word 'presseaussendung' (German for 'press release') is very dominant (frequency = 240) and spoils the appearance of the word cloud. I could have deleted this word from the list; instead, I used a larger scale for the cloud, with the effect that the huge word 'presseaussendung' could no longer be displayed within the predefined limits.
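
For completeness, this is how the first option, deleting the word from the list, might look. It is a sketch under assumptions: drop_words is a hypothetical helper, and apart from the frequency of 240 the sample words and numbers are invented.

```r
# Hypothetical helper: remove unwanted words and their frequencies together,
# so that both vectors stay aligned.
drop_words <- function(words, freqs, unwanted) {
  keep <- !(words %in% unwanted)
  list(words = words[keep], freqs = freqs[keep])
}

# Invented sample data; only the 240 is taken from the @donau_uni timeline
all_words <- c("presseaussendung", "weiterbildung", "krems")
all_freqs <- c(240, 30, 25)

cleaned <- drop_words(all_words, all_freqs, "presseaussendung")
# cleaned$words and cleaned$freqs can now be passed to wordcloud()
```

The advantage over the scale trick is that you decide explicitly which word disappears, instead of relying on it not fitting into the plot.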

Word Cloud adjusted: @donau_uni

Wordcloud not adjusted @donau_uni








