The data set is just tweets about the teachers' strike. The bar graph offers more insight: users mention different words in their tweets, and I have plotted the frequency of those words. Also check out the word cloud below.
For more of this kind of analysis, follow me on Twitter.
Twitter text mining is a very well documented area of data mining, so I will explain only a little of it here to keep this article self-contained. For the text mining I will be using the twitteR and tm packages in R. The main function is searchTwitter() because of its ability to mine with specific details such as location, dates, language and more.
# Load packages
library(tm)
library(twitteR)
library(wordcloud)
# Authorize the Twitter API with the keys from your Twitter app
access_token <- "refer from the twitter app"
access_token_secret <- "refer from the twitter app"
apiKey <- "refer from the twitter app"
apiSecret <- "refer from the twitter app"
setup_twitter_oauth(apiKey, apiSecret, access_token, access_token_secret)
# Scrape Twitter using the searchTwitter function
# This picks tweets containing "Brexit"
# lang = NULL accommodates other languages
# since = "2016-06-23" restricts the search to tweets from that date onwards
# geocode = NULL makes the search worldwide
tweets <- searchTwitter("Brexit", n = 1000, lang = NULL,
                        since = "2016-06-23", until = NULL, locale = NULL,
                        geocode = NULL, sinceID = NULL, maxID = NULL)
tweets_text <- sapply(tweets, function(x) x$getText())
Now we have the tweets, but they contain things we do not need: emojis, stop words such as "the" and "why", and more. So we now proceed to remove these unwanted elements from our data.
# remove emojis by converting to ASCII; sub = "" drops unconvertible characters
tweets1 <- iconv(tweets_text, 'UTF-8', 'ASCII', sub = "")
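As an aside, the same cleaning can also be done explicitly on the corpus with tm_map before building the term-document matrix; a minimal sketch, assuming the tm package is loaded:

```r
# explicit cleaning with tm_map (an alternative to the control list used below)
corpus <- Corpus(VectorSource(tweets1))
corpus <- tm_map(corpus, content_transformer(tolower))  # lowercase first so stop words match
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
```

Doing the cleaning up front like this keeps the TermDocumentMatrix call simple.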
Build a corpus and save the tweets as a data frame.
tweets_corpus <- Corpus(VectorSource(tweets1))
tdm <- TermDocumentMatrix(
  tweets_corpus,
  control = list(
    removePunctuation = TRUE,
    # lowercase stop words so they match after tolower is applied
    stopwords = c("england", "wales", "scotland", stopwords("english")),
    removeNumbers = TRUE,
    tolower = TRUE
  )
)
m <- as.matrix(tdm)
# get word counts in decreasing order
word_freqs <- sort(rowSums(m), decreasing = TRUE)
# create a data frame with words and their frequencies
dm <- data.frame(word = names(word_freqs), freq = word_freqs)
wordcloud(dm$word, dm$freq, random.order = FALSE, colors = brewer.pal(8, "Dark2"))
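The bar graph of word frequencies mentioned at the top can be produced from the same data frame; a minimal sketch using base R's barplot (the colour and title are my own choices):

```r
# plot the ten most frequent words as a bar graph
top_words <- head(dm, 10)
barplot(top_words$freq, names.arg = top_words$word,
        las = 2, col = "steelblue",
        main = "Most frequent words in the tweets")
```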
Returning the number of tweets scraped:
length(tweets)
