Perhaps you have heard about R and its wide-ranging capabilities, but have not
yet used it for market basket analysis. First, let me explain market basket
analysis. MBA, as it is frequently abbreviated, describes the combinations of
products that frequently co-occur in transactions. For instance, people who
buy bread and eggs also tend to buy butter, as many of them are planning to
make an omelet.
The marketing team should target customers who buy bread and eggs with offers
on butter, to encourage them to spend more on their shopping basket.
Market basket analysis, or association rule mining as it is also known, is a
well-documented area of data mining. In R, the analysis is done with the
arules package, and the visualization with the arulesViz package. Most blogs
I have seen use the Groceries data set that ships with the arules package.
However, in this tutorial I intend to explain how to use a different data set
(a CSV file) to fulfill your analytics needs.
Basic definitions in association rules
Items (Products)
An item is a single product in the basket. In the data file, each line is
called a transaction and each column in a row represents an item. The set of
all items is usually written as
I = {i1, i2, …, in}
Support
The support of a product or set of products is the fraction
of transactions in our data set that contain that product or set of products.
Itemsets with high support deserve more attention than those with low support,
because they occur in more transactions.
Confidence
Confidence is the conditional probability that a customer who buys
item A will also buy item B.
Confidence(A => B) = Support(A ∪ B) / Support(A)
Lift
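Lift is the ratio of how often products A and B are bought together to how
often they would be bought together if the two purchases were independent. It
tells us how much more likely a customer is to buy product B once they buy
product A. We look for rules with a lift greater than one.
Lift(im => in) = Support(im ∪ in) / (Support(im) × Support(in))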
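To make the three measures concrete, here is a minimal sketch in base R that computes them by hand for the rule {bread, eggs} => {butter}. The five baskets and the helper function support() are made up for illustration; they are not part of the arules package or of the data set used in this tutorial.

```r
# Five hypothetical baskets (made-up data, not the tutorial's CSV file)
baskets <- list(
  c("bread", "eggs", "butter"),
  c("bread", "eggs"),
  c("bread", "butter"),
  c("eggs", "butter", "bread"),
  c("milk")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("bread", "eggs")  # left-hand side of the rule
rhs <- "butter"            # right-hand side of the rule

supp_rule <- support(c(lhs, rhs), baskets)      # Support(lhs U rhs)
conf_rule <- supp_rule / support(lhs, baskets)  # Support(lhs U rhs) / Support(lhs)
lift_rule <- supp_rule / (support(lhs, baskets) * support(rhs, baskets))

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

For these baskets the rule has a support of 0.4, a confidence of about 0.67 and a lift of about 1.11, so buying bread and eggs makes butter slightly more likely than it is overall.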
Basket Analysis with R
In my previous post, Customer Data Analysis in Kenya, I illustrated the importance of customer data analysis to supermarkets and retailers. In this post I intend to show how basket analysis can be done in R. For this analysis you can download the dataset here. The data set is in CSV format and must be
imported into R as an object of the transactions class. Use the code below for the import. First, remember to install the latest version of R for this analysis to avoid errors.
rm(list = ls()) # clear the workspace
install.packages("arules") # install the arules package (needed only once)
install.packages("arulesViz") # install the arulesViz package (needed only once)
library(arules) # load arules
library(arulesViz) # load arulesViz
# Import the CSV file into R as an object of class transactions.
# rm.duplicates = TRUE removes duplicate items within each transaction.
basket <- read.transactions("D:directory/data.csv", format = "basket", sep = ",", rm.duplicates = TRUE)
After importing the data set into R, check its structure
for comparison with the original file. The data is in CSV format: each row represents a transaction, and each field in a row is an item purchased.
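To make the basket format concrete, here is a small sketch that writes a three-line file to a temporary location and reads it back. The file contents and the object name toy are assumptions of mine, used only so the example runs without the tutorial's data set.

```r
library(arules)

# Hypothetical three-line basket file: one transaction per line,
# items separated by commas (made-up data, not the tutorial's file).
tmp <- tempfile(fileext = ".csv")
writeLines(c("bread,eggs,butter",
             "bread,eggs",
             "milk,bread,milk"),   # "milk" is repeated on purpose
           tmp)

# rm.duplicates = TRUE drops the repeated "milk" inside the third basket
toy <- read.transactions(tmp, format = "basket", sep = ",",
                         rm.duplicates = TRUE)

length(toy)      # number of transactions
size(toy)        # number of items in each transaction
itemLabels(toy)  # distinct item names found in the file
```

The same checks work on the imported basket object.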
size(basket) # size() returns the number of items in each imported transaction
It should give the following output. This means that every
product has been assigned a product ID
LIST(basket[1:3]) # to confirm the import, use the LIST function to display the first three transactions
# find frequent itemsets (support of at least 0.07, at most 15 items each)
frequentItems <- eclat(basket, parameter = list(supp = 0.07, maxlen = 15))
# plot the 20 most frequent items
itemFrequencyPlot(basket, topN = 20, type = "absolute", col = "green")
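The numbers behind such a plot can also be read directly with itemFrequency(). The sketch below assumes a tiny hand-made transaction set (toy) rather than the tutorial's CSV file, so the item names and frequencies are illustrative only.

```r
library(arules)

# Four hypothetical baskets, coerced to the transactions class
toy <- as(list(
  c("bread", "eggs", "butter"),
  c("bread", "eggs"),
  c("bread", "butter"),
  c("milk")
), "transactions")

# Relative frequency (support) of each single item, most frequent first
freq <- sort(itemFrequency(toy, type = "relative"), decreasing = TRUE)
print(freq)
```

Using type = "absolute" instead returns the raw counts that the bar plot above displays.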
Let's apply the apriori function from the arules package. A low support threshold combined with a high confidence threshold helps to extract strong relationships even among items that co-occur relatively rarely in the data.
rules <- apriori(basket, parameter = list(supp = 0.01, conf = 0.5, maxlen = 1000))
# show the support, confidence and lift of the first rules
options(digits = 5) # print at most 5 significant digits
inspect(rules[1:8]) # show the first 8 rules
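Since the rules worth looking at are those with a lift above one, it can help to filter them before inspecting. The following is a rough sketch on five made-up baskets; the object names toy, rules_toy and strong are my own, and the thresholds are chosen only so that the tiny example yields some rules.

```r
library(arules)

# Five hypothetical baskets (made-up data, not the tutorial's CSV file)
toy <- as(list(
  c("bread", "eggs", "butter"),
  c("bread", "eggs"),
  c("bread", "butter"),
  c("eggs", "butter", "bread"),
  c("milk")
), "transactions")

# Mine rules with at least two items, silencing the progress output
rules_toy <- apriori(toy,
                     parameter = list(supp = 0.3, conf = 0.5, minlen = 2),
                     control = list(verbose = FALSE))

# Keep only rules whose lift exceeds one, strongest first
strong <- subset(rules_toy, subset = lift > 1)
strong <- sort(strong, by = "lift", decreasing = TRUE)
inspect(strong)
```

The same subset() call can be applied to the rules object created above.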
The code below sorts the rules by confidence, from the highest to the lowest, because decreasing is set to TRUE.
rules1 <- sort(rules, by = "confidence", decreasing = TRUE) # high-confidence rules first
To Remove Redundant Rules
The code below removes the redundant rules and keeps only the non-redundant ones.
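# indices of rules that contain another rule as a subset (redundant rules)
redundant <- which(colSums(is.subset(rules1, rules1)) > 1)
# remove redundant rules; keep everything when none are found
rulesnow <- if (length(redundant) > 0) rules1[-redundant] else rules1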
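Recent versions of arules also ship an is.redundant() function which, as far as I know, flags a rule when a more general rule with the same or higher confidence exists. That criterion is not identical to the subset count used above, so the two approaches may keep different rules. A minimal sketch on made-up baskets (toy, rules_toy and pruned are names I chose for illustration):

```r
library(arules)

# Five hypothetical baskets (made-up data, not the tutorial's CSV file)
toy <- as(list(
  c("bread", "eggs", "butter"),
  c("bread", "eggs"),
  c("bread", "butter"),
  c("eggs", "butter", "bread"),
  c("milk")
), "transactions")

rules_toy <- apriori(toy,
                     parameter = list(supp = 0.3, conf = 0.5, minlen = 2),
                     control = list(verbose = FALSE))
rules_sorted <- sort(rules_toy, by = "confidence", decreasing = TRUE)

# is.redundant() returns one TRUE/FALSE per rule; keep the FALSE ones
keep <- !is.redundant(rules_sorted)
pruned <- rules_sorted[keep]

length(rules_sorted)  # number of rules before pruning
length(pruned)        # number of rules after pruning
```

Because the result is a logical vector rather than a set of indices, this form also behaves sensibly when no rule is redundant.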
To find out what customers had purchased before buying 'Ketchup'
# get rules that lead to buying 'Ketchup'
Ketchupprior <- apriori(data = basket,
  parameter = list(supp = 0.01, conf = 0.08),
  appearance = list(default = "lhs", rhs = "Ketchup"),
  control = list(verbose = FALSE))
Ketchuppriorrules <- sort(Ketchupprior, decreasing = TRUE, by = "confidence")
inspect(Ketchuppriorrules[1:5])
# Interactive plot: feel free to expand and move around the objects in this plot
plot(Ketchupprior[1:25], method = "graph", interactive = TRUE, shading = "confidence", main = "Products before Ketchup")
# Scatter plot of support against lift, shaded by confidence
plot(Ketchupprior, measure = c("support", "lift"), shading = "confidence", main = "Products before Ketchup")

Excelent article. Could you please send me the data to jsalinas@lamolina.edu.pe ? Thanks