Saturday, 20 June 2015

Data Pre-Processing before predictive modeling with R


Every data analyst needs to pre-process the data before commencing any analysis task. On my desktop there is an Excel sheet named modeling; I chose the name because of my intended purpose for this article. I am going to use the data set to do some modeling, but before that I have to prepare my data for the main task. The preparation is simple, but it is very important for any serious data analyst who wants to gain insight into his or her data.

Transformations for a Single Predictor


Centering and scaling the predictor variables


To center a predictor variable, the average predictor value is subtracted from all the values. As a result of centering, the predictor has a zero mean.



To scale the data, each value of the predictor variable is divided by its standard deviation. Scaling the data coerces the values to have a common standard deviation of one.



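These manipulations are generally used to improve the numerical stability of some calculations. Some techniques, such as PCA or models based on distances between samples, also need the predictors on a common scale so that no single predictor dominates just because of its units.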
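To make the two steps concrete, here is a minimal sketch in base R on a small made-up predictor (the vector `x` is illustrative only, not taken from the modeling sheet). It centers and scales by hand and checks the result against base R's `scale()`.

```r
# Illustrative predictor values (made up for this example)
x <- c(2, 4, 6, 8, 10)

# Centering: subtract the average value from every value
centered <- x - mean(x)

# Scaling: divide each centered value by the standard deviation
standardized <- centered / sd(x)

# Base R does both steps in one call; scale() returns a matrix, so flatten it
standardizedBase <- as.vector(scale(x, center = TRUE, scale = TRUE))

print(standardized)
print(all.equal(standardized, standardizedBase))
print(c(mean = mean(standardized), sd = sd(standardized)))
```

With caret, `preProcess(data, method = c("center", "scale"))` followed by `predict()` applies the same idea and reuses the training-set means and standard deviations when transforming new data.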



Transformations to Resolve Skewness




Another form of pre-processing is a transformation to remove skewness, that is, lack of symmetry. A right-skewed distribution has its hump on the left and a long tail to the right, and vice versa for a left-skewed one. A common rule of thumb for deciding whether a data set is skewed is to calculate the ratio of the highest value to the lowest value (this only makes sense for strictly positive data); if the ratio is greater than 20, there is significant skewness.



Replacing the data with its square root, log, or inverse may help remove the skewness.
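A minimal sketch of both ideas, using a made-up, strictly positive predictor `x`: it applies the ratio rule of thumb and then compares the skewness of the raw values with their square root, log, and inverse. The helper `sampleSkewness()` is hypothetical, a simple moment-based estimate; the `e1071` package loaded later in this post provides `skewness()` with similar estimators.

```r
# Hypothetical right-skewed, strictly positive predictor (made up for illustration)
x <- c(1, 2, 2, 3, 3, 3, 4, 5, 8, 30)

# Rule of thumb: a max/min ratio above 20 signals significant skewness
ratio <- max(x) / min(x)

# Hypothetical helper: moment-based sample skewness (third central moment
# divided by the second central moment raised to the power 1.5)
sampleSkewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

skewRaw  <- sampleSkewness(x)
skewSqrt <- sampleSkewness(sqrt(x))
skewLog  <- sampleSkewness(log(x))
skewInv  <- sampleSkewness(1 / x)

print(ratio)
print(round(c(raw = skewRaw, sqrt = skewSqrt, log = skewLog, inverse = skewInv), 3))
```

In this made-up example the log transform moves the skewness closest to zero, but which transformation works best depends on the data, so it is worth comparing them this way.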



Transformations for Multiple Predictors



Transformations to Resolve Outliers



Most people have an intuitive sense of what an outlier is, but before treating a value in a data set as one, remember the following:

  • With small sample sizes, apparent outliers might be the result of a skewed distribution for which there are not yet enough data to see the skewness.
  • The outlying values may indicate a special part of the population under study that was just starting to be sampled for the survey.



There is a lot more to this that I cannot spell out here, or else this would be a book rather than a blog.

Other pre-processing steps include:

  • Principal Component Analysis
  • Dealing with Missing Values
  • Removing Predictors
  • Adding Predictors
  • Binning Predictors
  • Spatial Sign
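Of the steps listed, the spatial sign is the multi-predictor transformation usually suggested for resolving outliers: after centering and scaling, each sample (row) is divided by its own Euclidean length, so every sample ends up the same distance from the center. The sketch below is a hand-rolled version in base R; the data `X` and the function name `spatialSignSketch()` are hypothetical. caret also provides `spatialSign()` for the same purpose.

```r
# Hypothetical two-predictor data; the last sample is an outlier on predictor a
X <- cbind(a = c(1, 2, 3, 4, 50),
           b = c(2, 4, 5, 4, 5))

spatialSignSketch <- function(X) {
  Z <- scale(X)                 # center and scale every column first
  norms <- sqrt(rowSums(Z^2))   # Euclidean length of each sample (row)
  Z / norms                     # divide each row by its own length
}

S <- spatialSignSketch(X)
print(round(S, 3))
print(rowSums(S^2))             # each sample now has squared length 1
```

Because the transformation acts on all predictors of a sample together, it should be applied to the group of predictors as a whole, and removing individual predictors afterwards would change the meaning of the result.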





R code for pre-processing data




I will start by conducting a principal component analysis (PCA). I assume that, as a data analyst, you are conversant with this type of transformation, so I will not explain it in detail here. In case you are new to it: PCA replaces the original predictors with a smaller set of uncorrelated combinations of them (the principal components) that capture most of the variation in the data, so you can model with a few components instead of all the predictors.



PCA

The aim of this transformation is to determine the variables that contribute most to Pregnancy.


#import my data set into R for the analysis

data <- read.csv("C:/Users/doe/Desktop/modelling.csv", header = TRUE)

#load all the packages you need here

library(caret)

library(corrplot)

library(e1071)

library(lattice)

# I will not explain what the packages do for now, pardon me please

#prcomp() only accepts numeric columns, so keep just those

numericData <- data[, sapply(data, is.numeric), drop = FALSE]

#the simple code below will center, scale and perform PCA

pcaObject <- prcomp(numericData, center = TRUE, scale. = TRUE)

#I am a hundred percent sure the data set is transformed, that simple aha!

#rotation stores the variable loadings, where rows correspond to predictor variables and columns are associated with the components:

head(pcaObject$rotation)
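Because modelling.csv is not available here, the sketch below swaps in R's built-in USArrests data purely for illustration, to show what a `prcomp()` object holds and one common way to decide how many components to keep. The 95% cut-off is an arbitrary choice for the example, not a rule.

```r
# Stand-in data: USArrests ships with base R (the author's modelling.csv is not available)
data(USArrests)
pcaObject <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Loadings: one row per predictor, one column per component
loadings <- pcaObject$rotation
print(round(loadings, 3))

# Component scores for each observation (the transformed data)
scores <- pcaObject$x
print(head(scores))

# Share of the total variance explained by each component, and its running total
varExplained <- pcaObject$sdev^2 / sum(pcaObject$sdev^2)
cumVar <- cumsum(varExplained)
print(round(cumVar, 3))

# Keep the smallest number of components explaining at least 95% of the variance
nComp <- which(cumVar >= 0.95)[1]
reduced <- scores[, seq_len(nComp), drop = FALSE]
print(dim(reduced))
```

caret, already loaded above, offers a one-object alternative: `preProcess(data, method = c("center", "scale", "pca"), thresh = 0.95)` followed by `predict()` centers, scales and keeps enough components to reach the variance threshold.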




That marks the end of today's introduction. For more information, visit the website or contact the Unitary Analytics (Intelligent Data Analytics) offices in Nairobi, Kenya.

Predictive Modeling in R


Predictive modeling, or predictive analytics, encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of the risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. The defining functional effect of these technical approaches is that predictive analytics provides a predictive score for each individual (customer, employee, healthcare patient, product, vehicle, component, machine, or other organizational unit) in order to determine, inform, or influence organizational processes that pertain across large numbers of individuals, such as in marketing, credit risk assessment, fraud detection, manufacturing, healthcare, and government operations including law enforcement.

Business Applications of Predictive Analytics
Organizations of all sizes apply predictive analytics to make operational decisions, both online and offline, across marketing, sales and beyond. Which business application of predictive analytics is best for you is a key question; it depends on which kind of decision you want to drive and on how predictive scores can best serve decision making within your firm. At Unitary Analytics we apply predictive models to do the following:

Credit Scoring
This is used throughout financial services. Scoring models process a client's credit history, loan application, customer data, and so on, in order to rank loan applicants by their likelihood of making future credit payments on time.
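As a rough illustration of how a scoring model ranks applicants, here is a sketch that fits a logistic regression with base R's `glm()` on simulated applicants. The predictors `income` and `pastLatePayments`, the coefficients used to simulate the outcome, and the data themselves are all hypothetical; logistic regression is one common choice for credit scoring, not necessarily the model Unitary Analytics uses.

```r
set.seed(42)
n <- 200

# Hypothetical applicant features (simulated, not real credit data)
applicants <- data.frame(
  income = round(rnorm(n, mean = 50, sd = 15), 1),  # income in thousands
  pastLatePayments = rpois(n, lambda = 1)            # count of past late payments
)

# Simulated outcome: 1 = repaid on time, 0 = did not
linearPredictor <- -1 + 0.05 * applicants$income - 0.8 * applicants$pastLatePayments
applicants$onTime <- rbinom(n, size = 1, prob = plogis(linearPredictor))

# Logistic regression scoring model
model <- glm(onTime ~ income + pastLatePayments, data = applicants, family = binomial)

# Score = predicted probability of paying on time
applicants$score <- predict(model, type = "response")

# Rank applicants from most to least likely to pay on time
ranked <- applicants[order(applicants$score, decreasing = TRUE), ]
print(head(ranked))
```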

Client Predictions Drive Operational Decisions
Predictive scores are the golden eggs produced by predictive analytics: one predictive score for each client or prospect. Each client's score, in turn, indicates what action to take with that client. Business intelligence simply doesn't get more actionable than this kind of decision automation.

Predictive analytics is applied in many ways to help organizations overcome a plethora of challenges. The core difference from one mode of use to another is in what is being predicted. Predicting client response, clicks, or defection are each very different things, and they deliver business value in different ways.

This is just an overview of what we will be discussing in the main article; after all, you need the theory first before we embark on real R programming.
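The idea that each client's score tells us what action to take can be as simple as a set of cut-offs. In the sketch below the client names, scores and thresholds are all hypothetical; in practice the scores would come from a fitted model, and the cut-offs would typically reflect the costs and benefits of each action.

```r
# Hypothetical scores for five clients (illustrative numbers, not model output)
scores <- c(alice = 0.91, bob = 0.35, chen = 0.62, dana = 0.08, eli = 0.77)

# Hypothetical cut-offs mapping each score to an action
actions <- cut(scores,
               breaks = c(0, 0.3, 0.7, 1),
               labels = c("no offer", "review manually", "send offer"),
               include.lowest = TRUE)

print(data.frame(score = scores, action = actions))
```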



Tuesday, 9 June 2015

Welcome to Data Analytics with R

If you want to derive meaning out of your data, then you will almost certainly end up with R or Python. Today we will be discussing R.