Selected YouTube Tutorials

  1. Linear Regression: The aim of linear regression is to model a continuous variable Y as a mathematical function of one or more X variable(s), so that the fitted model can predict Y when only X is known. The equation can be written as Y = β1 + β2X + ϵ, where β1 is the intercept and β2 is the slope; collectively, they are called the regression coefficients. ϵ is the error term, the part of Y that the regression model is unable to explain. The video can be found here: Linear Regression in R, and the source code can be found here: Linear_Regression_R_Code
  2. Tree Based Models: Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a data set while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. Classification and regression trees (as described by Breiman, Friedman, Olshen, and Stone) can be generated through the rpart package. In R, rpart trees use a complexity parameter (cp), which controls the tradeoff between model complexity and accuracy on the training set. A smaller cp leads to a bigger tree, which might overfit. Conversely, a large cp might underfit, meaning the model does not capture the underlying trends properly. The video can be found here: Tree based models in R, and the source code can be found here: Tree based models code
  3. Machine Learning in R: Machine learning (ML) continues to grow in importance for many organizations across nearly all domains. Example applications of machine learning in practice include: predicting the likelihood of a patient returning to the hospital (readmission) within 30 days of discharge; segmenting customers based on common attributes or purchasing behavior for targeted marketing; predicting coupon redemption rates for a given marketing campaign; and predicting customer churn so an organization can perform preventative intervention. In essence, these tasks all seek to learn from data. To address each scenario, we can use a given set of features to train an algorithm and extract insights. These algorithms, or learners, can be classified according to the amount and type of supervision needed during training. This tutorial focuses on implementing five of the most popular ML algorithms in R: Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF). The video can be found here: Machine learning in R, and the source code can be found here: ML_techniques_in_R
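As a rough illustration of the linear regression tutorial's equation Y = β1 + β2X + ϵ, the sketch below fits a simple model with base R's lm(). The built-in cars data set and the variable names are assumptions made for this example; the tutorial's own source code may use different data.

```r
# Fit dist = β1 + β2 * speed + ϵ on the built-in `cars` data
# (data set chosen for illustration only).
data(cars)
fit <- lm(dist ~ speed, data = cars)

# Regression coefficients: "(Intercept)" is β1, "speed" is β2
coefs <- coef(fit)
print(coefs)

# Predict Y for new X values where only X is known
new_x <- data.frame(speed = c(10, 20))
pred  <- predict(fit, newdata = new_x)
print(pred)

# Residuals are the estimated error term: the part of Y the model does not explain
res <- residuals(fit)
```

Calling summary(fit) additionally reports standard errors and R-squared for the fitted coefficients.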
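The effect of the complexity parameter described in the tree based models tutorial can be sketched with rpart as follows. The iris data set, the two cp values, and the helper n_leaves are illustrative assumptions, not taken from the tutorial's code.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(1)  # rpart's cross-validation is random; arbitrary seed for reproducibility

# Classification trees on the built-in iris data (illustrative choice).
# A small cp allows more splits; a large cp prunes splits away.
big_tree   <- rpart(Species ~ ., data = iris, method = "class",
                    control = rpart.control(cp = 0.001))
small_tree <- rpart(Species ~ ., data = iris, method = "class",
                    control = rpart.control(cp = 0.6))

# Hypothetical helper: count terminal nodes (leaves) of a fitted tree
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
cat("leaves with cp = 0.001:", n_leaves(big_tree), "\n")
cat("leaves with cp = 0.6:  ", n_leaves(small_tree), "\n")

# The cp table lists cross-validated error (xerror) per candidate cp;
# one common approach is to prune at the cp with the lowest xerror.
printcp(big_tree)
best_cp <- big_tree$cptable[which.min(big_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(big_tree, cp = best_cp)
```

Pruning at the cp with the lowest cross-validated error is only one convention; choosing the simplest tree within one standard error of that minimum is another common option.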
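For the machine learning tutorial, a minimal sketch of training and comparing supervised learners is shown below. It covers only two of the five algorithms (LDA and kNN) and uses the MASS and class packages on the iris data with an 80/20 hold-out split. All of these choices are assumptions for illustration; the tutorial's own code may rely on other packages and data.

```r
library(MASS)   # lda(); recommended package bundled with R
library(class)  # knn(); recommended package bundled with R

set.seed(42)    # arbitrary seed so the split is reproducible

# Hold out 20% of iris for validation (split ratio is an illustrative choice)
test_idx <- sample(nrow(iris), size = round(0.2 * nrow(iris)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Linear Discriminant Analysis
lda_fit  <- lda(Species ~ ., data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# k-Nearest Neighbors with k = 5 (k is an illustrative choice)
knn_pred <- knn(train = train[, 1:4], test = test[, 1:4],
                cl = train$Species, k = 5)

# Share of held-out rows each learner classifies correctly
accuracy <- c(LDA = mean(lda_pred == test$Species),
              kNN = mean(knn_pred == test$Species))
print(accuracy)
```

A single hold-out split gives a noisy estimate; repeated k-fold cross-validation is a more stable way to compare learners.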