Naive Bayes vs Binary Logistic Regression using R

The Naïve Bayes method is one of the most frequently used machine learning algorithms for classification problems. It can therefore be considered an alternative to binary logistic regression and multinomial logistic regression, both of which we have discussed in previous tutorials. In this tutorial we’ll look at Naïve Bayes in detail. Data for the case study can be downloaded here.

Time Series Decomposition in R

In a previous tutorial, we discussed the basics of time series and time series analysis. We looked at how to convert data into time series data and analyze it in R. In this tutorial, we’ll go into more depth and look at time series decomposition.

We’ll first recap the components of a time series and then discuss the moving average concept. After that we’ll focus on two time series decomposition methods – a simple method based on moving averages and the local regression method.
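As a rough illustration of the moving average idea, the sketch below estimates a trend with a centered moving average and subtracts it from the series. It is written in plain Python rather than R, and the function name `centered_moving_average` and the `sales` values are hypothetical, chosen only for this example; it is not the decomposition routine used in the tutorial.

```python
def centered_moving_average(series, window):
    """Centered moving average for an odd window.

    Positions where the full window does not fit are left as None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend


# Hypothetical data, made up for illustration only.
sales = [10, 12, 14, 13, 15, 17, 16]

# Step 1: estimate the trend with a 3-point centered moving average.
trend = centered_moving_average(sales, 3)

# Step 2: remove the trend (additive model); what remains is
# seasonality plus irregular variation.
detrended = [None if t is None else y - t for y, t in zip(sales, trend)]
```

A larger window gives a smoother trend but leaves more undefined points at each end of the series.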

Binary Logistic Regression in Python – a tutorial Part 1

In this tutorial, we will learn about binary logistic regression and its application to real-life data using Python. We have also covered binary logistic regression in R in another tutorial. Binary logistic regression remains one of the most widely used predictive modeling methods. It is a classification algorithm that predicts the probability of a categorical dependent variable. The method is used to model a binary variable that takes two possible values, typically coded as 0 and 1.
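To make the idea of predicting a probability concrete, here is a minimal sketch of how a fitted logistic model turns inputs into a probability and then into a 0/1 label. The coefficients, the feature values and the 0.5 cut-off are assumptions made up for this example, and `predict_probability` is a hypothetical helper, not a library function.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Logistic model: P(Y = 1) = 1 / (1 + exp(-z)), with z the linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted values, chosen so the linear predictor z equals 0.
p = predict_probability(-1.0, [0.5, 0.25], [2.0, 0.0])

# Assumed cut-off of 0.5 for turning the probability into a 0/1 label.
label = 1 if p >= 0.5 else 0
```

In practice the coefficients are estimated from data, and the cut-off is chosen to suit the problem rather than fixed at 0.5.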

Binary Logistic Regression – a tutorial

In this tutorial we’ll learn about binary logistic regression and its application to real-life data. Binary logistic regression remains one of the most widely used predictive modeling methods.

Binary Logistic Regression with R – a tutorial

In a previous tutorial, we discussed the concept and application of binary logistic regression. We’ll now learn more about binary logistic regression model building and its assessment using R.

First, we’ll recap our earlier case study and then develop a binary logistic regression model in R. This is followed by an explanation of model sensitivity and specificity, and how to estimate these using R.
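For readers who want to see the two measures spelled out, the sketch below computes them from actual and predicted 0/1 labels. It uses plain Python instead of R, and the labels are invented for illustration; `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for 0/1 labels.

    sensitivity = TP / (TP + FN): share of actual 1s predicted as 1
    specificity = TN / (TN + FP): share of actual 0s predicted as 0
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels, made up for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sensitivity, specificity = sensitivity_specificity(actual, predicted)
```

Here 3 of the 4 actual positives and 4 of the 6 actual negatives are classified correctly.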

Multiple Linear Regression in R – a tutorial

Multiple Linear Regression (MLR) is the backbone of predictive modeling and machine learning, and an in-depth knowledge of MLR is critical to understanding these key areas of data science. This tutorial is intended to provide an initial introduction to MLR using R. If you’d like to cover the same area using Python, you can find our tutorial here.

Predictive Analytics – An introductory overview

We’ll begin with an introduction to predictive modeling. We’ll then discuss important statistical models, followed by a general approach to building predictive models. Finally, we’ll cover the key steps in building predictive models. Please note that the prerequisites for starting out in predictive modeling are an understanding of exploratory data analysis and statistical inference.

T Distribution, Kolmogorov-Smirnov and Shapiro-Wilk Tests

In a previous tutorial we looked at key concepts in statistical inference. We’ll now look at the t distribution, the Kolmogorov-Smirnov and Shapiro-Wilk tests, and standard parametric tests. Parametric tests are tests that make assumptions about the parameters of the population distribution from which a sample is drawn. We’ll begin with normality assessment using the Quantile-Quantile plot (also called the Q-Q plot), the Shapiro-Wilk test and the Kolmogorov-Smirnov test. Then we’ll cover the t distribution briefly. Finally, we’ll examine the one-sample t-test, a standard parametric test, in detail.
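As a small preview of the one-sample t-test, the sketch below computes the test statistic t = (sample mean - mu0) / (s / sqrt(n)) and its degrees of freedom using only the Python standard library. The sample values and the hypothesized mean `mu0` are made up for illustration, and `one_sample_t` is a hypothetical helper; obtaining a p-value would additionally need the t distribution, which is left out here.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements and hypothesized population mean.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
mu0 = 5.0

t_stat, df = one_sample_t(sample, mu0)
```

The statistic is then compared with the t distribution on `df` degrees of freedom, which assumes the data are approximately normal; that is why the normality checks come first.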

What is Statistical Inference – Key concepts

In this session, we’ll learn the concept of statistical inference. Statistical inference is a vast area that includes many statistical methods, from analyzing data to drawing inferences or conclusions in research or business problems. It plays a vital role in the application of data science across industries.
