Course: Data Science – Basic
Course Duration: 25 hours
Batch start dates:
- 1 August 2017
- 1 September 2017
R Programming Software
R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.
Introduction to R
- Principles and software paradigm
- Description of the R interface
- Advantages of R
- Drawbacks of R
Advanced Data Manipulation in R (packages such as dplyr, plyr, sqldf and MASS)
- Importing data from and exporting data to .txt and .xls-like files
- Advanced data manipulation
- Accessing variables and management of subsets in data
- Working with characters, text and dates
Module 1: Fundamentals of Statistics
- Types of variables; measures of central tendency and dispersion
- Variable distributions and probability distributions
- The normal distribution and its properties
- The central limit theorem and its applications
Module 2: Statistical Significance Tests
- Hypothesis testing: formulating null and alternative hypotheses
- Z-test, t-test and chi-square test
- Analysis of variance (ANOVA)
- Correlation
Module 3: Data Preparation
- Need for data preparation
- Outlier treatment
- Missing-value treatment
- Multicollinearity
Module 4: Predictive Modeling & Time Series Analysis
- Basics of regression analysis
- Linear regression
- Logistic regression
- Interpretation of results
- Multivariate regression modeling
Module 5: Machine Learning Algorithms
- Decision trees
- Naïve Bayes algorithm
- k-NN classification and regression
Case Study Projects:
- Customer Marketing Response Predictive Modeling
- Patient Satisfaction Analysis
- Call Centre Effectiveness Predictive Modeling
- Customer Segmentation for Cross-Sell/Up-Sell Modeling