MTech Introduction To Data Science syllabus for 2 Sem 2020 scheme 20MAE252

Module-1 Introduction 0 hours

Introduction

What is Data Science? - Big Data and Data Science hype - Current landscape of perspectives - Skill sets needed, Statistical Inference - Populations and samples - Statistical modeling, probability distributions, fitting a model - Introduction to R

Module-2 Exploratory Data Analysis and the Data Science Process 0 hours

Exploratory Data Analysis and the Data Science Process

Basic tools (plots, graphs and summary statistics) of EDA - Philosophy of EDA - The Data Science Process - Case Study, Three Basic Machine Learning Algorithms - Linear Regression - k-Nearest Neighbors (k-NN) - k-means, Data Wrangling

Module-3 Feature Generation and Feature Selection 0 hours

Feature Generation and Feature Selection (Extracting Meaning from Data)

Feature Selection algorithms – Filters; Wrappers; Decision Trees; Random Forests, Recommendation Systems, Dimensionality Reduction - Singular Value Decomposition - Principal Component Analysis

Module-4 Mining Social-Network Graphs 0 hours

Mining Social-Network Graphs

Social networks as graphs - Clustering of graphs - Direct discovery of communities in graphs - Partitioning of graphs - Neighbourhood properties in graphs

Module-5 Data Visualization 0 hours

Data Visualization

Basic principles, ideas and tools for data visualization, Examples of inspiring (industry) projects, Data Science and Ethical Issues - Discussions on privacy, security, ethics - A look back at Data Science - Next-generation data scientists

 

Course outcomes:

At the end of the course the student will be able to:

1. Apply data science and related skill sets.

2. Understand statistical inference, commonly used probability distributions, statistical modeling and model fitting.

3. Apply R to carry out basic statistical modeling and analysis.

4. Apply exploratory data analysis (EDA) in data science.

5. Apply the data science process.

 

Question paper pattern:

The SEE question paper will be set for 100 marks and the marks scored will be proportionately reduced to 60.

  • The question paper will have ten full questions carrying equal marks.
  • Each full question is for 20 marks.
  • There will be two full questions (with a maximum of four sub questions) from each module.
  • Each full question will have sub question covering all the topics under a module.
  • The students will have to answer five full questions, selecting one full question from each module.
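The arithmetic behind the pattern above can be sketched as follows: five answered questions of 20 marks each give a raw total out of 100, which is then scaled proportionately to 60. This is only an illustration; the function name `scale_see_marks` and the per-question marks are hypothetical, and any rounding rule applied to the scaled score is not stated in the syllabus, so none is assumed here.

```python
def scale_see_marks(raw_marks: float,
                    raw_max: float = 100.0,
                    scaled_max: float = 60.0) -> float:
    """Proportionately reduce a raw SEE score (out of raw_max) to scaled_max.

    Hypothetical helper for illustration; no rounding is applied because
    the syllabus does not specify a rounding rule.
    """
    if not 0 <= raw_marks <= raw_max:
        raise ValueError("raw_marks must lie between 0 and raw_max")
    return raw_marks * scaled_max / raw_max


# Hypothetical marks for the five answered full questions (one per module),
# each out of 20.
marks_per_question = [16, 14, 18, 12, 15]
raw_total = sum(marks_per_question)      # 75 out of 100
scaled_total = scale_see_marks(raw_total)  # 45.0 out of 60
print(raw_total, scaled_total)
```

With these assumed marks, a raw total of 75 out of 100 corresponds to 45 out of 60.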

 

Textbook/ Textbooks

1 Doing Data Science: Straight Talk from the Frontline. Cathy O’Neil and Rachel Schutt. O'Reilly Media, 2014.

2 Mining of Massive Datasets, v2.1. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Cambridge University Press, 2014.

 

Reference Books

1 Machine Learning: A Probabilistic Perspective. Kevin P. Murphy. MIT Press, 2012.

2 Data Science for Business. Foster Provost and Tom Fawcett. O'Reilly Media, 2013.

3 Foundations of Data Science. Avrim Blum, John Hopcroft and Ravindran Kannan. Cambridge University Press, 2020.

4 Data Mining and Analysis: Fundamental Concepts and Algorithms. Mohammed J. Zaki and Wagner Meira Jr. Cambridge University Press, 2014.