17MCA452 Big Data Analytics syllabus for MCA




Module-1 Big Data and Analytics 8 hours


Example Applications, Basic Nomenclature, Analysis Process Model, Analytical Model Requirements, Types of Data Sources, Sampling, Types of Data Elements, Data Exploration, Exploratory Statistical Analysis, Missing Values, Outlier Detection and Treatment, Standardizing Data Labels, Categorization

Module-2 Big Data Technology 8 hours


Hadoop’s Parallel World – Data Discovery – Open Source Technology for Big Data Analytics – Cloud and Big Data – Predictive Analytics – Mobile Business Intelligence and Big Data – Crowd Sourcing Analytics – Inter- and Trans-Firewall Analytics

Module-3 Meet Hadoop 8 hours


Data, Data Storage and Analysis, Comparison with Other Systems, RDBMS, Grid Computing, Volunteer Computing, A Brief History of Hadoop, Apache Hadoop and the Hadoop Ecosystem, Hadoop Releases Response

Module-4 The Hadoop Distributed File System 8 hours

The Design of HDFS, HDFS Concepts, Blocks, Namenodes and Datanodes, HDFS Federation, HDFS High-Availability, The Command-Line Interface, Basic Filesystem Operations, Hadoop Filesystems, Interfaces, The Java Interface, Reading Data from a Hadoop URL, Reading Data Using the FileSystem API, Writing Data, Directories, Querying the Filesystem, Deleting Data, Data Flow, Anatomy of a File Read, Anatomy of a File Write, Coherency Model, Parallel Copying with distcp, Keeping an HDFS Cluster Balanced, Hadoop Archives

Module-5 MapReduce 8 hours

A Weather Dataset, Data Format, Analyzing the Data with Unix Tools, Analyzing the Data with Hadoop, Map and Reduce, Java MapReduce, Scaling Out, Data Flow, Combiner Functions, Running a Distributed MapReduce Job, Hadoop Streaming, Hadoop Pipes, Compiling and Running, Developing a MapReduce Application, The Configuration API, Combining Resources, Variable Expansion, Configuring the Development Environment, Managing Configuration, GenericOptionsParser, Tool and ToolRunner, Writing a Unit Test, Mapper, Reducer, Running Locally on Test Data, Running a Job in a Local Job Runner, Testing the Driver, Running on a Cluster, Packaging, Launching a Job, The MapReduce Web UI, Retrieving the Results, Debugging a Job, Hadoop Logs, Remote Debugging.

 

Question paper pattern:

  • The question paper will have ten questions.
  • Each full question carries 16 marks.
  • There will be 2 full questions (with a maximum of four sub-questions) from each module.
  • Each full question will have sub-questions covering all the topics under a module.
  • The students will have to answer 5 full questions, selecting one full question from each module.

 

Text Books:

1. Bart Baesens, “Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications”, Wiley.

2. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses”, 1st Edition, Wiley CIO Series, 2013.

3. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O’Reilly, 2012.

 

Reference Books:

1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN: 9788126551071, 2015.

2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.

3. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.

4. Tom Plunkett, Brian Macdonald et al., “Oracle Big Data Handbook”, Oracle Press, 2014.

Last Updated: Tuesday, January 24, 2023