Big Data and Analytics
Example Applications, Basic Nomenclature, Analysis Process Model, Analytical Model Requirements, Types of Data Sources, Sampling, Types of Data Elements, Data Exploration, Exploratory Statistical Analysis, Missing Values, Outlier Detection and Treatment, Standardizing Data Labels, Categorization
Big Data Technology
Hadoop’s Parallel World – Data Discovery – Open Source Technology for Big Data Analytics – Cloud and Big Data – Predictive Analytics – Mobile Business Intelligence and Big Data – Crowd Sourcing Analytics – Inter- and Trans-Firewall Analytics
Meet Hadoop
Data, Data Storage and Analysis, Comparison with Other Systems, RDBMS, Grid Computing, Volunteer Computing, A Brief History of Hadoop, Apache Hadoop and the Hadoop Ecosystem, Hadoop Releases Response
The Hadoop Distributed Filesystem
The Design of HDFS, HDFS Concepts, Blocks, Namenodes and Datanodes, HDFS Federation, HDFS High-Availability, The Command-Line Interface, Basic Filesystem Operations, Hadoop Filesystems, Interfaces, The Java Interface, Reading Data from a Hadoop URL, Reading Data Using the FileSystem API, Writing Data, Directories, Querying the Filesystem, Deleting Data, Data Flow, Anatomy of a File Read, Anatomy of a File Write, Coherency Model, Parallel Copying with distcp, Keeping an HDFS Cluster Balanced, Hadoop Archives
MapReduce
A Weather Dataset, Data Format, Analyzing the Data with Unix Tools, Analyzing the Data with Hadoop, Map and Reduce, Java MapReduce, Scaling Out, Data Flow, Combiner Functions, Running a Distributed MapReduce Job, Hadoop Streaming, Hadoop Pipes, Compiling and Running, Developing a MapReduce Application, The Configuration API, Combining Resources, Variable Expansion, Configuring the Development Environment, Managing Configuration, GenericOptionsParser, Tool and ToolRunner, Writing a Unit Test, Mapper, Reducer, Running Locally on Test Data, Running a Job in a Local Job Runner, Testing the Driver, Running on a Cluster, Packaging, Launching a Job, The MapReduce Web UI, Retrieving the Results, Debugging a Job, Hadoop Logs, Remote Debugging
Question paper pattern:
Text Books:
1. Bart Baesens, “Analytics in a Big Data World: The Essential Guide to Data Science and its Applications”, Wiley.
2. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses”, 1st Edition, Wiley CIO Series, 2013.
3. Tom White, “Hadoop: The Definitive Guide”, 3rd Edition, O’Reilly, 2012.
Reference Books:
1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”, Wiley, ISBN: 9788126551071, 2015.
2. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
3. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.
4. Tom Plunkett, Brian Macdonald et al., “Oracle Big Data Handbook”, Oracle Press, 2014.