Big Data

The information age gives businesses of all kinds access to Big Data that is growing in volume, variety, velocity, and complexity. With more data arriving from more sources faster than ever, the question is: what is your Big Data strategy? How are you combining new and existing data sources to make better decisions about your business? How could new data sources, including social, sensor, location, and video data, help improve your business performance? Will your Big Data remain dormant, or will you make it work for you?

As your trusted partner, Raz Systems will help you bring order to your Big Data. With proven expertise in mature technologies and thought leadership in emerging ones, our team of senior-level consultants will help you implement the technologies you need to manage and understand your data, allowing you to predict customer demand and make better decisions faster than ever before. Whatever your Big Data challenges, we'll provide the strategic guidance you need to succeed.



Learn from a database expert with 18 years of experience developing databases.

Start Date:
Frequency: Every Saturday
Timing: 9:00 A.M. to 5:00 P.M.
Duration: 32 Hours

Fee: $499

Prepares for Certification Path: Cloudera Certified Developer for Apache Hadoop (CCDH) exam
Contact: edu@razsystems.com for details
Important: Maximum 35 students; qualified students are admitted on a first-come, first-served basis.

Brief: Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

  • The internals of MapReduce and HDFS and how to write MapReduce code
  • Best practices for Hadoop development, debugging, and implementation of workflows and common algorithms
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie, Mahout, and other Hadoop ecosystem projects
  • Optimal hardware configurations and network considerations for integrating a Hadoop cluster with the data center
  • Writing and executing joins to link data sets in MapReduce
  • Advanced Hadoop API topics required for real-world data analysis
  • Introduction to HBase
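To make the map and reduce phases listed above concrete, here is a minimal, self-contained sketch of the MapReduce flow for a word count. The course exercises themselves use Hadoop's Java API; this plain-Python simulation (all function names are illustrative, not Hadoop's) only shows the map, shuffle/sort, and reduce pipeline:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort phase: sort by key and group all values per key,
    # as Hadoop does automatically between map and reduce.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        yield (key, [value for _, value in group])

def reducer(key, values):
    # Reduce phase: sum the counts for each word.
    yield (key, sum(values))

def run_job(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(result for key, values in shuffle(mapped)
                for result in reducer(key, values))

print(run_job(["the quick brown fox", "the lazy dog"])["the"])  # → 2
```

The shuffle step, which Hadoop performs between the two phases, is what guarantees each reducer sees all values for a given key.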

Audience: This course is best suited to developers and engineers who have some programming experience. Prior knowledge of Apache Hadoop is not required.

Reference Book: Hadoop: The Definitive Guide, 3rd Edition

Course Details

  • Introduction
  • The Motivation for Hadoop
  • Problems with Traditional Large-Scale Systems
  • Requirements for a New Approach
  • Introducing Hadoop
  • Hadoop: Basic Concepts
  • The Hadoop Project and Hadoop Components
  • The Hadoop Distributed File System
  • How MapReduce Works
  • Running a MapReduce Job
  • How a Hadoop Cluster Operates
  • Other Hadoop Ecosystem Projects
  • Writing a MapReduce Program
  • The MapReduce Flow
  • Basic MapReduce API Concepts
  • Writing MapReduce Drivers, Mappers, and Reducers in Java
  • Writing Mappers and Reducers in Other Languages Using the Streaming API
  • Speeding Up Hadoop Development by Using Eclipse
  • Differences Between the Old and New MapReduce APIs
  • Unit Testing MapReduce Programs
  • Unit Testing
  • The JUnit and MRUnit Testing Frameworks
  • Delving Deeper into the Hadoop API
  • Using the ToolRunner Class
  • Decreasing the Amount of Intermediate Data with Combiners
  • Writing Unit Tests with the MRUnit Framework
  • Implementing a Combiner
  • Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
  • Writing Custom Partitioners for Better Load Balancing
  • Accessing HDFS Programmatically
  • Using the Distributed Cache
  • Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners
  • Practical Development Tips and Techniques
  • Strategies for Debugging MapReduce Code
  • Testing MapReduce Code Locally by Using LocalJobRunner
  • Writing and Viewing Log Files
  • Retrieving Job Information with Counters
  • Determining the Optimal Number of Reducers for a Job
  • Creating Map-Only MapReduce Jobs
  • Data Input and Output
  • Creating Custom Writable and WritableComparable Implementations
  • Implementing Custom Input Formats and Output Formats
  • Issues to Consider When Using File Compression
  • Common MapReduce Algorithms
  • Sorting and Searching Large Data Sets
  • Performing a Secondary Sort
  • Saving Binary Data Using SequenceFile and Avro Data Files
  • Indexing Data
  • Computing Term Frequency-Inverse Document Frequency (TF-IDF)
  • Calculating Word Co-Occurrence
  • Joining Data Sets in MapReduce Jobs
  • Writing a Map-Side Join
  • Writing a Reduce-Side Join
  • Integrating Hadoop into the Enterprise Workflow
  • Integrating Hadoop into an Existing Enterprise
  • Loading Data from an RDBMS into HDFS by Using Sqoop
  • Managing Real-Time Data Using Flume
  • Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
  • Machine Learning and Mahout
  • Introduction to Machine Learning
  • Using Mahout
  • An Introduction to Hive and Pig
  • The Motivation for Hive and Pig
  • Hive Basics
  • Pig Basics
  • Recommenders
  • Choosing Between Hive and Pig
  • An Introduction to Oozie
  • Introduction to Oozie
  • Creating Oozie Workflows
  • An Introduction to HBase
  • Oracle Big Data Solutions Using Exadata
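The "Writing Mappers and Reducers in Other Languages Using the Streaming API" topic above rests on a simple contract: each task reads lines on standard input and writes tab-separated key/value lines on standard output. Here is a hedged sketch of a Streaming-style word count in Python; the script name and job invocation in the comment are illustrative, not from the course materials:

```python
import sys

def streaming_mapper(stdin, stdout):
    # Streaming mapper: read raw text lines, emit one "word<TAB>1" line per word.
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")

def streaming_reducer(stdin, stdout):
    # Streaming reducer: Hadoop delivers input sorted by key, so counts
    # for one word can be accumulated until the key changes.
    current, total = None, 0
    for line in stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                stdout.write(f"{current}\t{total}\n")
            current, total = word, 0
        total += int(count)
    if current is not None:
        stdout.write(f"{current}\t{total}\n")

if __name__ == "__main__":
    # Hypothetical invocation, selecting the phase with an argument, e.g.
    #   hadoop jar hadoop-streaming.jar -mapper "wc.py map" -reducer "wc.py reduce" ...
    phase = sys.argv[1] if len(sys.argv) > 1 else "map"
    (streaming_mapper if phase == "map" else streaming_reducer)(sys.stdin, sys.stdout)
```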
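The combiner topics under "Delving Deeper into the Hadoop API" can also be illustrated with a small sketch: a combiner is a mini-reduce run on each mapper's local output, shrinking the intermediate data before the shuffle. This only works when the operation is associative and commutative, as summation is for word counts. A minimal illustration, not course code:

```python
from itertools import groupby
from operator import itemgetter

def combine(map_output):
    # Combiner: collapse repeated keys in a single mapper's output
    # before anything crosses the network to the reducers.
    ordered = sorted(map_output, key=itemgetter(0))
    return [(key, sum(v for _, v in group))
            for key, group in groupby(ordered, key=itemgetter(0))]

# One mapper's raw output for a word count job:
raw = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
combined = combine(raw)
print(combined)                  # [('cat', 1), ('the', 3)]
print(len(raw), len(combined))   # 4 2 -- fewer intermediate pairs
```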
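For the "Computing Term Frequency-Inverse Document Frequency (TF-IDF)" topic, the standard weighting multiplies a term's in-document frequency by the log of how rare the term is across the corpus. A minimal sketch under that classic formula (function name and data are illustrative):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of tokenized documents. Returns {(term, doc_index): score}.
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for i, doc in enumerate(docs):
        for term, count in Counter(doc).items():
            # Term frequency times log of inverse document frequency.
            scores[(term, i)] = count * math.log(n / df[term])
    return scores

docs = [["big", "data", "hadoop"], ["big", "data", "hive"], ["pig", "latin"]]
scores = tf_idf(docs)
# "big" appears in 2 of 3 docs while "hadoop" appears in only 1,
# so "hadoop" gets the higher score within document 0.
```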
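Finally, the "Writing a Reduce-Side Join" topic follows a pattern worth sketching: the map phase tags each record with its source dataset and keys it by the join field, the shuffle groups both datasets' records under the same key, and the reducer pairs them up. A self-contained simulation under assumed toy data (not the course's Java implementation):

```python
from collections import defaultdict

def reduce_side_join(customers, orders):
    # customers: {customer_id: name}; orders: list of (customer_id, amount).
    # Map phase: tag each record with its source ("C" or "O") so the
    # reducer can tell the two datasets apart after grouping.
    tagged = [(cid, ("C", name)) for cid, name in customers.items()]
    tagged += [(cid, ("O", amount)) for cid, amount in orders]

    # Shuffle: group all tagged records by the join key.
    groups = defaultdict(list)
    for key, value in tagged:
        groups[key].append(value)

    # Reduce phase: pair every order with its customer record.
    joined = []
    for cid, records in groups.items():
        names = [v for tag, v in records if tag == "C"]
        amounts = [v for tag, v in records if tag == "O"]
        for name in names:
            for amount in amounts:
                joined.append((cid, name, amount))
    return joined

result = reduce_side_join({1: "Ada", 2: "Alan"}, [(1, 250), (1, 99), (2, 10)])
```

A map-side join avoids the shuffle entirely by loading the smaller dataset into every mapper's memory (for example via the Distributed Cache), which is why the course treats the two techniques separately.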