Analyzing Big Data with Hive

The main objective of this course is to help you understand the complex architecture of Hadoop and its components, point you in the right direction, and get you working with Hadoop quickly. It covers everything you need as a Big Data beginner: the Big Data market, different job roles, technology trends, the history of Hadoop, HDFS, the Hadoop ecosystem, Hive, and Pig. The course shows how a beginner should get started with Hadoop and includes many hands-on examples to help you learn quickly.

  • Analyzing Big Data

  • Introduction

  • Motivation for Hadoop

  • Distributed Computing Challenges

  • Hadoop File System (HDFS)

  • MapReduce

  • Word Count Example

  • Demo: Basic Hadoop Commands and Environment Setup

  • Introduction to Hive

  • Hive Motivation

  • Hive Architecture

  • Hive Principles: Schema-on-Read

  • Hive Warehouse

  • Hive Query Language Basics

  • Creating Database and Tables with Hive

  • Working with Hive Tables and loading data into Warehouse

  • Loading Data into Hive and Managing External Tables

  • Data Types

  • Type Conversions

  • Managed Partitioned Tables

  • External Partitioned Tables

  • Multiuser and Dynamic Partition Inserts

  • Loading Data use case

  • Data Retrieval: Group-By Function

  • Sorting and Controlling Data Flow

  • The Command Line and Variable Substitution

  • Bucketing

  • Bucketing and Block Sampling

  • Joins

  • Joins in Depth and Join Optimization

  • Map-side Joins for Bucketed Tables

  • Distributed Cache

  • UDTFs: Explode and Lateral View

  • Extending Hive: Creating Your Own UDF

  • Hive: Compiling and Testing a Custom UDF

  • Extending Hive: Custom UDF

  • Hive Initialization File

  • Accessing the Distributed Cache

  • Hadoop Streaming and Transform

  • Windowing and Analytic Functions
