Analyzing Big Data with Hive
The main objective of this course is to help you understand the complex architecture of Hadoop and its components, point you in the right direction as you begin, and get you working with Hadoop and its components quickly. It covers everything you need as a Big Data beginner. You will learn about the Big Data market, the different job roles, technology trends, the history of Hadoop, HDFS, the Hadoop ecosystem, Hive, and Pig. We will also see how a beginner should get started with Hadoop. The course includes many hands-on examples to help you learn Hadoop quickly.
Analyzing Big Data
Introduction
Motivation for Hadoop
Distributed Computing Challenges
Hadoop Distributed File System (HDFS)
MapReduce
Word Count Example
Demo: Basic Hadoop Commands and Environment Setup
Introduction to Hive
Hive Motivation
Hive Architecture
Hive Principles: Schema-on-Read
Hive Warehouse
Hive Query Language Basics
Creating Databases and Tables with Hive
Working with Hive Tables and Loading Data into the Warehouse
Loading Data into Hive and Managing External Tables
Data Types
Type Conversions
Managed Partitioned Tables
External Partitioned Tables
Multi-Insert and Dynamic Partition Inserts
Loading Data Use Case
Data Retrieval: The GROUP BY Clause
Sorting and Controlling Data Flow
The Command Line and Variable Substitution
Bucketing
Bucketing and Block Sampling
Joins
Joins in Depth and Join Optimization
Map-side Joins for Bucketed Tables
Distributed Cache
UDTFs: Explode and Lateral View
Extending Hive: Creating Your Own UDF
Hive: Compiling and Testing a Custom UDF
Extending Hive: Custom UDF
Hive Initialization File
Accessing the Distributed Cache
Hadoop Streaming and Transform
Windowing and Analytic Functions