This course provides a technical overview of Apache Hadoop. It covers, at a high level, the concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. It also serves as an optional primer for those who plan to attend a hands-on, instructor-led course.
- Describe the use case for Hadoop
- Identify Hadoop Ecosystem architectural categories
  - Data Management
  - Data Access
  - Data Governance and Integration
- Detail the HDFS architecture
- Describe data ingestion options and frameworks for batch and real-time streaming
- Explain the fundamentals of parallel processing
- See popular data transformation and processing engines in action
  - Apache Hive
  - Apache Pig
  - Apache Spark
- Detail the architecture and features of YARN
- Describe how to secure Hadoop
- Operational overview with Ambari
- Loading data into HDFS
- Data manipulation with Hive
- Risk Analysis with Pig
- Risk Analysis with Spark and Zeppelin
- Securing Hive with Ranger
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.
This course is intended for data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure teams, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.