About Data Engineering with Apache Beam

This course takes participants from no knowledge of Beam to developing with it professionally. It explains why Beam is changing how data engineering is done and covers Beam’s features and API in depth. The class ends by considering how to architect Big Data solutions with Beam and the wider Big Data ecosystem.

Duration: 1 day

Intended Audience: Technical, Software Engineers, QA, Analysts

Prerequisites: Intermediate-Level Java

You Will Learn

  • What exists in the Big Data ecosystem so you can use the right tool for the right job.
  • How MapReduce works and what happens in each of its phases.
  • What Java 8 lambdas are and how they make your Beam code more readable.
  • The basics of coding a Beam pipeline with Java to build your Big Data foundation.
  • What Avro is, how it works with Beam, and how top data engineers use it to build maintainable, evolving data schemas.
  • How Beam uses windows to make it easy to sessionize and trigger on time frames.
  • How to integrate and use Beam with the rest of your Big Data systems.
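
As a rough illustration of the MapReduce phases and Java 8 lambdas mentioned in the list above, the sketch below counts words using only the standard Java Streams API. It is not Beam code and is not taken from the course materials; the class name `WordCountSketch` and the method `countWords` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Hypothetical helper: counts word occurrences across a list of lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Map phase: turn each line into individual lowercase words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // Drop empty tokens produced by leading whitespace.
                .filter(word -> !word.isEmpty())
                // Shuffle phase: groupingBy brings identical words together.
                // Reduce phase: counting collapses each group to one number.
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "be quick");
        // Print each word with its count, in alphabetical order.
        countWords(lines).forEach(
                (word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Each lambda here is a small function passed to a stage of the computation. A Beam pipeline written in Java is likewise built by chaining transforms, although its classes and method names differ from the Streams API used in this sketch.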

Course Outline

Thinking in Big Data
  Introducing Big Data
  What is Beam?
  Introduction to MapReduce
Getting Ready for Beam
  Using Eclipse
  Using Apache Maven
  Functional Programming
Coding With Beam
  Beam Model
  Beam API Pipelines
  Beam API Processing
Avro and Beam
  Avro
  Beam and Avro
Advanced Beam
  Joins
  Beam Operations
  Side Inputs
  Unit Testing
Windowing Beam
  Windowing
  Windowing API
Beam Runners
  Possible Runners
  Choosing a Runner
Beam and Ecosystem
  Real-time Beam
  Kafka Pub/Sub
  Bigtable
  BigQuery
Conclusion

Technologies Covered

  • Apache Beam
  • Apache Hadoop
  • Apache Spark
  • Apache Kafka
