About Professional Spark Development

This course takes participants from no knowledge of Apache Spark to developing with Spark professionally. It covers the main Hadoop technologies, HDFS and MapReduce, and provides in-depth coverage of essential Big Data and Hadoop ecosystem technologies. The class ends with a look at how to architect Big Data solutions with Hadoop and its ecosystem.

Duration: 3 days

Intended Audience: Technical, Software Engineers, QA, Analysts

Prerequisites: Intermediate-Level Java

You Will Learn

  • What exists in the Big Data ecosystem so you can use the right tool for the right job.
  • How HDFS works and how to interact with it.
  • How MapReduce works and what each phase does.
  • How Spark works and what each phase does.
  • What Java 8 lambdas are and how they make your Spark code humanly readable.
  • The basics of coding a Spark job with Java to build your Big Data foundation.
  • The various API methods in Spark and what they do.
  • How SQL can be used with a Spark job and when that vastly improves your productivity and code.
  • How to write Java code that runs as a function inside a Spark SQL command, so you can reuse existing Java code or run use-case-specific queries.
  • How to process data in real-time with Spark.
  • How to integrate and use Spark with the rest of your Big Data systems.
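
To give a flavor of the Java 8 lambda point above, here is a minimal sketch that contrasts an anonymous class with the equivalent lambda and then chains lambdas into a small word count. It is an illustration only, not course material: it uses the standard java.util.stream API rather than Spark so that it runs without a Spark dependency, and the class name LambdaWordCount and its members are hypothetical. Spark's Java API accepts functional interfaces in a similar way, so the same readability gain is assumed to carry over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; plain Java streams stand in for Spark here.
public class LambdaWordCount {

    // Pre-Java 8 style: the one line of real logic is buried in an anonymous class.
    static final Function<String, List<String>> SPLIT_VERBOSE =
        new Function<String, List<String>>() {
            @Override
            public List<String> apply(String line) {
                return Arrays.asList(line.split(" "));
            }
        };

    // Java 8 lambda: the same behavior in a single expression.
    static final Function<String, List<String>> SPLIT_LAMBDA =
        line -> Arrays.asList(line.split(" "));

    // Word count as a chain of steps: split each line, flatten, group by word, count.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
            .map(SPLIT_LAMBDA)
            .flatMap(List::stream)
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "spark makes big data simple",
            "big data with spark");
        // Prints each distinct word with the number of times it occurs.
        System.out.println(countWords(lines));
    }
}
```

The split, flatten, group, and count steps read top to bottom in the order they run, which is the readability benefit the lambda bullet refers to.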

Course Outline

Professional Spark Development
Thinking in Big Data
  Introducing Big Data
  What is Hadoop?
  The Ecosystem
  Introduction to HDFS
  Introduction to MapReduce
Coding With Spark
  About Spark
  Using Eclipse
  Using Apache Maven
  Functional Programming
  Java API
  Built-In Transformations and Actions
Advanced Spark
  Advanced API
  Shuffles
  Caching
  Avro
  Spark and Avro
  Unit Testing
Spark SQL
  Spark SQL
  Spark SQL API
  Spark SQL UDFs
Spark Streaming
  Spark Streaming
  Streaming API
  Advanced Streaming
Integrating Spark
  Real-time Systems
  Using With Hadoop MapReduce
  Replacing Other Systems
Conclusion

Technologies Covered

  • Apache Spark
  • Apache Hadoop
  • Apache Kafka
