Course objectives

Introduction to Apache Spark as an effective data processing tool.

Benefits

Participants will learn and understand:

  • Apache Spark platform architecture
  • Development environment
  • Principles of building scalable applications
  • Building data flows using Apache Spark
  • Modern concepts, such as Delta Lake, that address many well-known Big Data problems

Participants will be able to:

  • Create applications using Apache Spark
  • Create applications that process large amounts of data

Training scope

  • Apache Spark – architecture
    • Historical overview
    • Solution architecture
    • Running the application
    • Monitoring
    • Troubleshooting / debugging
  • Data processing with Apache Spark
    • RDDs, DataFrames, and Datasets
    • Spark SQL
    • Joins
    • File formats
    • Data aggregation
  • Preparing the development environment to work with Apache Spark (this part is conducted in Scala)
    • Working with IntelliJ
    • Introduction to SBT
    • Passing parameters / configuration using external libraries
    • Code testing
  • Delta Lake – a format that facilitates data processing
    • Introduction to the concept
    • Writing / reading data using the Delta format
    • Key features and differences compared to classic file formats (Parquet / ORC)

Audience

Everyone involved in the software development process, including members of project teams. In particular, developers who know the basics of programming and want to learn the Scala fundamentals needed to write Apache Spark applications effectively.

Number of participants: 8-15

Duration: 2 days

Available language: PL / EN

Available course material: PL / EN

Course form

Presentation, workshop, exercises, and discussion.
