Course objectives

Introduction to data processing on the Apache Hadoop (Cloudera) platform using Apache Hive, Impala, Sqoop and HBase.

Benefits

Participants will learn and understand:

  • A practical approach to data processing on the Apache Hadoop platform
  • Apache Hadoop platform architecture
  • Working environment
  • Working with data
  • Real use cases

Participants will be able to:

  • Create data processing tasks
  • Create data-oriented systems

Training scope

  • Big Data in data engineering
  • Historical outline – basic concepts
    • Apache Hadoop – theoretical overview
    • Architectural concepts used in on-premises solutions
    • Key players and technologies
  • Apache Hive
    • Data definition, operations and aggregation
    • Performance and optimization
  • Apache Impala
    • Key elements
    • Query language
    • Efficiency
    • Integration with other frameworks
  • Apache HBase
    • NoSQL
    • Schema definition
    • Data types
    • Operations and scans
    • Filters and counters
    • Designing keys
    • Framework clients
  • Apache Sqoop
    • Importing and exporting data
    • Integration with the Hadoop ecosystem
  • Project / exercises
    • Building a comprehensive solution that integrates the above platforms

Audience

People involved in data processing in databases, data warehouses or BI, and engineers who work with different data sources and data formats on a daily basis.

Number of participants: 8-15 people

Duration: 2 days

Available language: PL / EN

Available course material: PL / EN

Course format

Presentation, workshop, exercises, discussion
