Sunday, 26 April 2020

Introduction to MESOS



Apache Mesos is an open source cluster manager designed to set up and optimize distributed systems. Mesos manages and shares resources in a fine-grained, dynamic way between different nodes and across various applications. This article covers the architecture of Mesos, its fundamentals and its support for NVIDIA GPUs.
Architecture of Mesos
Mesos consists of several elements:

Master daemon: runs on master nodes and controls the “slave daemons”.
Slave daemon (called “agent” in recent Mesos versions): runs on slave nodes and launches tasks.
Framework: better known as a “Mesos application”, it is made up of:

a scheduler, which asks the master for available resources
one or more executors, which launch applications on the slave nodes.
Offer: lists the resources available on a node, such as CPU and memory.
Task: runs on a slave node; it can be any type of application (bash script, SQL query, Hadoop job ...).
Zookeeper: coordinates the master nodes.
High availability
To avoid a SPOF (Single Point of Failure), several masters must be used: one leading master and one or more backup masters. Zookeeper, itself replicated across N nodes to form a quorum, coordinates the election of the leading master. At least 3 masters are required for high availability, so that a majority remains when one of them fails.
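The three-master minimum can be checked with a little arithmetic. The sketch below assumes a simple majority quorum (more than half of the nodes must agree); the function names are illustrative and are not part of Mesos or Zookeeper.

```python
def quorum_size(masters: int) -> int:
    """Smallest number of nodes that forms a strict majority (assumed rule)."""
    return masters // 2 + 1


def tolerated_failures(masters: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return masters - quorum_size(masters)


# Compare a few cluster sizes: two masters tolerate no failure at all,
# three masters are the smallest group that survives the loss of one.
for n in (1, 2, 3, 5):
    print(f"masters={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

Under this assumption, adding a second master does not help on its own, which is why odd group sizes such as 3 or 5 are the usual choice.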


Marathon
Marathon is a container orchestrator for Mesos that launches and supervises applications. It provides a REST API to start and stop applications.
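As an illustration, starting an application through the REST API comes down to posting a JSON application definition to Marathon. The sketch below only builds the request; the host name, the application id and the command are assumptions made for the example, and the request is not sent.

```python
import json
import urllib.request

# Hypothetical Marathon address, used for illustration only.
MARATHON_URL = "http://marathon.example.com:8080"


def build_app_request(app_id, cmd, cpus=0.1, mem=32, instances=1):
    """Build (without sending) a POST request that would create an application."""
    app = {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem, "instances": instances}
    return urllib.request.Request(
        url=f"{MARATHON_URL}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_app_request("/hello", "while true; do echo hello; sleep 5; done")
print(request.get_method(), request.full_url)
# With a reachable Marathon, the request could be sent with:
#   urllib.request.urlopen(request)
```

Stopping the application would be a similar call with the DELETE method on the application's own URL; check the API documentation of your Marathon version before relying on it.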

Chronos
Chronos is a framework for Mesos developed by Airbnb as a replacement for the standard crontab. It is a complete, distributed, fault-tolerant scheduler that facilitates the orchestration of tasks. Chronos has a REST API for creating scheduled tasks, which can also be managed from a web interface.
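Chronos jobs are described in JSON, with a schedule written as an ISO 8601 repeating interval. The sketch below builds such a job description; the job name, command and owner are made up for the example, the field set is an assumption to check against your Chronos version, and no request is sent.

```python
import json
from datetime import datetime, timezone


def iso8601_schedule(start, every_hours, repetitions=None):
    """Format 'R[n]/<start>/PT<h>H'; an empty n means 'repeat forever'."""
    count = "" if repetitions is None else str(repetitions)
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{count}/{start_utc}/PT{every_hours}H"


# Hypothetical job: run a command every 24 hours, starting at 02:00 UTC.
job = {
    "name": "nightly-report",
    "command": "echo report",
    "schedule": iso8601_schedule(datetime(2020, 4, 27, 2, 0, tzinfo=timezone.utc), 24),
    "owner": "ops@example.com",
}
print(json.dumps(job, indent=2))
```

The resulting document is what would be posted to the Chronos REST API; the exact endpoint path differs between Chronos versions, so it is deliberately left out here.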

Principle of operation



The following steps describe how a task is launched and orchestrated:

Agent 1 informs the leading master of the resources available on the slave node it runs on. The master then applies its allocation policy, which decides to offer all the available resources to framework 1.
The master sends framework 1 an offer describing the resources available on agent 1.
The framework's scheduler replies to the master with the tasks it wants to run, for example "I will run two tasks on agent 1", depending on the resources offered.
The master sends the two tasks to the agent, which allocates the resources to the two new tasks.
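The four steps above can be sketched as a small simulation. This is a toy model rather than the Mesos API: the Task and Offer classes, the first-fit scheduling rule and the resource figures are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # megabytes


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: int  # megabytes


def schedule(offer, pending):
    """Toy framework scheduler: accept tasks, in order, while they fit in the offer."""
    accepted, cpus, mem = [], offer.cpus, offer.mem
    for task in pending:
        if task.cpus <= cpus and task.mem <= mem:
            accepted.append(task)
            cpus -= task.cpus
            mem -= task.mem
    return accepted


# Step 1: agent 1 reports its free resources to the master.
agent_resources = {"agent-1": {"cpus": 4.0, "mem": 4096}}

# Step 2: the master offers all of these resources to framework 1.
offer = Offer("agent-1", **agent_resources["agent-1"])

# Step 3: the scheduler answers with the tasks it wants to run on agent 1.
pending = [Task("task-1", 2.0, 1024), Task("task-2", 1.0, 2048), Task("task-3", 2.0, 2048)]
accepted = schedule(offer, pending)

# Step 4: the master forwards the tasks; the agent allocates their resources.
for task in accepted:
    agent_resources[offer.agent]["cpus"] -= task.cpus
    agent_resources[offer.agent]["mem"] -= task.mem

print([task.name for task in accepted], agent_resources)
```

Here two of the three pending tasks fit in the offer; the third stays pending until a later offer has enough CPU for it.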
Containerizer
The containerizer is the Mesos component that launches containers; it is responsible for isolating and managing container resources.

Creation and launch of a containerizer:

The agent creates its containerizers according to the --containerizers option (for example --containerizers=docker,mesos).
To run a task in a container, the framework specifies the container type (mesos or docker) in the TaskInfo message that describes the task; otherwise the default is used. When the task does not bring its own executor, a built-in one is used:

mesos-executor -> default executor
mesos-docker-executor -> Docker executor
Types of containers:

Mesos supports different types of containers:

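Composing containerizer: combines several containerizers (for example Docker and Mesos) so that they can be used together on the same agent. It is not related to docker-compose.
Docker containerizer: manages containers using the Docker engine.
Mesos containerizer: the native containers of Mesos.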
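To make the role of the composing containerizer concrete, here is a simplified sketch of how one containerizer could be picked for a task when several are configured. It assumes that the configured order matters and that the first containerizer supporting the task's container type wins; the support table is a deliberate simplification, not Mesos source code.

```python
from typing import Optional, Sequence

# Simplified assumption: which container types each containerizer accepts.
# None stands for a task that does not specify any container type.
SUPPORTED_TYPES = {
    "docker": {"DOCKER"},
    "mesos": {"MESOS", None},
}


def pick_containerizer(configured: Sequence[str], container_type: Optional[str]) -> Optional[str]:
    """Return the first configured containerizer that accepts the container type."""
    for name in configured:
        if container_type in SUPPORTED_TYPES.get(name, set()):
            return name
    return None  # no configured containerizer can launch this task


# Value as it would be passed to the agent: --containerizers=docker,mesos
configured = "docker,mesos".split(",")
print(pick_containerizer(configured, "DOCKER"))
print(pick_containerizer(configured, None))
```

With only the Mesos containerizer configured, the same function returns None for a DOCKER task in this model, which shows why the list passed to the agent matters.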
NVIDIA GPUs and Mesos
Using GPUs with Mesos is straightforward. The agents must first be configured so that they include GPUs among the resources they report to the master. The master, in turn, must be able to include those GPUs in the offers it makes to frameworks; according to the Mesos documentation, it only sends such offers to frameworks that have opted in with the GPU_RESOURCES capability.

Tasks are launched in the same way as usual, with a GPU resource type added to the request. However, unlike CPUs, memory and disk, only whole numbers of GPUs can be requested. If a fractional quantity is requested, launching the task results in a TASK_ERROR.
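The whole-number rule can be expressed as a small validation step. The sketch below is a stand-alone illustration, not code from Mesos: the function name, the dictionary layout and the "ACCEPTED" label are assumptions, while TASK_ERROR is the status named above.

```python
def check_gpu_request(resources: dict) -> str:
    """Return 'TASK_ERROR' for a fractional (or negative) GPU count, else 'ACCEPTED'."""
    gpus = resources.get("gpus", 0)
    # CPUs may be fractional, but GPUs must be requested as whole units.
    if gpus < 0 or not float(gpus).is_integer():
        return "TASK_ERROR"
    return "ACCEPTED"


print(check_gpu_request({"cpus": 0.5, "mem": 512, "gpus": 1}))
print(check_gpu_request({"cpus": 0.5, "mem": 512, "gpus": 0.5}))
```

A framework could run a check of this kind before accepting an offer, so that invalid requests are caught before a task is ever launched.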

For the moment, only the Mesos containerizer is capable of launching tasks with NVIDIA GPUs. In practice this is not much of a limitation, because the Mesos containerizer natively supports Docker images.

In addition, Mesos reproduces the behavior of "nvidia-docker", which exposes the CUDA Toolkit to developers and data scientists: the drivers and tools needed for the GPUs are mounted directly into the container. We can therefore build and test a container locally and then deploy it easily with Mesos.

Conclusion
Mesos is a solution that allows companies to deploy and manage Docker containers while sharing the available resources of their infrastructure. In addition, thanks to the Mesos containerizer, we can run distributed deep learning workloads or share GPU resources between several users.

