Introduction to Mesos
Apache Mesos is an open source cluster manager designed to run and
optimize distributed systems. It shares resources in a fine-grained, dynamic
way across nodes and between applications. This article covers the architecture
of Mesos, its core concepts and its support for NVIDIA GPUs.
Architecture of Mesos
Mesos consists of several elements:
Master daemon: runs on the master nodes and controls the slave daemons.
Slave daemon: runs on the slave nodes (called agents since Mesos 1.0) and launches tasks.
Framework: better known as a "Mesos application", it is made up of:
  a scheduler, which registers with the master to be offered available resources
  one or more executors, which launch the application's tasks on the slave nodes.
Offer: lists the resources available on a node, such as CPU and memory.
Task: runs on a slave node; it can be any type of application (bash script, SQL query, Hadoop job ...).
ZooKeeper: coordinates the master nodes.
High availability
To avoid a SPOF (Single Point of Failure), several masters must be used:
one leading master and a number of backup masters. The masters rely on
ZooKeeper, which is itself replicated across N nodes to form a quorum and
which coordinates the election of the leading master. At least 3 masters are
required for high availability.
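The "at least 3" figure follows from majority arithmetic. The sketch below assumes a simple majority quorum (more than half of the nodes); the function names are illustrative and not part of Mesos or ZooKeeper.

```python
def quorum_size(masters: int) -> int:
    """Smallest majority among `masters` nodes (assumes a majority quorum)."""
    if masters < 1:
        raise ValueError("at least one master is required")
    return masters // 2 + 1


def tolerated_failures(masters: int) -> int:
    """Number of masters that can fail while a majority is still reachable."""
    return masters - quorum_size(masters)


# Print: number of masters, quorum size, failures tolerated.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With one or two masters no failure can be tolerated, which is why three is the smallest useful size under this assumption.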
Marathon
Marathon is a container orchestrator for Mesos that lets you launch
applications. It exposes a REST API for starting and stopping applications.
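As a rough illustration of that REST API, the sketch below builds (but does not send) the HTTP requests that would create and remove an application through Marathon's /v2/apps endpoint. The server address, application id and command are hypothetical; removing the application is only one way to stop it.

```python
import json
import urllib.request

MARATHON_URL = "http://marathon.example.com:8080"  # hypothetical address


def build_start_request(app_id, cmd, cpus=0.1, mem=32, instances=1):
    """Build a POST /v2/apps request describing the application to start."""
    app = {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem, "instances": instances}
    return urllib.request.Request(
        url=f"{MARATHON_URL}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_stop_request(app_id):
    """Build a DELETE /v2/apps/<id> request, which removes the application."""
    return urllib.request.Request(
        url=f"{MARATHON_URL}/v2/apps/{app_id.lstrip('/')}",
        method="DELETE",
    )


start = build_start_request("/hello", "while true; do echo hello; sleep 5; done")
stop = build_stop_request("/hello")
print(start.get_method(), start.full_url)
print(stop.get_method(), stop.full_url)
# Sending is left out on purpose: urllib.request.urlopen(start) would
# contact the (hypothetical) Marathon server.
```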
Chronos
Chronos is a framework for Mesos developed by Airbnb as a replacement
for the standard cron. It is a distributed, fault-tolerant scheduler that
facilitates the orchestration of jobs. Chronos offers a REST API and a web
interface for creating scheduled jobs.
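Where a crontab line uses five time fields, a Chronos job describes its schedule as an ISO 8601 repeating interval (repetitions, start time, period). The sketch below builds such a job description; the job name, owner and command are hypothetical, and the REST endpoint that would receive the JSON is not shown because its path depends on the Chronos version.

```python
import json
from datetime import datetime, timezone


def iso8601_schedule(start, period, repetitions=None):
    """Return an ISO 8601 repeating interval: R[n]/<start>/<period>.

    An omitted repetition count means "repeat forever".
    """
    count = "" if repetitions is None else str(repetitions)
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{count}/{stamp}/{period}"


job = {
    "name": "nightly-report",        # hypothetical job name
    "command": "echo report",        # hypothetical command
    "owner": "ops@example.com",      # hypothetical owner
    # Every day at 02:00 UTC, roughly the crontab line "0 2 * * *".
    "schedule": iso8601_schedule(
        datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc), "P1D"
    ),
}
print(json.dumps(job, indent=2))
```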
Principle of operation
The following steps explain how a task is launched and orchestrated:
1. Agent 1 informs the leading master of the resources available on the slave node it runs on. The master then applies its allocation policy, which in this example decides to offer all the available resources to framework 1.
2. The master sends framework 1 an offer describing the resources available on agent 1.
3. The framework's scheduler replies to the master with the tasks it wants to run, for example "I will run two tasks on agent 1", depending on the resources offered.
4. The master sends the two tasks to the agent, which allocates the resources to the two new tasks.
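The exchange described above can be mimicked with a toy, in-memory model. This is not the Mesos API: all class and method names are made up for the illustration, and the allocation policy is simply "offer everything the agent has free".

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    cpus: float
    mem: int  # megabytes

    def fits(self, other):
        """True if `other` can be carved out of these resources."""
        return other.cpus <= self.cpus and other.mem <= self.mem

    def minus(self, other):
        return Resources(self.cpus - other.cpus, self.mem - other.mem)


@dataclass
class Task:
    name: str
    needs: Resources


@dataclass
class Agent:
    name: str
    free: Resources
    running: list = field(default_factory=list)

    def launch(self, task):
        # The agent allocates resources to the new task.
        if not self.free.fits(task.needs):
            raise RuntimeError(f"{task.name} does not fit on {self.name}")
        self.free = self.free.minus(task.needs)
        self.running.append(task.name)


class Scheduler:
    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offer(self, offered):
        # The scheduler picks the pending tasks that fit in the offer.
        accepted, remaining = [], offered
        for task in list(self.pending):
            if remaining.fits(task.needs):
                accepted.append(task)
                remaining = remaining.minus(task.needs)
                self.pending.remove(task)
        return accepted


class Master:
    def offer_cycle(self, agent, scheduler):
        offer = agent.free                        # agent reports; offer everything
        tasks = scheduler.resource_offer(offer)   # offer sent, scheduler replies
        for task in tasks:                        # tasks forwarded to the agent
            agent.launch(task)
        return tasks


agent = Agent("agent-1", Resources(cpus=4.0, mem=4096))
scheduler = Scheduler([
    Task("task-1", Resources(1.0, 1024)),
    Task("task-2", Resources(2.0, 2048)),
    Task("task-3", Resources(2.0, 2048)),
])
launched = Master().offer_cycle(agent, scheduler)
print([t.name for t in launched], agent.free)
```

In this run two tasks fit in the offer and the third stays pending until a later offer.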
Containerizer
The containerizer is the Mesos component that launches containers; it is
responsible for isolating containers and managing their resources.
Creation and launch of a containerizer:
The agent creates a containerizer according to its --containerizers option.
If the TaskInfo of a task does not specify an executor, the agent uses the default executor, which depends on the containerizer in use:
mesos-executor -> default executor (Mesos containerizer)
mesos-docker-executor -> Docker executor (Docker containerizer)
Types of containerizers:
Mesos supports different types of containerizers:
Composing containerizer: combines several containerizers (for example docker and mesos) so that they can be used together on the same agent.
Docker containerizer: manages containers using the Docker engine.
Mesos containerizer: the native container runtime of Mesos.
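The selection logic can be summarized with a small sketch. It assumes that the agent option takes a comma-separated list such as "docker,mesos" and that listing several containerizers results in a composing containerizer; the function names are illustrative and do not exist in Mesos.

```python
# Default executor per containerizer, as listed above.
DEFAULT_EXECUTORS = {
    "mesos": "mesos-executor",
    "docker": "mesos-docker-executor",
}


def select_containerizer(flag_value):
    """Map the value of the containerizers option to a containerizer type."""
    names = [n.strip() for n in flag_value.split(",") if n.strip()]
    unknown = [n for n in names if n not in DEFAULT_EXECUTORS]
    if not names or unknown:
        raise ValueError(f"unsupported containerizer list: {flag_value!r}")
    # A single name is used directly; several names are combined.
    return names[0] if len(names) == 1 else "composing"


def default_executor(containerizer):
    """Executor used when the task does not specify one."""
    return DEFAULT_EXECUTORS[containerizer]


print(select_containerizer("mesos"), default_executor("mesos"))
print(select_containerizer("docker,mesos"))
```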
NVIDIA GPUs and Mesos
Using GPUs with Mesos is straightforward. The agents must first be
configured so that they include GPUs in the resources they report to the
master. The masters must likewise be set up so that they include GPUs in the
resource offers they send to frameworks.
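On the agent side, the configuration comes down to a few command-line flags. The sketch below only assembles such a command line and does not run it; the master address, work directory and device ids are hypothetical, and the exact set of isolators needed depends on the Mesos version and on whether Docker images are used.

```python
import shlex


def gpu_agent_command(master, work_dir, gpu_devices=None):
    """Assemble a mesos-agent command line with NVIDIA GPU isolation enabled."""
    args = [
        "mesos-agent",
        f"--master={master}",
        f"--work_dir={work_dir}",
        "--isolation=cgroups/devices,gpu/nvidia",
    ]
    if gpu_devices:
        # Restrict the agent to specific GPUs; the resource count must match.
        ids = ",".join(str(d) for d in gpu_devices)
        args.append(f"--nvidia_gpu_devices={ids}")
        args.append(f"--resources=gpus:{len(gpu_devices)}")
    return args


command = gpu_agent_command("127.0.0.1:5050", "/var/lib/mesos", gpu_devices=[0, 1])
print(shlex.join(command))
```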
Tasks are launched in the usual way, with an additional GPU resource
type (gpus). However, unlike CPUs, memory and disk, GPUs can only be requested
in whole numbers. If a fractional quantity is requested, launching the task
results in a TASK_ERROR.
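The whole-number rule can be expressed as a small check. This is an illustrative sketch of the rule only, not code from Mesos; the function name is hypothetical.

```python
def gpu_request_error(requested):
    """Return "TASK_ERROR" for an invalid GPU quantity, otherwise None."""
    if requested < 0 or requested != int(requested):
        return "TASK_ERROR"
    return None


for quantity in (1, 2.0, 0.5):
    print(quantity, gpu_request_error(quantity))
```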
For the moment, only the Mesos containerizer is capable of launching
tasks with NVIDIA GPUs. In practice this is rarely a limitation, because the
Mesos containerizer natively supports Docker images.
In addition, Mesos mimics the behavior of nvidia-docker, which exposes
the CUDA Toolkit to developers and data scientists: the drivers and tools
required by the GPUs are mounted directly into the container. We can therefore
build a container locally and deploy it easily with Mesos.
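A GPU task can be submitted, for instance, with the mesos-execute command-line tool. The sketch below only assembles the command line and does not run it; the master address, task name, image and command are example values, and the flags are given under the assumption of a Mesos version with NVIDIA GPU support.

```python
import shlex


def gpu_task_command(master, name, image, command, gpus=1):
    """Assemble a mesos-execute command line requesting whole GPUs."""
    if gpus < 1 or gpus != int(gpus):
        raise ValueError("GPUs must be requested in whole numbers")
    return [
        "mesos-execute",
        f"--master={master}",
        f"--name={name}",
        f"--docker_image={image}",
        f"--command={command}",
        # The framework has to opt in to receiving GPU resources.
        "--framework_capabilities=GPU_RESOURCES",
        f"--resources=gpus:{int(gpus)}",
    ]


args = gpu_task_command("127.0.0.1:5050", "gpu-test", "nvidia/cuda", "nvidia-smi")
print(shlex.join(args))
```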
Conclusion
Mesos is a solution that allows companies to deploy and manage Docker
containers while sharing the available resources of their infrastructure. In
addition, thanks to the Mesos containerizer, we can run deep learning workloads
in a distributed way or share GPU resources between several users.