
IME Internship-Burst Buffer for Hadoop based Big Data and AI applications

Posted 06 Feb.

Location: Meudon-la-Forêt, Paris · Contract: Internship · Salary: €1,200 per month

Company: DDN Storage

DDN Storage is the world leader in high performance and massively scalable data management and storage solutions that accelerate business results and scientific insights for data-centric organizations worldwide. Our unified, end-to-end platform uniquely addresses the tiered storage and large scale data management demands of mixed workloads, multiple collaborative data centres and Web and cloud environments. Across traditional and commercial high performance markets, customers rely on DDN Storage to solve the most demanding big data problems in industries such as cloud, online content and social networking, security and intelligence, life sciences, finance, energy and media production.

DDN’s Infinite Memory Engine (IME) is a scale-out, software-defined, flash storage platform that streamlines the data path for application I/O. IME interfaces directly to applications and secures I/O via a data path that eliminates file system bottlenecks.

Job description


DDN provides the world’s leading storage system solutions. As part of its product line, DDN develops the Infinite Memory Engine (IME), a new generation of burst buffer that provides a low-latency, high-bandwidth and highly scalable parallel file system [1].

Whilst the adoption of IME for High Performance Computing (HPC) workloads is clearly established and IME now powers some of the largest supercomputers [2], there is a strong and growing demand for it in Big Data and Artificial Intelligence (AI) workloads.

At DDN, we believe that these Big Data and AI workloads may benefit from IME’s data acceleration.

The internship will start with a learning period on burst buffer technology and the IME product. With the help of the DDN team, the intern will then set up an Apache Hadoop cluster using IME as the backend file system, replacing the Hadoop Distributed File System (HDFS).
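Hadoop can be pointed at a shared POSIX file system without code changes by overriding its default file-system URI. A minimal core-site.xml sketch of that setup, assuming (hypothetically) that the IME POSIX/FUSE client is mounted at /mnt/ime on every node, could look like:

```xml
<!-- core-site.xml: use a shared POSIX mount instead of HDFS.
     /mnt/ime is a hypothetical mount point for the IME FUSE client. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/ime/hadoop-tmp</value>
  </property>
</configuration>
```

With fs.defaultFS set to file:///, Hadoop uses its built-in LocalFileSystem over the shared mount, so MapReduce jobs read and write through the POSIX client rather than HDFS DataNodes.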

Once the cluster is ready, the intern will run a variety of Big Data and AI applications to characterize their performance on IME compared to HDFS.
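Before running full applications, raw write throughput through the POSIX client can be sanity-checked with a simple dd run. A sketch, in which the mount point is a placeholder assumption:

```shell
# Hypothetical mount point of the IME POSIX (FUSE) client; defaults to a
# local directory so the script also runs on a plain workstation.
MOUNT=${MOUNT:-/tmp/ime-mount}
mkdir -p "$MOUNT"

# Write 64 MiB of zeros; conv=fsync forces the data out to storage so the
# reported time covers the actual write, not just the page cache.
dd if=/dev/zero of="$MOUNT/throughput-test" bs=1M count=64 conv=fsync 2> dd.log

cat dd.log            # GNU dd reports bytes copied, elapsed time and MB/s
rm -f "$MOUNT/throughput-test"
```

Running the same command against an HDFS FUSE mount (or using HDFS-native tools) gives a first rough point of comparison before the real benchmarks.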

Finally, the intern will study the potential performance gain of an IME plugin for Hadoop, giving applications even faster access to the storage system. If this study is convincing, the student will develop the plugin and push it upstream to the Hadoop community.

The internship’s goals are the following:

1/ Learn how to deploy a Hadoop cluster and how to use IME;

2/ Performance study of Big Data and AI workloads using reference benchmarks and applications;

3/ Performance comparison and characterization between HDFS and IME as backend file-systems of Hadoop;

4/ Study the feasibility of an IME plugin to the Apache Hadoop distribution, develop it and push it upstream [3].
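For goals 2 and 3, the comparison can reuse the standard benchmarks shipped with Hadoop, such as TeraGen/TeraSort from the hadoop-mapreduce-examples jar. A hedged driver sketch, assuming HADOOP_HOME is set on the cluster and using placeholder /bench paths:

```shell
#!/bin/sh
# Generate data with TeraGen and time the TeraSort phase. The row count and
# the /bench paths are assumptions; adjust them for the actual cluster.
ROWS=${ROWS:-1000000}                 # 100-byte rows, so roughly 100 MB

IN=/bench/terasort-in
OUT=/bench/terasort-out

if command -v hadoop >/dev/null 2>&1; then
  EXAMPLES="$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar
  hadoop jar $EXAMPLES teragen  "$ROWS" "$IN"
  time hadoop jar $EXAMPLES terasort "$IN" "$OUT"
  STATUS=done
else
  echo "hadoop not found on PATH; run this on a cluster node" >&2
  STATUS=skipped
fi
```

Running the same script once with HDFS and once with the IME-backed configuration, at identical row counts, yields directly comparable sort times.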

Finally (and optionally, depending on the time available), the internship will address two additional points:

1/ Study the performance of IME in virtual (QEMU) or containerized (Docker) environments.

2/ Develop a native plugin for QEMU and Docker to directly interact with the storage file system and avoid the extra overhead of the POSIX interface.

The DDN R&D team will provide support all along the way. The intern will also have full access to all the hardware resources needed to successfully complete the internship.

[1] Next Platform, “Memory-Like Storage Means File Systems Must Change”:


[2] Virtual Institute for I/O: “IO-500 List – November 2018”: https://www.vi4io.org/io500/list/18-11/full

[3] Apache Hadoop Github Repository: https://github.com/apache/hadoop

Diplomas / Training

Master’s level in Computer Science

Required skills

- Basic knowledge of Hadoop

- C and/or Java development

- UNIX / Linux environment

- Scripting language: Shell or Python

Desired profile

As above.

To apply:

Applications can be made directly to bbrown@ddn.com
Please provide an English version of your CV.