

Posted on 28 Nov.

Location: CEA Saclay · Contract: Internship · Compensation: From 850 euros/month

Company: Commissariat à l'Energie Atomique (CEA)

The Commissariat à l'Energie Atomique (CEA) is a major European player in research, development and innovation. This technological research organisation works in four main areas: energy, information technologies, health and defence. Located in the southern Île-de-France region on the Saclay campus, the Laboratoire d'Intégration des Systèmes et des Technologies (LIST) designs intelligent digital systems. Each year its 750 staff support 200 French and foreign companies on applied research projects in four areas: Advanced Manufacturing, Embedded Systems, Data Intelligence, and Radiation Control for Health. The CEA LIST teams also partner with many university laboratories, grandes écoles and other research organisations through collaborative research projects. They are an integral part of DIGITEO LABS, a science campus bringing together more than 1200 researchers specialising in information technologies (CEA, INRIA, CNRS, Supélec, Ecole Polytechnique, Université d'Orsay). Within this institute, the SID (Service Intelligence des Données) develops user-oriented Artificial Intelligence solutions for decision support and for the automatic supervision of complex systems.

Job description

Data streams are present in more and more applications and domains where dynamism and speed truly matter. In practice, these streams are dynamic data flows that come from different sources and whose content evolves over time. Research has been done on the subject and many techniques for stream mining have emerged [5]. These algorithms usually sample the data stream in some way and process the samples incrementally or online. Despite the results these techniques provide, the flow of data is underutilized, which can lead both to a loss of useful information and to forgetting what has previously been discovered. Moreover, the complexity of current and near-future digital applications is constantly increasing due to a combination of factors, including the large number of sources, the non-linearity of certain processes, the distribution of knowledge and control, response-time requirements, the strong dynamics of the environment and the unpredictability of interactions. This complexity opens new research challenges in the generation and processing (learning) of these streams, especially in distributed, heterogeneous and collaborative environments. Existing techniques generally lack the means for collaborating on, negotiating, sharing or validating data streams in such heterogeneous environments. Multi-Agent Systems (MAS) have been shown to be an appropriate technology for dealing with these issues. Their principles enable some of these features; however, work remains to adapt them to the characteristics of data streams.

The aim of this internship is to use agents on data streams to address the challenges described above. Four main work blocks have been tentatively identified: (i) managing non-synchronised data streams from different sources, (ii) researching distributed online learning algorithms, (iii) increasing the robustness of online learning models that deal with such streams (making them reliable under unexpected environment changes) and (iv) defining new metrics to evaluate the needs mentioned above. This work will rely on and improve the STREAMER framework that already exists in the lab. STREAMER is a cutting-edge data stream processing (Complex Event Processing) platform devoted to analyzing sequential data for electrical or industrial systems. The platform, which is still under development, already includes independent modules that each provide their own functionality, such as a graphical interface, machine learning algorithms and communication utilities.

This internship may be extended into a three-year PhD.

Candidate profile

We are looking for a candidate with:
• An engineering degree or Master 2 studies, preferably in computer science.
• Strong programming skills (Java, R, Python).
• Proficiency in spoken English.
• French and/or Spanish is a plus.
• Knowledge of machine learning is a plus.
• Knowledge of multi-agent systems is a plus.
• Familiarity with InfluxDB [1], Kafka [2] and Redis [3] is a plus.

See the attached file