Location: · Contract: Internship · Compensation: Negotiable
What You’ll do:
You will apply statistics concepts such as confidence intervals, point estimates, and sample size to draw sound, confident inferences from data and A/B tests.
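As an illustration of the kind of inference described above, the sketch below computes a 95% confidence interval for the difference in conversion rates between two A/B-test groups using the normal approximation. The function name, group sizes, and conversion counts are hypothetical illustration values, not part of this role's actual tooling.

```python
import math

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """Normal-approximation 95% CI for p_b - p_a (two proportions).

    conv_a/conv_b: conversion counts; n_a/n_b: group sizes.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical A/B test: 2,400 users per group, 120 vs. 156 conversions
low, high = diff_ci(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"95% CI for lift: [{low:.4f}, {high:.4f}]")
```

If the interval excludes zero, the observed lift is statistically significant at roughly the 5% level; sample-size planning works the same machinery in reverse, choosing n so the interval is narrow enough to detect the lift you care about.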
You will use and manipulate large data sets to design algorithmic and machine learning solutions as well as provide business insights.
You will apply solid coding skills, strong analytical and innovative thinking, and machine learning expertise to quickly learn new domains and turn innovative ideas into working solutions.
You will also communicate complex analytical topics in a clean & simple way to multiple partners and senior leadership (both internal & external).
Within Data Science, we have three different track roles, with multiple openings in each of the following areas.
Machine Learning Scientist Intern: Role typically includes building feature pipelines, prototyping new machine learning models, and evaluating algorithm performance both offline and via A/B tests.
Machine Learning Engineer Intern: The ideal candidate enjoys building ML systems and cares about software engineering principles like CI/CD and code stability while productionizing machine learning models.
Statistician Intern: Candidates typically have an Operations Research or Statistics background, care about defining measurements for algorithm performance in the wild, and help define and implement core e-commerce concepts like customer lifetime value and marketing attribution.
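For a sense of the e-commerce concepts mentioned above, here is a minimal sketch of one common customer lifetime value formulation (average order value × purchase frequency × expected lifespan). The function and all input values are hypothetical illustrations; real CLV definitions vary by business.

```python
def simple_clv(avg_order_value, orders_per_year, retention_years):
    """Toy CLV: average order value * orders/year * expected years retained."""
    return avg_order_value * orders_per_year * retention_years

# Hypothetical customer: $45 average order, 4 orders/year, retained 3 years
clv = simple_clv(avg_order_value=45.0, orders_per_year=4, retention_years=3)
print(f"Estimated CLV: ${clv:.2f}")  # 45 * 4 * 3 = 540.00
```

In practice each input is itself estimated from data (e.g., retention from survival models), which is where the statistical work in this track comes in.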
Who You are:
You are currently pursuing a master’s or PhD degree in quantitative fields such as: Computer Science (with focus in areas like Artificial Intelligence, Machine Learning, Natural Language Processing, Data Mining, Data Science), Mathematics, Statistics, Operations Research, Electrical & Computer Engineering
Graduating in 2021
You have a proven theoretical understanding of various machine learning topics such as regression, Naïve Bayes, decision trees, random forests, SVMs, and neural networks.
We would like to see experience with statistical computing environments such as R, Python (pandas, scikit-learn), and SparkML.
You should have strong knowledge of and experience with one or more database technologies, including SQL and other relational databases, NoSQL, and time-series databases.
You have an understanding of distributed file systems, scalable datastores, distributed computing, and related technologies (Spark, Hadoop, etc.), as well as implementation experience with MapReduce techniques, in-memory data processing, etc.
You are familiar with cloud computing, specifically AWS, in a distributed computing context.
Must-have: Scala and/or Python, SQL
Nice-to-have: Java, R, C++
Data Science Technologies:
A few of these: Spark/PySpark, MLlib, TensorFlow, Keras, PyTorch, Caffe, Python ML libraries (pandas, Matplotlib, SciPy, scikit-learn, NumPy, etc.)
Nice-to-have: Hive, Hadoop, Microsoft SQL Server