Goal of the project:
The deliverable of this project is a model lifecycle management tool. Our data scientists need a tool that supports the creation, management, and monitoring of their models. The tool will therefore contain a model factory that allows swift model creation, alongside model management and model monitoring capabilities.
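For illustration only, a "model factory" of this kind is often built around a simple registry pattern: model types are registered under a name, and new instances are created on demand. The sketch below assumes nothing about the actual deliverable; `ModelFactory`, `register`, and `create` are hypothetical names.

```python
# Minimal sketch of a registry-based model factory (illustrative only;
# ModelFactory, register and create are hypothetical names, not part
# of the actual deliverable).
class ModelFactory:
    def __init__(self):
        self._registry = {}

    def register(self, name, builder):
        # Associate a model name with a callable that builds a fresh instance.
        self._registry[name] = builder

    def create(self, name, **params):
        # Look up the builder and construct a new model instance.
        if name not in self._registry:
            raise KeyError(f"Unknown model type: {name}")
        return self._registry[name](**params)


factory = ModelFactory()
factory.register("mean_baseline", lambda: {"type": "mean_baseline"})
model = factory.create("mean_baseline")
```

A design like this keeps model creation in one place, which also makes management and monitoring hooks easier to attach later.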
Requirements specific to this project:
In-depth programming skills in Python and an eagerness to learn within this new environment are crucial.
We need someone who is flexible and can operate in a frequently changing environment that is still under construction. A short time to market, short release intervals, and frequent demos at the end of each sprint require a strong sense of urgency while working towards a future-proof solution. The candidate captures feedback on a frequent basis and incorporates it into the design of technical solutions. Specific knowledge of Big Data tools and technologies such as Hadoop, Spark, and object-oriented development is required, together with an understanding of how to apply them to solve big data problems and to develop innovative big data solutions.
1. Analysing, designing, developing, and testing new and existing projects in Python
2. Developing and designing a framework with the focus on efficiency (starting from a prototype)
3. Incorporating enterprise standards into new software projects (CI/CD pipelines, unit and integration testing, and versioning)
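As a concrete illustration of the unit-testing standard mentioned above, a CI/CD pipeline would typically run tests like the following on every build. The function and test names are hypothetical examples, not part of the project's codebase.

```python
# Illustrative unit test in the style a CI/CD pipeline would run
# automatically (function and test names are hypothetical examples).
def normalise(values):
    """Scale a list of numbers so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalise an all-zero list")
    return [v / total for v in values]


def test_normalise_sums_to_one():
    result = normalise([2, 3, 5])
    assert abs(sum(result) - 1.0) < 1e-9
    assert result == [0.2, 0.3, 0.5]


test_normalise_sums_to_one()  # a test runner such as pytest would discover this automatically
```

In a pipeline, a test runner executes such tests on every commit, so regressions are caught before a release rather than during a sprint demo.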
· Big Data Framework / Hadoop: Cluster, HDFS, YARN and MapReduce
· Apache Spark
· Big Data Framework / Hadoop: Hive
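As a concept check on the MapReduce model listed above, the two phases can be illustrated with a tiny in-process word count. Plain Python stands in for a real distributed Hadoop job here; this is a sketch of the idea only.

```python
from collections import defaultdict

# Toy in-process illustration of the MapReduce model. A real job would
# run distributed across a Hadoop cluster; this only shows the two phases.
def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_phase(pairs):
    # Reduce: sum the emitted counts per key (word).
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


word_counts = reduce_phase(map_phase(["big data", "Big problems"]))
# word_counts == {"big": 2, "data": 1, "problems": 1}
```

The same map/reduce decomposition underlies Hadoop MapReduce jobs and, in a more general form, Spark transformations such as `map` and `reduceByKey`.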
4. Extra technical experience that can help:
o Data science & tools: H2O, scikit-learn, Jupyter notebooks
o CI/CD tools: PyBuilder, Anaconda, Jenkins, Nexus
5. Development methodologies:
· Agile: SAFe