About the role:
Responsible for managing and engineering the “data” captured or generated for the training of “AI” modules. This includes the delivery and management of “AI” models in test and production. Data management includes capturing, transporting, storing and indexing data to facilitate data access for “AI” training. Data engineering includes applying various algorithms and data engineering techniques to prepare, transform and handle the data to be delivered to “AI” research teams and training engineers or systems. “AI” model delivery and management includes designing and implementing the infrastructure for storing, versioning, deploying and serving “AI” models.
What we need your help with:
- Create and maintain optimal data pipeline architecture;
- Ensure optimal data structuring for fast data retrieval;
- Coordinate (with the relevant departments/teams) the design and implementation of computational and communication infrastructure for data capturing, transport, storage and indexing;
- Coordinate (with the relevant departments/teams) the design and implementation of “AI” training infrastructure, model storage, versioning and delivery;
- Coordinate with the relevant departments to ensure data security and integrity standards are met;
- Ensure data is handled according to company policies, security standards and ethics;
- Design and coordinate (with the relevant departments/teams) the implementation of software infrastructure required for optimal extraction, transformation (data engineering), and loading of data from a wide variety of data sources;
- Design and implement internal process improvements, such as automating processes, optimizing data delivery and re-designing infrastructure for greater scalability;
- Design and coordinate (with the relevant departments/teams) the tools for data annotation and labelling;
- Assemble large, complex data sets that meet functional and non-functional business and technical requirements;
- Coordinate with the relevant stakeholders to improve the data sets through augmentation and synthetic data generation;
- Coordinate with the relevant stakeholders to get access to data and to communication and storage infrastructure;
- Ensure structure and standardisation in the process of engineering data;
- Manage and coordinate the activity of a team of data engineers;
- Mentor and coach other team members in the “art” of data engineering;
- Provide administrative support as needed;
- Perform other duties as assigned.
Technical skills:
- Database technologies: SQL-based (e.g. PostgreSQL and MySQL); NoSQL (e.g. Cassandra and MongoDB);
- Programming/Scripting languages: Python, C/C++, JavaScript, Linux Bash;
- Linux OS;
- Machine learning;
- Big data tools (e.g. Hadoop, Spark, Kafka, Elasticsearch);
- Data mining and data modelling tools;
- Statistical analysis and modelling;
- Data lake, data warehousing and data mart solutions;
- Data pipeline and workflow management tools;
- Stream-processing systems.
Of course, everyone’s career has a different path and you may not have all of the skills and/or experience listed above.
However, if you feel you might fit the job, feel free to prove your point and contact us.
If you're in any doubt whether to apply, or you have unanswered questions (after all, an enquiring mind is what we're looking for), please e-mail Diana at diana.zimbru@adrya.ro and she'll get back to you ASAP.