Job Description
Roles and Responsibilities:
- Develop and automate large-scale, high-performance data processing systems (batch and/or streaming) to drive business growth across the Ola group and enhance the product experience.
- Advocate for high-quality software engineering practices in building scalable data infrastructure and pipelines.
- Lead data engineering projects to ensure the reliability, efficiency, testability, and maintainability of pipelines.
- Design data models for optimal storage, retrieval, and alignment with critical product and business requirements.
- Architect logging practices that effectively support data flows, and promote best practices where needed.
- Contribute to shared Data Engineering tooling and standards to enhance the productivity and quality of output for Data Engineers company-wide.
- Collaborate with leadership, engineers, program managers, and data scientists to understand data requirements.
- Educate partners, drawing on data and analytics experience to identify and close gaps in existing logging and processes.
- Collaborate with stakeholders to establish data lineage, data governance, and data cataloging.
- Lead projects using agile methodologies.
- Communicate effectively with individuals at all levels within the organization.
- Recruit, retain, and develop personnel to handle greater responsibilities and challenges.
Experience & Skills:
- 10+ years of relevant industry experience.
- At least 6 years of experience managing teams of 10 or more members.
- Extensive experience (7+ years) in custom ETL design, implementation, and maintenance.
- 5+ years of experience with workflow management engines such as Airflow or Dagster.
- Proficiency in relational databases and SQL query authoring.
- Experience with Java, Scala, or Spark is preferred.
- Hands-on experience with data at the petabyte scale.
- Experience in designing, building, and operating robust distributed systems.
- Proficient in designing and deploying high-performance systems with reliable monitoring and logging practices.
- Ability to work across team boundaries to establish overarching data architecture and provide guidance to individual teams.
- Expertise in Amazon Web Services (AWS) or other relevant cloud infrastructure solutions such as Microsoft Azure or Google Cloud.
- Experience in managing and deploying containerized environments using Docker with Mesos or Kubernetes is a plus.
- Familiarity with managing projects using Scrum methodology.
Tech Stack:
- Processing & streaming: Spark (batch and Structured Streaming), Flink, Apache Beam, Kafka, Hadoop/HDFS, EMR, Cloudera stack
- Orchestration & ingestion: Airflow, Oozie, Apache NiFi, Apache Flume, Debezium (CDC), Maxwell, Secor, dbt
- Storage, table & file formats: Hive/HiveQL, HBase, Hudi, Iceberg, Delta Lake (lakehouse architecture); Parquet, ORC, Avro
- Query & OLAP engines: Presto, Trino, Hue, Druid, Apache Pinot, ClickHouse
- Infrastructure: Kubernetes (K8s), Fabric
Educational Qualifications:
- Bachelor’s or Master’s degree in Engineering or related technical discipline (from premier institutes preferred).