Data Engineer – Scala/Spark – Robosoft Technologies

Job Description

Roles and Responsibilities:

  • Develop and automate large-scale, high-performance data processing systems (batch and/or streaming) to drive Ola group business growth and enhance the product experience; a minimal batch sketch follows this list.
  • Advocate for high-quality software engineering practices in building scalable data infrastructure and pipelines.
  • Lead data engineering projects to ensure the reliability, efficiency, testability, and maintainability of pipelines.
  • Design data models for optimal storage, retrieval, and alignment with critical product and business requirements.
  • Architect logging practices that effectively support the data flow, and promote best practices where necessary.
  • Contribute to shared Data Engineering tooling and standards to enhance the productivity and quality of output for Data Engineers company-wide.
  • Collaborate with leadership, engineers, program managers, and data scientists to understand data requirements.
  • Educate partners by leveraging data and analytics experience to identify and address gaps in existing logging and processes.
  • Collaborate with stakeholders to establish data lineage, data governance, and data cataloging.
  • Lead projects using agile methodologies.
  • Communicate effectively with individuals at all levels within the organization.
  • Recruit, retain, and develop personnel to handle greater responsibilities and challenges.
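
As a concrete illustration of the batch pipeline work described above, here is a minimal Scala/Spark ETL sketch. The bucket, paths, and column names (event_id, event_ts) are hypothetical placeholders for illustration, not a prescribed implementation.

    // Minimal batch ETL sketch: read raw events, deduplicate, write partitioned Parquet.
    // All paths and column names are hypothetical.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object RideEventsDaily {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ride-events-daily")
          .getOrCreate()

        // Read one day of raw events (hypothetical S3 location).
        val raw = spark.read.parquet("s3://example-bucket/raw/ride_events/dt=2024-01-01/")

        // Drop duplicate events and derive a date column for partitioning.
        val cleaned = raw
          .dropDuplicates("event_id")
          .withColumn("event_date", to_date(col("event_ts")))

        // Write partitioned Parquet so downstream queries can prune by date.
        cleaned.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-bucket/curated/ride_events/")

        spark.stop()
      }
    }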

Experience & Skills:

  • Over 10 years of relevant industry experience.
  • At least 6 years of experience managing teams of 10 or more members.
  • Extensive experience (7+ years) in custom ETL design, implementation, and maintenance.
  • 5+ years of experience with workflow management engines like Airflow, Dagster, etc.
  • Proficiency in relational databases and SQL query authoring (see the Spark SQL sketch after this list).
  • Experience with Java/Scala/Spark preferred.
  • Hands-on experience with data at the petabyte scale.
  • Experience in designing, building, and operating robust distributed systems.
  • Proficient in designing and deploying high-performance systems with reliable monitoring and logging practices.
  • Ability to work across team boundaries to establish overarching data architecture and provide guidance to individual teams.
  • Expertise in Amazon Web Services (AWS) and/or other relevant Cloud Infrastructure solutions such as Microsoft Azure or Google Cloud.
  • Experience managing and deploying containerized environments using Docker and Mesos/Kubernetes is a plus.
  • Familiarity with managing projects using scrum methodology.
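
For the SQL authoring point above, a short Spark SQL sketch in the same Scala register; the view, table layout, and column names are hypothetical and reuse the curated layout from the earlier sketch.

    // Spark SQL sketch: register the curated data as a view and aggregate it.
    // Table layout and column names are hypothetical.
    import org.apache.spark.sql.SparkSession

    object DailyRideStats {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-ride-stats")
          .getOrCreate()

        spark.read.parquet("s3://example-bucket/curated/ride_events/")
          .createOrReplaceTempView("ride_events")

        // Rides per city per day; partition pruning applies when filtering on event_date.
        val stats = spark.sql(
          """SELECT event_date, city, COUNT(*) AS rides
            |FROM ride_events
            |GROUP BY event_date, city
            |ORDER BY event_date, rides DESC""".stripMargin)

        stats.show()
        spark.stop()
      }
    }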

Tech Stack:

  • Platforms and orchestration: Cloudera stack, Hadoop/HBase/HDFS, EMR, Kubernetes (K8s), Airflow, Oozie
  • Processing engines: Spark (batch and Structured Streaming), Flink, Apache Beam, Hive/HiveQL
  • Query engines and OLAP: Presto, Trino, Hue, Druid, Apache Pinot, ClickHouse
  • Streaming and ingestion: Kafka, Apache NiFi, Apache Flume, Fabric, Secor, Maxwell, Debezium (CDC), dbt
  • Lakehouse and file/table formats: Hudi, Iceberg, Delta Lake, Parquet, Avro, ORC
  • Comparative knowledge: Hudi vs. Iceberg vs. Delta Lake; Druid vs. Apache Pinot vs. Trino
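
Tying a few of these stack items together, below is a minimal Spark Structured Streaming sketch that consumes from Kafka and lands Parquet with checkpointing. The broker address, topic, and paths are hypothetical, and the Kafka source assumes the spark-sql-kafka connector is on the classpath.

    // Structured Streaming sketch: Kafka source -> partitioned Parquet sink.
    // Broker, topic, and paths are hypothetical; needs the spark-sql-kafka package.
    import org.apache.spark.sql.SparkSession

    object RideEventsStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ride-events-stream")
          .getOrCreate()

        // Consume raw event payloads from a Kafka topic.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "ride-events")
          .load()
          .selectExpr("CAST(value AS STRING) AS json")

        // Land micro-batches as Parquet; the checkpoint provides restart safety.
        events.writeStream
          .format("parquet")
          .option("path", "s3://example-bucket/streaming/ride_events/")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/ride_events/")
          .start()
          .awaitTermination()
      }
    }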

Educational Qualifications:

  • Bachelor’s or Master’s degree in Engineering or related technical discipline (from premier institutes preferred).
