Job Description
Roles and Responsibilities:
- Develop and automate large-scale, high-performance data processing systems (batch and/or streaming) to drive business growth across the Ola group and enhance the product experience.
- Advocate for high-quality software engineering practices in building scalable data infrastructure and pipelines.
- Lead data engineering projects to ensure the reliability, efficiency, testability, and maintainability of pipelines.
- Design data models for optimal storage, retrieval, and alignment with critical product and business requirements.
- Architect logging practices that effectively support data flows, and promote best practices where needed.
- Contribute to shared Data Engineering tooling and standards to enhance the productivity and quality of output for Data Engineers company-wide.
- Collaborate with leadership, engineers, program managers, and data scientists to understand data requirements.
- Educate partners, drawing on data and analytics experience to identify and close gaps in existing logging and processes.
- Collaborate with stakeholders to establish data lineage, data governance, and data cataloging.
- Lead projects using agile methodologies.
- Communicate effectively with individuals at all levels within the organization.
- Recruit, retain, and develop personnel to handle greater responsibilities and challenges.
Experience & Skills:
- 10+ years of relevant industry experience.
- At least 6 years of experience managing teams of 10 or more members.
- Extensive experience (7+ years) in custom ETL design, implementation, and maintenance.
- 5+ years of experience with workflow management engines such as Airflow or Dagster.
- Proficiency in relational databases and SQL query authoring.
- Experience with Java, Scala, or Spark is preferred.
- Hands-on experience with data at the petabyte scale.
- Experience in designing, building, and operating robust distributed systems.
- Proficient in designing and deploying high-performance systems with reliable monitoring and logging practices.
- Ability to work across team boundaries to establish overarching data architecture and provide guidance to individual teams.
- Expertise in Amazon Web Services (AWS) or other relevant cloud infrastructure solutions such as Microsoft Azure or Google Cloud.
- Experience in managing and deploying containerized environments using Docker with Mesos or Kubernetes is a plus.
- Familiarity with managing projects using Scrum methodology.
Tech Stack:
- Processing & streaming: Spark (batch and Structured Streaming), Flink, Apache Beam, Kafka, Hadoop/HDFS, EMR, Cloudera stack
- Orchestration & ingestion: Airflow, Oozie, Apache NiFi, Apache Flume, Debezium (CDC), Maxwell, Secor, dbt
- Storage, table & file formats: Hive/HiveQL, HBase, Hudi, Iceberg, Delta Lake (lakehouse architecture); Parquet, ORC, Avro
- Query & OLAP engines: Presto, Trino, Hue, Druid, Apache Pinot, ClickHouse
- Infrastructure: Kubernetes (K8s), Fabric
Educational Qualifications:
- Bachelor’s or Master’s degree in Engineering or related technical discipline (from premier institutes preferred).