- Work on database reliability and performance across all of grofers’ products.
- Analyze solutions and implement best practices for our main PostgreSQL database cluster and its components.
- Build tools for observability and monitoring of our database to lower the impact of production incidents.
- Work with peer engineers to roll out changes to our production environment and help mitigate database-related production incidents.
- Provide on-call support to the team. Support and debug database production issues across services and levels of the stack.
- Provide database expertise to engineering teams (for example through reviews of database migrations, queries and performance optimizations). Scale database engineering as a practice in other engineering teams.
- Work on automation of database infrastructure and help engineering succeed by providing self-service tools.
- Plan the growth of grofers’ database infrastructure by evaluating novel SQL as well as NoSQL solutions specific to varying business needs.
- Make monitoring and alerting fire on symptoms rather than on outages, so that problems are caught before they affect users.
- Drive DevOps culture in the tech organization by working with engineering and product teams.
- Own our database ecosystem. Take charge of planning the roadmap for improving usage of databases and work with all the teams to continuously improve this ecosystem.
- 6-10 years of software engineering experience.
- At least 2 years of infrastructure development and operations experience, particularly with PostgreSQL.
- Experience in maintaining internet-facing, production-grade applications in cloud environments.
- A solid understanding of SQL and PL/pgSQL.
- A solid understanding of PostgreSQL internals.
- Strong data modeling and data structure design skills.
- Some backend experience with a modern programming language (such as Python, Ruby, Go, or Java) and a web development framework (such as Rails, Django, Flask, or Spring). It is important to us that you have some experience building applications.
- Experience in solving problems and working with a team to resolve large-scale production issues.
- Experience in Unix and/or Linux system administration.
- Experience with Infrastructure-as-Code and configuration management, deployment and orchestration technologies (such as Terraform, Ansible, Puppet, Chef, Docker). We are big on Terraform and Ansible.
- Experience with cloud platforms such as AWS, Azure or GCP. We use AWS.
- Experience setting up data pipelines, managing ingestion of batch and real-time data flows, and configuring databases for analytical workloads.
- Experience setting up reliable databases and disaster recovery procedures, and defining RTO/RPO objectives.
- Proficiency with Git or a similar version control system.
- Experience with distributed datastores and messaging systems (such as RabbitMQ, Kafka, Redis, Elasticsearch, or Cassandra).
- Experience working with data lakes and data warehouses.
- Experience with containers and container orchestration systems (such as Kubernetes or Docker Swarm) and cloud-native tooling (such as Helm, Skaffold, Draft, Telepresence, or Jenkins X).
- Contributions to open source (however basic they might be).