Job Description
Job Specification
In this role, you will be responsible for the availability, latency and performance of our customers’ platforms.
You will work proactively to identify areas for improvement, implementing automation, DevOps tooling and observability to reduce the impact of failures, improve scalability and increase cost efficiency.
You will triage monitoring events and service desk requests following a well-defined incident response process, and provide guidance and expertise to customers in a clear, calm and concise manner.
Finally, you will work under the guidance and mentoring of our Solutions Architects to undertake cloud infrastructure and transformation projects in line with Well-Architected principles.
Key tasks
- Continually evaluate customer platforms and identify areas for improvement.
- Build reliable and sustainable observability into our customers’ environments.
- Provide expertise and guidance to customers through the service desk and by phone.
- Triage problems through an established incident response process.
- Implement corrective and preventative measures in response to incidents.
- Act as a mentor to new team members.
- Participate in on-call duties.
Technologies
AWS, Lambda, Kubernetes, ECS, Terraform, Packer, Helm, Jenkins, Puppet, Grafana, Prometheus, Elasticsearch, Aurora, Kinesis, DynamoDB, ElastiCache, KMS, SSM, IAM.
Key skills / experience
- Solid Linux experience.
- Excellent communication skills.
- Ability to manage several workstreams at once, with excellent time management.
- A positive, constructive approach with an emphasis on collaboration.
- Enthusiastic and eager to learn new skills and technologies, and to gain industry-recognised certifications.