Site Reliability Engineer (AWS) - #2098896
Spectrum IT Recruitment
- Fully Remote (UK)
- 24/7 Shift Pattern (28-day rota including days & nights)
- £ Competitive + Bonus + Excellent Benefits
We're recruiting Site Reliability Engineers to join a global leader in AI-powered customer experience and cloud technology. Following the award of a major government programme, they're expanding their engineering teams to build and support highly secure, cloud-native platforms that deliver sensitive communication services.
This is an opportunity to join an organisation investing heavily in modern cloud engineering, automation and reliability. Working as part of a collaborative SRE team, you'll help ensure large-scale production environments remain secure, available and resilient, whilst continuously improving the way they're operated through automation and engineering best practice.
If you enjoy solving production challenges, improving reliability and automating away operational toil, we'd love to hear from you.
What you'll be doing
- Monitoring and maintaining highly available production platforms running in AWS
- Responding to and managing production incidents across a 24/7 service
- Investigating complex technical issues and restoring services quickly and effectively
- Developing automation to reduce manual operational tasks and improve platform resilience
- Building and improving monitoring, alerting and observability across cloud environments
- Working alongside Software, Platform, Cloud and Security Engineers to improve reliability and operational excellence
- Contributing to post-incident reviews and driving continuous service improvements
- Supporting containerised workloads using Kubernetes and Docker
You'll ideally have experience in a Site Reliability Engineering, Production Engineering, Cloud Operations or NOC environment with exposure to:
- Linux systems administration
- AWS cloud infrastructure
- Kubernetes and Docker
- Production support and incident management
- Python, Bash or Go scripting
- Monitoring and observability platforms such as Grafana, Prometheus, Datadog, Splunk or CloudWatch
- Networking fundamentals including DNS, TCP/IP and load balancing
- A passion for automation, continuous improvement and operational excellence
Experience with Infrastructure as Code (Terraform), SRE principles (SLIs, SLOs), or regulated environments would be beneficial but isn't essential.
Why join?
This is far more than a traditional NOC role.
You'll be joining an engineering-led organisation where reliability, automation and continuous improvement sit at the heart of the platform. Rather than simply responding to incidents, you'll work to prevent them by improving systems, automating operational processes and helping shape the future of highly resilient cloud services.
If you're passionate about building reliable cloud platforms and enjoy solving complex technical problems in large-scale production environments, we'd love to hear from you.
Apply today or contact Dave Carlisle at Spectrum IT Recruitment for a confidential discussion.
Spectrum IT Recruitment (South) Limited is acting as an Employment Agency in relation to this vacancy.