Lead Site Reliability Engineer (AWS, Cloud)

Custom Field 1:  DevOps
Country/Region:  VN
Date:  Sep 16, 2025
Location:  Ho Chi Minh City, VN, 700000; Hanoi, VN, 10000

Working place:  Hybrid

Role Summary 

We are seeking a highly skilled and motivated Lead Site Reliability Engineer (SRE) with strong AWS expertise to lead our Service Operations team. You will be responsible for driving SRE practices and ensuring the scalability, reliability, and performance of mission-critical systems for our digital banking clients. This role requires balancing technical depth with leadership capability: setting direction, mentoring engineers, and ensuring service reliability at scale across multiple teams and clients.

 

Sign-on Bonus: Eligible for candidates who are currently employed elsewhere and able to join GFT within 30 days of offer acceptance.

 

Key Responsibilities 

  • Leadership & Mentorship: Lead a team of SREs, providing technical guidance, coaching, and fostering a culture of reliability and continuous improvement. 

  • SRE Practices: Define and mature SRE practices, including SLIs/SLOs, error budgets, and incident response processes across production systems. 

  • Architecture & Automation: Own the design and evolution of automated cloud operations, driving adoption of Infrastructure-as-Code (Terraform, CloudFormation) and CI/CD pipelines. 

  • Incident Management: Lead major incident responses, ensuring rapid resolution, root cause analysis, and implementation of preventive measures. 

  • Collaboration: Work closely with Development, DevOps, and Cloud Engineering teams to ensure reliability and resilience are built into every stage of delivery. 

  • Operational Excellence: Establish and track key reliability metrics (availability, latency, error rates) and drive initiatives to continuously improve them. 

  • Innovation & Tooling: Evaluate and implement AWS-native and third-party tools to improve monitoring, alerting, and automation. 

  • Stakeholder Engagement: Act as the primary contact point for Service Reliability topics with clients, ensuring transparency and alignment on reliability goals. 

  • Governance: Ensure compliance with industry standards and internal policies around security, audit, and operational risk. 

 

Required Education & Experience 

  • Experience: 7–10 years in SRE/DevOps/Cloud Engineering, with at least 2–3 years in a lead or managerial capacity. 

  • Cloud Expertise: Deep hands-on experience with AWS services (EC2, ECS/EKS, S3, RDS, IAM, VPC, CloudWatch). 

  • Infrastructure as Code: Strong experience with Terraform, CloudFormation, and automated deployment pipelines (Harness, GitLab, Jenkins). 

  • Containerization & Orchestration: Expertise in Kubernetes and container-based workloads in production. 

  • Monitoring & Observability: Proficiency with monitoring, logging, and alerting tools (CloudWatch, Prometheus, Grafana, ELK). 

  • Incident Leadership: Proven ability to lead high-pressure incident response and post-mortem processes. 

  • Problem-Solving & Risk Management: Strong analytical skills with the ability to anticipate, assess, and mitigate technical risks. 

  • Collaboration & Communication: Excellent stakeholder management skills; fluent English required, with good communication in Vietnamese for local collaboration. 

Nice-to-Have Skills 

  • Certifications such as AWS Certified DevOps Engineer – Professional or AWS Solutions Architect – Professional. 

  • Experience in financial services or other highly regulated industries. 

  • Knowledge of advanced security practices and compliance frameworks (PCI-DSS, ISO 27001, SOC2). 

  • Multi-region/multi-AZ architecture design for high availability and disaster recovery. 

What We Offer You 

  • Competitive salary and benefits package. 

  • 13th-month salary guarantee. 

  • Performance bonus. 

  • Professional English courses. 

  • Premium health insurance. 

  • Extensive annual leave and flexible working arrangements. 

  • Opportunity to shape the SRE function and drive reliability practices for leading digital banking clients. 

Due to the high volume of applications we receive, we are unable to respond to every candidate individually. If you have not received a response from GFT regarding your application within 10 working days, please assume that we have decided to proceed with other candidates. We truly appreciate your interest in GFT and thank you for your understanding.

About Us

We show commitment to our investors and stand for solid, long-term growth performance. Founded in Germany in 1987 and present in American territory since 2008, GFT has expanded globally to over 10,000 experts and more than 15 markets to ensure proximity to clients. With new opportunities from Asia to Brazil, the international growth story continues. We are committed to growing tech talent worldwide, because our team’s strong consulting and development skills across legacy and pioneering technologies, like GreenCoding, underpin success. We maintain a family atmosphere in an inclusive work environment.

There is room for your talent!

Put your talent to work. At GFT, you'll be working with some of the brightest people in business and technology on challenging and rewarding projects in a team of like-minded individuals.
Feel it. We are #one team collaboratively working towards the same goal.
