Site Reliability Engineer

Custom Field 1:  DevOps
Country/Region:  VN
Date:  Aug 22, 2024
Location:  Hanoi, VN, 10000; Ho Chi Minh City, VN, 700000

Working place:  Hybrid

What do we do?

As a pioneer of digital transformation, GFT develops sustainable solutions across new technologies, from cloud engineering and artificial intelligence to blockchain/DLT. With its deep technological expertise, strong partnerships, and comprehensive market know-how, GFT advises the financial and insurance sectors as well as the manufacturing industry. Through the intelligent use of IT solutions, GFT increases productivity and creates added value for clients. Companies gain easy and safe access to scalable IT applications and innovative business models.

 

Who are we? 

Having started in Germany in 1987, GFT Technologies has grown to become a trusted software engineering and consulting specialist for the international financial industry, counting many of the world’s largest and best-known banks as our clients. We are an organization that empowers you not only to explore but also to raise your potential and seek out opportunities that add value. At GFT, diversity, equality, and inclusion are at the core of who we are. Ensuring a diverse and inclusive working environment for all communities is one of the main pillars of our diversity strategy, based on our core values and culture. We have been certified for 2022/23 as a ‘Great place to work’ in the APAC region. So, if you want the opportunity to work with an outstanding and progressive organization, this position could be right for you.

 

Role Summary

As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, availability, and performance of our systems and services. You will work closely with development and operations teams to build and maintain observable, scalable, and reliable infrastructure on AWS, using Kubernetes for orchestration and Python for automation. Proficiency in resilience testing and capacity management is also essential for this role.
 
Key Responsibilities

  • Develop and maintain automation scripts and tools using Python to improve operational efficiency.
  • Conduct resilience testing to ensure system reliability under varying conditions.
  • Perform capacity planning and management to ensure systems can handle growth and peak demand.
  • Monitor system performance and reliability, and respond to incidents to minimize downtime.
  • Collaborate with development teams to ensure best practices for observing, deploying and maintaining applications.
  • Implement and manage monitoring and alerting solutions to proactively identify and resolve issues.
  • Participate in on-call rotations to provide 24/7 support for critical systems.

 

Required Skills and Qualifications

  • Strong experience with AWS, including services such as EC2, EKS, Lambda, S3, RDS, and VPC.
  • Proficiency in managing and orchestrating containerized applications using Kubernetes.
  • Solid scripting skills in Python for automation and tool development.
  • Experience with infrastructure as code (IaC) tools like Terraform or CloudFormation.
  • Experience with resilience testing methodologies and tools.
  • Proven ability to perform capacity planning and management.
  • Familiarity with logging and monitoring tools like Sumo Logic, Prometheus, Grafana, or similar.
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in a distributed system.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment, and to stay calm and composed in high-pressure situations.
  • Ability to work with application teams to guide the design of SLOs.

 
Preferred Skills

  • Experience with OpenTelemetry, Prometheus, and Sumo Logic for observability and monitoring.
  • Familiarity with incident management tools such as PagerDuty.
  • Experience with Jira and Confluence for project management and documentation.
  • Knowledge of CI/CD pipelines and experience with tools such as Harness.
  • Understanding of modern development frameworks and languages, including Kotlin, Spring Boot, Kafka, and Postgres.

 

What can we offer you?

  • Competitive salary
  • 13th-month salary guarantee
  • Performance bonus
  • Professional English course for employees
  • Premium health insurance

About Us

We show commitment to our investors and stand for solid, long-term growth performance. Founded in Germany in 1987 and in American territory since 2008, GFT has expanded globally to over 10,000 experts and more than 15 markets to ensure proximity to clients. With new opportunities from Asia to Brazil, the international growth story continues. We are committed to growing tech talent worldwide, because our team’s strong consulting and development skills across legacy and pioneering technologies, like GreenCoding, underpin success. We maintain a family atmosphere in an inclusive work environment.

Why Choose GFT?

  • Competitive Compensation
  • Benefits package including comprehensive medical, dental, vision and others
  • Company Culture based on our Core Values
  • Professional Development Training with Individual Development Plans to map out your career growth
  • Opportunity to work in a global environment with diverse teams built with colleagues from around the world
  • Opportunity to work with technology industry leaders in the financial services industry
  • Opportunity to work for big name clients in capital markets, banking and other industries
