
Site Reliability Engineer

Nov 24, 2022
GFT Technologies SE

Systems Reliability Engineer

Job Description

Our Ideal Candidate:


Is ready to join a diverse, highly technical and dynamic team of engineers that builds and delivers critical internal infrastructure services. Our engineers ensure that our services meet our customers' needs for reliability, performance, and availability by developing continuous improvements, tools, and automation. We're looking for people who are passionate about technology and bring a business mindset to their everyday work; who have a strong sense of ownership and accountability and the ability to influence those around them; who have intellectual curiosity; and who are passionate about performance debugging and benchmarking.


The Role and Responsibilities:


Systems Reliability Engineering (SRE) is a discipline that combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. SRE ensures that internal and external services meet or exceed reliability and performance expectations while adhering to our engineering principles. SRE is also an engineering approach to building and running production systems: we engineer solutions to operational problems. Because SREs are responsible for overall system operation, we use a breadth of tools and approaches to solve a broad set of problems. Our practices include limiting time spent on operational work, conducting blameless postmortems, and proactively identifying and preventing potential outages.


  • Support multi-tenant OpenShift/Kubernetes clusters to global standards.
  • Assist client developer and support teams to ensure issues are dealt with in a professional and timely manner.
  • Support developers in Dev, Test, Staging and Production environments.
  • Collaborate with both internal and external team members.
  • Work as part of both the local regional team, as well as the global application development and platform architecture teams.
  • Handle OpenShift/Kubernetes platform incidents and requests per Client SLA guidelines.
  • Perform vulnerability management.
  • Assist with application security scanning.
  • Manage the monitoring application and take action on alerts.
  • Support the deployment and operation of Client’s strategic global PaaS platform, based on Red Hat OpenShift/Kubernetes.
  • Assist in troubleshooting, remediation, and proactive maintenance of all hardware and software elements that combine to deliver the platform.
  • Ensure that all work carried out complies with global standards, and contribute to the development of those standards so that the platform remains fit for purpose.
  • Assist developers in using the OpenShift/Kubernetes platforms.
  • Work as part of the operations team to handle incidents and requests for the OpenShift/Kubernetes platforms.
  • Assist developers in using the in-house Docker registry to create and publish applications.
  • Engage in 24/7 production support of globally deployed solutions, both on-premises and in the cloud.


Skills Required:


  • Three years of experience in the offered position, IT engineering, or IT architecture.

  • Experience with…
    • Linux (RHEL)
    • Red Hat OpenShift
    • Kubernetes
    • Docker
    • TCP networking
    • Amazon Web Services
    • Ansible
    • Ansible Tower
    • Jenkins
    • Gluster File System
    • Sysdig or other monitoring application
    • Fluentd Forwarding/Splunk
  • Strong scripting or programming background.
  • Strong communication skills and the ability to work in a globally distributed team.
  • Good operational skills with Linux, networking, DNS, and SSL.


Desired or Nice-to-Have Skills:


  • Bachelor’s degree or foreign equivalent in Computer Science, Electronics and Communication, or Engineering, or equivalent work experience.
  • OpenShift/Kubernetes certification.



Thanks for applying with us!
