Site Reliability Engineering (SRE) Director

BeathChapman Pte Ltd
15-18 years
255000 - 300000 SGD

Job Description

Job Details
Seeking a technically savvy, experienced, and inspiring leader, based in Singapore, to lead the SRE function under the Technology Team across all business units.
The SRE team supports the Run function across the entire organisation and is responsible for ensuring that the right processes, people and tools are in place to keep the environment running 24x7. The role includes the day-to-day operational management of Technology Services, including active management of partners and outsourced service providers. It involves managing the team (onshore and offshore), supporting BAU services, developing creative ways to solve problems, and bringing an engineering mindset while engaging with the Technology leadership team. The role will entail people, process and project management as well as engineering work.
  • The role is expected to work with the development team and application support team to ensure performance and a high level of application availability by preventing incidents through proactive monitoring and incident correlation, as well as by continuously establishing and tracking user experience metrics.
  • By bringing an Engineering mindset to the table, the role is expected to be creative in problem solving, passionate about process and understanding code with a view to accelerating problem solving proactively.
  • Assume the role of Major Incident Manager and manage the lifecycle of major incidents. Provide command and control, mobilize resolvers, identify paths for mitigation, and track multiple workstreams to closure.
  • Besides the above, the role shall assume the BAU responsibilities of managing the Run the Bank function and take overall ownership of the strategy and design of the ITSM processes, including Incident Management, Problem Management, Configuration Management (CMDB), Change Management and Event Management.
  • Support service quality deep dives for technology incidents and service disruptions caused by data transmission failures, batch processing delays, erroneous code deployments, Continuity of Business failures, etc.
  • Provide management support in ensuring the highest levels of service quality and improving service levels by identifying problem trends and causes that impact the delivery of production services.
  • Communicate well and manage highly stressful situations over the phone. Demonstrate proven leadership qualities, removing any ambiguity as to who is coordinating the incident resolution.
  • Develop and maintain the Business Continuity Plan and Disaster Recovery Plans for IT, and implement measures designed to safeguard Information Technology and the needs of the business in the event of major incidents or disasters.
  • Design and run the Operational Acceptance Testing strategy for services moving into production.
Our Ideal Candidate:
  • Strong background and fundamentals in engineering concepts across DevOps / Infrastructure Management.
  • Degree in Engineering or Software Engineering is key.
  • 15+ years of Technology experience, of which 5+ years working as an SRE or in a DevOps / Agile environment
  • Technical knowledge of managing AWS/Cloud-hosted services
  • Excellent verbal and written communication skills, with the ability to deliver presentations to multiple levels of management

Reg No. R1652932
BeathChapman Pte Ltd
Licence no. 16S8112
