Manager, Site Reliability Engineering

Dealer.com

Location: Burlington, Vermont

Type: Full Time

Education: Bachelor's Degree

Experience: 5 - 10 Years

This is a hybrid position that can be based at any of the following office locations: Atlanta, GA; Burlington, VT at Dealer.com; Irvine, CA; Mission, KS.

We are seeking a seasoned Site Reliability Engineering (SRE) Manager to lead our team of talented engineers. The SRE Manager will be responsible for ensuring the reliability, scalability, and performance of our company’s infrastructure and applications. This role requires a combination of software engineering skills and operational excellence.

Key Responsibilities:

Leadership: Lead a team of Site Reliability Engineers, fostering an environment of technical growth and continuous improvement.

System Design and Implementation: Collaborate with product development teams to design and implement cloud solutions that are reliable, scalable, and efficient.

Policy Development: Develop and implement policies and procedures to ensure consistent reliability and performance levels across all services.

Incident Management: Drive the incident management process, ensuring efficient resolution of production issues. Support a blameless post-mortem culture to learn from incidents.

Automation: Implement automation tools and technologies to replace manual operations and processes, improving system efficiency and effectiveness.

On-call Support: Participate in on-call rotations and provide after-hours support as needed to ensure system uptime and reliability.

Continuous Learning: Stay updated with the latest industry trends and technologies and apply this knowledge to improve our systems and processes.

Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related discipline and 6 years’ experience in a related field. The right candidate could also have a different combination, such as a Master’s degree and 4 years’ experience, a PhD and 1 year of experience, or 18 years’ experience in a related field.
  • 1+ years’ experience in a lead or management role.
  • Background as a Site Reliability Engineer, or in a similar role building solutions in a software engineering environment.
  • Experience planning and implementing cloud migrations (AWS preferred).
  • Proficiency in scripting languages such as Python, Bash, or Ruby.
  • Familiarity with containerization technologies like Docker and Kubernetes.
  • Excellent problem-solving skills and attention to detail.
  • Strong leadership and team management skills.
  • Excellent communication and interpersonal skills.

© 2024 Vermont Technology Alliance