SRE Engineer with Programming Expertise (*) Remote
Job Description
(*) This is a remote position; however, the candidate must reside within 30 miles of one of the following locations: Boston, MA; Dallas, TX; San Francisco Bay Area, CA; Portland, ME; or Washington, D.C.
About the Team/Role
The WEX Site Reliability Engineering (SRE) team seeks individuals passionate about developing software and solutions for observability, incident response, reliability, performance, operational excellence, and compliance. As part of the Platform Reliability organization, you will support internal stakeholders and Funding Platform teams, tackling complex challenges and enhancing our engineering teams' and customers' experience. The ideal candidate will have a strong aptitude for learning new technologies, driving meaningful projects to completion, and thriving under pressure while closely collaborating with engineering teams.
As an SRE at WEX, you will be exposed to numerous code bases and languages. You will need to assess code for bugs during on-call events, find performance issues, and write custom tools to automate operational tasks.
How you’ll make an impact
Willingness to dig deep into code, networking, operating systems, and/or storage solutions to solve complex issues
Develop automation and utilize monitoring tools to ensure system reliability.
Participate in incident response and troubleshooting
Participate in 24x7 Site Reliability rotations and escalation workflows
Identify and address performance bottlenecks. This may include code optimization, configuration changes, or infrastructure upgrade recommendations.
Collaborate with development teams to ensure software design meets operational requirements.
Continuously improve processes and procedures to increase system reliability and efficiency.
Stay up-to-date with the latest industry trends and technologies
Design, code, and debug applications while assisting with CI/CD pipelines, automating infrastructure tasks, and ensuring system scalability and security.
2+ years of hands-on experience as a Site Reliability Engineer or equivalent role
Development experience OR consistent knowledge of at least one major programming language (C#, Java, GoLang, Python)
Experience with Cloud Computing platforms (AWS, Azure, GCP)
Ability to thrive in a fast-paced development and operations world
Strong communication and collaboration skills
Experience with observability and logging technologies
Experience with at least one major RDBMS and one NoSQL data store
Experience with containerization technologies such as Docker or Kubernetes
BA/BS degree in Computer Science or related technical field or equivalent job experience
Experience with infrastructure as code, preferably Terraform
Working knowledge of designing and building RESTful APIs
Experience with Datadog, Grafana, and Splunk
Familiarity with Agile methodologies and practices
Experience with GitOps
Experience with Apache Kafka and eventing technologies
Salary & Benefits