- Integrate SRE Technical Best Practices and Processes
- As an SRE, you are expected to continuously improve the reliability, scalability, and supportability of CT solutions by rigorously implementing SRE innovations and technical best practices such as:
- Automation focus
- Monitoring & Incident Response
- Recommending Performance and Scalability requirements for CT solutions
- Establishing Service Level Indicators and Objectives (SLI and SLO) with the business so action can be taken to prevent and quickly address issues.
- Establish SRE Strategic Collaboration and Responsibility Model
- Serve as a technical consultant, partner, and collaborator who bridges the gaps among delivery and operational teams (Product Development, Product Delivery, Study Delivery, Clinical Operations, Support, Infrastructure, etc.) to ensure that reliability and availability requirements, tools, processes, and practices are applied to create scalable and reliable systems.
- This includes hands-on technical SRE work coupled with embedding SRE practices into the SDLC.
- Measure and share KPI trends, action plans, and decision impacts
- Establish clearly defined Service Level Indicators and Objectives (SLI and SLO) and thresholds.
- The intent is to manage risk and take action to prevent and quickly address issues related to performance, reliability, or system errors.
- This includes establishing a communication plan for routinely making actions, decisions, and trends visible across the business, delivery, and operational teams.
- Troubleshoot
- Establish and guide Incident Response processes with support teams. Participate in and lead (as appropriate) performance issue investigations across teams.
- Train, guide, and equip teams to adopt these investigation processes.
- Evangelize, train, and transform
- Educate, train, share knowledge, and establish the norms of a blameless culture of continuous improvement in support of the objectives above.
- Embed and integrate practices that prioritize reliability
- BS or MS in Computer Science, Engineering, or a related technical discipline, or 5 years of equivalent technical experience in Software, DevOps, or related fields.
- 2+ years in an SRE or DevOps role or equivalent responsibilities
- 2+ years demonstrated experience in the automation of observability, monitoring, alerting or similar support processes
- 4+ years software or infrastructure delivery experience
- 2+ years of experience, or demonstrated proficiency, with New Relic or similar Application Performance Monitoring (APM) toolsets
- Experience with “MELT” (Metrics, Events, Logs, and Traces) toolsets and practices
- Proficiency with PowerShell as an automation toolset
- Proficiency with .NET or Cloud (AWS) technologies
- Familiarity with software delivery practices
- Familiarity with Windows and VM-based administration
- Familiarity with Infrastructure as Code (IaC), such as Terraform, and with Kubernetes or other containerization technologies for managing infrastructure
- MS in Computer Science, Engineering, or a related technical discipline, or equivalent technical experience
- Related DevOps, SRE, or Cloud certifications
- 2+ years as a technical lead
- 1+ years developing and implementing observability solutions
- Familiarity with Microsoft Power Automate
- Familiarity with software delivery pipelines, CI/CD, and related application delivery automation disciplines
- Medical, Vision & Dental benefits from the 1st of the month following start date
- 20 days PTO per year, accrued monthly following start date
- 12 holidays per year as well as one day for Annual Diversity Day
- Company-paid long-term and short-term disability, along with life insurance
- 401k company contribution
- Hybrid work available for applicable roles
- Professional development programs and continuous learning opportunities