Dynamic Signal is the leading Employee Communication and Engagement Platform, committed to creating a connected, inclusive, and engaged workforce where people feel valued and empowered to be their best. From factory workers and field employees to knowledge workers in any time zone, hundreds of companies across every industry use Dynamic Signal’s web, desktop, and mobile applications to build aligned, productive, and engaged communities and employee advocates.
Reporting to the VP of Technical Operations, the Sr. Manager of Site Reliability Engineering is responsible for the production operations and availability of all infrastructure platforms and components. They will lead the software operations and cloud engineering teams across all infrastructure technology domains, with a focus on availability, productivity, and efficiency through automation and instrumentation.
- Provide technical leadership as well as people management responsibilities for a team of 4 operations engineers located in the San Francisco Bay Area, Chicago, and Belfast, NI.
- Own the end-to-end availability and performance of mission-critical services, and build automation to prevent recurring problems.
- Design, deploy and maintain production IT infrastructure across global locations.
- Administer virtualized platforms on public cloud providers, including AWS and Azure.
- Build and manage globally distributed teams to operate a large-scale SaaS platform.
- Hands-on management of Linux/Windows servers, TCP/IP networking, Docker containers, nginx, and other enterprise-wide tools.
- Coordinate with the incident management team to produce weekly reports and dashboards for various products.
- You’ll provide a clear path for progression through personal development plans, define goals, perform evaluations, and collaborate with employees on their objectives.
- Prioritize and manage quarterly goals and projects.
- Hands-on management of Postgres, RabbitMQ, Redis, and ELK.
- Hands-on design and management of network infrastructure.
- Hands-on programming and scripting to automate processes.
- Participate in on-call rotations supporting 24×7 production operations.
KEY SKILLS AND QUALIFICATIONS:
- 4+ years of people management experience and 10+ years of proven experience in technical operations or development within a globally distributed cloud SaaS environment.
- 8+ years of experience in enterprise-level infrastructure design and deployment.
- 2+ years of systems programming and automation scripting experience.
- Prior experience building infrastructure with Terraform, Kubernetes, VMs and bare metal co-location environments.
- Proven experience growing employees through career development, coaching, and mentoring while guiding senior contributors to maximize their potential.
- A problem-solver who applies lessons learned to iterate and improve.
- You like to address matters proactively, come up with creative solutions and plan for the future.
- In-depth knowledge of Python, PowerShell, or an equivalent scripting language required.
- Working knowledge of Terraform, Ansible, TFS, and Git required.
- Linux system administration skills are required.