Site Reliability Engineer II Job at Abnormal Security, New York, NY

  • Abnormal Security
  • New York, NY

Job Description

About The Role

Enterprises of all sizes trust Abnormal Security's cloud products to stop cybercrime. These products must scale with our customers' growth and remain reliable and available under failure. This is where our SREs fit in: they prevent, detect, efficiently remediate, and quickly recover from outages that impact the Abnormal Security Platform.

Come empower the rest of engineering to stop cybercrime as we expand our offerings across both clouds and regions.

There are a lot of opportunities for growth and career advancement – it's up to you to own your career here. Some potential career paths for this role include:

  • Positioning yourself to be a founding member of a team that will have an outsized impact on the rest of the company.
  • Growing into a Senior technical leadership role.
What You Will Do
  • Deployment Operations
    • Build tools and processes to standardize deployment of the Abnormal Security product suite in a multi-datacenter setup.
    • Partner with R&D teams to develop pre- and post-deployment checklists, canary test environments and workflows, and safe rollback processes.
  • Incident Prevention
    • Identify gaps in existing processes and advocate for necessary changes to improve overall system stability and availability.
    • Lead the Production Readiness Review process to ensure the resilience of systems before customer deployment.
    • Oversee the Critical Change Management Review process for the safe application of changes to critical services.
    • Develop and enforce architecture guidelines to minimize downtime and ensure high system availability.
  • Detection
    • Establish a consistent definition of metrics that answer "Is this product working?"
    • Define and monitor SLAs/SLOs for critical systems, actively tracking deviations and triggering alerts when necessary.
  • Remediation
    • Define incident severity classification guidelines and implement incident response protocols to promptly address issues and reduce downtime.
    • Facilitate effective communication between Engineering and Customer Success teams during incidents.
  • Incident Recovery
    • Design and implement tools to expedite system recovery and minimize the impact of incidents.
    • Develop guidelines for Post Mortems after incidents to prevent recurrence.
Must Have
  • Bachelor's in Computer Science, Computer Engineering, or equivalent professional experience
  • 1+ years of experience as a Site Reliability Engineer, responsible for the reliability of shared services
  • Experience with a public cloud provider (AWS, Azure, GCP), observability stack (Prometheus, Grafana), and incident management tools (PagerDuty, Sentry, Slack integration).
Nice To Have
  • Experience with defining and implementing SRE practices such as Change Management, Production Readiness Review, and Incident Post Mortems.
  • Experience with container orchestration, preferably Kubernetes and Helm.
  • Experience developing Infrastructure as Code (IaC) modules and building automation, preferably Terraform.

At Abnormal Security, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications, and other job-related factors. We know that benefits are also an important piece of your total compensation package. Learn more about our Compensation and Equity Philosophy on our Benefits & Perks page.

Base salary range:

$147,200 to $173,200 USD
