
Staff Software Engineer - Site Reliability and Observability

General Motors

Austin, TX

Description

Work Arrangement:

Hybrid: This role is categorized as hybrid. The successful candidate is expected to report to the innovation center in either Austin, TX or Atlanta, GA three times per week.

The Role:

The Software Engineering Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of software systems. Their job profile includes:

  • System Monitoring and Troubleshooting: Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents.
  • Automation and Infrastructure: Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring.
  • Performance Optimization: Analyzing system performance, identifying bottlenecks, and implementing optimizations to improve the efficiency and scalability of software systems.
  • Incident Response and Root Cause Analysis: Responding to incidents, conducting root cause analysis, and implementing corrective actions to prevent similar incidents in the future.
  • Collaboration with Development Teams: Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into the software design and implementation.
  • Continuous Improvement: Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems.

Additional Description

What You'll Do

  • Implement a scalable, reliable, and secure SRE and observability platform to monitor the health of our production systems and provide a holistic view of the environment.
  • Deliver tools/software to improve the reliability, scalability and operability of services.
  • Collaborate with engineering teams to analyze and provide input on architecture, infrastructure resources, and observability to achieve reliability and scalability goals.
  • Collaborate with engineering teams on production readiness reviews, deployment, operation, and refinement.
  • Partner with stakeholders to ensure data and observability tools are effectively integrated with other systems and processes.
  • Partner with stakeholders to identify, measure and monitor availability, latency and overall service health.
  • Participate in on-call engineering duty to support production.
  • Instill Site Reliability best practices through automation, data insights, and observability.
  • Perform initial incident root cause analysis with engineers and carry out incident postmortems.
  • Build runbooks and tooling to carry out production support activities.
  • Actively participate in technical discussions and deep dives with the Architectural group.

Your Skills & Abilities (Required Qualifications)

  • 7+ years of hands-on SRE experience (software development, systems monitoring) with at least one of the public cloud providers - Azure (strongly preferred), AWS, GCP
  • Experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, defining alerts, writing runbooks, establishing dashboards, etc.
  • Experience with monitoring and log aggregation frameworks, such as Azure Monitor/Sentinel, Datadog (preferred), Dynatrace, Elasticsearch, Kibana, Logstash.
  • Strong working knowledge of Docker, Kubernetes, Terraform, Chef or Ansible
  • Experience troubleshooting JVM based applications.
  • Chaos engineering implementation experience is a big plus.
  • Extensive knowledge of the infrastructure-as-code tool Terraform.
  • Extensive knowledge of trace monitoring and of installing and configuring OpenTelemetry.
  • Strong experience in scripting/programming - Python, Java, Go, PowerShell, Bash.
  • Experience with configuration and management of SSO and Big Data/NoSQL in cloud infrastructure.
  • CI/CD automation frameworks knowledge - Jenkins/Azure DevOps
  • Strong understanding of public cloud networking components.
  • You have a story to tell about how you led and influenced a cross-organization effort to improve uptime to at least 99.99%.
  • Working experience with source control management tools, such as GitHub (preferred), Azure DevOps
  • Experience with an IoT stack is a big plus
  • BS/MS in Computer Science/Engineering preferred

This job may be eligible for relocation benefits.

A company vehicle will be provided for this role with successful completion of a Motor Vehicle Report review.

About GM

Our vision is a world with Zero Crashes, Zero Emissions and Zero Congestion and we embrace the responsibility to lead the change that will make our world better, safer and more equitable for all.

Why Join Us

We believe we all must make a choice every day - individually and collectively - to drive meaningful change through our words, our deeds and our culture. Every day, we want every employee to feel they belong to one General Motors team.

Total Rewards | Benefits Overview

From day one, we're looking out for your well-being, at work and at home, so you can focus on realizing your ambitions. Learn how GM supports a rewarding career that rewards you personally by visiting Total Rewards resources.

Non-Discrimination and Equal Employment Opportunities (U.S.)

General Motors is committed to being a workplace that is not only free of unlawful discrimination, but one that genuinely fosters inclusion and belonging. We strongly believe that providing an inclusive workplace creates an environment in which our employees can thrive and develop better products for our customers.

All employment decisions are made on a non-discriminatory basis without regard to sex, race, color, national origin, citizenship status, religion, age, disability, pregnancy or maternity status, sexual orientation, gender identity, status as a veteran or protected veteran, or any other similarly protected status in accordance with federal, state and local laws.

We encourage interested candidates to review the key responsibilities and qualifications for each role and apply for any positions that match their skills and capabilities. Applicants in the recruitment process may be required, where applicable, to successfully complete a role-related assessment(s) and/or a pre-employment screening prior to beginning employment. To learn more, visit How we Hire.

Accommodations

General Motors offers opportunities to all job seekers including individuals with disabilities. If you need a reasonable accommodation to assist with your job search or application for employment, email us [email protected] or call us at 800-865-7580. In your email, please include a description of the specific accommodation you are requesting as well as the job title and requisition number of the position for which you are applying.

Client-provided location(s): Austin, TX, USA; Roswell, GA, USA; Warren, MI, USA
Job ID: General_Motors-JR-202510222
Employment Type: Full Time

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • FSA
    • HSA
    • HSA With Employer Contribution
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Adoption Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
  • Work Flexibility

    • Flexible Work Hours
    • Remote Work Opportunities
    • Hybrid Work Opportunities
  • Office Life and Perks

    • Casual Dress
    • Happy Hours
    • On-Site Cafeteria
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Personal/Sick Days
    • Leave of Absence
  • Financial and Retirement

    • 401(K)
    • 401(K) With Company Matching
    • Performance Bonus
    • Relocation Assistance
    • Stock Purchase Program
  • Professional Development

    • Tuition Reimbursement
    • Learning and Development Stipend
    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Lunch and Learns
    • Internship Program
  • Diversity and Inclusion

    • Diversity, Equity, and Inclusion Program
    • Woman founded/led
    • Employee Resource Groups (ERG)
