
Staff Engineer, Reliability

Hyderabad, India

IND - Staff Engineer, Reliability - GCC070
We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals, and to help others accomplish theirs, too. Join our team as we help shape the future.

The Cloud Services Team is searching for a Reliability Engineer. Candidates must have hands-on experience operating and engineering services on Google Cloud Platform (GCP), including data, compute, and observability services. The team is accountable for the operations, engineering, and governance of 200+ cloud technologies across a multi-cloud environment. The role requires helping mature operational practices for GCP workloads as part of our multi-cloud strategy. This is an excellent opportunity for someone interested in a mix of strategy and hands-on work. The ideal candidate should be comfortable working with teammates at all levels of the organization, including leadership.

Key Responsibilities

  • Assist in the development, maintenance, and operation of 200+ infrastructure services across our cloud transformation landscape.
  • Develop and drive adoption of enterprise solutions such as Cyber Protection, Disaster Recovery, and security enhancements across line-of-business teams.
  • Use automation to improve the efficiency and simplicity of software delivered as a service.
  • Provide clear operational documents and construction/support specifications to the IT user base.
  • Provide insight into operational metrics across the entire cloud environment.
  • Consult with customers on new requirements, design questions, and functionality configurations for environments on and off premises.
  • Deliver the tooling and capabilities needed to enable cloud compliance, metrics and reporting, and the cost management roadmap and strategy.
  • Participate in incident resolution and change implementation as necessary. This may occasionally include support during non-standard hours.
  • Operate and improve reliability for production workloads running on Google Cloud Platform (GCP), focusing on availability, scalability, and operational readiness rather than application development.
  • Own day-to-day operational concerns for core GCP services, including Compute Engine, GKE, Cloud Run, BigQuery, Cloud Storage, and supporting platform services.
  • Provide operational support for BigQuery platforms including job performance troubleshooting, capacity planning, quota management, dataset permissions, and cost optimization (slot usage, reservations, and quotas).
  • Support Vertex AI platforms from an operations and reliability standpoint, including environment readiness, access controls, monitoring, pipeline execution health, and incident response (not model development).
  • Build and maintain observability standards using Cloud Monitoring, Cloud Logging, Error Reporting, and custom SLI/SLO dashboards for GCP workloads.
  • Implement alerting strategies aligned to error budgets and production reliability goals; reduce alert noise and prevent toil.
  • Execute incident response, triage, and post-incident analysis for GCP services, contributing to post-incident reviews (PIRs) and corrective actions.
  • Develop and maintain runbooks, operational playbooks, and escalation workflows for GCP services.
  • Drive automation-first operations, including self-healing patterns using Cloud Functions, Cloud Run jobs, Cloud Scheduler, and event-driven remediation.
  • Enforce and operate GCP security and governance controls, including IAM, service accounts, Org Policies, VPC Service Controls, KMS, Secret Manager, and networking guardrails.
  • Partner with engineering and data teams to review designs for operability, resiliency, and supportability, ensuring workloads meet production readiness standards before launch.


Required Skills & Experience:

  • Expert understanding of how applications should be engineered following fault-tolerance best practices, separation of duties, observability, and operator-friendly design.
  • Self-motivated and results-oriented, with the ability to work both independently and in a team environment.
  • Strong hands-on experience with BigQuery, including performance tuning, cost management, and governance.
  • Experience with Vertex AI, including pipelines, model deployment, model monitoring, and integration with BigQuery.
  • Deep knowledge of Cloud IAM, service accounts, Workload Identity Federation, and principle-of-least-privilege controls.
  • Experience with GKE operations (clusters, node pools, autoscaling, workload identity, Istio/Anthos optional).
  • Understanding of Cloud Storage, Pub/Sub, Dataflow, Dataproc, and Cloud Composer for data/ML workflows.
  • Experience building CI/CD pipelines targeting GCP using Cloud Build, Artifact Registry, and Terraform.
  • Ability to troubleshoot GCP networking: VPCs, firewall rules, private service access, interconnects/VPN.


Nice to Have

  • Intermediate knowledge of Terraform and CloudFormation.
  • Intermediate Microsoft Office skills.
  • Hands-on experience with advanced GCP services such as Vertex AI, BigQuery, Dataflow, Pub/Sub, Cloud Run, and GKE.
  • Experience creating org-level policies, security baselines, and automation patterns for GCP environments.


What We Offer

  • Collaborative work environment with global teams.
  • Competitive compensation and comprehensive benefits.
  • Continuous learning and growth opportunities in geospatial and risk analytics technologies.

Client-provided location(s): Hyderabad, India
Job ID: Hartford_Fire_Insurance_Company_FGB-4094
Employment Type: FULL_TIME
Posted: 2026-04-18T18:51:48

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Health Reimbursement Account
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
    • On-Site Gym
    • Mental Health Benefits
    • Virtual Fitness Classes
    • Fitness Subsidies
    • FSA
    • HSA
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • Fertility Benefits
    • Adoption Assistance Program
    • Family Support Resources
    • Adoption Leave
  • Work Flexibility

    • Hybrid Work Opportunities
    • Remote Work Opportunities
    • Flexible Work Hours
  • Office Life and Perks

    • Commuter Benefits Program
    • Casual Dress
    • On-Site Cafeteria
    • Company Outings
    • Holiday Events
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Volunteer Time Off
    • Personal/Sick Days
  • Financial and Retirement

    • 401(k) With Company Matching
    • Stock Purchase Program
    • Performance Bonus
    • Relocation Assistance
    • Financial Counseling
    • Profit Sharing
  • Professional Development

    • Internship Program
    • Leadership Training Program
    • Associate or Rotational Training Program
    • Tuition Reimbursement
    • Promote From Within
    • Mentor Program
    • Shadowing Opportunities
    • Access to Online Courses
    • Lunch and Learns
    • Learning and Development Stipend
  • Diversity and Inclusion

    • Employee Resource Groups (ERG)
    • Diversity, Equity, and Inclusion Program
