Staff Engineer, Reliability
IND - Staff Engineer, Reliability - GCC070
We're determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals - and to help others accomplish theirs, too. Join our team as we help shape the future.
The Cloud Services team is searching for a Reliability Engineer. Candidates must have hands-on experience operating and engineering services on Google Cloud Platform (GCP), including data, compute, and observability services. The team is accountable for the operations, engineering, and governance of 200+ cloud technologies across a multi-cloud environment. The role involves helping mature operational practices for GCP workloads as part of our multi-cloud strategy. This is an excellent opportunity for someone interested in a mix of strategy and hands-on work. The ideal candidate should feel comfortable working with teammates at all levels of the organization, including leadership.
Key Responsibilities
- Assist in the development, maintenance, and operation of 200+ infrastructure services across our cloud transformation landscape.
- Develop solutions and drive adoption of enterprise solutions such as Cyber Protection, Disaster Recovery, and Security enhancements across line-of-business teams.
- Use automation to make software delivered as a service more efficient and simpler to operate.
- Provide clear operational documents and construction/support specifications to the IT user base.
- Provide insight into operational metrics across the entire cloud environment.
- Consult with customers on new requirements, design questions, and functionality configurations for environments on and off premises.
- Deliver the tooling and capabilities needed to enable the cloud compliance, metrics and reporting, and cost management roadmap and strategy.
- Participate in incident resolution and change implementation as necessary. This may occasionally include support during non-standard hours.
- Operate and improve reliability for production workloads running on Google Cloud Platform (GCP), focusing on availability, scalability, and operational readiness rather than application development.
- Own day-to-day operational concerns for core GCP services including Compute Engine, GKE, Cloud Run, BigQuery, Cloud Storage, and supporting platform services.
- Provide operational support for BigQuery platforms including job performance troubleshooting, capacity planning, quota management, dataset permissions, and cost optimization (slot usage, reservations, and quotas).
- Support Vertex AI platforms from an operations and reliability standpoint, including environment readiness, access controls, monitoring, pipeline execution health, and incident response (not model development).
- Build and maintain observability standards using Cloud Monitoring, Cloud Logging, Error Reporting, and custom SLI/SLO dashboards for GCP workloads.
- Implement alerting strategies aligned to error budgets and production reliability goals; reduce alert noise and prevent toil.
- Execute incident response, triage, and post-incident analysis for GCP services, contributing to PIRs and corrective actions.
- Develop and maintain runbooks, operational playbooks, and escalation workflows for GCP services.
- Drive automation-first operations, including self-healing patterns using Cloud Functions, Cloud Run jobs, Cloud Scheduler, and event-driven remediation.
- Enforce and operate GCP security and governance controls, including IAM, service accounts, Org Policies, VPC Service Controls, KMS, Secret Manager, and networking guardrails.
- Partner with engineering and data teams to review designs for operability, resiliency, and supportability, ensuring workloads meet production readiness standards before launch.
Required Skills & Experience
- Expert understanding of how applications should be engineered, following best practices for fault tolerance, separation of duties, observability, and operator friendliness.
- Self-motivated and results-oriented, with the ability to work both independently and in a team environment.
- Strong hands-on experience with BigQuery, including performance tuning, cost management, and governance.
- Experience with Vertex AI, including pipelines, model deployment, model monitoring, and integration with BigQuery.
- Deep knowledge of Cloud IAM, service accounts, Workload Identity Federation, and principle-of-least-privilege controls.
- Experience with GKE operations (clusters, node pools, autoscaling, workload identity, Istio/Anthos optional).
- Understanding of Cloud Storage, Pub/Sub, Dataflow, Dataproc, and Cloud Composer for data/ML workflows.
- Experience building CI/CD pipelines targeting GCP using Cloud Build, Artifact Registry, and Terraform.
- Ability to troubleshoot GCP networking: VPCs, firewall rules, private service access, interconnects/VPN.
Nice to Have
- Intermediate knowledge of Terraform and CloudFormation required.
- Intermediate Microsoft Office skills.
- Hands-on experience with advanced GCP services such as Vertex AI, BigQuery, Dataflow, Pub/Sub, Cloud Run, and GKE.
- Experience creating org-level policies, security baselines, and automation patterns for GCP environments.
What We Offer
- Collaborative work environment with global teams.
- Competitive compensation and comprehensive benefits.
- Continuous learning and growth opportunities in geospatial and risk analytics technologies.
Perks and Benefits
Health and Wellness
- Health Insurance
- Health Reimbursement Account
- Dental Insurance
- Vision Insurance
- Life Insurance
- Short-Term Disability
- Long-Term Disability
- On-Site Gym
- Mental Health Benefits
- Virtual Fitness Classes
- Fitness Subsidies
- FSA
- HSA
Parental Benefits
- Birth Parent or Maternity Leave
- Non-Birth Parent or Paternity Leave
- Fertility Benefits
- Adoption Assistance Program
- Family Support Resources
- Adoption Leave
Work Flexibility
- Hybrid Work Opportunities
- Remote Work Opportunities
- Flexible Work Hours
Office Life and Perks
- Commuter Benefits Program
- Casual Dress
- On-Site Cafeteria
- Company Outings
- Holiday Events
Vacation and Time Off
- Paid Vacation
- Paid Holidays
- Volunteer Time Off
- Personal/Sick Days
Financial and Retirement
- 401(K) With Company Matching
- Stock Purchase Program
- Performance Bonus
- Relocation Assistance
- Financial Counseling
- Profit Sharing
Professional Development
- Internship Program
- Leadership Training Program
- Associate or Rotational Training Program
- Tuition Reimbursement
- Promote From Within
- Mentor Program
- Shadowing Opportunities
- Access to Online Courses
- Lunch and Learns
- Learning and Development Stipend
Diversity and Inclusion
- Employee Resource Groups (ERG)
- Diversity, Equity, and Inclusion Program
Company Videos
Hear directly from employees about what it is like to work at The Hartford.