MLOps Engineer
About InvoiceCloud
InvoiceCloud is a fast-growing fintech leader recognized with 20 major awards in 2025, including USA TODAY and Boston Globe Top Workplaces, multiple SaaS Awards wins for Best Solution for Finance and FinTech, and national customer service honors from Stevie and the Business Intelligence Group. Judges also highlighted our mission to reduce digital exclusion and restore simplicity and dignity to how people pay for essential services, as well as our leadership in AI maturity and responsible innovation. It’s an award-winning, purpose-driven environment where top talent thrives. To learn more, visit InvoiceCloud.com.
InvoiceCloud is a leading fintech company delivering secure, innovative SaaS solutions in the electronic bill presentment and payment (EBPP) space. We empower organizations across utilities, government, and insurance to improve customer experience, accelerate payments, and enhance operational efficiency.
Role Overview
We are seeking a Senior AIOps & Reliability Engineer to design and build AI- and ML-driven operational intelligence systems. This role focuses on proactive reliability, observability, and intelligent automation across cloud-native platforms, enabling early risk detection, faster incident resolution, and self-healing systems.
Key Responsibilities
AI & Machine Learning–Driven Operations Intelligence
- Design and implement ML models for anomaly detection, predictive incident detection, failure forecasting, and root cause analysis.
- Apply AI-assisted analysis for incident summarization, classification, and remediation recommendations.
- Engineer data pipelines converting observability telemetry into ML-ready datasets.
- Continuously evaluate, retrain, and improve models using production feedback.
Shift-Left AIOps & Reliability Engineering
- Implement shift-left AIOps initiatives to surface risks early in the SDLC.
- Apply ML to code changes, Terraform diffs, and deployment metadata to predict operational risk.
- Embed ML-driven risk scoring into Azure DevOps CI/CD and PR workflows.
- Partner with engineering teams to validate observability-first development practices.
AI-Powered Operational Intelligence
- Design AI-driven incident summarization, AI-assisted runbooks, and guided remediation.
- Build human-in-the-loop decision systems for high-impact incidents.
- Balance AI, ML, and deterministic automation with a focus on explainability and trust.
Observability & Telemetry Engineering
- Instrument applications using OpenTelemetry (OTEL).
- Normalize and correlate metrics, logs, and traces.
- Integrate telemetry pipelines with New Relic.
- Define and monitor SLIs, SLOs, and operational health signals.
Cloud, Kubernetes & Platform Operations
- Design and operate workloads on Microsoft Azure.
- Manage Azure Kubernetes Service (AKS) clusters.
- Deploy containerized .NET 8 services using Helm.
ML-Enabled DevOps, Infrastructure & Automation
- Build Azure DevOps pipelines for application, infrastructure, and ML deployments.
- Manage source control using Azure DevOps Repos.
- Implement Infrastructure as Code using Terraform.
- Automate workflows using Ansible.
Intelligent Automation & Self-Healing Systems
- Build closed-loop automation triggered by ML predictions.
- Reduce alert fatigue using intelligent correlation.
- Develop self-healing systems to reduce MTTR.
Required Skills & Experience
- Strong ML fundamentals including anomaly detection and time-series analysis.
- Experience applying AI/LLM systems to operational workflows.
- Hands-on Microsoft Azure and AKS experience.
- Proficiency with Kubernetes, Helm, Azure DevOps, Terraform, and Ansible.
- Experience with OpenTelemetry and New Relic.
Nice to Have
- MLOps or ML lifecycle management experience.
- Python for ML experimentation or AI prototyping.
- Familiarity with SRE principles.
What Success Looks Like
- Operational risks identified before production.
- Early incident prediction through ML and AI insights.
- Reduced alert noise and faster incident resolution.
- Continuous improvement in platform reliability.
Technical Skills
Azure, AKS, Kubernetes, Helm, .NET 8, OpenTelemetry, New Relic, AIOps, MLOps, Machine Learning, Anomaly Detection, Time-Series Analysis, LLMs, Predictive Analytics, Root Cause Analysis, Azure DevOps, CI/CD, Terraform, Ansible, Python, Infrastructure as Code, Observability, SRE, Incident Management, Self-Healing Systems
InvoiceCloud is committed to providing equal employment opportunities to all employees and applicants. We do not tolerate discrimination or harassment of any kind based on race, color, religion, age, sex, nationality, disability, genetic information, veteran or military status, sexual orientation, gender identity or expression, or any other characteristic protected under applicable laws.
This commitment applies to all aspects of employment, including recruitment, hiring, placement, promotion, termination, layoff, recall, transfer, leave, compensation, and training.
If you require a disability-related or religious accommodation during the application or recruitment process, and wish to discuss possible adjustments, please contact jobs@invoicecloud.com.
Click here to review InvoiceCloud’s Job Applicant Privacy Policy.
For recruitment agencies: InvoiceCloud does not accept unsolicited resumes from agencies. Please do not forward resumes to our job aliases, employees, or any other company location. InvoiceCloud is not responsible for any fees associated with unsolicited submissions.
Perks and Benefits
Health and Wellness
Parental Benefits
Work Flexibility
Office Life and Perks
Vacation and Time Off
Financial and Retirement
Professional Development
Diversity and Inclusion