TROC Engineer - Monitoring

3 days ago Santiago, Chile

Position: Monitoring Engineer
Job Description:

The Monitoring Engineer is responsible for designing, implementing, and maintaining systems that proactively monitor the health and performance of the customer organization's OT infrastructure, including servers, networks, applications, and databases. The role identifies potential issues and alerts the relevant teams so that they can take corrective action before major outages occur. The engineer will use specialized monitoring tools and platforms to analyze data and create reports on the performance and reliability of the systems provided by the customer.

The main functions of the role are:

  • Design and implementation of the monitoring system:
    • Select and configure monitoring tools based on the needs of the infrastructure under monitoring.
    • Manage, optimize and continuously improve the client's monitoring and observability tools.
    • Ensure that the L1 and L2 support teams, on-site support personnel and the 24x7 monitoring team can effectively monitor the technology infrastructure within the scope of service.
    • Design, implement and maintain advanced configurations on monitoring platforms, ensuring consistency with configuration databases.
    • Develop dashboards and visualizations to present key performance indicators (KPIs).
    • Establish the monitoring requirements for the transition to production of technological projects.
    • Define and manage critical and customized alerts, ensuring they are relevant to monitored services.
    • Set alert thresholds and notification protocols for critical events.
    • Design, update and deploy service maps and management panels for the elements within the scope of service, adapted to the needs of stakeholders.
    • Collaborate in the implementation of new monitoring technologies, platform integration, migrations and decommissioning of obsolete tools.
    • Analyse and project the capacity of monitoring platforms to ensure they adapt to future growth requirements.
    • Assist in the integration of monitoring systems with incident management tools to escalate issues in a timely manner.
    • Research, diagnose and propose solutions for devices or services that are difficult to monitor effectively.
    • Evaluate emerging tools, methods, and configurations that can facilitate the monitoring of devices with technical constraints or complexities.
    • Implement regular training programs for the monitoring team, ensuring that they are aware of the latest technologies, tools, and best practices in monitoring and management of technological infrastructure. This includes training sessions, documentation, and workshops.
  • Performance analysis:
    • Meet contractually agreed SLAs and service KPIs, and conduct a monthly review.
    • Report the status of the in-scope infrastructure through monthly reports.
    • Perform deep analysis of metrics and events to identify critical patterns and trends that may impact the performance and stability of services.
    • Analyze real-time and historical monitoring data to identify bottlenecks and performance trends.
    • Investigate performance degradations, errors, and system anomalies to identify root causes.
    • Generate reports on system health and performance metrics to inform capacity management planning and optimization.

  • Incident response and troubleshooting:
    • Enable the rapid and accurate identification of relevant events to minimize the impact on operations, using the client's monitoring tools.
    • Respond to alerts and proactively troubleshoot issues to minimize downtime.
    • Use advanced methodologies and technologies to anticipate and mitigate potential incidents, adapting to BHP's operational needs.
    • Manage requirements and incidents related to the monitored platforms, providing effective and timely solutions.
    • Generate detailed reports on the problems identified and the solutions implemented, providing feedback to prevent similar situations in the future.
    • Collaborate with other teams to identify and resolve system issues.
    • Conduct post-incident analysis to identify areas for improvement and implement preventative measures.
  • Maintenance and optimization:
    • Participate in meetings to establish priorities and service planning with the client.
    • Monitor the lifecycle of monitoring tools, ensuring their continuous updating and alignment with market best practices.
    • Regularly review and update monitoring configurations to reflect changes in infrastructure.
    • Manage users and privileges of the platforms, perform periodic audits and generate reports.
    • Maintain an updated inventory of entities on each monitoring platform and ensure that they are in sync with the CMDB.
    • Provide robust and adaptable support for monitoring and observability tools and platforms.
    • Implement constant improvements to optimize the performance and effectiveness of monitoring tools.
    • Perform system updates and patch management for monitoring tools.
    • Collaborate with technical teams in the planning and execution of necessary adjustments in configurations, networks or devices to ensure their integration into monitoring platforms.
    • Optimize monitoring processes to improve efficiency and reduce false positives.

Candidate requirements:
  • Information technology professional with at least a bachelor's degree.
  • At least 5 years of experience in monitoring roles for large IT infrastructures.
  • Strong understanding of IT infrastructure components such as servers, networks, databases, and applications.
  • Experience with the SolarWinds monitoring platform is mandatory. Knowledge of Grafana and IMS is desirable.
  • Ability to interpret complex monitoring data and identify patterns.
  • Excellent troubleshooting skills to diagnose and resolve operational environment issues.
  • Knowledge of scripting languages (e.g., Python, PowerShell) for custom automation and monitoring tasks is desirable.
  • Effective communication to collaborate with cross-functional teams and report findings clearly.

Client-provided location(s): Santiago, Chile
Job ID: Infosys-138339BR
Employment Type: OTHER
Posted: 2025-08-29T18:36:32

Perks and Benefits

  • Health and Wellness
    • Health Insurance
    • Life Insurance
    • HSA
    • Short-Term Disability
  • Parental Benefits
    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
    • On-site/Nearby Childcare
  • Work Flexibility
  • Office Life and Perks
    • Commuter Benefits Program
  • Vacation and Time Off
    • Paid Vacation
    • Paid Holidays
    • Personal/Sick Days
    • Sabbatical
  • Financial and Retirement
    • 401(K)
    • Relocation Assistance
  • Professional Development
    • Learning and Development Stipend
  • Diversity and Inclusion
    • Employee Resource Groups (ERG)