Royal Caribbean Group

Site Reliability Engineer - Automation

Pembroke Pines, FL

The Enterprise Monitoring and Logging team provides the tools and strategy to drive holistic, end-to-end enterprise technology monitoring across people, process, and tools. The team defines and implements a comprehensive, standardized set of monitoring tools for use across the enterprise. You will help a large enterprise roll out and use application performance monitoring tools to enhance Royal Caribbean Group's (RCG) ability to identify production issues early and drive them through to completion. The team democratizes monitoring data and federates dashboard configuration and event management. We are implementing a modern automation platform that uses artificial intelligence and machine learning to facilitate automated issue resolution, faster triage, and application and infrastructure management.

In this role, you will focus on ensuring that technology services scale to meet performance requirements and agreements and can effectively sustain current and future demand. Responsibilities include implementing Application Performance Monitoring (APM) with AppDynamics on multiple technology stacks, both on-premises and in the cloud (e.g., OpenShift, AWS); analyzing and troubleshooting application performance and scalability issues and recommending optimizations; reviewing application architecture and design; creating AppDynamics dashboards for technology-specific monitoring metrics (OpenShift, microservices, Kubernetes, JVM, Apache, CloudWatch metrics, etc.); researching and monitoring current end-user response time as well as application and system performance and availability; planning performance improvements; and reporting on the demand and performance of technology infrastructure, applications, shared platforms, and cloud, including assessing and summarizing application performance and volumetric trends and making recommendations.

In addition, it's a great opportunity to network with many organizations throughout RCG, in both application development and production services. If you enjoy solving complex business problems with modern interpretations of SRE; using AppDynamics and working with its suite of products, integrations, and connectors; designing and building APIs; working with data flows; and analyzing data, and you want to contribute to building the next generation of modern applications for our customers, then join us!

Essential Duties and Responsibilities:

  • Work closely with application teams to onboard to AppDynamics and help to uplift their monitoring configuration.
  • Debug and triage issues encountered by application teams integrating with AppDynamics.
  • Support DevOps and partner with application developers to find the best way to optimize the application performance.
  • Work with various teams to help them learn how best to use Application Performance Monitoring to uplift both their development and production operations.
  • Identify, resolve, and call out application performance bottlenecks and challenges.
  • Perform Product and OS upgrades for the platform during maintenance windows.
  • Help to document, streamline, or otherwise automate processes required for onboarding applications, databases, and other middleware to AppDynamics
  • Work with Analytics teams to extract AppDynamics data for use in anomaly detection and other types of analytics.
  • AppDynamics agent testing, packaging, and implementation.
  • Develop automation solutions for AppDynamics installation and configuration.
  • Develop automation solutions for reporting monitoring metrics.
  • Provide right-sizing and optimization for microservice, API, and pod (OpenShift) configurations.
  • Consult with business partners to resolve capacity and performance issues
  • Instrument and maintain the APM tool (AppDynamics).
  • Support multiple business services and solutions to identify service level and capacity requirements.
  • Anticipate and interpret business requirements to generate accurate demand trends and forecasts.
  • Ensure that any applications deployed will properly scale and meet expected SLOs and SLAs for Performance, Availability, Scalability and Stability.
  • Perform APM tuning of applications in development and of critical production applications, and provide third-level performance support.
  • Evaluate application performance and system resource capacity to sustain business application processing volumes.
  • Work closely with a team of Performance Engineers to orchestrate, conduct, and participate in line-of-business performance testing and analysis.
  • Utilize diagnostic and monitoring tools to measure, detect, isolate, and resolve performance issues found during application development performance testing including measuring, monitoring, and capturing required infrastructure & application performance metrics, logs and reports.
  • Identify application performance and scalability risks and mitigate them in a timely manner.
  • Monitor and report on existing applications, systems, and infrastructure.
  • Create automated solutions for collecting and reporting performance and availability data for existing applications.
  • Create and maintain synthetic monitoring scripts using ThousandEyes to support availability monitoring and alerting.
  • Management of APM tools (AppDynamics):
  • Plan, implement, support, and maintain APM tools, including instrumentation, configuration, and creation of dashboards and reports, and provide deep-dive root cause analysis of performance issues.
  • Document best practices and maintain an audit log.
  • Good understanding of technology-specific performance metrics and alert thresholds.
  • Project risk analysis, assessment, documentation, and mitigation.
  • Root cause analysis, heap dump, thread dump and other log analysis, code profiling, event tracing, and resource analysis.
  • Leverage tools such as Splunk and AppDynamics.
  • Responsible for APM tuning in cloud and hybrid environments.
  • Application performance tuning and monitoring (Java, microservices, containers/OpenShift, APIs).
  • Good understanding of microservices architecture and monitoring/alerting for APIs, Cloud, OpenShift, AWS Services, and Hybrid deployments.
  • Web application performance and end-user experience (client-side browser profiling), with a deep understanding of Chrome DevTools.
  • Strong working knowledge of a variety of technologies including but not limited to compute, Amazon Web Services (AWS), OpenShift, Kubernetes, WAS, Apache, JBOSS, Tomcat, etc.
  • Good programming and scripting skills.

Education, Experience, Knowledge & Skills:

  • 2 to 5 years of related experience
  • B.S. in Computer Science preferred, with at least 3 years in IT
  • Excellent verbal and written communication, presentation skills, and interpersonal skills
  • Ability to apply technical expertise across businesses or disciplines
  • Ability to think big picture and step back to understand the context of problems before applying analytical skills to address the issues at a more detailed level
  • Highly motivated self-starter with excellent organizational and time management skills; ability to work with minimal direct supervision
  • Strong consulting, relationship building, and collaboration skills
  • Effectively drives results independently and in a cross-functional team environment
  • Strong problem solving and analytical skills
  • Ability to effectively adapt to shifting priorities, demands, and timelines
  • Experience instrumenting agent-based and network-based APM tools, such as AppDynamics, for performance and availability monitoring
  • Programming language knowledge and hands-on experience with Python, shell scripting, Java, or C++
  • Good SQL and database skills with SQL Server
  • Basic statistical analysis skills and graphical skills
  • Experience with application tuning for optimal performance including JVM tuning
  • Good understanding of virtualization on VMware, RHEV and OpenShift
  • Good understanding of Microservices Frameworks, platforms like OpenShift, Kubernetes, AWS and technology specific monitoring metrics.
  • Knowledge of Cloud Platforms (AWS, Pivotal, etc.)
  • Strong Application and System architecture knowledge
  • AppDynamics, SolarWinds, Splunk, and other toolset experience
  • Mid-level working experience with different operating systems
  • Experience with web and application servers: WebSphere, NGINX, JBoss, Apache, IIS (Windows), and IHS
  • Experience with SOA and Microservice architecture.
  • Experience with shift-left techniques, automation, Java and web-based profiling tools, and network profiling tools.

Key Relationships

  • The goal of the role is to provide reliable, resilient, next-generation enterprise technology applications and infrastructure that enable Royal Caribbean Group application and system owners to meet commitments and deliver business solutions.
  • The services provided by this role provide a strong, resilient, and stable platform to host business processes while enabling innovation.
  • The role is critical in helping a large enterprise roll out and use application performance monitoring tools to enhance RCG's ability to identify production issues early and drive them through to completion.
  • The role enables application and system owners to accelerate digital transformation through innovative data and integration solutions and tools that help them fix problems quickly, maintain complex systems, and improve code.

WORK ENVIRONMENT:
  • Normal office working environment, Monday to Friday, 9:00 AM to 6:00 PM
  • Participate in a 7-day/24-hour on-call rotation schedule
  • Hybrid position - 3 days onsite and 2 days remote
  • Duties may require travel to other facilities

Client-provided location(s): Miramar, FL, USA
Job ID: royal_carribean-931215000
Employment Type: Other

Perks and Benefits

  • Health and Wellness

    • Health Insurance
    • Dental Insurance
    • Vision Insurance
    • Life Insurance
    • Short-Term Disability
    • Long-Term Disability
  • Parental Benefits

    • Birth Parent or Maternity Leave
    • Non-Birth Parent or Paternity Leave
  • Office Life and Perks

    • Commuter Benefits Program
    • Casual Dress
    • Company Outings
  • Vacation and Time Off

    • Paid Vacation
    • Paid Holidays
    • Personal/Sick Days
  • Financial and Retirement

    • 401(K)
    • Stock Purchase Program
    • Performance Bonus
  • Professional Development

    • Tuition Reimbursement
    • Promote From Within
    • Mentor Program
    • Access to Online Courses
  • Diversity and Inclusion

    • Diversity, Equity, and Inclusion Program
