Software/Systems Engineer - Site Reliability

Squarespace provides creative tools and services to help anyone build and manage their brand online. For more than a decade, we’ve empowered millions of people to take control of their online presence like never before. The Site Reliability Engineering (SRE) team is looking for passionate software engineers and systems engineers with backgrounds in systems, programming, and networking to help build, scale, and support the engine powering our highly interactive web applications.

Our SRE team is responsible for ensuring that customers around the world can access Squarespace sites, that those sites load quickly, and that all features work properly. We work with the product teams to maintain the reliability of existing and new features backed by a fleet of microservices, with the infrastructure teams to scale our current data centers and build out new ones, with the data teams to tackle challenging “big data” problems, and with the security teams to track the latest vulnerabilities and continuously protect our product, infrastructure, and networks.

Responsibilities

  • Write services and infrastructure-as-code that automate the deployment, provisioning, scaling, and monitoring of Squarespace’s infrastructure
  • In collaboration with other engineering teams, help build the deployment and monitoring framework for our services as we scale to hundreds of microservices
  • Develop a distributed platform that builds, runs, and dynamically scales our services in containers
  • Make our services redundant and fast around the globe
  • Take part in our distributed 24x7 on-call rotation

Qualifications

  • Real-world experience building and supporting large-scale frontend and/or backend systems on the JVM (strong Java coding and debugging skills a huge plus)
  • Experience managing web services in large-scale *nix environments
  • Understanding of networking fundamentals (TCP/IP, HTTP, DNS, etc.)
  • Experience with shell scripting and knowledge of at least one of: Python, Bash, Perl, Ruby
  • Familiarity with relational (MySQL, PostgreSQL) and NoSQL (MongoDB) databases
  • Interest or experience in infrastructure automation, configuration management, monitoring and alerting tools, performance tuning, and networking
  • Experience with Docker and Mesos/Swarm/Kubernetes is a big plus

Our Technology

Some of the technologies that we use on the Site Reliability Engineering team include:

  • Java 8
  • Ansible
  • Python
  • Ruby
  • Clojure
  • CentOS
  • MongoDB
  • Git
  • Hadoop
  • Sensu
  • Consul
  • Elasticsearch / Logstash / Kibana
  • Graphite / Grafana
  • Mesos
  • Docker
  • Swarm
  • Spark
  • Kafka
