About The Role
Rocket Lab is the global leader in small satellite launch. Our mission is to open access to space to improve life on Earth. There are endless possibilities for what we can achieve with better access to space, and that access is now a reality thanks to Rocket Lab. Our team is 500 people strong and we’re adding to it every week. Collaboration is at our core - every idea is heard and everyone makes a difference. Teams are nimble, decisions are made quickly and we are action-oriented.
While other companies talk about it, we do it!
Site Reliability Engineer
Rocket Lab is seeking Site Reliability Engineers to support development of infrastructure systems and services that enable on-orbit operations of Rocket Lab’s Photon satellites, including the lunar CAPSTONE mission in 2021.
The Spacecraft Operations team is responsible for maintaining the health and safety of Rocket Lab’s growing fleet of spacecraft and ensuring that mission objectives and SLAs are met. The team’s goal is to maximize the value Rocket Lab can create from a constantly changing fleet of diverse spacecraft. This unique challenge requires a set of backend services to support 24/7 automated operations.
The team is looking for experienced SREs who can build these services and grow them over time. These engineers will:
- Gather and analyze metrics from systems and applications to assist in performance tuning and fault finding
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Balance feature development speed and reliability with well-defined service level objectives
Because operations at Rocket Lab is focused on automation, this engineer will need to develop tooling and metrics that are reliable and maintainable, while thriving in an agile environment where needs change often and software needs to grow to support new challenges. This person will be directly responsible for the design and implementation of network and platform tools (primarily AWS, but also GCP, Azure, and some on-prem assets).
Additionally, because reliability and adaptability are key, this person must be fluent with modern development techniques like CI/CD, TDD, and version control (git). This includes setting up and maintaining cloud infrastructure and working with the IT team to deploy and run services with rigorous SLAs/SLOs.
This SRE role will report to the Spacecraft Operations Lead. The Spacecraft Operations group is responsible for on-orbit operation of Rocket Lab spacecraft after launch and orbit injection and works directly with the launch team for mission planning and operational support. The Spacecraft Ops team is also responsible for the backend services and user tools for operating the Photon satellites.
Duties & Responsibilities:
- Works with the operations team, flight software, IT, and ground support to build systems that support the full cycle of satellite operations
- Regularly solicits feedback and iterates on designs with systems engineers, operations team members, and other stakeholders.
- Supports rigorous, challenging schedules during launch windows and early Photon operations.
- Team Development
- Help build an agile, modern software team
- Reports on metrics, SLAs, and SLOs like uptime, performance, and cloud resource utilization
- Reports to the Spacecraft Operations Lead
- Systems Engineering / Process Development
- Works with the operations team to develop workflows and processes that are robust and support Rocket Lab’s growing fleet of spacecraft.
- Leads the design, implementation, iteration, and support of platform systems, metrics and alerting, and security systems
- Fills gaps and takes action rather than waiting for detailed requirements, complete documentation, or technical support
- Automation focus rather than “traditional” approaches of manual checks and detailed procedures
- “Digs in” and learns new systems quickly
- Decisions are data-driven
- Ability to prioritize and make decisions with limited information
Required Skills & Experience:
- Expertise in software development for backend services
- Experience in an SRE role targeting cloud systems (preferably AWS)
- Expert in a high-level programming language (preferably Python)
- Experience working with continuous integration/continuous deployment (CI/CD) workflows, modern tools (Git, GitLab/GitHub, Docker), and test-driven development
- Cloud infrastructure and DevOps experience are a huge plus
- Exposure and/or ability to quickly learn aerospace fundamentals
To conform to US Government space technology export regulations, applicants must be a US citizen, lawful permanent resident of the US, protected individual as defined by 8 USC 1324b(a)(3), or eligible to obtain the required authorization from the US Department of State.
Rocket Lab USA is an Equal Opportunity Employer. Employment with Rocket Lab USA is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.