We’re Mitek, a NASDAQ-listed global leader in mobile capture and digital identity verification solutions built on the latest advancements in AI and machine learning. Our Mobile Verify and Mobile Deposit products power and protect millions of identity evaluations and mobile deposits every day, around the world.
Our future of work is about enabling a smarter, faster, and happier workforce regardless of work location. Whether you prefer to work from a Mitek office or a remote location of your choosing, we'll provide you with the digital excellence, supporting systems and tools, and communication transparency that allow you to do your best, most collaborative work.
This position is located onsite in our Barcelona/Cerdanyola del Valles office.
Mitek Systems is seeking a Sr. Application Operations Engineer to join us in building our global Application Operations team. The Application Operations team is responsible for ensuring Mitek's customer-facing SaaS products meet our high standards for reliability and availability. The Sr. Application Operations Engineer will design, develop, implement, and deliver systems and automation to improve Mitek's monitoring and problem management. This role will collaborate closely with engineering and cloud operations to improve the reliability and availability of Mitek SaaS products hosted in AWS.
What You'll Do
- Train and mentor team members.
- Monitor and respond to incidents relating to all Mitek SaaS products and critical services.
- Increase platform observability by implementing robust monitoring and alerting.
- Collaborate with engineering teams to increase infrastructure reliability and resiliency.
- Develop the software needed to build, operate, and gain visibility into a large-scale platform.
- Create rich and informative dashboards/reports that provide valuable insights for various business and technical stakeholders.
- Provide support for root cause analysis and preventative analysis of incidents.
- Escalate incidents and issues to other teams, and take ownership of the escalation process.
- Own and manage the operation of site reliability systems including monitoring, alerting, data collection and incident response.
- Assist with production deployments and system upgrades.
What You Need
- Experience running production workloads on cloud platforms such as AWS or GCP.
- Experience administering both Linux and Windows operating systems, preferably in the cloud.
- Experience with Infrastructure as Code tools like CloudFormation and Terraform.
- Experience with a scripting language such as Bash, Python, or PowerShell.
- Experience with system and application health monitoring and alerting.
- Experience and proven success working in a highly collaborative environment.
- Understanding of modern site reliability engineering practices.
- 5-8 years of IT/Development experience, including Network Operations Center and 24/7 environments.
- Bachelor's Degree in Computer Science, Engineering, Information Technology, or related field preferred.
- Excellent written and verbal English communication skills.
- Ability to lead complex, evidence-based troubleshooting efforts.
- Excellent documentation skills regarding system issues, troubleshooting steps, resolution, and communication with stakeholders.
- Experience with Software Change Management, Production Incident Management, Problem Management, System & Application Monitoring and Logging.
- Willingness to work flexible hours as part of an on-call rotation.
- Analytical, metrics-driven mindset.
What Would Be Nice To Have, But Not Required
- Experience monitoring and operating containerized applications in the cloud.
- Experience with configuration management tools such as Chef, Ansible, or Puppet.
- Working knowledge of basic network and routing concepts.
- Experience with event log correlation and Security Event & Incident Management.
- Knowledge of REST APIs.
- Experience operating SaaS products.