Playson is a leading online gaming supplier with worldwide recognition, founded in 2012. We offer complete gaming solutions based on the latest technologies and detailed market analysis for the leading iGaming operators.
We are looking for a Site Reliability Engineer/DevOps Engineer. The position sits in the Platform Tribe, SRE Stream, FireX Squad, and is responsible for automation and the maintenance of high-load infrastructure.
To succeed in this role, you need:
― Strong experience with issue handling (RCA and postmortem practices).
― Strong understanding of Kubernetes (K8s) — including deployment, scaling, troubleshooting, and managing containerized applications.
― Proficiency in AWS services — specifically, expertise in Amazon Elastic Kubernetes Service (EKS), EC2, RDS, CloudFront, and other relevant services.
― Infrastructure as Code (IaC) — Terraform is a must-have.
― Containerization technologies — Knowledge of Docker, including creating and managing Docker images and containers.
― CI/CD — Familiarity with continuous integration and continuous deployment tools like Jenkins, GitLab CI/CD, or GitHub Actions.
― Monitoring and observability — Experience with monitoring tools like Datadog, Prometheus, Grafana, and logging solutions like Elasticsearch, Logstash, and Kibana (ELK Stack) or AWS CloudWatch.
― Networking — Strong understanding of network concepts like DNS, load balancing, and firewalls, as well as network protocols like TCP/IP, HTTP, and HTTPS; gRPC is a big plus.
― Scripting and programming languages — Proficiency in at least one scripting or programming language (e.g., Python, Node.js, Go).
― Configuration management — Experience with tools like FluxCD/ArgoCD.
― Version control systems — Proficiency in using Git or other version control systems.
― Incident management — Familiarity with incident response and management tools like PagerDuty, Opsgenie, or VictorOps.
― Strong problem-solving and troubleshooting skills — The ability to diagnose and resolve complex technical issues.
― Strong ownership, proactiveness, persistence, and passion for maintaining one of the biggest online gambling platforms.
In this role, you will:
― Manage alerts day to day, check systems, and escalate issues as necessary.
― Be part of a team that provides 24×7 on-call support for critical SaaS events.
― Be available in emergencies when other team members are unavailable or need help.
― Document issues and remediation steps.
― Proactively create appropriate monitors in the EKS/K8S ecosystem.
― Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
― Improve existing infrastructure health by implementing checks and scripts to correct known issues.
― Maintain and develop deployment code.
― Implement/integrate new technologies in our Cloud Infrastructure.
― Collaborate with other teams and departments to provide the highest level of support and assistance.
― Apply a real customer focus when planning deployments and updates: keep the customer at the forefront and consider the impact on them before making changes.
― Work closely on solutions with Support, Customer Success, Migration, and Professional Services teams to provide a best-in-class SaaS offering to our customers.
― Perform RCA and take necessary corrective actions to prevent the recurrence of issues.
― Create and assign alert-related actions to the appropriate team after the investigation.
― Handle support requests for environment-specific actions.
What you get in return:
🎰 Transparent quarterly bonus system
🎰 Flexibility in your schedule (you decide when it is convenient for you to come to the office, as long as it does not disrupt our development plans)
🎰 Opportunity to work remotely
🎰 Medical Insurance for you and your +1
🎰 Unlimited paid vacation leave and Ukrainian bank holidays
🎰 Unlimited paid sick leave when needed
🎰 Development courses/training reimbursement
🎰 Online Corporate English classes
🎰 Corporate team-building events, corporate parties
🎰 Employee Referral bonus program