DevOps / Infrastructure & Field Support Engineer
About us
xBerry is an R&D house that has been delivering custom solutions for international clients since 2016. We provide extensive expertise in embedded systems, machine learning, AR/VR technology, and image processing.
Requirements
Technical Requirements
- Strong experience with Linux (Ubuntu) system administration and troubleshooting,
- Hands-on experience with Kubernetes, including cluster troubleshooting and container analysis,
- Practical knowledge of Docker,
- Solid understanding of networking and diagnosing network-related issues,
- Experience with NFS / storage troubleshooting,
- Operational knowledge of GPU / CUDA environments (compatibility, stability),
- Experience working with:
  - RabbitMQ,
  - PostgreSQL.
Additional Requirements
- Willingness to participate in an on-call / standby rotation,
- Readiness for business travel, including on-site customer visits,
- Ability to work independently in complex, distributed environments,
- Strong analytical and problem-solving skills.
Responsibilities
Incident Handling and System Maintenance
- Diagnosing and resolving issues related to:
  - Kubernetes clusters,
  - containers (Docker),
  - the Linux (Ubuntu) operating system,
  - networking,
  - storage (including NFS),
- Analyzing logs and service health across application and infrastructure layers,
- Restoring full system functionality in production environments,
- Performing system deployments and upgrades at customer sites,
- Participating in on-site interventions when issues cannot be resolved remotely.
Automation, Observability, and System Resilience
- Designing and developing automated troubleshooting mechanisms,
- Early detection of infrastructure and application-level issues,
- Automated validation of the health of key system components:
  - OS,
  - Kubernetes,
  - containers,
  - storage,
  - networking,
- Building health checks and observability solutions (metrics, alerts, dashboards),
- Creating and maintaining:
  - runbooks,
  - standard recovery procedures,
  - automated self-healing mechanisms,
- Documenting common incidents, root causes, and resolution methods.
Collaboration and Architecture Improvement
- Close cooperation with development and architecture teams,
- Contributing to architecture simplification and standardization,
- Improving overall system stability and reliability,
- Supporting long-term efforts to reduce operational overhead and manual interventions.
We offer
- Flexible working hours
- Remote work options
- Medical care program
- MultiSport
- Pizza Fridays
- A contract of employment or self-employment, depending on your preference