DevOps / Infrastructure & Field Support Engineer

Optional | Full Time | Remote | Wrocław | 20 000 - 28 000 + action fee

About us

xBerry is an R&D house that has been delivering custom solutions for international clients since 2016. We provide extensive expertise in embedded systems, machine learning, AR/VR technology, and image processing.

Requirements

Technical Requirements

  • Strong experience with Linux (Ubuntu) system administration and troubleshooting, 
  • Hands-on experience with Kubernetes, including cluster troubleshooting and container analysis, 
  • Practical knowledge of Docker, 
  • Solid understanding of networking and diagnosing network-related issues, 
  • Experience with NFS / storage troubleshooting, 
  • Operational knowledge of GPU / CUDA environments (compatibility, stability), 
  • Experience working with: 
    • RabbitMQ, 
    • PostgreSQL. 

Additional Requirements

  • Willingness to participate in an on-call / standby rotation, 
  • Readiness for business travel, including on-site customer visits, 
  • Ability to work independently in complex, distributed environments, 
  • Strong analytical and problem-solving skills.

Responsibilities

Incident Handling and System Maintenance

  • Diagnosing and resolving issues related to: 
    • Kubernetes clusters, 
    • containers (Docker), 
    • Linux (Ubuntu) operating system, 
    • networking, 
    • storage (including NFS), 
  • Analyzing logs and service health across application and infrastructure layers, 
  • Restoring full system functionality in production environments, 
  • Performing system deployments and upgrades at customer sites, 
  • Participating in on-site interventions when issues cannot be resolved remotely.

Automation, Observability, and System Resilience

  • Designing and developing automated troubleshooting mechanisms, 
  • Early detection of infrastructure and application-level issues, 
  • Automated validation of the health of key system components: 
    • OS, 
    • Kubernetes, 
    • containers, 
    • storage, 
    • networking, 
  • Building health checks and observability solutions (metrics, alerts, dashboards), 
  • Creating and maintaining: 
    • runbooks, 
    • standard recovery procedures, 
    • automated self-healing mechanisms, 
  • Documenting common incidents, root causes, and resolution methods. 

Collaboration and Architecture Improvement

  • Close cooperation with development and architecture teams, 
  • Contributing to architecture simplification and standardization, 
  • Improving overall system stability and reliability, 
  • Supporting long-term efforts to reduce operational overhead and manual interventions.

We offer

  • Flexible working hours
  • Remote work options
  • Medical care program
  • MultiSport
  • Pizza Fridays
  • A contract of employment or self-employment, depending on your preference