xBerry Job: DevOps / Infrastructure & Field Support Engineer

Optional, Full Time, Remote, Wrocław, 20 000 - 28 000 + action fee
About us

xBerry is an R&D house that has been delivering custom solutions for international clients since 2016. We provide extensive expertise in embedded systems, machine learning, AR/VR technology, and image processing.

Requirements

Technical Requirements

 

  • Strong experience with Linux (Ubuntu) system administration and troubleshooting,
  • Hands-on experience with Kubernetes, including cluster troubleshooting and container analysis,
  • Practical knowledge of Docker,
  • Solid understanding of networking and diagnosing network-related issues,
  • Experience with NFS / storage troubleshooting,
  • Operational knowledge of GPU / CUDA environments (compatibility, stability),
  • Experience working with:
    • RabbitMQ,
    • PostgreSQL.

Additional Requirements

 

  • Willingness to participate in an on-call / standby rotation,
  • Readiness for business travel, including on-site customer visits,
  • Ability to work independently in complex, distributed environments,
  • Strong analytical and problem-solving skills.

Responsibilities

Incident Handling and System Maintenance

 

  • Diagnosing and resolving issues related to:
    • Kubernetes clusters,
    • containers (Docker),
    • Linux (Ubuntu) operating system,
    • networking,
    • storage (including NFS),
  • Analyzing logs and service health across application and infrastructure layers,
  • Restoring full system functionality in production environments,
  • Performing system deployments and upgrades at customer sites,
  • Participating in on-site interventions when issues cannot be resolved remotely.

Automation, Observability, and System Resilience

 

  • Designing and developing automated troubleshooting mechanisms,
  • Detecting infrastructure and application-level issues early,
  • Automatically validating the health of key system components:
    • OS,
    • Kubernetes,
    • containers,
    • storage,
    • networking,
  • Building health checks and observability solutions (metrics, alerts, dashboards),
  • Creating and maintaining:
    • runbooks,
    • standard recovery procedures,
    • automated self-healing mechanisms,
  • Documenting common incidents, root causes, and resolution methods.

Collaboration and Architecture Improvement

 

  • Cooperating closely with development and architecture teams,
  • Contributing to architecture simplification and standardization,
  • Improving overall system stability and reliability,
  • Supporting long-term efforts to reduce operational overhead and manual interventions.

We offer
  • Flexible working hours
  • Remote work options
  • Medical care program
  • MultiSport
  • Integration events
  • A contract of employment or self-employment, depending on your preference