IONOS S.R.L.
DevOps Engineer (f/m/d) Customer Care AI Platform Team (hybrid)
At IONOS, the leading European provider of cloud infrastructure, cloud services and hosting services, you will work together with a wide range of teams. We are characterized by open structures, a friendly working culture and flat hierarchies with a strong team spirit. We firmly believe that work and fun are compatible, and offer you the right environment for this. Our constant growth means that we are always looking for new colleagues. Become part of IONOS and grow with us.
Our mission is to build a modern ecosystem used for all IONOS customer support needs. The tools we develop are used in over 20 locations by more than 2,000 users, supporting 8 million customer contracts in 10 markets.
The development team has full responsibility for the development lifecycle. This means we plan, develop, test and deploy our software without any other internal or external dependencies.
Our portfolio revolves around an internally built CRM which is now being enhanced with AI capabilities.
About the product you will be building:
We are building a next-generation AI platform designed to redefine how our company interacts with customers. This isn't just a chatbot; it's a high-performance, multimodal AI ecosystem powered by state-of-the-art Speech-to-Speech (S2S) models, advanced Large Language Models (LLMs), and intelligent orchestration frameworks. Our platform will understand, reason, and respond across text and voice — while seamlessly executing real-time actions to resolve customer needs.
We are aiming for a hybrid architecture of Open Source LLMs, industry-leading proprietary models, and Model Context Protocol (MCP) to enable contextual reasoning, tool invocation, and seamless orchestration across systems. The goal is not just to talk to the customer, but to act on their needs.
What makes this project unique:
The Voice Frontier: We are building low-latency, emotive speech-to-speech pipelines for a truly natural voice channel experience.
Deep System Integration: Our platform connects directly to the company's core systems via MCPs, allowing the AI to access real-time customer context and execute complex workflows.
Self-Evolving Logic: We are developing an automated QA and evaluation module that continuously analyzes interactions across channels. By programmatically measuring quality, accuracy, latency, and resolution outcomes, we can close the feedback loop and adapt system behavior in hours, not weeks.
Hybrid Innovation: You’ll work at the intersection of "build vs. buy," integrating the best of the open-source community with custom-built internal infrastructure.
What's in it for you:
You won't just be shipping code; you’ll help shape how this concept evolves.
You’ll join a friendly, experienced team where your voice matters and your contribution shapes real-world outcomes. You’ll work in a modern environment with technologies and practices that help us ship reliable software efficiently.
Role description:
As a DevOps Engineer in this team you will build the foundation of our internal AI Customer Care platform. This is a bare-metal infrastructure role. Our Kubernetes cluster runs on Debian Linux virtual machines bootstrapped with kubeadm -- not on a managed cloud provider. You will own the full infrastructure lifecycle: from Ansible automation and OS provisioning, through Kubernetes control plane operations, to HA storage and GitOps delivery.
You will be responsible for the heavy lifting -- designing the distributed systems that power real-time speech-to-speech pipelines, orchestrating agentic workflows via MCP, and ensuring our AI scales reliably.
Main responsibilities:
- Own and extend our Ansible automation codebase -- playbooks, roles, handlers, vault-encrypted secrets, and the full site.yml lifecycle from OS provisioning to cluster post-install
- Design, build, and maintain CI/CD pipelines in collaboration with development teams
- Bootstrap, operate, and upgrade our bare-metal Kubernetes cluster (kubeadm, 3-node etcd quorum, HAProxy control plane frontend) on Debian Linux VMs
- Maintain and optimize Debian-based Linux systems including LVM storage, system hardening, and OS-level performance tuning
- Operate our GitOps delivery layer: ArgoCD, Helm chart lifecycle, SOPS/AGE-encrypted secrets in Git
- Support and maintain our HA storage cluster: DRBD synchronous replication, Pacemaker/Corosync resource management, NFS CSI integration for Kubernetes persistent volumes
- Ensure high availability and monitoring across multiple data centers
- Contribute to observability, monitoring, logging, and incident response -- Prometheus, Grafana, ELK, and Jaeger across all three telemetry pillars
- Automate infrastructure provisioning and configuration
- Maintain ISO security standards throughout the infrastructure
- Handle vulnerabilities and ensure dependency tracking
- Work closely with developers to optimize deployment workflows and runtime environments
- Use AI tooling effectively (Claude, ChatGPT, internal MCP tools) to improve productivity and automation
- Architect low-latency pipelines: build and optimize the streaming infrastructure for Speech-to-Speech (S2S), ensuring sub-500ms round-trip latency for natural voice interactions. Experience with scaling and deploying in zones, minimising hops between services. Experience with WSS and SRTP protocols would be a plus
- Host and maintain specialised tooling needed in our AI pipelines (MCP servers, vector store databases, caching applications). Monitor and respond to unhealthy patterns (high memory, high CPU, low disk space, high latency)
- Host and maintain our Automated QA Module. Schedule jobs and design alerts for rapid response (high hallucinations or low response quality on latest nightly run, etc.)
We are looking for some of the following:
- Strong Linux administration experience on Debian-based systems -- not just cloud instances, but bare-metal or VM-level administration including boot, networking, storage, and system
- Hands-on experience with bare-metal or self-managed Kubernetes in production: kubeadm cluster bootstrapping, operations, control plane HA, CNI installation and configuration. Managed Kubernetes (EKS, GKE, AKS) experience alone is not sufficient for this role
- Ansible at depth: writing and maintaining playbooks, roles, handlers, vault-encrypted variables, and understanding idempotency as a discipline -- not just running existing playbooks
- Experience with cloud-native architectures (design, build, operations)
- Solid understanding of networking fundamentals: subnetting, routing, BGP concepts, high-availability design, and L3/L4 troubleshooting
- Experience with CI/CD systems and infrastructure automation tools
- Good scripting skills (Bash, Python or similar)
- Ability to troubleshoot distributed systems
- Systems expertise: kubeadm, Ansible, Cilium (eBPF CNI), MetalLB, ArgoCD, Helm, Prometheus, Grafana, ELK, Jaeger, Terraform, JFrog Artifactory, CI/CD (GitHub Actions, GitLab CI, Jenkins), Docker
- Familiarity with a monitoring stack covering all three pillars: Prometheus/Grafana (metrics), ELK (logs), Jaeger (distributed traces)
- Security best practices: SOPS/AGE secret encryption in Git, OpenID Connect, OAuth 2.0, HashiCorp Vault, Keycloak, KeePass, Ansible Vault
Would be a plus:
- Experience with HA storage clustering: DRBD block replication, Pacemaker/Corosync resource management, NFS server HA
- Experience migrating from VM-based infrastructure to container orchestration -- we are actively on this journey
- Experience with telephony gateways (Twilio, Amazon Connect) and SIP/RTP protocols or other telephony platforms
- Prior experience with Puppet, Chef, or other configuration management tools -- the mental model transfers directly to Ansible
- Exposure to AI-driven development workflows
- CKAD, CKA, or RHCSA/RHCE certification
What we offer:
- Access to local/international trainings, development and growth opportunities, including access to e-learning platforms, covering both technical and soft skills areas;
- Modern technologies, product responsibility;
- Flexible work schedule;
- Hybrid work option;
- Medical services package from one of two private providers;
- 25 vacation days per year;
- Substitute days off for public holidays that occur on the weekend;
- Meal tickets;
- Internal referral program;
- Team events and networking events organized to promote a passionate, creative and diverse culture;
- Summerfest and Winterfest parties;
- Of course, coffee, soft drinks and fresh fruits are on us in the office.
Job info
Location: Berlin
Type: Permanent, full-time
Category: Software Development
Work experience: Professionals
Reference ID: 1435
About IONOS
IONOS is the leading European digitalization partner for small and medium-sized businesses (SMB). The company serves around six million customers and operates across 18 markets in Europe and North America, with its services being accessible worldwide. With its Web Presence & Productivity portfolio, IONOS acts as a 'one-stop shop' for all digitalization needs: from domains and web hosting to classic website builders and do-it-yourself solutions, from e-commerce to online marketing tools. In addition, the company offers Cloud Solutions to enterprises who are looking to move to the cloud as their businesses evolve.
We value diversity and welcome all applications - regardless of, for example, gender, nationality, ethnic or social origin, religion, disability, age as well as sexual orientation and identity, physical characteristics, marital status or any other irrelevant factor subject to applicable law.
IONOS SE
Recruiting Team IONOS
Hinterm Hauptbahnhof 3-5
D-76135 Karlsruhe
Jobs@ionos.com