There's a particular kind of engineering perspective that comes from spending years on the other side of the fire alarm — the person who takes the call when something breaks at scale, who writes the postmortem and then goes back in to fix the architecture that made the failure possible. That's the lens Srđan Milutinović brings to every system he works on.
The Early Years
Srđan's career started where a lot of technical careers do: in QA and support. Positions at Micro Focus, NCR, and Microsoft covered the fundamentals that no architecture diagram captures: how enterprise systems fail at the edges, how users experience failures that developers never anticipated, and how the gap between "working in dev" and "working in production" can be catastrophic if you don't take it seriously.
The Microsoft stint involved SAML-based enterprise authentication integrations and Hadoop analytics work. Together, those two areas built a strong intuition for identity systems, large-scale data pipelines, and the kind of enterprise complexity that makes or breaks B2B products. The NCR years added enterprise QA rigour to the mix. It was unglamorous work by engineering standards, but it was formative.
Fintech Scale
The real step-change came at Wirecard AG. Enterprise merchant integrations in payments mean navigating a world where a misconfigured API mapping can block millions in transactions, where fraud detection SQL runs against hundreds of millions of rows, and where compliance and technical requirements are in constant tension with delivery speed. Srđan owned merchant integration projects end-to-end, from technical scoping through implementation and post-launch support, and built a fluency in payment rails that most engineers never have to develop.
When Wirecard collapsed in 2020, the move to PPRO Group was a natural continuation: another fintech, different architecture. PPRO runs global payment infrastructure on AWS ECS, and the role there added two important dimensions. First, real AWS infrastructure operations at scale — ECS task definitions, IAM, networking, the full stack. Second, a team lead element: not just delivering technically, but coordinating across a team, managing expectations, and owning the human side of shipping fast in a high-pressure environment.
A stint at Tamedia / TX Services (now TX Group) was a deliberate context switch. Migrating paywall infrastructure for a major Swiss media company and managing an ELK stack for editorial analytics is a completely different operational environment from fintech — different compliance landscape, different failure modes, different stakeholders. It broadened what "operations" could mean.
Infrastructure at IONOS Cloud
Since November 2022, Srđan has been a Senior Technical Operations Engineer at IONOS Cloud, responsible for three core cloud products: Cloud DNS, CDN, and Domains. These aren't small products: DNS and CDN sit directly in the request path of the customer applications that use them, so when they fail, customers notice immediately.
The role at IONOS covers more ground than most "ops" titles suggest. In any given week, that might mean reviewing a Helm chart deployment for a DNS resolver update, running a ChaosMesh experiment to validate failover behaviour, driving a BSI C5 evidence audit, or leading a cross-functional incident debrief. The role spans product delivery, project coordination, technical architecture, compliance, and people, without being neatly constrained by any single discipline.
BSI C5 Compliance
One of the threads running through the IONOS work is BSI C5, the Cloud Computing Compliance Criteria Catalogue published by Germany's Federal Office for Information Security (BSI). Owning C5 compliance for live cloud products isn't an audit exercise; it's an operational discipline. It means mapping every relevant control area (availability, change management, cryptography, incident response, and more) to real evidence from real systems, identifying gaps before auditors do, and working with product and engineering teams to close them.
Most compliance work at this level surfaces architectural questions that weren't obvious before the mapping exercise started. C5 has directly influenced infrastructure decisions at IONOS — particularly around access controls, change management pipelines, and incident documentation. It's one of the more underrated ways that compliance and engineering quality intersect.
Chaos Engineering
In 2023, Srđan designed and implemented a structured chaos engineering programme using ChaosMesh across 100+ Kubernetes workloads. The goal wasn't to break things for sport — it was to answer a specific question: what do we actually know about how these systems fail, versus what do we assume?
The programme ran pod failure, network partition, CPU/memory stress, and DNS failure experiments in production-equivalent environments. It surfaced multiple failure paths that monitoring hadn't caught and runbook gaps that postmortems hadn't revealed, and it measurably improved operator confidence during real incidents. The 99.99% HA target for the covered workloads is now backed by empirical failure data, not just architectural intent.
Observability Stack
The observability stack at IONOS Cloud (VictoriaMetrics for metrics, Loki for logs, Grafana for dashboards) wasn't inherited; it was built. Designing an observability platform for cloud infrastructure products means thinking about cardinality, retention, alerting signal-to-noise, and the on-call experience simultaneously. The goal is dashboards that tell a story during an incident, not dashboards that merely display data.
Alongside the tooling, Srđan has contributed to the team's observability culture: what it means to instrument a service well, how to write runbooks that reduce MTTR, and how to run postmortems that produce structural improvements rather than blame.
CKA Certification
In April 2024, Srđan passed the Certified Kubernetes Administrator (CKA) exam — a hands-on certification that reflects the depth of Kubernetes work across his career. The CKA matters not because it's a badge but because it validates operational Kubernetes knowledge at the level where most cloud infrastructure teams actually work.
What's Next
Srđan is open to senior roles in Technical Operations, Platform Engineering, and Technical Product or Delivery — particularly at companies where infrastructure is a competitive advantage and where the boundary between "ops" and "product" is treated as a feature, not a limitation.
If you're building infrastructure that others depend on, and you want someone who's been on both sides of the reliability equation — reach out.