Availability Management

ITIL 4 Availability Management is the service management practice that ensures IT services deliver agreed levels of availability to meet business needs, measured through availability percentage, Mean Time Between Failures (MTBF), Mean Time to Repair (MTTR), and Mean Time to Restore Service (MTRS). Reframed from an ITIL v3 process into an ITIL 4 practice, it integrates people, processes, partners, and technology to design, monitor, and continually improve service uptime against service-level agreements and value-stream commitments.

Why availability management matters - the business cost of downtime.

Availability and service continuity management decide whether your business keeps running. Unplanned downtime costs Forbes Global 2000 companies USD 400 billion per year - roughly 9% of profits - according to a 2024 Splunk and Oxford Economics study. European firms in that group lose an average of USD 198 million each per year. Behind every dollar of that loss sits a service that failed its agreed availability - the problem ITIL 4 Availability Management is designed to prevent.

For IT Service Managers in DACH enterprises, IT service continuity management has shifted twice since ITIL v3. Availability is no longer a discrete process inside Service Design, and it cannot run as a monthly uptime report. Most teams still treat availability as backward-looking - publishing dashboards and missing SLAs.

The building blocks are stable. Availability percentage, MTBF, MTTR, and MTRS still anchor the conversation. What changed is how they wire into value streams, observability platforms, and modern service management tooling.

What is ITIL 4 Availability Management as a practice (versus the v3 process)?

In ITIL v3, Availability Management was a process inside Service Design with prescriptive activities. ITIL 4 dissolved that boundary. According to the AXELOS / PeopleCert ITIL 4 Practice Guide, the availability management and service continuity management practices now sit among the 17 service management practices (out of 34 ITIL 4 practices in total) and apply across the Service Value Chain.

You no longer schedule availability management as a stage; you embed it inside every value stream that produces a service. The four dimensions model - organizations and people, information and technology, partners and suppliers, value streams and processes - replaces the v3 input-output mindset.

ITIL v3 process vs ITIL 4 practice: Availability Management

Dimension	ITIL v3 process	ITIL 4 practice
Classification	Process within Service Design	One of 17 service management practices (34 practices in total)
Scope	Lifecycle-stage activities	Applied across the Service Value Chain
Framework lens	Process inputs and outputs	Four dimensions: organizations and people, information and technology, partners and suppliers, value streams and processes
Ways of working	Plan-driven, prescriptive	Agile, Lean, DevOps friendly
Core metrics	Availability %, MTBF, MTRS	Same plus business-impact and experience-based availability
Integration	Linked mainly to SLM and Capacity	Integrated with the other ITIL 4 practices
Outcome focus	Service uptime	Value co-creation with stakeholders

A team running v3-era reporting will miss the expectation that availability evidence covers third parties and continuity testing, not just data-centre uptime.

What are the four metrics of ITIL 4 Availability Management?

The practice rests on four quantitative anchors. Get the definitions wrong and every SLA conversation drifts.

Availability percentage

Agreed service time minus downtime, divided by agreed service time, expressed as a percentage. A 99.9% target on a 24x7 service allows 8.77 hours of downtime per year.
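The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are made up for this example, and the year length of 365.25 days (8,766 hours) is an assumption chosen because it reproduces the 8.77-hour figure; a 365-day year gives 8.76 hours.

```python
# Assumption: an average year of 365.25 days, i.e. 8766 agreed hours for a 24x7 service.
HOURS_PER_YEAR = 365.25 * 24


def availability_pct(agreed_hours: float, downtime_hours: float) -> float:
    """(agreed service time - downtime) / agreed service time, as a percentage."""
    if agreed_hours <= 0:
        raise ValueError("agreed service time must be positive")
    return (agreed_hours - downtime_hours) / agreed_hours * 100


def downtime_budget_hours(target_pct: float, agreed_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime allowed within the agreed service time for a given target."""
    return agreed_hours * (1 - target_pct / 100)


budget = downtime_budget_hours(99.9)
print(f"99.9% on 24x7 allows about {budget:.2f} hours of downtime per year")
```

For a service that is only agreed for business hours, pass the smaller agreed service time explicitly; the same percentage target then allows far less absolute downtime.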

MTBF - Mean Time Between Failures

Average uptime between two consecutive incidents that take the service below its target. Rising MTBF means fewer interruptions.

MTTR - Mean Time to Repair

Average time to fix the underlying fault in a component. MTTR is a maintainability measure owned by engineering and supplier teams.

MTRS - Mean Time to Restore Service

Average time from incident detection to user-facing service restoration. ITIL 4 emphasizes MTRS over MTTR for SLA reporting because customers experience service restoration, not component repair.
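To make the difference between the three time-based metrics concrete, here is a hedged sketch that derives them from incident timestamps. The `Incident` record and its field names are hypothetical, not taken from any specific ITSM tool, and two measurement choices are assumptions: MTTR is counted from detection to component repair, and MTBF is counted from one restoration to the next detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    detected: datetime  # incident detected
    repaired: datetime  # underlying component fault fixed
    restored: datetime  # service usable by end users again


def _mean(deltas: list[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("need at least one interval")
    return sum(deltas, timedelta()) / len(deltas)


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to repair the component (assumed: detection -> repair)."""
    return _mean([i.repaired - i.detected for i in incidents])


def mtrs(incidents: list[Incident]) -> timedelta:
    """Mean time to restore service (detection -> user-facing restoration)."""
    return _mean([i.restored - i.detected for i in incidents])


def mtbf(incidents: list[Incident]) -> timedelta:
    """Mean uptime between one restoration and the next detection."""
    ordered = sorted(incidents, key=lambda i: i.detected)
    return _mean([nxt.detected - prev.restored for prev, nxt in zip(ordered, ordered[1:])])


# Illustrative data only.
incidents = [
    Incident(datetime(2025, 1, 1, 0, 0), datetime(2025, 1, 1, 1, 0), datetime(2025, 1, 1, 2, 0)),
    Incident(datetime(2025, 1, 11, 2, 0), datetime(2025, 1, 11, 2, 30), datetime(2025, 1, 11, 3, 0)),
    Incident(datetime(2025, 1, 21, 3, 0), datetime(2025, 1, 21, 5, 0), datetime(2025, 1, 21, 6, 0)),
]
print("MTTR:", mttr(incidents))
print("MTRS:", mtrs(incidents))
print("MTBF:", mtbf(incidents))
```

In this sample the component is repaired well before users get the service back, so MTRS is noticeably longer than MTTR, which is the gap the text describes.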

What are the key KPIs for Availability Management?

The 2024 ITIC Hourly Cost of Downtime survey found that over 90% of mid-size and large enterprises report hourly downtime costs above USD 300,000, with 41% reporting USD 1–5 million per hour. The question shifts from "what was our uptime last month" to "which services, when down, hurt the business most".

Mature KPI sets reflect that shift: services meeting their availability SLA, number and duration of outages, MTBF and MTRS trends, availability requirements captured at service design, and improvement actions closed on time. Leading indicators predict next quarter; lagging ones explain last quarter.

Newer KPIs are business-impact-weighted. Cost of downtime per service, customer-experience-based availability, and supplier commitments translated into SLA exposure are now standard for organisations under contractual service level commitments. Two pitfalls recur: teams report component uptime instead of end-to-end service availability, and planned downtime is silently excluded from SLA maths.
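Both pitfalls are easy to see with numbers. The sketch below is illustrative only: the component names and figures are invented, and the end-to-end calculation assumes a simple serial chain with independent failures and no redundancy. Whether planned downtime counts against the SLA depends on the contract wording; the flag here just shows how much the choice moves the result.

```python
from math import prod


def end_to_end_availability(component_availabilities) -> float:
    """Serial chain: the service is up only when every component is up.

    Assumes independent failures and no redundancy.
    """
    return prod(component_availabilities)


def sla_availability(agreed_hours: float, unplanned_hours: float,
                     planned_hours: float, count_planned: bool = True) -> float:
    """Availability as a fraction, with or without planned downtime."""
    downtime = unplanned_hours + (planned_hours if count_planned else 0.0)
    return (agreed_hours - downtime) / agreed_hours


# Hypothetical components, each reporting "three nines" on its own.
components = {
    "load_balancer": 0.999,
    "app_server": 0.999,
    "database": 0.999,
    "identity_provider": 0.999,
}
service = end_to_end_availability(components.values())
print(f"End-to-end availability: {service * 100:.3f}%")

# Hypothetical month: 720 agreed hours, 0.5 h unplanned, 4 h planned maintenance.
with_planned = sla_availability(720, 0.5, 4)
without_planned = sla_availability(720, 0.5, 4, count_planned=False)
print(f"Including planned downtime: {with_planned * 100:.3f}%")
print(f"Excluding planned downtime: {without_planned * 100:.3f}%")
```

Four components at 99.9% each yield an end-to-end figure of roughly 99.6%, so a dashboard of green component numbers can coexist with a missed service-level target.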

How are AIOps and observability changing Availability Management?

Gartner predicts that by 2028, 40% of organizations deploying AI will use dedicated AI observability tools to monitor availability, performance and accuracy of AI services (Gartner press release, 2026). AIOps and observability tooling are converging with ITSM, moving availability management from reactive reporting to predictive prevention.

In practice, event streams from Datadog, Dynatrace, or SolarWinds enrich ITSM tickets in real time; anomaly detection opens incidents before users feel them; and CMDB-grounded service models let availability calculations follow business services rather than nodes. A well-modelled CMDB is the single largest determinant of whether your availability numbers are credible to auditors.

DORA has applied across the EU since 17 January 2025, mandating ICT risk management, continuity testing, third-party oversight, and incident-reporting timelines of 24 hours, 72 hours, and one month. Germany's NIS2 transposition took effect in December 2025.

How does Availability Management relate to other ITIL practices?

ITIL 4 Availability Management does not operate in isolation. It shares data, decisions, and dependencies with several other ITIL practices in the framework, and understanding those links is essential to designing service-availability targets that match how the business actually consumes IT. The list below maps the practices that touch Availability Management most directly.

Service Level Management

Negotiates and documents the availability commitments inside SLAs and OLAs. Service Level Management owns the customer-facing target; Availability Management owns the engineering, monitoring, and improvement that delivers it.

Service Continuity Management

The sister practice. Availability Management plans for normal operating conditions; Service Continuity Management plans for major disruptions and disaster scenarios. Both practices are designed to be evidenced together.

Incident Management

Every incident is also an availability event. MTRS and SLA reporting depend on accurate detection, escalation, and restoration timestamps from this practice.

Problem Management

Root-cause analysis from Problem Management explains why availability targets were missed and feeds the continual-improvement backlog that Availability Management owns.

Monitoring and Event Management

Provides the real-time signal that triggers incidents and feeds MTBF and MTRS measurement. CMDB-grounded event correlation is what makes availability numbers credible. See our ITIL 4 Monitoring and Event Management practice guide.

Capacity and Performance Management

Capacity shortages cause availability failures. The two practices share design-stage workload modeling and feed each other’s improvement plans.

Service Configuration Management (CMDB)

Availability calculations are only as credible as the service models they ride on. A clean CMDB is a hard prerequisite for end-to-end service availability, not an optional add-on.

Risk Management

Component Failure Impact Analysis (CFIA), Fault Tree Analysis (FTA), and supplier-availability commitments are all risk-management techniques applied inside Availability Management.
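As a rough illustration of the idea behind CFIA, the sketch below builds a component-to-service impact map from a service model and ranks components by how many services their failure would take down. It is a deliberately simplified stand-in: the service model, component names, and the `redundant` set are hypothetical, and a real CFIA would also grade the severity of each impact rather than just counting services.

```python
# Hypothetical service model: which components each service depends on.
dependencies = {
    "online_banking": {"core_db", "auth", "web_frontend"},
    "payments": {"core_db", "auth", "payment_gateway"},
    "hr_portal": {"auth", "hr_app"},
}

# Assumption: these components have a failover instance, so a single
# failure does not interrupt the services that use them.
redundant = {"web_frontend"}


def cfia(dependencies: dict[str, set[str]], redundant: set[str]) -> dict[str, set[str]]:
    """Map each non-redundant component to the services its failure would hit,
    ordered by number of affected services (most first), then by name."""
    impact: dict[str, set[str]] = {}
    for service, components in dependencies.items():
        for component in components:
            if component not in redundant:
                impact.setdefault(component, set()).add(service)
    return dict(sorted(impact.items(), key=lambda kv: (-len(kv[1]), kv[0])))


matrix = cfia(dependencies, redundant)
for component, services in matrix.items():
    print(f"{component}: {len(services)} service(s) -> {sorted(services)}")
```

Components at the top of the list are candidates for redundancy or tighter supplier commitments, since they act as single points of failure for several services at once.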

Measurement and Reporting

Supplies the cadence and discipline for translating availability data into board-grade and regulator-grade reports. See the ITIL 4 Measurement and Reporting practice guide.

Key takeaways

ITIL 4 reframes availability management from a Service-Design process into one of 34 practices applied across the Service Value Chain, alongside service continuity management.

The four metrics that matter are availability percentage, MTBF, MTTR, and MTRS - and MTRS, not MTTR, is what customers feel.

Mature KPI sets are business-impact-weighted: cost of downtime per service and customer-experience-based availability now outrank raw uptime.

For organizations that must produce formal availability evidence covering third parties and continuity testing, ITIL 4 Availability Management is the practice framework that maps most directly onto those requirements.

AIOps and observability convergence is shifting availability management toward predictive, CMDB-grounded prevention.

From reporting practice to design discipline

The shift from process to practice is not a documentation exercise. It changes where availability decisions are made - earlier, by more people, with better data. The USD 198 million average annual downtime cost per European Global 2000 company is the stake when the practice is reduced to a monthly slide.

For DACH organizations, three moves matter most this quarter: translate component uptime into end-to-end service availability tied to the CMDB; move SLA reporting from MTTR to MTRS so the metric matches the customer experience; and package the evidence (availability plans, SLA reports, supplier commitments) in a form your service management stakeholders and auditors can consume.

This is where tooling earns its keep. Matrix42's ITIL 4 Availability Management implementation auto-generates service-availability forms at defined intervals, records whether services met their targets, and triggers notifications when they did not - wiring the practice into your IT service management software workflow.

If you take one action this quarter, redefine one SLA in MTRS terms and instrument it against the CMDB service model. Matrix42 turns availability and service continuity management into the SLA records and miss notifications your service managers and auditors need. Which of your top-ten services is reported on component uptime when the business is paying for end-to-end availability?

How does Matrix42 support Availability Management?

See how Intelligent Service Management unifies assets, services, and devices, puts AI at the core of every interaction, and transforms service delivery from reactive to proactive across IT and beyond.

FAQs

What is ITIL 4 Availability Management?

ITIL 4 Availability Management is the practice of ensuring services deliver the agreed levels of availability to meet customer and business needs. It plans, measures, and improves uptime using metrics such as availability percentage, MTBF, MTTR, and MTRS. Per AXELOS, it covers the full service lifecycle and aligns with Service Level Management to keep services reliable and cost-effective.

How does ITIL 4 Availability Management differ from the ITIL v3 process?

ITIL v3 defined Availability Management as a discrete process inside Service Design with prescriptive activities. ITIL 4 reframes it as one of 17 service management practices (within ITIL 4's total of 34 practices), applied flexibly across the Service Value Chain. The shift emphasizes value co-creation, integration with the four dimensions model (organizations and people, information and technology, partners and suppliers, value streams and processes), and Agile/DevOps ways of working rather than rigid process steps.

What are the key metrics in ITIL Availability Management?

The four core metrics are availability percentage (uptime versus agreed service time), MTBF (Mean Time Between Failures), MTTR (Mean Time to Repair), and MTRS (Mean Time to Restore Service). Supporting indicators include MTBSI (Mean Time Between Service Incidents), reliability, maintainability, and serviceability. These metrics feed SLA reporting, capacity planning, and continual improvement decisions.

How does Availability Management relate to Service Level Management?

Service Level Management negotiates and documents availability targets inside SLAs and OLAs, while Availability Management designs, monitors, and improves the technical and operational capability to meet those targets. The two practices share data flows: SLM provides commitments and customer context, Availability Management provides measurement, root-cause analysis, and improvement plans. Together they close the loop between business expectation and IT delivery.

Who owns ITIL Availability Management in a typical IT organization?

Ownership usually sits with an Availability Manager or Service Owner reporting to the Head of Service Operations or CIO. In smaller organisations the role is combined with Service Level or Capacity Manager responsibilities. The owner is accountable for availability plans, SLA achievement reporting, risk assessments (CFIA, FTA), and driving improvements across infrastructure, application, and supplier teams.

What are the most common KPIs for Availability Management?

Common KPIs include percentage of services meeting availability SLA, number and duration of service outages, MTBF and MTRS trends, percentage of availability requirements documented at service design, and percentage of improvement actions completed on schedule. Mature organizations also track cost of downtime, business-impact-weighted availability, and customer-experience-based availability rather than infrastructure uptime alone.

What are the common pitfalls in implementing Availability Management?

Frequent pitfalls include measuring component uptime instead of end-to-end service availability, ignoring planned downtime in SLA calculations, treating it as a reporting exercise rather than a design discipline, weak integration with Incident and Problem Management, and lack of supplier (third-party) availability commitments. Organizations also fail when they don't translate availability data into business impact language for executive stakeholders.

Which tools support ITIL 4 Availability Management?

Matrix42 Service Management integrates availability metrics with asset, incident, and SLA data so service owners see end-to-end service health in one workspace, grounded in a live CMDB service model. Other ITSM platforms in this category include ServiceNow, BMC Helix, and Ivanti Neurons, paired with monitoring tools (Datadog, Dynatrace, SolarWinds) and AIOps platforms for predictive availability. Matrix42 ITSM differentiates by tying availability calculations directly to that CMDB rather than a separate reporting layer.

Matrix42 (2026). ITIL 4 Practices for Exceptional IT Service Delivery - Availability Management. https://www.matrix42.com/en/itil-practices#availabilitymanagement
AXELOS / PeopleCert (2023). Availability Management: ITIL 4 Practice Guide. https://www.axelos.com/resource-hub/practice/availability-management-itil-4-practice-guide
Splunk & Oxford Economics (2024). The Hidden Costs of Downtime. https://newsroom.cisco.com/c/r/newsroom/en/us/a/y2024/m06/conf24-splunk-report-shows-downtime-costs-global-2000-companies-400b-annually.html
Uptime Institute (2024). Annual Outage Analysis 2024. https://uptimeinstitute.com/resources/research-and-reports/annual-outage-analysis-2024
Information Technology Intelligence Consulting (ITIC) (2024). 2024 Hourly Cost of Downtime Report. https://itic-corp.com/itic-2024-hourly-cost-of-downtime-report/
Gartner (2026). Gartner Predicts 40% of Organizations Deploying AI Will Use AI Observability to Monitor Model Performance by 2028. https://www.gartner.com/en/newsroom/press-releases/2026-05-12-gartner-predicts-40-percent-of-organizations-deploying-ai-will-use-ai-observability-to-monitor-model-performance-by-2028
European Insurance and Occupational Pensions Authority (EIOPA) (2025). Digital Operational Resilience Act (DORA) — Regulatory Overview. https://www.eiopa.europa.eu/digital-operational-resilience-act-dora_en
