Survival Function and Hazard Function

Introduction

This section introduces the two most fundamental elements of survival analysis: the survival function and the hazard function.

Simply put,

  • Survival function $S(t)$ is the probability that the event has not yet occurred by time $t$, i.e., that it occurs at or after time $t$
  • Hazard function $\lambda(t)$ is the instantaneous rate at which the event occurs at time $t$, given that it has not occurred before time $t$

These functions are mathematically related, and knowing one allows us to derive the other. Let's check the definition of each function and its derivation in the following section.

Definitions

Probability Density Function

$f(t)$ represents the probability density of an event (e.g., death) occurring at exactly time $t$. Note that this is different from the hazard function $\lambda(t)$.

Distribution Function

The distribution function $F(t)$ represents the probability that an event occurs before time $t$. If we consider death as the event, the distribution function represents the probability of dying before time $t$.

$$
F(t) = P(T < t)
$$

Survival Function

The survival function $S(t)$ is, in contrast to the distribution function, the probability that an event has not occurred before time $t$. If we consider death as the event, the survival function represents the probability of surviving until time $t$.

$$
S(t) = P(T \geq t) = 1 - F(t)
$$
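The relationship between $F(t)$ and $S(t)$ can be sketched numerically. The example below assumes a Weibull-distributed event time; the distribution, its parameters `SHAPE` and `SCALE`, and the function names are illustrative choices that do not come from the text above. For a continuous distribution such as this one, $P(T < t)$ and $P(T \leq t)$ coincide.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE, SCALE = 1.5, 10.0


def weibull_cdf(t, shape=SHAPE, scale=SCALE):
    """Distribution function F(t) = P(T < t) of a Weibull event time."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_survival(t, shape=SHAPE, scale=SCALE):
    """Survival function S(t) = P(T >= t) = 1 - F(t)."""
    return 1.0 - weibull_cdf(t, shape, scale)


# F(t) rises from 0 while S(t) falls from 1; the two always sum to 1.
for t in (0.0, 5.0, 10.0, 20.0):
    print(f"t={t:5.1f}  F(t)={weibull_cdf(t):.4f}  S(t)={weibull_survival(t):.4f}")
```

Printing both columns side by side makes it easy to see that every gain in $F(t)$ is matched by an equal loss in $S(t)$.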

Hazard Function

The hazard function $\lambda(t)$ is the conditional probability that an event occurs in a small interval between time $t$ and $t+h$, given survival up to time $t$, divided by the interval length $h$, in the limit as $h$ approaches 0.

$$
\lambda(t) = \lim_{h \rightarrow 0} \frac{P(t \leq T < t + h \ | \ T \geq t)}{h} \tag{1}
$$

When death is the event, the hazard function represents the instantaneous rate (death rate, mortality) at which a person who has survived until time $t$ dies in the next instant after $t$. Similarly, when disease onset is the event, the hazard function represents the incidence rate or morbidity at time $t$.

The reason why this is a conditional probability becomes clear when considering mortality at a specific age (e.g., age 40). For the event "death at age 40" to occur, the condition "being alive until age 40" must obviously be met.
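The limit in the definition of the hazard function can be approximated by evaluating the conditional probability per unit time for progressively smaller $h$. The sketch below assumes a Weibull-distributed event time with hypothetical parameters `SHAPE` and `SCALE`, and uses the known closed-form Weibull hazard as a reference value; none of these names come from the text above.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE, SCALE = 1.5, 10.0


def cdf(t):
    """F(t) = P(T < t) for the assumed Weibull distribution."""
    return 1.0 - math.exp(-((t / SCALE) ** SHAPE))


def conditional_rate(t, h):
    """P(t <= T < t + h | T >= t) / h, the quantity inside the limit."""
    prob_in_interval = cdf(t + h) - cdf(t)   # P(t <= T < t + h)
    prob_survive_to_t = 1.0 - cdf(t)         # P(T >= t)
    return prob_in_interval / prob_survive_to_t / h


def exact_hazard(t):
    """Closed-form Weibull hazard, used here only as a reference value."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


t = 5.0
for h in (1.0, 0.1, 0.01, 0.001):
    print(f"h={h:<6} ratio={conditional_rate(t, h):.6f}")
print(f"closed-form hazard at t={t}: {exact_hazard(t):.6f}")
```

As $h$ shrinks, the printed ratio should move toward the closed-form value, which illustrates why the hazard is read as a rate per unit time rather than as a probability.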

The hazard function $\lambda(t)$ can be transformed using the definition of conditional probability:

$$
P(A|B) = \frac{P(A \cap B)}{P(B)} \tag{a}
$$

Following this definition (a), and noting that the event $t \leq T < t + h$ already implies $T \geq t$ (so the intersection of the two events is just $t \leq T < t + h$), the hazard function can be expressed using the probability density function $f(t)$ and the survival function $S(t)$ at time $t$ as follows:

$$
\begin{align*}
\lambda(t) &= \lim_{h \rightarrow 0} \frac{P(t \leq T < t + h)}{h} \frac{1}{P(T \geq t)} \\
&= \lim_{h \rightarrow 0} \frac{F(t+h) - F(t)}{h} \frac{1}{S(t)} \\
&= \frac{dF(t)}{dt} \frac{1}{S(t)} \\
&= \frac{f(t)}{S(t)} \tag{2}
\end{align*}
$$
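As a worked example of equation (2), consider an event time that follows an exponential distribution with rate $\theta > 0$. The distribution and the symbol $\theta$ are assumptions chosen for illustration; they do not appear in the text above.

$$
\begin{align*}
f(t) &= \theta e^{-\theta t}, \qquad S(t) = e^{-\theta t} \\
\lambda(t) &= \frac{f(t)}{S(t)} = \frac{\theta e^{-\theta t}}{e^{-\theta t}} = \theta
\end{align*}
$$

The hazard is constant in time: under this assumption, the risk of the event in the next instant does not depend on how long the subject has already survived.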

The hazard function $\lambda(t)$ can also be written in terms of $S(t)$ alone. Since $f(t)$ can be transformed as:

$$
\begin{align*}
f(t) &= \frac{dF(t)}{dt} \\
&= \frac{d}{dt}\bigl( 1 - S(t) \bigr) \\
&= -\frac{d}{dt}S(t)
\end{align*}
$$

We get:

$$
\begin{align*}
\lambda(t) &= \frac{f(t)}{S(t)} \\
&= -\frac{dS(t)}{dt} \frac{1}{S(t)} \\
&= -\frac{S'(t)}{S(t)} \\
&= -\frac{d}{dt} \log S(t) \tag{3}
\end{align*}
$$
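Equations (2) and (3) can be compared numerically. The sketch below assumes a Weibull-distributed event time with hypothetical parameters `SHAPE` and `SCALE`, and approximates the derivative of $\log S(t)$ with a central finite difference; the function names and the step size `h` are illustrative choices.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE, SCALE = 1.5, 10.0


def survival(t):
    """S(t) for the assumed Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))


def density(t):
    """f(t) = -dS(t)/dt for the assumed Weibull distribution."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)


def hazard_from_ratio(t):
    """Equation (2): f(t) / S(t)."""
    return density(t) / survival(t)


def hazard_from_log_survival(t, h=1e-5):
    """Equation (3): -d/dt log S(t), via a central finite difference."""
    return -(math.log(survival(t + h)) - math.log(survival(t - h))) / (2 * h)


for t in (1.0, 5.0, 20.0):
    print(f"t={t:5.1f}  f/S={hazard_from_ratio(t):.6f}  "
          f"-dlogS/dt={hazard_from_log_survival(t):.6f}")
```

The two columns should agree up to the error of the finite-difference approximation.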

By integrating equation (3) from time 0 to $t$, we can write it the other way around, expressing the survival function $S(t)$ in terms of the hazard function $\lambda(t)$.

$$
\begin{align*}
\int_0^t \bigg( -\frac{d}{du} \log S(u) \bigg) du &= \int_0^t \lambda(u) du \\
\bigg[ -\log S(u) \bigg]_0^t &= \int_0^t \lambda(u) du \\
-\log S(t) + \log S(0) &= \int_0^t \lambda(u) du
\end{align*}
$$

Since the survival probability at time $t=0$ is 1, i.e., $S(0)=1$:

$$
\begin{align*}
-\log S(t) + \log 1 &= \int_0^t \lambda(u) du \\
\log S(t) &= - \int_0^t \lambda(u) du
\end{align*}
$$

Taking the exponential of both sides, we get:

$$
S(t) = \exp \bigg( - \int_0^t \lambda(u) du \bigg) \tag{4}
$$
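Equation (4) can be checked by integrating a hazard function numerically and comparing the result with a known survival function. The sketch below assumes a Weibull hazard with hypothetical parameters `SHAPE` and `SCALE` and uses a simple trapezoidal rule; the function names and the number of steps are illustrative choices.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE, SCALE = 1.5, 10.0


def hazard(t):
    """Closed-form hazard of the assumed Weibull distribution."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def survival_from_hazard(t, steps=10_000):
    """Equation (4): exp(-integral of the hazard from 0 to t), trapezoidal rule."""
    if t == 0:
        return 1.0
    du = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))  # end points carry half weight
    for i in range(1, steps):
        total += hazard(i * du)              # interior points carry full weight
    return math.exp(-total * du)


def survival_closed_form(t):
    """Known Weibull survival function, used only as a reference value."""
    return math.exp(-((t / SCALE) ** SHAPE))


for t in (0.0, 5.0, 10.0, 20.0):
    print(f"t={t:5.1f}  from hazard={survival_from_hazard(t):.6f}  "
          f"closed form={survival_closed_form(t):.6f}")
```

The trapezoidal rule is used here only because it needs no external library; any numerical integration routine would serve the same purpose.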

Cumulative Hazard Function

The cumulative hazard function $\Lambda(t)$ at time $t$ is the definite integral of the hazard function $\lambda(t)$ from time 0 to $t$, yielding the following relationship:

$$
\begin{align*}
\Lambda(t) &= \int_0^t \lambda(u) du \\
&= - \log S(t)
\end{align*}
$$

Negating and exponentiating both sides, the survival function $S(t)$ can be expressed in terms of the cumulative hazard, in agreement with equation (4):

$$
S(t) = \exp \big( - \Lambda(t) \big)
$$
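The link between the cumulative hazard and the survival function can be sketched with a piecewise-constant hazard, for which the integral reduces to a sum of rectangle areas. The break points and rates in `PIECES` are hypothetical values chosen for illustration, as are the function names.

```python
import math

# Hypothetical piecewise-constant hazard: (interval start, hazard rate).
# Each rate applies from its start time until the next start time;
# the last rate applies from its start time onward.
PIECES = [(0.0, 0.10), (2.0, 0.30), (5.0, 0.05)]


def cumulative_hazard(t):
    """Lambda(t): area under the piecewise-constant hazard from 0 to t."""
    total = 0.0
    for i, (start, rate) in enumerate(PIECES):
        end = PIECES[i + 1][0] if i + 1 < len(PIECES) else math.inf
        if t <= start:
            break                                 # t lies before this piece
        total += rate * (min(t, end) - start)     # rectangle area within [start, t]
    return total


def survival(t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(t))


for t in (0.0, 2.0, 5.0, 7.0):
    print(f"t={t:4.1f}  Lambda(t)={cumulative_hazard(t):.3f}  S(t)={survival(t):.4f}")
```

Because the hazard is never negative, $\Lambda(t)$ can only grow with $t$, and $S(t)$ can only shrink.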

References

  1. Yasuo Ohashi, Chikuma Hamada, Ryuji Uozumi. 生存時間解析 [Survival Time Analysis]. 2nd ed., University of Tokyo Press, 2022, 320p.
  2. Germán Rodríguez. "7. Survival Models | Generalized Linear Models". Statistics and Population. https://grodri.github.io/glms/notes/c7s1, Retrieved 2022-12-01.
  3. quossy. "生存時間解析にまつわる関数のおさらい" [A review of the functions involved in survival time analysis]. ねこすたっと. 2022-07-06. https://necostat.hatenablog.jp/entry/2022/07/06/080239, Retrieved 2022-12-01.