Survival Function and Hazard Function

Introduction

This section introduces the two most fundamental concepts in survival analysis: the survival function and the hazard function.

Simply put,

The survival function $S(t)$ is the probability that the event has not yet occurred by time $t$, i.e., that it occurs at time $t$ or later
The hazard function $\lambda(t)$ is the instantaneous rate at which the event occurs at time $t$, given that it has not occurred before time $t$

These functions are mathematically related, so knowing one allows us to derive the other. The following section gives the definition of each function and derives the relationships between them.

Definitions

Probability Density Function

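Let $T$ denote the random time at which the event (e.g., death) occurs. $f(t)$ is the probability density of $T$ at time $t$: for a small interval of length $h$, the probability that the event occurs between $t$ and $t+h$ is approximately $f(t)h$. Note that this is different from the hazard function $\lambda(t)$, which conditions on the event not having occurred before $t$.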

Distribution Function

The distribution function $F(t)$ represents the probability that an event occurs before time $t$. If we consider death as the event, the distribution function represents the probability of dying before time $t$.

F(t) = P(T < t)

Survival Function

The survival function $S(t)$ is the complement of the distribution function: the probability that the event has not occurred before time $t$. If we consider death as the event, the survival function represents the probability of surviving until time $t$.

S(t) = P(T \geq t) = 1 - F(t)
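As a minimal sketch of these two definitions, the Python snippet below estimates $F(t)$ and $S(t)$ from a sample of event times by simple counting. The data and the function names (`empirical_distribution`, `empirical_survival`) are made up for illustration, and the sketch assumes every event time is fully observed (no censoring).

```python
def empirical_distribution(times, t):
    """Fraction of event times strictly before t, estimating F(t) = P(T < t)."""
    return sum(1 for x in times if x < t) / len(times)


def empirical_survival(times, t):
    """Fraction of event times at or after t, estimating S(t) = P(T >= t)."""
    return sum(1 for x in times if x >= t) / len(times)


# Hypothetical event times (e.g., months until the event), no censoring.
event_times = [2.0, 3.5, 3.5, 5.0, 7.0, 8.0, 10.0, 12.0]

for t in (0.0, 3.5, 6.0, 13.0):
    F = empirical_distribution(event_times, t)
    S = empirical_survival(event_times, t)
    print(f"t={t}: F={F:.3f}, S={S:.3f}, F+S={F + S:.3f}")
```

At every $t$ the two estimates sum to 1, mirroring $S(t) = 1 - F(t)$.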

Hazard Function

The hazard function $\lambda(t)$ is built from the conditional probability that an event occurs in a small interval between time $t$ and $t+h$, given survival up to time $t$. Dividing this probability by the interval length $h$ and letting $h$ shrink to zero gives the hazard:

\lambda(t) = \lim_{h \rightarrow 0} \frac{P(t \leq T < t + h \ | \ T \geq t)}{h} \tag{1}

When death is the event, the hazard function represents the instantaneous death rate (mortality rate) at time $t$ among people who have survived until time $t$. Similarly, when disease onset is the event, the hazard function represents the incidence rate or morbidity at time $t$. Because of the division by $h$, $\lambda(t)$ is a rate per unit time rather than a probability, so it can exceed 1.

The reason why this is a conditional probability becomes clear when considering mortality at a specific age (e.g., age 40). For the event "death at age 40" to occur, the condition "being alive until age 40" must obviously be met.
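Equation (1) can be illustrated by simulation. The sketch below assumes exponentially distributed event times with a hypothetical rate of 0.5 (a distribution whose hazard is constant and equal to its rate). It estimates the conditional probability among simulated subjects still event-free at $t$ and divides it by a small but finite $h$. The function name `hazard_estimate` and all parameter values are illustrative assumptions.

```python
import random


def hazard_estimate(times, t, h):
    """Finite-h version of equation (1): P(t <= T < t + h | T >= t) / h."""
    at_risk = [x for x in times if x >= t]         # condition: T >= t
    events = sum(1 for x in at_risk if x < t + h)  # event falls in [t, t + h)
    return events / len(at_risk) / h


random.seed(0)  # fixed seed so the run is reproducible
rate = 0.5      # hypothetical constant hazard of the exponential distribution
times = [random.expovariate(rate) for _ in range(200_000)]

for t in (0.5, 1.0, 2.0):
    print(f"t={t}: estimated hazard = {hazard_estimate(times, t, h=0.1):.3f}")
```

With a finite $h$ and a finite sample the estimates only approximate the limit, but they should stay close to the rate 0.5 at every $t$, which is what a constant hazard means.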

The hazard function $\lambda(t)$ can be transformed using the definition of conditional probability:

P(A|B) = \frac{P(A \cap B)}{P(B)} \tag{a}

Apply definition (a) with $A = \{t \leq T < t + h\}$ and $B = \{T \geq t\}$. Since $A$ implies $B$, we have $A \cap B = A$, and the hazard function can be expressed using the probability density function $f(t)$ and the survival function $S(t)$ at time $t$ as follows:

\begin{align*} \lambda(t) &= \lim_{h \rightarrow 0} \frac{P(t \leq T < t + h) }{h} \frac{1}{P(T \geq t)} \\ &= \lim_{h \rightarrow 0} \frac{F(t+h) - F(t)}{h} \frac{1}{S(t)} \\ &= \frac{dF(t)}{dt} \frac{1}{S(t)} \\ &= \frac{f(t)}{S(t)} \tag{2} \end{align*}
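Equation (2) can be checked numerically for a distribution whose density, survival function, and hazard are all known in closed form. The sketch below uses the Weibull distribution as an example. The choice of distribution and the `shape` and `scale` values are assumptions made for illustration, not part of the derivation above.

```python
import math


def weibull_pdf(t, shape, scale):
    """Density f(t) of the Weibull distribution."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Closed-form Weibull hazard, used here only as a reference value."""
    return (shape / scale) * (t / scale) ** (shape - 1)


shape, scale = 1.5, 2.0  # hypothetical parameters

for t in (0.5, 1.0, 4.0):
    ratio = weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)
    reference = weibull_hazard(t, shape, scale)
    print(f"t={t}: f/S = {ratio:.6f}, closed form = {reference:.6f}")
```

The two columns agree up to floating-point error, as equation (2) predicts.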

The hazard function $\lambda(t)$ can also be written in terms of $S(t)$. Since $f(t)$ can be transformed as:

\begin{align*} f(t) &= \frac{dF(t)}{dt} \\ &= \frac{d}{dt}\bigl( 1 - S(t) \bigr) \\ &= -\frac{d}{dt}S(t) \end{align*}

We get:

\begin{align*} \lambda(t) &= \frac{f(t)}{S(t)} \\ &= -\frac{dS(t)}{dt} \frac{1}{S(t)} \\ &= -\frac{S'(t)}{S(t)} \\ &= -\frac{d}{dt} \log {S(t)} \tag{3} \end{align*}
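Equation (3) suggests a practical route when only $S(t)$ is available: differentiate $-\log S(t)$ numerically. The sketch below does this with a central difference for a survival function of the Gompertz form, whose hazard is $a e^{bt}$. The distribution, the parameter values, and the step size `eps` are illustrative assumptions.

```python
import math

A, B = 0.01, 0.1  # hypothetical Gompertz parameters a and b


def survival(t):
    """Gompertz survival function S(t) = exp(-(A / B) * (exp(B * t) - 1))."""
    return math.exp(-(A / B) * (math.exp(B * t) - 1.0))


def hazard_from_survival(S, t, eps=1e-5):
    """Approximate -d/dt log S(t) with a central difference."""
    return -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2.0 * eps)


for t in (1.0, 10.0, 30.0):
    approx = hazard_from_survival(survival, t)
    exact = A * math.exp(B * t)  # closed-form Gompertz hazard
    print(f"t={t}: numerical = {approx:.6f}, exact = {exact:.6f}")
```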

By integrating both sides of equation (3) from time 0 to $t$, we can go the other way and express the survival function $S(t)$ in terms of the hazard function $\lambda(t)$.

\begin{align*} \int_0^t \bigg( -\frac{d}{du} \log {S(u)} \bigg) du &= \int_0^t \lambda(u) du \\ \bigg[- \log S(u) \bigg]_0^t &= \int_0^t \lambda(u) du \\ -\log S(t) + \log S(0) &= \int_0^t \lambda(u) du \end{align*}

Since the survival probability at time $t=0$ is 1, i.e., $S(0)=1$:

\begin{align*} -\log S(t) + \log 1 &= \int_0^t \lambda(u) du \\ \log S(t) &= - \int_0^t \lambda(u) du \end{align*}

Taking the exponential of both sides, we get:

S(t) = \exp \bigg( - \int_0^t \lambda(u) du \bigg) \tag{4}
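Equation (4) also gives a recipe for computing survival probabilities from any hazard function, even when the integral has no closed form. As a sketch, the snippet below integrates a hazard numerically with the trapezoidal rule. The example hazard $\lambda(u) = 2u$ (for which the integral is $t^2$, so $S(t) = e^{-t^2}$) and the function name `survival_from_hazard` are assumptions chosen so that the result can be checked.

```python
import math


def survival_from_hazard(hazard, t, steps=10_000):
    """Evaluate S(t) = exp(-integral_0^t hazard(u) du) with the trapezoidal rule."""
    if t == 0:
        return 1.0  # S(0) = 1
    width = t / steps
    integral = 0.5 * (hazard(0.0) + hazard(t))  # end points get half weight
    for i in range(1, steps):
        integral += hazard(i * width)           # interior points get full weight
    integral *= width
    return math.exp(-integral)


def linear_hazard(u):
    """Hypothetical hazard lambda(u) = 2u, which increases with time."""
    return 2.0 * u


for t in (0.0, 0.5, 1.0, 2.0):
    numerical = survival_from_hazard(linear_hazard, t)
    exact = math.exp(-t * t)
    print(f"t={t}: numerical S = {numerical:.6f}, exact = {exact:.6f}")
```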

Cumulative Hazard Function

The cumulative hazard function $\Lambda(t)$ is the definite integral of the hazard function $\lambda(t)$ from time 0 to $t$. Combined with the derivation of equation (4), this yields the following relationship:

\begin{align*} \Lambda(t) &= \int_0^t \lambda(u) du \\ &= - \log S(t) \end{align*}

Solving for $S(t)$, the survival function can be expressed in terms of the cumulative hazard, which is equation (4) with the integral written as $\Lambda(t)$:

S(t) = \exp \big( - \Lambda(t) \big)
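The cumulative hazard is especially convenient when the hazard is constant within time bands, because the integral reduces to a sum of rate times duration. The sketch below assumes hypothetical band boundaries and rates; the function names are illustrative.

```python
import math


def cumulative_hazard(t, breakpoints, rates):
    """Lambda(t) for a piecewise-constant hazard.

    rates[i] applies on [breakpoints[i], breakpoints[i + 1]);
    the last rate applies from breakpoints[-1] onward.
    """
    total = 0.0
    for i, rate in enumerate(rates):
        start = breakpoints[i]
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)  # rate times time spent in band
    return total


def survival(t, breakpoints, rates):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(t, breakpoints, rates))


# Hypothetical hazard: 0.1 on [0, 1), 0.2 on [1, 3), 0.5 from 3 onward.
breakpoints = [0.0, 1.0, 3.0]
rates = [0.1, 0.2, 0.5]

for t in (0.0, 2.0, 4.0):
    print(f"t={t}: Lambda = {cumulative_hazard(t, breakpoints, rates):.3f}, "
          f"S = {survival(t, breakpoints, rates):.3f}")
```

Because the hazard is non-negative, $\Lambda(t)$ never decreases, so $S(t)$ never increases.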

