Survival Function and Hazard Function
Introduction
This section introduces the two most fundamental elements of survival analysis: the survival function and the hazard function.
Simply put,
- The survival function $S(t)$ is the probability that the event occurs at some point after time $t$
- The hazard function $h(t)$ is the instantaneous rate at which the event occurs at time $t$, given that it has not occurred up to time $t$
These functions are mathematically related, so knowing one allows us to derive the other. The following sections give the definition of each function and derive the relationships between them.
Definitions
Probability Density Function
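Let $T$ denote the time at which the event occurs, treated as a non-negative continuous random variable. The probability density function $f(t)$ represents the probability density of the event (e.g., death) occurring at exactly time $t$:

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$$

Note that this is different from the hazard function $h(t)$.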
Distribution Function
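The distribution function $F(t)$ represents the probability that the event time $T$ falls at or before time $t$, that is, that the event has occurred by time $t$:

$$F(t) = P(T \le t) = \int_0^t f(u)\,du$$

If we consider death as the event, the distribution function represents the probability of dying by time $t$.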
Survival Function
The survival function $S(t)$ is, in contrast to the distribution function, the probability that the event has not occurred by time $t$, that is, that the event time $T$ exceeds $t$:

$$S(t) = P(T > t) = 1 - F(t)$$

If we consider death as the event, the survival function represents the probability of surviving until time $t$.
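As a small numerical illustration of how the density, the distribution function and the survival function fit together, the sketch below assumes a Weibull-distributed event time. The Weibull choice and the parameter values `SHAPE` and `SCALE` are assumptions made only for this example; they do not come from the text. It checks that integrating the density from 0 to $t$ reproduces the distribution function, and that the distribution and survival functions sum to 1.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE = 1.5   # shape parameter k
SCALE = 10.0  # scale parameter, in the same time unit as t


def pdf(t):
    """Probability density f(t) of the assumed Weibull event time."""
    z = t / SCALE
    return (SHAPE / SCALE) * z ** (SHAPE - 1) * math.exp(-(z ** SHAPE))


def cdf(t):
    """Distribution function F(t) = P(T <= t)."""
    return 1.0 - math.exp(-((t / SCALE) ** SHAPE))


def survival(t):
    """Survival function S(t) = P(T > t) = 1 - F(t)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def integrate(func, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width


t = 5.0
area = integrate(pdf, 0.0, t)  # numerical integral of f from 0 to t
print(f"F({t}) = {cdf(t):.6f}, integral of f from 0 to {t} = {area:.6f}")
print(f"S({t}) + F({t}) = {survival(t) + cdf(t):.6f}")
```

The midpoint rule is used here only because it needs nothing beyond the standard library; any numerical integrator would serve the same purpose.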
Hazard Function
The hazard function $h(t)$ can be thought of as the conditional probability that the event occurs in a small interval between time $t$ and $t + \Delta t$, given survival up to time $t$ (that is, $T \ge t$), divided by the length of the interval:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

When death is the event, the hazard function represents the instantaneous rate (death rate, mortality) at which a person who has survived until time $t$ dies in the next instant after $t$. Similarly, when disease onset is the event, the hazard function represents the incidence rate or morbidity at time $t$.
The reason why this is a conditional probability becomes clear when considering mortality at a specific age (e.g., age 40). For the event "death at age 40" to occur, the condition "being alive until age 40" must obviously be met.
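To make the "conditional probability over a small interval" reading concrete, the following sketch computes $P(t \le T < t + \Delta t \mid T \ge t) / \Delta t$ for shrinking $\Delta t$ and compares it with a known closed-form hazard. It assumes a Weibull-distributed event time with hypothetical parameters `SHAPE` and `SCALE`; neither the distribution nor the values come from the text.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE = 1.5
SCALE = 10.0


def survival(t):
    """Survival function S(t) = P(T > t) of the assumed Weibull event time."""
    return math.exp(-((t / SCALE) ** SHAPE))


def conditional_rate(t, dt):
    """P(t <= T < t + dt | T >= t) / dt, computed from the survival function."""
    prob_event_in_interval = survival(t) - survival(t + dt)
    return prob_event_in_interval / survival(t) / dt


def weibull_hazard(t):
    """Closed-form Weibull hazard, used only as a reference value."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


t = 5.0
for dt in (1.0, 0.1, 0.01, 0.001):
    print(f"dt = {dt:<6} conditional rate = {conditional_rate(t, dt):.6f}")
print(f"reference hazard h({t}) = {weibull_hazard(t):.6f}")
```

As the interval shrinks, the conditional rate should approach the reference hazard, which is the limit that defines the hazard function.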
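The hazard function can be transformed using the definition of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{a}$$

Following this definition (a), with $A = \{t \le T < t + \Delta t\}$ and $B = \{T \ge t\}$ (so that $A \cap B = A$), the hazard function can be expressed using the probability density function $f(t)$ and the survival function $S(t)$ at time $t$ as follows:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} \cdot \frac{1}{P(T \ge t)} = \frac{f(t)}{S(t)}$$

The hazard function can also be written in terms of $S(t)$ alone. Since $f(t)$ can be transformed as:

$$f(t) = \frac{dF(t)}{dt} = \frac{d}{dt}\{1 - S(t)\} = -\frac{dS(t)}{dt}$$

We get:

$$h(t) = -\frac{1}{S(t)} \frac{dS(t)}{dt} = -\frac{d}{dt} \log S(t) \tag{3}$$

By integrating equation (3) from time 0 to $t$, we can write it the other way around, expressing the survival function $S(t)$ in terms of the hazard function $h(t)$.

$$\int_0^t h(u)\,du = -\left[\log S(u)\right]_0^t = -\log S(t) + \log S(0)$$

Since the survival probability at time $0$ is 1, i.e., $S(0) = 1$ and therefore $\log S(0) = 0$:

$$\log S(t) = -\int_0^t h(u)\,du$$

Taking the exponential of both sides, we get:

$$S(t) = \exp\left(-\int_0^t h(u)\,du\right) \tag{4}$$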
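The relationships in this derivation can be checked numerically. The sketch below again assumes a Weibull-distributed event time with hypothetical parameters `SHAPE` and `SCALE` (an assumption for illustration only). It computes the hazard in two ways, as $f(t)/S(t)$ and as the negative derivative of $\log S(t)$ (approximated by a central finite difference), and then recovers the survival function as $\exp\left(-\int_0^t h(u)\,du\right)$ using a midpoint-rule integral.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE = 1.5
SCALE = 10.0


def pdf(t):
    """Probability density f(t) of the assumed Weibull event time."""
    z = t / SCALE
    return (SHAPE / SCALE) * z ** (SHAPE - 1) * math.exp(-(z ** SHAPE))


def survival(t):
    """Survival function S(t) of the assumed Weibull event time."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard_from_density(t):
    """Hazard as the ratio h(t) = f(t) / S(t)."""
    return pdf(t) / survival(t)


def hazard_from_log_survival(t, eps=1e-5):
    """Hazard as -d/dt log S(t), via a central finite difference."""
    slope = (math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)
    return -slope


def survival_from_hazard(t, n=10_000):
    """Survival as exp(-integral of h from 0 to t), via the midpoint rule."""
    width = t / n
    integral = sum(hazard_from_density((i + 0.5) * width) for i in range(n)) * width
    return math.exp(-integral)


t = 5.0
print(f"h({t}) from f/S          = {hazard_from_density(t):.6f}")
print(f"h({t}) from -d/dt log S  = {hazard_from_log_survival(t):.6f}")
print(f"S({t}) directly          = {survival(t):.6f}")
print(f"S({t}) from the hazard   = {survival_from_hazard(t):.6f}")
```

The two hazard values and the two survival values should agree up to numerical error, which is what the derivation predicts.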
Cumulative Hazard Function
The cumulative hazard function $H(t)$ at time $t$ is the definite integral of the hazard function from time 0 to $t$, yielding the following relationship:

$$H(t) = \int_0^t h(u)\,du = -\log S(t)$$

Negating and taking the exponential of both sides, the survival function can also be expressed as follows, which also follows directly from equation (4) for the hazard function:

$$S(t) = \exp\left(-H(t)\right)$$
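A short worked example may help here. The sketch below assumes a piecewise-constant hazard, for which the cumulative hazard is a simple sum of rate times duration. The intervals and rates in `PIECES` are hypothetical values chosen for illustration, not figures from the text. The survival probability is then obtained as the exponential of the negative cumulative hazard.

```python
import math

# Hypothetical piecewise-constant hazard: (start, end, rate per unit time).
PIECES = [
    (0.0, 10.0, 0.01),
    (10.0, 20.0, 0.03),
    (20.0, math.inf, 0.08),
]


def cumulative_hazard(t):
    """H(t): integral of the piecewise-constant hazard from 0 to t."""
    total = 0.0
    for start, end, rate in PIECES:
        if t <= start:
            break
        # Add rate times the part of [start, end) that lies before t.
        total += rate * (min(t, end) - start)
    return total


def survival(t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t))


for t in (5.0, 15.0, 25.0):
    print(f"t = {t:>4}: H(t) = {cumulative_hazard(t):.2f}, S(t) = {survival(t):.4f}")
```

For instance, at $t = 25$ the cumulative hazard under these assumed rates is $0.01 \times 10 + 0.03 \times 10 + 0.08 \times 5 = 0.8$, so the survival probability is $\exp(-0.8)$.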
References
- 大橋靖雄, 浜田知久馬, 魚住龍史 (Yasuo Ohashi, Chikuma Hamada, Ryuji Uozumi). 生存時間解析 [Survival Time Analysis]. 2nd ed., University of Tokyo Press, 2022, 320 p.
- Germán Rodríguez. "7. Survival Models | Generalized Linear Models". Statistics and Population. https://grodri.github.io/glms/notes/c7s1, Retrieved 2022-12-01.
- quossy. "生存時間解析にまつわる関数のおさらい" [A review of the functions used in survival time analysis]. ねこすたっと. 2022-07-06. https://necostat.hatenablog.jp/entry/2022/07/06/080239, Retrieved 2022-12-01.