How do I perform univariate and multivariate Cox analysis in Python?
Cox proportional hazards regression (or Cox regression) is a method for investigating the association between subjects' survival time and one or more predictor variables. In Python, the `lifelines` library is commonly used to perform Cox regression. Here's how to run both univariate (single-factor) and multivariate (multi-factor) Cox regression analyses with `lifelines`.
Installation
First, install the `lifelines` package if you haven't already:
```sh
pip install lifelines
```
Univariate Cox Regression
- Import the necessary libraries:
```python
import pandas as pd
from lifelines import CoxPHFitter
```
- Load your dataset:
Your dataset should contain a column for the follow-up duration (`duration`), a column for the event indicator (`event`, 1 if the event was observed, 0 if the subject was censored), and the predictor variables.
```python
# Example dataset
data = {
    'duration': [5, 8, 12, 3, 9, 11],
    'event': [1, 0, 1, 0, 1, 1],
    'age': [45, 34, 50, 40, 60, 55],
    'treatment': [1, 0, 1, 0, 1, 0]
}
df = pd.DataFrame(data)
```
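Before fitting, it can help to sanity-check that the data frame has the layout described above. The helper below, `check_survival_frame`, is a hypothetical name (it is not part of `lifelines`). It assumes events are coded 0/1 as in the example and that every column other than the time and event columns is a numeric predictor; it is a minimal sketch, not an exhaustive validator.

```python
import pandas as pd

# Same toy data as above
df = pd.DataFrame({
    'duration': [5, 8, 12, 3, 9, 11],
    'event': [1, 0, 1, 0, 1, 1],
    'age': [45, 34, 50, 40, 60, 55],
    'treatment': [1, 0, 1, 0, 1, 0],
})

def check_survival_frame(df, duration_col='duration', event_col='event'):
    """Hypothetical helper: basic checks before fitting a Cox model.
    Returns a list of problems found; an empty list means none were found."""
    problems = []
    for col in (duration_col, event_col):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems  # later checks need both columns
    if df.isna().any().any():
        problems.append("data contains missing values")
    if (df[duration_col] < 0).any():
        problems.append("negative durations")
    if not df[event_col].isin([0, 1]).all():
        problems.append("event column is not coded 0/1")
    # Assumption: every column is expected to be numeric
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems

print(check_survival_frame(df))
```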
- Fit the univariate Cox model:
For univariate analysis, fit the model separately for each predictor variable.
```python
cph = CoxPHFitter()

# Example for 'age'
cph.fit(df[['duration', 'event', 'age']], duration_col='duration', event_col='event')
cph.print_summary()  # Display the results

# Example for 'treatment'
cph.fit(df[['duration', 'event', 'treatment']], duration_col='duration', event_col='event')
cph.print_summary()  # Display the results
```
Multivariate Cox Regression
For multivariate analysis, include all predictor variables in a single model.
- Fit the multivariate Cox model:
```python
# Include all predictors in the model
cph.fit(df, duration_col='duration', event_col='event')
cph.print_summary()  # Display the results
```
Detailed Explanation of the Output
The summary output includes:
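- coef: The estimated coefficient (log hazard ratio) for the predictor. A positive coefficient means that as the predictor increases, the hazard increases, so the event tends to occur sooner (shorter survival).
- exp(coef): The hazard ratio, i.e. e raised to the power of coef. For example, exp(coef) = 1.5 means each one-unit increase in the predictor multiplies the hazard by 1.5.
- se(coef): The standard error of the coefficient.
- coef lower/upper 95% and exp(coef) lower/upper 95%: The 95% confidence intervals for the coefficient and for the hazard ratio.
- z: The Wald z-score, i.e. coef divided by se(coef).
- p: The two-sided p-value for the z-test. A small p-value (conventionally < 0.05) indicates that the predictor is statistically significant.
- -log2(p): The p-value re-expressed as bits of information against the null hypothesis (the S-value); larger values mean stronger evidence.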
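The derived columns can be recomputed from `coef` and `se(coef)` alone, which makes the relationships in the list above concrete. The inputs below are hypothetical numbers, not output from the example data; the sketch assumes a Wald test with a two-sided normal p-value and a normal-approximation confidence interval, and `summarize_coef` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def summarize_coef(coef, se, alpha=0.05):
    """Hypothetical helper: rebuild summary columns from a coefficient and its SE."""
    z = coef / se                                  # Wald z-score
    p = math.erfc(abs(z) / math.sqrt(2))           # two-sided normal p-value
    crit = NormalDist().inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    return {
        'exp(coef)': math.exp(coef),               # hazard ratio
        'exp(coef) lower': math.exp(coef - crit * se),
        'exp(coef) upper': math.exp(coef + crit * se),
        'z': z,
        'p': p,
        '-log2(p)': -math.log2(p),                 # S-value in bits
    }

# Hypothetical coefficient and standard error, not from a real fit
row = summarize_coef(coef=0.5, se=0.25)
for name, value in row.items():
    print(f"{name:>16}: {value:.4f}")
```

With coef = 0.5 the hazard ratio is about 1.65, i.e. each one-unit increase in the predictor is associated with roughly a 65% higher hazard, and z = 2 gives p of about 0.046.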
Example Code
Here's a complete example including both univariate and multivariate Cox regression analyses:
```python
import pandas as pd
from lifelines import CoxPHFitter

# Example dataset
data = {
    'duration': [5, 8, 12, 3, 9, 11],
    'event': [1, 0, 1, 0, 1, 1],
    'age': [45, 34, 50, 40, 60, 55],
    'treatment': [1, 0, 1, 0, 1, 0]
}
df = pd.DataFrame(data)

cph = CoxPHFitter()

# Univariate analysis for 'age'
cph.fit(df[['duration', 'event', 'age']], duration_col='duration', event_col='event')
print("Univariate analysis for 'age'")
cph.print_summary()

# Univariate analysis for 'treatment'
cph.fit(df[['duration', 'event', 'treatment']], duration_col='duration', event_col='event')
print("Univariate analysis for 'treatment'")
cph.print_summary()

# Multivariate analysis
cph.fit(df, duration_col='duration', event_col='event')
print("Multivariate analysis")
cph.print_summary()
```
This code will output detailed summaries for both univariate and multivariate Cox regression analyses, allowing you to interpret the effects of each predictor variable on the survival time.