MATH2775 Survival Analysis
Homework 3
Deadline and Submission
Submission Deadline: 14:00 Friday 9th May 2025
Submit: You will need to submit via GradeScope on Minerva
Homework: Each homework (there are 3 in total) is worth 5% of your final module mark
Presentation: You need to submit your homework as a single PDF file. You are strongly encouraged to use RMarkdown and LaTeX for editing. Handwritten and scanned homework is also fine, but it has to be clear, tidy, and clearly scanned. Do not write in very small handwriting if you submit scanned work. Be aware that visual clarity can be reduced when you upload a scanned file to GradeScope.
Marking Information
Each exercise part (e.g. 2a) will be marked out of 3, with the following criteria:
• 3 marks for an almost perfect answer.
• 2 marks if there are small flaws, serious enough that you should look at the solutions.
• 1 mark if there are major flaws, serious enough that you really need to look back at the lecture notes.
• 0 marks for an overall poor attempt.
The simplicity of this marking scheme has two advantages: you will immediately know how to react to your marks (e.g. if you get 1 mark then you need to look at the relevant notes), and it makes the marking faster so that you will receive your feedback sooner. The total marks available from the exercises on each homework will be 24. An additional 1 mark will be awarded if your work is nicely presented.
For any exercises that require the use of R, please provide both your code and the output in your answer. If your submission is scanned handwritten work, I suggest that for questions involving R you cut-and-paste commented/annotated code and any output (including plots) into, e.g., Word. You should then add some short text to explain/describe the R code/output. You can then print that off and attach it to your handwritten work.
Downloading data
You will also need to download the two datasets accompanying this homework:
• prostate.Rdata - when you load this into R you will create a dataframe called prostate
• phone.Rdata - when you load this into R you will create a dataframe called phone
You will need to save these two .Rdata files in the working directory of R so that it can find them. If you are unsure how to do this, please carefully read (and run) the KMAnalysis.R example on Minerva under “R Code and Datasets → Kaplan Meier”, which explains how this is done.
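The loading step can be sketched as follows. This is a minimal sketch, assuming prostate.Rdata has been saved in the current working directory; the variable names rdata_file and loaded_objects are introduced here purely for illustration, and the KMAnalysis.R example remains the reference.

```r
# Where is R currently looking for files?
print(getwd())

# Hypothetical variable name; the file name comes from the homework sheet
rdata_file <- "prostate.Rdata"

if (file.exists(rdata_file)) {
  # load() returns (invisibly) the names of the objects it creates
  loaded_objects <- load(rdata_file)
  print(loaded_objects)
  str(prostate)   # quick look at the variables and their types
} else {
  message("Cannot find ", rdata_file, " in ", getwd(),
          ". Move the file there, or change directory with setwd().")
}
```

The same pattern applies to phone.Rdata.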
HOMEWORK 3
Exercise 1
Which of the following classes of distributions have the proportional hazards property?
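• Gompertz: $h(t) = \lambda e^{\theta t}$ with parameters $\lambda > 0$ and $\theta \in \mathbb{R}$.
• Makeham: $h(t) = \gamma + \lambda e^{-\theta t}$ with parameters $\gamma > 0$, $\lambda \geq -\gamma$ and $\theta \geq 0$.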
• Bathtub: $h(t) = \alpha t + \dfrac{\beta}{1 + \gamma t}$ with parameters $\gamma \geq 0$, $\alpha \geq 0$ and $\beta > 0$.
Explain your answers as clearly as possible (no marks will be given for answers without explanation).
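As a reminder of the kind of check involved, one common way of stating the property is that the ratio of the hazards of any two members of the family does not depend on $t$. This wording is an assumption about the definition in use; the definition in the lecture notes takes precedence. The sketch below applies the check to the Weibull family, which is not one of the families above; the symbols $h_1$, $h_2$, $c$, $k$ and $\lambda_i$ are introduced here for illustration only.

```latex
% Proportional hazards: for any two members h_1, h_2 of the family,
\frac{h_1(t)}{h_2(t)} = c \quad \text{for all } t > 0,
\text{ where } c \text{ does not depend on } t.

% Weibull hazards with a common, fixed shape k > 0 and scales \lambda_1, \lambda_2 > 0:
h_i(t) = \lambda_i k t^{k-1}
\;\Longrightarrow\;
\frac{h_1(t)}{h_2(t)} = \frac{\lambda_1}{\lambda_2}.

% With different shapes k_1 \neq k_2 the ratio is
\frac{h_1(t)}{h_2(t)} = \frac{\lambda_1 k_1}{\lambda_2 k_2}\, t^{k_1 - k_2},
% which varies with t, so the property fails once the shape is allowed to vary.
```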
Exercise 2
The data in the file prostate.Rdata have been adapted from Andrews and Herzberg (1985) and give results of a trial on treatments for prostate cancer. Various covariates were recorded. The variables in the data file are given below:
| Variable | Description | Levels |
|---|---|---|
| time | Time to Death/Censoring | |
| censor | Indicator variable | 0 = Right-censored; 1 = Death |
| rx | Treatment Groups | Placebo; 0.2 mg Estrogen; 1.0 mg Estrogen; 5.0 mg Estrogen |
| stage | Stage of Disease | 3 = No evidence of distant metastasis; 4 = Evidence of distant metastasis |
| age | Age of Patient | |
| weight | Weight Index | Weight (kg) - Height (cm) + 200 |

Table 1: Variables measured on prostate cancer patients in prostate.Rdata, adapted from Andrews and Herzberg (1985)
a) Using R, show (on the same plot) the KM estimate for each treatment group (Lectures 7 and 8). Comment on what you see.
b) Using R, fit a Cox proportional hazards model (Lectures 14 and 15) including all the variables (so you take account of all the variables measured on the subjects).
c) Comment on the effect of each of the variables, and on whether treatment makes a significant difference to survival.
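The R calls typically involved in parts a) and b) can be sketched on a different dataset. This is a minimal sketch, assuming the survival package is installed; it uses the veteran data shipped with that package as a stand-in (an assumption, not the homework data), so variable names such as trt, celltype and karno will not match those in prostate.

```r
library(survival)

# `veteran` ships with the survival package; it stands in for the homework data.
# status is 1 for an observed death and 0 for a right-censored time.
km_fit <- survfit(Surv(time, status) ~ trt, data = veteran)

# One KM curve per treatment arm, drawn on the same axes
plot(km_fit, col = c("black", "red"), lty = 1:2,
     xlab = "Time (days)", ylab = "Estimated survival probability")
legend("topright", legend = c("trt = 1", "trt = 2"),
       col = c("black", "red"), lty = 1:2)

# Cox PH model adjusting for further covariates. factor() makes sure a
# numerically coded treatment is handled as a categorical variable.
cox_fit <- coxph(Surv(time, status) ~ factor(trt) + celltype + karno + age,
                 data = veteran)

# The exp(coef) column gives estimated hazard ratios; the Pr(>|z|) column
# gives Wald test p-values for each coefficient.
print(summary(cox_fit))
```

For a categorical variable with more than two levels, each coefficient compares one level with the reference level, which is worth bearing in mind when interpreting the summary.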
Exercise 3
A consumer review group is testing the lifetimes of two types of mobile phone. The data can be found in phone.Rdata. The variables are:
| Variable | Description | Levels |
|---|---|---|
| time | Time to Failure/Censoring | |
| status | Indicator variable | 0 = Right-censored; 1 = Failure |
| group | Type of Phone | Group A; Group B |

Table 2: Mobile phone testing in phone.Rdata
a) Using R, plot KM estimates for each group (on the same plot). By considering a suitable transformation of the KM estimates (see Lectures 9 and 10), provide a visual assessment of whether it is reasonable to model the lifetimes in each group by Exponential random variables $\mathrm{Exp}(\lambda_A)$ and $\mathrm{Exp}(\lambda_B)$ respectively.
b) Using R, under the assumption that it is reasonable to model the lifetimes in each group by Exponential distributions, provide the maximum likelihood estimates (MLEs) for the rates $\lambda_A$ and $\lambda_B$, and consequently estimate the expected phone lifetimes $\mu_A$ and $\mu_B$ for each group.
Note: You should do this in R (rather than by hand), for example by:
- Splitting the data frame into two, for example GroupA_phone and GroupB_phone.
- Using the methods of Chapter 4.3 (Lectures 9 and 10), together with the sum() function, to find the MLE for each group.
c) Using R, perform a log-rank test (Lectures 7 and 8) to see if there is a statistically significant difference between the two groups.
d) Using R, fit an AFT model using an exponential distribution:
> library(survival)
> load("phone.Rdata")
> # NOTE: You will probably have already loaded the data to answer the parts above
>
> # Now fit an AFT model
> phone_AFT_fit <- survreg(Surv(time, status) ~ group,
+                          data = phone,
+                          dist = "exponential")
> summary(phone_AFT_fit)
Does this parametric test suggest there is a difference in lifetimes between the groups? Compare the estimates obtained for $\lambda_A$ and $\lambda_B$ by the AFT model with your answers for part b).
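The mechanics of parts a) to d) can be sketched on simulated data. This is a minimal sketch, assuming the survival package is installed; the data frame sim, the rates in true_rate and the censoring time cens_time are all hypothetical and chosen for illustration, not taken from phone.Rdata. It relies on two standard facts: if lifetimes are Exponential with rate $\lambda$ then $-\log S(t) = \lambda t$ is a straight line through the origin, and survreg() models log(time), so for the exponential distribution the fitted rate is exp(-linear predictor).

```r
library(survival)

# Simulated two-group lifetimes with administrative censoring (all values hypothetical)
set.seed(1)
n <- 100
true_rate <- c(A = 0.10, B = 0.05)
group <- rep(c("A", "B"), each = n)
life <- rexp(2 * n, rate = true_rate[group])
cens_time <- 25
sim <- data.frame(time   = pmin(life, cens_time),
                  status = as.integer(life <= cens_time),  # 1 = failure, 0 = censored
                  group  = factor(group))

# KM estimates per group, then the same curves transformed to -log S(t):
# roughly straight lines through the origin support an Exponential model
km <- survfit(Surv(time, status) ~ group, data = sim)
plot(km, col = 1:2, xlab = "Time", ylab = "KM estimate")
plot(km, fun = function(s) -log(s), col = 1:2,
     xlab = "Time", ylab = "-log(KM estimate)")

# Exponential MLE with right censoring: number of failures / total observed time
rate_hat <- sapply(split(sim, sim$group),
                   function(d) sum(d$status) / sum(d$time))
mean_hat <- 1 / rate_hat   # estimated expected lifetimes

# Log-rank test for a difference between the two groups
print(survdiff(Surv(time, status) ~ group, data = sim))

# Exponential AFT fit, with rates recovered from the coefficients
aft <- survreg(Surv(time, status) ~ group, data = sim, dist = "exponential")
b <- unname(coef(aft))
aft_rate <- c(A = exp(-b[1]), B = exp(-(b[1] + b[2])))

# Side-by-side comparison of the two sets of rate estimates
print(rbind(direct_mle = rate_hat, from_aft = aft_rate))
```

In the AFT summary, the p-value attached to the group coefficient plays the role of the parametric test for a difference between the groups.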