STAT3600 Statistical Analysis Assignment 2

2025-05-19

STAT3600

Assignment 2 (submit Q5, Q7a – Q7i, Q8a – Q8i)

Deadline: Mar 24, 2025

Note: (1) Present numeric values to 4 decimal places. (2) Show the intermediate steps for Q2 to Q7.

1.   A psychiatrist wants to know whether the level of pathology (Y) in psychotic patients 6 months after treatment can be predicted with reasonable accuracy from knowledge of pretreatment symptom ratings of thinking disturbance (X1) and hostile suspiciousness (X2). The data collected on 15 patients are stored in ‘pathology.dat’. Consider a multiple linear regression model with Y as the dependent variable and X1 and X2 as the independent variables.

a.   Write down the regression model. State clearly the assumptions.

b.   Find the least squares estimates of the regression coefficients. Interpret the results.

c.   Construct the ANOVA table and hence test whether there is a regression of Y on X1  and X2 at the 5% level of significance.

d.   Estimate the covariance matrix of the estimates.

e.   Find a 95% confidence interval for each partial regression coefficient.

f.   Test whether β1 = 32 or not at the 5% level of significance.

g.   Test whether β2  = -10 or not at the 5% level of significance.

h.   Calculate R².

i.    Considering a case with x1 = 3 and x2 = 6, find the predicted level of pathology, the 95% confidence interval for the mean response, and the 95% prediction interval.
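
As a rough illustration of parts (b) and (h), the sketch below fits a two-regressor model by solving the normal equations X'Xβ = X'y in plain Python. The data are made-up toy values, not the contents of ‘pathology.dat’, and the helper names `solve` and `fit_ols` are hypothetical.

```python
# Minimal sketch: least squares via the normal equations X'X b = X'y.
# Toy data only (NOT pathology.dat); helper names are illustrative.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_ols(x_rows, y):
    """Return (beta, SSE, R^2) for y regressed on an intercept plus x_rows."""
    X = [[1.0] + [float(v) for v in row] for row in x_rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return beta, sse, 1.0 - sse / sst


# Hypothetical observations (x1, x2) and responses y.
x_toy = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y_toy = [1, 3, 0, 2, 3]
beta, sse, r2 = fit_ols(x_toy, y_toy)
print([round(b, 4) for b in beta], round(sse, 4), round(r2, 4))
```

Solving the normal equations directly is fine for a problem this small; for real data a QR-based routine from a numerical library is usually the safer choice.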

2.   Consider a general linear hypothesis

H0: Cβ = d,

where C is of dimensions r × p with rank r and d is of dimensions r × 1. Prove that

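a.   under the reduced model, the least squares estimator is $\hat\beta_r = \hat\beta - (X'X)^{-1}C'[C(X'X)^{-1}C']^{-1}(C\hat\beta - d)$, where $\hat\beta = (X'X)^{-1}X'y$ is the least squares estimator under the full model;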

b.   the difference SSEr − SSEf can be expressed as $(C\hat\beta - d)'[C(X'X)^{-1}C']^{-1}(C\hat\beta - d)$.
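
One possible route to both parts is a Lagrange-multiplier argument, sketched below. It assumes the hypothesis is H0: Cβ = d, that X has full column rank, and that $\hat\beta = (X'X)^{-1}X'y$ denotes the full-model estimator; the multiplier λ is notation introduced here, not taken from the assignment.

```latex
\begin{align*}
L(\beta,\lambda) &= (y - X\beta)'(y - X\beta) + 2\lambda'(C\beta - d) \\
\frac{\partial L}{\partial \beta} = 0
  &\;\Rightarrow\; X'X\hat\beta_r = X'y - C'\lambda
  \;\Rightarrow\; \hat\beta_r = \hat\beta - (X'X)^{-1}C'\lambda \\
C\hat\beta_r = d
  &\;\Rightarrow\; \lambda = [C(X'X)^{-1}C']^{-1}(C\hat\beta - d) \\
SSE_r - SSE_f
  &= (\hat\beta - \hat\beta_r)'X'X(\hat\beta - \hat\beta_r)
   = (C\hat\beta - d)'[C(X'X)^{-1}C']^{-1}(C\hat\beta - d)
\end{align*}
```

The last line uses $X'(y - X\hat\beta) = 0$, so the cross term vanishes when $y - X\hat\beta_r$ is split into $(y - X\hat\beta) + X(\hat\beta - \hat\beta_r)$.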

3.   Consider a multiple linear regression model

yi = β0 + β1xi1 + … + βpxip + εi

where the xij are constants, the βj are parameters, the εi are iid, and i = 1, …, n. A weighted least squares estimator for βj is obtained by minimizing

$\sum_{i=1}^{n} w_i (y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip})^2$,

where the wi are predefined known constants with $\sum_{i=1}^{n} w_i = 1$. Prove that the estimators of the regression coefficients are given as

$\hat\beta_W = (X'WX)^{-1}X'Wy$

and the variance-covariance matrix of the estimator is

$\sigma^2 (X'WX)^{-1}X'W^2X(X'WX)^{-1}$,

where σ² = Var(εi) and W is a diagonal matrix of w1, …, wn.
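
A sketch of the matrix argument, assuming the εi are iid with common variance σ² (so Var(y) = σ²I) and that X'WX is invertible. The symbols S(β), $\hat\beta_W$ and A are introduced here for illustration.

```latex
\begin{align*}
S(\beta) &= \sum_{i=1}^{n} w_i\,(y_i - x_i'\beta)^2 = (y - X\beta)'W(y - X\beta) \\
\frac{\partial S}{\partial \beta} = -2X'W(y - X\beta) = 0
  &\;\Rightarrow\; X'WX\,\hat\beta_W = X'Wy
  \;\Rightarrow\; \hat\beta_W = (X'WX)^{-1}X'Wy \\
\operatorname{Var}(\hat\beta_W) &= A\,\operatorname{Var}(y)\,A',
  \qquad A = (X'WX)^{-1}X'W \\
 &= \sigma^2 (X'WX)^{-1}X'W^2X\,(X'WX)^{-1}
\end{align*}
```

If the model instead assumed Var(εi) = σ²/wi, the same steps would collapse to σ²(X'WX)^{-1}; that is a different assumption from the iid one stated in the question.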

4.   In the multiple linear regression model, show that $\sum_{i=1}^{n} \operatorname{Var}(\hat y_i) = (p+1)\sigma^2$, where $\hat y_i$ is the fitted mean response of the ith observation, and p is the number of regressor variables.
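
Assuming the identity to be shown is the usual $\sum_i \operatorname{Var}(\hat y_i) = (p+1)\sigma^2$ for a model with an intercept and p regressors, one way to see it is through the trace of the hat matrix. H is notation introduced here.

```latex
\begin{align*}
\hat y &= Hy, \qquad H = X(X'X)^{-1}X', \qquad H' = H, \quad H^2 = H \\
\operatorname{Var}(\hat y) &= H\,(\sigma^2 I)\,H' = \sigma^2 H \\
\sum_{i=1}^{n} \operatorname{Var}(\hat y_i) &= \sigma^2 \operatorname{tr}(H)
  = \sigma^2 \operatorname{tr}\!\big((X'X)^{-1}X'X\big)
  = \sigma^2 \operatorname{tr}(I_{p+1}) = (p+1)\sigma^2
\end{align*}
```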

5.   Show that the multiple coefficient of determination in a multiple regression model is the square of the coefficient of correlation between yi and ŷi.
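
A sketch of the key steps, assuming the model contains an intercept so that the residuals $e_i = y_i - \hat y_i$ satisfy $\sum_i e_i = 0$ and $\sum_i e_i \hat y_i = 0$ (hence the mean of the fitted values equals $\bar y$). SST and SSR denote the total and regression sums of squares.

```latex
\begin{align*}
\sum_{i}(y_i - \bar y)(\hat y_i - \bar y)
  &= \sum_{i}(\hat y_i - \bar y)^2 + \sum_{i} e_i(\hat y_i - \bar y)
   = SSR + 0 \\
r_{y\hat y}^2
  &= \frac{\big[\sum_{i}(y_i - \bar y)(\hat y_i - \bar y)\big]^2}
          {\sum_{i}(y_i - \bar y)^2 \,\sum_{i}(\hat y_i - \bar y)^2}
   = \frac{SSR^2}{SST \cdot SSR}
   = \frac{SSR}{SST} = R^2
\end{align*}
```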

6.   Consider the model

7.    You are given the following matrices computed for a regression analysis y = β0 + β1X1 + β2X2 + ε.

The matrices are properly ordered according to the regression function given above.

a.   Calculate the LSE of the regression coefficients. Describe the effects of the regressors on the response variable quantitatively.

b.   Calculate SSE and MSE.

c.   Estimate the covariance matrix of the estimates.

d.   Calculate the standard error of the estimates.

e.   Calculate a 95% confidence interval for each of β1 and β2.

f.   Construct an ANOVA table. Test whether there is a regression of Y on X1  and X2  at the 5% level of significance.

g.   Calculate R². Comment on the goodness of fit of the model.

h.   Test at the 5% level of significance whether β1 − β2 = 0 and β1 + β2 = 2, simultaneously.

i.    Estimate the means of Y for two cases where = (−1, 1). Construct a 90% simultaneous interval based on (i) Bonferroni’s method and (ii) Scheffé’s method.

j.    Calculate the maximum log-likelihood.

k.   Calculate the adjusted R².

l.    Test at the 5% level of significance whether each of X1  and X2  is effective, respectively.

m.  Test at the 5% level of significance whether β1 − β2 = 0.

n.   Estimate the mean of Y when (X1, X2) = (2, −1). Construct a 95% confidence interval for the estimate.
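
Parts (a) to (d) only need the summary matrices X'X, X'y and y'y, plus the sample size. The sketch below shows the arithmetic in plain Python. The matrices are hypothetical toy values (the ones given in the assignment are not reproduced here), and `invert` and `summarize` are illustrative helper names.

```python
# Minimal sketch: regression quantities from summary matrices.
# The matrices below are hypothetical, NOT the ones given in the assignment.

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def summarize(xtx, xty, yty, n):
    """Return (beta, SSE, MSE, covariance matrix, standard errors)."""
    inv = invert(xtx)
    beta = [sum(a * b for a, b in zip(row, xty)) for row in inv]  # (X'X)^-1 X'y
    sse = yty - sum(b * v for b, v in zip(beta, xty))             # y'y - b'X'y
    mse = sse / (n - len(beta))                                   # n - (p + 1) df
    cov = [[mse * v for v in row] for row in inv]                 # MSE (X'X)^-1
    se = [cov[j][j] ** 0.5 for j in range(len(beta))]
    return beta, sse, mse, cov, se


xtx = [[5, 4, 5], [4, 6, 7], [5, 7, 11]]   # hypothetical X'X (intercept, X1, X2)
xty = [9, 11, 11]                          # hypothetical X'y
yty = 23                                   # hypothetical y'y
beta, sse, mse, cov, se = summarize(xtx, xty, yty, n=5)
print([round(b, 4) for b in beta], round(sse, 4), round(mse, 4))
print([round(s, 4) for s in se])
```

The confidence intervals in part (e) then follow as each estimate plus or minus the appropriate t quantile (with n − 3 degrees of freedom) times its standard error.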

8.    The aim of the study was to examine the association between cardiorespiratory parameters and match running performance (MRP) in highly trained football players. The data are stored in ‘MRP’ and the variables are given as follows.

Variable    Description
total       Total distance (m)
HRmax       Maximum heart rate (bpm)
HRAT        Heart rate at the anaerobic threshold (bpm)
HR1         Heart rate at the first minute of recovery (bpm)
HR2         Heart rate at the second minute of recovery (bpm)

a.   Formulate a multiple linear regression model for the dataset, using ‘total’ as the response and the remaining variables as regressors.

b.   Calculate the LSEs of the regression coefficients and their respective standard errors.

c.   Test the significance of each regression coefficient at the 5% level of significance.

d.   Construct an ANOVA table and test whether there is a regression of ‘total’ on the regressors at the 5% level of significance.

e.   Calculate the R² statistic for the model. Do you think the model is adequate to explain the variation of total distance among the subjects under study?

f.   Describe the effects of the significant regressors on the total distance quantitatively.

g.   Test at the 5% level of significance whether both the coefficients of HRmax and HRAT are zero (m/bpm).

h.   Predict the value of total distance for an individual with the following values. Construct a 95% prediction interval.

HRmax    HRAT    HR1    HR2
180      160     170    110

i.   Estimate the means of total distance for the subjects with the following values. Construct simultaneous confidence intervals with at least 95% joint confidence using (A) Bonferroni’s method and (B) Scheffé’s method.

HRmax    HRAT    HR1    HR2
180      160     170    110
195      170     185    130

j.    Construct 90% confidence intervals for the regression coefficients.

k.   Test at the 5% level of significance whether the coefficient of HR1 is 50 (m/bpm).

l.    Test at the 5% level of significance whether the coefficients of HRmax and HR2 are the same.

m.  Estimate the mean of total distance for the subjects with the following values. Construct a 95% confidence interval.

HRmax    HRAT    HR1    HR2
180      160     170    110
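
Parts (h) and (m) differ only in the standard error: the mean response uses MSE · x0'(X'X)⁻¹x0, while a new observation adds one extra MSE. The sketch below shows that difference in plain Python. All inputs are hypothetical toy values for a two-regressor fit with n = 5 (not results from the ‘MRP’ data), the function name is illustrative, and the t quantile is passed in as read from a t table.

```python
import math

# Minimal sketch: confidence interval for the mean response versus
# prediction interval for a new observation. Inputs are hypothetical.

def mean_and_prediction_intervals(beta, xtx_inv, mse, x0, t_crit):
    """Return (y_hat, CI for the mean response, prediction interval)."""
    x = [1.0] + [float(v) for v in x0]          # prepend the intercept term
    y_hat = sum(b * v for b, v in zip(beta, x))
    k = len(x)
    # Quadratic form x0' (X'X)^-1 x0.
    q = sum(x[i] * xtx_inv[i][j] * x[j] for i in range(k) for j in range(k))
    se_mean = math.sqrt(mse * q)                # s.e. of the estimated mean
    se_pred = math.sqrt(mse * (1.0 + q))        # s.e. for a new observation
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi


# Hypothetical fitted quantities (intercept, X1, X2), n = 5, so 2 error df.
beta = [32 / 39, 84 / 39, -29 / 39]
xtx_inv = [[17 / 39, -9 / 39, -2 / 39],
           [-9 / 39, 30 / 39, -15 / 39],
           [-2 / 39, -15 / 39, 14 / 39]]
mse = 2 / 39
t_crit = 4.3027   # upper 2.5% point of t with 2 df, taken from a t table

y_hat, ci, pi = mean_and_prediction_intervals(beta, xtx_inv, mse, (1, 1), t_crit)
print(round(y_hat, 4), [round(v, 4) for v in ci], [round(v, 4) for v in pi])
```

For the four-regressor model in this question the same function would apply with a 5 × 5 inverse and n − 5 error degrees of freedom; the prediction interval is always wider than the confidence interval at the same point.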