QBUS5001 Foundation in Data Analytics for Business (2025, S1)
Group Assignment
Due date: 11:59pm, Saturday, 24 May 2025
· This group assignment must be completed by a group of 3-4 students, with one designated group leader.
· A total of 60 marks is available for this assignment.
· This assignment consists of 2 tasks, and will be graded based on the accuracy of the results and interpretation, as well as the quality of the report.
· Page limit is 20, including any tables, graphs, and appendices. References are not required.
Task 1 (Random Number Generation from the Exponential Distribution and Verification of the Central Limit Theorem): 15%
The PDF and CDF of the exponential Exp(λ) distribution are given by
$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0,$
and
$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0,$
respectively. We aim to simulate a random sample from the exponential distribution with parameter λ, but this cannot be done directly using Excel’s Analysis ToolPak. However, the following theorem provides a solution:
Theorem: For any continuous distribution with CDF F, the CDF evaluated at the random variable X is itself a random variable that follows a uniform U(0,1) distribution. That is, if Y = F(X), then Y ~ U(0,1).
Therefore, by simulating a value y* from the uniform U(0,1) distribution and setting F(x*) = y*, we can generate a random value x* from this continuous distribution. For the exponential Exp(λ) distribution, we can solve the equation $1 - e^{-\lambda x^*} = y^*$ to obtain $x^* = -\ln(1 - y^*)/\lambda$. In Excel, the Analysis ToolPak can be used to simulate a random sample from the uniform distribution.
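For groups that want to check the inverse-CDF idea outside Excel, the following is a minimal Python sketch. It assumes the standard exponential CDF F(x) = 1 − exp(−λx), so that x* = −ln(1 − y*)/λ. The function name, the seed, and the use of Python's `random` module are illustrative choices, not part of the assignment.

```python
import math
import random


def exponential_from_uniform(y, lam):
    """Invert the Exp(lam) CDF: solve 1 - exp(-lam * x) = y for x."""
    return -math.log(1.0 - y) / lam


rng = random.Random(5001)  # arbitrary seed, used only for reproducibility
lam = 0.2

# rng.random() returns a value in [0, 1), so 1 - y is never zero.
sample = [exponential_from_uniform(rng.random(), lam) for _ in range(100)]
sample_mean = sum(sample) / len(sample)

print(f"sample mean = {sample_mean:.3f}, theoretical mean 1/lambda = {1 / lam:.1f}")
```

In a spreadsheet the same transformation is a single formula applied to each simulated uniform value.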
Task instructions:
1. Simulate a random sample of size 100 from the exponential Exp(0.2) distribution.
2. Calculate the sample mean for this sample.
3. Repeat the simulation independently 1,000 times in total.
4. Verify the CLT by showing that the distribution of the sample mean can be approximated by a normal distribution.
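The four steps above can be sketched in Python as a cross-check on the Excel work. This sketch assumes the inverse-CDF formula x* = −ln(1 − y*)/λ and draws y* from U(0,1); the constant names and the seed are illustrative. Under the CLT the sample means should centre on 1/λ = 5 with standard error (1/λ)/√100 = 0.5, and about 95% of them should lie within 1.96 standard errors of 5.

```python
import math
import random
import statistics

LAM, N, REPS = 0.2, 100, 1000  # rate, sample size, number of repetitions


def sample_mean_once(rng):
    """Simulate one sample of size N from Exp(LAM) and return its mean."""
    draws = [-math.log(1.0 - rng.random()) / LAM for _ in range(N)]
    return sum(draws) / N


rng = random.Random(2025)  # arbitrary seed
means = [sample_mean_once(rng) for _ in range(REPS)]

empirical_mean = statistics.fmean(means)
empirical_se = statistics.stdev(means)   # spread of the 1,000 sample means
theoretical_mean = 1 / LAM               # mean of Exp(LAM)
theoretical_se = (1 / LAM) / math.sqrt(N)  # sd of Exp(LAM) is 1/LAM

# Share of sample means within 1.96 standard errors of the theoretical mean.
within = sum(abs(m - theoretical_mean) <= 1.96 * theoretical_se for m in means) / REPS

print(f"mean of sample means: {empirical_mean:.3f} (theory {theoretical_mean:.3f})")
print(f"sd of sample means:   {empirical_se:.3f} (theory {theoretical_se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```

A histogram of `means` gives the visual counterpart of these numerical comparisons.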
Warning: Excel can be slow in generating random numbers. You have several options to handle this:
1. To avoid generating extremely small or large values, I recommend using the uniform U(0.01,0.99) distribution for the simulation.
2. Use other statistical software to generate the data and import them into Excel.
3. Simulate the values in multiple batches in Excel.
4. Using the “rand()” function in Excel to simulate a random value from the U(0,1) distribution is not recommended for this assignment.
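As one possible way to combine options 1 to 3, the sketch below generates the data in Python in batches, using U(0.01, 0.99), and writes them as CSV text that Excel can open. It assumes the inverse-CDF formula x* = −ln(1 − y*)/λ; the batch size, the function names, and the file name in the comment are hypothetical.

```python
import csv
import io
import math
import random

LAM = 0.2
LOW, HIGH = 0.01, 0.99  # recommended uniform range from option 1


def simulate_batch(rng, n_samples, sample_size):
    """Return n_samples rows, each holding sample_size exponential draws."""
    return [
        [-math.log(1.0 - rng.uniform(LOW, HIGH)) / LAM for _ in range(sample_size)]
        for _ in range(n_samples)
    ]


def write_rows(handle, rows):
    """Write each sample as one CSV row with six decimal places."""
    writer = csv.writer(handle)
    for row in rows:
        writer.writerow([f"{x:.6f}" for x in row])


rng = random.Random(42)  # arbitrary seed
# In-memory buffer for illustration; to save a file, replace it with e.g.
# open("task1_samples.csv", "w", newline="")  (hypothetical file name).
buffer = io.StringIO()
for _ in range(4):  # 4 batches of 250 samples = 1,000 samples in total
    write_rows(buffer, simulate_batch(rng, 250, 100))
csv_text = buffer.getvalue()

print(csv_text.splitlines()[0][:60], "...")
```

With λ = 0.2, restricting y* to (0.01, 0.99) keeps every simulated value between roughly 0.05 and 23.03.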
Task 2 (Waiting Time and Idle Time): 45%
A small local bank opens for business from 9am to 4pm, with only one staff member available to serve customers upon their arrival. Suppose that customers arrive after the bank opens, and their arrival time in minutes follows a “truncated” exponential Exp(λ1 = 0.1) distribution. They will be served immediately if the staff member is idle, or they will join a queue and wait to be served if the staff member is busy. Suppose that the service time in minutes follows another “truncated” exponential Exp(λ2 = 0.125) distribution, and it is independent of the arrival time. All customers who arrive before 4pm will be served.
To generate a random value x* from a truncated exponential distribution, we simulate y* from a uniform U(a, b) distribution and then set $x^* = -\ln(1 - y^*)/\lambda$. In this task, set a = 0.25 and b = 0.75.
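A short Python sketch of the truncated draw follows. It assumes the same inverse-CDF formula as in Task 1, x* = −ln(1 − y*)/λ, applied to y* from U(0.25, 0.75). The function names and the seed are illustrative.

```python
import math
import random

A, B = 0.25, 0.75  # truncation limits for the uniform draw


def truncated_exponential(rng, lam, a=A, b=B):
    """Draw y* from U(a, b) and map it through the inverse Exp(lam) CDF."""
    y = rng.uniform(a, b)
    return -math.log(1.0 - y) / lam


def value_range(lam, a=A, b=B):
    """Smallest and largest values the truncated draw can produce."""
    return (-math.log(1.0 - a) / lam, -math.log(1.0 - b) / lam)


rng = random.Random(7)  # arbitrary seed
arrival_draw = truncated_exponential(rng, 0.1)    # lambda_1 = 0.1
service_draw = truncated_exponential(rng, 0.125)  # lambda_2 = 0.125

print(value_range(0.1))    # roughly 2.88 and 13.86 minutes
print(value_range(0.125))  # roughly 2.30 and 11.09 minutes
```

The truncation therefore rules out both very short and very long times, which keeps the simulated day free of extreme values.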
The data for statistical analysis include (1) customer waiting times and (2) staff idle times. These should be recorded over a period of 5 working days.
Task 2A: Simulate the customer arrival times and their service times (between 9am and 4pm) for a full week, covering 5 business days. The simulation for Day 1 (Monday) is already completed in the sample Excel file. Record the customer waiting times and staff idle times for analysis.
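The bookkeeping for Task 2A can be sketched in Python as a single-server queue. This is a sketch under stated assumptions, not the layout of the sample Excel file: it treats each arrival draw as the gap since the previous arrival, counts the time from 9am to the first arrival as idle time, and uses the assumed formula x* = −ln(1 − y*)/λ. All function and variable names are hypothetical.

```python
import math
import random

DAY_MINUTES = 7 * 60  # 9am to 4pm, measured in minutes after opening


def draw(rng, lam, a=0.25, b=0.75):
    """Truncated exponential draw via the inverse CDF (assumed form)."""
    return -math.log(1.0 - rng.uniform(a, b)) / lam


def simulate_day(rng, lam_arrival=0.1, lam_service=0.125):
    """Return (waiting_times, idle_times), one entry per customer."""
    waits, idles = [], []
    arrival = 0.0        # arrival clock, minutes after 9am
    staff_free_at = 0.0  # time the staff member finishes the previous customer
    while True:
        arrival += draw(rng, lam_arrival)  # ASSUMPTION: draws are inter-arrival gaps
        if arrival >= DAY_MINUTES:         # only arrivals before 4pm are served
            break
        start = max(arrival, staff_free_at)
        waits.append(start - arrival)                    # 0 if served immediately
        idles.append(max(0.0, arrival - staff_free_at))  # 0 if staff was busy
        staff_free_at = start + draw(rng, lam_service)
    return waits, idles


rng = random.Random(1)  # arbitrary seed
all_waits, all_idles = [], []
for day in range(5):  # Monday to Friday
    waits, idles = simulate_day(rng)
    all_waits.extend(waits)
    all_idles.extend(idles)

# Zero values are excluded before the statistical analysis.
positive_waits = [w for w in all_waits if w > 0]
positive_idles = [t for t in all_idles if t > 0]
print(len(all_waits), len(positive_waits), len(positive_idles))
```

For each customer at most one of the two recorded values is positive: either the customer waited for the staff member, or the staff member was idle before the customer arrived.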
Task 2B: Present the customer waiting times and staff idle times using histograms and descriptive statistics. Describe the distributions of these times.
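For Task 2B, a small Python sketch of the descriptive statistics and histogram counts is given here as a cross-check on Excel output. The eight values are made-up numbers for illustration only, not simulation results. The skewness line uses the adjusted sample skewness formula n/((n−1)(n−2)) Σ((x − mean)/s)³, and the bin width is an arbitrary choice.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for a list of times in minutes."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "skewness": skew,
    }


def histogram_counts(values, bin_width):
    """Count values per bin, keyed by the lower edge of each bin."""
    counts = {}
    for v in values:
        lower = int(v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Made-up illustrative values, NOT output from the assignment's simulation.
example_times = [0.8, 1.5, 2.2, 2.9, 3.1, 4.7, 6.0, 9.4]

summary = describe(example_times)
counts = histogram_counts(example_times, bin_width=2)

for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
for lower, count in counts.items():
    print(f"[{lower}, {lower + 2}): {'#' * count}")
```

A positive skewness value indicates a longer right tail, which is one of the features worth describing in words alongside the histogram.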
Task 2C: Construct a 95% confidence interval for the mean customer waiting time. Test, at the 5% level of significance, whether the mean waiting time is longer than a specific value (in minutes) of your choice.
Task 2D: Construct a 95% confidence interval for the mean staff idle time. Test, at the 5% level of significance, whether the mean idle time is shorter than a specified value (in minutes) of your choice.
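The calculations in Tasks 2C and 2D can be sketched in Python as follows. Important assumption: this sketch uses the standard normal distribution for the critical value and p-value, which is only a large-sample approximation; a t-based interval and test with n − 1 degrees of freedom, as typically produced in Excel, will differ slightly. The data are placeholder values, and the hypothesised values 4.5 and 5.5 are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def mean_ci(values, level=0.95):
    """Large-sample confidence interval for the mean (normal approximation)."""
    se = stdev(values) / math.sqrt(len(values))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = fmean(values)
    return m - z * se, m + z * se


def one_sided_test(values, mu0, alternative):
    """Test H0: mean = mu0 against H1: mean > mu0 ('greater') or mean < mu0 ('less')."""
    se = stdev(values) / math.sqrt(len(values))
    stat = (fmean(values) - mu0) / se
    if alternative == "greater":
        p_value = 1 - NormalDist().cdf(stat)
    elif alternative == "less":
        p_value = NormalDist().cdf(stat)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return stat, p_value


# Placeholder data for illustration only, NOT simulated waiting or idle times.
rng = random.Random(3)
placeholder_times = [rng.uniform(0.5, 9.5) for _ in range(200)]

low, high = mean_ci(placeholder_times)
stat_g, p_g = one_sided_test(placeholder_times, mu0=4.5, alternative="greater")
stat_l, p_l = one_sided_test(placeholder_times, mu0=5.5, alternative="less")

print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"H1 mean > 4.5: z = {stat_g:.3f}, p = {p_g:.4f}")
print(f"H1 mean < 5.5: z = {stat_l:.3f}, p = {p_l:.4f}")
```

In each test the null hypothesis is rejected at the 5% level when the p-value is below 0.05.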
Task 2E: Test, at the 5% level of significance, whether the mean customer waiting time and the mean staff idle time are equal.
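One way to approach Task 2E is a two-sample test that does not assume equal variances (a Welch-type test). The brief does not prescribe this choice, so treat it as an assumption to justify in the report. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom, and uses a normal approximation for the two-sided p-value. The two short lists are made-up numbers that only check the arithmetic.

```python
import math
from statistics import NormalDist, fmean, variance


def welch_test(x, y):
    """Two-sided test of equal means without assuming equal variances."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    stat = (fmean(x) - fmean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # ASSUMPTION: normal approximation; use a t distribution with df
    # degrees of freedom (for example in Excel) when samples are small.
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, df, p_value


# Made-up values for an arithmetic check only, NOT simulation output.
example_waits = [1.0, 2.0, 3.0, 4.0, 5.0]
example_idles = [2.0, 3.0, 4.0, 5.0, 6.0]

stat, df, p_value = welch_test(example_waits, example_idles)
print(f"statistic = {stat:.3f}, df = {df:.1f}, approximate p-value = {p_value:.4f}")
```

The reported degrees of freedom let the statistic be compared against a t critical value when the normal approximation is not appropriate.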
Note: Zero values for both waiting time and idle time are excluded before conducting the statistical analysis.
Additional information
(i) Excel file: A sample Excel file is provided for reference. For Task 2, the simulation for Day 1 has already been completed.
(ii) Data generation: For Task 2, it is recommended that each group member simulate the arrival and service times for at least one business day and then calculate the corresponding waiting and idle times for analysis.
(iii) Task 2A: You may submit a screenshot of the simulation for Day 2, showing the first 20 rows of your Excel file.
(iv) Communication: Present the required information and conclusions using technical terms, but also in layman’s language where appropriate. This ensures the content is understandable for prospective decision-makers or managers.
(v) Submission: Submit your report as a PDF file named “QBUS5001-Group_xxx.pdf”.
(vi) Excel Spreadsheet: Submit a well-organised Excel spreadsheet that can be used to confirm your calculations.
(vii) Grading: The mark for your report will reflect both the quality of your work and its relative performance compared to other reports.
(viii) Possible Grades and Marks (expected average mark: 75):
HD 85 – 100 Exceptional (20% expected)
D 75 – 84 Very good (30% expected)
C 65 – 74 Good (40% expected)
P 50 – 64 Satisfactory (10% expected)
F below 50 Unsatisfactory (0% expected)
Marking Criteria for Task 1
You will be assessed on your ability to:
(i) Correctly simulate random samples from the exponential distribution.
(ii) Present the sampling distribution of the sample mean and provide key summary statistics.
(iii) Compare the theoretical and empirical values of the mean and standard error of the sample mean.
(iv) Conduct an appropriate statistical test on the sample mean values.
(v) Correctly use hypothesis testing procedures and draw valid conclusions based on a reasonable level of significance.
(vi) Provide clear and detailed descriptions and explanations of your results.
(vii) Include relevant and high-quality Excel output in the report to support your findings.
Marking Criteria for Task 2
You will be assessed on your ability to:
(i) Correctly simulate arrival times and service times from exponential distributions.
(ii) Accurately collect staff idle times and customer waiting times for each business day and combine them for analysis.
(iii) Apply appropriate descriptive methods to analyse arrival times, service times, idle times, and waiting times.
(iv) Conduct appropriate statistical tests, use correct hypothesis testing procedures, and draw valid conclusions based on a reasonable level of significance.
(v) Use and correctly interpret inferential statistics.
(vi) State the required statistical assumptions and, where relevant, discuss the limitations of your analysis.
(vii) Present your results clearly, including relevant graphs and tables, in an appropriate format.
(viii) Include relevant and high-quality Excel output in the report to support your findings.