Homework Assignment 1

Homework Assignment 1a

Due: Friday, Feb. 2, 2024, 11:59 p.m. Mountain time
Total marks: 35

Question 1. [35 marks]

To better visualize random variables and get some intuition for sampling, this question involves some simple simulations, a central theme in machine learning. You will also get some experience using Julia and Pluto notebooks, which you will need in later assignments. Please see the linked document for instructions on how to get started with Julia. Complete the attached notebook A1.jl.

In (b) and (c), the goal is to understand how much estimators themselves can vary: how different our estimate would have been under a different randomly sampled dataset. In the real world, we do not get to obtain different estimators; we will only have one. In this controlled setting, though, we can actually simulate how different the estimators could be.

In (d) and (e), the goal is to understand how to obtain confidence intervals for our single sample average estimator.

(a) [5 marks] Fill in the code to calculate the sample mean, variance, and standard deviation of a vector of numbers. Do not use any packages not already loaded! Note that for the remainder of this question you will only use the sample mean output by your code, and will reason about the variability in this sample mean estimator. However, we ask you to implement all three for a bit of practice.

(b) [7 marks] WRITTEN: Use your Julia implementation to generate 10 samples with $\mu = 0$ and $\sigma^2 = 1.0$, and compute the sample average (sample mean). Write down the sample average that you obtain. Now do this another 4 times, giving you 5 estimates of the sample average: $M_1, M_2, M_3, M_4$ and $M_5$. What is the sample variance of these 5 estimates? Use the unbiased sample variance formula, $\bar{V} = \frac{1}{n-1} \sum_{i=1}^{n} (M_i - \bar{M})^2$. Note that here we want to understand the variability of the mean estimator itself, if it had been run on different datasets.
Beautifully, we can actually simulate this using synthetic data.

(c) [7 marks] WRITTEN: Now run the same experiment, but use 100 samples for each sample average estimate. What is the sample variance of these 5 estimates? How is it different from the variance when you used 10 samples to compute the estimates?

(d) [8 marks] WRITTEN: Now let us consider a higher-variance situation, where $\sigma^2 = 10.0$. Imagine the data comes from a zero-mean Gaussian with this variance, but pretend you do not know the mean. Run the code to get 30 samples, and compute one sample average $M$. What is the 95% confidence interval around this $M$? Give actual numbers.

(e) [8 marks] WRITTEN: Now assume you know less: you do not know the data is Gaussian, though you still know the variance is $\sigma^2 = 10.0$. Use the same 30 samples from (d) and the resulting sample average $M$. Give a 95% confidence interval around $M$, now without assuming the samples are Gaussian.

Winter 2024 CMPUT 267: Basics of Machine Learning

Homework policies:

Your assignment should be submitted as two pdf documents and a .jl notebook, on eClass. Do not submit a zip file with all three. One pdf is for the written work; the other pdf is generated from the .jl notebook. The first pdf should contain your answers for questions starting with "WRITTEN:". Your answers must be written legibly and scanned, or must be typed (e.g., LaTeX). This .pdf should be named Firstname LastName Sol.pdf. For your code, we want you to submit it both as .pdf and .jl. To generate the .pdf format of a Pluto notebook, click the circle-triangle icon in the top-right corner of the screen, called Export, and then generate the .pdf file of your notebook. Name the .pdf of your Pluto notebook Firstname LastName Code.pdf and the .jl of your Pluto notebook Firstname LastName.jl. All code should be turned in when you submit your assignment.

Because assignments are more for learning, and less for evaluation, grading will be based on coarse bins. The grading is atypical.
For grades in the range (1) 81-100, we round up to 100; (2) 61-80, we round up to 80; (3) 41-60, we round up to 60; and (4) 0-40, we round down to 0. The last bin is to discourage quickly throwing together some answers to get some marks. The goal for the assignments is to help you learn the material, and completing less than 50% of the assignment is ineffective for learning.

We will not accept late assignments. There is no late penalty policy. The assignments must be submitted electronically via eClass on time, by 11:59 p.m. Mountain time on the due date. There is a grace period of 48 hours during which assignments will still be accepted. No submissions will be accepted more than 48 hours after the deadline, and the assignment will be considered incomplete if not submitted.

All assignments are individual. All the sources used for the problem solution must be acknowledged, e.g. web sites, books, research papers, personal communication with people, etc. Academic honesty is taken seriously; for detailed information see the University of Alberta Code of Student Behaviour.

Good luck!
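
Reference notation for Question 1. The quantities named in parts (a) to (e) can be written out as below. This is a sketch in standard notation, not the official solution: the symbols $x_i$, $\bar{x}$, $s^2$, $s$, $\epsilon$ and $\delta$ are assumptions that do not appear in the handout, the sample variance in (a) is assumed to use the same unbiased $n-1$ form that (b) specifies, and the samples are assumed to be independent and identically distributed. For (e), Chebyshev's inequality is shown as one standard distribution-free choice; the course may intend a different bound.

```latex
% Part (a): sample mean, unbiased sample variance, sample standard deviation
% of a vector x_1, ..., x_n (x_i, \bar{x}, s^2, s are assumed names).
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{s^2}

% Parts (b), (c): for i.i.d. samples with variance \sigma^2, the sample
% average M of n samples has variance
\mathrm{Var}(M) = \frac{\sigma^2}{n}

% Part (d): Gaussian samples with known variance \sigma^2.
% M is Gaussian with mean \mu and variance \sigma^2 / n, so with
% probability 0.95
\mu \in \left[ M - 1.96 \frac{\sigma}{\sqrt{n}},\; M + 1.96 \frac{\sigma}{\sqrt{n}} \right]

% Part (e): no Gaussian assumption, known variance \sigma^2.
% Chebyshev's inequality gives
\Pr\left( |M - \mu| \ge \epsilon \right) \le \frac{\sigma^2}{n \epsilon^2}
% Setting the right-hand side to \delta = 0.05 and solving for \epsilon:
\epsilon = \frac{\sigma}{\sqrt{\delta n}}, \qquad
\mu \in \left[ M - \epsilon,\; M + \epsilon \right]
\text{ with probability at least } 1 - \delta
```

The Chebyshev interval is wider than the Gaussian one for the same $n$ and $\sigma^2$, which reflects the weaker assumption about the data.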