Student Number: _____________

Semester 2 Assessment, 2019
School of Mathematics and Statistics

MAST90083 Computational Statistics and Data Mining

Writing time: 3 hours
Reading time: 15 minutes
This is NOT an open book exam
This paper consists of 3 pages (including this page)

Authorised Materials
• Mobile phones, smart watches and internet or communication devices are forbidden.
• No handwritten or printed materials may be brought into the exam venue.
• This is a closed book exam.
• No calculators of any kind may be brought into the examination.

Instructions to Students
• You must NOT remove this question paper at the conclusion of the examination.

Instructions to Invigilators
• Students must NOT remove this question paper at the conclusion of the examination.

This paper must NOT be held in the Baillieu Library.

Question 1. Suppose we have a model $p(x, z \mid \theta)$, where $x$ is the observed dataset and $z$ are the latent variables.

(a) Suppose that $q(z)$ is a distribution over $z$. Explain why
$$F(q, \theta) = \mathbb{E}_q\left[\log p(x, z \mid \theta) - \log q(z)\right]$$
is a lower bound on $\log p(x \mid \theta)$.

(b) Show that $F(q, \theta)$ can be decomposed as
$$F(q, \theta) = -\mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big) + \log p(x \mid \theta),$$
where, for any two distributions $p$ and $q$,
$$\mathrm{KL}(q \,\|\, p) = -\mathbb{E}_q\left[\log \frac{p(z)}{q(z)}\right]$$
is the Kullback–Leibler (KL) divergence.

(c) Describe the EM algorithm in terms of $F(q, \theta)$.

(d) Note that the KL divergence is always non-negative; furthermore, it is zero if and only if $p = q$. Conclude that the optimal $q$ maximising $F$ is $q(z) = p(z \mid x, \theta)$.

[10 + 10 + 5 + 5 = 30 marks]

Question 2. Let $\{(x_i, y_i)\}_{i=1}^{n}$ be our dataset, with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. Classic linear regression can be posed as empirical risk minimisation, where the model predicts $y$ using the class of functions $f(x) = w^T x$, parametrised by a vector $w \in \mathbb{R}^p$, under the squared loss; that is, we minimise
$$\sum_{i=1}^{n} (y_i - w^T x_i)^2.$$

(a) Show that the optimal parameter vector is
$$\hat{w}_n = (X^T X)^{-1} X^T Y,$$
where $X$ is the $n \times p$ matrix with $i$-th row $x_i^T$, and $Y$ is the $n \times 1$ column vector with $i$-th entry $y_i$.

(b) Consider regularising the empirical risk by incorporating an $\ell_2$ penalty; that is, find the $w$ minimising
$$\sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2.$$
Show that the optimal parameter is given by the ridge regression estimator
$$\hat{w}_n^{\mathrm{ridge}} = (X^T X + \lambda I)^{-1} X^T Y.$$

(c) Suppose we now wish to introduce nonlinearities into the model by transforming $x$ to $\phi(x)$. Let $\Phi$ be the matrix with $i$-th row $\phi(x_i)^T$.

(i) Show that the optimal parameters would be given by
$$\hat{w}_n^{\mathrm{kernel}} = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T Y.$$

(ii) Express the predicted $y$ values on the training set, $\Phi \hat{w}_n^{\mathrm{kernel}}$, only in terms of $Y$ and the Gram matrix $K = \Phi \Phi^T$, with $K_{ij} = \phi(x_i)^T \phi(x_j) = k(x_i, x_j)$, where $k$ is some kernel function. (This is known as the kernel trick.) Hint: you will find the following matrix inversion formula useful:
$$(\Phi^T \Phi + \lambda I)^{-1} \Phi^T = \Phi^T (\Phi \Phi^T + \lambda I)^{-1}.$$

(iii) Compute an expression for the value $y_*$ predicted by the model at an unseen test vector $x_*$.

[5 + 5 + 5 + 10 + 5 = 30 marks]

Total marks = 60

End of Exam
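The following sketch is not part of the exam paper; it is a minimal numerical check of the identities in Question 1, written in Python/NumPy, assuming a toy model with a single observation $x$, a discrete latent variable $z$, and arbitrarily chosen joint probabilities.

import numpy as np

# Toy joint p(x, z | theta) for a fixed observed x: one probability per
# value of the discrete latent z (these numbers are arbitrary choices).
p_xz = np.array([0.10, 0.25, 0.05])   # p(x, z=k | theta), k = 0, 1, 2

log_px = np.log(p_xz.sum())           # log p(x | theta) = log sum_z p(x, z | theta)
posterior = p_xz / p_xz.sum()         # p(z | x, theta)

def elbo(q):
    # F(q, theta) = E_q[log p(x, z | theta) - log q(z)]
    return np.sum(q * (np.log(p_xz) - np.log(q)))

def kl(q, p):
    # KL(q || p) = -E_q[log p(z) / q(z)]
    return -np.sum(q * np.log(p / q))

q = np.array([0.5, 0.3, 0.2])         # an arbitrary distribution over z

# Question 1(b): F(q, theta) = -KL(q || p(z | x, theta)) + log p(x | theta)
assert np.isclose(elbo(q), log_px - kl(q, posterior))

# Question 1(a): F is a lower bound on log p(x | theta) ...
assert elbo(q) <= log_px
# ... and Question 1(d): the bound is tight at q(z) = p(z | x, theta).
assert np.isclose(elbo(posterior), log_px)

This is also the picture behind Question 1(c): the E-step raises $F$ by setting $q(z) = p(z \mid x, \theta)$ for the current $\theta$ (driving the KL term to zero), and the M-step raises $F$ by maximising it over $\theta$ with $q$ held fixed.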
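For Question 2(a)-(b), a sketch along the same lines, on synthetic random data: it checks the closed-form OLS solution against NumPy's least-squares solver, and checks the ridge estimator via the standard augmented-system identity $\|Y - Xw\|^2 + \lambda\|w\|^2 = \|[Y; 0] - [X; \sqrt{\lambda} I]w\|^2$ (the identity is a well-known equivalence, not something stated in the paper).

import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 3, 0.7
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

# Question 2(a): OLS closed form vs. numpy's least-squares solver.
w_ols = np.linalg.solve(X.T @ X, X.T @ Y)
w_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
assert np.allclose(w_ols, w_lstsq)

# Question 2(b): the ridge estimator equals ordinary least squares on the
# augmented system [X; sqrt(lam) I], [Y; 0].
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
Y_aug = np.vstack([Y, np.zeros((p, 1))])
assert np.allclose(w_ridge, np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0])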
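Finally, a sketch of the kernel trick in Question 2(c): with an explicit (arbitrarily chosen) feature map $\phi$, the primal ridge solution in feature space, the dual fitted values $K(K + \lambda I)^{-1}Y$, and the test-point prediction $k_*^T (K + \lambda I)^{-1} Y$ can all be compared numerically.

import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 30, 2, 0.5
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

def phi(x):
    # An explicit nonlinear feature map (an arbitrary illustrative choice);
    # the implied kernel is k(x, x') = phi(x)^T phi(x').
    return np.concatenate([x, x**2, [1.0]])

Phi = np.vstack([phi(x) for x in X])   # n x d feature matrix
d = Phi.shape[1]
K = Phi @ Phi.T                        # Gram matrix, K_ij = k(x_i, x_j)

# Question 2(c)(i): primal solution in feature space.
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)

# Question 2(c)(ii): fitted values via the kernel trick, using the hint
# (Phi^T Phi + lam I)^{-1} Phi^T = Phi^T (Phi Phi^T + lam I)^{-1}.
fitted_kernel = K @ np.linalg.solve(K + lam * np.eye(n), Y)
assert np.allclose(Phi @ w, fitted_kernel)

# Question 2(c)(iii): prediction at a test point x*,
# y* = phi(x*)^T w = k(x*, .)^T (K + lam I)^{-1} Y.
x_star = rng.normal(size=p)
k_star = Phi @ phi(x_star)             # vector of k(x_i, x*)
y_star = k_star @ np.linalg.solve(K + lam * np.eye(n), Y)
assert np.allclose(phi(x_star) @ w, y_star)

The point of the dual form is that only inner products $k(x_i, x_j)$ are ever needed, so $\phi$ may map into a very high- or infinite-dimensional space while the computation stays $O(n^3)$ in the sample size.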