Student Number: _____________

Semester 2 Assessment, 2019
School of Mathematics and Statistics

MAST90083 Computational Statistics and Data Mining

Writing time: 3 hours
Reading time: 15 minutes
This is NOT an open book exam
This paper consists of 3 pages (including this page)

Authorised Materials
• Mobile phones, smart watches and internet or communication devices are forbidden.
• No handwritten or printed materials may be brought into the exam venue.
• This is a closed book exam.
• No calculators of any kind may be brought into the examination.

Instructions to Students
• You must NOT remove this question paper at the conclusion of the examination.

Instructions to Invigilators
• Students must NOT remove this question paper at the conclusion of the examination.

This paper must NOT be held in the Baillieu Library.

Question 1. Suppose we have a model $p(x, z \mid \theta)$, where $x$ is the observed dataset and $z$ are the latent variables.

(a) Suppose that $q(z)$ is a distribution over $z$. Explain why
$$F(q, \theta) = \mathbb{E}_q\left[\log p(x, z \mid \theta) - \log q(z)\right]$$
is a lower bound on $\log p(x \mid \theta)$.

(b) Show that $F(q, \theta)$ can be decomposed as
$$F(q, \theta) = -\mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big) + \log p(x \mid \theta),$$
where, for any two distributions $p$ and $q$,
$$\mathrm{KL}(q \,\|\, p) = -\mathbb{E}_q\left[\log \frac{p(z)}{q(z)}\right]$$
is the Kullback–Leibler (KL) divergence.

(c) Describe the EM algorithm in terms of $F(q, \theta)$.

(d) Note that the KL divergence is always non-negative; furthermore, it is zero if and only if $p = q$. Conclude that the optimal $q$ maximising $F$ is $q(z) = p(z \mid x, \theta)$.

[10 + 10 + 5 + 5 = 30 marks]

Question 2. Let $\{(x_i, y_i)\}_{i=1}^{n}$ be our dataset, with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. Classic linear regression can be posed as empirical risk minimisation, where the model predicts $y$ using the class of functions $f(x) = w^T x$, parametrised by a vector $w \in \mathbb{R}^p$, under the squared loss; that is, we minimise
$$\sum_{i=1}^{n} (y_i - w^T x_i)^2.$$

(a) Show that the optimal parameter vector is
$$\hat{w}_n = (X^T X)^{-1} X^T Y,$$
where $X$ is the $n \times p$ matrix with $i$-th row $x_i^T$, and $Y$ is the $n \times 1$ column vector with $i$-th entry $y_i$.

(b) Consider regularising the empirical risk by incorporating an $\ell_2$ penalty; that is, find the $w$ minimising
$$\sum_{i=1}^{n} (y_i - w^T x_i)^2 + \lambda \|w\|^2.$$
Show that the optimal parameter is given by the ridge regression estimator
$$\hat{w}_n^{\mathrm{ridge}} = (X^T X + \lambda I)^{-1} X^T Y.$$

(c) Suppose we now wish to introduce nonlinearities into the model by transforming $x$ to $\phi(x)$. Let $\Phi$ be the matrix with $i$-th row $\phi(x_i)^T$.

(i) Show that the optimal parameters would be given by
$$\hat{w}_n^{\mathrm{kernel}} = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T Y.$$

(ii) Express the predicted $y$ values on the training set, $\Phi \hat{w}_n^{\mathrm{kernel}}$, only in terms of $Y$ and the Gram matrix $K = \Phi \Phi^T$, with $K_{ij} = \phi(x_i)^T \phi(x_j) = k(x_i, x_j)$, where $k$ is some kernel function. (This is known as the kernel trick.) Hint: you will find the following matrix inversion formula useful:
$$(\Phi^T \Phi + \lambda I)^{-1} \Phi^T = \Phi^T (\Phi \Phi^T + \lambda I)^{-1}.$$

(iii) Compute an expression for the value $y_*$ predicted by the model at an unseen test vector $x_*$.

[5 + 5 + 5 + 10 + 5 = 30 marks]

Total marks = 60

End of Exam
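The following sketch is not part of the exam paper; it is a minimal numerical check of the identities in Question 1, written in Python/NumPy, assuming a toy model with a single observation $x$, a discrete latent variable $z$, and arbitrarily chosen joint probabilities.

import numpy as np

# Toy joint p(x, z | theta) for a fixed observed x: one probability per
# value of the discrete latent z (these numbers are arbitrary choices).
p_xz = np.array([0.10, 0.25, 0.05])   # p(x, z=k | theta), k = 0, 1, 2

log_px = np.log(p_xz.sum())           # log p(x | theta) = log sum_z p(x, z | theta)
posterior = p_xz / p_xz.sum()         # p(z | x, theta)

def elbo(q):
    # F(q, theta) = E_q[log p(x, z | theta) - log q(z)]
    return np.sum(q * (np.log(p_xz) - np.log(q)))

def kl(q, p):
    # KL(q || p) = -E_q[log p(z) / q(z)]
    return -np.sum(q * np.log(p / q))

q = np.array([0.5, 0.3, 0.2])         # an arbitrary distribution over z

# Question 1(b): F(q, theta) = -KL(q || p(z | x, theta)) + log p(x | theta)
assert np.isclose(elbo(q), log_px - kl(q, posterior))

# Question 1(a): F is a lower bound on log p(x | theta) ...
assert elbo(q) <= log_px
# ... and Question 1(d): the bound is tight at q(z) = p(z | x, theta).
assert np.isclose(elbo(posterior), log_px)

This is also the picture behind Question 1(c): the E-step raises $F$ by setting $q(z) = p(z \mid x, \theta)$ for the current $\theta$ (driving the KL term to zero), and the M-step raises $F$ by maximising it over $\theta$ with $q$ held fixed.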
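For Question 2(a)-(b), a sketch along the same lines, on synthetic random data: it checks the closed-form OLS solution against NumPy's least-squares solver, and checks the ridge estimator via the standard augmented-system identity $\|Y - Xw\|^2 + \lambda\|w\|^2 = \|[Y; 0] - [X; \sqrt{\lambda} I]w\|^2$ (the identity is a well-known equivalence, not something stated in the paper).

import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 3, 0.7
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

# Question 2(a): OLS closed form vs. numpy's least-squares solver.
w_ols = np.linalg.solve(X.T @ X, X.T @ Y)
w_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
assert np.allclose(w_ols, w_lstsq)

# Question 2(b): the ridge estimator equals ordinary least squares on the
# augmented system [X; sqrt(lam) I], [Y; 0].
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
Y_aug = np.vstack([Y, np.zeros((p, 1))])
assert np.allclose(w_ridge, np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0])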
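Finally, a sketch of the kernel trick in Question 2(c): with an explicit (arbitrarily chosen) feature map $\phi$, the primal ridge solution in feature space, the dual fitted values $K(K + \lambda I)^{-1}Y$, and the test-point prediction $k_*^T (K + \lambda I)^{-1} Y$ can all be compared numerically.

import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 30, 2, 0.5
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, 1))

def phi(x):
    # An explicit nonlinear feature map (an arbitrary illustrative choice);
    # the implied kernel is k(x, x') = phi(x)^T phi(x').
    return np.concatenate([x, x**2, [1.0]])

Phi = np.vstack([phi(x) for x in X])   # n x d feature matrix
d = Phi.shape[1]
K = Phi @ Phi.T                        # Gram matrix, K_ij = k(x_i, x_j)

# Question 2(c)(i): primal solution in feature space.
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)

# Question 2(c)(ii): fitted values via the kernel trick, using the hint
# (Phi^T Phi + lam I)^{-1} Phi^T = Phi^T (Phi Phi^T + lam I)^{-1}.
fitted_kernel = K @ np.linalg.solve(K + lam * np.eye(n), Y)
assert np.allclose(Phi @ w, fitted_kernel)

# Question 2(c)(iii): prediction at a test point x*,
# y* = phi(x*)^T w = k(x*, .)^T (K + lam I)^{-1} Y.
x_star = rng.normal(size=p)
k_star = Phi @ phi(x_star)             # vector of k(x_i, x*)
y_star = k_star @ np.linalg.solve(K + lam * np.eye(n), Y)
assert np.allclose(phi(x_star) @ w, y_star)

The point of the dual form is that only inner products $k(x_i, x_j)$ are ever needed, so $\phi$ may map into a very high- or infinite-dimensional space while the computation stays $O(n^3)$ in the sample size.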