
Coursework 1 – Exploratory data analysis and correlation

MATH20811 Practical Statistics: Coursework 1

The marks awarded for this coursework constitute 30% of the total assessment for the module. Your solution to the coursework should be a concise report (max. 10 pages), and it should take, on average, about 15 hours to complete.

The submission deadline is 10am on Monday 28 October 2019. Please note that this deadline is strict: a University-set penalty of 10% of the total marks is applied for each day late, up to a maximum of five days, after which your mark for the coursework will be zero.

Your submitted solutions should all be in one document, which must be prepared using LaTeX. For each part of the question, explain how you completed what is required, show your workings and comment on computational results, where applicable. When you include a plot, be sure to give it a title and label the axes correctly.

When you have written or used R code to answer any of the parts, list this R code after the written answer to which it applies. This may be the R code for a function you have written and/or code you have used to produce numerical results, plots and tables. R code should also be clearly annotated.

Avoid using screenshots of R code/output. Instead, include R code using the verbatim environment and summarise R output in tables using the table environment, as demonstrated in the solution of Example Sheet 2.

Your file should be submitted through the module site on Blackboard to the Turnitin assessment in the Coursework folder entitled “MATH20811 CW1” by the above time and date. The work will be marked anonymously on Blackboard, so please ensure that your filename is clear but does not contain your name or student ID number.
Similarly, do not include your name or student ID number in the document itself.

Turnitin will generate a similarity report for your submitted document and indicate matches to other sources, including billions of internet documents (both live and archived), a subscription repository of periodicals, journals and publications, as well as submissions from other students. Please ensure that the document you upload represents your own work and is written in your own words. The Turnitin report will be available for you to see shortly after the due date.

This coursework should hopefully help to reinforce some of the methodology you have been studying, as well as the R skills you have been developing in the module. To achieve a high mark, correct interpretation and meaningful discussion of the results (i.e. attempting to put the results into context) are as important as correct calculation of the results.

The data in red_wine.csv and white_wine.csv (Cortez et al., 2009) contain various measurements on red and white variants of the Portuguese Vinho Verde wine. Import the data in R and save them as objects red_wine and white_wine. Each object should contain measurements on 11 continuous variables (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol) and one discrete variable (quality).

1. Perform exploratory analysis of the data and report some interesting findings. Suggestions include producing summary statistics, comparing the distributions of specific variables for the red and white variants using histograms or box-plots (as appropriate), and exploring any associations between the variables, in particular alcohol and quality. [10]

2. Using the function cor, calculate both Pearson's and Spearman's correlation between:
   • white_wine$chlorides and white_wine$alcohol
   • log(white_wine$chlorides) and white_wine$alcohol
   Comment on the results and give an explanation for any discrepancies between the various correlation estimates. Hint: inspecting the scatterplots for each pair might be useful. [5]

3. Let $\rho_1$ be Pearson's correlation between alcohol and density for the red wine dataset. Using the function cor.test, test the hypothesis $H_0: \rho_1 = 0$ vs $H_A: \rho_1 \neq 0$ and report your findings. Calculate (DIY) an approximate 95% confidence interval (CI) for $\rho_1$ based on Fisher's z-transform, and verify that your calculations agree with the CI produced by cor.test. [5]

4. Perform (DIY) a hypothesis test of $H_0: \rho_1 = -0.5$ vs $H_A: \rho_1 > -0.5$ at the 2.5% significance level, using Fisher's z-transform. Compute the p-value and use it to decide whether to reject the null hypothesis in favour of the alternative. [5]

5. Write a function in R to verify via simulation that the distribution of Fisher's z-transform statistic is approximately Normal. Your function should output a plot comparing the sampling distribution of Fisher's z-transform statistic with the appropriate Normal distribution that the statistic has under the null hypothesis. In your simulation, you may assume that the data pairs (x, y) come from independent Normal distributions and that the test statistic corresponds to a test of zero correlation. [5]

References

[1] P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547–553, 2009. ISSN: 0167-9236.
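Question 1 is open-ended, but the import step and a red-versus-white comparison can be sketched as follows. This is a minimal sketch, assuming red_wine.csv and white_wine.csv are comma-separated files with a header row whose column names match the variable names listed in the question (if the files turn out to be semicolon-separated, add sep = ";" to read.csv). The helpers compare_variable and quality_means are hypothetical names introduced only for illustration.

```r
# Import (assumption: comma-separated files with a header row in the working directory).
# Uncomment once the files are available:
# red_wine   <- read.csv("red_wine.csv")
# white_wine <- read.csv("white_wine.csv")

# Hypothetical helper: summary statistics and side-by-side box-plots of one
# variable for the red and white variants
compare_variable <- function(red, white, var) {
  # One row of summary statistics (Min., 1st Qu., Median, Mean, 3rd Qu., Max.)
  # per variant; assumes the variable has no missing values
  tab <- rbind(red = summary(red[[var]]), white = summary(white[[var]]))
  # Box-plots on a common axis show differences in location and spread
  boxplot(list(red = red[[var]], white = white[[var]]),
          main = paste("Distribution of", var, "by wine variant"),
          xlab = "Variant", ylab = var)
  tab
}

# Hypothetical helper: mean alcohol at each quality score, a quick numerical
# look at the alcohol-quality association
quality_means <- function(wine) {
  tapply(wine$alcohol, wine$quality, mean)
}

# Example usage once the data are loaded:
# compare_variable(red_wine, white_wine, "alcohol")
# quality_means(red_wine)
# cor(red_wine$alcohol, red_wine$quality, method = "spearman")
```

Because quality is a discrete score, box-plots of alcohol grouped by quality or a rank-based measure such as Spearman's correlation may describe that association more faithfully than Pearson's correlation alone.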
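Question 2 turns on the fact that Spearman's correlation is Pearson's correlation computed on the ranks, so it is unchanged by a strictly increasing transformation such as log, whereas Pearson's correlation measures linear association and is sensitive to skewness and outliers. The sketch below illustrates this on simulated data (a positive, right-skewed variable x and a response y that is linear in log(x)); it is not the wine data, and corr_table is a hypothetical helper.

```r
# Hypothetical helper: Pearson and Spearman correlations of y with x and with log(x)
corr_table <- function(x, y) {
  rbind(
    raw = c(pearson  = cor(x, y, method = "pearson"),
            spearman = cor(x, y, method = "spearman")),
    log = c(pearson  = cor(log(x), y, method = "pearson"),
            spearman = cor(log(x), y, method = "spearman"))
  )
}

# Simulated stand-in data (assumption, not the wine data)
set.seed(20811)
n <- 500
x <- rlnorm(n, meanlog = -3, sdlog = 1)   # positive and right-skewed
y <- 10 - 1.5 * log(x) + rnorm(n)         # linear in log(x), non-linear in x

tab <- corr_table(x, y)
print(round(tab, 3))

# Scatterplots of both pairs, as the hint suggests
op <- par(mfrow = c(1, 2))
plot(x, y, main = "y against x", xlab = "x", ylab = "y")
plot(log(x), y, main = "y against log(x)", xlab = "log(x)", ylab = "y")
par(op)
```

In this simulated setting the two Spearman values coincide exactly, because x and log(x) have identical ranks, while the Pearson correlation is expected to be stronger in absolute value on the log scale, where the relationship is linear.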
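For question 3, Fisher's z-transform z = atanh(r) = 0.5 log((1 + r)/(1 - r)) is approximately Normal with mean atanh(rho) and variance 1/(n - 3) for bivariate Normal data, so an approximate 95% CI for the correlation is tanh(z ± 1.96/sqrt(n - 3)). A minimal sketch follows; fisher_ci is a hypothetical helper, and x and y are simulated stand-ins rather than the red wine alcohol and density measurements.

```r
# Hypothetical helper: approximate CI for Pearson's correlation via Fisher's z-transform
fisher_ci <- function(x, y, conf.level = 0.95) {
  ok <- complete.cases(x, y)          # use only complete pairs
  n <- sum(ok)
  r <- cor(x[ok], y[ok])              # Pearson's sample correlation
  z <- atanh(r)                       # Fisher's z = 0.5 * log((1 + r) / (1 - r))
  se <- 1 / sqrt(n - 3)               # approximate standard error of z
  crit <- qnorm((1 + conf.level) / 2) # about 1.96 for a 95% interval
  tanh(z + c(-1, 1) * crit * se)      # CI for atanh(rho), mapped back with tanh
}

# Simulated stand-in data (assumption, not the red wine data)
set.seed(1)
x <- rnorm(200)
y <- -0.5 * x + rnorm(200)

ct <- cor.test(x, y, method = "pearson", alternative = "two.sided")
diy <- fisher_ci(x, y)
print(ct)   # test of H0: rho = 0, plus the 95% CI
print(rbind(DIY = diy, cor.test = as.numeric(ct$conf.int)))
```

The two intervals should agree to floating-point precision, because cor.test also builds its Pearson interval from Fisher's z with standard error 1/sqrt(n - 3); its p-value for H0: rho = 0, however, comes from a t statistic with n - 2 degrees of freedom.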
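Question 4 uses the same approximation: under H0: rho1 = -0.5 the statistic Z = (atanh(r) - atanh(-0.5)) sqrt(n - 3) is approximately standard Normal, and for the one-sided alternative HA: rho1 > -0.5 the p-value is P(Z >= z_obs) = 1 - Phi(z_obs). H0 is rejected at the 2.5% level if this p-value is below 0.025. The function below is a hypothetical sketch of that calculation, taking the sample correlation r and the sample size n as inputs.

```r
# Hypothetical helper: one-sided Fisher z-test of H0: rho = rho0 vs HA: rho > rho0
fisher_test_greater <- function(r, n, rho0 = -0.5, alpha = 0.025) {
  z_obs <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)  # approx. N(0, 1) under H0
  p_value <- pnorm(z_obs, lower.tail = FALSE)       # upper-tail probability P(Z >= z_obs)
  list(statistic = z_obs, p.value = p_value, reject = p_value < alpha)
}

# Illustrative call with made-up inputs; with the wine data one would instead use
# r <- cor(red_wine$alcohol, red_wine$density) and n <- nrow(red_wine)
res <- fisher_test_greater(r = -0.45, n = 100)
print(res)
```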
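For question 5, one possible simulation draws n pairs with x and y from independent standard Normal distributions (so the true correlation is zero), computes z = atanh(r), repeats this many times and overlays the N(0, 1/(n - 3)) density that the statistic approximately follows under H0: rho = 0. The name simulate_fisher_z and its arguments are hypothetical choices for this sketch, and the default sample size and number of replications are arbitrary.

```r
# Hypothetical simulation function: sampling distribution of Fisher's z under rho = 0
simulate_fisher_z <- function(n = 30, nsim = 5000, seed = NULL, plot = TRUE) {
  if (!is.null(seed)) set.seed(seed)   # optional seed for reproducibility
  z <- replicate(nsim, {
    x <- rnorm(n)
    y <- rnorm(n)                      # independent of x, so the true correlation is 0
    atanh(cor(x, y))                   # Fisher's z-transform of Pearson's r
  })
  if (plot) {
    hist(z, breaks = 40, freq = FALSE,
         main = paste0("Fisher's z under H0: rho = 0 (n = ", n, ", ", nsim, " simulations)"),
         xlab = "z = atanh(r)", ylab = "Density")
    # Null distribution: approximately N(0, 1 / (n - 3))
    curve(dnorm(x, mean = 0, sd = 1 / sqrt(n - 3)), add = TRUE, col = "red", lwd = 2)
    legend("topright", legend = "N(0, 1/(n - 3)) density", col = "red", lwd = 2, bty = "n")
  }
  invisible(z)   # return the simulated statistics for further checks
}

z <- simulate_fisher_z(n = 30, nsim = 5000, seed = 20811)
qqnorm(z, main = "Normal Q-Q plot of simulated Fisher's z")
qqline(z)
```

A histogram that tracks the red curve and a Q-Q plot close to a straight line support the Normal approximation; rerunning with a very small n may show the approximation becoming less accurate.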
