MAST 397B: Introduction to Statistical Computing
Final Project
Instructor: Yogen Chaubey
Due Date: December 2, 2019 [Hard Copies only]

Notes: (i) This project can be done in groups. If it is done in a group, submit one copy for the group (not one per individual). In this case the cover page must list all group members with their ID numbers, along with a statement of each member's contributions. (ii) Your report should cite all materials used (online or otherwise). (iii) All code should be placed in an appendix. (iv) Answers should be clearly stated; a poorly written report will receive only partial credit.

Problem 1. [20 Points]

Fitting distributions to a given dataset is an important problem in statistical analysis. R has a package called fitdistrplus that facilitates fitting various known continuous distributions. In general, fitting a distribution requires knowledge of its form, such as the Gaussian distribution given by the probability density function (pdf)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\}; \quad x \in (-\infty, \infty).$$

The vector $\theta = (\mu, \sigma^2)$ is known as the parameter vector and is estimated from a random sample $(X_1, X_2, \ldots, X_n)$. Consider the dataset named groundbeef, available with the package fitdistrplus. Fit the following two distributions to this dataset: (a) the log-normal distribution; (b) the Gamma distribution.

(i) Use the maximum likelihood (ML) method for the log-normal distribution and the method of moments (MM) for the Gamma distribution. Note that $X$ is said to have a log-normal distribution if $Y = \log X$ has a normal distribution, and that the Gamma pdf with shape parameter $\alpha$ and scale parameter $\beta$ is given by

$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} \exp\left\{-\frac{x}{\beta}\right\}; \quad x \geq 0.$$

Use a standard statistical text for explicit formulae, and calculate these estimators using your own function defined in R.

(ii) Use the package fitdistrplus to find the ML and MM estimators for the two distributions.

(iii) One method of justifying a given distribution is to perform a Chi-square goodness-of-fit test. It is given by the test statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$

Here we assume that the data are grouped into $k$ groups ($k$ = number of bins in the histogram), $O_i$ is the observed frequency in the $i$th group, and $E_i$ is the frequency in the $i$th group under the fitted model. The latter is computed by the formula $E_i = n p_i$, where $p_i$ is the probability of an observation falling in group $i$ under the model. If the model fits, the test statistic $\chi^2$ has a Chi-square distribution with df $= \nu = k - 1 - p$, where $p$ is the number of estimated parameters. Compute the $\chi^2$ statistic for the above data for a suitable value of $k$; note that for the test to be valid each group must have 5 or more observations. Find the upper 5% value of the appropriate $\chi^2$ distribution and compare it with the computed value (for both models) to decide whether the models fit the data. [Note: An observed value of $\chi^2$ greater than the upper 5% value of the $\chi^2$ distribution with df $= \nu$ indicates a poor fit.]

(iv) The quality of the fits may also be gauged by plotting the histogram with the estimated density superimposed on it. Provide the histogram with the estimated density superimposed for both methods, for each of the log-normal and Gamma distributions, and comment on the quality of the fit.

(v) Another qualitative method of judging the fit is the Q-Q plot of the data. Give the Q-Q plots for both methods for each of the log-normal and Gamma densities. Comment on the quality of fit in each case. How does it compare with your conclusion in part (iii)?

Problem 2. [15 Points]

Problem 3. [10 Points]

Consider the following data from Example 7.12.

[Data table missing from the source.]

(a) The objective is to determine a line $y = \beta_0 + \beta_1 x$ such that the function

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} |y_i - \beta_0 - \beta_1 x_i|$$

is minimized. Use the optim( ) function of R with starting values obtained from lm( ).

(b) Plot the least-squares line and the line obtained in part (a) on the scatterplot, and comment on the fit of these lines to the data.

(c) Suppose another point (2.05, 3.23) is added to the data. Compute the two lines again and comment on the effect of the new point on the estimates.

Reposted from: http://www.3daixie.com/contents/11/3444.html
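As an illustration of the goodness-of-fit computation described in Problem 1 (iii), the following is a minimal sketch in base R for a log-normal model fitted by maximum likelihood. The data are simulated placeholders, not the groundbeef data, and the choice of k = 8 equiprobable groups is only one possible grouping; both are assumptions made for the example.

```r
# Sketch: chi-square goodness-of-fit test for a log-normal model fitted by ML.
# ASSUMPTION: x is simulated placeholder data, not the groundbeef dataset.
set.seed(1)
x <- rlnorm(200, meanlog = 4, sdlog = 0.5)
n <- length(x)

# Closed-form ML estimates for the log-normal:
# mean and divisor-n standard deviation of log(x)
mu_hat    <- mean(log(x))
sigma_hat <- sqrt(mean((log(x) - mu_hat)^2))

# ASSUMPTION: k = 8 groups, equiprobable under the fitted model,
# so that every expected frequency n / k is well above 5
k <- 8
breaks <- qlnorm(seq(0, 1, length.out = k + 1), mu_hat, sigma_hat)  # 0 ... Inf

# Observed frequencies O_i and expected frequencies E_i = n * p_i
observed <- as.vector(table(cut(x, breaks = breaks)))
p_i      <- diff(plnorm(breaks, mu_hat, sigma_hat))
expected <- n * p_i

# Test statistic, degrees of freedom (two estimated parameters), upper 5% value
chi_sq <- sum((observed - expected)^2 / expected)
df     <- k - 1 - 2
crit   <- qchisq(0.95, df)

cat("chi-square =", round(chi_sq, 3), " df =", df,
    " upper 5% value =", round(crit, 3), "\n")
```

The same steps would apply to the Gamma model by replacing `qlnorm` and `plnorm` with `qgamma` and `pgamma`. For the method of moments with the shape/scale parametrization given above, matching the first two moments gives shape = mean² / variance and scale = variance / mean.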
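The minimization in Problem 3 (a) can be sketched as follows. Since the Example 7.12 data are not available here, `x` and `y` are simulated placeholders, and the helper name `sad` (sum of absolute deviations) is hypothetical. The sketch relies on the default Nelder-Mead method of optim( ), which does not need derivatives of the non-smooth objective.

```r
# Sketch: least-absolute-deviations (LAD) line via optim(), started from lm().
# ASSUMPTION: x and y are simulated placeholders, not the Example 7.12 data.
set.seed(2)
x <- runif(30, 0, 5)
y <- 1 + 2 * x + rt(30, df = 3)

# Objective: sum of absolute residuals for the line y = b[1] + b[2] * x
# (sad is a hypothetical helper name)
sad <- function(b, x, y) sum(abs(y - b[1] - b[2] * x))

# Least-squares fit supplies the starting values
ls_fit <- lm(y ~ x)
start  <- unname(coef(ls_fit))

# Nelder-Mead is optim()'s default method; x and y are passed through to sad()
lad_fit <- optim(par = start, fn = sad, x = x, y = y)

# Scatterplot with both lines
plot(x, y, main = "Least squares vs. LAD")
abline(ls_fit, lty = 1)
abline(a = lad_fit$par[1], b = lad_fit$par[2], lty = 2)
legend("topleft", legend = c("least squares", "LAD"), lty = c(1, 2))
```

For part (c), the extra point could be appended with `x <- c(x, 2.05)` and `y <- c(y, 3.23)` before rerunning both fits.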