Windsor Diary 31

To elaborate, consider a sequence of independent and identically distributed (i.i.d.) random variables X_1, X_2, ... with X_i ∈ X ⊆ R. In statistics we could be interested in the behaviour of the random variable X_N, or of some statistic T : X^N → R, as N grows.

Convergence in L_p

As we shall see, convergence in L_p guarantees convergence in probability. However, we will see by counterexample that the converse is not true.

Definition 1.2.4. The sequence of random variables X_1, X_2, ... converges in L_p to a random variable X, for 0 < p < +∞, if:

\lim_{N\to ∞} E[|X_N-X|^p]=0.

Theorem 1.2.6. Let 0 < r < p < +∞. If X_N \rightarrow_{L_p} X then X_N \rightarrow_{L_r} X.

In order to prove this result, we use the Hölder inequality. Let 1 < s, t < +∞ with 1/s + 1/t = 1; then for any two random variables Z and Y with finite moments of order s and t respectively,

E[|ZY|] \leq  E[{|Z|}^s]^{1/s}E[|Y|^t]^{1/t}.
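One way to see how the inequality yields Theorem 1.2.6 is sketched below. This is the standard argument, written out under the assumption that it matches the proof intended here. Take Z = |X_N - X|^r, Y = 1, s = p/r and t = p/(p - r); these satisfy s, t > 1 and 1/s + 1/t = 1 because 0 < r < p.

```latex
E\left[ |X_N - X|^r \right]
  = E\left[ |X_N - X|^r \cdot 1 \right]
  \leq E\left[ |X_N - X|^{r \cdot p/r} \right]^{r/p} \, E\left[ 1^{p/(p-r)} \right]^{(p-r)/p}
  = E\left[ |X_N - X|^p \right]^{r/p} .
```

If X_N \rightarrow_{L_p} X, the right-hand side tends to 0 as N → ∞, and hence so does the left-hand side, which is convergence in L_r.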

Theorem 1.2.7. Let 1 ≤ p < +∞. If X_N →_{L_p} X then X_N →_P X.
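To illustrate why the converse of Theorem 1.2.7 can fail, here is a minimal sketch of a common textbook counterexample. It is an assumption that this is the counterexample intended in these notes. Take X_n = n with probability 1/n and X_n = 0 otherwise, with limit X = 0. The function names are hypothetical, and the quantities are computed exactly rather than simulated.

```python
from fractions import Fraction


def tail_probability(n, eps):
    """P(|X_n - 0| > eps) when X_n = n with probability 1/n and 0 otherwise."""
    # The event |X_n| > eps can only happen when X_n = n, and only if n > eps.
    return Fraction(1, n) if n > eps else Fraction(0)


def abs_moment(n, p):
    """E[|X_n - 0|^p] = n^p * (1/n), for a positive integer p."""
    return Fraction(n) ** p * Fraction(1, n)


# Tail probabilities shrink with n, but the p-th moments do not.
for n in (10, 100, 1000):
    print(n, tail_probability(n, Fraction(1, 2)), abs_moment(n, 1), abs_moment(n, 2))
```

For any fixed eps > 0 the tail probability is 1/n once n exceeds eps, so X_n converges to 0 in probability. Yet E[|X_n|^p] = n^{p-1} equals 1 for p = 1 and grows without bound for p > 1, so there is no convergence in L_p for any p ≥ 1.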

Population, Sample and Models

Often data collected from an experiment are a collection of measurements on a variable of interest. The way one often proceeds in statistics is to hypothesize a model of the data, which we term the population. One then collects data to form a sample. We begin by introducing the notion of a model for data collection.

Definition 2.2.1. The random variables X_1, . . . , X_n are called a random sample of size n from the population f_X(x) if X_1, . . . , X_n are i.i.d. random variables with PMF or PDF f_X(x).

In principle, the random variables could be vector-valued, and indeed the i.i.d. property is not really needed per se. In general, one can easily extend this pseudo-definition to cases where the samples form a Markov chain or have any other type of (interesting) dependence structure. However, to facilitate much of what will follow, we will maintain the notion that the random variables are i.i.d. In our case we are assuming that the model for the data, f_X, is the same each time and that our measurements are somehow ‘independent’, which of course eliminates some interesting data types, such as those observed in time or space or both. Clearly, as we have seen already, the joint density (we will use this term interchangeably for PMF or PDF) of the random variables is

f_{X_{1:n}}(x_1,...,x_n)=\prod_{i=1}^n f_X(x_i).

In much of our work, we will assume that there is an unknown finite-dimensional parameter θ ∈ Θ which characterizes the density f_X(x), which we will then write as f_X(x; θ). The joint density is thus

f_{X_{1:n}}(x_1,...,x_n;\theta )=\prod_{i=1}^n f_X(x_i;\theta ).
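As a concrete instance of the product formula, the sketch below evaluates the joint density of an i.i.d. sample under a normal model with θ = (mu, sigma). The normal family, the function names and the sample values are all assumptions made for illustration; none of them come from the notes.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density f_X(x; theta) of N(mu, sigma^2), with theta = (mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint_density(xs, mu, sigma):
    """Joint density of an i.i.d. sample: the product of the marginal densities."""
    return math.prod(normal_pdf(x, mu, sigma) for x in xs)


def log_joint_density(xs, mu, sigma):
    """Log of the joint density: the sum of the marginal log-densities."""
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in xs)


sample = [1.2, 0.7, 1.9, 1.4]  # hypothetical observations x_1, ..., x_n
xbar = sum(sample) / len(sample)

# Viewed as a function of mu (with sigma fixed), the joint density is larger
# at the sample mean than at mu = 0.
print(log_joint_density(sample, xbar, 1.0) > log_joint_density(sample, 0.0, 1.0))
```

Working with the sum of logs rather than the raw product is the usual choice in practice, since a product of many small densities underflows quickly as n grows.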

Much of our work will be on finding good methods to estimate θ on the basis of observed data (the sample) x_1, . . . , x_n. We have already seen the maximum likelihood method, and this will be considered in more detail later in this course. Other work will focus on constructing formal statistical tests for the parameter θ, where the test will have a statistical or physical interpretation: for instance, a test that two variables have ‘no relationship’. Especially for ‘frequentist’ statistics (with which almost all of this course is concerned, despite my own opinion of this type of inference), the test methods, or estimates of θ, may be based or justified on large-sample (n) properties of the associated model.

We observe data in pairs: (y_{1:n},x_{1:n}) with y_i \in \Upsilon, \Upsilon \subseteq R and x_i \in X \subseteq R^p, p \geq 1. The y_i are termed response variables and the x_i are explanatory variables. It is hypothesised that there exists some functional relationship of the form (although one is not restricted to this scenario):

Y_i=g_\theta (x_i)+\epsilon _i , \quad \theta \in \Theta \subseteq R^{d_\theta }, \quad i\in \left\{ 1,...,n \right\} , \quad g:X\times \Theta \rightarrow \Upsilon , \quad \epsilon _i \sim_{iid} F

for some distribution F ensuring that \forall i\in \left\{ 1,...,n \right\} , g_\theta (x_i)+\epsilon _i \in \Upsilon . That is to say, in some manner, the variables x_i are somehow relevant for explaining or predicting the y_i. To that end, we summarize the functional relationship g by a collection of finite-dimensional parameters θ, which we will seek to estimate from our observed data.
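The model can be made concrete with a small simulation. The sketch below assumes a hypothetical mean function g_theta(x) = theta_0 exp(theta_1 x) with scalar x, Υ = R, and F = N(0, sigma^2). All of these choices are illustrative and are not taken from the notes.

```python
import math
import random


def g(x, theta):
    """Hypothetical mean function g_theta(x) = theta[0] * exp(theta[1] * x)."""
    return theta[0] * math.exp(theta[1] * x)


def simulate_responses(xs, theta, sigma, seed=0):
    """Draw Y_i = g_theta(x_i) + eps_i with eps_i i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [g(x, theta) + rng.gauss(0.0, sigma) for x in xs]


xs = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical explanatory variables
ys = simulate_responses(xs, theta=(2.0, 0.5), sigma=0.1)
print(list(zip(xs, ys)))
```

Because Υ = R here, the requirement that g_theta(x_i) + eps_i lies in Υ holds automatically. With a restricted response space (for example, positive responses) the error distribution F would have to be chosen more carefully.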

Heart catheterization is sometimes performed on children with congenital heart defects. A teflon tube is passed into a major vein at the femoral region and pushed up into the heart to obtain information about the heart’s physiology and functional ability. The length of the catheter is typically determined by a physician’s guess. In a small study, the exact catheter length required was determined by a fluoroscope to check if the tip of the catheter reached the pulmonary artery. The patients’ heights and weights were recorded. The objective is to see how accurately the catheter length could be determined by these two variables.

We focus upon the case where there are p regressors (counting the intercept); that is, for Y_i \in R:

Y_i= \theta _0 + \sum_{j=1}^{p-1} \theta _j x_{ij}+\epsilon _i , i \in \left\{ 1,...,n\right\}

where the \epsilon _i are i.i.d. zero-mean random variables and p > 1. For simplicity we will assume:

\epsilon _i \sim_{iid} N(0,\sigma ^2) ; p < n .

Neither assumption is absolutely necessary, but we will avoid many mathematical complexities in this manner. In order to introduce a convenient matrix notation we will write Y=y_{1:n}' (the ′ denotes transpose), \theta = \theta _{0:(p-1)}' and let X be an n×p matrix whose first column is all 1’s, so that row i ∈ {1, . . . , n} is \left( 1,x_{i1},...,x_{i(p-1)} \right) . Finally, write \epsilon =\epsilon _{1:n}'; then we have the representation of the linear model as: Y=X\theta +\epsilon .

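The residual sum of squares (RSS) is defined as

RSS(\theta ) = (Y-X\theta )'(Y-X\theta ) = \sum_{i=1}^n \left( y_i - \theta _0 - \sum_{j=1}^{p-1} \theta _j x_{ij} \right)^2 .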

The objective is to compute θ so as to minimize the RSS.
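A minimal sketch of this minimization follows, taking the RSS to be the usual sum of squared residuals (Y - Xθ)'(Y - Xθ) and minimizing it by solving the normal equations X'Xθ = X'Y. That is the standard route, but it is an assumption that the notes proceed this way. The function names and the data are hypothetical (in particular, these are not the catheter measurements), and plain Python lists are used so that nothing beyond the standard library is needed.

```python
def design_matrix(rows):
    """Prepend a column of ones to the regressor rows (x_i1, ..., x_i(p-1))."""
    return [[1.0] + [float(v) for v in row] for row in rows]


def rss(X, y, theta):
    """Residual sum of squares (Y - X theta)'(Y - X theta)."""
    return sum((yi - sum(t * xij for t, xij in zip(theta, xi))) ** 2
               for xi, yi in zip(X, y))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def least_squares(X, y):
    """Minimise the RSS by solving the normal equations X'X theta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
X = design_matrix([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)])
y = [1.0, 3.0, 4.0, 6.0, 8.0]
theta_hat = least_squares(X, y)
print(theta_hat, rss(X, y, theta_hat))
```

The normal equations have a unique solution only when X'X is invertible, which requires the columns of X to be linearly independent and is one reason for assuming p < n. Under the Gaussian error assumption above, minimizing the RSS over θ is the same as maximizing the likelihood over θ.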

The Likelihood Principle. In the inference about θ, after the data is observed, all relevant experimental information is contained in the likelihood function for the observed data. Furthermore, two likelihood functions contain the same information about θ if they are proportional to each other.
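A classical illustration of proportional likelihoods, sketched here as an example that does not come from the notes: observing 3 successes in 12 Bernoulli(θ) trials gives a binomial likelihood if the number of trials was fixed in advance, and a negative binomial likelihood if sampling continued until the third success. The numbers and function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_likelihood(theta, n=12, k=3):
    """Likelihood when n trials were fixed in advance and k successes observed."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def negative_binomial_likelihood(theta, n=12, k=3):
    """Likelihood when sampling stopped at the k-th success, on trial n."""
    return comb(n - 1, k - 1) * theta ** k * (1 - theta) ** (n - k)


# The ratio of the two likelihoods is the same constant for every theta.
thetas = (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2))
ratios = {binomial_likelihood(t) / negative_binomial_likelihood(t) for t in thetas}
print(ratios)
```

Both likelihoods are multiples of θ^3 (1 - θ)^9, so under the Likelihood Principle the two experiments carry the same information about θ even though their sampling schemes differ.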
