2021-04-24

•Educational and Psychological Measurement

•Zhehan Jiang

•Peking University


•Course No. 06716070

•CTT model and reliability

•EPM

•CTT & Reliability

•Formal definition of the CTT model, assumptions, results.

•CTT definition of reliability

•Properties of composite scores

•Standard Error of Measurement

•Debates: soon!

•Should CONSEQUENCES be considered “part” of validity?

–Example: using student test scores to evaluate teacher effectiveness

•Can test scores be sufficiently valid without being reliable?

–Example: A driver’s test includes a reliable written portion and an unreliable performance task. Why?

•CTT “Model”

•Test scores are random variables “sampled” from a hypothetical population

•$X = T + E$

•Definition of $E(X)$

•True score for an examinee:

$T_j = E(X_j) = \mu_{X_j}$
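A minimal Python sketch of this definition, assuming normally distributed errors and a hypothetical examinee with $T_j = 70$ and an error SD of 5 (both values are made up for illustration). As the number of simulated administrations grows, the mean observed score approaches the true score and the mean of the errors $X - T$ approaches zero.

```python
import random
import statistics

# Hypothetical values chosen only for illustration
TRUE_SCORE = 70.0  # T_j for one examinee
ERROR_SD = 5.0     # SD of the random error E (normality is an assumption)


def observed_scores(true_score, error_sd, n_admins, seed=0):
    """Simulate n_admins independent administrations of X = T + E."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_admins)]


for n in (5, 100, 100_000):
    xs = observed_scores(TRUE_SCORE, ERROR_SD, n)
    mean_error = statistics.mean([x - TRUE_SCORE for x in xs])
    print(f"{n:>7} administrations: mean X = {statistics.mean(xs):6.2f}, "
          f"mean error = {mean_error:+.3f}")
```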

•True Score

•The true score is the mean, or expected value, of an examinee’s observed scores obtained from a large (theoretically infinite) number of repeated test administrations.

•Theoretically, every examinee has a distribution of possible observed scores… even though we usually only test once.

•Observed Scores

•What would make the observed scores change from one trial to the next? (Hint: True scores don’t change.)

–Errors are random and fluctuate.

–An examinee’s distribution of observed scores would be centered around his/her true score.

•Observed Scores

•Each examinee’s observed scores have an SD, which reflects the amount of error variability present.

•A highly reliable test would have examinees’ observed scores closely clustered around their true scores, with very little random fluctuation.
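To make the idea of clustering concrete, here is a small Python simulation under assumed values: one hypothetical examinee with $T = 70$ takes two made-up tests whose errors are normal with SDs of 2 and 8. The SD of the simulated observed scores recovers each test’s error SD, so the test with the smaller error SD keeps observed scores more tightly clustered around the true score.

```python
import random
import statistics


def spread_around_true(true_score, error_sd, n_admins=50_000, seed=1):
    """SD of simulated observed scores X = T + E for one examinee."""
    rng = random.Random(seed)
    xs = [true_score + rng.gauss(0.0, error_sd) for _ in range(n_admins)]
    return statistics.stdev(xs)


# Two hypothetical tests that differ only in their error SD
for label, error_sd in (("more reliable", 2.0), ("less reliable", 8.0)):
    sd = spread_around_true(70.0, error_sd)
    print(f"{label:>13}: SD of observed scores = {sd:.2f} (error SD = {error_sd})")
```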

•Properties of Error

•For examinee j: $E_j = X_j - T_j$

•Note that $E_j$ is a new random variable and that $E(E_j) = 0$.

–Because $E(X_j) = T_j$

•Interpretation: the average of the errors for one examinee is 0.

•Assumptions

•Reliability

•“Reliability” refers to the consistency (or reproducibility) of scores over administrations.

–Repeated over time, across parallel forms, between raters, or over tasks within an assessment

•Reliability = Repeatability

•Reliability

•One way to think about this: if examinees’ z-scores stay consistent over administrations, the test scores are reliable.

•Another way: scores are reliable to the extent that they are free of randomness or error.

•Important Note

•As with validity evidence, remember that tests are not reliable, per se; rather, test scores are reliable.

•A test may be administered to a very different population of examinees and produce very different results…

•How to Quantify Reliability?

•We know it is desirable for scores to be relatively free from random error, and we know $X = T + E$.

•If T and X are highly related, it implies that E and X are weakly related. If X and T are perfectly related, then all observed variability is due to true scores.

•Reliability Index

•Reliability Index = correlation between Observed scores and True scores: $\rho_{XT}$
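True scores are never observed in practice, but in a simulation they are known, so the reliability index can be computed directly. The sketch below is a hypothetical setup, not course data: true scores are drawn as normal with mean 50 and SD 9, and errors as normal with SD 4, independent of T. Under those assumptions $\rho_{XT}$ should be close to $\sigma_T / \sigma_X = 9/\sqrt{97} \approx 0.914$. The `pearson` helper is written from scratch for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_population(n, true_sd, error_sd, seed=2):
    """Draw true scores T and observed scores X = T + E for n examinees."""
    rng = random.Random(seed)
    t = [rng.gauss(50.0, true_sd) for _ in range(n)]
    x = [ti + rng.gauss(0.0, error_sd) for ti in t]
    return t, x


TRUE_SD, ERROR_SD = 9.0, 4.0  # assumed values
T, X = simulate_population(20_000, TRUE_SD, ERROR_SD)
expected = TRUE_SD / math.sqrt(TRUE_SD ** 2 + ERROR_SD ** 2)
print(f"simulated r_XT = {pearson(X, T):.3f}, sigma_T / sigma_X = {expected:.3f}")
```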

•Estimating Reliability

•The reliability index is an important result, but it isn’t practical without further assumptions being made.

•We can’t observe True scores, only Observed scores, so how could we ever estimate the correlation between the two?

•Parallel Forms

•The CTT estimation of reliability depends on the concept of parallel forms. Two forms are parallel if:

–Each examinee has the same true score on both forms of the test: $T_{j1} = T_{j2}$

–Error variances for both forms are equal: $\sigma^2(E_1) = \sigma^2(E_2)$

–Errors are uncorrelated across forms

–This assumes the same construct!
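The payoff of parallel forms can be checked by simulation. This Python sketch uses hypothetical values (true-score SD 9, normal errors with SD 4) and builds two forms that satisfy the three conditions above: the same $T_j$, equal error variances, and independently drawn errors. The correlation between the two observed scores should then land near $\rho_{XT}^2 = \sigma_T^2/\sigma_X^2 = 81/97 \approx 0.835$, which is why an observable correlation can stand in for an unobservable one.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def parallel_forms(n, true_sd, error_sd, seed=3):
    """Two strictly parallel forms: same T_j, equal error SDs, independent errors."""
    rng = random.Random(seed)
    t = [rng.gauss(50.0, true_sd) for _ in range(n)]
    x1 = [tj + rng.gauss(0.0, error_sd) for tj in t]
    x2 = [tj + rng.gauss(0.0, error_sd) for tj in t]
    return t, x1, x2


TRUE_SD, ERROR_SD = 9.0, 4.0  # assumed values
T, X1, X2 = parallel_forms(20_000, TRUE_SD, ERROR_SD)
print(f"corr(X1, X2)    = {pearson(X1, X2):.3f}")
print(f"corr(X1, T)^2   = {pearson(X1, T) ** 2:.3f}")
print(f"var(T) / var(X) = {TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2):.3f}")
```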

•Parallel Forms

•It is difficult (at best) to construct strictly parallel forms, but the concept is important because it makes reliability estimable!

•What’s important is that it’s theoretically possible to construct strictly parallel forms…

•Not-so-parallel forms

•These definitions of forms that are not strictly parallel will be especially helpful when we discuss the task of equating or linking different forms.

–Tau ($\tau$) equivalence

–Essential tau equivalence

–Congenericity (or “congeneric forms”)

•Tau ($\tau$) equivalence

•Tau ($\tau$) equivalence relaxes the assumption of equal error variances (i.e., error variances may be unequal), but keeps the assumption that true scores are equal: $T_{j1} = T_{j2}$

•Errors still uncorrelated

•Essential Tau equivalence

•Essential Tau ($\tau$) equivalence further relaxes assumptions:

–Error variances are not necessarily equal, and

–True scores across forms only differ by an additive constant: $T_{j1} = T_{j2} + c$

•Errors still uncorrelated

•Congeneric forms

•Congenericity further relaxes assumptions to allow for different scales across forms:

–Error variances are not necessarily equal, and

–True scores across forms differ by a positive linear function: $T_{j1} = d\,T_{j2} + c$, where $d > 0$

•Errors still uncorrelated

•Parallel Forms

•All that is required for CTT to work is that the concept of parallel forms is theoretically possible.

•In practice, we will only need to rely on the assumption of congenericity to deal with estimating reliability and equating multiple forms.

•Reliability Coefficient

•Correlation between observed scores across two parallel forms: $\rho_{X_iX_j}$

•Reliability Coefficient

•Simple, elegant, enduring concept:

•Coefficient vs. Index

•Rel. Coefficient = $(\text{Rel. Index})^2$

•Rel. Index = $\sqrt{\text{Rel. Coefficient}}$
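The two relations above are simple enough to encode directly. A minimal sketch follows; the function names are hypothetical, and the range check reflects the fact that the reliability coefficient is a proportion of variance.

```python
import math


def coefficient_from_index(index):
    """Reliability coefficient = (reliability index) ** 2."""
    return index ** 2


def index_from_coefficient(coefficient):
    """Reliability index = sqrt(reliability coefficient), for a coefficient in [0, 1]."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return math.sqrt(coefficient)


print(index_from_coefficient(0.81))  # about 0.9
print(coefficient_from_index(0.9))   # about 0.81
```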

•Variance Components

•As Error variance decreases…

–the ratio of True/Observed variance increases

–the reliability coefficient increases
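The trend on this slide follows from writing the reliability coefficient as $\sigma_T^2 / (\sigma_T^2 + \sigma_E^2)$, which relies on the CTT assumption that true scores and errors are uncorrelated, so that $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$. The sketch below holds a hypothetical true-score variance of 64 fixed and shrinks the error variance.

```python
def reliability(true_var, error_var):
    """Reliability coefficient = var(T) / var(X), using var(X) = var(T) + var(E)."""
    return true_var / (true_var + error_var)


TRUE_VAR = 64.0  # hypothetical true-score variance
for error_var in (64.0, 16.0, 4.0, 0.0):
    rel = reliability(TRUE_VAR, error_var)
    print(f"error variance {error_var:5.1f} -> reliability {rel:.3f}")
```

With these numbers the coefficient rises from 0.500 to 0.800, then 0.941, and reaches 1.000 when the error variance is zero.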

•Interpretations

•Reliability Coefficient = proportion of Observed score variance due to True score variability.

•Reliability Index = correlation between Observed and True scores.

•Importance? Reliability can now be estimated with observable data!

•Importance

•Through the reliability coefficient, we can determine how much of the variability in observed scores is due to differences among TRUE scores (the thing we’re trying to measure!).

•The higher the value (bounded by zero and one), the less the scores are influenced by random errors.

•Example

•Let’s say $\rho_{X_iX_j} = 0.81$. Then 81% of the Observed score variance is due to True score variance, and $\sigma^2(T) = 0.81\,\sigma^2(X)$.

•If $\sigma(X) = 4$, we can derive: $\sigma(T) = \sqrt{0.81 \times 16} = 3.6$

•And the correlation between X and T: $\rho_{XT} = \sqrt{0.81} = 0.9$
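The arithmetic in this example can be reproduced in a few lines of Python; the inputs are the example’s own values ($\rho_{X_iX_j} = 0.81$, $\sigma(X) = 4$).

```python
import math

rel = 0.81   # reliability coefficient from the example
sd_x = 4.0   # observed-score SD from the example

var_t = rel * sd_x ** 2   # sigma^2(T) = 0.81 * 16
sd_t = math.sqrt(var_t)   # sigma(T)
r_xt = math.sqrt(rel)     # reliability index, rho_XT
print(f"var(T) = {var_t:.2f}, sd(T) = {sd_t:.2f}, r_XT = {r_xt:.2f}")
# prints: var(T) = 12.96, sd(T) = 3.60, r_XT = 0.90
```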

•Standard Error of Measurement

•So, if we have measurements across parallel forms, we can estimate the proportion of True score to Observed score variance… so what?

•If we know the proportion of True score variance, we also know the proportion of Error variance: it is 1 minus the reliability coefficient.

•Std. Error of Measurement

•By knowing the Error variance, we can state how confident we are that an examinee’s test score accurately reflects his/her true ability (i.e., the True score).

•Influence of Error

•We can’t know how much of any one examinee’s score is due to error, but we can estimate the expected amount of variability for observed scores around each examinee’s true score… THINK: “confidence interval”

•True Score

•Remember, True score is defined as the mean, or expected value, of an examinee’s Observed scores from a large number of repeated test administrations.

•Theoretically, every examinee has a distribution of possible observed scores, even though we only observe one (or two).

•Std. Error of Measurement

•We can’t actually compute the standard deviation of possible observed scores for each examinee, but we can estimate the average error standard deviation…

•This is what we call the Standard Error of Measurement (SEM).

–In a couple of weeks we will talk about conditional SEMs.
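The usual CTT estimate of the SEM follows from the reliability coefficient: since $\sigma_E^2 = \sigma_X^2(1 - \rho_{X_iX_j})$, we get $\sigma_E = \sigma_X\sqrt{1 - \rho_{X_iX_j}}$. A minimal Python sketch, reusing the values from the example slide ($\sigma(X) = 4$, reliability 0.81); the function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd_x, reliability):
    """SEM = sd_x * sqrt(1 - reliability), for a reliability coefficient in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_x * math.sqrt(1.0 - reliability)


sem = standard_error_of_measurement(4.0, 0.81)
print(f"SEM = {sem:.3f}")  # sqrt(16 - 12.96) = sqrt(3.04), about 1.744
```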

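•Std. Error of Measurement

•Confidence Intervals

•Assuming normally distributed errors (common in regression):

•$X \pm 1\sigma_E \rightarrow$ 68% CI

–On repeated testing, about 68% of the intervals built this way would contain the true score T (equivalently, X falls within $T \pm 1\sigma_E$ about 68% of the time)

•$X \pm 1.96\sigma_E \rightarrow$ 95% CI

–On repeated testing, about 95% of the intervals built this way would contain T (equivalently, X falls within $T \pm 1.96\sigma_E$ about 95% of the time)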
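A short Python sketch of these intervals, under the same normal-error assumption. The observed score of 30 is hypothetical, the SEM reuses the example slide’s values ($4\sqrt{1 - 0.81} \approx 1.744$), and the coverage check simulates many administrations for a made-up examinee to confirm that roughly 68% and 95% of the intervals contain the true score.

```python
import math
import random


def score_interval(x, sem, z):
    """Return (lower, upper) for X +/- z * SEM."""
    return x - z * sem, x + z * sem


def coverage(true_score, sem, z, n=100_000, seed=4):
    """Fraction of simulated intervals X +/- z*SEM that contain the true score."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = true_score + rng.gauss(0.0, sem)  # normal errors are an assumption
        lower, upper = score_interval(x, sem, z)
        if lower <= true_score <= upper:
            hits += 1
    return hits / n


SEM = 4.0 * math.sqrt(1 - 0.81)
for label, z in (("68%", 1.0), ("95%", 1.96)):
    lower, upper = score_interval(30.0, SEM, z)
    print(f"{label} CI for X = 30: [{lower:.2f}, {upper:.2f}], "
          f"simulated coverage = {coverage(30.0, SEM, z):.3f}")
```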

•Statistical Analogy

•Reliability Coefficient: $\rho_{X_iX_j}$ is just like $R^2$ from regression

•Likewise, the standard error of measurement is just like the standard error of estimate.

•Soon we’ll generalize this to predict T from X.
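The analogy can be checked numerically when true scores are known, as they are in a simulation. This sketch uses hypothetical values (true-score SD 9, normal errors with SD 4) and regresses X on T by ordinary least squares; under CTT the $R^2$ should approach the reliability $81/97 \approx 0.835$ and the standard error of estimate should approach the error SD, i.e., an SEM of 4. The regression helper is written from scratch for illustration rather than taken from a statistics library.

```python
import math
import random


def simple_regression(x, y):
    """OLS of y on x: returns slope, intercept, R^2, and standard error of estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - sse / syy
    see = math.sqrt(sse / (n - 2))  # standard error of estimate
    return slope, intercept, r2, see


rng = random.Random(5)
TRUE_SD, ERROR_SD = 9.0, 4.0  # assumed values
T = [rng.gauss(50.0, TRUE_SD) for _ in range(20_000)]
X = [t + rng.gauss(0.0, ERROR_SD) for t in T]

slope, intercept, r2, see = simple_regression(T, X)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"R^2 = {r2:.3f} (reliability = {TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2):.3f})")
print(f"standard error of estimate = {see:.3f} (SEM = {ERROR_SD})")
```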

•Typical Reliability Data

•Correlation between scores from the same form administered to the same group of examinees on two separate occasions (coefficient of stability).

–“Test-retest Reliability”

•Correlation between two different forms administered to the same examinees on one occasion (coefficient of equivalence).

–“Parallel-forms Reliability”

•Typical Reliability Data

•When examinees respond to several parallel components within a test, the correlation among their component scores is estimated by the coefficient of internal consistency.

–Next week’s topic is Internal Consistency: the reliability of composite scores
