screening the dataset
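two purposes: 1. check for missing data
2. check for odd and erroneous data
what counts as odd data?
consistency check: earlier and later answers that contradict each other
filler questions: what are they?
extreme data: how do we define extreme?
how to do it?
1. analyze frequencies: check for missing data and extreme data (?)
2. scatter plot: check consistency
*still to learn in SPSS: scatter plot, Select Cases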
what to do with 'bad data'?
do nothing
collect more data
assign a missing value
for non-key variables, substitute a neutral value, usually the mean
impute values (fill in based on nearby values)
delete the case
the decision mainly depends on how many good respondents there are
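A minimal Python sketch of the "substitute the mean" option, for reference. The ratings list and the helper name substitute_mean are made up for illustration, and None is assumed to mark a missing answer.

```python
from statistics import mean

def substitute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical 1-5 ratings on a non-key variable; None marks a missing answer.
ratings = [4, None, 3, 5, None, 4]
filled = substitute_mean(ratings)

# Share of missing answers, useful when deciding between the options above.
missing_share = ratings.count(None) / len(ratings)
```

Mean substitution leaves the variable's mean unchanged but shrinks its variance, which is one reason to reserve it for non-key variables.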
analyzing dataset
levels of measurement
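assigning numbers to answers: in SPSS, the Values column
in SPSS, Scale means metric data, which covers interval and ratio.
nominal: categories
ordinal: rank order
interval: ratings and the like, e.g. 1 to 10
ratio: numbers with a true zero, so ratios between values are meaningful
which statistical tests apply depends on the level of measurement of a variable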
types of statistical analyses
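1. descriptive analysis: summarize the sample; frequency analysis.
2. inferential analysis: generalize from the sample to the population; hypothesis tests and confidence intervals (possibly with some model behind them); one-sample tests.
3. differences analysis: compare the means of two or more groups; differences among means.
4. associative analysis: examine the strength and direction of a relationship; cross-tabulations and correlations.
5. predictive analysis: regressions.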
descriptive analysis
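summarize data: summarize the sample
HOW to summarize (and what to summarize)? (in general: do these data make sense?)
- the usual set of descriptive statistics
1. location: mode, median, mean
2. variability: (interquartile) range, variance, standard deviation (why both? the standard deviation is in the same units as the variable, so it is easier to read than the variance), coefficient of variation = standard deviation / mean
3. shape: skewness, kurtosis
*note: which descriptive statistics are meaningful depends on the level of measurement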
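The location / variability / shape measures listed above can be sketched in Python roughly as follows. The household sizes are hypothetical, and the skewness and kurtosis here are the plain moment-based versions; statistical packages often apply small-sample corrections, so their output may differ slightly.

```python
import statistics as st

def describe(xs):
    """Location, variability and shape for a list of metric values."""
    n = len(xs)
    m = st.mean(xs)
    sd = st.stdev(xs)                   # sample standard deviation (n - 1)
    q1, _, q3 = st.quantiles(xs, n=4)   # quartile cut points
    # Central moments (divided by n) used for the shape measures.
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {
        "mode": st.mode(xs),
        "median": st.median(xs),
        "mean": m,
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "variance": st.variance(xs),    # sample variance (n - 1)
        "sd": sd,
        "cv": sd / m,                   # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }

# Hypothetical household sizes (ratio level, so all measures are meaningful).
sizes = [1, 2, 2, 3, 4, 6]
summary = describe(sizes)
```

For a nominal variable only the mode would make sense, and for an ordinal one the mode and median, which is the point of the note on levels of measurement.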
adjusting data
re-specifying variables (what does this mean?)
transforming scales: standardizing to z-scores
weighting cases / respondents (not used often; what does it mean?): to account for representativeness.
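A small sketch of the z-score standardization mentioned under transforming scales: z = (x - mean) / standard deviation. The scores are hypothetical, and the sample standard deviation (n - 1) is assumed.

```python
from statistics import mean, stdev

def z_scores(xs):
    """Standardize values to mean 0 and standard deviation 1."""
    m = mean(xs)
    s = stdev(xs)  # sample standard deviation (n - 1), an assumption here
    if s == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - m) / s for x in xs]

# Hypothetical scores on some scale.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)
```

After standardizing, variables measured on different scales can be compared directly, because each value is expressed in standard deviations from its own mean.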
hypothesis testing
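1. two-sided tests (equal or not equal)
Ho: the parameter (mean, proportion) of the variable is equal to some value
H1: the parameter of the variable is different from that value
2. one-sided tests (greater than / less than)
Ho: ≥ or ≤
H1: < or >
the result can be read in two ways: the test statistic or the p-value. (the larger the test statistic in absolute value, the smaller the p-value, and the weaker the support for Ho) see figure
so: test statistic > critical value → reject Ho
p-value < 0.05 → reject Ho
in SPSS, the p-value is shown as "Sig."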
p≤0.05,Ho is rejected → the parameter is significantly different from xx.
0.05<p≤0.1,Ho is rejected but marginally → the parameter is marginally significantly different from xx.
p >0.1, Ho is not rejected → the parameter is not statistically different from xx.
test statistic
test statistic > critical value, Ho is rejected
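The decision rules above, written out as a Python sketch. The function names are made up for illustration, and the p-value helper assumes a z-type test statistic (standard normal under Ho); other tests use other distributions.

```python
from math import erfc, sqrt

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic (standard normal under Ho)."""
    return erfc(abs(z) / sqrt(2))

def decide(p):
    """Apply the three p-value bands from the notes."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if p <= 0.05:
        return "Ho rejected: significantly different"
    if p <= 0.10:
        return "Ho rejected marginally: marginally significantly different"
    return "Ho not rejected: not statistically different"

# A larger test statistic gives a smaller p-value.
p_small_stat = two_sided_p_from_z(1.0)
p_large_stat = two_sided_p_from_z(2.5)

verdicts = [decide(p) for p in (0.03, 0.08, 0.40)]
```

Note that z = 1.96 gives a two-sided p of roughly 0.05, which is why the two rules (statistic versus critical value, p versus 0.05) lead to the same decision.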
diagram 'when to use which test?'
(figure)
how to use this chart? 3 questions:
1. what is the dependent variable?
2.what is the measurement level of the dependent variable?
3.what and how many samples does the hypothesis involve?
-one sample: compare the group's parameter with a given value
-independent samples: compare the parameters of two groups, e.g. men/women, branded/unbranded
-related samples: compare the responses of the same individuals with each other, i.e. the same sample answering different questions (?)
inferential analysis: one-sample tests. representativeness
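infer whether the sample is representative by comparing it with a given value
Ho: mean in the population where the sample came from = 2.28
first (required step): DV = household size, DV measurement = ratio, sample = one sample
so (look it up in the chart) use the one-sample t-test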
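A by-hand sketch of the one-sample t-test for the household-size example: t = (sample mean - 2.28) / (sd / sqrt(n)), with n - 1 degrees of freedom. The six household sizes are invented for illustration; 2.571 is the two-sided 5% critical value of the t distribution for 5 degrees of freedom and has to be looked up again for any other sample size.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(xs, mu0):
    """Return the t statistic and degrees of freedom for Ho: mean = mu0."""
    n = len(xs)
    standard_error = stdev(xs) / sqrt(n)
    t = (mean(xs) - mu0) / standard_error
    return t, n - 1

# Hypothetical household sizes from the sample.
household_sizes = [1, 2, 2, 3, 4, 6]
t_stat, df = one_sample_t(household_sizes, mu0=2.28)

# Two-sided 5% critical value for df = 5 (from a t table).
CRITICAL_T_DF5 = 2.571
reject_h0 = abs(t_stat) > CRITICAL_T_DF5
```

With these made-up numbers Ho is not rejected, i.e. the sample mean is not statistically different from 2.28, which would support representativeness on this variable.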
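eg2: test whether the housing distribution in the sample matches the population statistics
first: DV = sample household proportion, DV measurement = ordinal, sample = one sample
so use the one-sample Kolmogorov-Smirnov test (by hand or in Excel)
compute the absolute difference between the cumulative percentage in the total population and the observed cumulative % in the sample
test statistic = the largest of these differences → K = xx
critical value at 5% = 1.36 / sqrt(sample size) = aa
K > aa → Ho is rejected: significantly different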
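The by-hand Kolmogorov-Smirnov steps above can be sketched in Python like this. The category counts and population proportions are hypothetical; the 1.36 / sqrt(n) critical value is the one given in the notes.

```python
from itertools import accumulate
from math import sqrt

def ks_one_sample(observed_counts, population_props):
    """One-sample K-S on ordered categories: K, critical value, decision."""
    n = sum(observed_counts)
    observed_cum = list(accumulate(c / n for c in observed_counts))
    population_cum = list(accumulate(population_props))
    # Test statistic: largest absolute gap between the cumulative proportions.
    k = max(abs(o - p) for o, p in zip(observed_cum, population_cum))
    critical = 1.36 / sqrt(n)  # 5% level, as in the notes
    return k, critical, k > critical

# Hypothetical sample counts per ordered category and population proportions.
sample_counts = [30, 40, 30]
population = [0.20, 0.50, 0.30]
K, critical_value, reject_h0 = ks_one_sample(sample_counts, population)
```

Here K = 0.10 against a critical value of 0.136, so Ho is not rejected for this made-up sample.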
testing the proportion of a dichotomous variable (yes/no)
use the Z-test (by hand)
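The notes do not give the formula, so this sketch assumes the standard one-sample z-test for a proportion: z = (p_hat - p0) / sqrt(p0 (1 - p0) / n). The counts and p0 are hypothetical; 1.96 is the two-sided 5% critical value of the standard normal distribution.

```python
from math import erfc, sqrt

def proportion_z_test(successes, n, p0):
    """z statistic and two-sided p-value for Ho: population proportion = p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under Ho
    z = (p_hat - p0) / standard_error
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical: 60 of 100 respondents answered "yes"; Ho says 50%.
z_stat, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
reject_h0 = abs(z_stat) > 1.96
```

Both readings agree here: z is about 2.0, above 1.96, and the p-value is just under 0.05.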
differences analysis: two or more independent or related samples
how to use the chart: see OneNote
associative analysis: correlations
relationships between variables
when there are 2 variables
both are metric (interval/ratio) with a linear relationship: use the Pearson correlation coefficient
one or both are ordinal: use the Spearman rank correlation coefficient
r lies in [-1, 1]
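To see the difference between the two coefficients, here is a plain-Python sketch. Spearman is computed as Pearson on the ranks, with tied values sharing their average rank; the data are hypothetical (y grows with x, but not linearly).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1; ties get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Spearman is exactly 1 here because the ordering is perfectly preserved, while Pearson is slightly below 1 because the relationship is not a straight line.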
significant vs. substantive results.
significance depends on 1. the strength / magnitude of the difference or correlation, and 2. the sample size
significance is the first step; relevance is a subjective judgment
a significant difference or correlation does not by itself imply a substantive or relevant one
magnitude of the difference = % change in the response of one group from that of the comparison group
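The magnitude formula as a tiny Python sketch. The two group means are hypothetical, and the helper name is made up.

```python
def percent_change(group_mean, comparison_mean):
    """% change of one group's response relative to the comparison group."""
    if comparison_mean == 0:
        raise ValueError("comparison group mean must be non-zero")
    return (group_mean - comparison_mean) / comparison_mean * 100

# Hypothetical mean ratings: branded group 4.4, unbranded (comparison) 4.0.
magnitude = percent_change(4.4, 4.0)
```

A difference like this 10% can be statistically significant in a large sample and still be too small to matter in practice, which is the judgment call the notes describe.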