Simulating from the Null Hypothesis

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
np.random.seed(42)

full_data = pd.read_csv('coffee_dataset.csv')
sample_data = full_data.sample(200)
  1. If you wanted to know whether the average height of coffee drinkers is the same as that of non-coffee drinkers, what would the null and alternative hypotheses be? Place them in the cell below, and use your answer to answer the first quiz question below.

    Since the question has no directional component, a two-sided ("not equal to") alternative is the most reasonable choice.

    $H_0: \mu_{coff} - \mu_{no} = 0$

    $H_1: \mu_{coff} - \mu_{no} \neq 0$

    $\mu_{coff}$ and $\mu_{no}$ are the population mean heights for coffee drinkers and non-coffee drinkers, respectively.

  2. If you wanted to know whether the average height of coffee drinkers is less than that of non-coffee drinkers, what would the null and alternative hypotheses be? Place them in the cell below, and use your answer to answer the second quiz question below.

    In this case the question has a direction: is the average height of coffee drinkers less than that of non-coffee drinkers? Below is one way to write the null and alternative. Because the mean for coffee drinkers is listed first, the alternative states that the difference is negative.

    $H_0: \mu_{coff} - \mu_{no} \geq 0$

    $H_1: \mu_{coff} - \mu_{no} < 0$

    $\mu_{coff}$ and $\mu_{no}$ are the population mean heights for coffee drinkers and non-coffee drinkers, respectively.

  3. For 10,000 iterations: bootstrap the sample data, calculate the mean height for coffee drinkers and for non-coffee drinkers, and calculate the difference in means for each bootstrap sample. You will want three arrays at the end of the iterations: one for each mean and one for the difference in means. Use the resulting sampling distributions to answer the third quiz question below.

nocoff_means, coff_means, diffs = [], [], []

for _ in range(10000):
    # resample the 200 rows with replacement
    bootsamp = sample_data.sample(200, replace=True)
    coff_mean = bootsamp[bootsamp['drinks_coffee'] == True]['height'].mean()
    nocoff_mean = bootsamp[bootsamp['drinks_coffee'] == False]['height'].mean()
    # append the info
    coff_means.append(coff_mean)
    nocoff_means.append(nocoff_mean)
    diffs.append(coff_mean - nocoff_mean)

np.std(nocoff_means)  # standard deviation of the sampling distribution of the non-coffee mean
np.std(coff_means)  # standard deviation of the sampling distribution of the coffee mean
np.std(diffs)  # standard deviation of the sampling distribution of the difference in means
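
The bootstrap loop above can also be written without an explicit Python loop. The following is a minimal sketch under stated assumptions, not the notebook's own method: it uses NumPy only, and the `height` and `drinks_coffee` arrays are synthetic stand-ins for the columns of `sample_data` (the group proportions, means, and spreads are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for sample_data: 200 rows, roughly 60% coffee drinkers.
# These numbers are illustrative and do not come from coffee_dataset.csv.
n = 200
drinks_coffee = rng.random(n) < 0.6
height = np.where(drinks_coffee,
                  rng.normal(68.0, 3.0, n),
                  rng.normal(66.5, 3.0, n))

# Draw all 10,000 bootstrap resamples at once. Each row of idx holds n row
# indices sampled with replacement, mirroring sample_data.sample(200, replace=True).
n_boot = 10_000
idx = rng.integers(0, n, size=(n_boot, n))
boot_height = height[idx]
boot_coffee = drinks_coffee[idx]

# Group means per resample; group sizes vary from one resample to the next.
coff_means = (boot_height * boot_coffee).sum(axis=1) / boot_coffee.sum(axis=1)
nocoff_means = (boot_height * ~boot_coffee).sum(axis=1) / (~boot_coffee).sum(axis=1)
diffs = coff_means - nocoff_means

# The bootstrap distribution of the difference is centered near the
# difference observed in the (synthetic) sample itself.
obs_diff = height[drinks_coffee].mean() - height[~drinks_coffee].mean()
print("observed difference:", obs_diff)
print("mean of bootstrap differences:", diffs.mean())
print("std of bootstrap differences:", diffs.std())
```

The index matrix trades memory for speed: 10,000 resamples of 200 rows is about two million indices, which is small, but the same approach would need batching for much larger samples.
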
plt.hist(nocoff_means, alpha=0.5);
plt.hist(coff_means, alpha=0.5);  # both look approximately normal
plt.hist(diffs, alpha=0.5);  # again approximately normal, as the central limit theorem suggests
# 10,000 draws from the sampling distribution under the null:
# centered at the null value 0, with the bootstrap standard deviation of the difference
null_vals = np.random.normal(0, np.std(diffs), 10000)
plt.hist(null_vals);  # the sampling distribution of the difference under the null
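
Once the null distribution is simulated, a common next step is to ask how extreme the observed difference in sample means is relative to it. The following is a minimal sketch and not part of the original notebook: `simulated_p_value` is a hypothetical helper, and the standard deviation `0.5` and observed difference `1.0` are placeholder numbers standing in for `np.std(diffs)` and the difference computed from `sample_data`. The two-sided branch assumes the null distribution is centered at 0, as it is here.

```python
import numpy as np

def simulated_p_value(null_vals, obs_stat, alternative="two-sided"):
    """Share of simulated null values at least as extreme as obs_stat.

    alternative="two-sided" matches H1: mu_coff - mu_no != 0 (question 1);
    alternative="less" matches H1: mu_coff - mu_no < 0 (question 2).
    """
    null_vals = np.asarray(null_vals, dtype=float)
    if alternative == "two-sided":
        # Assumes the null distribution is centered at 0.
        return float(np.mean(np.abs(null_vals) >= abs(obs_stat)))
    if alternative == "less":
        return float(np.mean(null_vals <= obs_stat))
    if alternative == "greater":
        return float(np.mean(null_vals >= obs_stat))
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")

rng = np.random.default_rng(42)

# Placeholder values: 0.5 stands in for np.std(diffs), 1.0 for the observed difference.
null_vals = rng.normal(0, 0.5, 10_000)
obs_stat = 1.0

p_two = simulated_p_value(null_vals, obs_stat, "two-sided")
p_less = simulated_p_value(null_vals, obs_stat, "less")
print("two-sided p-value:", p_two)
print("one-sided (less) p-value:", p_less)
```

With a positive observed difference, the one-sided "less" p-value is large: the data point away from the alternative in question 2, so the same observation can be strong evidence under one pair of hypotheses and none under the other.
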