Sources: 《R 语言核心技术手册》 and the R documentation.
The data are mostly made up or taken from the R documentation.
Continuous data
Tests based on the normal distribution
Tests of means
t.test(1:10, 10:20)
>
> Welch Two Sample t-test
>
> data: 1:10 and 10:20
> t = -7, df = 19, p-value = 2e-06
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -12.4 -6.6
> sample estimates:
> mean of x mean of y
> 5.5 15.0
Paired t-test:
t.test(rnorm(10), rnorm(10, mean = 1), paired = TRUE)
>
> Paired t-test
>
> data: rnorm(10) and rnorm(10, mean = 1)
> t = -5, df = 9, p-value = 7e-04
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -2.541 -0.962
> sample estimates:
> mean of the differences
> -1.75
Using a formula:
df <- data.frame(
  value = c(rnorm(10), rnorm(10, mean = 1)),
  group = c(rep("control", 10), rep("test", 10))
)
t.test(value ~ group, data = df)
>
> Welch Two Sample t-test
>
> data: value by group
> t = -0.4, df = 15, p-value = 0.7
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -1.62 1.08
> sample estimates:
> mean in group control mean in group test
> 0.532 0.802
Assuming equal variances:
t.test(value ~ group, data = df, var.equal = TRUE)
>
> Two Sample t-test
>
> data: value by group
> t = -0.4, df = 18, p-value = 0.7
> alternative hypothesis: true difference in means is not equal to 0
> 95 percent confidence interval:
> -1.60 1.06
> sample estimates:
> mean in group control mean in group test
> 0.532 0.802
See ?t.test for more.
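The examples above draw fresh random numbers on every run, so the printed values change each time. A minimal sketch, assuming an arbitrary seed and the hypothetical names x, y, res, and pooled, shows how to make a run reproducible and how to pull individual values out of the returned object:

```r
set.seed(42)  # arbitrary seed, used only to make the run reproducible
x <- rnorm(10)
y <- rnorm(10, mean = 1)

res <- t.test(x, y)                       # Welch test (default)
pooled <- t.test(x, y, var.equal = TRUE)  # pooled-variance test

res$statistic     # t statistic
res$p.value       # p-value
res$conf.int      # confidence interval for the difference in means
pooled$parameter  # degrees of freedom: n1 + n2 - 2 for the pooled test
```

The result is a list of class "htest", so the same accessors work for most of the other test functions in this article.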
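Comparing the variances of two populations
The example above assumed equal variances; a test can check that assumption. To compare the variances of two normally distributed populations, use the F test:
var.test(value ~ group, data = df)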
>
> F test to compare two variances
>
> data: value by group
> F = 0.4, num df = 9, denom df = 9, p-value = 0.2
> alternative hypothesis: true ratio of variances is not equal to 1
> 95 percent confidence interval:
> 0.103 1.671
> sample estimates:
> ratio of variances
> 0.415
Use the Bartlett test to check whether the variances of the groups (samples) are equal.
bartlett.test(value ~ group, data = df)
>
> Bartlett test of homogeneity of variances
>
> data: value by group
> Bartlett's K-squared = 2, df = 1, p-value = 0.2
Comparing means across multiple groups
To compare means across more than two groups, use analysis of variance (ANOVA).
aov(wt ~ factor(cyl), data = mtcars)
> Call:
> aov(formula = wt ~ factor(cyl), data = mtcars)
>
> Terms:
> factor(cyl) Residuals
> Sum of Squares 18.2 11.5
> Deg. of Freedom 2 29
>
> Residual standard error: 0.63
> Estimated effects may be unbalanced
To see more detail:
model.tables(aov(wt ~ factor(cyl), data = mtcars))
> Tables of effects
>
> factor(cyl)
> 4 6 8
> -0.9315 -0.1001 0.782
> rep 11.0000 7.0000 14.000
It is usually more convenient to fit a linear model with lm() first and then extract the ANOVA information with anova().
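The lm() plus anova() route can be sketched as follows, reusing the mtcars model from the aov() call above; the names fit and tab are illustrative only:

```r
# Fit the same one-way model as the aov() call, but through lm()
fit <- lm(wt ~ factor(cyl), data = mtcars)

# anova() returns a data frame with Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab <- anova(fit)
tab

# Individual columns can be extracted by name
tab[["F value"]][1]
tab[["Pr(>F)"]][1]
```

Unlike the bare aov() printout, this table includes the F statistic and its p-value directly.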
ANOVA assumes that the groups have equal variances. If the variances are known (or suspected) to differ, use oneway.test() instead.
oneway.test(wt ~ cyl, data = mtcars)
>
> One-way analysis of means (not assuming equal variances)
>
> data: wt and cyl
> F = 20, num df = 2, denom df = 19, p-value = 2e-05
This is analogous to t.test with var.equal = FALSE (both methods are due to Welch).
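To see the connection with classical ANOVA, oneway.test() can also be run with var.equal = TRUE. A small sketch (the names classic and f_lm are illustrative) checks that it then reproduces the F statistic of the linear-model ANOVA:

```r
# Equal-variance version of oneway.test()
classic <- oneway.test(wt ~ cyl, data = mtcars, var.equal = TRUE)

# F statistic from the usual ANOVA table
f_lm <- anova(lm(wt ~ factor(cyl), data = mtcars))[["F value"]][1]

# The two F statistics agree up to numerical tolerance
all.equal(unname(classic$statistic), f_lm)
```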
Pairwise t-tests across multiple groups
pairwise.t.test(mtcars$wt, mtcars$cyl)
>
> Pairwise comparisons using t tests with pooled SD
>
> data: mtcars$wt and mtcars$cyl
>
> 4 6
> 6 0.01 -
> 8 6e-07 0.01
>
> P value adjustment method: holm
The p-value adjustment method can be customized.
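As a brief sketch of changing the adjustment, the p.adjust.method argument accepts any of the names listed in p.adjust.methods; the object names res_bonf and res_none are illustrative:

```r
# Bonferroni correction instead of the default Holm method
res_bonf <- pairwise.t.test(mtcars$wt, mtcars$cyl,
                            p.adjust.method = "bonferroni")

# No correction at all, for comparison
res_none <- pairwise.t.test(mtcars$wt, mtcars$cyl,
                            p.adjust.method = "none")

res_bonf$p.value   # matrix of adjusted p-values
p.adjust.methods   # all available method names
```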
Normality tests
Using the Shapiro-Wilk test:
shapiro.test(rnorm(30))
>
> Shapiro-Wilk normality test
>
> data: rnorm(30)
> W = 1, p-value = 1
A QQ plot is a useful visual check.
qqnorm(rnorm(30))
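A QQ plot is easier to read with a reference line. The sketch below, with an arbitrary seed and the illustrative name x, stores the sample first so that the test and the plot refer to the same data:

```r
set.seed(1)      # arbitrary seed for reproducibility
x <- rnorm(30)

shapiro.test(x)  # formal test on the stored sample
qqnorm(x)        # sample quantiles against theoretical normal quantiles
qqline(x)        # reference line through the first and third quartiles
```

Points that stay close to the line are consistent with normality.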
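Testing whether data come from a given distribution
Use the Kolmogorov-Smirnov test to check whether a vector comes from a specified probability distribution (not limited to the normal distribution).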
ks.test(rnorm(10), pnorm)
>
> One-sample Kolmogorov-Smirnov test
>
> data: rnorm(10)
> D = 0.2, p-value = 0.7
> alternative hypothesis: two-sided
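The first argument is the data to be tested. The second argument specifies the reference distribution and can be a numeric vector, a character string naming a cumulative distribution function, or the distribution function itself.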
ks.test(rnorm(10), "pnorm")
>
> One-sample Kolmogorov-Smirnov test
>
> data: rnorm(10)
> D = 0.4, p-value = 0.09
> alternative hypothesis: two-sided
ks.test(rpois(10, lambda = 1), "pnorm")
> Warning in ks.test(rpois(10, lambda = 1), "pnorm"): ties should not be present
> for the Kolmogorov-Smirnov test
>
> One-sample Kolmogorov-Smirnov test
>
> data: rpois(10, lambda = 1)
> D = 0.5, p-value = 0.01
> alternative hypothesis: two-sided
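When the second argument names a distribution function, further arguments to ks.test() are passed on as that distribution's parameters. A minimal sketch with simulated data (the seed and the names x, ks_std, and ks_fit are assumptions for illustration):

```r
set.seed(1)                       # arbitrary seed
x <- rnorm(50, mean = 5, sd = 2)

# Compared against the standard normal N(0, 1): a poor fit
ks_std <- ks.test(x, "pnorm")

# Compared against N(5, 2^2): parameters are passed through "..."
ks_fit <- ks.test(x, "pnorm", mean = 5, sd = 2)

ks_std$p.value
ks_fit$p.value
```

Without the extra arguments, "pnorm" means the standard normal, so a sample with a different mean or standard deviation is rejected even though it is normally distributed.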
Testing whether two vectors come from the same distribution
The same function is used.
ks.test(rnorm(20), rnorm(30))
>
> Two-sample Kolmogorov-Smirnov test
>
> data: rnorm(20) and rnorm(30)
> D = 0.1, p-value = 1
> alternative hypothesis: two-sided
Correlation tests
Use the cor.test() function.
cor.test(mtcars$mpg, mtcars$wt)
>
> Pearson's product-moment correlation
>
> data: mtcars$mpg and mtcars$wt
> t = -10, df = 30, p-value = 1e-10
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
> -0.934 -0.744
> sample estimates:
> cor
> -0.868
Three methods are available (Pearson, Kendall, and Spearman); see the documentation of the method argument for details.
cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)
>
> Spearman's rank correlation rho
>
> data: mtcars$mpg and mtcars$wt
> S = 10292, p-value = 1e-11
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
> rho
> -0.886
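For completeness, the third method, Kendall's tau, can be requested the same way. A short sketch on the same mtcars columns (res_k is an illustrative name):

```r
# Kendall's rank correlation; exact = FALSE avoids the exact test,
# which is not available when the data contain ties
res_k <- cor.test(mtcars$mpg, mtcars$wt, method = "kendall", exact = FALSE)

res_k$estimate  # the estimate is named "tau"
res_k$p.value
```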
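Distribution-free tests
Tests of means (location)
The Wilcoxon test is the non-parametric counterpart of the t-test. By default it performs the rank sum test.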
wilcox.test(1:10, 10:20)
> Warning in wilcox.test.default(1:10, 10:20): cannot compute exact p-value with
> ties
>
> Wilcoxon rank sum test with continuity correction
>
> data: 1:10 and 10:20
> W = 0.5, p-value = 1e-04
> alternative hypothesis: true location shift is not equal to 0
Setting paired = TRUE gives the signed rank test.
wilcox.test(1:10, 10:19, paired = TRUE)
> Warning in wilcox.test.default(1:10, 10:19, paired = TRUE): cannot compute exact
> p-value with ties
>
> Wilcoxon signed rank test with continuity correction
>
> data: 1:10 and 10:19
> V = 0, p-value = 0.002
> alternative hypothesis: true location shift is not equal to 0
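The signed rank test reported above uses both the signs and the ranks of the paired differences. A plain sign test, which uses only the signs, has no dedicated function in base R; one common way to sketch it is with binom.test(). The names x, y, d, and sign_res are illustrative:

```r
x <- 1:10
y <- 10:19
d <- x - y  # paired differences (all equal to -9 here)

# Sign test: number of positive differences out of all non-zero ones,
# tested against a success probability of 0.5
sign_res <- binom.test(sum(d > 0), sum(d != 0), p = 0.5)
sign_res$p.value
```

With all ten differences negative, the two-sided p-value is 2 * 0.5^10.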
Comparing multiple groups
To compare more than two groups, use the Kruskal-Wallis rank sum test.
kruskal.test(wt ~ factor(cyl), data = mtcars)
>
> Kruskal-Wallis rank sum test
>
> data: wt by factor(cyl)
> Kruskal-Wallis chi-squared = 23, df = 2, p-value = 1e-05
Tests of variance
Use the Fligner-Killeen (median) test to compare variances across groups.
fligner.test(wt ~ cyl, data = mtcars)
>
> Fligner-Killeen test of homogeneity of variances
>
> data: wt by cyl
> Fligner-Killeen:med chi-squared = 0.5, df = 2, p-value = 0.8
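Differences in scale parameters
R provides several tests for differences in scale parameters. The scale parameter of a distribution determines its spread, as the standard deviation does for a normal distribution. The Ansari-Bradley test (ansari.test()) checks for a difference in scale between two samples.
Mood's two-sample test can be used as well.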
mood.test(ramsay, jung.parekh)
>
> Mood two-sample test of scale
>
> data: ramsay and jung.parekh
> Z = 1, p-value = 0.3
> alternative hypothesis: two.sided
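The Ansari-Bradley test is mentioned without code, and ramsay and jung.parekh are not defined in this article. As a self-contained sketch, both scale tests can be tried on simulated data; the seed, sample sizes, and the names narrow, wide, ab, and md are assumptions for illustration only:

```r
set.seed(123)                           # arbitrary seed
narrow <- rnorm(60, mean = 0, sd = 1)   # small spread
wide   <- rnorm(60, mean = 0, sd = 4)   # same centre, larger spread

ab <- ansari.test(narrow, wide)  # Ansari-Bradley test of scale
md <- mood.test(narrow, wide)    # Mood test of scale

ab$p.value
md$p.value
```

Both tests assume the two samples share a common location, which the simulation guarantees by construction.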
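Discrete data
Proportion tests
Use prop.test() to test whether the probability of an event differs between groups or, with a single sample as below, from a hypothesized value.
heads <- rbinom(1, size = 100, prob = .5)
prop.test(heads, 100) # continuity correction TRUE by default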
>
> 1-sample proportions test with continuity correction
>
> data: heads out of 100, null probability 0.5
> X-squared = 0.2, df = 1, p-value = 0.6
> alternative hypothesis: true p is not equal to 0.5
> 95 percent confidence interval:
> 0.370 0.572
> sample estimates:
> p
> 0.47
prop.test(heads, 100, correct = FALSE)
>
> 1-sample proportions test without continuity correction
>
> data: heads out of 100, null probability 0.5
> X-squared = 0.4, df = 1, p-value = 0.5
> alternative hypothesis: true p is not equal to 0.5
> 95 percent confidence interval:
> 0.375 0.567
> sample estimates:
> p
> 0.47
A hypothesized probability can be supplied.
prop.test(heads, 100, p = 0.3, correct = FALSE)
>
> 1-sample proportions test without continuity correction
>
> data: heads out of 100, null probability 0.3
> X-squared = 14, df = 1, p-value = 2e-04
> alternative hypothesis: true p is not equal to 0.3
> 95 percent confidence interval:
> 0.375 0.567
> sample estimates:
> p
> 0.47
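The calls above are all one-sample tests. Comparing two groups works by passing vectors of successes and trials. A minimal sketch with made-up counts (successes, trials, and two are hypothetical names and values):

```r
successes <- c(45, 60)    # hypothetical successes in group 1 and group 2
trials    <- c(100, 100)  # hypothetical number of trials per group

# Two-sample test of equal proportions
two <- prop.test(successes, trials)

two$estimate  # observed proportion in each group
two$p.value
two$conf.int  # confidence interval for the difference in proportions
```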
Binomial test
binom.test(c(682, 243), p = 3/4)
>
> Exact binomial test
>
> data: c(682, 243)
> number of successes = 682, number of trials = 925, p-value = 0.4
> alternative hypothesis: true probability of success is not equal to 0.75
> 95 percent confidence interval:
> 0.708 0.765
> sample estimates:
> probability of success
> 0.737
binom.test(682, 682 + 243, p = 3/4) # The same
>
> Exact binomial test
>
> data: 682 and 682 + 243
> number of successes = 682, number of trials = 925, p-value = 0.4
> alternative hypothesis: true probability of success is not equal to 0.75
> 95 percent confidence interval:
> 0.708 0.765
> sample estimates:
> probability of success
> 0.737
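Note that the argument p here is not a p-value: it is the hypothesized probability of success against which the observed success rate is compared.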
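Contingency table tests
These tests determine whether two categorical variables are associated. For small contingency tables, Fisher's exact test (fisher.test()) gives good results. There is a well-known tea-tasting story behind Fisher's test.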
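The TeaTasting table is not defined in this article. The sketch below uses the values from the tea-tasting example in R's ?fisher.test help page, on the assumption that this is the table meant here; ft is an illustrative name:

```r
# 2 x 2 table: the taster's guesses against the truth
TeaTasting <- matrix(c(3, 1, 1, 3),
                     nrow = 2,
                     dimnames = list(Guess = c("Milk", "Tea"),
                                     Truth = c("Milk", "Tea")))

# One-sided Fisher exact test: is the association positive?
ft <- fisher.test(TeaTasting, alternative = "greater")
ft$p.value
```

The one-sided p-value is 17/70, so with only eight cups there is no strong evidence that the taster can tell the difference.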
For large contingency tables Fisher's test is computationally expensive, and the chi-squared test can be used instead.
chisq.test(TeaTasting)
> Warning in chisq.test(TeaTasting): Chi-squared approximation may be incorrect
>
> Pearson's Chi-squared test with Yates' continuity correction
>
> data: TeaTasting
> X-squared = 0.5, df = 1, p-value = 0.5
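For the joint effect of three variables, use the Cochran-Mantel-Haenszel test (mantelhaen.test()).
Use McNemar's chi-squared test (mcnemar.test()) to test the symmetry of a two-dimensional contingency table.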
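Neither test is shown with code here. As a rough sketch, the built-in UCBAdmissions table (2 x 2 x 6) serves for the Cochran-Mantel-Haenszel test, and a made-up before/after table for McNemar's test; before_after, cmh, and mc are hypothetical names and the counts are invented:

```r
# Cochran-Mantel-Haenszel: admission by gender, stratified by department
cmh <- mantelhaen.test(UCBAdmissions)
cmh$p.value

# McNemar: paired yes/no answers before and after (hypothetical counts)
before_after <- matrix(c(40, 10, 25, 25),
                       nrow = 2,
                       dimnames = list(Before = c("Yes", "No"),
                                       After  = c("Yes", "No")))
mc <- mcnemar.test(before_after)
mc$statistic  # (|25 - 10| - 1)^2 / (25 + 10) with continuity correction
mc$p.value
```

McNemar's test looks only at the off-diagonal cells, that is, the pairs whose answer changed.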
Non-parametric tests for tabular data
The Friedman rank sum test (friedman.test()) is a non-parametric version of two-way ANOVA.
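A minimal sketch of the Friedman test on invented data, with rows as blocks (for example, subjects) and columns as treatments; scores and fr are hypothetical names and the values are made up:

```r
# Six blocks (rows) measured under three treatments (columns)
scores <- matrix(c(5.1, 6.2, 7.3,
                   4.8, 6.0, 6.9,
                   5.5, 5.9, 7.8,
                   5.0, 6.4, 7.1,
                   4.6, 6.1, 6.5,
                   5.3, 5.7, 7.0),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("block", 1:6), c("A", "B", "C")))

fr <- friedman.test(scores)
fr$statistic  # Friedman chi-squared
fr$parameter  # degrees of freedom: number of treatments - 1
fr$p.value
```

The test ranks the treatments within each block, so it depends only on their ordering inside a row, not on the raw values.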
Finally, here is a chart to help readers choose an appropriate statistical test:
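▍Copyright of this article (including images and text) belongs to "优雅R" (WeChat public account: elegant-r). Reposting is prohibited; to request permission, contact w_shixiang@163.com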