「R」统计检验函数汇总

2020-01-19 568动手实验室

今天为大家推送一篇干货满满的文章,欢迎点赞、收藏、转发三连击!
资料来源:《R 语言核心技术手册》和 R
文档数据基本来自胡编乱造 和 R 文档

连续型数据

基于正态分布的检验

均值检验

t.test(1:10, 10:20)

>

> Welch Two Sample t-test

>

> data: 1:10 and 10:20

> t = -7, df = 19, p-value = 2e-06

> alternative hypothesis: true difference in means is not equal to 0

> 95 percent confidence interval:

> -12.4 -6.6

> sample estimates:

> mean of x mean of y

> 5.5 15.0

配对 t 检验:

t.test(rnorm(10), rnorm(10, mean = 1), paired = TRUE)

>

> Paired t-test

>

> data: rnorm(10) and rnorm(10, mean = 1)

> t = -5, df = 9, p-value = 7e-04

> alternative hypothesis: true difference in means is not equal to 0

> 95 percent confidence interval:

> -2.541 -0.962

> sample estimates:

> mean of the differences

> -1.75

使用公式:

df <- data.frame(
value = c(rnorm(10), rnorm(10, mean = 1)),
group = c(rep("control", 10), rep("test", 10))
)t.test(value ~ group, data = df)

>

> Welch Two Sample t-test

>

> data: value by group

> t = -0.4, df = 15, p-value = 0.7

> alternative hypothesis: true difference in means is not equal to 0

> 95 percent confidence interval:

> -1.62 1.08

> sample estimates:

> mean in group control mean in group test

> 0.532 0.802

假设方差同质:

t.test(value ~ group, data = df, var.equal = TRUE)

>

> Two Sample t-test

>

> data: value by group

> t = -0.4, df = 18, p-value = 0.7

> alternative hypothesis: true difference in means is not equal to 0

> 95 percent confidence interval:

> -1.60 1.06

> sample estimates:

> mean in group control mean in group test

> 0.532 0.802

更多查看 ?t.test

两总体方差检验

上面的例子假设方差同质,我们通过检验查看。服从正态分布的两总体方差比较。

进行的是 F 检验var.test(value ~ group, data = df)

>

> F test to compare two variances

>

> data: value by group

> F = 0.4, num df = 9, denom df = 9, p-value = 0.2

> alternative hypothesis: true ratio of variances is not equal to 1

> 95 percent confidence interval:

> 0.103 1.671

> sample estimates:

> ratio of variances

> 0.415

使用 Bartlett 检验比较每个组(样本)数据的方差是否一致。

bartlett.test(value ~ group, data = df)

>

> Bartlett test of homogeneity of variances

>

> data: value by group

> Bartlett's K-squared = 2, df = 1, p-value = 0.2

多个组间均值的比较

对于两组以上数据间均值的比较,使用方差分析 ANOVA。

aov(wt ~ factor(cyl), data = mtcars)

> Call:

> aov(formula = wt ~ factor(cyl), data = mtcars)

>

> Terms:

> factor(cyl) Residuals

> Sum of Squares 18.2 11.5

> Deg. of Freedom 2 29

>

> Residual standard error: 0.63

> Estimated effects may be unbalanced

查看详细信息:

model.tables(aov(wt ~ factor(cyl), data = mtcars))

> Tables of effects

>

> factor(cyl)

> 4 6 8

> -0.9315 -0.1001 0.782

> rep 11.0000 7.0000 14.000

通常先用 lm() 函数对数据建立线性模型,再用 anova() 函数提取方差分析的信息更方便。

R统计_1

ANOVA 分析假设各组样本数据的方差是相等的,如果知道(或怀疑)不相等,可以使用 oneway.test() 函数。

oneway.test(wt ~ cyl, data = mtcars)

>

> One-way analysis of means (not assuming equal variances)

>

> data: wt and cyl

> F = 20, num df = 2, denom df = 19, p-value = 2e-05

这与设定了 var.equal=FALSE 的 t.test 类似(两种方法都是 Welch 提出)。

多组样本的配对 t 检验

pairwise.t.test(mtcars$wt, mtcars$cyl)

>

> Pairwise comparisons using t tests with pooled SD

>

> data: mtcars$wt and mtcars$cyl

>

> 4 6

> 6 0.01 -

> 8 6e-07 0.01

>

> P value adjustment method: holm

可以自定义 p 值校正方法。

正态性检验

使用 Shapiro-Wilk 检验:

shapiro.test(rnorm(30))

>

> Shapiro-Wilk normality test

>

> data: rnorm(30)

> W = 1, p-value = 1

可以通过 QQ 图辅助查看。

qqnorm(rnorm(30))

R统计_2

分布的对称性检验

用 Kolmogorov-Smirnov 检验查看一个向量是否来自对称的概率分布(不限于正态分布)。

ks.test(rnorm(10), pnorm)

>

> One-sample Kolmogorov-Smirnov test

>

> data: rnorm(10)

> D = 0.2, p-value = 0.7

> alternative hypothesis: two-sided

函数第 1 个参数指定待检验的数据,第 2 个参数指定对称分布的类型,可以是数值型向量、指定概率分布函数的字符串或一个分布函数。

ks.test(rnorm(10), "pnorm")

>

> One-sample Kolmogorov-Smirnov test

>

> data: rnorm(10)

> D = 0.4, p-value = 0.09

> alternative hypothesis: two-sided

ks.test(rpois(10, lambda = 1), "pnorm")

> Warning in ks.test(rpois(10, lambda = 1), "pnorm"): ties should not be present

> for the Kolmogorov-Smirnov test

>

> One-sample Kolmogorov-Smirnov test

>

> data: rpois(10, lambda = 1)

> D = 0.5, p-value = 0.01

> alternative hypothesis: two-sided

检验两个向量是否服从同一分布

还是用上面的函数。

ks.test(rnorm(20), rnorm(30))

>

> Two-sample Kolmogorov-Smirnov test

>

> data: rnorm(20) and rnorm(30)

> D = 0.1, p-value = 1

> alternative hypothesis: two-sided

相关性检验

使用 cor.test() 函数。

cor.test(mtcars$mpg, mtcars$wt)

>

> Pearson's product-moment correlation

>

> data: mtcars$mpg and mtcars$wt

> t = -10, df = 30, p-value = 1e-10

> alternative hypothesis: true correlation is not equal to 0

> 95 percent confidence interval:

> -0.934 -0.744

> sample estimates:

> cor

> -0.868

一共有 3 种方法,具体看选项 method 的说明。

cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = F)

>

> Spearman's rank correlation rho

>

> data: mtcars$mpg and mtcars$wt

> S = 10292, p-value = 1e-11

> alternative hypothesis: true rho is not equal to 0

> sample estimates:

> rho

> -0.886

不依赖分布的检验

均值检验

Wilcoxon 检验是 t 检验的非参数版本。默认是秩和检验。

wilcox.test(1:10, 10:20)

> Warning in wilcox.test.default(1:10, 10:20): cannot compute exact p-value with

> ties

>

> Wilcoxon rank sum test with continuity correction

>

> data: 1:10 and 10:20

> W = 0.5, p-value = 1e-04

> alternative hypothesis: true location shift is not equal to 0

可以设定为符号检验。

wilcox.test(1:10, 10:19, paired = TRUE)

> Warning in wilcox.test.default(1:10, 10:19, paired = TRUE): cannot compute exact

> p-value with ties

>

> Wilcoxon signed rank test with continuity correction

>

> data: 1:10 and 10:19

> V = 0, p-value = 0.002

> alternative hypothesis: true location shift is not equal to 0

多均值比较

多均值比较使 Kruskal-Wallis 秩和检验。

kruskal.test(wt ~ factor(cyl), data = mtcars)

>

> Kruskal-Wallis rank sum test

>

> data: wt by factor(cyl)

> Kruskal-Wallis chi-squared = 23, df = 2, p-value = 1e-05

方差检验

使用Fligner-Killeen(中位数)检验完成不同组别的方差比较。

fligner.test(wt ~ cyl, data = mtcars)

>

> Fligner-Killeen test of homogeneity of variances

>

> data: wt by cyl

> Fligner-Killeen:med chi-squared = 0.5, df = 2, p-value = 0.8

尺度参数差异

R 有一些检验可以用来确定尺度参数的差异。分布的尺度参数确定分布函数的尺度,如 t 分布的自由度。下面是针对两样本尺度参数差异的 Ansari-Bradley 检验。

R统计_3

还可以使用 Mood 两样本检验做。

mood.test(ramsay, jung.parekh)

>

> Mood two-sample test of scale

>

> data: ramsay and jung.parekh

> Z = 1, p-value = 0.3

> alternative hypothesis: two.sided

离散数据

比例检验

使用 prop.test() 比较两组观测值发生的概率是否有差异。

heads <- rbinom(1, size = 100, prob = .5)prop.test(heads, 100) # continuity correction TRUE by default

>

> 1-sample proportions test with continuity correction

>

> data: heads out of 100, null probability 0.5

> X-squared = 0.2, df = 1, p-value = 0.6

> alternative hypothesis: true p is not equal to 0.5

> 95 percent confidence interval:

> 0.370 0.572

> sample estimates:

> p

> 0.47prop.test(heads, 100, correct = FALSE)

>

> 1-sample proportions test without continuity correction

>

> data: heads out of 100, null probability 0.5

> X-squared = 0.4, df = 1, p-value = 0.5

> alternative hypothesis: true p is not equal to 0.5

> 95 percent confidence interval:

> 0.375 0.567

> sample estimates:

> p

> 0.47

可以给定概率值。

prop.test(heads, 100, p = 0.3, correct = FALSE)

>

> 1-sample proportions test without continuity correction

>

> data: heads out of 100, null probability 0.3

> X-squared = 14, df = 1, p-value = 2e-04

> alternative hypothesis: true p is not equal to 0.3

> 95 percent confidence interval:

> 0.375 0.567

> sample estimates:

> p

> 0.47

二项式检验

binom.test(c(682, 243), p = 3/4)

>

> Exact binomial test

>

> data: c(682, 243)

> number of successes = 682, number of trials = 925, p-value = 0.4

> alternative hypothesis: true probability of success is not equal to 0.75

> 95 percent confidence interval:

> 0.708 0.765

> sample estimates:

> probability of success

> 0.737

binom.test(682, 682 + 243, p = 3/4)

The same

>

> Exact binomial test

>

> data: 682 and 682 + 243

> number of successes = 682, number of trials = 925, p-value = 0.4

> alternative hypothesis: true probability of success is not equal to 0.75

> 95 percent confidence interval:

> 0.708 0.765

> sample estimates:

> probability of success

> 0.737

与其他的检验函数不同,这里的 p 值表示试验成功率与假设值的最小差值。

列联表检验

用来确定两个分类变量是否相关。对于小的列联表,试验 Fisher 精确检验获得较好的检验结果。Fisher 检验有一个关于喝茶的故事。

1-5.png

当列联表较大时,Fisher 计算量很大,可以使用卡方检验替代。

chisq.test(TeaTasting)

> Warning in chisq.test(TeaTasting): Chi-squared approximation may be incorrect

>

> Pearson's Chi-squared test with Yates' continuity correction

>

> data: TeaTasting

> X-squared = 0.5, df = 1, p-value = 0.5

对于三变量的混合影响,使用 Cochran-Mantel-Haenszel 检验。

1-6.png
1-7.png

用 McNemar 卡方检验检验二维列联表的对称性。

1-8.png

列联表非参数检验

Friedman 秩和检验是一个非参数版本的双边 ANOVA 检验。
1-9.png

1-10.png

最后分享一张图,帮助读者选择一个合适的统计检验:

1-2.png

▍本文版权(包括图片及文字)属于“优雅R”(微信公众号:elegant-r),禁止二次转载,如需转载请联系w_shixiang@163.com

上一篇下一篇