Welcome to follow the "生信学习者" (Bioinformatics Learner) series across platforms — WeChat official account / Xiaohongshu / Zhihu: 生信学习者; CSDN: 生信学习者2.

## Introduction

Microbiome data have the following characteristics, which means more issues must be considered when performing differential abundance analysis:

- Sparsity
- Compositionality
- Overdispersion

The paper **Nearing, Douglas et al. "Microbiome differential abundance methods produce different results across 38 datasets." Nature Communications** benchmarked the commonly used differential abundance (DA) methods. This post records the core code of each method.

The goal of that work was to compare a range of common DA methods on 16S rRNA gene datasets. For most of the work the authors did not know the ground truth, so they could mainly say which tools behave more similarly to which others. They also included several analyses to help assess the tools' false-positive rates and their consistency on data drawn from the same dataset under different settings.

In short, the study evaluates how different microbiome DA methods perform across many datasets, compares their similarity and consistency, and estimates how often each tool produces false positives.

## Sparsity

Even within the same environment, the presence and abundance of microbes differ between samples, and most taxa have very low abundance. Sequencer detection limits further increase the probability that a taxon's (relative or absolute) abundance is recorded as zero. The size of the reference database used for alignment — i.e., its species coverage — also strongly affects the resulting abundance profile. The profiles we obtain therefore inevitably contain many zeros, of two kinds: true zeros, and zeros introduced by technical error. Many algorithms build different zero-handling strategies around these two cases.

The proportion of zeros determines the sparsity of the abundance profile; in one 16S dataset, the zero fraction at the OTU level exceeded 80%. Sparsity makes common hypothesis tests such as the t-test and Wilcoxon test unsuitable. To reduce its impact, many R packages first handle zeros (e.g., ANCOM's zero classification, metagenomeSeq's ZIP/ZILN models) and then transform the processed matrix, for example with CLR (which in turn addresses the compositional property discussed in the next section). The transformed data approximately follow familiar distributions, so standard two-group tests (Wilcoxon/t-test) can be applied. Note that ANCOM additionally declares differential taxa based on the proportion of significant log-ratio tests per taxon, which is one reason for its robustness.

## Compositionality

Compositional data live on a simplex: within a sample, the abundances of all microbes sum to a constant (which may be 1, 10, 100, ...). The elements of such data are mutually dependent — a fluctuation in one component's proportion necessarily shifts the others — even though no such relationship may exist in the real microbial community. To address compositionality, various normalization methods have been proposed, such as the CLR mentioned above:

$$X_{i}' = \log\left(\frac{x_{i}}{\mathrm{GeometricMean}(X)}\right)$$

(the method I am currently most familiar with). Compositional data do not follow a Euclidean geometry; after a log-ratio transformation, the data map one-to-one into a real multivariate space, where standard analysis methods can then be applied.

## Overdispersion

Overdispersion means Variance >> mean: the variance of the data far exceeds its mean. The Poisson distribution commonly used for count matrices cannot handle this (its variance equals its mean), which is why many methods instead fit the counts with a negative binomial distribution.

## Summary

A slide from one of my talks summarizes the above.

## Differential abundance methods

Different DA methods can identify substantially different sets of differential taxa, because their underlying principles differ; from the standpoint of the data, a method should match the microbiome data characteristics described above.

## Data

The article provides 38 16S datasets, downloadable from Baidu Netdisk: https://pan.baidu.com/s/1tIMcqm6rs1EEbhJzH-Z4mg
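The CLR transform above is easy to sketch. The following is a minimal toy illustration (the `counts` matrix and the pseudo-count of 1 are assumptions made for the example, not part of the benchmark scripts):

```r
# Toy CLR transform: add a pseudo-count so zeros can be logged, then
# subtract each sample's mean log value (the log of the geometric mean).
counts <- matrix(c(10, 0, 90,
                   5, 20, 75),
                 nrow = 3,
                 dimnames = list(paste0("taxon", 1:3), c("s1", "s2")))

clr_transform <- function(x) {
  logx <- log(x + 1)               # pseudo-count handles zeros
  sweep(logx, 2, colMeans(logx))   # subtract per-sample log geometric mean
}

clr_counts <- clr_transform(counts)
colSums(clr_counts)                # ~0 in every sample, by construction
```

Because each sample's CLR values are centered on its own geometric mean, the constant-sum constraint is removed and downstream tests operate on (approximately) unconstrained real values.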
To obtain the extraction code, follow the WeChat official account 生信学习者 and send the message 微生物差异方法.

## ALDEx2

ALDEx2 (ANOVA-Like Differential Expression 2) is a DA method designed for compositional data — data in which the parts of each sample sum to a fixed value (e.g., relative abundances within a microbial community sum to 1). Its core steps are:

1. Posterior distributions: ALDEx2 uses the Dirichlet distribution to model the posterior distribution of each feature's (OTU/ASV) read counts, reflecting that compositional data points lie on a low-dimensional simplex in high-dimensional space.
2. Centered log-ratio (CLR) transform: the count instances are CLR-transformed, a transformation suited to compositional data that removes the constant-sum constraint and makes the data amenable to conventional statistics.
3. Univariate tests: the transformed data are tested per feature, e.g., with Welch's t-test or the Wilcoxon rank-sum test, to decide whether abundances differ between groups.
4. Effect sizes: ALDEx2 also estimates an effect size — the between-group difference relative to within-group variation — useful for judging biological relevance.
5. Multiple-testing correction: significant features are corrected with the Benjamini–Hochberg method to control the false-discovery rate.

#### Script to run ALDEx2 differential abundance

```r
deps <- c("ALDEx2")
for (dep in deps){
  if (dep %in% installed.packages()[, "Package"] == FALSE){
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install("ALDEx2")
  }
  library(dep, character.only = TRUE)
}
library(ALDEx2)

args <- commandArgs(trailingOnly = TRUE)
# test if there is an argument supplied
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)){
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = TRUE,
                        comment.char = "", quote = "", check.names = FALSE)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input
if (sample_num != grouping_num){
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples matches up
if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = FALSE]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

results <- aldex(reads = ASV_table, conditions = groupings[, 1],
                 mc.samples = 128, test = "t", effect = TRUE,
                 include.sample.summary = FALSE, verbose = TRUE, denom = "all")

write.table(results, file = args[3], quote = FALSE, sep = "\t", col.names = NA)
message("Results table saved to ", args[3])
```

## ANCOM-II

ANCOM (Analysis of Composition of Microbiomes) is a statistical method designed to identify and compare differences in microbial composition between samples or treatment groups. Its core steps are:

1. Data aggregation: preprocess the data by removing low-abundance taxa (OTU/ASV) and normalizing, converting absolute abundances to relative abundances.
2. Pseudo-counts: ANCOM relies on log transforms, and taxa with zero relative abundance cannot be log-transformed, so a small positive pseudo-count is added.
3. The W statistic: for each taxon, ANCOM forms the additive log-ratio of that taxon against every other taxon and tests each log-ratio between groups; W counts how many of these tests remain significant after correction. Taxa whose W exceeds a quantile threshold of the maximum possible value (n_taxa − 1) — for example 0.7 × (n_taxa − 1) — are declared differentially abundant.
4. Interpretation: results typically include a volcano plot of W against the CLR-transformed mean difference, plus a table listing the significant features.

ANCOM's strengths include handling sparse data, maintaining a low false-positive rate, and robustness to outliers. Its limitations include sensitivity to distributional assumptions and higher demands on sample size and feature dimensionality.

### Anchor_v2.1.R script

```r
### This script was downloaded May 11 2020 from the FrederickHuangLin/Ancom repo on GitHub
### It will be used to run the ANCOM2 tests

library(exactRankTests)
library(nlme)
library(dplyr)
library(ggplot2)
library(compositions)

# OTU table should be a matrix/data.frame with each feature in rows and samples in columns.
# Metadata should be a matrix/data.frame containing the sample identifier.
```
```r
# Data pre-processing
feature_table_pre_process = function(feature_table, meta_data, sample_var, group_var = NULL,
                                     out_cut = 0.05, zero_cut = 0.90, lib_cut, neg_lb){
  feature_table = data.frame(feature_table, check.names = FALSE)
  meta_data = data.frame(meta_data, check.names = FALSE)
  # Drop unused levels
  meta_data[] = lapply(meta_data, function(x) if(is.factor(x)) factor(x) else x)
  # Match sample IDs between metadata and feature table
  sample_ID = intersect(meta_data[, sample_var], colnames(feature_table))
  feature_table = feature_table[, sample_ID]
  meta_data = meta_data[match(sample_ID, meta_data[, sample_var]), ]

  # 1. Identify outliers within each taxon
  if (!is.null(group_var)) {
    group = meta_data[, group_var]
    z = feature_table + 1 # Add pseudo-count (1)
    f = log(z); f[f == 0] = NA; f = colMeans(f, na.rm = T)
    f_fit = lm(f ~ group)
    e = rep(0, length(f)); e[!is.na(group)] = residuals(f_fit)
    y = t(t(z) - e)

    outlier_check = function(x){
      # Fitting the mixture model using the algorithm of Peddada, S. Das, and JT Gene Hwang (2002)
      mu1 = quantile(x, 0.25, na.rm = T); mu2 = quantile(x, 0.75, na.rm = T)
      sigma1 = quantile(x, 0.75, na.rm = T) - quantile(x, 0.25, na.rm = T); sigma2 = sigma1
      pi = 0.75
      n = length(x)
      epsilon = 100
      tol = 1e-5
      score = pi * dnorm(x, mean = mu1, sd = sigma1)/((1 - pi) * dnorm(x, mean = mu2, sd = sigma2))
      while (epsilon > tol) {
        grp1_ind = (score >= 1)
        mu1_new = mean(x[grp1_ind]); mu2_new = mean(x[!grp1_ind])
        sigma1_new = sd(x[grp1_ind]); if(is.na(sigma1_new)) sigma1_new = 0
        sigma2_new = sd(x[!grp1_ind]); if(is.na(sigma2_new)) sigma2_new = 0
        pi_new = sum(grp1_ind)/n
        para = c(mu1_new, mu2_new, sigma1_new, sigma2_new, pi_new)
        if(any(is.na(para))) break
        score = pi_new * dnorm(x, mean = mu1_new, sd = sigma1_new)/
          ((1 - pi_new) * dnorm(x, mean = mu2_new, sd = sigma2_new))
        epsilon = sqrt((mu1 - mu1_new)^2 + (mu2 - mu2_new)^2 + (sigma1 - sigma1_new)^2 +
                         (sigma2 - sigma2_new)^2 + (pi - pi_new)^2)
        mu1 = mu1_new; mu2 = mu2_new; sigma1 = sigma1_new; sigma2 = sigma2_new; pi = pi_new
      }
      if(mu1 + 1.96 * sigma1 < mu2 - 1.96 * sigma2){
        if(pi < out_cut){
          out_ind = grp1_ind
        }else if(pi > 1 - out_cut){
          out_ind = (!grp1_ind)
        }else{
          out_ind = rep(FALSE, n)
        }
      }else{
        out_ind = rep(FALSE, n)
      }
      return(out_ind)
    }
    out_ind = matrix(FALSE, nrow = nrow(feature_table), ncol = ncol(feature_table))
    out_ind[, !is.na(group)] = t(apply(y, 1, function(i)
      unlist(tapply(i, group, function(j) outlier_check(j)))))
    feature_table[out_ind] = NA
  }

  # 2. Discard taxa with proportion of zeros >= zero_cut
  zero_prop = apply(feature_table, 1, function(x) sum(x == 0, na.rm = T)/length(x[!is.na(x)]))
  taxa_del = which(zero_prop >= zero_cut)
  if(length(taxa_del) > 0){
    feature_table = feature_table[- taxa_del, ]
  }

  # 3. Discard samples with library size < lib_cut
  lib_size = colSums(feature_table, na.rm = T)
  if(any(lib_size < lib_cut)){
    subj_del = which(lib_size < lib_cut)
    feature_table = feature_table[, - subj_del]
    meta_data = meta_data[- subj_del, ]
  }

  # 4. Identify taxa with structural zeros
  if (!is.null(group_var)) {
    group = factor(meta_data[, group_var])
    present_table = as.matrix(feature_table)
    present_table[is.na(present_table)] = 0
    present_table[present_table != 0] = 1

    p_hat = t(apply(present_table, 1, function(x)
      unlist(tapply(x, group, function(y) mean(y, na.rm = T)))))
    samp_size = t(apply(feature_table, 1, function(x)
      unlist(tapply(x, group, function(y) length(y[!is.na(y)])))))
    p_hat_lo = p_hat - 1.96 * sqrt(p_hat * (1 - p_hat)/samp_size)

    struc_zero = (p_hat == 0) * 1
    # Whether we need to classify a taxon into structural zero by its negative lower bound?
    if(neg_lb) struc_zero[p_hat_lo <= 0] = 1

    # Entries considered to be structural zeros are set to be 0s
    struc_ind = struc_zero[, group]
    feature_table = feature_table * (1 - struc_ind)

    colnames(struc_zero) = paste0("structural_zero (", colnames(struc_zero), ")")
  }else{
    struc_zero = NULL
  }

  # 5. Return results
  res = list(feature_table = feature_table, meta_data = meta_data, structure_zeros = struc_zero)
  return(res)
}

# ANCOM main function
ANCOM = function(feature_table, meta_data, struc_zero = NULL, main_var, p_adj_method = "BH",
                 alpha = 0.05, adj_formula = NULL, rand_formula = NULL, ...){
  # OTU table transformation:
  # (1) Discard taxa with structural zeros (if any); (2) Add pseudocount (1) and take logarithm.
  if (!is.null(struc_zero)) {
    num_struc_zero = apply(struc_zero, 1, sum)
    comp_table = feature_table[num_struc_zero == 0, ]
  }else{
    comp_table = feature_table
  }
  comp_table = log(as.matrix(comp_table) + 1)
  n_taxa = dim(comp_table)[1]
  taxa_id = rownames(comp_table)
  n_samp = dim(comp_table)[2]

  # Determine the type of statistical test and its formula.
  if (is.null(rand_formula) & is.null(adj_formula)) {
    # Basic model
    # Whether the main variable of interest has two levels or more?
    if (length(unique(meta_data %>% pull(main_var))) == 2) {
      # Two levels: Wilcoxon rank-sum test
      tfun = exactRankTests::wilcox.exact
    } else{
      # More than two levels: Kruskal-Wallis test
      tfun = stats::kruskal.test
    }
    # Formula
    tformula = formula(paste("x ~", main_var, sep = " "))
  }else if (is.null(rand_formula) & !is.null(adj_formula)) {
    # Model: ANOVA
    tfun = stats::aov
    # Formula
    tformula = formula(paste("x ~", main_var, "+", adj_formula, sep = " "))
  }else if (!is.null(rand_formula)) {
    # Model: Mixed-effects model
    tfun = nlme::lme
    # Formula
    if (is.null(adj_formula)) {
      # Random intercept model
      tformula = formula(paste("x ~", main_var))
    }else {
      # Random coefficients/slope model
      tformula = formula(paste("x ~", main_var, "+", adj_formula))
    }
  }

  # Calculate the p-value for each pairwise comparison of taxa.
  p_data = matrix(NA, nrow = n_taxa, ncol = n_taxa)
  colnames(p_data) = taxa_id
  rownames(p_data) = taxa_id
  for (i in 1:(n_taxa - 1)) {
    # Loop through each taxon.
    # For each taxon i, additive log ratio (alr) transform the OTU table using taxon i as the reference.
    # e.g. the first alr matrix will be the log abundance data (comp_table) recursively subtracted
    # by the log abundance of the 1st taxon (1st column) column-wise, removing the first i columns since:
    # the first (i - 1) columns were calculated by previous iterations, and
    # the i^th column contains all zeros.
    alr_data = apply(comp_table, 1, function(x) x - comp_table[i, ])
    # apply(...) allows crossing the data in a number of ways and avoids explicit loop constructs.
    # Here, we basically want to iteratively subtract each column of the comp_table by its i^th column.
    alr_data = alr_data[, - (1:i), drop = FALSE]
    n_lr = dim(alr_data)[2] # number of log-ratios (lr)
    alr_data = cbind(alr_data, meta_data) # merge with the metadata

    # P-values
    if (is.null(rand_formula) & is.null(adj_formula)) {
      p_data[-(1:i), i] = apply(alr_data[, 1:n_lr, drop = FALSE], 2, function(x){
        tfun(tformula, data = data.frame(x, alr_data, check.names = FALSE))$p.value
      })
    }else if (is.null(rand_formula) & !is.null(adj_formula)) {
      p_data[-(1:i), i] = apply(alr_data[, 1:n_lr, drop = FALSE], 2, function(x){
        fit = tfun(tformula, data = data.frame(x, alr_data, check.names = FALSE),
                   na.action = na.omit)
        summary(fit)[[1]][main_var, "Pr(>F)"]
      })
    }else if (!is.null(rand_formula)) {
      p_data[-(1:i), i] = apply(alr_data[, 1:n_lr, drop = FALSE], 2, function(x){
        fit = tfun(fixed = tformula, data = data.frame(x, alr_data, check.names = FALSE),
                   random = formula(rand_formula), na.action = na.omit, ...)
        anova(fit)[main_var, "p-value"]
      })
    }
  }

  # Complete the p-value matrix.
  # What we got from the above iterations is a lower-triangle matrix of p-values.
  p_data[upper.tri(p_data)] = t(p_data)[upper.tri(p_data)]
  diag(p_data) = 1 # let p-values on the diagonal equal 1

  # Multiple comparisons correction.
  q_data = apply(p_data, 2, function(x) p.adjust(x, method = p_adj_method))

  # Calculate the W statistic of ANCOM.
  # For each taxon, count the number of q-values < alpha.
  W = apply(q_data, 2, function(x) sum(x < alpha))

  # Organize outputs
  out_comp = data.frame(taxa_id, W, row.names = NULL, check.names = FALSE)
  # Declare a taxon to be differentially abundant based on the quantile of the W statistic.
  # We perform (n_taxa - 1) hypothesis tests on each taxon, so the maximum number of rejections is (n_taxa - 1).
  out_comp = out_comp %>%
    mutate(detected_0.9 = ifelse(W > 0.9 * (n_taxa - 1), TRUE, FALSE),
           detected_0.8 = ifelse(W > 0.8 * (n_taxa - 1), TRUE, FALSE),
           detected_0.7 = ifelse(W > 0.7 * (n_taxa - 1), TRUE, FALSE),
           detected_0.6 = ifelse(W > 0.6 * (n_taxa - 1), TRUE, FALSE))

  # Taxa with structural zeros are automatically declared to be differentially abundant
  if (!is.null(struc_zero)){
    out = data.frame(taxa_id = rownames(struc_zero), W = Inf, detected_0.9 = TRUE,
                     detected_0.8 = TRUE, detected_0.7 = TRUE, detected_0.6 = TRUE,
                     row.names = NULL, check.names = FALSE)
    out[match(taxa_id, out$taxa_id), ] = out_comp
  }else{
    out = out_comp
  }

  # Draw volcano plot
  # Calculate clr
  clr_table = apply(feature_table, 2, clr)
  # Calculate clr mean difference
  eff_size = apply(clr_table, 1, function(y)
    lm(y ~ x, data = data.frame(y = y, x = meta_data %>% pull(main_var),
                                check.names = FALSE))$coef[-1])

  if (is.matrix(eff_size)){
    # Data frame for the figure
    dat_fig = data.frame(taxa_id = out$taxa_id, t(eff_size), y = out$W, check.names = FALSE) %>%
      mutate(zero_ind = factor(ifelse(is.infinite(y), "Yes", "No"), levels = c("Yes", "No"))) %>%
      gather(key = group, value = x, rownames(eff_size))
    # Replace "x" with the name of the covariate
    dat_fig$group = sapply(dat_fig$group, function(x) gsub("x", paste0(main_var, " "), x))
    # Replace Inf by (n_taxa - 1) for structural zeros
    dat_fig$y = replace(dat_fig$y, is.infinite(dat_fig$y), n_taxa - 1)

    fig = ggplot(data = dat_fig) + aes(x = x, y = y) +
      geom_point(aes(color = zero_ind)) +
      facet_wrap(~ group) +
      labs(x = "CLR mean difference", y = "W statistic") +
      scale_color_discrete(name = "Structural zero", drop = FALSE) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5), legend.position = "top",
            strip.background = element_rect(fill = "white"))
    fig
  } else{
    # Data frame for the figure
    dat_fig = data.frame(taxa_id = out$taxa_id, x = eff_size, y = out$W) %>%
      mutate(zero_ind = factor(ifelse(is.infinite(y), "Yes", "No"), levels = c("Yes", "No")))
    # Replace Inf by (n_taxa - 1) for structural zeros
    dat_fig$y = replace(dat_fig$y, is.infinite(dat_fig$y), n_taxa - 1)

    fig = ggplot(data = dat_fig) + aes(x = x, y = y) +
      geom_point(aes(color = zero_ind)) +
      labs(x = "CLR mean difference", y = "W statistic") +
      scale_color_discrete(name = "Structural zero", drop = FALSE) +
      theme_bw() +
      theme(plot.title = element_text(hjust = 0.5), legend.position = "top")
    fig
  }

  res = list(out = out, fig = fig)
  return(res)
}
```

Run:

```r
deps <- c("exactRankTests", "nlme", "dplyr", "ggplot2", "compositions")
for (dep in deps){
  if (dep %in% installed.packages()[, "Package"] == FALSE){
    install.packages(dep)
  }
  library(dep, character.only = TRUE)
}

# args[4] will contain the path to the ANCOM code
args <- commandArgs(trailingOnly = TRUE)
if (length(args) <= 3) {
  stop("At least three arguments must be supplied", call. = FALSE)
}
source(args[[4]])

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)){
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = TRUE,
                        comment.char = "", quote = "", check.names = FALSE)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

if (sample_num != grouping_num){
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = FALSE]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

groupings$Sample <- rownames(groupings)

prepro <- feature_table_pre_process(feature_table = ASV_table, meta_data = groupings,
                                    sample_var = "Sample", group_var = NULL,
                                    out_cut = 0.05, zero_cut = 0.90,
                                    lib_cut = 1000, neg_lb = FALSE)

feature_table <- prepro$feature_table
metadata <- prepro$meta_data
struc_zero <- prepro$structure_zeros

# run ancom
main_var <- colnames(groupings)[1]
p_adj_method <- "BH"
alpha <- 0.05
adj_formula <- NULL
rand_formula <- NULL

res <- ANCOM(feature_table = feature_table, meta_data = metadata, struc_zero = struc_zero,
             main_var = main_var, p_adj_method = p_adj_method, alpha = alpha,
             adj_formula = adj_formula, rand_formula = rand_formula)

write.table(res$out, file = args[3], quote = FALSE, sep = "\t", col.names = NA)
```

## Corncob

Corncob is an R package for microbiome data analysis that models microbial relative abundances and tests the effects of covariates on them. Its core ideas are:

1. Relative-abundance modeling: corncob models relative abundance while accounting for compositionality (relative abundances within a sample sum to 1).
2. Beta-binomial distribution: counts are assumed to follow a beta-binomial distribution, which better captures the discreteness and overdispersion of microbiome data.
3. Covariate effect tests: researchers can test the effect of one or more covariates on relative abundance, e.g., via Wald tests.
4. Multiple-testing correction: multiple comparisons are corrected to control the type-I error rate, commonly with the Benjamini–Hochberg (BH) method.
5. Model fitting and hypothesis testing: corncob fits the model, estimates its parameters, and tests whether the relative abundance of a given taxon differs significantly.
6. Sparsity and zero inflation: corncob also accommodates the sparsity of microbiome data (many taxa undetected in most samples) and the excess of zero counts.
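Before the full script, a small sketch of why the beta-binomial suits overdispersed counts. This is an illustration only; the parameters `n`, `a`, and `b` are assumptions chosen for the example:

```r
# Compare the variance of a binomial count with a beta-binomial count of the
# same mean. In the beta-binomial, the success probability p is itself random,
# p ~ Beta(a, b), which inflates the variance — i.e., overdispersion.
n <- 100; a <- 2; b <- 8
p_mean <- a / (a + b)                             # mean success probability: 0.2

var_binom <- n * p_mean * (1 - p_mean)            # plain binomial variance: 16
rho <- 1 / (a + b + 1)                            # overdispersion (intra-class correlation)
var_betabinom <- var_binom * (1 + (n - 1) * rho)  # inflated variance: 160

c(binomial = var_binom, beta_binomial = var_betabinom)
```

With these parameters the beta-binomial variance is ten times the binomial variance at the same mean, which is exactly the "variance >> mean" behavior described in the Overdispersion section.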
#### Run Corncob

```r
library(corncob)
library(phyloseq)

# install corncob / phyloseq if not installed
deps <- c("corncob")
for (dep in deps){
  if (dep %in% installed.packages()[, "Package"] == FALSE){
    if (dep == "corncob"){
      devtools::install_github("bryandmartin/corncob")
    } else {
      if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
      BiocManager::install("phyloseq")
    }
  }
  library(dep, character.only = TRUE)
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)){
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = TRUE,
                        comment.char = "", quote = "", check.names = FALSE)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input
if (sample_num != grouping_num){
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples matches up
if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = FALSE]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

# run corncob
# put data into a phyloseq object
colnames(groupings)
colnames(groupings)[1] <- "places"

OTU <- phyloseq::otu_table(ASV_table, taxa_are_rows = TRUE)
sampledata <- phyloseq::sample_data(groupings, errorIfNULL = TRUE)
phylo <- phyloseq::merge_phyloseq(OTU, sampledata)

my_formula <- as.formula(paste("~", "places", sep = " ", collapse = ""))
my_formula

results <- corncob::differentialTest(formula = my_formula,
                                     phi.formula = my_formula,
                                     phi.formula_null = my_formula,
                                     formula_null = ~ 1,
                                     test = "Wald", data = phylo,
                                     boot = FALSE,
                                     fdr_cutoff = 0.05)

write.table(results$p_fdr, file = args[[3]], sep = "\t", col.names = NA, quote = FALSE)
write.table(results$p, file = paste0(args[[3]], "_uncor"), sep = "\t", col.names = NA, quote = FALSE)
```

## DESeq2

DESeq2 is a statistical method for differential analysis of count data, such as RNA-seq reads or microbiome OTU counts. Its key steps are:

1. Normalization: DESeq2 first normalizes the raw counts to correct for sequencing-depth differences between samples, by estimating per-sample size factors.
2. Dispersion estimation: for each feature (OTU or gene), DESeq2 estimates the dispersion — the variability parameter of the negative binomial distribution.
3. Empirical Bayes shrinkage: per-feature dispersion estimates are shrunk toward a trend fitted across all features, stabilizing estimates for individual features.
4. Negative-binomial model: counts are modeled as negative binomial, a standard choice for overdispersed count data.
5. Design formula: the design formula encodes the experiment — treatment effects, batch effects, and other covariates — so specific contrasts can be evaluated.
6. Hypothesis testing: DESeq2 tests whether a taxon's abundance differs significantly between groups (null: no difference; alternative: a difference exists).
7. Multiple-testing correction: the many comparisons are corrected with the Benjamini–Hochberg method to control the FDR.
8. Output: DESeq2 reports p-values, adjusted p-values, log2 fold changes, and more, helping researchers identify and interpret biologically meaningful differences.

```r
# Run_DESeq2

deps <- c("DESeq2")
for (dep in deps){
  if (dep %in% installed.packages()[, "Package"] == FALSE){
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install("DESeq2")
  }
  library(dep, character.only = TRUE)
}
library(DESeq2)

args <- commandArgs(trailingOnly = TRUE)
# test if there is an argument supplied
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)){
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = TRUE,
                        comment.char = "", quote = "", check.names = FALSE)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input
if (sample_num != grouping_num){
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples matches up
if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = FALSE]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

colnames(groupings)[1] <- "Groupings"

# Run DESeq2
dds <- DESeq2::DESeqDataSetFromMatrix(countData = ASV_table,
                                      colData = groupings,
                                      design = ~ Groupings)
dds_res <- DESeq2::DESeq(dds, sfType = "poscounts")

res <- results(dds_res, tidy = TRUE, format = "DataFrame")
rownames(res) <- res$row
res <- res[, -1]

write.table(res, file = args[3], quote = FALSE, sep = "\t", col.names = NA)
message("Results written to ", args[3])
```

## edgeR

edgeR (Empirical Analysis of Digital Gene Expression in R) is a statistical method for count data from high-throughput sequencing — microbiome, transcriptome, and similar. Its key steps are:

1. Normalization: edgeR computes normalization factors to adjust for differing sequencing depths (library sizes), making samples comparable.
2. Dispersion estimation: each gene or OTU gets a dispersion estimate, a measure of variability critical for the downstream tests.
3. Negative-binomial model: counts are assumed to follow a negative binomial distribution, which handles overdispersion.
4. Empirical Bayes shrinkage: dispersion estimates borrow information globally, improving per-feature accuracy.
5. Design matrix: the design matrix encodes treatment, time, batch, and other effects, allowing comparisons between conditions.
6. Statistical tests: edgeR performs likelihood-ratio tests (LRT) or exact tests to decide whether expression differs between groups.
7. Multiple-testing correction: methods such as Benjamini–Hochberg control the FDR.
8. Output: p-values, adjusted p-values, log fold changes (LFC), and more.

```r
deps <- c("edgeR", "phyloseq")
for (dep in deps){
  if (dep %in% installed.packages()[, "Package"] == FALSE){
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install(deps)
  }
  library(dep, character.only = TRUE)
}

### Taken from the phyloseq authors at:
### https://joey711.github.io/phyloseq-extensions/edgeR.html
phyloseq_to_edgeR <- function(physeq, group, method = "RLE", ...){
  require(edgeR)
  require(phyloseq)
  # Enforce orientation.
  if (!taxa_are_rows(physeq)){ physeq <- t(physeq) }
  x <- as(otu_table(physeq), "matrix")
  # Add one to protect against overflow, log(0) issues.
  x <- x + 1
  # Check group argument
  if (identical(all.equal(length(group), 1), TRUE) & nsamples(physeq) > 1){
    # Assume that group was a sample variable name (must be categorical)
    group <- get_variable(physeq, group)
  }
  # Define gene annotations (genes) as tax_table
  taxonomy <- tax_table(physeq, errorIfNULL = FALSE)
  if (!is.null(taxonomy)){
    taxonomy <- data.frame(as(taxonomy, "matrix"))
  }
  # Now turn into a DGEList
  y <- DGEList(counts = x, group = group, genes = taxonomy, remove.zeros = TRUE, ...)
  # Calculate the normalization factors
  z <- calcNormFactors(y, method = method)
  # Check for division by zero inside calcNormFactors
  if (!all(is.finite(z$samples$norm.factors))){
    stop("Something wrong with edgeR::calcNormFactors on this data, non-finite $norm.factors, consider changing method argument")
  }
  # Estimate dispersions
  return(estimateTagwiseDisp(estimateCommonDisp(z)))
}

args <- commandArgs(trailingOnly = TRUE)
# test if there is an argument supplied
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)){
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = TRUE,
                        comment.char = "", quote = "", check.names = FALSE)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input
if (sample_num != grouping_num){
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples matches up
if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = FALSE]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

OTU <- phyloseq::otu_table(ASV_table, taxa_are_rows = TRUE)
sampledata <- phyloseq::sample_data(groupings, errorIfNULL = TRUE)
phylo <- phyloseq::merge_phyloseq(OTU, sampledata)

test <- phyloseq_to_edgeR(physeq = phylo, group = colnames(groupings)[1])

et <- exactTest(test)

tt <- topTags(et, n = nrow(test$table), adjust.method = "fdr", sort.by = "PValue")
res <- tt@.Data[[1]]

write.table(res, file = args[3], quote = FALSE, sep = "\t", col.names = NA)
```

## Limma-Voom-TMM

Limma-Voom-TMM (Trimmed Mean of M-values) is a DA approach that combines the strengths of limma, voom, and TMM normalization to analyze high-throughput sequencing count data. Its basic steps are:

1. Preprocessing: limma's preprocessing functionality is used for quality control and standardization of the raw data.
2. TMM normalization: TMM computes scaling factors from a trimmed mean of per-gene log-ratios (M-values) relative to a reference sample, correcting for sequencing-depth and composition differences between samples.
3. voom transformation: voom converts counts into log-CPM values and models the mean–variance relationship, assigning a precision weight to every observation so the data suit linear modeling.
4. Linear models: limma fits linear models that can include multiple covariates and batch effects to evaluate expression differences between groups.
5. Empirical Bayes estimation: empirical Bayes moderation of the variances improves the sensitivity of detecting expression changes.
6. Statistical tests: after model fitting, moderated t-tests (or F-tests) decide whether expression differs significantly.
7. Multiple-testing correction: because many genes are tested simultaneously, methods such as Benjamini–Hochberg control the FDR.
8. Output: p-values, adjusted p-values, log fold changes (LFC), and more, for identifying differential genes or taxa.

```r
deps <- c("edgeR")
for (dep in deps){
  if (dep %in% installed.packages()[, "Package"] == FALSE){
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install(deps)
  }
  library(dep, character.only = TRUE)
}

args <- commandArgs(trailingOnly = TRUE)
# test if there is an argument supplied
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)){
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = TRUE, row.names = 1,
                          comment.char = "", quote = "", check.names = FALSE)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = TRUE,
                        comment.char = "", quote = "", check.names = FALSE)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input
if (sample_num != grouping_num){
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples matches up
if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = FALSE]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == TRUE){
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

DGE_LIST <- DGEList(ASV_table)

### do normalization
### Reference sample will be the sample with the highest read depth
### check if the upper quartile method works for selecting the reference
Upper_Quartile_norm_test <- calcNormFactors(DGE_LIST, method = "upperquartile")

summary_upper_quartile <- summary(Upper_Quartile_norm_test$samples$norm.factors)[3]
if (is.na(summary_upper_quartile) | is.infinite(summary_upper_quartile)){
  message("Upper quartile reference selection failed; will use the sample with the largest sqrt(read depth) as reference")
  Ref_col <- which.max(colSums(sqrt(ASV_table)))
  DGE_LIST_Norm <- calcNormFactors(DGE_LIST, method = "TMM", refColumn = Ref_col)
  fileConn <- file(args[[4]])
  writeLines(c("Used max square root read depth to determine reference sample"), fileConn)
  close(fileConn)
} else {
  DGE_LIST_Norm <- calcNormFactors(DGE_LIST, method = "TMM")
}

## make matrix for testing
colnames(groupings) <- c("comparison")
mm <- model.matrix(~comparison, groupings)

voomvoom <- voom(DGE_LIST_Norm, mm, plot = FALSE)

fit <- lmFit(voomvoom, mm)
fit <- eBayes(fit)
res <- topTable(fit, coef = 2, n = nrow(DGE_LIST_Norm), sort.by = "none")
write.table(res, file = args[3], quote = FALSE, sep = "\t", col.names = NA)
```

## Limma-Voom-TMMwsp

Limma-Voom-TMMwsp (Trimmed Mean of M-values with Singleton Pairing) is a DA approach for microbiome data that combines several statistical techniques to improve accuracy and reliability. Its basic steps are:
数据预处理：首先对原始的测序数据进行质量控制和预处理，操作包括去除低质量的读段、过滤掉可能的污染物等。
TMMwsp标准化：TMMwsp是一种标准化方法，用于校正不同样本之间的测序深度差异。它对每个样本的计数数据应用TMM方法，并在计算中考虑单例配对（singleton pairing），以减少由于样本间测序深度不同带来的偏差。
voom转换：voom是一种转换方法，用于将原始的计数数据转换为适合线性模型分析的格式。voom对数据进行对数变换，并根据样本之间的差异来估计每个基因或OTU的准确差异度量。
线性模型：Limma-Voom方法使用线性模型来分析数据，这种模型可以包括多个协变量和批次效应，以评估不同样本组之间的基因或OTU表达差异。
经验贝叶斯估计：Limma-Voom方法使用经验贝叶斯统计来估计基因或OTU表达的离散度，这有助于提高对基因或OTU表达变化的检测灵敏度。
统计检验：在模型拟合之后，Limma-Voom进行统计检验来确定基因或OTU表达是否存在显著差异，这通常涉及似然比检验（LRT）或t检验。
多重检验校正：由于同时测试多个基因或OTU，Limma-Voom使用多重检验校正方法（如Benjamini-Hochberg方法）来控制假发现率（FDR）。
结果解释：最终，Limma-Voom提供一系列结果，包括P值、校正后的P值、对数倍数变化（Log Fold Change, LFC）等，这些结果可以帮助研究者识别差异表达的基因或微生物类群。

deps <- c("edgeR")
for (dep in deps) {
  if (dep %in% installed.packages()[, "Package"] == FALSE) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install(deps)
  }
  library(dep, character.only = TRUE)
}

args <- commandArgs(trailingOnly = TRUE)
# test if there is an argument supplied
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)) {
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = T,
                        comment.char = "", quote = "", check.names = F)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input.
if (sample_num != grouping_num) {
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples match up.
if (identical(colnames(ASV_table), rownames(groupings)) == T) {
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = F]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == T) {
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " samples were removed from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " samples were removed from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

DGE_LIST <- DGEList(ASV_table)

### do normalization
### Reference sample will be the sample with the highest read depth
### check if upper quartile method works for selecting reference
Upper_Quartile_norm_test <- calcNormFactors(DGE_LIST, method = "upperquartile")
summary_upper_quartile <- summary(Upper_Quartile_norm_test$samples$norm.factors)[3]
if (is.na(summary_upper_quartile) | is.infinite(summary_upper_quartile)) {
  message("Upper Quartile reference selection failed; will use the sample with the largest sqrt(read depth) as the reference")
  Ref_col <- which.max(colSums(sqrt(ASV_table)))
  DGE_LIST_Norm <- calcNormFactors(DGE_LIST, method = "TMMwsp", refColumn = Ref_col)
  fileConn <- file(args[[4]])
  writeLines(c("Used max square root read depth to determine reference sample"), fileConn)
  close(fileConn)
} else {
  DGE_LIST_Norm <- calcNormFactors(DGE_LIST, method = "TMMwsp")
}

## make matrix for testing
colnames(groupings) <- c("comparison")
mm <- model.matrix(~comparison, groupings)

voomvoom <- voom(DGE_LIST_Norm, mm, plot = F)

fit <- lmFit(voomvoom, mm)
fit <- eBayes(fit)
res <- topTable(fit, coef = 2, n = nrow(DGE_LIST_Norm), sort.by = "none")
write.table(res, file = args[3], quote = F, sep = "\t", col.names = NA)

Maaslin2

MaAsLin2是一种用于微生物组数据差异分析的统计方法，专门设计用来处理微生物组数据的复杂性，包括噪声、稀疏性（零膨胀）、高维度、极端非正态分布，以及通常以计数或组成性测量形式出现的数据。以下是MaAsLin2方法的基本原理：
多变量统计框架：MaAsLin2（Microbiome Multivariable Associations with Linear Models）使用广义线性和混合模型，以适应各种现代流行病学研究设计，包括横断面和纵向研究。
数据预处理：MaAsLin2提供预处理模块，用于处理元数据和微生物特征中的缺失值、未知数据值和异常值。
归一化和转换：MaAsLin2可以对微生物测量结果进行归一化和转换，以解决样本间可变的测序覆盖深度。
特征标准化：可选择执行特征标准化，并使用元数据的子集或全集来建模经质量控制的微生物特征。
多变量模型：MaAsLin2使用多种可能的多变量模型之一，为每个特征与每个元数据的关联计算p值。
重复测量的处理：在存在重复测量的情况下，MaAsLin2在混合模型框架中恰当地建模对象内或环境相关性，从而识别与协变量相关的微生物特征，同时通过指定对象间随机效应来解释个体差异。
统计检验：MaAsLin2可以识别每个单独特征与元数据对之间的关联，便于按特征进行协变量调整，并提高转化和基础生物学应用中的可解释性。
结果校正：MaAsLin2还提供多重检验校正功能，以控制第一类错误率。

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# BiocManager::install("Maaslin2")

args <- commandArgs(trailingOnly = TRUE)

if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)) {
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = T,
                        comment.char = "", quote = "", check.names = F)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

if (sample_num != grouping_num) {
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

if (identical(colnames(ASV_table), rownames(groupings)) == T) {
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = F]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == T) {
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " samples were removed from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " samples were removed from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

library(Maaslin2)

ASV_table <- data.frame(t(ASV_table), check.rows = F, check.names = F, stringsAsFactors = F)

fit_data <- Maaslin2(ASV_table, groupings, args[3], transform = "AST",
                     fixed_effects = c(colnames(groupings[1])),
                     standardize = FALSE, plot_heatmap = F, plot_scatter = F)

metagenomeSeq

metagenomeSeq是一种用于分析微生物组测序数据的统计学方法，可以帮助研究人员发现不同条件下微生物组的差异丰度。以下是metagenomeSeq方法的基本原理：
数据预处理：metagenomeSeq首先对原始的测序数据进行质量控制和预处理，以确保数据的准确性和可靠性。
归一化：对测序数据进行归一化处理，以校正样本间的测序深度差异，确保不同样本间的比较是公平的。
统计模型：metagenomeSeq使用统计模型来分析数据，尤其是针对组成性数据的模型，比如零膨胀模型或负二项分布模型，这些模型可以处理微生物组数据的离散性和零膨胀问题。
差异丰度分析：metagenomeSeq确定在两个或多个组之间具有差异丰度的特征（如OTU、物种等），通过比较它们的相对丰度来识别差异。
多重检验校正：由于同时测试多个特征，metagenomeSeq使用多重检验校正方法（如Benjamini-Hochberg方法）来控制假发现率（FDR）。
结果解释：metagenomeSeq提供丰富的结果输出，包括P值、校正后的P值、对数倍数变化（Log Fold Change, LFC）等，这些结果可以帮助研究者识别和解释数据中的生物学意义。

deps <- c("metagenomeSeq")
for (dep in deps) {
  if (dep %in% installed.packages()[, "Package"] == FALSE) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install(deps)
  }
  library(dep, character.only = TRUE)
}

args <- commandArgs(trailingOnly = TRUE)
# test if there is an argument supplied
if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)) {
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = T,
                        comment.char = "", quote = "", check.names = F)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

# check if the same number of samples are being input.
if (sample_num != grouping_num) {
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

# check if order of samples match up.
if (identical(colnames(ASV_table), rownames(groupings)) == T) {
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = F]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == T) {
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " samples were removed from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " samples were removed from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

data_list <- list()
data_list[["counts"]] <- ASV_table
data_list[["taxa"]] <- rownames(ASV_table)

pheno <- AnnotatedDataFrame(groupings)
counts <- AnnotatedDataFrame(ASV_table)
feature_data <- data.frame("ASV" = rownames(ASV_table), "ASV2" = rownames(ASV_table))
feature_data <- AnnotatedDataFrame(feature_data)
rownames(feature_data) <- feature_data@data$ASV

test_obj <- newMRexperiment(counts = data_list$counts, phenoData = pheno, featureData = feature_data)

p <- cumNormStat(test_obj, pFlag = T)
test_obj_norm <- cumNorm(test_obj, p = p)

fromula <- as.formula(paste("~1", colnames(groupings)[1], sep = " + "))
pd <- pData(test_obj_norm)
mod <- model.matrix(fromula, data = pd)
regres <- fitFeatureModel(test_obj_norm, mod)

res_table <- MRfulltable(regres, number = length(rownames(ASV_table)))

write.table(res_table, file = args[3], quote = F, sep = "\t", col.names = NA)

T-test-rare

T-test-rarefaction是一种用于微生物组数据分析的统计方法，它结合了两种技术：t检验和稀释抽样（rarefaction）。这种方法特别适用于处理微生物群落的计数数据，尤其是在样本数量有限或样本间测序深度不均一的情况下。以下是T-test-rarefaction方法的基本原理：
稀释抽样（Rarefaction）：在微生物组数据中，不同样本可能具有不同的测序深度（即读段总数）。为了使样本之间具有可比性，可以使用稀释抽样技术，即从每个样本中随机抽取相同数量的读段进行分析，从而模拟所有样本具有相同的测序深度。
数据标准化：在稀释抽样之后，数据通常需要进行标准化处理，以确保不同样本间的比较是公平的，这可以通过将读段计数转换为相对丰度来实现。
t检验：在数据经过稀释抽样和标准化之后，使用t检验来比较两组样本中特定微生物分类单元（如OTU或物种）的丰度是否存在显著差异。t检验是一种常用的统计方法，用于确定两组独立样本的均值是否存在显著差异。
重复稀释抽样：为了评估t检验结果的稳健性，可以多次进行稀释抽样，每次抽样后都进行t检验，然后计算显著性结果的一致性。
效应量计算：T-test-rarefaction方法还会计算效应量（如Cohen's d）来衡量两组样本均值差异的实际重要性。
多重检验校正：由于同时对多个分类单元进行t检验，T-test-rarefaction使用多重检验校正方法（如Benjamini-Hochberg方法）来控制假发现率（FDR）。
结果解释：T-test-rarefaction提供P值、校正后的P值、效应量等结果，帮助研究者识别和解释数据中的生物学意义。

### Run t-test on rarefied table

args <- commandArgs(trailingOnly = TRUE)

if (length(args) <= 2) {
  stop("At least three arguments must be supplied", call. = FALSE)
}

con <- file(args[1])
file_1_line1 <- readLines(con, n = 1)
close(con)

if (grepl("Constructed from biom file", file_1_line1)) {
  ASV_table <- read.table(args[1], sep = "\t", skip = 1, header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
} else {
  ASV_table <- read.table(args[1], sep = "\t", header = T, row.names = 1,
                          comment.char = "", quote = "", check.names = F)
}

groupings <- read.table(args[2], sep = "\t", row.names = 1, header = T,
                        comment.char = "", quote = "", check.names = F)

# number of samples
sample_num <- length(colnames(ASV_table))
grouping_num <- length(rownames(groupings))

if (sample_num != grouping_num) {
  message("The number of samples in the ASV table and the groupings table are unequal")
  message("Will remove any samples that are not found in either the ASV table or the groupings table")
}

if (identical(colnames(ASV_table), rownames(groupings)) == T) {
  message("Groupings and ASV table are in the same order")
} else {
  rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
  groupings <- groupings[rows_to_keep, , drop = F]
  ASV_table <- ASV_table[, rows_to_keep]
  if (identical(colnames(ASV_table), rownames(groupings)) == T) {
    message("Groupings table was re-arranged to be in the same order as the ASV table")
    message("A total of ", sample_num - length(colnames(ASV_table)), " samples were removed from the ASV_table")
    message("A total of ", grouping_num - length(rownames(groupings)), " samples were removed from the groupings table")
  } else {
    stop("Unable to match samples between the ASV table and groupings table")
  }
}

colnames(groupings)[1] <- "places"

# apply t-test to each ASV in the rarefied table
pvals <- apply(ASV_table, 1, function(x) t.test(x ~ groupings[, 1])$p.value)

write.table(pvals, file = args[[3]], sep = "\t", col.names = NA, quote = F)

更多内容请前往《识别差异微生物的方法汇总》。

总结

该研究证实，上述不同的差异分析方法识别出的显著ASV（扩增序列变体）在数量和集合上截然不同，且结果依赖于数据预处理。对于许多方法来说，识别出的特征数量与数据的某些方面相关，如样本量、测序深度以及组间群落差异的效应量。ALDEx2和ANCOM-II在不同研究中产生的结果最一致，并且与不同方法结果交集的一致性最好。
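上文多次提到的CLR（centered log-ratio）变换可以用几行R代码演示。下面是一个示意性实现（非上述任何工具的官方实现）：假设输入是"物种 × 样本"的计数矩阵，并加入伪计数以避免log(0)；函数名clr_transform为本文自拟。

```r
# CLR变换示意：x_i -> log(x_i / 几何均值)，按列（样本）进行
# 假设 mat 为 物种 x 样本 的计数矩阵；pseudocount 用于避免 log(0)
clr_transform <- function(mat, pseudocount = 0.5) {
  mat <- mat + pseudocount
  # log(x) - mean(log(x)) 等价于 log(x / GeometricMean(x))
  apply(mat, 2, function(x) log(x) - mean(log(x)))
}

# 小例子：3 个物种 x 2 个样本（含零值）
mat <- matrix(c(10, 0, 5, 2, 8, 1), nrow = 3,
              dimnames = list(paste0("ASV", 1:3), c("S1", "S2")))
clr_mat <- clr_transform(mat)
# 每个样本（列）内 CLR 值之和为 0，这是 CLR 变换的基本性质
colSums(clr_mat)
```

变换后的数据不再受限于simplex空间，可直接用于常规的欧式空间分析方法（如PCA或t检验）。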