clusterProfiler非模式物种 KEGG与GO富集分析

非模式物种 KEGG与GO富集分析

1.问题

做基因的GO或者kegg富集分析，需要基因组当中所有基因的GO与KEGG数据库的注释信息，对于做模式物种的人来说很简单，有现成的注释结果，直接使用就可以，比如人里面可以直接用clusterProfiler进行基因集的富集分析；但是，对于非模式动物与植物的研究对象往往没有现成的注释结果，就没法直接进行富集分析；

因此学会基因功能批量注释非常重要；这里介绍一种方法可以针对所有的非模式物种进行GO和KEGG富集分析；

2.基因批量注释

使用eggNOG对基因组进行注释，要想进行富集分析，首先要有背景数据集的GO注释和KEGG注释，这里选用eggNOG进行注释。

是在线服务器，点点鼠标上传就能注释，无需复杂配置。eggNOG虽然是web server，但一次最多可以注释10万条序列，应该是完全可以满足需求的。

网站：http://eggnog-mapper.embl.de/

将自己的基因对应的cds序列或者蛋白序列提交到该网站即可注释：

有时候在线注释排队等候时间太久：

可以参考这里本地计算机批量注释：https://www.omicsclass.com/article/1515

3.GO富集分析需要准备的文件：

研究物种基因组中所有基因对应的GO文件：

go2gene.tsv : 通过eggNOG注释结果文件整理得到

GO	GENE	CLASS
GO:0000165	Pg_S3686.2	biological_process
GO:0003674	Pg_S3686.2	molecular_function
...	...	...

go2name.tsv：GO term对应的功能描述文件

首先需要去GO下载GO的obo文件,这里我使用go-basic.obo然后我写了个脚本可以把obo文件解析为如下格式：

http://purl.obolibrary.org/obo/go/go-basic.obo

GO	DESC	CLASS
GO:0000001	mitochondrion inheritance	biological_process
GO:0000007	low-affinity zinc ion transmembrane transporter activity	molecular_function

4. KEGG富集分析需要的文件

ko2gene.tsv : 通过eggNOG注释结果文件整理得到

Pathway	GENE
ko00920	Pg_S3686.2
ko01100	Pg_S33386.2

pathway2name.tsv ko通路对应的名称: https://www.genome.jp/kegg/pathway.html

pathway	DESC
ko00440	Phosphonate and phosphinate metabolism
ko00450	Selenocompound metabolism
ko00460	Cyanoamino acid metabolism
ko00471	D-Glutamine and D-glutamate metabolism
ko00472	D-Arginine and D-ornithine metabolism
ko00473	D-Alanine metabolism
ko00480	Glutathione metabolism
ko00510	N-Glycan biosynthesis
ko00513	Various types of N-glycan biosynthesis
ko00512	Mucin type O-glycan biosynthesis

5.富集分析

利用clusterProfiler中的enricher这个通用函数进行富集分析：

library(clusterProfiler)
ko2name <- read.delim('ko2name.tsv', stringsAsFactors=FALSE)
ko2gene <- read.delim('ko2gene.tsv', stringsAsFactors=FALSE)
go2name <- read.delim('gog2name.tsv', stringsAsFactors=FALSE)
go2gene <-read.delim('go2gene.tsv', stringsAsFactors=FALSE)
# 前面获取gene list的过程略
gene_list<- read.delim('gene.tsv', stringsAsFactors=FALSE)
# GO富集
## 拆分成BP，MF，CC三个数据框
go2gene = split(go2gene , with(go2gene , CLASS))
## 以MF为例
enricher（gene_list,TERM2GENE=go2gene [['molecular_function']][c(1,2)],TERM2NAME=go2name ）
# KEGG富集
enricher（gene_list,TERM2GENE=ko2name ,TERM2NAME=ko2gene )

具体代码与操作见视频课程：https://bdtcd.xetslk.com/s/29EigM

发表于 2021-06-30 15:21
阅读 ( 13940 )
分类：转录组