Guide to the ANNOVAR and SnpEff variant annotation result tables for human databases


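  •  Column headers in the ANNOVAR annotation output:
ID Description
Chr Chromosome
Start Start position of the variant on the chromosome
End End position of the variant on the chromosome
Ref Reference genome allele
Alt Variant (alternate) allele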
Func.refGene Region in which the variant falls (exonic, splicing, UTR5, UTR3, intronic, ncRNA_exonic, ncRNA_intronic, ncRNA_UTR3, ncRNA_UTR5, ncRNA_splicing, upstream, downstream, intergenic)
Gene.refGene Gene name(s) associated with the variant (only genes whose transcripts match the function in the Func column are listed). If Func is intergenic, the two flanking genes are listed
GeneDetail.refGene Details for variants in UTR, splicing, ncRNA_splicing, or intergenic regions. The column is empty when Func is exonic, ncRNA_exonic, intronic, ncRNA_intronic, upstream, downstream, upstream;downstream, ncRNA_UTR3, or ncRNA_UTR5. When Func is intergenic, the format is dist=1366;dist=22344, giving the distance from the variant to each of the two flanking genes
ExonicFunc.refGene Type of exonic SNV or indel. SNV types: synonymous_SNV, missense_SNV, stopgain_SNV, stoploss_SNV, and unknown. Indel types: frameshift insertion, frameshift deletion, stopgain, stoploss, nonframeshift insertion, nonframeshift deletion, and unknown
AAChange.refGene Amino acid change; populated only when Func is exonic or exonic;splicing. Annotated per transcript. For example, in NADK:NM_001198995:exon10:c.1240_1241insAGG:p.G414delinsEG, NADK is the gene containing the variant, NM_001198995 is the transcript ID, exon10 means the variant lies in exon 10 of that transcript, c.1240_1241insAGG means AGG is inserted between cDNA positions 1240 and 1241, and p.G414delinsEG means the amino acid at protein position 414 changes from Gly to Glu-Gly. As another example, in FMN2:NM_020066:exon1:c.160_162del:p.54_54del, c.160_162del means cDNA positions 160 to 162 are deleted, and p.54_54del means the amino acid at protein position 54 is deleted
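cytoBand Cytogenetic band containing the variant (as observed with Giemsa staining)
genomicSuperDups Segmental duplications in the genome
nci60 NCI-60 human tumor cell line panel exome sequencing allele frequency data
esp6500siv2_all Alternative allele frequency across all individuals in the NHLBI Exome Sequencing Project (NHLBI-ESP project; the esp6500si_all database covers SNPs, indels, and variants on chromosome Y)
ALL.sites.2015_08 Alternative allele frequency at this site across all populations in the 1000 Genomes Project data (August 2015 release)
EAS.sites.2015_08 Alternative allele frequency at this site in the East Asian populations of the 1000 Genomes Project data (August 2015 release)
SAS.sites.2015_08 Alternative allele frequency at this site in the South Asian populations of the 1000 Genomes Project data (August 2015 release)
avsnp150 ID of the variant in dbSNP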
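SIFT_score SIFT score for the effect of the variant on the protein sequence. Lower scores are more "deleterious", meaning the SNP is more likely to alter protein structure or function
SIFT_pred D: Deleterious (sift<=0.05); T: tolerated (sift>0.05)
Polyphen2_HDIV_score PolyPhen2 prediction of the effect of the variant on the protein sequence, trained on the HumanDiv dataset and intended for complex diseases. Higher scores are more "deleterious", meaning the SNP is more likely to alter protein structure or function
Polyphen2_HDIV_pred D, P, or B (D: Probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452))
Polyphen2_HVAR_score PolyPhen2 prediction of the effect of the variant on the protein sequence, trained on the HumanVar dataset and intended for monogenic (Mendelian) diseases. Higher scores are more "deleterious", meaning the SNP is more likely to alter protein structure or function
Polyphen2_HVAR_pred D, P, or B (D: Probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.908); B: benign (pp2_hvar<=0.446))
LRT_score LRT score for the effect of the variant on the protein sequence. Lower scores are more "deleterious", meaning the SNP is more likely to alter protein structure or function
LRT_pred D, N, or U (D: Deleterious; N: Neutral; U: Unknown)
MutationTaster_score MutationTaster score for the effect of the variant on the protein sequence. Higher scores are more "deleterious", meaning the SNP is more likely to alter protein structure or function
MutationTaster_pred A ("disease_causing_automatic"); D ("disease_causing"); N ("polymorphism"); P ("polymorphism_automatic")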
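MutationAssessor_score Pathogenicity score predicted by MutationAssessor
MutationAssessor_pred Categorical prediction that MutationAssessor derives from its thresholds: H is a high-confidence pathogenic site, M is a medium-confidence pathogenic site, L is a low-confidence pathogenic site, and N is a neutral site
FATHMM_score Pathogenicity score predicted by FATHMM
FATHMM_pred Categorical prediction that FATHMM derives from its threshold: D: Deleterious; T: Tolerated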
RadialSVM_score higher score denoting more deleterious variants
RadialSVM_pred D: Deleterious; T: Tolerated
LR_score higher score denoting more deleterious variants
LR_pred D: Deleterious; T: Tolerated
VEST3_score Variant effect scoring tool;Random forest classifier, higher values are more deleterious
CADD_raw CADD raw score
CADD_phred CADD phred-like score; higher values are more deleterious
GERP++_RS GERP++ "rejected substitutions" (RS) score; higher scores are more deleterious
phyloP46way_placental higher scores are more deleterious
phyloP100way_vertebrate higher scores are more deleterious
SiPhy_29way_logOdds higher scores are more deleterious
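dgvMerged Human structural variation annotation: http://dgv.tcag.ca/dgv/app/home
phastConsElements100way Conserved regions predicted by the phastCons program from whole-genome vertebrate alignments; 100way means 100 species were used
omim_201806 Annotation from the Mendelian disease database (OMIM)
cosmic70 Database of somatic mutations in human cancer; IDs beginning with COSM can be looked up at https://cancer.sanger.ac.uk/cosmic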
CLNALLELEID the ClinVar Allele ID
CLNDN ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB
CLNDISDB Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN
CLNREVSTAT ClinVar review status for the Variation ID
CLNSIG Clinical significance for this single variant
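gwasCatalog Whether the variant has been reported in previous GWAS studies and which diseases it is associated with; "." means no GWAS report
HGMD HGMD annotation
Allele_frequency Allele frequency of the variant allele in the sample
QUAL Variant quality score
FORMAT Usually GT:AD:DP:GQ:PL; describes the fields of the sample column
sample Sample information column; for details see https://www.omicsclass.com/article/6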
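The FORMAT and sample columns pair up field by field: each key in FORMAT names the value at the same position in the sample column. A minimal Python sketch of that pairing follows. The helper names `parse_sample` and `alt_allele_fraction` are hypothetical, and computing an allele fraction from AD (alternate depth divided by total depth) is an assumption for illustration; the article does not say how the Allele_frequency column is derived.

```python
from typing import Dict, Optional


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair each FORMAT key (e.g. GT, AD, DP) with its value in the sample column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


def alt_allele_fraction(sample: Dict[str, str]) -> Optional[float]:
    """Fraction of reads supporting the first alternate allele, based on AD.

    Assumption: AD is "ref_depth,alt_depth[,...]". Returns None when AD is
    missing ("."), malformed, or the total depth is zero.
    """
    ad = sample.get("AD", ".")
    try:
        depths = [int(x) for x in ad.split(",")]
    except ValueError:
        return None
    total = sum(depths)
    if len(depths) < 2 or total == 0:
        return None
    return depths[1] / total


if __name__ == "__main__":
    # Made-up sample value for illustration only.
    fields = parse_sample("GT:AD:DP:GQ:PL", "0/1:12,8:20:99:255,0,300")
    print(fields["GT"], alt_allele_fraction(fields))
```

Keeping the parsed values as strings leaves it to the caller to decide how to treat missing entries such as "./." genotypes.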



ANNOVAR can annotate human variants against many more databases; only some of them are listed here. The following is excerpted from https://brb.nci.nih.gov/seqtools/colexpanno.html

We provide here a detailed description of the files output by the mutation annotators ANNOVAR and SnpEff.

Chr Chromosome number
Start Start position
End End position
Ref Reference base(s)
Alt Alternate non-reference alleles called on at least one of the samples
COSMIC ID COSMIC ID
Func.refGene Regions (e.g., exonic, intronic, non-coding RNA) that a variant hits.
Gene.refGene Gene name associated with a variant
ExonicFunc.refGene Exonic variant function, e.g., nonsynonymous, synonymous, frameshift insertion.
AAChange.refGene Amino acid change. For example, SAMD11:NM_152486:exon10:c.T1027C:p.W343R stands for gene name, known RefSeq accession, region, cDNA-level change, and protein-level change.
SIFT_score SIFT score. See the dbNSFP information table for details.
SIFT_pred SIFT prediction. See the dbNSFP information table for details.
Polyphen2_HDIV_score PolyPhen2 score based on HDIV. See the dbNSFP information table for details.
Polyphen2_HDIV_pred PolyPhen2 prediction based on HDIV. See the dbNSFP information table for details.
Polyphen2_HVAR_score PolyPhen2 score based on HVAR. See the dbNSFP information table for details.
Polyphen2_HVAR_pred PolyPhen2 prediction based on HVAR. See the dbNSFP information table for details.
LRT_score LRT score. See the dbNSFP information table for details.
LRT_pred LRT prediction. See the dbNSFP information table for details.
MutationTaster_score MutationTaster score. See the dbNSFP information table for details.
MutationTaster_pred MutationTaster prediction. See the dbNSFP information table for details.
MutationAssessor_score MutationAssessor score. See the dbNSFP information table for details.
MutationAssessor_pred MutationAssessor prediction. See the dbNSFP information table for details.
FATHMM_score FATHMM score. See the dbNSFP information table for details.
FATHMM_pred FATHMM prediction. See the dbNSFP information table for details.
PROVEAN_score PROVEAN score. See the dbNSFP information table for details.
PROVEAN_pred PROVEAN prediction. See the dbNSFP information table for details.
VEST3_score VEST V3 score. See the dbNSFP information table for details.
CADD_raw CADD raw score. See the dbNSFP information table for details.
CADD_phred CADD phred-like score. See the dbNSFP information table for details.
DANN_score DANN score. See the dbNSFP information table for details.
fathmm-MKL_coding_score fathmm-MKL score for a coding variant. See the dbNSFP information table for details.
fathmm-MKL_coding_pred fathmm-MKL prediction for a coding variant. See the dbNSFP information table for details.
MetaSVM_score MetaSVM score. See the dbNSFP information table for details.
MetaSVM_pred MetaSVM prediction. See the dbNSFP information table for details.
MetaLR_score MetaLR score. See the dbNSFP information table for details.
MetaLR_pred MetaLR prediction. See the dbNSFP information table for details.
integrated_fitCons_score fitCons score. See the dbNSFP information table for details.
integrated_confidence_value Confidence level. See the dbNSFP information table for details.
GERP++_RS GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details.
phyloP7way_vertebrate Phylogenetic p-values for 7 vertebrate species. See the dbNSFP information table for details.
phyloP20way_mammalian Phylogenetic p-values for 20 mammalian species. See the dbNSFP information table for details.
phastCons7way_vertebrate PhastCons score for 7 vertebrate species. See the dbNSFP information table for details.
phastCons20way_mammalian PhastCons score for 20 mammalian species. See the dbNSFP information table for details.
SiPhy_29way_logOdds SiPhy log odds score for 29 species. See the dbNSFP information table for details.
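An AAChange.refGene value is a colon-delimited record (gene, transcript accession, exon, cDNA change, protein change). The following Python sketch splits such a value into named fields. It assumes that several transcripts in one cell are comma-separated and that "." marks an empty cell; `parse_aachange` is a hypothetical helper name, not an ANNOVAR function.

```python
from typing import Dict, List


def parse_aachange(value: str) -> List[Dict[str, str]]:
    """Split an AAChange.refGene cell into one record per transcript."""
    records: List[Dict[str, str]] = []
    # Assumption: "." or an empty string means the column has no annotation.
    if value in ("", "."):
        return records
    for entry in value.split(","):
        parts = entry.split(":")
        if len(parts) < 4:
            # Skip entries that do not follow the gene:transcript:exon:cDNA layout.
            continue
        records.append(
            {
                "gene": parts[0],
                "transcript": parts[1],
                "exon": parts[2],
                "cdna": parts[3],
                # The protein-level change can be absent for some entries.
                "protein": parts[4] if len(parts) > 4 else "",
            }
        )
    return records


if __name__ == "__main__":
    for record in parse_aachange("SAMD11:NM_152486:exon10:c.T1027C:p.W343R"):
        print(record)
```

Returning a list, even for a single transcript, keeps downstream code the same whether a variant hits one transcript or several.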
  • Column headers in the SnpEff annotation output
CHROM Chromosome number
POS Position
ID Semicolon-separated list of unique identifiers where available. If this is a dbSNP variant, using the rs number(s) is encouraged.
REF Reference base(s)
ALT Alternate non-reference alleles called on at least one of the samples
EFFECT Functional consequences of a variant, e.g., missense_variant, synonymous_variant.
REGION Regions (e.g., exonic, intronic) that a variant hits
IMPACT Putative impact of the variant (e.g., HIGH, MODERATE, or LOW impact).
GENE Gene name (usually HUGO)
GENEID Gene ID
FEATURE Type of the feature named in the next field (e.g., transcript, motif, miRNA)
FEATUREID Transcript ID (preferably using version number), Motif ID, miRNA, ChipSeq peak, Histone mark, depending on the annotation.
BIOTYPE Description of whether the transcript is {"Coding", "Noncoding"}. Whenever possible, ENSEMBL biotypes are used.
HGVS_C Variant in HGVS notation (DNA level). For example, c.352A>G stands for an A-to-G substitution at nucleotide 352.
HGVS_P Coding variant in HGVS notation (protein level). For example, p.Ile118Val stands for a substitution of isoleucine at position 118 by valine. p.Ile118Val can also be written as p.I118V using one-letter amino acid symbols.
SIFT_score SIFT score. See the dbNSFP information table for details.
SIFT_pred SIFT prediction. See the dbNSFP information table for details.
Polyphen2_HDIV_score PolyPhen2 score based on HDIV. See the dbNSFP information table for details.
Polyphen2_HDIV_pred PolyPhen2 prediction based on HDIV. See the dbNSFP information table for details.
Polyphen2_HVAR_score PolyPhen2 score based on HVAR. See the dbNSFP information table for details.
Polyphen2_HVAR_pred PolyPhen2 prediction based on HVAR. See the dbNSFP information table for details.
LRT_score LRT score. See the dbNSFP information table for details.
LRT_pred LRT prediction. See the dbNSFP information table for details.
MutationTaster_score MutationTaster score. See the dbNSFP information table for details.
MutationTaster_pred MutationTaster prediction. See the dbNSFP information table for details.
MutationAssessor_score MutationAssessor score. See the dbNSFP information table for details.
MutationAssessor_pred MutationAssessor prediction. See the dbNSFP information table for details.
FATHMM_score FATHMM score. See the dbNSFP information table for details.
FATHMM_pred FATHMM prediction. See the dbNSFP information table for details.
PROVEAN_score PROVEAN score. See the dbNSFP information table for details.
PROVEAN_pred PROVEAN prediction. See the dbNSFP information table for details.
VEST3_score VEST V3 score. See the dbNSFP information table for details.
CADD_raw CADD raw score. See the dbNSFP information table for details.
CADD_phred CADD phred-like score. See the dbNSFP information table for details.
MetaSVM_score MetaSVM score. See the dbNSFP information table for details.
MetaSVM_pred MetaSVM prediction. See the dbNSFP information table for details.
MetaLR_score MetaLR score. See the dbNSFP information table for details.
MetaLR_pred MetaLR prediction. See the dbNSFP information table for details.
GERP++_NR GERP++ neutral rate (NR). See the dbNSFP information table for details.
GERP++_RS GERP++ "rejected substitutions" (RS) score. See the dbNSFP information table for details.
phyloP100way_vertebrate Phylogenetic p-values for 100 vertebrate species. See the dbNSFP information table for details.
phastCons100way_vertebrate PhastCons score for 100 vertebrate species. See the dbNSFP information table for details.
SiPhy_29way_logOdds SiPhy log odds score for 29 species. See the dbNSFP information table for details.
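The HGVS_P column uses three-letter amino acid codes, while AAChange.refGene uses one-letter codes, so comparing the two outputs needs a conversion. The sketch below handles only simple substitutions such as p.Ile118Val and returns None for anything else (frameshifts, deletions, and so on). The function name `hgvs_p_to_one_letter` is hypothetical, and using "*" for Ter is an assumption about the desired one-letter stop symbol.

```python
import re
from typing import Optional

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",  # assumption: represent a stop codon as "*"
}

# Matches simple substitutions only, e.g. p.Ile118Val.
_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def hgvs_p_to_one_letter(hgvs_p: str) -> Optional[str]:
    """Convert p.Ile118Val to p.I118V; return None for unsupported notations."""
    match = _SUBSTITUTION.match(hgvs_p)
    if match is None:
        return None
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        return None
    return f"p.{THREE_TO_ONE[ref]}{pos}{THREE_TO_ONE[alt]}"


if __name__ == "__main__":
    print(hgvs_p_to_one_letter("p.Ile118Val"))
```

Returning None instead of guessing keeps more complex HGVS expressions from being silently mistranslated.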

  • Detailed information

SIFT_pred, SIFT_score
Tool: SIFT (Sorting Intolerant From Tolerant)
Method: P(an amino acid at a position is tolerated | the most frequent amino acid being tolerated)
Prediction: D: Deleterious (sift<=0.05); T: tolerated (sift>0.05)
Author: Pauline Ng
Affiliation: Fred Hutchinson Cancer Research Center, Seattle, Washington

Polyphen2_HDIV_pred, Polyphen2_HDIV_score
Tool: PolyPhen v2 (Polymorphism Phenotyping v2)
Method: Probabilistic classifier; training set: HumDiv
Prediction: D: Probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452)
Affiliation: Harvard Medical School

Polyphen2_HVAR_pred, Polyphen2_HVAR_score
Tool: PolyPhen v2 (Polymorphism Phenotyping v2)
Method: Machine learning; training set: HumVar
Prediction: D: Probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.908); B: benign (pp2_hvar<=0.446)
Author: Shamil Sunyaev
Affiliation: Harvard Medical School

LRT_pred, LRT_score
Tool: LRT (Likelihood ratio test)
Method: LRT of H0: each codon evolves neutrally vs. H1: the codon evolves under negative selection
Prediction: D: Deleterious; N: Neutral; U: Unknown
Score: lower scores are more deleterious
Author: Sung Chung, Justin Fay
Affiliation: Washington University

MutationTaster_pred, MutationTaster_score
Tool: MutationTaster
Method: Bayes classifier
Prediction: A ("disease_causing_automatic"); D ("disease_causing"); N ("polymorphism", probably harmless); P ("polymorphism_automatic", known to be harmless)
Score: higher values are more deleterious
Author: Markus Schuelke
Affiliation: Charité - Universitätsmedizin Berlin

MutationAssessor_pred, MutationAssessor_score
Tool: MutationAssessor
Method: Entropy of multiple sequence alignment
Prediction: H: high; M: medium; L: low; N: neutral. H/M means functional and L/N means non-functional
Score: higher values are more deleterious
Author: Boris Reva
Affiliation: Computational Biology Center, Memorial Sloan Kettering Cancer Center

FATHMM_pred, FATHMM_score
Tool: FATHMM (Functional Analysis Through Hidden Markov Models)
Method: HMM
Prediction: D: Deleterious; T: Tolerated
Score: lower values are more deleterious
Author: Hashem Shihab
Affiliation: University of Bristol, UK

PROVEAN_pred, PROVEAN_score
Tool: PROVEAN (Protein Variation Effect Analyzer)
Method: Clustering of homologous sequences
Prediction: D: Deleterious; N: Neutral
Score: lower values are more deleterious
Author: Y. Choi
Affiliation: J. Craig Venter Institute

VEST3_score
Tool: VEST V3 (Variant Effect Scoring Tool)
Method: Random forest classifier
Score: higher values are more deleterious
Author: Rachel Karchin
Affiliation: Johns Hopkins University

CADD_raw, CADD_phred
Tool: CADD (Combined Annotation Dependent Depletion)
Method: Linear kernel SVM
Score: higher values are more deleterious
Author: Jay Shendure
Affiliation: University of Washington

DANN_score
Tool: DANN (Deleterious Annotation of genetic variants using Neural Networks)
Method: Neural network
Score: higher values are more deleterious
Author: Xiaohui Xie
Affiliation: University of California - Irvine

fathmm-MKL_coding_pred
Tool: FATHMM-MKL (predicts the effects of both coding and non-coding variants using nucleotide-based HMMs)
Method: Classifier based on multiple kernel learning
Prediction: D: Deleterious (score >= 0.5); T: Tolerated (score < 0.5)
Author: Hashem Shihab
Affiliation: University of Bristol, UK

MetaSVM_pred, MetaSVM_score
Tool: MetaSVM
Method: Support vector machine
Prediction: D: Deleterious; T: Tolerated
Score: higher scores are more deleterious
Author: Coco Dong
Affiliation: USC Biostatistics Department

MetaLR_pred, MetaLR_score
Tool: MetaLR
Method: Logistic regression
Prediction: D: Deleterious; T: Tolerated
Score: higher scores are more deleterious
Author: Coco Dong
Affiliation: USC Biostatistics Department

integrated_fitCons_score, integrated_confidence_value
Tool: FitCons (Fitness consequences of functional annotation)
Method: Integrates functional assays such as ChIP-Seq with conservation measures of transcription factor binding sites
Score: higher scores are more deleterious
Author: Abriza
Affiliation: Cold Spring Harbor Lab

GERP++_RS, GERP++_NR
Tool: GERP++ (Genomic Evolutionary Rate Profiling ++)
Method: Maximum likelihood estimation procedure
Score: higher scores are more deleterious
Author: Eugene Davydov
Affiliation: Stanford University, CS Department

phyloP7way_vertebrate
Tool: PhyloP (Phylogenetic p-values)
Method: Phylogenetic p-values calculated from an LRT, a score-based test, or a GERP test; uses 7 species
Score: higher scores are more deleterious
Author: Adam Siepel
Affiliation: UCSC

phyloP20way_mammalian
Tool: PhyloP (Phylogenetic p-values)
Method: A phylogenetic hidden Markov model (phylo-HMM); uses 20 species
Score: higher scores are more deleterious
Author: Adam Siepel
Affiliation: UCSC

phastCons7way_vertebrate
Tool: phastCons
Method: A phylogenetic hidden Markov model (phylo-HMM); uses 7 species
Score: higher scores are more deleterious
Author: Adam Siepel
Affiliation: UCSC

phastCons20way_mammalian
Tool: phastCons
Method: A phylogenetic hidden Markov model (phylo-HMM); uses 20 species
Score: higher scores are more deleterious
Author: Adam Siepel
Affiliation: UCSC

SiPhy_29way_logOdds
Tool: SiPhy
Method: Probabilistic framework, HMM; uses 29 species
Score: higher scores are more deleterious
Author: Manuel Garber
Affiliation: Broad Institute of MIT & Harvard
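The categorical predictions for SIFT, PolyPhen2 HDIV, and fathmm-MKL are simple cutoffs on the score, so they can be recomputed from the score columns. The sketch below applies the thresholds quoted above. The function names are hypothetical, and values that fall between two published bounds (for example between 0.452 and 0.453) are assigned to the lower category as an assumption.

```python
def sift_pred(score: float) -> str:
    """D (deleterious) when the SIFT score is <= 0.05, otherwise T (tolerated)."""
    return "D" if score <= 0.05 else "T"


def polyphen2_hdiv_pred(score: float) -> str:
    """D: probably damaging (>= 0.957); P: possibly damaging (>= 0.453); B: benign."""
    if score >= 0.957:
        return "D"
    if score >= 0.453:
        return "P"
    return "B"


def fathmm_mkl_pred(score: float) -> str:
    """D (deleterious) when the fathmm-MKL score is >= 0.5, otherwise T (tolerated)."""
    return "D" if score >= 0.5 else "T"


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(sift_pred(0.01), polyphen2_hdiv_pred(0.60), fathmm_mkl_pred(0.93))
```

Note that the direction differs by tool: a low SIFT score is deleterious, while a high PolyPhen2 or fathmm-MKL score is deleterious, so the cutoffs cannot be shared across columns.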


  • Published 2018-09-28 20:06
  • Category: Clinical medicine
  • Author: omicsgene
