Predicting a gene's conserved domains with NCBI's CD-Search tool

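To identify the conserved domains of a gene, use CD-Search, the tool designed specifically for conserved-domain searches, rather than simply annotating the gene with BLAST. Quickly locating the conserved domains on a gene is very important for studying its function, because genes that perform the same or similar functions often share the same conserved domains.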

1. CDD entry point: https://www.ncbi.nlm.nih.gov/cdd/. The tool is provided by NCBI: on the NCBI home page, select Conserved Domains in the database drop-down menu, then click Search.

Then choose CD-Search (for a single sequence) or Batch CD-Search (for multiple sequences).

[Figure: the CDD page with links to CD-Search and Batch CD-Search]


2. Predicting the conserved domains of a single sequence: click CD-Search to open the search form.

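Paste the gene's protein sequence or nucleotide sequence (either works), making sure it is in FASTA format. If you know the gene's GI or Accession number, you can simply enter that ID instead. In the OPTIONS panel on the right, choose the database to search, the Expect Value, and other settings, or keep the defaults, then click "Submit" to start the search.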

[Figure: the CD-Search submission form]
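If you prepare queries in a script rather than pasting by hand, a minimal sketch of turning a raw, copied sequence into a FASTA record might look like the following. The helper names `to_fasta` and `looks_like_nucleotide` are hypothetical and not part of any NCBI tool; the 60-character line width is a common convention, assumed here rather than required by CD-Search.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record (hypothetical helper)."""
    # Keep letters only: drops spaces, line breaks and position numbers that
    # often come along when a sequence is copied from a GenBank flat file.
    seq = "".join(ch for ch in sequence if ch.isalpha()).upper()
    if not seq:
        raise ValueError("empty sequence")
    header = header.lstrip(">").strip()
    # Wrap the sequence into fixed-width lines (width is an assumed convention).
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def looks_like_nucleotide(sequence: str) -> bool:
    """Rough guess: True if the sequence uses only nucleotide letters."""
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    return bool(letters) and letters <= set("ACGTUN")


if __name__ == "__main__":
    print(to_fasta("my_protein", "mkki aiDLG 123 TTNS"), end="")
    print(looks_like_nucleotide("MKKIAIDLGTTNS"))
```

Either kind of sequence can be pasted into CD-Search; the nucleotide check is only a sanity check that you pasted what you intended.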

3. Viewing the results: results are shown in the Concise display by default; select Full results to show everything.

[Figure: CD-Search results for the example sequence]
The hits include specific hits, non-specific hits, and the superfamilies to which these hits belong. In this example, the gene's specific hit is dnaK, which belongs to the HSP70 superfamily.

4. Searching conserved domains for many sequences at once with Batch CD-Search:

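You can either paste multiple sequences in FASTA format or upload a FASTA file, then choose the database. Because large data sets can take a long time to search, you can enter an email address and receive a notification when the search finishes. Finally, click Submit to start the search: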

[Figure: the Batch CD-Search submission form]
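For very large data sets it can help to split one multi-sequence FASTA file into several smaller submissions. The sketch below is a minimal illustration with hypothetical helper names (`read_fasta`, `batches`, `write_batch`). The default `max_per_batch=1000` is only a placeholder: check the current per-request limit stated on the Batch CD-Search page before choosing a value.

```python
from __future__ import annotations

from typing import Iterator


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse multi-sequence FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records: list[tuple[str, str]],
            max_per_batch: int = 1000) -> Iterator[list[tuple[str, str]]]:
    """Yield consecutive groups of at most max_per_batch records.

    max_per_batch is a placeholder; use the limit documented by NCBI.
    """
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be positive")
    for i in range(0, len(records), max_per_batch):
        yield records[i:i + max_per_batch]


def write_batch(batch: list[tuple[str, str]]) -> str:
    """Turn one batch back into FASTA text, ready to paste or upload."""
    return "".join(f">{h}\n{s}\n" for h, s in batch)
```

Usage: `for n, b in enumerate(batches(read_fasta(text), 500)): ...` writes each `write_batch(b)` to its own file, and each file is then submitted separately.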

5. Batch search results:

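The batch search results are shown below; the page displays only part of them. Click Download to save the detailed results (remember to select Full so that all domain information is included), or click Browse results for a detailed visual view.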

[Figure: part of the Batch CD-Search results table]
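Once the Full results are downloaded, they can be filtered in a script. The sketch below assumes (check this against your own file) that the download is tab-separated, that a few metadata lines precede a header row whose column names match the table described next (`Query`, `Hit type`, `PSSM-ID`, `From`, `To`, `E-Value`, `Bitscore`, `Accession`, `Short name`, `Incomplete`, `Superfamily`), and that specific hits are labelled `Specific`. `DomainHit`, `parse_hit_table` and `specific_hits` are hypothetical names, and the 0.01 cutoff is an illustrative choice.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class DomainHit:
    query: str
    hit_type: str
    pssm_id: str
    start: int        # "From": first aligned residue on the query
    end: int          # "To": last aligned residue on the query
    evalue: float
    bitscore: float
    accession: str
    short_name: str
    incomplete: str   # "N", "C", "NC" or "-"
    superfamily: str  # cl* accession, or "-"


def parse_hit_table(text: str) -> list[DomainHit]:
    """Parse a downloaded hit table (assumed tab-separated layout)."""
    lines = text.splitlines()
    # Skip metadata lines until the assumed header row is reached.
    for idx, line in enumerate(lines):
        if line.startswith("Query\t"):
            break
    else:
        raise ValueError("no 'Query' header row found")
    reader = csv.DictReader(io.StringIO("\n".join(lines[idx:])),
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        if not row.get("Query"):
            continue
        hits.append(DomainHit(
            query=row["Query"], hit_type=row["Hit type"],
            pssm_id=row["PSSM-ID"],
            start=int(row["From"]), end=int(row["To"]),
            evalue=float(row["E-Value"]), bitscore=float(row["Bitscore"]),
            accession=row["Accession"], short_name=row["Short name"],
            incomplete=row["Incomplete"], superfamily=row["Superfamily"],
        ))
    return hits


def specific_hits(hits: list[DomainHit], max_evalue: float = 0.01) -> list[DomainHit]:
    """Keep specific hits at or below an (illustrative) E-value cutoff."""
    return [h for h in hits
            if h.hit_type.lower() == "specific" and h.evalue <= max_evalue]
```

If the column names in your download differ, only the keys in `parse_hit_table` need to change.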



The columns of the results table are as follows:

  • Query: the ID of the input sequence.
  • Hit type: CD-Search results can include hit types that represent different confidence levels (specific hits, non-specific hits) and different domain model scope (superfamilies, multi-domains). All of them appear in both the Concise and the Full display, except non-specific hits, which appear only in the Full display.
  • PSSM-ID: the unique identifier of the domain model's position-specific scoring matrix (PSSM).
  • From..To: the range of amino acids in the query protein sequence to which the domain model aligns. If the alignment found by RPS-BLAST omitted more than 20% of the conserved domain's extent at the N-terminus, the C-terminus, or both, the hit is partial, and this is indicated in the Incomplete column.
  • E-value: the expect value, which indicates the statistical significance of the hit: the number of hits at least this good expected to arise by chance alone. The lower the E-value, the more significant the hit.
  • Bit Score: the alignment score, expressed in bits; higher values indicate better matches.
  • Accession: the accession number of the hit, which can be either a domain model or a superfamily cluster. If the hit is a domain model, the accession (cl*) of the superfamily cluster it belongs to is listed in the Superfamily column of the output file.
  • Short name: a short name that concisely defines the conserved domain. For example, "Voltage gated ClC" is the short title of the NCBI-curated conserved domain model for the voltage-gated chloride channel (cd00400).
  • Incomplete: if the hit to a conserved domain is partial (the alignment found by RPS-BLAST omitted more than 20% of the domain's extent at the N-terminus, the C-terminus, or both), this column contains one of the following values:
      N:  incomplete at the N-terminus
      C:  incomplete at the C-terminus
      NC: incomplete at both the N-terminus and the C-terminus
    If the hit is complete, the column contains a dash (-). Partial hits can also be spotted in the graphical display as domain model cartoons with jagged edges.
  • Superfamily: filled in only for domain models that are specific or non-specific hits; it lists the accession number of the superfamily to which the domain model belongs. If the hit is to a superfamily itself, the column contains a dash, because the superfamily accession is already listed in the Accession column.
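The Incomplete and Superfamily rules above can be restated as two small functions. This is a sketch for illustration only: the hit table reports coordinates on the query, not on the domain model, so the model length and aligned model positions passed to `incompleteness_flag` are hypothetical inputs. Both function names are also hypothetical.

```python
from __future__ import annotations


def incompleteness_flag(model_length: int, aln_model_start: int,
                        aln_model_end: int, threshold: float = 0.20) -> str:
    """Return 'N', 'C', 'NC' or '-' using the >20% rule described above.

    Coordinates are 1-based positions on the domain model (hypothetical
    inputs; the hit table itself lists query coordinates).
    """
    if not (1 <= aln_model_start <= aln_model_end <= model_length):
        raise ValueError("alignment must lie within the domain model")
    # Fraction of the model left unaligned before the start / after the end.
    missing_n = (aln_model_start - 1) / model_length
    missing_c = (model_length - aln_model_end) / model_length
    flag = ("N" if missing_n > threshold else "") + ("C" if missing_c > threshold else "")
    return flag or "-"


def superfamily_of(accession: str, superfamily_column: str) -> str | None:
    """Superfamily accession for a hit.

    Domain-model hits carry it in the Superfamily column; a superfamily hit
    has '-' there, and its own cl* accession is the superfamily.
    """
    if superfamily_column and superfamily_column != "-":
        return superfamily_column
    if accession.startswith("cl"):
        return accession
    return None  # e.g. a hit with no superfamily listed
```

Note that exactly 20% missing does not count as incomplete, because the rule says "more than 20%".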


6. Browsing batch search results visually:

You can select entries and then browse them, download them, and so on:

[Figure: the Browse results view for batch search results]

  • Marchler-Bauer A et al. (2017), "CDD/SPARCLE: functional classification of proteins via subfamily domain architectures." Nucleic Acids Res. 45(D):200-3.
  • Marchler-Bauer A et al. (2015), "CDD: NCBI's conserved domain database." Nucleic Acids Res. 43(D):222-6.
  • Marchler-Bauer A et al. (2011), "CDD: a Conserved Domain Database for the functional annotation of proteins." Nucleic Acids Res. 39(D):225-9.
  • Marchler-Bauer A, Bryant SH (2004), "CD-Search: protein domain annotations on the fly." Nucleic Acids Res. 32(W):327-331.
  • Published 2018-07-23 12:11
  • Category: Software tools

Author: omicsgene