HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
HMMER is often used together with a profile database, such as Pfam or many of the databases that participate in Interpro. But HMMER can also work with query sequences, not just profiles, just like BLAST. For example, you can search a protein query sequence against a database with phmmer, or do an iterative search with jackhmmer.
HMMER is designed to detect remote homologs as sensitively as possible, relying on the strength of its underlying probability models. In the past, this strength came at significant computational expense, but as of the new HMMER3 project, HMMER is now essentially as fast as BLAST.
HMMER can be downloaded and installed as a command line tool on your own hardware, and now it is also more widely accessible to the scientific community via new search servers at the European Bioinformatics Institute.
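The text above is the introduction from the official website. In short, HMMER is a toolkit that uses profile hidden Markov models to search sequence databases for homologous sequences and to build sequence alignments, and it aims for higher sensitivity than BLAST at a comparable speed. The commonly used Pfam database (now merged into InterPro, although the InterPro download is a compressed file) is distributed in a format that HMMER can work with.
The latest version of HMMER includes the following tools: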
hmmbuild: build profile from input multiple alignment
hmmalign: make multiple sequence alignment using a profile
hmmsearch: search profile against sequence database
hmmscan: search sequence against profile database
hmmpress: prepare profile database for hmmscan
phmmer: search single sequence against sequence database
jackhmmer: iteratively search single sequence against database
nhmmer: search DNA query against DNA sequence database
nhmmscan: search DNA sequence against a DNA profile database
hmmfetch: retrieve profile(s) from a profile file
hmmstat: show summary statistics for a profile file
hmmemit: generate (sample) sequences from a profile
hmmlogo: produce a conservation logo graphic from a profile
hmmconvert: convert between different profile file formats
hmmpgmd: search daemon for the hmmer.org website
hmmpgmd_shard: sharded search daemon for the hmmer.org website
makehmmerdb: prepare an nhmmer binary database
hmmsim: collect score distributions on random sequences
alimask: add column mask to a multiple sequence alignment
Of these, the ones we use most often are hmmsearch, hmmbuild, hmmscan, and hmmalign.
HMMER: hmmsearch usage, annotated
Usage: hmmsearch [options] <hmmfile> <seqdb>
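The usage line puts every option before the two positional arguments (the profile file, then the sequence database). As a minimal sketch of assembling such a command from Python, assuming hypothetical file names (`query.hmm`, `proteins.fasta`, `search.out`) that are not from the original text:

```python
import shlex


def build_hmmsearch_command(hmm_file, seq_db, options=None):
    """Return the argument list for: hmmsearch [options] <hmmfile> <seqdb>."""
    argv = ["hmmsearch"]
    argv.extend(options or [])        # options come before the positional arguments
    argv.extend([hmm_file, seq_db])   # profile file first, then the sequence database
    return argv


# Hypothetical file names, used only for illustration.
cmd = build_hmmsearch_command(
    "query.hmm", "proteins.fasta", ["--cpu", "4", "-o", "search.out"]
)
print(shlex.join(cmd))
```

On a machine where HMMER is installed, the list could be passed to `subprocess.run(cmd, check=True)` to run the search.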
Basic options:
-h : show brief help on version and usage
Print a brief help message covering the version and usage.
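Options directing output:
-o <f> : direct output to file <f>, not stdout
Write the main output to file <f> instead of standard output.
-A <f> : save multiple alignment of all hits to file <f>
Save a multiple sequence alignment of all hits to file <f>.
--tblout <f> : save parseable table of per-sequence hits to file <f>
Save a parseable table with one row per sequence hit to file <f>.
--domtblout <f> : save parseable table of per-domain hits to file <f>
Save a parseable table with one row per domain hit to file <f>.
--pfamtblout <f> : save table of hits and domains to file, in Pfam format <f>
Save a table of hits and domains to file <f>, in Pfam format.
--acc : prefer accessions over names in output
Show accessions instead of names in the output where accessions are available.
--noali : don't output alignments, so output is smaller
Leave the alignments out, which makes the output smaller.
--notextw : unlimit ASCII text output line width
Remove the limit on the width of ASCII text output lines.
--textw <n> : set max width of ASCII text output lines [120] (n>=120)
Set the maximum width of ASCII text output lines (default 120; n must be at least 120).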
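The table written by --tblout is meant to be read by scripts. Below is a minimal Python sketch of a parser, assuming the column layout described in the HMMER user guide: comment lines start with `#`, and each hit has 18 whitespace-separated fields followed by a free-text description. The two sample rows are invented for illustration and are not real search results.

```python
from typing import Iterable, List, NamedTuple


class SequenceHit(NamedTuple):
    target: str
    query: str
    evalue: float   # full-sequence E-value
    score: float    # full-sequence bit score
    bias: float     # full-sequence bias correction
    description: str


def parse_tblout(lines: Iterable[str]) -> List[SequenceHit]:
    """Parse per-sequence hits, skipping comment and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Assumed layout: 18 fixed columns, then the target description.
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            raise ValueError(f"expected at least 18 columns, got {len(fields)}")
        description = fields[18].strip() if len(fields) > 18 else ""
        hits.append(SequenceHit(
            target=fields[0],
            query=fields[2],
            evalue=float(fields[4]),
            score=float(fields[5]),
            bias=float(fields[6]),
            description=description,
        ))
    return hits


# Invented rows in the assumed layout (not real output).
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  E-value  score  bias  exp reg clu ov env dom rep inc description of target
seqA           -          ExampleDom  -          1.2e-30  105.3  0.1   2.0e-30  104.6  0.1   1.0 1   0   0  1   1   1   1   hypothetical protein A
seqB           -          ExampleDom  -          0.5      12.4   0.0   0.7      11.9   0.0   1.2 1   0   0  1   1   0   0   hypothetical protein B
"""

hits = parse_tblout(SAMPLE_TBLOUT.splitlines())
for hit in hits:
    print(hit.target, hit.evalue, hit.score, hit.description)
```

Splitting with `maxsplit=18` keeps the description in one piece even though it contains spaces.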
Options controlling reporting thresholds:
-E <x> : report sequences <= this E-value threshold in output [10.0] (x>0)
Report sequences whose E-value is at or below this threshold (default 10.0).
-T <x> : report sequences >= this score threshold in output
Report sequences whose score is at or above this threshold.
--domE <x> : report domains <= this E-value threshold in output [10.0] (x>0)
Report domains whose E-value is at or below this threshold (default 10.0).
--domT <x> : report domains >= this score cutoff in output
Report domains whose score is at or above this cutoff.
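The comparison directions are easy to mix up: E-values must be at or below the threshold, while scores must be at or above it. A minimal Python sketch of that rule follows. It assumes that a score threshold, when given, is used in place of the E-value threshold, which is how the HMMER documentation describes -T. The function name is hypothetical.

```python
from typing import Optional


def is_reported(evalue: float, score: float,
                max_evalue: float = 10.0,
                min_score: Optional[float] = None) -> bool:
    """Decide whether a hit is reported.

    max_evalue plays the role of -E (default 10.0 in the help text);
    min_score plays the role of -T and, if set, is assumed to replace
    the E-value test.
    """
    if min_score is not None:
        return score >= min_score   # higher scores are better
    return evalue <= max_evalue     # lower E-values are better


print(is_reported(evalue=3.0, score=15.0))                   # default -E 10.0
print(is_reported(evalue=3.0, score=15.0, max_evalue=1e-5))  # stricter -E
print(is_reported(evalue=3.0, score=15.0, min_score=10.0))   # score cutoff instead
```

The same comparison would apply per domain with --domE and --domT.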
Options controlling inclusion (significance) thresholds:
--incE <x> : consider sequences <= this E-value threshold as significant
Treat sequences whose E-value is at or below this threshold as significant.
--incT <x> : consider sequences >= this score threshold as significant
Treat sequences whose score is at or above this threshold as significant.
--incdomE <x> : consider domains <= this E-value threshold as significant
Treat domains whose E-value is at or below this threshold as significant.
--incdomT <x> : consider domains >= this score threshold as significant
Treat domains whose score is at or above this threshold as significant.
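Reporting and inclusion are two separate tiers: a hit can appear in the output without counting as significant. A small Python sketch of the distinction, assuming the inclusion threshold is the stricter of the two. The function name and the example threshold values are illustrative only.

```python
def classify_hit(evalue: float, report_e: float, include_e: float) -> str:
    """Place a hit in one of three tiers based on its E-value.

    report_e stands in for -E and include_e for --incE; include_e is
    assumed to be the stricter (smaller) of the two.
    """
    if evalue <= include_e:
        return "included"        # reported and considered significant
    if evalue <= report_e:
        return "reported only"   # shown in the output, not significant
    return "not reported"


# Illustrative thresholds, not taken from the help text above.
for e in (1e-8, 0.5, 50.0):
    print(e, classify_hit(e, report_e=10.0, include_e=0.01))
```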
Options controlling model-specific thresholding:
--cut_ga : use profile's GA gathering cutoffs to set all thresholding
Use the GA (gathering) cutoffs stored in the profile to set all thresholds.
--cut_nc : use profile's NC noise cutoffs to set all thresholding
Use the NC (noise) cutoffs stored in the profile to set all thresholds.
--cut_tc : use profile's TC trusted cutoffs to set all thresholding
Use the TC (trusted) cutoffs stored in the profile to set all thresholds.
Options controlling acceleration heuristics:
--max : Turn all heuristic filters off (less speed, more power)
Turn off all heuristic filters; the search is slower but more sensitive.
--F1 <x> : Stage 1 (MSV) threshold: promote hits w/ P <= F1 [0.02]
--F2 <x> : Stage 2 (Vit) threshold: promote hits w/ P <= F2 [1e-3]
--F3 <x> : Stage 3 (Fwd) threshold: promote hits w/ P <= F3 [1e-5]
--nobias : turn off composition bias filter
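The three --F options describe a cascade: a candidate is promoted to the next stage only if its P-value at the current stage is at or below that stage's threshold. A simplified Python sketch of this cascade is shown below, using the default thresholds from the help text. It ignores the bias filter and takes all three P-values as given, whereas the real pipeline computes later stages only for candidates that survive earlier ones. The function name is hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Stage names and default thresholds as listed for --F1, --F2, --F3.
DEFAULT_STAGES: Tuple[Tuple[str, float], ...] = (
    ("MSV", 0.02),
    ("Viterbi", 1e-3),
    ("Forward", 1e-5),
)


def first_rejecting_stage(
    p_values: Dict[str, float],
    stages: Sequence[Tuple[str, float]] = DEFAULT_STAGES,
) -> Optional[str]:
    """Return the first stage that rejects the candidate, or None if it passes all."""
    for name, threshold in stages:
        if p_values[name] > threshold:   # promoted only when P <= threshold
            return name
    return None


print(first_rejecting_stage({"MSV": 0.5, "Viterbi": 0.5, "Forward": 0.5}))
print(first_rejecting_stage({"MSV": 0.01, "Viterbi": 0.01, "Forward": 1e-9}))
print(first_rejecting_stage({"MSV": 1e-3, "Viterbi": 1e-4, "Forward": 1e-6}))
```

Raising a threshold lets more candidates through that stage, at the cost of speed.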
Other expert options:
--nonull2 : turn off biased composition score corrections
Turn off the score corrections for biased composition.
-Z <x> : set # of comparisons done, for E-value calculation
--domZ <x> : set # of significant seqs, for domain E-value calculation
--seed <n> : set RNG seed to <n> (if 0: one-time arbitrary seed) [42]
--tformat <s> : assert target <seqfile> is in format <s>: no autodetection
--cpu <n> : number of parallel CPU workers to use for multithreads
Number of parallel CPU worker threads to use for multithreading.
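The -Z option matters because an E-value scales with the number of comparisons: it is the P-value multiplied by the search space size. A short Python sketch of that relationship, with made-up numbers, shows how the same hit gets a different E-value when Z is changed.

```python
def evalue_from_pvalue(p_value: float, n_comparisons: float) -> float:
    """E-value = P-value x number of comparisons (the quantity -Z sets)."""
    return p_value * n_comparisons


p = 1e-6  # made-up P-value for a single hit

# Same hit, two assumed search space sizes.
print(evalue_from_pvalue(p, 100_000))     # database of 100,000 sequences
print(evalue_from_pvalue(p, 1_000_000))   # as if -Z 1000000 had been given
```

Fixing Z to the same value across runs keeps E-values comparable between searches against databases of different sizes.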