bedtools批量提取基因组指定位置序列

之前已经介绍过很多提取序列的方法，有脚本的也有软件的，这里再介绍一种方法。

用到软件是bedtools，具体方法如下：

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
        -fi     Input FASTA file
        -bed    BED/GFF/VCF file of ranges to extract from -fi
        -name   Use the name field for the FASTA header
        -split  given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
        -tab    Write output in TAB delimited format.
                - Default is FASTA format.
        -s      Force strandedness. If the feature occupies the antisense,
                strand, the sequence will be reverse complemented.
                - By default, strand information is ignored.
        -fullHeader     Use full fasta header.
                - By default, only the word before the first space or tab is used.

其中-fi 指定基因组fasta文件，-bed 指定要提取序列的位置文件，可以是bed、gff 或 vcf 文件（染色体碱基位置从0开始计数）。

-tab 指定输出格式。

$bedtools getfasta -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed
>CM004359.1:0-10
gtttagggtt
>CM004359.1:100-200
ttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggttt
>CM004359.1:1000-1050
TTGTGGgaaaattatttagttgtaGGGATGAAGTCTTTCTTCGTTGTTGT

$bedtools getfasta -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed -tab
CM004359.1:0-10 gtttagggtt
CM004359.1:100-200      ttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggttt
CM004359.1:1000-1050    TTGTGGgaaaattatttagttgtaGGGATGAAGTCTTTCTTCGTTGTTGT

发表于 2018-11-09 13:02
阅读 ( 16228 )
分类：软件工具

bedtools批量提取基因组指定位置序列

你可能感兴趣的文章

相关问题

0 条评论

作家榜 »