之前已经介绍过很多提取序列的方法,有脚本的也有软件的,这里再介绍一种方法。
用到软件是bedtools,具体方法如下:
Usage: bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
-fi Input FASTA file
-bed BED/GFF/VCF file of ranges to extract from -fi
-name Use the name field for the FASTA header
-split given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
-tab Write output in TAB delimited format.
- Default is FASTA format.
-s Force strandedness. If the feature occupies the antisense,
strand, the sequence will be reverse complemented.
- By default, strand information is ignored.
-fullHeader Use full fasta header.
- By default, only the word before the first space or tab is used.
$bedtools getfasta -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed
>CM004359.1:0-10
gtttagggtt
>CM004359.1:100-200
ttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggttt
>CM004359.1:1000-1050
TTGTGGgaaaattatttagttgtaGGGATGAAGTCTTTCTTCGTTGTTGT
$bedtools getfasta -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed -tab
CM004359.1:0-10 gtttagggtt
CM004359.1:100-200 ttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggtttagggttt
CM004359.1:1000-1050 TTGTGGgaaaattatttagttgtaGGGATGAAGTCTTTCTTCGTTGTTGT
如果觉得我的文章对您有用,请随意打赏。你的支持将鼓励我继续创作!