在提取miRNA前后500bp序列时,若没有其位置信息,提取会比较麻烦,可用以下方法提取:
1.首先利用blast软件将miRNA的前体序列与基因组比对,获取前体序列的位置信息,比对方法:
makeblastdb -in Donkey_Hic_genome.20180408.fa -dbtype nucl -title Donkey_Hic_genome.20180408.fa
blastall -i miRNA_Pre.fa -d Donkey_Hic_genome.20180408.fa -p blastn -e 0.01 -b 5 -v 5 -m 8 -o blast.out #Donkey_Hic_genome.20180408.fa 参考基因组染色体序列 #miRNA_Pre.fa miRNA的前体序列
由于序列可能较短,所以e-value值可调高一些。比对结果如下:
bta-miR-34c Chr20 100.00 113 0 0 1 113 39385825 39385713 6e-57 224
bta-miR-370 Chr07 100.00 109 0 0 1 109 70327160 70327268 1e-54 216
bta-miR-34b Chr20 100.00 112 0 0 1 112 39386420 39386309 2e-56 222
bta-miR-383 Chr27 100.00 110 0 0 1 110 21958895 21959004 4e-55 218
bta-miR-449a Chr01 100.00 112 0 0 1 112 103603665 103603554 2e-56 222
bta-miR-378 Chr09 100.00 111 0 0 1 111 51049164 51049054 9e-56 220
bta-miR-378 Chr13 96.88 32 1 0 67 98 32631593 32631624 3e-06 56.0
bta-miR-378 Chr13 93.94 33 2 0 62 94 29900614 29900582 2e-04 50.1
bta-miR-378 Chr07 100.00 27 0 0 71 97 65072541 65072515 1e-05 54.0
bta-miR-378 Chr30 96.55 29 1 0 69 97 28926810 28926782 2e-04 50.1
bta-miR-378 Chr18 93.94 33 2 0 62 94 21721834 21721866 2e-04 50.1
bta-miR-206 Chr08 100.00 112 0 0 1 112 78794660 78794771 2e-56 222
bta-miR-1 Chr15 100.00 108 0 0 1 108 48711439 48711546 5e-54 214
bta-miR-1 Chr07 88.89 63 7 0 32 94 26712206 26712268 2e-10 69.9
bta-let-7c Chr18 100.00 112 0 0 1 112 32412721 32412610 2e-56 222
bta-let-7c Chr20 89.19 74 8 0 18 91 29703021 29703094 1e-14 83.8
bta-miR-34c Chr20 100.00 113 0 0 1 113 39385825 39385713 6e-57 224
bta-miR-370 Chr07 100.00 109 0 0 1 109 70327160 70327268 1e-54 216
bta-miR-34b Chr20 100.00 112 0 0 1 112 39386420 39386309 2e-56 222
bta-miR-383 Chr27 100.00 110 0 0 1 110 21958895 21959004 4e-55 218
bta-miR-449a Chr01 100.00 112 0 0 1 112 103603665 103603554 2e-56 222
bta-miR-378 Chr09 100.00 111 0 0 1 111 51049164 51049054 9e-56 220
bta-miR-206 Chr08 100.00 112 0 0 1 112 78794660 78794771 2e-56 222
bta-miR-1 Chr15 100.00 108 0 0 1 108 48711439 48711546 5e-54 214
bta-let-7c Chr18 100.00 112 0 0 1 112 32412721 32412610 2e-56 222
perl /share/work/wangq/script/lv/miRNA_500_fasta.pl -id blast.out -fa Donkey_Hic_genome.20180408.fa -out result.fa -miR miRNA.fa -l 500
-id 后跟筛选的blast结果;-fa后跟参考基因组染色体序列;-miR后跟miRNA的序列文件;-l后跟提取miRNA前后多长序列,默认500。
提取结果(miRNA序列大写标注,其余小写):
>bta-miR-34c
acaaggcacagcatcacccgccgccctgctgggaggaggccgccaccctccgcggcgaactgccgacccgaagcgtttgagaggagaaagctgcgcttcgagactggatgcgtcagcatttcttccgcgcggcgcggcgagcgcgtggtccgcgtgcgagcaaaccccctgcaaacgcaggcgggctgatgcttctagctggagttaaacagttagctatcactaggagtaaaagcaagattgggcgatcccatgtaaggaaagcaaaatctggggcgtttagtagctttaattgtaaggtgtaaagacatttgaattgctgggggaagccccctgtgtaacgttcagagctgtgagtcactgtgcctattttccattgtttatcagggtactcaccaatccaccaactaaagtgagtaccacggagccagtgatctgcctgtcacgacgcatgggggcaccaacttgagactgaagtttgtgatgaaagtaaagcttttttgctgtgagtctagttactAGGCAGTGTAGTTAGCTGATTGctaataataccaatcactaaccacacggccaggtaaaaagatttgggaattcatccagatgagctgcgtgtgcacaccagtgggtttgggggcaagaaggggattggaataccctaatagtacgcattgcctgtttatccatagctcagccaagagagaaatcagcattttagctgctaaatatacaacatatgtagtaaatatacatttttaacatataatttttcaatattccttcaggcgcttaaccaaaaaaattttcagatatattgaggaatagaggttttctctcttagctcatttatgtcattgtgtaaacttgtgattattttgaactatctgtaagactgtattactattttgaaagaataaagtgccctaaagtcataaattgtggtcattcatgtgtccattgcctttcctaagttggctttatgatgtccttctcagcgcctgccacagtaattataactttcctatcgctaaaatggttaaattgttcctagattaacaaaatctcccgggaaggctattcacaagcactttttagtatttttctaagaacacgtta
>bta-miR-370
tgtttattgttcgtgtctcagagttgctctgaggccagtgtgctgggcacccagcaggccatctgggaggggagaaaggaagctagccatgtatgtcagtaggggtcagcgtggaggcctttggggtttctgggaaacggttcgaacatggagaatctcgtgatgtggacgcccccgagggcagccccatttcatgaactggatcccttagtcggatttctgttttccagggcacttgtttaagctttcatgccgtgtccaaaaaaaagagcagatcccaagagtttcctgcccgaggcccgtcttgccagaagccctcggcttggcctgagcatcgagatcattcctctaatcagcctgggggtggtctgacccgcggggtgggcggcgtggggcggggaggggcagggcgtgtgcggggcgggagctgctgggggagctggcggcggttcctttcaccctcgccgtggacccgcgggggcgtggctgtcctcggtctacaaatcgcgcaagtcggggcacaagacagagaggccaggtcacgtctctgcagttacacagctcatgagtGCCTGCTGGGGTGGAACCTGGTctgtctgtctgtctagcaccacagctcgggcgctgctgcagagggaacaaagatttgggtgagggcctcagagacgggctggggaaggggtttactcgggctgactttgacatgaaaacaatagctaattctgcttggggcctagtgctgtgactatcttacagatgggaaacaggcacagacaggttagttaactttcctgatgctttccagctcatcaggattggaggctggatttggaggcaggcggtgtggttcgagcatatgtgctctaaccattgggaaacactgcctcctgactgtgagcacacgtagagatggcacatggagtccaagaatctgggtttgagtccttttaccactgcctggtgggtgtactgtccagtcaatcatacatatttcattcttcctcttgacagagtctctgcagaggcccagcgagtcgttggcccgtgggaaatatttttttaagaatgcatgtgtgtgatagaattttatatctatcgaaggggagggg
脚本代码如下:
#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use Bio::SeqIO;
use Bio::Seq;
my $version = "1.3";
my %opts;
GetOptions(\%opts, "id=s", "fa=s", "miR=s", "l=s","out=s","h");
if(!defined($opts{out}) || !defined($opts{fa}) || !defined($opts{miR}) || !defined($opts{id}) ||defined($opts{h}))
{
print <<"Usage End.";
Description:
$version:lefse analysis
Usage
Forced parameter:
-id blast.out <infile> must be given
-out outfile <outfile> must be given
-fa genome fasta file <infile>must be given -miR miRNA fasta file <infile>must be given -l length <int>
Other parameter:
-h Help document
Usage End.
exit;
} my $len = $opts{l}; $len ||=500;
my $in = Bio::SeqIO->new(-file => "$opts{fa}" , -format => 'Fasta');
my %fasta;
while ( my $seq = $in->next_seq() ) {
my($id,$sequence)=($seq->id,$seq->seq);
$fasta{$id}=$sequence;
}
my $ina = Bio::SeqIO->new(-file => "$opts{miR}" , -format => 'Fasta'); my %mi;
while ( my $seq = $ina->next_seq() ) { my($id,$sequence)=($seq->id,$seq->seq); $mi{$id}=$sequence;
}
open(IN,"$opts{id}") ||die "open file $opts{id} faild.\n";
open(OUT,">$opts{out}") ||die "open file $opts{out} faild.\n";
while(<IN>){
chomp;
my @line = split("\t");
if($line[8] < $line[9]){
my $miR = lc(substr( $fasta{$line[1]},$line[8]-1, $line[9] - $line[8]+1));
my $small = lc($mi{$line[0]});
$miR =~ s/$small/$mi{$line[0]}/;
my $before = lc(substr( $fasta{$line[1]},$line[8]-$len-1, $len));
my $laft = lc(substr( $fasta{$line[1]},$line[9], $len));
print OUT ">$line[0]\n$before$miR$laft\n";
}elsif($line[8] > $line[9]){
my $miR = lc(substr( $fasta{$line[1]},$line[9]-1, $line[8] - $line[9]+1));
my $before = lc(substr( $fasta{$line[1]},$line[9]-$len-1, $len));
my $laft = lc(substr( $fasta{$line[1]},$line[8], $len));
my $gene = $before.$miR.$laft;
$gene = &reverse_complement_IUPAC($gene);
my $small = lc($mi{$line[0]});
$gene =~ s/$small/$mi{$line[0]}/;
print OUT ">$line[0]\n$gene\n";
}
}
close(IN);
close(OUT);
sub reverse_complement_IUPAC {
my $dna = shift;
# reverse the DNA sequence
my $revcomp = reverse($dna);
# complement the reversed DNA sequence
$revcomp =~ tr/ABCDGHMNRSTUVWXYabcdghmnrstuvwxy/TVGHCDKNYSAABWXRtvghcdknysaabwxr/;
return $revcomp;
}
如果觉得我的文章对您有用,请随意打赏。你的支持将鼓励我继续创作!