The nuc tool in bedtools reports the nucleotide content of sequence intervals. Its usage is as follows:
Tool: bedtools nuc (aka nucBed)
Version: v2.25.0
Summary: Profiles the nucleotide content of intervals in a fasta file.
Usage: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
-fi Input FASTA file
-bed BED/GFF/VCF file of ranges to extract from -fi
-s Profile the sequence according to strand.
-seq Print the extracted sequence
-pattern Report the number of times a user-defined sequence
is observed (case-sensitive).
-C Ignore case when matching -pattern. By default, case matters.
-fullHeader Use full fasta header.
- By default, only the word before the first space or tab is used.
Output format:
The following information will be reported after each BED entry:
1) %AT content
2) %GC content
3) Number of As observed
4) Number of Cs observed
5) Number of Gs observed
6) Number of Ts observed
7) Number of Ns observed
8) Number of other bases observed
9) The length of the explored sequence/interval.
10) The seq. extracted from the FASTA file. (opt., if -seq is used)
11) The number of times a user's pattern was observed.
(opt., if -pattern is used.)
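The output columns listed above can be reproduced for a single sequence with a short Python sketch. This is not bedtools code: the function name `nuc_profile` and the key `user_patt_count` are hypothetical, and the sketch assumes that bases are counted case-insensitively, that both fractions are taken relative to the full interval length (Ns included), and that overlapping pattern matches are each counted. Note that, despite the "%" in the help text, the sample output further down reports fractions between 0 and 1.

```python
from collections import Counter


def nuc_profile(seq, pattern=None, ignore_case=False):
    """Return the per-interval statistics listed above for one sequence.

    Assumptions (not taken from the bedtools source): bases are counted
    case-insensitively, and overlapping pattern matches are each counted.
    """
    counts = Counter(seq.upper())
    n_a, n_c, n_g, n_t, n_n = (counts[base] for base in "ACGTN")
    length = len(seq)
    profile = {
        "pct_at": (n_a + n_t) / length if length else 0.0,
        "pct_gc": (n_c + n_g) / length if length else 0.0,
        "num_A": n_a,
        "num_C": n_c,
        "num_G": n_g,
        "num_T": n_t,
        "num_N": n_n,
        # Anything that is not A, C, G, T or N (e.g. IUPAC ambiguity codes).
        "num_oth": length - (n_a + n_c + n_g + n_t + n_n),
        "seq_len": length,
    }
    if pattern is not None:
        # Mimic -pattern (case-sensitive) and -C (ignore case).
        haystack = seq.upper() if ignore_case else seq
        needle = pattern.upper() if ignore_case else pattern
        profile["user_patt_count"] = (
            sum(haystack.startswith(needle, i) for i in range(len(haystack)))
            if needle
            else 0
        )
    return profile


# A made-up 10-base sequence with 1 A, 0 C, 4 G and 5 T.
print(nuc_profile("GTTGGTTGAT"))
```

For the made-up sequence the function returns `pct_at` 0.6 and `pct_gc` 0.4, which is the same composition as the first row of the sample output.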
Example input BED file (id.bed):
CM004359.1 0 10
CM004359.1 100 200
CM004359.1 1000 1050
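BED intervals are 0-based and half-open, so the line `CM004359.1 0 10` covers the first 10 bases of the sequence, and the interval length is end minus start. A minimal Python sketch of that convention follows. `parse_bed_line` and `extract` are hypothetical helper names, and the toy sequence is made up for illustration. BED files are normally tab-separated; splitting on any whitespace is a simplification.

```python
def parse_bed_line(line):
    """Split a BED line into (chrom, start, end); extra columns are ignored."""
    chrom, start, end = line.split()[:3]
    return chrom, int(start), int(end)


def extract(sequence, start, end):
    """Return the bases of a 0-based, half-open interval [start, end)."""
    return sequence[start:end]


bed_lines = ["CM004359.1 0 10", "CM004359.1 100 200", "CM004359.1 1000 1050"]
lengths = [end - start for _, start, end in map(parse_bed_line, bed_lines)]
print(lengths)  # [10, 100, 50]

toy = "ACGTACGTACGTACGT"  # made-up sequence, not from the assembly
print(extract(toy, 0, 10))  # ACGTACGTAC
```

The three lengths match the last column (`12_seq_len`) of the nuc output for these intervals.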
$ bedtools nuc -fi GCA_001651475.1_Ler_Assembly_genomic.fna -bed id.bed
#1_usercol 2_usercol 3_usercol 4_pct_at 5_pct_gc 6_num_A 7_num_C 8_num_G 9_num_T 10_num_N 11_num_oth 12_seq_len
CM004359.1 0 10 0.600000 0.400000 1 0 4 5 0 0 10
CM004359.1 100 200 0.580000 0.420000 14 0 42 44 0 0 100
CM004359.1 1000 1050 0.660000 0.340000 10 3 14 23 0 0 50
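As a sanity check on the rows above, the two fraction columns can be recomputed from the base counts. The Python sketch below parses the output and verifies that `pct_at` equals (A + T) / length and `pct_gc` equals (C + G) / length. `parse_nuc_output` is a hypothetical helper, and the column positions assume a 3-column BED input with neither `-seq` nor `-pattern`.

```python
NUC_OUTPUT = """\
#1_usercol 2_usercol 3_usercol 4_pct_at 5_pct_gc 6_num_A 7_num_C 8_num_G 9_num_T 10_num_N 11_num_oth 12_seq_len
CM004359.1 0 10 0.600000 0.400000 1 0 4 5 0 0 10
CM004359.1 100 200 0.580000 0.420000 14 0 42 44 0 0 100
CM004359.1 1000 1050 0.660000 0.340000 10 3 14 23 0 0 50
"""

INT_FIELDS = ["num_A", "num_C", "num_G", "num_T", "num_N", "num_oth", "seq_len"]


def parse_nuc_output(text):
    """Parse nuc output for a 3-column BED (assumed layout) into dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        row = {
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "pct_at": float(fields[3]),
            "pct_gc": float(fields[4]),
        }
        row.update(zip(INT_FIELDS, map(int, fields[5:12])))
        rows.append(row)
    return rows


rows = parse_nuc_output(NUC_OUTPUT)
for r in rows:
    # Recompute both fractions from the raw counts.
    assert abs(r["pct_at"] - (r["num_A"] + r["num_T"]) / r["seq_len"]) < 1e-6
    assert abs(r["pct_gc"] - (r["num_C"] + r["num_G"]) / r["seq_len"]) < 1e-6
    print(f"{r['chrom']}:{r['start']}-{r['end']} GC={r['pct_gc'] * 100:.1f}%")
```

All three rows pass the check, which confirms that the columns labelled `pct_at` and `pct_gc` hold fractions rather than percentages; multiply by 100 to get percent values.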