FASTA is a text-based format for representing either nucleotide or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters.
An example sequence in FASTA format is:
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
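Note: a line that begins with ">" is the FASTA ID (header) line. The text between the ">" and the first whitespace is the sequence ID; everything after that is the description, which is optional.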
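The header rule in the note above, together with the line-length recommendation, can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the names `FastaRecord`, `parse_fasta`, and `format_fasta` are hypothetical, the default wrap width of 70 is an assumption taken from the example record, and blank lines and any sequence data before the first header are simply skipped.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastaRecord(NamedTuple):
    seq_id: str        # text between ">" and the first whitespace
    description: str   # rest of the header line; may be empty
    sequence: str      # all sequence lines joined together


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # Split the header once on whitespace: ID first, optional description after.
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1].strip() if len(parts) > 1 else ""
    return FastaRecord(seq_id, description, "".join(chunks))


def parse_fasta(lines: Iterable[str]) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per ">" header found in an iterable of lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines (assumption of this sketch)
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
        # Sequence data that appears before any header is ignored here.
    if header is not None:
        yield _make_record(header, chunks)


def format_fasta(record: FastaRecord, width: int = 70) -> str:
    """Render a record with its sequence wrapped at `width` characters."""
    header = ">" + record.seq_id
    if record.description:
        header += " " + record.description
    seq = record.sequence
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + wrapped) + "\n"
```

Because `parse_fasta` accepts any iterable of lines, it can be given an open file object directly, for example `for rec in parse_fasta(open("proteins.fasta")): ...` (the file name is made up for illustration).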
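Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below).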
The nucleic acid codes are:
A --> adenosine           M --> A C (amino)
C --> cytidine            S --> G C (strong)
G --> guanine             W --> A T (weak)
T --> thymidine           B --> G T C
U --> uridine             D --> G A T
R --> G A (purine)        H --> A C T
Y --> T C (pyrimidine)    V --> G C A
K --> G T (keto)          N --> A G C T (any)
                          -  gap of indeterminate length
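The nucleic acid codes above can be written down as a lookup table that maps each code to the bases it stands for. The sketch below is only an illustration: the names `IUPAC_NUCLEOTIDES`, `possible_bases`, and `is_compatible` are hypothetical, input is folded to upper case for convenience, and treating the gap symbol as matching only itself is an assumption of this example.

```python
from typing import Set

# Each code maps to the concrete bases it may represent (taken from the table above).
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}
GAP = "-"


def possible_bases(code: str) -> Set[str]:
    """Return the set of bases a single nucleic acid code may stand for."""
    symbol = code.upper()
    if symbol == GAP:
        return {GAP}  # assumption: a gap is compatible only with a gap
    if symbol not in IUPAC_NUCLEOTIDES:
        raise ValueError(f"not a nucleic acid code: {code!r}")
    return set(IUPAC_NUCLEOTIDES[symbol])


def is_compatible(pattern: str, sequence: str) -> bool:
    """True if every base of a concrete sequence is allowed by the
    (possibly ambiguous) pattern code at the same position."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base.upper() in possible_bases(code)
        for code, base in zip(pattern, sequence)
    )
```

For instance, the pattern `RYN` (purine, pyrimidine, any) is compatible with `GTC` but not with `CTC`, because `C` is not a purine.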
The accepted amino acid codes are:
A  ALA  alanine                    P  PRO  proline
B  ASX  aspartate or asparagine    Q  GLN  glutamine
C  CYS  cysteine                   R  ARG  arginine
D  ASP  aspartate                  S  SER  serine
E  GLU  glutamate                  T  THR  threonine
F  PHE  phenylalanine              U       selenocysteine
G  GLY  glycine                    V  VAL  valine
H  HIS  histidine                  W  TRP  tryptophan
I  ILE  isoleucine                 Y  TYR  tyrosine
K  LYS  lysine                     Z  GLX  glutamate or glutamine
L  LEU  leucine                    X       any
M  MET  methionine                 *       translation stop
N  ASN  asparagine                 -       gap of indeterminate length
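A simple way to use the amino acid table above is to check a protein sequence for characters that are not among the accepted codes. The following is a minimal sketch under stated assumptions: `AMINO_ACID_CODES` and `find_invalid_residues` are hypothetical names, letters are compared case-insensitively, and positions are reported 1-based.

```python
from typing import List, Tuple

# The accepted one-letter codes from the table above, including
# "*" (translation stop) and "-" (gap of indeterminate length).
AMINO_ACID_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def find_invalid_residues(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based position, character) for every character that is
    not one of the accepted amino acid codes."""
    return [
        (position, char)
        for position, char in enumerate(sequence, start=1)
        if char.upper() not in AMINO_ACID_CODES
    ]
```

An empty list means every character is an accepted code. Note that `J` and `O` do not appear in the table, so this check reports them as invalid.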