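When doing bioinformatics analysis, we often run into annotation files in GFF format as well as in GTF format. The two have similar names, and even their contents look very much alike. So how do we convert between them?
GFF is mainly used to annotate genomes. The format looks like this: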
J15 glean gene 25308430 25309140 . + . ID=Gglean067903; status=novel;
J15 glean mRNA 25308430 25309140 0.976879 + . ID=Gglean067903-TA; Parent=Gglean067903; status=novel;
J15 glean CDS 25308430 25308501 . + 0 Parent=Gglean067903-TA;
J15 glean CDS 25308646 25309140 . + 0 Parent=Gglean067903-TA;
J15 glean gene 126763 129003 . + . ID=Gglean075841; status=novel;
J15 glean mRNA 126763 129003 1 + . ID=Gglean075841-TA; Parent=Gglean075841; status=novel;
J15 glean CDS 126763 126973 . + 0 Parent=Gglean075841-TA;
J15 glean CDS 127285 127628 . + 2 Parent=Gglean075841-TA;
J15 glean CDS 127719 127854 . + 0 Parent=Gglean075841-TA;
J15 glean CDS 128049 128185 . + 2 Parent=Gglean075841-TA;
The GTF format looks like this:
J01 glean CDS 6976 7317 . + 0 transcript_id "Gglean025939-TA"; gene_id "Gglean025939";
J01 glean CDS 7912 8162 . + 0 transcript_id "Gglean025939-TA"; gene_id "Gglean025939";
J01 glean CDS 8245 8413 . + 1 transcript_id "Gglean025939-TA"; gene_id "Gglean025939";
J01 glean CDS 8479 8790 . + 0 transcript_id "Gglean025939-TA"; gene_id "Gglean025939";
J01 glean CDS 9444 9708 . - 2 transcript_id "Gglean025954-TA"; gene_id "Gglean025954";
J01 glean CDS 9778 9935 . - 0 transcript_id "Gglean025954-TA"; gene_id "Gglean025954";
J01 glean CDS 10012 10216 . - 2 transcript_id "Gglean025954-TA"; gene_id "Gglean025954";
J01 glean CDS 10299 10754 . - 2 transcript_id "Gglean025954-TA"; gene_id "Gglean025954";
J01 glean CDS 10838 10926 . - 0 transcript_id "Gglean025954-TA"; gene_id "Gglean025954";
J01 glean CDS 11015 11082 . - 1 transcript_id "Gglean025954-TA"; gene_id "Gglean025954";
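In the samples above, the visible difference is in the last column: the GFF rows use `key=value` pairs (`ID`, `Parent`), while the GTF rows use `key "value"` pairs (`transcript_id`, `gene_id`). The following is a minimal Python sketch of reading both styles. The function names `parse_gff3_attributes` and `parse_gtf_attributes` are hypothetical helpers made up for illustration, and the sketch assumes attribute values never contain a semicolon.

```python
def parse_gff3_attributes(column9: str) -> dict:
    """Parse a GFF-style attribute string such as 'ID=x; Parent=y;'."""
    attributes = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = field.partition("=")
        attributes[key.strip()] = value.strip()
    return attributes


def parse_gtf_attributes(column9: str) -> dict:
    """Parse a GTF-style attribute string such as 'gene_id "x";'."""
    attributes = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        # The key and the quoted value are separated by the first space.
        key, _, value = field.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


gff3_attrs = parse_gff3_attributes(
    "ID=Gglean067903-TA; Parent=Gglean067903; status=novel;"
)
gtf_attrs = parse_gtf_attributes(
    'transcript_id "Gglean025939-TA"; gene_id "Gglean025939";'
)
print(gff3_attrs["Parent"])
print(gtf_attrs["gene_id"])
```

Both calls return plain dictionaries, so the same downstream code can work with either format once the last column has been parsed.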
# gff2gtf
gffread my.gff3 -T -o my.gtf
# gtf2gff
gffread merged.gtf -o- > merged.gff3
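To get a feel for what the GFF to GTF direction involves, here is a rough Python sketch of the idea. It is not how gffread is implemented and it handles far fewer cases. It assumes tab-delimited input, looks only at mRNA/transcript and CDS/exon rows, and the function name `gff3_to_gtf` is made up for this example. Each child row's `Parent` becomes `transcript_id`, and the parent of that transcript becomes `gene_id`.

```python
def gff3_to_gtf(gff3_lines):
    """Convert tab-delimited GFF CDS/exon rows to GTF rows (illustrative only)."""
    child_rows = []          # (first eight columns, transcript ID)
    transcript_to_gene = {}  # transcript ID -> gene ID

    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature row

        # Parse the key=value pairs in the ninth column.
        attrs = {}
        for field in columns[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep:
                attrs[key] = value.strip()

        feature_type = columns[2]
        if feature_type in ("mRNA", "transcript"):
            if "ID" in attrs and "Parent" in attrs:
                transcript_to_gene[attrs["ID"]] = attrs["Parent"]
        elif feature_type in ("CDS", "exon") and "Parent" in attrs:
            child_rows.append((columns[:8], attrs["Parent"]))

    gtf_lines = []
    for first_eight, transcript_id in child_rows:
        # Fall back to the transcript ID when no gene parent was seen.
        gene_id = transcript_to_gene.get(transcript_id, transcript_id)
        attribute_text = f'transcript_id "{transcript_id}"; gene_id "{gene_id}";'
        gtf_lines.append("\t".join(first_eight + [attribute_text]))
    return gtf_lines


example = [
    "J15\tglean\tgene\t25308430\t25309140\t.\t+\t.\tID=Gglean067903; status=novel;",
    "J15\tglean\tmRNA\t25308430\t25309140\t0.976879\t+\t.\t"
    "ID=Gglean067903-TA; Parent=Gglean067903; status=novel;",
    "J15\tglean\tCDS\t25308430\t25308501\t.\t+\t0\tParent=Gglean067903-TA;",
]
converted = gff3_to_gtf(example)
for row in converted:
    print(row)
```

The rows are collected first and written out afterwards, so the sketch does not depend on the mRNA row appearing before its CDS rows. For real data, gffread remains the safer choice.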