Code for direct species annotation of amplicon sequences


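Amplicon annotation usually starts by clustering the sequences (e.g. into OTUs). If you only have a few thousand sequences, clustering may not give a usable result. You may also simply want to know which species each individual sequence belongs to. In either case, you can align the sequences with blastn and then summarize the hits.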
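The post does not show the blastn step itself. The following is a minimal sketch of that step. It assumes NCBI BLAST+ is installed and on PATH, and that you have a local nucleotide database. The database name `16S_ribosomal_RNA` (NCBI's preformatted 16S database uses this name) and the query file `seqs.fasta` are placeholders, as are the helper names `build_blastn_cmd` and `run_blastn`. `-outfmt 0` is the default pairwise text report that the summary script parses. `-num_descriptions`/`-num_alignments` keep three hits per query. `-max_hsps 1` (available in recent BLAST+ releases) keeps one alignment per hit, so each hit has exactly one "Identities" line.

```python
import shutil
import subprocess


def build_blastn_cmd(query_fasta, db, out_path, hits=3):
    """Build a blastn command that writes the default pairwise text report."""
    return [
        "blastn",
        "-query", query_fasta,            # FASTA file with the amplicon sequences
        "-db", db,                        # local nucleotide BLAST database (placeholder)
        "-out", out_path,                 # text report parsed by the summary script
        "-outfmt", "0",                   # 0 = pairwise text report (blastn default)
        "-num_descriptions", str(hits),   # one-line descriptions kept per query
        "-num_alignments", str(hits),     # alignments (">..." blocks) kept per query
        "-max_hsps", "1",                 # one HSP per hit, so one "Identities" line each
    ]


def run_blastn(query_fasta, db, out_path, hits=3):
    """Run blastn; raises if it is not installed or exits with an error."""
    if shutil.which("blastn") is None:
        raise FileNotFoundError("blastn not found on PATH; install NCBI BLAST+")
    subprocess.run(build_blastn_cmd(query_fasta, db, out_path, hits), check=True)


if __name__ == "__main__":
    # Hypothetical input and database names: replace with your own
    cmd = build_blastn_cmd("seqs.fasta", "16S_ribosomal_RNA", "results_clean_batch1.txt")
    print(" ".join(cmd))
```

Building the argument list separately from running it lets you inspect or log the exact command before launching a possibly long BLAST job.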

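The summary script is below. It reads a blastn text report and writes one tab-separated row per query with the accession, description (species) and identity of each hit: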

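import re

# Read the whole blastn report (default pairwise text output, -outfmt 0)
with open("results_clean_batch1.txt", "r", encoding="utf-8") as file:
    text = file.read()

# Split the report into one chunk per query: from "Query=" up to the next "Query=" or the end of the file
matches = re.findall(r'(Query=.*?)(?=Query=|$)', text, re.DOTALL)

results = []
for match in matches:
    # The query ID is the first word after "Query= "
    query_id = re.search(r'Query= (\S+)', match).group(1)
    # Each hit starts with ">ACCESSION description" (the description may wrap) and ends before "Length="
    species_matches = re.findall(r'>([^ ]+) ([^>]+?)\nLength', match, re.DOTALL)
    # Identity of each alignment, e.g. "252/253 (99%)"
    identities = re.findall(r'Identities = (.*?),', match)

    query_results = [query_id]
    for accession, species in species_matches:
        # Join the wrapped description into a single line
        species_name = ' '.join(species.split())
        # Pair the n-th hit with the n-th "Identities" line; this assumes one HSP per hit
        if identities:
            identity = identities.pop(0)
            query_results.extend([accession, f"Species: {species_name}", f"Identity: {identity}"])

    # Queries with no hits keep only their ID
    results.append(query_results)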

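# Write a tab-separated table: one row per query, three columns per hit.
# The header gets as many "Match n" column groups as the query with the most hits.
max_hits = max(((len(row) - 1) // 3 for row in results), default=0)
with open('output_batch2.txt', 'w', encoding='utf-8') as f:
    # Add header
    header = ["Sequence ID"]
    for i in range(1, max_hits + 1):
        header += [f"Match {i} Accession", f"Match {i} Species", f"Match {i} Identity"]
    f.write('\t'.join(header) + '\n')

    # Write the results
    for result in results:
        f.write('\t'.join(result) + '\n')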
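You can check what the regular expressions extract without running BLAST. The sketch below wraps the same parsing logic in a hypothetical function, `parse_blast_report`, and runs it on a short excerpt. The excerpt is made up: the accession `EX000001.1` and all numbers are invented. It only mimics the shape of a pairwise report, and real reports contain more lines per hit, which the regexes skip. `zip` replaces the `pop(0)` loop but keeps the same pairing: the n-th hit gets the n-th "Identities" line.

```python
import re


def parse_blast_report(text):
    """Return [(query_id, [(accession, description, identity), ...]), ...]."""
    parsed = []
    # One chunk per query, from "Query=" up to the next "Query=" or the end of the text
    for chunk in re.findall(r'(Query=.*?)(?=Query=|$)', text, re.DOTALL):
        query_id = re.search(r'Query= (\S+)', chunk).group(1)
        hits = re.findall(r'>([^ ]+) ([^>]+?)\nLength', chunk, re.DOTALL)
        identities = re.findall(r'Identities = (.*?),', chunk)
        # Pair hits and identities in order; only correct with one HSP per hit
        rows = [(acc, ' '.join(desc.split()), ident)
                for (acc, desc), ident in zip(hits, identities)]
        parsed.append((query_id, rows))
    return parsed


# Made-up excerpt shaped like a blastn -outfmt 0 report (not real output)
SAMPLE = """Query= seq1

Length=253

>EX000001.1 Bacillus subtilis strain X 16S ribosomal RNA gene, partial
sequence
Length=1500

 Score = 462 bits (250),  Expect = 1e-125
 Identities = 252/253 (99%), Gaps = 0/253 (0%)
 Strand=Plus/Plus

Query= seq2

Length=250

***** No hits found *****
"""

for query_id, rows in parse_blast_report(SAMPLE):
    print(query_id, rows)
```

Here `seq1` yields one (accession, description, identity) tuple with the wrapped description joined onto one line. `seq2`, which has no hits, yields an empty list.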

Published 2024-01-22 11:23
Category: Sequencing technology
