bioinformatics - Querying NCBI for a sequence via Biopython
How can I query NCBI for sequences, given a chromosome's GenBank identifier and start and stop positions, using Biopython?
CP001665 napp tile 6373 6422 . + . cluster=9;
CP001665 napp tile 6398 6447 . + . cluster=3;
CP001665 napp tile 6423 6472 . + . cluster=3;
CP001665 napp tile 6448 6497 . + . cluster=3;
CP001665 napp tile 7036 7085 . + . cluster=10;
CP001665 napp tile 7061 7110 . + . cluster=3;
CP001665 napp tile 7073 7122 . + . cluster=3;
    from Bio import Entrez
    from Bio import SeqIO

    # NCBI asks for a contact address with every Entrez request
    Entrez.email = "sample@example.org"

    # Fetch the whole record as GenBank-formatted text
    handle = Entrez.efetch(db="nuccore", id="CP001665", rettype="gb", retmode="text")
    whole_sequence = SeqIO.read(handle, "genbank")
    handle.close()

    # Slicing a SeqRecord returns a new SeqRecord
    print(whole_sequence[6373:6422])

Once you know the ID and the database to fetch from, use Entrez.efetch to get a handle to that file. You should specify the return type (rettype="gb") and mode (retmode="text") so that the handle gives you file-like data.
Then pass the handle to SeqIO, which should return a SeqRecord object. One nice feature of SeqRecords is that they can be sliced cleanly, like lists. If you can retrieve the start and end points from somewhere, the print call above returns:
ID: CP001665.1
Name: CP001665
Description: Escherichia coli 'BL21-Gold(DE3)pLysS AG', complete genome.
Number of features: 0
Seq('GCGCTAACCATGCGAGCGTGCCTGATGCGCTACGCTTATCAGGCCTACG', IUPACAmbiguousDNA())
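One thing to watch: the slice [6373:6422] returns 49 bases, because Python slices are 0-based and exclude the end index. If the tile coordinates follow the usual GFF convention (1-based, both ends included), a 6373..6422 tile covers 50 bases and corresponds to the slice [6372:6422]. Below is a minimal sketch of that conversion. The names Tile, parse_gff_line, gff_to_slice and region_efetch_kwargs are made up for illustration, and the parser assumes whitespace-separated columns as shown in the question. The last helper builds arguments using the E-utilities seq_start, seq_stop and strand parameters, which ask NCBI for only the region rather than the whole genome.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive (assumed GFF convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def parse_gff_line(line: str) -> Tile:
    """Parse the first seven whitespace-separated columns of one GFF-style line."""
    seqid, _source, _type, start, end, _score, strand = line.split()[:7]
    return Tile(seqid, int(start), int(end), strand)


def gff_to_slice(tile: Tile) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based, end-exclusive slice."""
    return slice(tile.start - 1, tile.end)


def region_efetch_kwargs(tile: Tile) -> dict:
    """Keyword arguments for Entrez.efetch that request only the tile's region.

    seq_start and seq_stop are 1-based; strand is 1 for plus, 2 for minus.
    """
    return {
        "db": "nuccore",
        "id": tile.seqid,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": tile.start,
        "seq_stop": tile.end,
        "strand": 2 if tile.strand == "-" else 1,
    }


tile = parse_gff_line("CP001665 napp tile 6373 6422 . + . cluster=9;")
region = gff_to_slice(tile)
print(region)                      # slice(6372, 6422, None)
print(region.stop - region.start)  # 50
```

With the record from the answer already loaded, whole_sequence[gff_to_slice(tile)] should then give the full 50-base tile. Alternatively, Entrez.efetch(**region_efetch_kwargs(tile)) followed by SeqIO.read(handle, "fasta") should download just that region, which is handy when there are many tiles on a large chromosome.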