bioinformatics - Querying NCBI for a sequence from ncbi via Biopython -

- March 15, 2012

How can I query NCBI for sequences given a chromosome's GenBank identifier and start and stop positions, using Biopython?

    CP001665    napp    tile    6373    6422    .   +   .   cluster=9;
    CP001665    napp    tile    6398    6447    .   +   .   cluster=3;
    CP001665    napp    tile    6423    6472    .   +   .   cluster=3;
    CP001665    napp    tile    6448    6497    .   +   .   cluster=3;
    CP001665    napp    tile    7036    7085    .   +   .   cluster=10;
    CP001665    napp    tile    7061    7110    .   +   .   cluster=3;
    CP001665    napp    tile    7073    7122    .   +   .   cluster=3;

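    from Bio import Entrez
    from Bio import SeqIO

    # NCBI asks for a contact address with every Entrez request
    Entrez.email = "sample@example.org"

    # Fetch the full GenBank record as plain text
    handle = Entrez.efetch(db="nuccore",
                           id="CP001665",
                           rettype="gb",
                           retmode="text")
    whole_sequence = SeqIO.read(handle, "genbank")
    handle.close()

    # Python slices are 0-based and exclude the end index, so this yields 49 bases
    print(whole_sequence[6373:6422])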

Once you know the ID and the database to fetch from, use Entrez.efetch to get a handle to that file. You should specify the return type (rettype="gb") and the mode (retmode="text") so that the handle gives you file-like text data.

Then pass this handle to SeqIO, which should return a SeqRecord object. One nice feature of SeqRecords is that they can be cleanly sliced, just like lists. If you can retrieve the starting and ending points from somewhere, the print statement above returns:

    ID: CP001665.1
    Name: CP001665
    Description: Escherichia coli 'BL21-Gold(DE3)pLysS AG', complete genome.
    Number of features: 0
    Seq('GCGCTAACCATGCGAGCGTGCCTGATGCGCTACGCTTATCAGGCCTACG', IUPACAmbiguousDNA())
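The start and stop values can be read straight from the rows in the question. The sketch below is one possible way to do that, not the answer's own method. It assumes the rows follow the GFF convention, where columns 4 and 5 are 1-based and inclusive, so the start has to be shifted by one before slicing. The helper names parse_tile and tile_slice are made up for illustration, and a plain string stands in for the downloaded genome so the example runs offline.

```python
# Sketch: turn GFF-style rows into Python slices.
# Assumption: columns 4 and 5 are 1-based, inclusive start/end (GFF convention).

def parse_tile(line):
    """Return (seq_id, start, end, strand) from one whitespace-separated row."""
    fields = line.split()
    return fields[0], int(fields[3]), int(fields[4]), fields[6]

def tile_slice(start, end):
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    return slice(start - 1, end)

rows = [
    "CP001665 napp tile 6373 6422 . + . cluster=9;",
    "CP001665 napp tile 6398 6447 . + . cluster=3;",
]

# Stand-in for the downloaded genome so the sketch runs offline;
# with Biopython you would index whole_sequence (the SeqRecord) instead.
genome = "ACGT" * 2000

for row in rows:
    seq_id, start, end, strand = parse_tile(row)
    fragment = genome[tile_slice(start, end)]
    print(seq_id, start, end, len(fragment))
```

With this conversion each tile comes out 50 bases long, whereas slicing with the raw numbers, as in whole_sequence[6373:6422], skips the first base of the tile and returns 49.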
