string - Adding different genes for same chromosomal regions to single line in Python


Hi, I am new to Python. I have a file with chromosomal regions and the corresponding genes of each region, and I need to put the different genes of the same region on one line. The file looks like this:

chr12   10954262    10962540    chr12   10880241    11502235    100.0       acacb   -
chr12   10954262    10962540    chr12   10880241    11502235    100.0       rad52   -
chr12   10954262    10962540    chr12   10880241    11502235    100.0       rad52   -
chr12   10954262    10962540    chr12   10880241    11502235    100.0       tas2r8  -
chr12   10954262    10962540    chr12   10880241    11502235    100.0       tas2r9  -

From the above lines, I need the output on a single line (like below), with the gene names next to the chromosomal region instead of spread over multiple lines:

chr12   10954262    10962540    chr12   10880241    11502235    100.0 acacb, rad52, rad52, tas2r8, tas2r9 

Your help is highly appreciated.

jyothi

Assuming filename is a file containing the following:

chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r8 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r9 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 erc1 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 kctd10 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 mmab -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 myo1h -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 prr4 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 rad52 -

script.py

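from collections import defaultdict

# region (everything before the gene name) -> list of gene names
genes_dict = defaultdict(list)

with open("filename", "r") as infile:
    for line in infile:
        # Reverse the line and split on its first two spaces, i.e. the last
        # two spaces of the original: trailing "-", gene name, region.
        _, val, key = line[::-1].split(" ", 2)
        genes_dict[key[::-1]].append(val[::-1])

for key in genes_dict:
    vals = ""
    for val in genes_dict[key]:
        vals += "," + val
    # A single string argument prints the same on Python 2 and Python 3.
    print(key + " " + vals.lstrip(","))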

output (region order follows dict iteration order, so it may differ between Python versions)

chr12 10977955 10999847 chr12 10880241 11502235 100.0 erc1,kctd10,mmab,myo1h,prr4,rad52
chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb,rad52,rad52,tas2r8,tas2r9
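
As a variation on the script above, here is a minimal Python 3 sketch that splits each line on any run of whitespace, so it should also cope with the tab- or multi-space-separated layout shown in the question. The function name group_genes and the assumption that the gene name is always the second-to-last column (followed by the trailing "-") are illustrative choices, not part of the original answer.

```python
from collections import defaultdict


def group_genes(lines):
    """Group gene names by region; the region is every column before the gene."""
    genes_by_region = defaultdict(list)
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumed layout: region columns..., gene name, trailing "-" column.
        *region, gene, _dash = fields
        genes_by_region[" ".join(region)].append(gene)
    return genes_by_region


if __name__ == "__main__":
    sample = [
        "chr12   10954262    10962540    chr12   10880241    11502235    100.0       acacb   -",
        "chr12   10954262    10962540    chr12   10880241    11502235    100.0       rad52   -",
        "chr12   10954262    10962540    chr12   10880241    11502235    100.0       rad52   -",
        "chr12   10954262    10962540    chr12   10880241    11502235    100.0       tas2r8  -",
        "chr12   10954262    10962540    chr12   10880241    11502235    100.0       tas2r9  -",
    ]
    for region, genes in group_genes(sample).items():
        print(region, ", ".join(genes))
    # prints: chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb, rad52, rad52, tas2r8, tas2r9
```

To read from a file instead of a list, an open file object can be passed directly, for example group_genes(open("filename")). On Python 3.7 and later the regions come out in the order they first appear in the input, because dictionaries preserve insertion order.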
