string - Adding different genes for the same chromosomal regions to a single line in Python
Hi, I am new to Python. I have a file of chromosomal regions and the corresponding genes of each region, and I need to put the different genes of the same region on one line. The file looks like this:
chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r8 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r9 -
From the lines above, I want the output on a single line (like below), with the gene names beside the chromosomal region instead of spread over multiple lines:
chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb, rad52, rad52, tas2r8, tas2r9
Your help is highly appreciated.
jyothi
Assuming filename is a file containing the following:
chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r8 -
chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r9 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 erc1 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 kctd10 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 mmab -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 myo1h -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 prr4 -
chr12 10977955 10999847 chr12 10880241 11502235 100.0 rad52 -
script.py
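from collections import defaultdict

# Map each region (every column before the gene name) to its list of genes.
genes_dict = defaultdict(list)
with open("filename", "r") as infile:
    for line in infile:
        # Split from the right: region, gene name, trailing "-" column.
        key, val, _ = line.rstrip().rsplit(" ", 2)
        genes_dict[key].append(val)

# Regions print in first-seen order on Python 3.7+; older versions may differ.
for key in genes_dict:
    print("%s %s" % (key, ",".join(genes_dict[key])))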
output
chr12 10977955 10999847 chr12 10880241 11502235 100.0 erc1,kctd10,mmab,myo1h,prr4,rad52
chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb,rad52,rad52,tas2r8,tas2r9
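The question asks for the genes separated by a comma and a space, with regions in the order they appear in the file. Here is a minimal sketch of one way to get that, assuming every line ends with the gene name followed by a trailing "-" column. The function name merge_genes and its sep parameter are made up for illustration and are not part of the original script.

```python
from collections import OrderedDict


def merge_genes(lines, sep=", "):
    """Group gene names by region, keeping regions in first-seen order."""
    grouped = OrderedDict()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumed layout: region columns, then gene name, then a "-" column.
        region = " ".join(fields[:-2])
        gene = fields[-2]
        grouped.setdefault(region, []).append(gene)
    return ["%s %s" % (region, sep.join(genes))
            for region, genes in grouped.items()]


# A few lines taken from the sample data above.
sample = [
    "chr12 10954262 10962540 chr12 10880241 11502235 100.0 acacb -",
    "chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -",
    "chr12 10954262 10962540 chr12 10880241 11502235 100.0 rad52 -",
    "chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r8 -",
    "chr12 10954262 10962540 chr12 10880241 11502235 100.0 tas2r9 -",
    "chr12 10977955 10999847 chr12 10880241 11502235 100.0 erc1 -",
    "chr12 10977955 10999847 chr12 10880241 11502235 100.0 kctd10 -",
]

for merged in merge_genes(sample):
    print(merged)
```

To run it on a real file, pass the open file object instead of the list, for example merge_genes(open("filename")). Duplicate gene names such as rad52 are kept, as in the output the question asks for.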