regex - Retrieving text from Pattern1 to Pattern2 - Python


I have the input file below:

    pattern1
    ptr1 blah blah blah
    needthis  blah blah blah
    thisoneaswell  blah blah blah
    pattern2

    pattern1
    ptr2 blah blah blah
    needthis  blah blah blah
    thisoneaswell  blah blah blah
    pattern2

    ............................
    ............................

    pattern1
    ptrn blah blah
    needthis  blah blah blah
    thisoneaswell blah blah blah
    pattern2

I need a function that returns the first-column entries between pattern1 and pattern2, as below:

    ptr1
    needthis
    thisoneaswell

    ptr2
    needthis
    thisoneaswell

    ......................
    ......................

    ptrn
    needthis
    thisoneaswell

ptr1, ptr2, ..., ptrn are each different texts. pattern1 and pattern2 are different from each other and are consistently present in the file.

How can I achieve this in Python?

I am still a beginner in Python. I am trying to achieve this with re.findall(), but I am not getting the desired output:

    import re

    def retrieve():
        file = open("filename", "r")
        # This only finds the literal text "pattern1", not what follows it.
        string = re.findall(r"pattern1", file.read())
        print(string)

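You can nest two regexes: the outer one captures each block between pattern1 and pattern2, and the inner one picks the first word of every line in that block: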

    txt = '''\
    pattern1
    ptr1 blah blah blah
    needthis1  blah blah blah
    thisoneaswell1  blah blah blah
    pattern2

    pattern1
    ptr2 blah blah blah
    needthis2  blah blah blah
    thisoneaswell2  blah blah blah
    pattern2

    ............................
    ............................

    pattern1
    ptrn blah blah
    needthisn  blah blah blah
    thisoneaswelln blah blah blah
    pattern2'''

    import re

    # re.M lets ^ match at every line start; re.S lets . match newlines.
    for m in re.finditer(r'^pattern1\s*(.*?)(?=^pattern2)', txt, re.M | re.S):
        print(re.findall(r'(^\w+)', m.group(1), re.M))

Prints:

    ['ptr1', 'needthis1', 'thisoneaswell1']
    ['ptr2', 'needthis2', 'thisoneaswell2']
    ['ptrn', 'needthisn', 'thisoneaswelln']

Edit 1

If you are using a file that fits in memory:

    with open(fn) as f:
        txt = f.read()
        for m in re.finditer(r'^pattern1\s*(.*?)(?=^pattern2)', txt, re.M | re.S):
            print(re.findall(r'(^\w+)', m.group(1), re.M))

Use mmap for larger files that won't fit in memory.
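As a rough illustration of the mmap suggestion, here is a minimal Python 3 sketch. The function name `first_columns_mmap` and the choice to decode the words as ASCII are assumptions made for this example, not part of the original answer. The patterns are the same two regexes written as bytes patterns, because a memory-mapped file exposes bytes rather than text.

```python
import mmap
import re

# Same two regexes as above, as bytes patterns (an mmap object holds bytes).
BLOCK_RE = re.compile(rb'^pattern1\s*(.*?)(?=^pattern2)', re.M | re.S)
FIRST_WORD_RE = re.compile(rb'^\w+', re.M)


def first_columns_mmap(path):
    """Return one list of first-column words per pattern1..pattern2 block.

    Hypothetical helper for illustration only.
    """
    results = []
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; mmap raises ValueError on an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in BLOCK_RE.finditer(mm):
                words = FIRST_WORD_RE.findall(m.group(1))
                # In a bytes pattern \w matches ASCII only, so this decode is safe.
                results.append([w.decode('ascii') for w in words])
    return results
```

Calling `first_columns_mmap(fn)` should give the same nested lists as the in-memory version, while leaving it to the operating system to page the file in as the regex scans it.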


Edit 2

Just append each block's results to a list after joining them into a string:

    with open(fn) as f:
        results = []
        txt = f.read()
        for m in re.finditer(r'^pattern1\s*(.*?)(?=^pattern2)', txt, re.M | re.S):
            # One string per block, with the first-column words on separate lines.
            results.append('\n'.join(re.findall(r'(^\w+)', m.group(1), re.M)))
        print('\n===\n'.join(results))
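Since the question asks for a function that returns the entries, one possible alternative is a line-by-line generator that never holds the whole file in memory. This is a different technique from the nested regexes above, sketched under some assumptions: the name `iter_blocks` is made up for this example, the markers are expected to be the first whitespace-separated token on their line, and the "first column" is taken to be the first whitespace-separated token rather than a `\w+` match.

```python
def iter_blocks(lines, start='pattern1', end='pattern2'):
    """Yield a list of first-column tokens for each start..end block.

    Hypothetical helper: `lines` is any iterable of strings, such as an
    open text file. A block that is never closed by `end` is dropped.
    """
    block = None  # None while we are outside a block
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        first = fields[0]
        if block is None:
            if first == start:
                # Keep a token that follows the start marker on the same line.
                block = fields[1:2]
        elif first == end:
            yield block
            block = None
        else:
            block.append(first)


sample = """pattern1
ptr1 blah blah blah
needthis1  blah blah blah
thisoneaswell1  blah blah blah
pattern2
""".splitlines()

print(list(iter_blocks(sample)))
# prints [['ptr1', 'needthis1', 'thisoneaswell1']]
```

With a real file, something like `with open(fn) as f: results = list(iter_blocks(f))` would collect the blocks, and `'\n'.join(block)` turns each one into the column-style output shown in the question.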
