Python: find the max value and print the first 5 lines of that file
I am writing a program where I have a directory containing a list of text files. If I find "color=" in a file, I compute the fuzzy value of the 'filename' and the 'starting line of the file'.
What I need: find the max of the fuzzy values, and then get the first 5 lines of the file having that max value.
I did the coding and can find the fuzzy value, but I don't know how to find the max value and print the first 5 lines of the file having the maximum fuzzy value. Please help!
    import os
    from fuzzywuzzy import fuzz

    path = r'c:\python27'
    data = {}
    for dir_entry in os.listdir(path):
        dir_entry_path = os.path.join(path, dir_entry)
        if os.path.isfile(dir_entry_path):
            with open(dir_entry_path, 'r') as my_file:
                for line in my_file:
                    for part in line.split():
                        if "color=" in part:
                            print part
                            string1 = "filename:", dir_entry_path
                            print(string1)
                            string2 = "start line of file:", list(my_file)[0]
                            print(string1)
                            string3 = (fuzz.ratio(string1, string2))
                            print(string3)
and the output looks like this:

    "color="
    ('filename:', 'c:\\python27\\maybeee.py')
    ('filename:', 'c:\\python27\\maybeee.py')
    20
    "color="
    ('filename:', 'c:\\python27\\mayp.py')
    ('filename:', 'c:\\python27\\mayp.py')
    28
    part.startswith('color='):
    ('filename:', 'c:\\python27\\mayp1.py')
    ('filename:', 'c:\\python27\\mayp1.py')
    29
The output I need: taking the example here, the max value is 29, so I need to print the first 5 lines of the file having that max value. Please help! Answers are appreciated.
Your code attempts to reread the entire file again (at list(my_file)[0]) while there's an iterator going over it already. That is troublesome.
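To see why that is troublesome, here is a minimal sketch. It uses an in-memory io.StringIO in place of a real file (an assumption made only to keep the example self-contained), and the helper name consume_inside_loop is hypothetical. Calling list() on the handle inside the loop drains the rest of the iterator, so element [0] is the line after the current one, not the start of the file, and the outer loop stops early.

```python
import io


def consume_inside_loop(text):
    """Iterate over a file-like object and call list() on it mid-loop."""
    handle = io.StringIO(text)  # stands in for an opened text file
    seen = []       # lines the outer for loop actually receives
    leftover = []   # what list(handle) returns inside the loop
    for line in handle:
        seen.append(line)
        # This drains every remaining line from the same iterator.
        leftover = list(handle)
    return seen, leftover


seen, leftover = consume_inside_loop("first\nsecond\nthird\n")
print(seen)      # ['first\n']  (the loop ended after one pass)
print(leftover)  # ['second\n', 'third\n']  (element 0 is NOT the first line)
```

If the match happens on the last line of the file, the list is empty and indexing it with [0] raises IndexError.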
It is better to store the first 5 lines (this is what you're asking for, yes?) in a variable and print them when the condition matches.
Also, you're printing string1 twice.
I would change the loop to:
    from collections import defaultdict

    filenames2fuzz = defaultdict(list)
    for dir_entry in os.listdir(path):
        dir_entry_path = os.path.join(path, dir_entry)
        if os.path.isfile(dir_entry_path):
            first5lines = []
            condition_matched_in_file = False
            with open(dir_entry_path, 'r') as my_file:
                for line_nbr, line in enumerate(my_file):
                    if line_nbr < 5:
                        first5lines.append(line)
                    for part in line.split():
                        if "color=" in part:
                            print part
                            string1 = "filename:", dir_entry_path
                            print(string1)
                            condition_matched_in_file = True
                            fuzziness = fuzz.ratio(string1, first5lines[0])
                            filenames2fuzz[dir_entry_path].append(fuzziness)
                            print(fuzziness)
            if condition_matched_in_file:
                print('\n'.join(first5lines))

    # Now we have a dictionary that holds the filenames and their
    # fuzziness values, so we can find the first 5 lines again
    # of the file that has the best fuzziness value.
    best_fuzziness_ratio = 0  # as far as I can tell, the docs indicate it is between 0 and 100
    for k, v in filenames2fuzz.items():
        if max(v) > best_fuzziness_ratio:
            best_fuzzy_file = k
            best_fuzziness_ratio = max(v)

    print('file {} has the highest fuzzy value '
          'of {}. \nthe first 5 lines are:\n'
          ''.format(best_fuzzy_file, best_fuzziness_ratio))
    with open(best_fuzzy_file) as f:
        for i in range(5):
            print(f.readline())
There are a few more optimizations possible (have a look at os.walk), but without a better explanation of your problem (give details about the files you're looping over, list parts of their contents), this is the best I can do.
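For the two pieces the question actually asks about (picking the file with the highest score and printing its first 5 lines), a more compact sketch is possible. It is written for Python 3, unlike the Python 2 code in this thread, and leaves out the fuzzywuzzy scoring entirely. The helper names first_lines and best_scoring_file are hypothetical, and the example scores are the ones from the question's output.

```python
from itertools import islice


def first_lines(path, n=5):
    """Return up to the first n lines of the file at path."""
    with open(path, "r") as handle:
        # islice stops after n lines, so the rest of the file is never read.
        return list(islice(handle, n))


def best_scoring_file(scores_by_file):
    """Return (filename, best score) for the highest-scoring file.

    scores_by_file maps a filename to a non-empty list of scores,
    like the filenames2fuzz dictionary in the answer.
    """
    if not scores_by_file:
        raise ValueError("no file matched, so there is no maximum")
    best = max(scores_by_file, key=lambda name: max(scores_by_file[name]))
    return best, max(scores_by_file[best])


# Scores taken from the example output in the question.
scores = {"maybeee.py": [20], "mayp.py": [28], "mayp1.py": [29]}
best_name, best_score = best_scoring_file(scores)
print(best_name, best_score)  # mayp1.py 29
```

With real paths as dictionary keys, print(''.join(first_lines(best_name))) would then show the first 5 lines of the winning file. The explicit check for an empty dictionary covers the case where no file contains "color=" at all.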