Perl - Change output filename from WGET when using input file option
I have a Perl script that I wrote that gets some image URLs, puts the URLs into an input file, and proceeds to run wget with the --input-file option. This works perfectly... or at least it did as long as the image filenames were unique.
I have a new company sending me data, and they use a very troublesome naming scheme. All the files have the same name, 0.jpg, in different folders.
For example:

    cdn.blah.com/folder/folder/202793000/202793123/0.jpg
    cdn.blah.com/folder/folder/198478000/198478725/0.jpg
    cdn.blah.com/folder/folder/198594000/198594080/0.jpg
When I run my script with this, wget works fine and downloads all the images, but they are titled 0.jpg.1, 0.jpg.2, 0.jpg.3, etc. I can't just count them and rename them, because files can be broken, not available, whatever.
I tried running wget once for each file with -O, but it's embarrassingly slow: starting the program, connecting to the site, downloading, and ending the program. Thousands of times. It's an hour vs. minutes.
So, I'm trying to find a method to change the output filenames from wget without it taking so long. The original approach works so well that I don't want to change it unless necessary, but I am open to suggestions.
Additional:

LWP::Simple is too simple for this. Yes, it works, but very slowly. It has the same problem as running individual wget commands: each get() or getstore() call makes the system re-connect to the server. Since the files are so small (60 KB on average) and there are so many to process (1851 for this one test file alone), the connection time is considerable.
The filename I will be using can be found with /\/(\d+)\/(\d+.jpg)/i, where the filename is simply $1$2, giving 2027931230.jpg for the first URL above. This is not really important for the question.
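As a quick check of that naming rule, here is a minimal sketch. The helper name local_name is hypothetical, and the pattern is a slightly stricter variant of the one above (the dot is escaped and the match is anchored at the end of the URL), which is an assumption about what is wanted:

```perl
use strict;
use warnings;

# Hypothetical helper: build the local filename from the last numeric
# folder and the image name; returns undef if the URL does not fit.
sub local_name {
    my ($url) = @_;
    return $url =~ m{/(\d+)/(\d+\.jpg)\z}i ? $1 . $2 : undef;
}

print local_name('cdn.blah.com/folder/folder/202793000/202793123/0.jpg'), "\n";
# prints 2027931230.jpg
```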
I'm now looking at LWP::UserAgent with LWP::ConnCache, but it times out and/or hangs on my PC. I will need to adjust the timeout and retry values. The inaugural run of the code downloaded 693 images (43 MB) in just a couple of minutes before it hung. Using Simple, I only got 200 images in 5 minutes.
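    use LWP::UserAgent;
    use LWP::ConnCache;

    # INPUTFILE and $folder are set up elsewhere in the script
    chomp(my @filelist = <INPUTFILE>);

    # Reuse connections between requests instead of reconnecting each time
    my $browser = LWP::UserAgent->new;
    $browser->conn_cache(LWP::ConnCache->new());

    foreach (@filelist) {
        # Build the local name from the last folder and the image name
        /\/(\d+)\/(\d+.jpg)/i;
        my $newfilename = $1 . $2;
        my $response = $browser->mirror($_, $folder . $newfilename);
        die 'response failure' if ($response->is_error());
    }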
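The timeout and retry adjustments mentioned above might look roughly like the following sketch. The helper mirror_with_retry is hypothetical, and the 30-second timeout and retry count are assumed values to tune, not tested settings. It returns the last response instead of dying, so one bad URL does not stop the whole batch:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: call mirror() up to $tries times and return the
# last response, stopping early as soon as a request is not an error.
sub mirror_with_retry {
    my ($ua, $url, $file, $tries) = @_;
    my $response;
    for my $attempt (1 .. $tries) {
        $response = $ua->mirror($url, $file);
        last unless $response->is_error;
    }
    return $response;
}

# keep_alive gives the agent its own connection cache; the timeout is in
# seconds. Both values are assumptions.
my $ua = LWP::UserAgent->new(keep_alive => 1, timeout => 30);

# Assumed usage inside the existing loop:
# my $response = mirror_with_retry($ua, $_, $folder . $newfilename, 3);
# warn "Skipping $_: ", $response->status_line, "\n" if $response->is_error;
```

Warning and moving on, rather than calling die, fits the observation that some files may be broken or unavailable.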
LWP::Simple's getstore function allows you to specify a URL to fetch and the filename to store its data in. It's an excellent module for many of the same use cases as wget, with the benefit of being a Perl module (i.e. no need to outsource to the shell or spawn off child processes).
    use LWP::Simple;

    # Grab the filename from the end of the URL
    my $filename = (split '/', $url)[-1];

    # If the file exists, increment its name
    while (-e $filename) {
        $filename =~ s{ (\d+)[.]jpg }{ $1+1 . '.jpg' }ex
            or die "Unexpected filename encountered";
    }

    getstore($url, $filename);
The question doesn't specify exactly what kind of renaming scheme you need, but this will work for the examples given by simply incrementing the filename until the current directory doesn't contain that filename.
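If keeping the single wget --input-file run matters more, another possibility is to let wget recreate the remote folders with its -x (--force-directories) and -nH (--no-host-directories) options, then flatten the tree in one pass afterwards. This is only a sketch under assumptions: flatten_tree, the downloads directory, and urls.txt are hypothetical names, and the pattern assumes the folder layout shown in the question.

```perl
use strict;
use warnings;
use File::Find;
use File::Copy qw(move);
use File::Spec;

# Hypothetical helper: move every .../<digits>/<digits>.jpg found under
# $root to $dest/<digits><digits>.jpg (202793123/0.jpg -> 2027931230.jpg).
# Returns the list of new paths for the files that were moved.
sub flatten_tree {
    my ($root, $dest) = @_;
    my @moved;
    find({
        no_chdir => 1,    # keep paths relative to the starting directory
        wanted   => sub {
            return unless -f $File::Find::name;
            return unless $File::Find::name =~ m{/(\d+)/(\d+\.jpg)\z}i;
            my $target = File::Spec->catfile($dest, $1 . $2);
            if (move($File::Find::name, $target)) {
                push @moved, $target;
            }
            else {
                warn "Could not move $File::Find::name: $!\n";
            }
        },
    }, $root);
    return @moved;
}

# Assumed usage: one wget process for the whole list, then one rename pass.
# system('wget', '-x', '-nH', '--directory-prefix=downloads',
#        '--input-file=urls.txt');
# flatten_tree('downloads', '.');
```

Because the new name comes from each file's own path, broken or missing downloads do not shift the names of the others.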