urllib2.URLError: Reading Server Response Codes (Python)
I have a list of URLs. I'd like to see the server response code of each and find out if any are broken. I can read server errors (500) and broken links (404) okay, but the code breaks once a non-website is read (e.g. "notawebsite_broken.com"). I've searched around and not found an answer... I hope you can help.
Here's my code:
import urllib2

# List of URLs. The third URL is not a website
urls = ["http://www.google.com", "http://www.ebay.com/broken-link", "http://notawebsite_broken"]

# Empty list to store the output
response_codes = []

# Run a "for" loop: get each server response code and save the results to response_codes
for url in urls:
    try:
        connection = urllib2.urlopen(url)
        response_codes.append(connection.getcode())
        connection.close()
        print url, ' - ', connection.getcode()
    except urllib2.HTTPError, e:
        response_codes.append(e.getcode())
        print url, ' - ', e.getcode()

print response_codes
This gives an output of...
http://www.google.com - 200
http://www.ebay.com/broken-link - 404
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    connection = urllib2.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>
Does anyone know a fix, or can you point me in the right direction?
You could use requests:
import requests

urls = ["http://www.google.com", "http://www.ebay.com/broken-link", "http://notawebsite_broken"]

for u in urls:
    try:
        r = requests.get(u)
        print "{} {}".format(u, r.status_code)
    except Exception, e:
        print "{} {}".format(u, e)

Output:

http://www.google.com 200
http://www.ebay.com/broken-link 404
http://notawebsite_broken HTTPConnectionPool(host='notawebsite_broken', port=80): Max retries exceeded with url: /
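If you would rather stay with urllib2, here is a minimal sketch of the same loop that also survives unresolvable hosts: HTTPError is a subclass of URLError, so catching urllib2.URLError after it handles DNS failures as well. There is no HTTP status code in that case, so recording None for those URLs is an assumption on my part, not something from the original code.

import urllib2

urls = ["http://www.google.com", "http://www.ebay.com/broken-link", "http://notawebsite_broken"]
response_codes = []

for url in urls:
    try:
        connection = urllib2.urlopen(url)
        response_codes.append(connection.getcode())
        connection.close()
        print url, ' - ', connection.getcode()
    except urllib2.HTTPError, e:
        # The server answered, just with an error status (404, 500, ...)
        response_codes.append(e.getcode())
        print url, ' - ', e.getcode()
    except urllib2.URLError, e:
        # No HTTP response at all (e.g. "nodename nor servname provided"),
        # so there is no status code; None here is a placeholder (assumption)
        response_codes.append(None)
        print url, ' - ', e.reason

print response_codes

With the original list this prints the two status codes as before, and for the third entry it prints the URLError reason instead of crashing.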