I'm trying to scrape PDFs for which I have a paid subscription and valid login credentials.

The URL of one of the PDF files is:

After logging in manually and accessing the above URL, the actual URL in the address bar when the PDF is rendered in the browser is:

When I access the URL without being logged in, I'm redirected to the login page.

I have read several threads on Python requests and tried the code below to get past the login page and handle cookies, but I still keep getting the login page in the response, which says: "If you are a registered SNL user, log in using your email address and password."

    import sys

    import requests
    from requests.packages.urllib3 import add_stderr_logger

    add_stderr_logger()  # log every connection and redirect to stderr

    s = requests.session()  # one session, so cookies persist across requests
    s.headers['user-agent'] = 'mozilla/5.0'

    name_form = 'username'
    password_form = 'password'
    login = {name_form: 'my_email_id', password_form: 'my_password'}

    login_response = s.post("", data=login)
    print('l', login_response)

    for r in login_response.history:
        if r.status_code == 401:  # 401 means authentication failed
            sys.exit(1)  # abort

    pdf_response = s.get("")


    2014-06-26 13:04:54,555 DEBUG Added stderr logging handler to logger: requests.packages.urllib3
    2014-06-26 13:04:54,605 INFO Starting new HTTPS connection (1):
    2014-06-26 13:04:55,943 DEBUG "GET /interactivex/default.aspx HTTP/1.1" 302 152
    2014-06-26 13:04:56,282 DEBUG "GET /interactivex/logincookiecheck.aspx HTTP/1.1" 302 143
    2014-06-26 13:04:56,650 DEBUG "GET /interactivex/default.aspx HTTP/1.1" 200 None
    2014-06-26 13:04:56,865 INFO Starting new HTTP connection (1):
    2014-06-26 13:04:57,447 DEBUG "GET /interactivex/file.aspx?id=17670354&keyfileformat=pdf HTTP/1.1" 302 143
    2014-06-26 13:04:57,788 DEBUG "GET /interactivex/default.aspx HTTP/1.1" 302 162
    2014-06-26 13:04:58,151 DEBUG "GET /interactivex/default.aspx HTTP/1.1" 200 None

I don't know how to interpret this output. When I googled response code 200, I learnt that it means OK.
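One way to make sense of the log is to list the redirect chain of a response: each 302 is a redirect, and the final 200 only says that the last page in the chain loaded successfully, not that it was the page originally requested. The sketch below is a minimal helper, not part of the original code. The function names are hypothetical, and the default `login_path` is an assumption based on the `default.aspx` path that appears in the log.

```python
def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def ended_on_login_page(response, login_path="/interactivex/default.aspx"):
    """True if the final URL, ignoring any query string, is the login page.

    login_path is an assumption taken from the log output, not a known fact
    about the site.
    """
    final_path = response.url.lower().split("?")[0]
    return final_path.endswith(login_path)


# Usage with the session from the question (pdf_url is a placeholder):
# pdf_response = s.get(pdf_url)
# for status, url in redirect_chain(pdf_response):
#     print(status, url)
# print(ended_on_login_page(pdf_response))
```

If the chain for the PDF request ends on the login page, a 200 status only confirms that the login page itself was served.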

But when I print pdf_response.text, it returns the login page again.
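Rather than reading the printed text, a quick programmatic check is to look at the first bytes of the body, since PDF files begin with the marker `%PDF-`. This is a minimal sketch under that assumption; `looks_like_pdf` is a hypothetical helper name, and it works on any object with a `content` attribute holding bytes, such as a requests response.

```python
def looks_like_pdf(response):
    """True if the response body starts with the PDF file signature."""
    # response.content is the raw body as bytes; response.text would be a
    # decoded string, which is not suitable for binary files such as PDFs.
    return response.content.startswith(b"%PDF-")


# Usage with the response from the question (the file name is a placeholder):
# if looks_like_pdf(pdf_response):
#     with open("report.pdf", "wb") as f:
#         f.write(pdf_response.content)
# else:
#     print("Got something other than a PDF, probably the login page")
```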


