Python: NTLM authentication with Scrapy for web scraping


I am attempting to scrape data from a website that requires authentication.
I have been able to log in using requests and HttpNtlmAuth with the following:

    import requests
    from requests_ntlm import HttpNtlmAuth

    s = requests.Session()
    url = "https://website.com/things"
    response = s.get(url, auth=HttpNtlmAuth('domain\\username', 'password'))

However, while exploring the capabilities of Scrapy, I have not been able to authenticate.

I came across the following middleware, which seems to work, but I do not think I have been implementing it properly:

https://github.com/reimund/ntlm-middleware/blob/master/ntlmauth.py

In settings.py I have:

    SPIDER_MIDDLEWARES = {'test.ntlmauth.ntlmauthmiddleware': 400}

and in my spider class I have:

    http_user = 'domain\\user'
    http_pass = 'pass'

I have not been able to get it to work.

If anyone has been able to scrape a website that uses NTLM authentication and can point me in the right direction, I would appreciate it.

I was able to figure out what was going on.

1: It has to be registered as a downloader middleware, not a spider middleware:

    DOWNLOADER_MIDDLEWARES = {'test.ntlmauth.ntlm_middleware': 400}

2: The middleware I was trying to use needed to be modified significantly. Here is what works for me:

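    from scrapy.http import HtmlResponse
    import requests
    from requests_ntlm import HttpNtlmAuth


    class ntlm_middleware(object):

        def process_request(self, request, spider):
            url = request.url
            # Credentials are read from attributes set on the spider class
            pwd = getattr(spider, 'http_pass', '')
            usr = getattr(spider, 'http_user', '')
            # Fetch the page with requests + NTLM instead of Scrapy's downloader
            s = requests.Session()
            resp = s.get(url, auth=HttpNtlmAuth(usr, pwd))
            # Returning a response here makes Scrapy skip its own download.
            # HtmlResponse (a Response subclass) lets the spider use .css()/.xpath().
            return HtmlResponse(url=url, status=resp.status_code,
                                body=resp.content, request=request)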
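Why does returning a response from `process_request` work? According to Scrapy's documentation, downloader middlewares have `process_request()` called in increasing order of the number given in `DOWNLOADER_MIDDLEWARES`, and when one of them returns a response, Scrapy skips the remaining `process_request()` calls and the actual download. Spider middlewares have no `process_request()` hook at all, which is why registering the class under `SPIDER_MIDDLEWARES` had no effect. The toy model below mimics only that rule; `FakeResponse`, `PassThrough`, `FetchItself` and `run_chain` are hypothetical names, and it leaves out `process_response()`, which real Scrapy still calls on the returned response.

```python
# Toy model of how a downloader-middleware chain treats process_request().
# NOT Scrapy's implementation: it only mimics the documented rule that
# middlewares run in increasing order and a returned response skips the download.
from typing import Dict, List, Optional


class FakeResponse:
    """Hypothetical stand-in for scrapy.http.Response."""

    def __init__(self, url: str, body: bytes, source: str) -> None:
        self.url = url
        self.body = body
        self.source = source  # records who produced this response


class PassThrough:
    """A middleware that lets the request continue by returning None."""

    def __init__(self, log: List[str], name: str) -> None:
        self.log = log
        self.name = name

    def process_request(self, url: str) -> Optional[FakeResponse]:
        self.log.append(self.name)
        return None


class FetchItself:
    """Mimics ntlm_middleware: fetches the page on its own and returns it."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def process_request(self, url: str) -> Optional[FakeResponse]:
        self.log.append("ntlm")
        return FakeResponse(url, b"<html>protected page</html>", source="ntlm")


def run_chain(url: str, middlewares: Dict[object, int], download) -> FakeResponse:
    # Call process_request() in increasing order of the numeric priority.
    for mw, _order in sorted(middlewares.items(), key=lambda kv: kv[1]):
        result = mw.process_request(url)
        if result is not None:
            # Short-circuit: later middlewares and the download are skipped.
            return result
    # Nobody returned a response, so the "real" downloader fetches the page.
    return download(url)
```

In the real middleware it is `requests` that performs the NTLM handshake, so Scrapy's own downloader never fetches the protected URL.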

Within the spider, you need to set these variables:

    http_user = 'domain\\user'
    http_pass = 'pass'
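The `'domain\\user'` value is easy to misread: in a Python string literal `\\` is an escape that produces a single backslash, so the attribute actually holds `domain\user`. If some other part of a scraper needs the domain and the user name separately, a small helper can split them. `split_ntlm_login` below is a hypothetical name used for illustration, not part of requests_ntlm or Scrapy.

```python
from typing import Tuple


def split_ntlm_login(login: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'DOMAIN\\user' into ('DOMAIN', 'user')."""
    # partition() splits on the first backslash only
    domain, sep, user = login.partition("\\")
    if not sep:
        # No backslash at all: treat the whole value as a bare username
        return "", login
    return domain, user


# In Python source, '\\' is an escape for ONE backslash character.
http_user = 'domain\\user'
print(len(http_user))               # 11: "domain" (6) + "\" (1) + "user" (4)
print(split_ntlm_login(http_user))  # ('domain', 'user')
```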
