python - NTLM authentication with Scrapy for web scraping
I am attempting to scrape data from a website that requires authentication.
I have been able to log in using requests and HttpNtlmAuth with the following:
    import requests
    from requests_ntlm import HttpNtlmAuth

    s = requests.session()
    url = "https://website.com/things"
    response = s.get(url, auth=HttpNtlmAuth('domain\\username', 'password'))
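A side note on the 'domain\\username' string used here: the doubled backslash is only Python string escaping, so the value actually sent contains a single backslash. A minimal sketch to check this, using the placeholder credentials from the question (the variable names are just for illustration):

```python
# The escaped form and the raw-string form are the same string.
user_escaped = 'domain\\username'
user_raw = r'domain\username'
same = (user_escaped == user_raw)

# Split into the NTLM domain and the account name on the single backslash.
domain, _, username = user_escaped.partition('\\')

print(same)              # True
print(domain, username)  # domain username
```

Either spelling should work as the first argument to HttpNtlmAuth; the point is only that a single unescaped backslash in a normal string literal can silently turn into an escape sequence.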
I would like to explore the capabilities of Scrapy, but I have not been able to authenticate.
I came across the following middleware, which seems like it could work, but I do not think I have been implementing it properly:
https://github.com/reimund/ntlm-middleware/blob/master/ntlmauth.py
In settings.py I have

    SPIDER_MIDDLEWARES = {
        'test.ntlmauth.NtlmAuthMiddleware': 400,
    }

and in my spider class I have

    http_user = 'domain\\user'
    http_pass = 'pass'
I have not been able to get this to work.
If anyone has been able to scrape a website with NTLM authentication and can point me in the right direction, I would appreciate it.
I was able to figure out what was going on.
1: This is considered a "DOWNLOADER_MIDDLEWARE", not a "SPIDER_MIDDLEWARE".
    DOWNLOADER_MIDDLEWARES = {
        'test.ntlmauth.ntlm_middleware': 400,
    }
2: The middleware I was trying to use needed to be modified significantly. Here is what works for me:
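    from scrapy.http import Response
    import requests
    from requests_ntlm import HttpNtlmAuth

    class ntlm_middleware(object):
        def process_request(self, request, spider):
            url = request.url
            # Credentials are read from attributes set on the spider.
            pwd = getattr(spider, 'http_pass', '')
            usr = getattr(spider, 'http_user', '')
            s = requests.session()
            # Fetch the page with requests, which handles the NTLM handshake.
            response = s.get(url, auth=HttpNtlmAuth(usr, pwd))
            # Returning a Response here makes Scrapy skip its own download.
            return Response(url, response.status_code, {}, response.content)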
Within the spider, you need to set these variables:

    http_user = 'domain\\user'
    http_pass = 'pass'
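To illustrate the two points above without needing Scrapy or an NTLM server, here is a rough, simplified model of what happens. It is only a sketch: FakeResponse, FakeSpider, SketchNtlmMiddleware, run_downloader_chain and default_download are made-up names for this example, not Scrapy APIs, and the 401 from the fallback download is an assumption about how the site responds to an unauthenticated request. It shows the credentials being picked up from the spider with getattr, and a process_request that returns a response taking the place of the normal download.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for scrapy.http.Response."""
    url: str
    status: int
    body: bytes


class FakeSpider:
    """Hypothetical stand-in for a spider that carries the credentials."""
    http_user = 'domain\\user'
    http_pass = 'pass'


class SketchNtlmMiddleware:
    """Mimics the shape of the middleware above, with no network access."""

    def process_request(self, url, spider):
        # Same lookup as in the answer: fall back to '' if the attribute is missing.
        usr = getattr(spider, 'http_user', '')
        pwd = getattr(spider, 'http_pass', '')
        # The real middleware would call requests with HttpNtlmAuth(usr, pwd) here;
        # this sketch only records which user would have been used.
        body = ('fetched as ' + usr).encode('utf-8')
        return FakeResponse(url, 200, body)


def default_download(url):
    # Stand-in for the normal download, assumed to be rejected without NTLM auth.
    return FakeResponse(url, 401, b'unauthorized')


def run_downloader_chain(url, spider, middlewares, download):
    """Simplified model: the first non-None result replaces the normal download."""
    for mw in middlewares:
        result = mw.process_request(url, spider)
        if result is not None:
            return result
    return download(url)


url = 'https://website.com/things'
with_mw = run_downloader_chain(url, FakeSpider(), [SketchNtlmMiddleware()], default_download)
without_mw = run_downloader_chain(url, FakeSpider(), [], default_download)

print(with_mw.status, with_mw.body)
print(without_mw.status)
```

A spider middleware has no process_request hook in the download path, which fits with the original SPIDER_MIDDLEWARES registration appearing to do nothing.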