scrapy

Submitting a form with Scrapy

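Some login pages submit more than a username and password: the form also carries hidden fields, such as
<input type="hidden" value="aaa">. FormRequest.from_response extracts the fields to submit from the form in the response automatically, so you only need to add the username and password.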

Here is the example from the official documentation:

import scrapy


class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Build the login request from the form found in the response;
        # hidden fields are pre-filled, formdata adds or overrides values.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login
        )

    def after_login(self, response):
        # check that the login succeeded before going on
        # (response.text is a str; response.body is bytes on Python 3)
        if "authentication failed" in response.text:
            self.logger.error("Login failed")
            return

        # continue scraping with authenticated session...