Python Study Notes 3

¶The urllib library

¶The request module

¶The BaseHandler class

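The base class of all handlers. Subclasses implement methods such as default_open(), protocol_open(), protocol_request() and protocol_response(), where "protocol" stands for a URL scheme (for example http_open() or http_request()).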
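To show how a protocol_request() hook works, here is a minimal sketch. The handler class `AddHeaderHandler` and the header `X-Demo` are hypothetical. The handler subclasses BaseHandler and defines http_request()/https_request(), which the opener calls to pre-process every request before sending it. A local HTTPServer on a random port stands in for a real site and echoes the header back, so no external network is needed.

```python
import http.server
import threading
from urllib.request import BaseHandler, ProxyHandler, build_opener


class AddHeaderHandler(BaseHandler):
    """Hypothetical handler that adds one fixed header to every HTTP(S) request."""

    def http_request(self, req):
        # The opener calls <protocol>_request() before sending; it must return the request.
        req.add_header('X-Demo', 'hello')
        return req

    # Reuse the same pre-processing step for https:// URLs.
    https_request = http_request


class EchoServer(http.server.BaseHTTPRequestHandler):
    """Local test server that echoes the X-Demo request header back as the body."""

    def do_GET(self):
        body = (self.headers.get('X-Demo') or '').encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default request logging


server = http.server.HTTPServer(('127.0.0.1', 0), EchoServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

# ProxyHandler({}) disables proxies taken from the environment for this local request.
opener = build_opener(ProxyHandler({}), AddHeaderHandler())
with opener.open(url) as response:
    echoed = response.read().decode('utf-8')

server.shutdown()
server.server_close()
print(echoed)  # hello
```

build_opener() accepts handler instances alongside its defaults, so a custom handler like this only needs to define the hook methods it cares about.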

¶Handler subclasses

¶HTTPDefaultErrorHandler

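Handles HTTP error responses. It is the fallback that turns any error response that no other handler dealt with (for example 404 or 500) into an HTTPError exception.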
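A small offline sketch of this fallback: calling http_error_default() directly shows that it raises HTTPError. The URL and the in-memory response body are illustrative assumptions, and no request is actually sent.

```python
import io
from email.message import Message
from urllib.error import HTTPError
from urllib.request import HTTPDefaultErrorHandler, Request

handler = HTTPDefaultErrorHandler()
req = Request('http://example.com/missing')  # illustrative URL; nothing is sent

caught = None
try:
    # Arguments: request, response body, status code, status message, headers.
    handler.http_error_default(req, io.BytesIO(b'not found'), 404, 'Not Found', Message())
except HTTPError as e:
    caught = e

print(caught.code, caught.reason, caught.url)  # 404 Not Found http://example.com/missing
```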

¶HTTPRedirectHandler

Handles redirects: for 3xx responses such as 301 and 302, it follows the URL given in the Location header.
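The following sketch compares the default redirect behaviour with a hypothetical `NoRedirectHandler` subclass whose redirect_request() returns None, meaning "do not follow". A local HTTPServer on a random port stands in for a real site, with /old redirecting to /new. Passing a subclass to build_opener() replaces the default HTTPRedirectHandler.

```python
import http.server
import threading
from urllib.error import HTTPError
from urllib.request import HTTPRedirectHandler, ProxyHandler, build_opener


class NoRedirectHandler(HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; the 3xx response then falls
        # through to HTTPDefaultErrorHandler, which raises HTTPError.
        return None


class RedirectServer(http.server.BaseHTTPRequestHandler):
    """Local test server: /old redirects to /new, every other path returns 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
        else:
            self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass


server = http.server.HTTPServer(('127.0.0.1', 0), RedirectServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/old' % server.server_port

# Default opener: HTTPRedirectHandler follows the 302 transparently.
with build_opener(ProxyHandler({})).open(url) as response:
    final_url = response.geturl()

# Passing a subclass replaces the default HTTPRedirectHandler.
blocked_code = blocked_location = None
try:
    build_opener(ProxyHandler({}), NoRedirectHandler()).open(url)
except HTTPError as e:
    blocked_code, blocked_location = e.code, e.headers['Location']
    e.close()

server.shutdown()
server.server_close()
print(final_url)
print(blocked_code, blocked_location)  # 302 /new
```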

¶HTTPCookieProcessor

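Handles cookies. It stores the cookies a server sets and sends them back on later requests.

¶Getting cookies as name/value pairs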

Create a cookie jar with http.cookiejar.CookieJar().

Create a handler with HTTPCookieProcessor, passing in the cookie jar.

Create an opener with build_opener() and call its open() method.

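import http.cookiejar
import urllib.request
# CookieJar keeps cookies in memory
cookie = http.cookiejar.CookieJar()
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
# cookies set by the response are stored in the jar automatically
response = opener.open('http://www.baidu.com/')
for item in cookie:
    print(f'{item.name}={item.value}')  # an f-string also works when value is None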
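To turn the jar into a plain dict, you can iterate over it and collect each cookie's name and value. The sketch below does this offline. A local HTTPServer on a random port stands in for the real site and sets two illustrative cookies (`session` and `lang`).

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class CookieServer(http.server.BaseHTTPRequestHandler):
    """Local stand-in for a real site: every response sets two cookies."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self.send_header('Set-Cookie', 'lang=en; Path=/')
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass


server = http.server.HTTPServer(('127.0.0.1', 0), CookieServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

cookie = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore environment proxies for the local server
    urllib.request.HTTPCookieProcessor(cookie),
)
opener.open(url).close()

# Iterating the jar yields Cookie objects; collect them into a plain dict.
cookie_dict = {item.name: item.value for item in cookie}

server.shutdown()
server.server_close()
print(cookie_dict)
```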

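¶Saving cookies

Create the cookie jar with MozillaCookieJar(filename) (Mozilla/Netscape cookies.txt format) or LWPCookieJar(filename) (libwww-perl format). Both can write their cookies to a file.

Save the file with cookie.save(ignore_discard=True, ignore_expires=True). The two flags make save() also write session cookies (marked to be discarded) and expired cookies, which it would otherwise skip.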

import http.cookiejar
import urllib.request
filename = 'cookies.txt'
cookie = http.cookiejar.MozillaCookieJar(filename)
# Mozilla format; for LWP format use this instead:
# cookie = http.cookiejar.LWPCookieJar(filename)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com/')
cookie.save(ignore_expires=True, ignore_discard=True)
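This offline sketch shows both file formats and the effect of the flags. It uses an illustrative session cookie built by hand with http.cookiejar.Cookie instead of one received from a site. The values and the temporary file names are assumptions.

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie():
    """Build an illustrative session cookie by hand (no expiry, so discard=True)."""
    return http.cookiejar.Cookie(
        version=0, name='session', value='abc123',
        port=None, port_specified=False,
        domain='example.com', domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


tmpdir = tempfile.mkdtemp()

# Same cookie, both file formats; only the file layout differs.
first_lines = {}
for jar_class in (http.cookiejar.MozillaCookieJar, http.cookiejar.LWPCookieJar):
    filename = os.path.join(tmpdir, jar_class.__name__ + '.txt')
    cookie = jar_class(filename)
    cookie.set_cookie(make_session_cookie())
    cookie.save(ignore_discard=True, ignore_expires=True)
    with open(filename) as f:
        first_lines[jar_class.__name__] = f.readline().strip()

# With the default flags, the session cookie is skipped and only the header is written.
strict_file = os.path.join(tmpdir, 'strict.txt')
strict_jar = http.cookiejar.MozillaCookieJar(strict_file)
strict_jar.set_cookie(make_session_cookie())
strict_jar.save()
with open(strict_file) as f:
    data_lines = [line for line in f if line.strip() and not line.startswith('#')]

print(first_lines)
print(len(data_lines))  # 0
```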

¶Loading cookies

Read a cookie file with cookie.load(filename, ignore_discard=True, ignore_expires=True). Use the same jar class (Mozilla or LWP) that wrote the file.

import http.cookiejar
import urllib.request
filename = 'cookies.txt'
cookie = http.cookiejar.MozillaCookieJar()
# Mozilla format; for a file saved in LWP format use this instead:
# cookie = http.cookiejar.LWPCookieJar()
cookie.load(filename, ignore_discard=True, ignore_expires=True)
handler = urllib.request.HTTPCookieProcessor(cookie)
opener = urllib.request.build_opener(handler)
response = opener.open('http://www.baidu.com/')
print(response.read().decode('utf-8'))
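This self-contained round trip shows that the load() flags matter just as much as the save() flags. It first writes a file containing one illustrative, hand-built session cookie (a hypothetical value), then loads it back with and without ignore_discard.

```python
import http.cookiejar
import os
import tempfile

filename = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

# Write a file first so this sketch does not depend on an earlier run.
writer = http.cookiejar.MozillaCookieJar(filename)
writer.set_cookie(http.cookiejar.Cookie(
    version=0, name='session', value='abc123',  # illustrative session cookie
    port=None, port_specified=False,
    domain='example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))
writer.save(ignore_discard=True, ignore_expires=True)

# Loading with both flags brings the session cookie back.
cookie = http.cookiejar.MozillaCookieJar()
cookie.load(filename, ignore_discard=True, ignore_expires=True)
loaded = {item.name: item.value for item in cookie}

# Loading with the default flags drops it again.
strict = http.cookiejar.MozillaCookieJar()
strict.load(filename)
loaded_strict = {item.name: item.value for item in strict}

print(loaded)         # {'session': 'abc123'}
print(loaded_strict)  # {}
```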

¶ProxyHandler

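Routes requests through a proxy server:

Create a ProxyHandler, passing a dict that maps each URL scheme to a proxy address.

Build an opener with build_opener(handler).

Open the URL with the opener's open() method.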

from urllib.request import ProxyHandler, build_opener
proxy_handler = ProxyHandler({
    'http': 'http://127.0.0.1:8080',
    'https': 'https://127.0.0.1:8080'
})  # replace with your own proxy address
opener = build_opener(proxy_handler)
response = opener.open('https://www.baidu.com')
print(response.read().decode('utf-8'))
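To see what ProxyHandler actually changes, the offline sketch below uses a local HTTPServer as a fake proxy (`FakeProxy` is a hypothetical name). Instead of forwarding anything, it records the request line and the Host header. For plain-HTTP URLs, urllib connects to the proxy and puts the full target URL in the request line. example.com is only an illustrative target and is never contacted.

```python
import http.server
import threading
from urllib.request import ProxyHandler, build_opener


class FakeProxy(http.server.BaseHTTPRequestHandler):
    """Local stand-in for a proxy: records what it receives instead of forwarding."""
    seen = {}

    def do_GET(self):
        # When proxying plain HTTP, the request line carries the absolute URL.
        FakeProxy.seen['path'] = self.path
        FakeProxy.seen['host'] = self.headers.get('Host')
        body = b'answered by the proxy'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


proxy = http.server.HTTPServer(('127.0.0.1', 0), FakeProxy)
threading.Thread(target=proxy.serve_forever, daemon=True).start()

proxy_handler = ProxyHandler({'http': 'http://127.0.0.1:%d' % proxy.server_port})
opener = build_opener(proxy_handler)
# The connection goes to the fake proxy, which answers directly.
with opener.open('http://example.com/hello') as response:
    body = response.read()

proxy.shutdown()
proxy.server_close()
print(FakeProxy.seen['path'])  # http://example.com/hello
print(body)                    # b'answered by the proxy'
```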

¶HTTPPasswordMgr

Manages passwords. It keeps a database that maps (realm, URL) pairs to (username, password) pairs.
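The lookup rules can be checked offline with add_password() and find_user_password(). In this sketch the realm names, URLs and credentials are illustrative assumptions. A registered URL also matches its sub-paths. HTTPPasswordMgrWithDefaultRealm additionally treats realm None as a fallback for any realm.

```python
from urllib.request import HTTPPasswordMgr, HTTPPasswordMgrWithDefaultRealm

# Plain manager: credentials are stored per (realm, URI prefix).
mgr = HTTPPasswordMgr()
mgr.add_password('Admin Area', 'http://example.com/admin/', 'alice', 's3cret')

# Sub-paths of the registered URI match too.
found = mgr.find_user_password('Admin Area', 'http://example.com/admin/users')
# The realm must match exactly; otherwise nothing is found.
missing = mgr.find_user_password('Other Realm', 'http://example.com/admin/users')

# WithDefaultRealm: realm=None acts as a fallback for any realm.
default_mgr = HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, 'http://example.com/', 'bob', 'hunter2')
fallback = default_mgr.find_user_password('Any Realm', 'http://example.com/page')

print(found)     # ('alice', 's3cret')
print(missing)   # (None, None)
print(fallback)  # ('bob', 'hunter2')
```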

¶HTTPBasicAuthHandler

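Handles HTTP Basic authentication:

Create an HTTPPasswordMgrWithDefaultRealm object p.

Add the username, password and URL with add_password().

Pass p when constructing the HTTPBasicAuthHandler.

Create an opener with build_opener().

Open the URL with open().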

from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener
username = 'username'
password = 'password'
url = 'http://www.baidu.com/'   # replace with a page that requires authentication
p = HTTPPasswordMgrWithDefaultRealm()
p.add_password(None, url, username, password)
handler = HTTPBasicAuthHandler(p)
opener = build_opener(handler)
result = opener.open(url)
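Because the Baidu URL above does not actually require authentication, here is an offline sketch of the full exchange. A local HTTPServer (hypothetical `BasicAuthServer`, realm `demo`, illustrative credentials) replies 401 with a WWW-Authenticate challenge until the correct Authorization header arrives. HTTPBasicAuthHandler then looks up the credentials in p and retries automatically.

```python
import base64
import http.server
import threading
from urllib.request import (HTTPBasicAuthHandler, HTTPPasswordMgrWithDefaultRealm,
                            ProxyHandler, build_opener)

USERNAME, PASSWORD = 'alice', 's3cret'  # illustrative credentials
EXPECTED = 'Basic ' + base64.b64encode(
    ('%s:%s' % (USERNAME, PASSWORD)).encode('utf-8')).decode('ascii')


class BasicAuthServer(http.server.BaseHTTPRequestHandler):
    """Local test server protecting every path with HTTP Basic auth (realm 'demo')."""

    def do_GET(self):
        if self.headers.get('Authorization') == EXPECTED:
            self.send_response(200)
            body = b'welcome alice'
        else:
            # The challenge tells the client which scheme and realm to use.
            self.send_response(401)
            self.send_header('WWW-Authenticate', 'Basic realm="demo"')
            body = b'login required'
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = http.server.HTTPServer(('127.0.0.1', 0), BasicAuthServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/private' % server.server_port

p = HTTPPasswordMgrWithDefaultRealm()
p.add_password(None, url, USERNAME, PASSWORD)  # realm None: use for any realm
opener = build_opener(ProxyHandler({}), HTTPBasicAuthHandler(p))

# The first attempt gets 401; the handler retries with an Authorization header.
with opener.open(url) as result:
    status, body = result.status, result.read()

server.shutdown()
server.server_close()
print(status, body)  # 200 b'welcome alice'
```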

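¶The error module

¶The URLError class

Inherits from OSError and is the base class of the exceptions in the error module.

Its reason attribute describes the cause of the error.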
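A quick offline way to trigger a URLError is to connect to a local port where nothing is listening. This sketch assumes that a port which was just freed stays unused for the moment. It shows that reason holds the underlying exception object, and it also checks the class hierarchy described above.

```python
import socket
import urllib.error
from urllib.request import ProxyHandler, build_opener

# Reserve a free local port and close it again, so nothing is listening there.
probe = socket.socket()
probe.bind(('127.0.0.1', 0))
port = probe.getsockname()[1]
probe.close()

reason = None
try:
    build_opener(ProxyHandler({})).open('http://127.0.0.1:%d/' % port, timeout=5)
except urllib.error.URLError as e:
    reason = e.reason  # typically a ConnectionRefusedError instance, not a string

print(type(reason).__name__, reason)
print(issubclass(urllib.error.URLError, OSError))                 # True
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
```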

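¶The HTTPError class

A subclass of URLError, raised specifically for HTTP error responses.

Attributes:

code: the HTTP status code, e.g. 404 (page not found) or 500 (internal server error)
reason: the reason for the error
headers: the response headers

For example: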

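from urllib import request, error
try:
    response = request.urlopen('https://yelangpi.github.io/index.htm')  # a page that does not exist
except error.HTTPError as e:
    # the server answered with an error status
    print(e.reason, e.code, e.headers, sep='\n')
except error.URLError as e:
    # the server could not be reached at all; catch the subclass HTTPError first
    print(e.reason)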
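The example above depends on an external site. The variant below reproduces it offline with a local HTTPServer (hypothetical `NotFoundServer`) that always answers 404 with a text body. It shows that headers contains the response headers and that an HTTPError can be read like a normal response.

```python
import http.server
import threading
from urllib import error
from urllib.request import ProxyHandler, build_opener


class NotFoundServer(http.server.BaseHTTPRequestHandler):
    """Local test server that answers every GET with 404 and a short body."""

    def do_GET(self):
        body = b'no such page'
        self.send_response(404)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


server = http.server.HTTPServer(('127.0.0.1', 0), NotFoundServer)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/index.htm' % server.server_port

status = content_type = body = None
try:
    build_opener(ProxyHandler({})).open(url)
except error.HTTPError as e:      # catch the subclass first
    status = (e.code, e.reason)
    content_type = e.headers['Content-Type']  # a response header
    body = e.read()                           # HTTPError can be read like a response
except error.URLError as e:       # then the general case: server unreachable
    print('could not reach the server:', e.reason)

server.shutdown()
server.server_close()
print(status, content_type, body)  # (404, 'Not Found') text/plain b'no such page'
```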

¶Note

The reason attribute is not always a string. It can be an exception object, for example an instance of socket.timeout.

import socket
import urllib.request
import urllib.error
try:
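    # 0.01 s is deliberately too short, so connecting usually times out
    response = urllib.request.urlopen('https://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    print(type(e.reason))  # usually socket.timeout (shown as TimeoutError on Python 3.10+)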
    if isinstance(e.reason, socket.timeout):
        print('TIME OUT')
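One caveat, based on CPython's current urllib implementation: only timeouts that happen while connecting and sending are wrapped in URLError. A timeout while waiting for the response is typically raised directly as socket.timeout, so robust code may want to catch both. The sketch below reproduces a read timeout offline. It uses a local listening socket that never answers, so the TCP connection succeeds but no HTTP response ever arrives. Which branch runs depends on the Python version, so treat the labels as illustrative.

```python
import socket
import urllib.error
from urllib.request import ProxyHandler, build_opener

# A listening socket that never answers: the TCP connect succeeds (kernel backlog),
# but no HTTP response arrives, so the read phase times out.
silent = socket.socket()
silent.bind(('127.0.0.1', 0))
silent.listen(1)
url = 'http://127.0.0.1:%d/' % silent.getsockname()[1]

outcome = None
try:
    build_opener(ProxyHandler({})).open(url, timeout=0.5)
except urllib.error.URLError as e:
    # Timeouts while connecting/sending arrive wrapped: e.reason is the timeout object.
    if isinstance(e.reason, socket.timeout):
        outcome = 'timeout wrapped in URLError'
    else:
        outcome = 'other URLError: %r' % (e.reason,)
except socket.timeout:
    # Timeouts while reading the response are not wrapped by urllib.
    outcome = 'timeout raised directly'
finally:
    silent.close()

print(outcome)
```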