Every pitfall I hit along the way cost me tears.
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "http://www.chncpa.org/was5/web/search?channelid=242606&orderby=+kssj&page=1&sw=&searchword=jssj%3E=%272019-06-12%27&perPj=&pjnumpos="
html = urlopen(url)
bs = BeautifulSoup(html, 'html.parser')
From here there are two ways to pull out the links:
Option 1:
hyperlink = bs.find_all('a')
for h in hyperlink:
    hh = h.get('href')  # None when the <a> tag has no href attribute
    print(hh)
Option 2:
import re
# Narrow the links down further with a regular expression
links = bs.find_all("a", {"href": re.compile("ticket")})
for link in links:
    print(link["href"])
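Both options print the raw href values, which may be relative paths, may repeat, and (with h.get('href')) may be None. A minimal sketch of cleaning them up with the standard library follows. The helper name filter_links and the sample hrefs are made up for illustration and are not taken from the real page.

```python
import re
from urllib.parse import urljoin


def filter_links(hrefs, base_url, pattern=r"ticket"):
    """Keep hrefs that match `pattern`, resolve them against `base_url`,
    and drop duplicates while preserving order. (Hypothetical helper.)"""
    matcher = re.compile(pattern)
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # <a> tags without an href give None
            continue
        if not matcher.search(href):
            continue
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Made-up hrefs standing in for [h.get('href') for h in bs.find_all('a')]
sample = [None, "/ticket/1.html", "ticket/2.html", "/about.html", "/ticket/1.html"]
links = filter_links(sample, "http://www.chncpa.org/was5/web/search")
for link in links:
    print(link)
```

With the real page, the list passed in would be the href values collected in Option 1.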
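The single most important thing is the URL: use Chrome to find the address that actually loads the json data.
http://www.chncpa.org/searchall/?typey=26&typec= is the page address shown in the browser, but the links cannot be found at that address.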
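Once the real request address is known, it helps to take its query string apart and see what is being sent. The sketch below decodes the parameters of the URL used above and rebuilds it with a different page value. That the page parameter controls pagination is only an assumption based on its name, and with_page is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

SEARCH_URL = (
    "http://www.chncpa.org/was5/web/search?channelid=242606&orderby=+kssj"
    "&page=1&sw=&searchword=jssj%3E=%272019-06-12%27&perPj=&pjnumpos="
)


def with_page(url, page):
    """Return `url` with its `page` query parameter replaced.
    Assumes (unverified) that `page` selects the result page."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    params["page"] = [str(page)]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


# Decode the query string to inspect what the request carries
params = parse_qs(urlsplit(SEARCH_URL).query, keep_blank_values=True)
for name, values in params.items():
    print(name, "=", values[0])

next_url = with_page(SEARCH_URL, 2)
print(next_url)
```

Decoding shows that searchword carries a date filter, so changing the date inside it is probably how a different time range would be requested.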