实战案例——中国大学排名定向爬虫

By 进击的码农设计师

2019年4月10日

0

1110

需求：从“中国最好大学网”爬取大学排名信息，输出大学排名信息的屏幕输出（排名、大学名称、总分）

程序的结构设计：

步骤1：从网络上获取大学排名网页内容——getHTMLText()
步骤2：提取网页内容中信息到合适的数据结构——fillUnivList()
步骤3：利用数据结构展示并输出结果——printUnivList()

实现代码：

import requests
from bs4 import BeautifulSoup
import bs4


def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""


def fillUnivList(ulist, html):
    soup = BeautifulSoup(html, "lxml")
    for tr in soup.find('tbody').children:
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printUnivList(ulist, num):
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    print(tplt.format("排名", "学校名称", "总分", chr(12288)))
    for i in range(num):
        u = ulist[i]
        print(tplt.format(u[0], u[1], u[2], chr(12288)))


def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2016.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 20)  # 20 univs


main()

Previous article学习与搜索

Next article实战案例——淘宝商品比价定向爬虫

欢迎留下您的宝贵建议 Cancel reply

Please enter your comment!

Please enter your name here

You have entered an incorrect email address!

Please enter your email address here

实战案例——中国大学排名定向爬虫

【Python计算生态】Dooit——待办事项管理...

【Python内置函数】hex()函数

【Python计算生态】Black——代码格式化工...

欢迎留下您的宝贵建议 Cancel reply

Most Popular

【Python计算生态】Dooit——待办事项管理...

【Python内置函数】hex()函数

【Python计算生态】Black——代码格式化工...

【Python内置函数】help()函数

Recent Comments

EDITOR PICKS

RSS

3D Map Generator Terrain

1.ENVI软件操作基础——窗口介绍及打开、浏览数...

POPULAR POSTS

【ArcGIS工具箱】13.表面分析——山体阴影

【Python数据分析】26.文本格式数据的读写

2.空间句法——视线分析

POPULAR CATEGORY