实战案例——淘宝商品比价定向爬虫

2019年4月10日

1229

1.目标：

获取淘宝搜索页面的信息，提取其中的商品名称和价格。

# 起始页
https://s.taobao.com/search?q=书包`&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.2017.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170306
    
# 第二页
https://s.taobao.com/search?q=书包&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.2017.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170306&bcoffset=3&ntoffset=3&p4ppushleft=1%2C48&s=44
    
# 第三页
https://s.taobao.com/search?q=书包&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.2017.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170306&bcoffset=0&ntoffset=6&p4ppushleft=1%2C48&s=88

2.程序的结构设计：

步骤1：提交商品搜索请求，循环获取页面。
步骤2：对于每个页面，提取商品名称和价格信息。
步骤3：将信息输出到屏幕上。

3.代码实现：

import re
import requests


def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""


def parsePage(ilt, html):
    try:
        plt = re.findall(r'\"view_price\"\:\"[\d\.]*\"', html)
        tlt = re.findall(r'\"raw_title\"\:\".*?\"', html)
        for i in range(len(plt)):
            price = eval(plt[i].split(':')[1])
            title = eval(tlt[i].split(':')[1])
            ilt.append([price, title])
    except:
        print("")


def printGoodsList(ilt):
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format("序号", "价格", "商品名称"))
    count = 0
    for g in ilt:
        count = count + 1
        print(tplt.format(count, g[0], g[1]))


def main():
    goods = '书包'
    depth = 3
    start_url = 'https://s.taobao.com/search?q=' + goods
    infoList = []
    for i in range(depth):
        try:
            url = start_url + '&s=' + str(44 * i)
            html = getHTMLText(url)
            parsePage(infoList, html)
        except:
            continue
    printGoodsList(infoList)


main()

不过代码运行时，系统爬取的是淘宝登录页面，需要模拟登录才可以爬取，这一部分内容在后面会提到如何处理。

Previous article实战案例——中国大学排名定向爬虫

Next article实战案例——股票数据定向爬虫

欢迎留下您的宝贵建议 Cancel reply

Please enter your comment!

Please enter your name here

You have entered an incorrect email address!

Please enter your email address here

实战案例——淘宝商品比价定向爬虫

1.目标：

2.程序的结构设计：

3.代码实现：

【Python计算生态】Dooit——待办事项管理...

【Python内置函数】hex()函数

【Python计算生态】Black——代码格式化工...

欢迎留下您的宝贵建议 Cancel reply

Most Popular

【Python计算生态】Dooit——待办事项管理...

【Python内置函数】hex()函数

【Python计算生态】Black——代码格式化工...

【Python内置函数】help()函数

Recent Comments

EDITOR PICKS

RSS

3D Map Generator Terrain

1.ENVI软件操作基础——窗口介绍及打开、浏览数...

POPULAR POSTS

【ArcGIS工具箱】211.子类型——设置默认子...

【数据可视化（ECharts篇）】35.地理数据可...

【ArcGIS工具箱】84.多元分析——波段集统计...

POPULAR CATEGORY