First, a legendary gun to anchor the top of the post.

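Background
Recently my boss fell in love with "chicken dinner" (the mobile game 全軍出擊) and keeps pulling us into squad games, so we give up our lunch break to run around the desert with him. Last week, while looking at match records in the WeChat game channel, it occurred to me that this page might be a way to collect a large amount of battle data and then look for patterns in it.
To show off a little: when we play as a squad our win rate is very high, with 51 wins in the last 100 matches.
A quick assessment suggested the idea was feasible, so let's get started.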
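Step 1 Analyzing the data interfaces
The first step is of course to collect the match records. To do that we need to understand what happens behind the page and see how it fetches battle data.
Capturing packets with Charles
Packet capture setup
On a Mac I recommend Charles for capturing phone traffic at the protocol level. The principle is to start a proxy server on the Mac and set the phone's network proxy to the Mac, so that all traffic from the phone passes through our proxy server. The rough flow is as follows:
Handling HTTPS-encrypted traffic
In practice, all WeChat traffic goes over HTTPS, so everything we capture is encrypted and tells us nothing. After some research, it turns out that HTTPS traffic can be analyzed by installing the Charles root certificate on both the phone and the computer. For the specific steps, see:
- HTTPS packet capture with Charles on Mac and on iPhone
- Fixing Charles failing to capture HTTPS requests on iOS 11
After installing the certificate, our traffic looks roughly like this:
With the configuration above we can now read HTTPS request and response data, as shown in the figure below.
- On Windows, Fiddler provides the same functionality
- This is in fact a very typical man-in-the-middle scenario
Data interfaces
Next we use the captured data to find the interfaces we need. After analysis, three interfaces are involved:
- Get user info
- Get a user's battle list
- Get the details of a specific battle for a user
Let's look at them one by one.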
1. Get user info interface
- request
| API | /cgi-bin/gamewap/getpubgmdatacenterindex |
|---|---|
| Method | GET |
| Parameters | openid, pass_ticket |
| Cookie | key pass_ticket, uin, pgv_pvid, sd_cookie_crttime, sd_userid |
- response
- {
- "user_info": {
- "openid": "oODfo0pjBQkcNuR4XLTQ321xFVws",
- "head_img_url": "http://wx.qlogo.cn/mmhead/Q3auHgzwzM5hSWxxxxxUQPwW9ibxxxx9DlxLTsKWk97oWpDI0rg/96",
- "nick_name": "望",
- "role_name": "xxxx",
- "zone_area_id": 0,
- "plat_id": 1
- },
- "battle_info": {
- "total_1": 75,
- "total_10": 336,
- "total_game": 745,
- "total_kill": 1669
- },
- "battle_list": [{
- "map_id": 1,
- "room_id": "6575389198189071197",
- "team_id": 57,
- "dt_event_time": 1530953799,
- "rank_in_ds": 3,
- "times_kill": 1,
- "label": "前五",
- "team_type": 1,
- "award_gold": 677,
- "mode": 0
- }],
- "appitem": {
- "AppID": "wx13051697527efc45",
- "IconURL": "https://mmocgame.qpic.cn/wechatgame/mEMdfrX5RU0dZFfNEdCsMJpfsof1HE0TP3cfZiboX0ZPxqh5aZnHjxPFXUGgsXmibe/0",
- "Name": "絕地求生 全軍出擊",
- "BriefName": "絕地求生 全軍出擊",
- "Desc": "官方正版絕地求生手游",
- "Brief": "槍戰 | 808.2M",
- "WebURL": "https://game.weixin.qq.com/cgi-bin/h5/static/detail_v2/index.html?wechat_pkgid=detail_v2&appid=wx13051697527efc45&show_bubble=0",
- "DownloadInfo": {
- "DownloadURL": "https://itunes.apple.com/cn/app/id1304987143",
- "DownloadFlag": 5
- },
- "Status": 0,
- "AppInfoFlag": 45,
- "Label": [],
- "AppStorePopUpDialogConfig": {
- "Duration": 1500,
- "Interval": 172800,
- "ServerTimestamp": 1531066098
- },
- "HasEnabledChatGroup": false,
- "AppType": 0,
- "game_tag_list": ["絕地求生", "正版還原", "好友開黑", "百人對戰", "超大地圖"],
- "recommend_reason": "正版絕地求生,荒野射擊",
- "size_desc": "808.2M"
- },
- "is_guest": true,
- "is_blocked": false,
- "errcode": 0,
- "errmsg": "ok"
- }
- Analysis
openid is the user's unique identifier.
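As a minimal sketch of how this interface could be called and read, the snippet below builds the GET URL from the two documented parameters and picks a few fields out of a decoded response. It is written for Python 3, whereas the article's own code is Python 2. The helper names `build_user_info_url` and `summarize_user` are hypothetical, and the kills-per-game figure is just an illustrative derived value, not something the API returns.

```python
from urllib.parse import urlencode

# Host and path as they appear in the article's own request code.
BASE_URL = "https://game.weixin.qq.com/cgi-bin/gamewap/getpubgmdatacenterindex"


def build_user_info_url(openid, pass_ticket):
    """Build the GET URL for the user-info interface (hypothetical helper)."""
    query = urlencode({"openid": openid, "pass_ticket": pass_ticket})
    return "%s?%s" % (BASE_URL, query)


def summarize_user(payload):
    """Pick a few fields out of a decoded user-info response."""
    if payload.get("errcode") != 0:
        raise ValueError("API error: %s" % payload.get("errmsg"))
    info = payload["user_info"]
    battles = payload["battle_info"]
    games = battles["total_game"]
    return {
        "openid": info["openid"],
        "plat_id": info["plat_id"],
        "total_game": games,
        # Derived for illustration only; guard against users with no games.
        "kills_per_game": battles["total_kill"] / games if games else 0.0,
    }
```

The cookies and headers captured with Charles would still have to be supplied when the URL is actually requested.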
2. Get user battle list interface
- request
| API | /cgi-bin/gamewap/getpubgmbattlelist |
|---|---|
| Method | GET |
| Parameters | openid, pass_ticket, plat_id, after_time, limit |
| Cookie | key pass_ticket, uin, pgv_pvid, sd_cookie_crttime, sd_userid |
- response
- {
- "errcode": 0,
- "errmsg": "ok",
- "next_after_time": 1528120556,
- "battle_list": [{
- "map_id": 1,
- "room_id": "6575389198111172597",
- "team_id": 57,
- "dt_event_time": 1530953799,
- "rank_in_ds": 3,
- "times_kill": 1,
- "label": "前五",
- "team_type": 1,
- "award_gold": 677,
- "mode": 0
- }, {
- "map_id": 1,
- "room_id": "6575336498940384115",
- "team_id": 11,
- "dt_event_time": 1530941404,
- "rank_in_ds": 5,
- "times_kill": 2,
- "label": "前五",
- "team_type": 1,
- "award_gold": 632,
- "mode": 0
- }],
- "has_next": true
- }
- Analysis
This interface paginates with after_time. When iterating, use has_next and next_after_time from the response to decide whether there is another page of data.
The room_id in the list is the unique identifier of each battle.
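The pagination described above can be sketched as a generator that keeps requesting pages until has_next is false. This is only an illustration in Python 3: `fetch_page` is a hypothetical callable standing in for the real HTTP request, and starting from `after_time=0` follows the entry-point call shown later in the article.

```python
def iter_battles(fetch_page, openid, limit=20):
    """Yield every battle entry for a user, page by page.

    fetch_page(openid, after_time, limit) is a hypothetical callable that
    returns the decoded JSON response of getpubgmbattlelist.
    """
    after_time = 0
    while True:
        page = fetch_page(openid, after_time, limit)
        if page.get("errcode") != 0:
            raise RuntimeError("API error: %s" % page.get("errmsg"))
        for battle in page.get("battle_list", []):
            yield battle
        if not page.get("has_next"):
            break
        next_after = page["next_after_time"]
        if next_after == after_time:
            break  # guard against a cursor that does not advance
        after_time = next_after
```

Injecting the fetch function keeps the loop independent of cookies and headers, and makes it easy to exercise with canned responses.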
3. Get user battle detail interface
- request
| API | /cgi-bin/gamewap/getpubgmbattledetail |
|---|---|
| Method | GET |
| Parameters | openid, pass_ticket, room_id |
| Cookie | key pass_ticket, uin, pgv_pvid, sd_cookie_crttime, sd_userid |
- response
- {
- "errcode": 0,
- "errmsg": "ok",
- "base_info": {
- "nick_name": "柚茶",
- "head_img_url": "http://wx.qlogo.cn/mmhead/xxxx/96",
- "dt_event_time": 1528648165,
- "team_type": 4,
- "rank": 1,
- "player_count": 100,
- "role_sex": 1,
- "label": "大吉大利",
- "openid": "oODfo0s1w5lWjmxxxxxgQkcCljXQ"
- },
- "battle_info": {
- "award_gold": 622,
- "times_kill": 6,
- "times_head_shot": 0,
- "damage": 537,
- "times_assist": 3,
- "survival_duration": 1629,
- "times_save": 0,
- "times_reborn": 0,
- "vehicle_kill": 1,
- "forward_distance": 10140,
- "driving_distance": 5934,
- "dead_poison_circle_no": 6,
- "top_kill_distance": 223,
- "top_kill_distance_weapon_use": 2924130819,
- "be_kill_user": {
- "nick_name": "小旭",
- "head_img_url": "http://wx.qlogo.cn/mmhead/ibLButGMnqJNFsUtStNEV8tzlH1QpwPiaF9kxxxxx66G3ibjic6Ng2Rcg/96",
- "weapon_use": 20101000001,
- "openid": "oODfo0qrPLExxxxc0QKjFPnPxyI"
- },
- "label": "大吉大利"
- },
- "team_info": {
- "user_list": [{
- "nick_name": "ooo",
- "times_kill": 6,
- "assist_count": 3,
- "survival_duration": 1638,
- "award_gold": 632,
- "head_img_url": "http://wx.qlogo.cn/mmhead/Q3auHgzwzM4k4RXdyxavNxxxxUjcX6Tl47MNNV1dZDliazRKRg",
- "openid": "oODfo0xxxxf1bRAXE-q-lEezK0k"
- }, {
- "nick_name": "我吃炒肉",
- "times_kill": 2,
- "assist_count": 2,
- "survival_duration": 1502,
- "award_gold": 583,
- "head_img_url": "http://wx.qlogo.cn/mmhead/sTJptKvBQLKd5SAAjOF0VrwiapUxxxxFffxoDUcrVjYbDf9pNENQ",
- "openid": "oODfo0gIyDxxxxZpUrSrpapZSDT0"
- }]
- },
- "is_guest": true,
- "is_blocked": false
- }
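- Analysis
- This interface returns the details of a battle, including kills, headshots, rescues, distance traveled and so on, which is enough for our analysis.
- It also returns who killed the player and the openids of the teammates. Using this, we can fan out to unlimited depth and crawl data for more users.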
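The fan-out idea can be sketched as a small function that collects the openids found in one battle-detail response. It is an illustration in Python 3 using only the field names visible in the sample response above; `related_openids` is a hypothetical name, not part of the author's code.

```python
def related_openids(detail):
    """Collect openids of the killer and teammates from a battle-detail response."""
    found = set()
    # be_kill_user may be missing or empty, so fall back to an empty dict.
    killer = detail.get("battle_info", {}).get("be_kill_user") or {}
    if killer.get("openid"):
        found.add(killer["openid"])
    for member in detail.get("team_info", {}).get("user_list", []):
        if member.get("openid"):
            found.add(member["openid"])
    # Exclude the player whose record this is.
    found.discard(detail.get("base_info", {}).get("openid"))
    return found
```

Each openid returned here could then be queued as a new battle-list task.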
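As for pass_ticket and the other values in the cookie, they are certainly used for authentication. They did not change across the requests above, so we do not need to work out how they are computed; it is enough to capture them once and fill them into the code.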
Step 2 Crawling the data
The interfaces are settled; next we need to collect a sufficient amount of data.
Using requests to call the interfaces
- # Request the user-info interface and save the decoded JSON to disk
- url = 'https://game.weixin.qq.com/cgi-bin/gamewap/getpubgmdatacenterindex?openid=%s&plat_id=0&uin=&key=&pass_ticket=%s' % (openid, settings.pass_ticket)
- r = requests.get(url=url, cookies=settings.def_cookies, headers=settings.def_headers, timeout=(5.0, 5.0))
- tmp = r.json()
- wfile = os.path.join(settings.Res_UserInfo_Dir, '%s.txt' % (rediskeys.user(openid)))
- with codecs.open(wfile, 'w', 'utf-8') as wf:
-     wf.write(simplejson.dumps(tmp, indent=2, sort_keys=True, ensure_ascii=False))
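Following this pattern, the other two interfaces can be written quickly.
Using redis to mark what has already been crawled
With the interfaces above, we may enter via user A and find user B's openid, then enter via user B and find user A's openid again. To avoid collecting the same data twice, we need to record what has already been collected. Core code snippet: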
- # rediskeys.user_battle_list: build the redis key for an openid
- def user_battle_list(openid):
-     return 'ubl_%s' % (openid)
- # Before fetching the battle list, check whether this user's data has already been fetched
- if settings.DataRedis.get(rediskeys.user_battle_list(openid)):
-     return True
- # After fetching the battle list, record the user in redis
- settings.DataRedis.set(rediskeys.user_battle_list(openid), 1)
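With several workers running concurrently, the separate get-then-set above leaves a small window in which two workers can both see the key as missing. One possible variant, sketched below in Python 3, is to claim the key in a single step with the NX option of redis-py's `set`, which only writes when the key does not exist. Note that this is not the author's approach: it marks the user before fetching rather than after, so a failed fetch would need to delete the key again. `InMemoryStore` and `claim_user` are hypothetical names; the stand-in class exists only so the sketch runs without a redis server.

```python
class InMemoryStore:
    """Tiny stand-in mimicking the part of redis-py's `set` used here."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, nx=False):
        if nx and key in self._data:
            return None  # redis-py returns None when NX prevents the write
        self._data[key] = value
        return True


def user_battle_list_key(openid):
    """Same key format as rediskeys.user_battle_list in the article."""
    return "ubl_%s" % openid


def claim_user(store, openid):
    """Return True only for the first caller that claims this openid."""
    return bool(store.set(user_battle_list_key(openid), 1, nx=True))
```

A real redis client object could be passed as `store` in place of the stand-in, since the call uses the same keyword argument.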
Using celery to manage the queues
celery is a very handy distributed task queue. This time I only planned to run it on my own computer, so I did not use its distributed features. We create three tasks and three queues:
- task_queues = (
- Queue('queue_get_battle_info', exchange=Exchange('priority', type='direct'), routing_key='gbi'),
- Queue('queue_get_battle_list', exchange=Exchange('priority', type='direct'), routing_key='gbl'),
- Queue('queue_get_user_info', exchange=Exchange('priority', type='direct'), routing_key='gui'),
- )
- task_routes = ([
- ('get_battle_info', {'queue': 'queue_get_battle_info'}),
- ('get_battle_list', {'queue': 'queue_get_battle_list'}),
- ('get_user_info', {'queue': 'queue_get_user_info'}),
- ],)
Then, inside each task, we combine the API requests with the Redis data to implement the complete task logic, for example:
- @app.task(name='get_battle_list')
- def get_battle_list(openid, plat_id=None, after_time=0, update_time=None):
-     # Skip users whose battle list has already been fetched
-     if settings.DataRedis.get(rediskeys.user_battle_list(openid)):
-         return True
-     if not plat_id:
-         try:
-             # Fetch the user info to obtain plat_id
-             us = handles.get_user_info_handles(openid)
-             plat_id = us['plat_id']
-         except Exception as e:
-             print('can not get user plat_id %s\n%s' % (openid, traceback.format_exc()))
-             return False
-     # Fetch the battle list, passing the caller's paging arguments through
-     battle_list = handles.get_battle_list_handle(openid, plat_id, after_time=after_time, update_time=update_time)
-     # Create an asynchronous detail task for each battle not yet fetched
-     for room_id in battle_list:
-         if not settings.DataRedis.get(rediskeys.user_battle(openid, room_id)):
-             get_battle_info.delay(openid, plat_id, room_id)
-     return True
Start crawling
Because this is a fan-out crawler, the code needs one user as an entry point, so we manually create a collection task for a single user:
- from tasks.all import get_battle_list
- my_openid = 'oODfo0oIErZI2xxx9xPlVyQbRPgY'
- my_platid = '0'
- get_battle_list.delay(my_openid, my_platid, after_time=0, update_time=None)
With an entry point in place, we start the celery workers to begin crawling:
- # Start the worker that fetches user info
- celery -A tasks.all worker -c 5 --queue=queue_get_user_info --loglevel=info -n get_user_info@%h
- # Start the worker that fetches battle lists
- celery -A tasks.all worker -c 5 --queue=queue_get_battle_list --loglevel=info -n get_battle_list@%h
- # Start the worker that fetches battle details
- celery -A tasks.all worker -c 30 --queue=queue_get_battle_info --loglevel=info -n get_battle_info@%h
Now the crawler is up and running. We then use celery-flower to monitor execution.
- celery flower -A tasks.all --broker=redis://:$REDIS_PASS@$REDIS_HOST:$REDIS_PORT/10
Through flower we can see that the throughput is quite good.
During execution, get_battle_list runs too fast, so get_battle_info builds up a large backlog even with 30 concurrent workers, and the faster workers need to be paused from time to time. Once we have collected 200,000 records we can stop.
Step 3 Data analysis
Analysis approach
The data for 200,000 battles has been collected and stored as JSON files on my local disk, so next comes some simple analysis. Python is very strong in data analysis and has many excellent libraries such as pandas and NumPy. Unfortunately I have not learned them, and for someone who scored only 70-odd on the college entrance exam math paper, data analysis is hard, so I wrote a very simple program to do some shallow analysis. Experts who want to do deeper analysis without running their own crawler can contact me for a packaged copy of the data.
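As a point of comparison with the hand-written dictionary counters in the program that follows, the same kind of shallow tally can be sketched with the standard library's `collections.Counter`. This is a Python 3 illustration only: `summarize` is a hypothetical helper, and it assumes the battle-detail responses have already been loaded from disk into a list of dicts.

```python
from collections import Counter


def summarize(details):
    """Tally result labels and average kills over decoded battle-detail dicts."""
    labels = Counter()
    kills = 0
    for detail in details:
        info = detail["battle_info"]
        labels[info["label"]] += 1
        kills += info["times_kill"]
    count = len(details)
    return {
        "battles": count,
        "labels": dict(labels),
        # Avoid dividing by zero when no battles were loaded.
        "avg_kills": kills / count if count else 0.0,
    }
```

`Counter` removes the repeated "if key in dict ... else" branches, at the cost of not separating results by team type as the author's classes do.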
- # coding=utf-8
- import os
- import json
- import datetime
- import math
- from conf import settings
- class UserTeamTypeData:
-     def __init__(self, team_type, player_count):
-         self.team_type = team_type
-         self.player_count = player_count
-         self.label = {}
-         self.dead_poison_circle_no = {}
-         self.count = 0
-         self.damage = 0
-         self.survival_duration = 0  # survival time
-         self.driving_distance = 0
-         self.forward_distance = 0
-         self.times_assist = 0  # assists
-         self.times_head_shot = 0
-         self.times_kill = 0
-         self.times_reborn = 0  # times revived by others
-         self.times_save = 0  # times rescuing others
-         self.top_kill_distance = []
-         self.top_kill_distance_weapon_use = {}
-         self.vehicle_kill = 0  # kills with a vehicle
-         self.award_gold = 0
-         self.times_reborn_by_role_sex = {0: 0, 1: 0}  # 0 male, 1 female
-         self.times_save_by_role_sex = {0: 0, 1: 0}  # 0 male, 1 female
-     def update_dead_poison_circle_no(self, dead_poison_circle_no):
-         if dead_poison_circle_no in self.dead_poison_circle_no:
-             self.dead_poison_circle_no[dead_poison_circle_no] += 1
-         else:
-             self.dead_poison_circle_no[dead_poison_circle_no] = 1
-     def update_times_reborn_and_save_by_role_sex(self, role, times_reborn, times_save):
-         if role not in self.times_reborn_by_role_sex:
-             return
-         self.times_reborn_by_role_sex[role] += times_reborn
-         self.times_save_by_role_sex[role] += times_save
-     def update_top_kill_distance_weapon_use(self, weaponid):
-         if weaponid not in self.top_kill_distance_weapon_use:
-             self.top_kill_distance_weapon_use[weaponid] = 1
-         else:
-             self.top_kill_distance_weapon_use[weaponid] += 1
- class UserBattleData:
-     def __init__(self, openid):
-         self.openid = openid
-         self.team_type_res = {}
-         self.label = {}
-         self.hour_counter = {}
-         self.weekday_counter = {}
-         self.usetime = 0
-         self.day_record = set()
-         self.battle_counter = 0
-     def get_avg_use_time_per_day(self):
-         # print "get_avg_use_time_per_day:", self.openid, self.usetime, len(self.day_record), self.usetime / len(self.day_record)
-         return self.usetime / len(self.day_record)
-     def update_label(self, label):
-         if label in self.label:
-             self.label[label] += 1
-         else:
-             self.label[label] = 1
-     def get_team_type_data(self, team_type, player_count):
-         player_count = int(math.ceil(float(player_count) / 10))
-         team_type_key = '%d_%d' % (team_type, player_count)
-         if team_type_key not in self.team_type_res:
-             userteamtypedata = UserTeamTypeData(team_type, player_count)
-             self.team_type_res[team_type_key] = userteamtypedata
-         else:
-             userteamtypedata = self.team_type_res[team_type_key]
-         return userteamtypedata
-     def update_user_time_property(self, dt_event_time):
-         dt_event_time = datetime.datetime.fromtimestamp(dt_event_time)
-         hour = dt_event_time.hour
-         if hour in self.hour_counter:
-             self.hour_counter[hour] += 1
-         else:
-             self.hour_counter[hour] = 1
-         weekday = dt_event_time.weekday()
-         if weekday in self.weekday_counter:
-             self.weekday_counter[weekday] += 1
-         else:
-             self.weekday_counter[weekday] = 1
-         self.day_record.add(dt_event_time.date())
-     def update_battle_info_by_room(self, roomid):
-         # print ' load ', self.openid, roomid
-         file = os.path.join(settings.Res_UserBattleInfo_Dir, self.openid, '%s.txt' % roomid)
-         with open(file, 'r') as rf:
-             battledata = json.load(rf)
-         self.battle_counter += 1
-         base_info = battledata['base_info']
-         self.update_user_time_property(base_info['dt_event_time'])
-         battle_info = battledata['battle_info']
-         userteamtypedata = self.get_team_type_data(base_info['team_type'], base_info['player_count'])
-         userteamtypedata.count += 1
-         userteamtypedata.award_gold += battle_info['award_gold']
-         userteamtypedata.damage += battle_info['damage']
-         userteamtypedata.update_dead_poison_circle_no(battle_info['dead_poison_circle_no'])
-         userteamtypedata.driving_distance += battle_info['driving_distance']
-         userteamtypedata.forward_distance += battle_info['forward_distance']
-         self.update_label(battle_info['label'])
-         userteamtypedata.survival_duration += battle_info['survival_duration']
-         self.usetime += battle_info['survival_duration']/60
-         userteamtypedata.times_assist += battle_info['times_assist']
-         userteamtypedata.times_head_shot += battle_info['times_head_shot']
Page title: Analyzing 200,000 Chicken-Dinner Matches with Python
URL: http://m.fisionsoft.com.cn/article/coosdjo.html


