Douban Movie Top 250 Crawler Series (3): Data Visualization with Python + ECharts

Source code download on GitHub

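  • In the previous two posts we crawled the movie data and stored it in a database;
  • Next we analyze the existing data to extract some useful information;
  • Here I only do a simple visual analysis, using the ECharts library to generate various charts;
Python connects to the MySQL database, queries the movie information, generates JSON data, and saves it to a local file for the front-end JS to read and render as charts:
  • Query the number of movies of each type, return the result as JSON data, and then write it to a file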
typeNameList = ['剧情','喜剧','动作','爱情','科幻','悬疑','惊悚','恐怖','犯罪',
            '同性','音乐','歌舞','传记','历史','战争','西部','奇幻','冒险',
            '灾难','武侠','情色']
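import json


def getMovieTypeJson():
    # getJsonData(sql) is a helper that is not shown in this post; it runs the query and returns the result.
    typeNumList = []
    for typeName in typeNameList:  # avoid shadowing the built-in type()
        sql = r"select count(type) from movie where type like '%{}%'".format(typeName)
        dataM = getJsonData(sql)
        # Strip the surrounding tuple punctuation from the stringified result to get the integer
        typeNumList.append(int(str(dataM).strip(r'(').strip(r',)')))

    return {'typeNameList' : typeNameList, 'typeNumList' : typeNumList}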


def writeTypeJsonFile(path):
    with open(path, 'w') as f:
        json.dump(getMovieTypeJson(), f)


# Run the write
writeTypeJsonFile(r'C:\Users\Administrator\Desktop\books\movieType.txt')
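The query above interpolates the type name straight into the SQL string and then parses the stringified result. As a minimal sketch of an alternative, the snippet below binds the LIKE pattern as a parameter and reads the count directly from the returned row. It uses an in-memory sqlite3 database as a stand-in for the MySQL `movie` table, and the sample rows and the `count_by_type` helper are hypothetical. sqlite3 uses `?` placeholders, while MySQL drivers such as PyMySQL use `%s`.

```python
import json
import sqlite3

# Hypothetical stand-in for the MySQL `movie` table; only the `type` column matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?)",
    [("Movie A", "剧情 犯罪"), ("Movie B", "剧情 爱情"), ("Movie C", "喜剧")],
)


def count_by_type(conn, type_names):
    """Return one count per type name, using a bound LIKE pattern."""
    counts = []
    for name in type_names:
        row = conn.execute(
            "SELECT COUNT(*) FROM movie WHERE type LIKE ?", (f"%{name}%",)
        ).fetchone()
        counts.append(row[0])  # fetchone() returns a 1-tuple holding the count
    return counts


names = ["剧情", "喜剧", "科幻"]
result = {"typeNameList": names, "typeNumList": count_by_type(conn, names)}
print(json.dumps(result, ensure_ascii=False))
```

The resulting dictionary has the same two keys that the front-end page reads, so it could be written out with json.dump in the same way.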
  • Corresponding front-end page:
<!DOCTYPE html>
<html style="height: 100%">
   <head>
       <meta charset="utf-8">
   </head>
   <body style="height: 100%; margin: 0">
       <div id="container" style="height: 100%"></div>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/echarts.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts-gl/echarts-gl.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts-stat/ecStat.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/extension/dataTool.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/map/js/china.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/map/js/world.js"></script>
       <script type="text/javascript" src="http://api.map.baidu.com/api?v=2.0&ak=ZUONbpqGBsYGXNIYHicvbAbM"></script> 
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/extension/bmap.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/simplex.js"></script>
       <script type="text/javascript" src="C:\Users\Administrator\Desktop\books\jquery.min.js"></script>
       <script type="text/javascript">
        var dom = document.getElementById("container");
        var myChart = echarts.init(dom);
        var app = {};
        var option = null;
        var typeNameList = [];
        var typeNumList = [];
                
        $.ajaxSettings.async = false;
        $.getJSON ("../Desktop/books/movieType.txt", function (data)  {
            typeNameList = data.typeNameList;
            typeNumList = data.typeNumList;

            //alert(typeNumList)

        })


        app.title = '豆瓣top250电影类型统计 - 条形图';
        
            option = {
                title: {
                    text: '豆瓣top250电影类型统计',
                    subtext: '数据来自豆瓣'
                },
                tooltip: {
                    trigger: 'axis',
                    axisPointer: {
                        type: 'shadow'
                    }
                },
                legend: {
                    data: ['电影类型数量']
                },
                grid: {
                    left: '3%',
                    right: '4%',
                    bottom: '3%',
                    containLabel: true
                },
                xAxis: {
                    type: 'value',
                    boundaryGap: [0, 0.01]
                },
                yAxis: {
                    type: 'category',
                    data: typeNameList
                },
                series: [
                    {
                        name: '电影类型数量',
                        type: 'bar',
                        data: typeNumList
                    }
                ]
            };
        
        if (option && typeof option === "object") {
            myChart.setOption(option, true);
        }
       </script>
   </body>
</html>
  • Resulting chart:


    type.jpg
  • Build the tree-structured JSON by looping in the order type --> age --> country --> score --> movieLength --> title

def getMovieTreeJson():
    jsonFinal = '{"types": ['
    for type in typeNameList:
        sql = r"select distinct age from movie where type like '%{}%' order by age desc".format(type)
        ageList = getJsonData(sql)
        jsonFinal += '{{"name":"{}", "children":['.format(type)
        for age in getPureList(ageList):
            sql = r"select distinct country from movie where age = '{}' and type like '%{}%'".format(age, type)
            countryList = getJsonData(sql)
            countryArr = []
            jsonFinal += '{{"name":"{}", "children":['.format(age)
            for country in getPureList(countryList):
                if country.split(" ")[0] not in countryArr:
                    countryArr.append(country.split(" ")[0])
                else:
                    continue
                sql = r"select distinct score from movie where age = '{}' and type like '%{}%' and country like '{}%'" \
                      r"order by score desc".format(age, type, country.split(" ")[0])
                scoreList = getJsonData(sql)
                jsonFinal += '{{"name":"{}", "children":['.format(country.split(" ")[0])
                for score in getPureList(scoreList):
                    sql = r"select distinct movieLength from movie where age = '{}' and type like '%{}%' and country like '{}%'" \
                          r"and score = '{}' order by score desc".format(age, type, country.split(" ")[0], score)
                    movieLengthList = getJsonData(sql)

                    jsonFinal += '{{"name":"分数{}", "children":['.format(score)
                    for movieLength in getPureList(movieLengthList):
                        jsonFinal += '{{"name":"时长{}", "children":['.format(movieLength)
                        sql = r"select title, note from movie where age = '{}' and type like '%{}%' and country like '{}%'" \
                              r"and score = '{}' and movieLength = '{}' order by score desc".format(
                              age, type, country.split(" ")[0], score, movieLength)
                        titleNoteList = getJsonData(sql)

                        # print(age, type, country.split(" ")[0], score, movieLength, str(titleNoteList[0]).strip(","))
                        for title, note in titleNoteList:
                            jsonFinal += '{{"name":"{}", "value":"{}"}},'.format(title, note)
                            # print(jsonFinal[:-1])
                        jsonFinal = jsonFinal[:-1] + ']},'
                    jsonFinal = jsonFinal[:-1] + ']},'
                jsonFinal = jsonFinal[:-1] + ']},'
            jsonFinal = jsonFinal[:-1] + ']},'
        jsonFinal = jsonFinal[:-1] + ']},'
    jsonFinal = jsonFinal[:-1] + ']},'

    return jsonFinal[:-1]


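def writeTreeJsonFile(path):
    # getMovieTreeJson() already returns a JSON string, so json.dump() encodes it a second time
    # (wrapped in quotes, inner quotes escaped); the HTML page strips and unescapes this before parsing.
    with open(path, 'w') as f:
        json.dump(getMovieTreeJson(), f)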

writeTreeJsonFile(r'C:\Users\Administrator\Desktop\books\movieTreeJson.txt')
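Concatenating JSON by hand breaks as soon as a title or note contains a double quote, or when one of the inner lists is empty. As a rough sketch of another way to get the same type --> age --> country --> score --> movieLength --> title nesting, the snippet below builds nested dictionaries and serializes them once. The sample rows and the `build_tree` helper are hypothetical, the rows are assumed to have been fetched already, and nodes keep the input order instead of the `order by` used in the queries above.

```python
import json

# Hypothetical sample rows: (type, age, country, score, movieLength, title, note)
rows = [
    ("剧情", 1994, "美国", 9.6, 142, "Movie A", "note A"),
    ("剧情", 1994, "美国", 9.6, 142, "Movie B", 'note with "quotes"'),
    ("喜剧", 2001, "日本", 9.0, 125, "Movie C", "note C"),
]


def build_tree(rows):
    """Nest rows as type -> age -> country -> score -> movieLength -> title."""
    root = {}
    for type_name, age, country, score, length, title, note in rows:
        node = root
        # Walk (and create on demand) the five branch levels.
        for key in (type_name, str(age), country, f"分数{score}", f"时长{length}"):
            node = node.setdefault(key, {})
        node[title] = note  # the innermost mapping holds title -> note

    def to_children(mapping, depth):
        # Depths 0-4 are branch levels; depth 5 holds the title/note leaves.
        if depth == 5:
            return [{"name": t, "value": n} for t, n in mapping.items()]
        return [
            {"name": k, "children": to_children(v, depth + 1)}
            for k, v in mapping.items()
        ]

    return {"types": to_children(root, 0)}


tree = build_tree(rows)
text = json.dumps(tree, ensure_ascii=False)  # quotes inside notes are escaped for us
print(text)
```

A file written this way contains ordinary JSON, so a page reading it could pass the text straight to JSON.parse without the substring and replace step.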
  • Corresponding HTML page:
<!DOCTYPE html>
<html style="height: 100%">
   <head>
       <meta charset="utf-8">
   </head>
   <body style="height: 100%; margin: 0">
       <div id="container-0" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-1" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-2" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-3" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-4" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-5" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-6" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-7" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-8" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-9" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-10" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-11" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-12" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-13" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-14" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-15" style=" height: 200%; margin-bottom:100px;"></div>
       <div id="container-16" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-17" style=" height: 300%; margin-bottom:100px;"></div>
       <div id="container-18" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-19" style=" height: 300%;margin-bottom:100px;"></div>
       <div id="container-20" style=" height: 300%;margin-bottom:100px;"></div>

       <!-- <div id="container1" style="height: 100%"></div> -->
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/echarts.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts-gl/echarts-gl.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts-stat/ecStat.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/extension/dataTool.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/map/js/china.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/map/js/world.js"></script>
       <script type="text/javascript" src="http://api.map.baidu.com/api?v=2.0&ak=ZUONbpqGBsYGXNIYHicvbAbM"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/extension/bmap.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/simplex.js"></script>
       <script type="text/javascript" src="C:\Users\Administrator\Desktop\books\jquery.min.js"></script>
       <script type="text/javascript">

var myChart;
var dataEnd = {};
//   http://echarts.baidu.com/examples/data/asset/data/flare.json ../Desktop/books/movieTreeJson.txt





$.ajaxSettings.async = false;
$.get('../Desktop/books/movieTreeJson.txt', function (data) {
    
    var subStr = data.substring(1,data.length-1).replace(/\\"/g, "\"");
    
    //alert(subStr);
    dataEnd = JSON.parse(subStr);
    //alert(dataEnd)
});


for(var i=0; i<21; i++) {
    initEcharts("container-"+i, i);
}


function initEcharts(name, index) {
    
    myChart = echarts.init(document.getElementById(name));
    option = null;
    myChart.showLoading();


    myChart.hideLoading();
    

    myChart.setOption(option = {
        tooltip: {
            trigger: 'item',
            triggerOn: 'mousemove'
        },
        series: [
            {
                type: 'tree',

                data: [dataEnd.types[index]],

                top: '18%',
                bottom: '14%',

                layout: 'radial',

                symbol: 'emptyCircle',

                symbolSize: 7,

                initialTreeDepth: 3,

                animationDurationUpdate: 750

            }
        ]
    });
    if (option && typeof option === "object") {
        myChart.setOption(option, true);
    }

}
       </script>
   </body>
</html>
  • The result is one chart for each of the 21 movie types; only one of them is shown here as an example


    tree.png
  • Query the average score by year (age) and type:

def getAgeScoreJson():
    ageScoreMap = {}
    ageScoreMap['ages'] = ['Growth']  # placeholder first entry; the page's loop starts at index 1 and skips it
    ageScoreMap['ageNames'] = []
    sql = r'select DISTINCT age from movie ORDER BY age desc'
    ageList = getPureList(getJsonData(sql))
    # print(ageList)
    for age in ageList:
        avgScoreList = []
        for type in typeNameList:
            sql = r"select avg(score) from movie where age = '{}' and type like '%{}%'".format(age, type)
            avgScore = str(getPureList(getJsonData(sql))).strip("['").strip("']")
            if avgScore == 'None':
                avgScore = 0
            avgScoreList.append(round(float(avgScore)))
        ageScoreMap[str(age)] = avgScoreList
        ageScoreMap['ages'].append(str(age))
        # ageScoreMap['ageNames'].append('result.type' + str(age))
    ageScoreMap['names'] = typeNameList

    return ageScoreMap

def writeAgeScoreJsonFile(path):
    with open(path, 'w') as f:
        json.dump(getAgeScoreJson(), f)

writeAgeScoreJsonFile(r'C:\Users\Administrator\Desktop\books\movieAgeScoreJson.txt')
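The function above issues one query per (age, type) pair and parses each average out of a string. As a sketch of an alternative, the averages can also be computed in a single pass over rows fetched once. The sample rows and the `average_scores` helper below are hypothetical. Unlike the code above, this version keeps one decimal place instead of rounding to an integer and omits the 'Growth' placeholder, so a page consuming it would start its loop at index 0.

```python
import json

# Hypothetical sample rows: (age, type, score)
rows = [
    (1994, "剧情 犯罪", 9.6),
    (1994, "剧情 爱情", 9.4),
    (2001, "喜剧", 9.0),
]


def average_scores(rows, type_names):
    """Average score per (age, type); 0 where a type has no movie in that year."""
    sums = {}  # (age, type name) -> (running total, count)
    for age, types, score in rows:
        for name in type_names:
            if name in types:  # substring match, mirroring LIKE '%name%'
                total, count = sums.get((age, name), (0.0, 0))
                sums[(age, name)] = (total + score, count + 1)

    ages = sorted({age for age, _, _ in rows}, reverse=True)
    result = {"ages": [str(a) for a in ages], "names": list(type_names)}
    for age in ages:
        averages = []
        for name in type_names:
            total, count = sums.get((age, name), (0.0, 0))
            averages.append(round(total / count, 1) if count else 0)
        result[str(age)] = averages
    return result


scores = average_scores(rows, ["剧情", "喜剧", "犯罪"])
print(json.dumps(scores, ensure_ascii=False))
```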
  • Front-end page:
<!DOCTYPE html>
<html style="height: 100%">
   <head>
       <meta charset="utf-8">
   </head>
   <body style="height: 150%; margin: 0">
       <div id="container" style="height: 100%"></div>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/echarts.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts-gl/echarts-gl.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts-stat/ecStat.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/extension/dataTool.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/map/js/china.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/map/js/world.js"></script>
       <script type="text/javascript" src="http://api.map.baidu.com/api?v=2.0&ak=ZUONbpqGBsYGXNIYHicvbAbM"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/echarts/extension/bmap.min.js"></script>
       <script type="text/javascript" src="http://echarts.baidu.com/gallery/vendors/simplex.js"></script>
       <script type="text/javascript" src="C:\Users\Administrator\Desktop\books\jquery.min.js"></script>
       <script type="text/javascript">
var dom = document.getElementById("container");
var myChart = echarts.init(dom);
var app = {};
option = null;
myChart.showLoading();

$.get('../Desktop/books/movieAgeScoreJson.txt', function (result) {
    result = JSON.parse(result);
    //alert(result)

    var series = []
    
    for (var i=1;i<result.ages.length;i++) {
        series.push({
            name: result.ages[i],
            type: 'bar',
            data: result[result.ages[i]]
            
        })
    }

    
    myChart.hideLoading();

    option = {
        tooltip : {
            trigger: 'axis',
            axisPointer: {
                type: 'shadow',
                label: {
                    show: true
                }
            }
        },
        toolbox: {
            show : true,
            feature : {
                mark : {show: true},
                dataView : {show: true, readOnly: false},
                magicType: {show: true, type: ['line', 'bar']},
                restore : {show: true},
                saveAsImage : {show: true}
            }
        },
        calculable : true,
        legend: {
            data: result.ages,
            itemGap: 5
        },
        grid: {
            top: '12%',
            left: '1%',
            right: '10%',
            containLabel: true
        },
        xAxis: [
            {
                type : 'category',
                data : result.names
            }
        ],
        yAxis: [
            {
                type : 'value',
                name : 'average score',
                axisLabel: {
                    formatter: function (a) {
                        //alert(a)
                        return a;
                    }
                }
            }
        ],
        dataZoom: [
            {
                show: true,
                start: 94,
                end: 100
            },
            {
                type: 'inside',
                start: 94,
                end: 100
            },
            {
                show: true,
                yAxisIndex: 0,
                filterMode: 'empty',
                width: 30,
                height: '80%',
                showDataShadow: false,
                left: '93%'
            }
        ],
        series : series
    };

    myChart.setOption(option);

});
if (option && typeof option === "object") {
    myChart.setOption(option, true);
}
       </script>
   </body>
</html>
  • Resulting charts:


    scores.png
subscore.png

tips:

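  • Other chart types, such as word clouds and line charts, could also be generated;
  • This post only analyzes the movie table; the actor, review, and award tables are not analyzed;
  • I will extend this later when I have time;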