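Assignment requirements:
Take the assignment page from last session's web-crawler class, posted in the 解密大数据 (Decoding Big Data) collection on Jianshu (简书), as the page to analyze. Analyze it and submit a write-up of its page structure and of where each element's tags are located.
Link to the previous assignment: http://www.jianshu.com/p/7e2fccb4fad9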
Basic HTML structure
(Figure: diagram of the page's basic structure)
The head section
Page title
<title>爬虫课程作业01-解密大数据社群 - 简书</title>
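A crawler usually reads this element first. The sketch below uses Python's standard-library `html.parser` (a choice made here; the course may use a different parser) to pull the `<title>` text out of the snippet above, then splits off the site name. Splitting on the last `" - "` assumes every article title follows the `<article title> - 简书` pattern seen on this one page; `TitleParser` is an illustrative name.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_title:
            self.title += data


head_html = "<head><title>爬虫课程作业01-解密大数据社群 - 简书</title></head>"
parser = TitleParser()
parser.feed(head_html)
parser.close()

# Assumption: the site name is whatever follows the last " - ".
article_title, site_name = parser.title.rsplit(" - ", 1)
print(article_title)  # 爬虫课程作业01-解密大数据社群
print(site_name)      # 简书
```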
Top navigation bar
Logo (简)
<a class="logo" href="/"></a>
Write-article button (写文章)
<a class="btn write-btn" target="_blank" href="/writer#/">
<i class="iconfont ic-write"></i>写文章
</a>
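Both navbar elements can be located by class name alone. A minimal standard-library sketch, assuming `logo` and `write-btn` stay unique within the navbar (true in this excerpt, not guaranteed on other pages); `HrefByClass` and `PAGE_URL` are names introduced here for illustration. `urljoin` turns the site-relative `href` values into absolute URLs a crawler could request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://www.jianshu.com/p/7e2fccb4fad9"


class HrefByClass(HTMLParser):
    """Records the href of the first <a> carrying each wanted class name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        for cls in (attrs.get("class") or "").split():
            if cls in self.wanted and cls not in self.found:
                self.found[cls] = attrs.get("href")


navbar_html = """
<a class="logo" href="/"></a>
<a class="btn write-btn" target="_blank" href="/writer#/">
<i class="iconfont ic-write"></i>写文章
</a>
"""
parser = HrefByClass(["logo", "write-btn"])
parser.feed(navbar_html)
parser.close()

# Relative hrefs are resolved against the page they were found on.
links = {cls: urljoin(PAGE_URL, href) for cls, href in parser.found.items()}
print(links["logo"])       # http://www.jianshu.com/
print(links["write-btn"])  # http://www.jianshu.com/writer#/
```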
The four navigation buttons: Discover (发现), Follow (关注), Messages (消息), and Search (搜索)
<div class="collapse navbar-collapse" id="menu">
<ul class="nav navbar-nav">
<li class="">
<a href="/">
<span class="menu-text">发现</span><i class="iconfont ic-navigation-discover menu-icon"></i>
</a> </li>
<li class="">
<a href="/subscriptions">
<span class="menu-text">关注</span><i class="iconfont ic-navigation-follow menu-icon"></i>
</a> </li>
<li class="notification v-notification-dropdown-menu ">
<a class="notification-btn" href="/notifications" data-hover="dropdown">
<span class="menu-text">消息</span>
<i class="iconfont ic-navigation-notification menu-icon"></i>
<span class="badge"></span>
</a>
</li>
<li class="search">
<form target="_blank" action="/search" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="✓" />
<input type="text" name="q" id="q" value="" placeholder="搜索" class="search-input" />
<a class="search-btn" href="javascript:void(null)"><i class="iconfont ic-search"></i></a>
</form> </li>
</ul>
</div>
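The menu labels and their targets can be read in one pass. In this standard-library sketch (`NavParser`, `search_url`, and `PAGE_URL` are illustrative names, not part of Jianshu), each `span.menu-text` is paired with the `href` of its enclosing `<a>`, and the search form's `action` and named inputs are recorded. Because the form uses `method="get"`, submitting it should amount to requesting `/search` with those inputs as the query string; `search_url` rebuilds that URL assuming a browser sends every named input, including the hidden `utf8` field. The input is an abridged copy of the excerpt above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

PAGE_URL = "http://www.jianshu.com/p/7e2fccb4fad9"


class NavParser(HTMLParser):
    """Collects (label, href) for each span.menu-text plus the search form's fields."""

    def __init__(self):
        super().__init__()
        self.href = None         # href of the <a> currently open, if any
        self.in_label = False
        self.items = []          # [label, href] pairs, in page order
        self.form_action = None
        self.form_fields = {}    # input name -> default value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a":
            self.href = attrs.get("href")
        elif tag == "span" and "menu-text" in classes:
            self.in_label = True
            self.items.append(["", self.href])
        elif tag == "form":
            self.form_action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            self.form_fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_label = False
        elif tag == "a":
            self.href = None

    def handle_data(self, data):
        if self.in_label:
            self.items[-1][0] += data


nav_html = """
<ul class="nav navbar-nav">
  <li><a href="/"><span class="menu-text">发现</span></a></li>
  <li><a href="/subscriptions"><span class="menu-text">关注</span></a></li>
  <li><a class="notification-btn" href="/notifications"><span class="menu-text">消息</span><span class="badge"></span></a></li>
  <li class="search">
    <form target="_blank" action="/search" accept-charset="UTF-8" method="get"><input name="utf8" type="hidden" value="✓" />
    <input type="text" name="q" id="q" value="" placeholder="搜索" class="search-input" />
    </form>
  </li>
</ul>
"""
parser = NavParser()
parser.feed(nav_html)
parser.close()

menu = [(label, urljoin(PAGE_URL, href)) for label, href in parser.items]


def search_url(keyword):
    """GET URL the search form would submit for `keyword` (hypothetical helper)."""
    fields = dict(parser.form_fields, q=keyword)
    return urljoin(PAGE_URL, parser.form_action) + "?" + urlencode(fields)


print(menu[1])             # ('关注', 'http://www.jianshu.com/subscriptions')
print(search_url("爬虫"))  # http://www.jianshu.com/search?utf8=%E2%9C%93&q=%E7%88%AC%E8%99%AB
```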
Article title
<h1 class="title">爬虫课程作业01-解密大数据社群</h1>
Author information
<div class="author">
<a class="avatar" href="/u/40cc6159e5ad">
</a> <div class="info">
<span class="tag">作者</span>
<span class="name"><a href="/u/40cc6159e5ad">在旅途的车</a></span>
Basic article metadata: last-updated time, word count, views, comments, and likes
<div class="meta">
<span class="publish-time" data-toggle="tooltip" data-placement="bottom" title="" data-original-title="最后编辑于 2017.07.04 00:29">2017.07.04 00:26*</span>
<span class="wordage">字数 387</span>
<span class="views-count">阅读 33</span><span class="comments-count">评论 2</span><span class="likes-count">喜欢 2</span></div>
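The header fields above can be collected in one pass and turned into typed values. A standard-library sketch, assuming the class names in this excerpt (`title`, `name`, `publish-time`, `wordage`, `views-count`, `comments-count`, `likes-count`) identify the fields; `HeaderParser` and `TARGETS` are illustrative names. The counts are pulled out of labels such as `字数 387` with a regular expression, and both timestamps are parsed with the `%Y.%m.%d %H:%M` format they use here. The trailing `*` on the visible time appears to mark a later edit, since the tooltip (`data-original-title`) holds a later 最后编辑于 ("last edited at") time; that reading is an inference from this one page.

```python
import re
from datetime import datetime
from html.parser import HTMLParser

# Class names treated as field keys (taken from the excerpt above).
TARGETS = {"title", "name", "publish-time", "wordage",
           "views-count", "comments-count", "likes-count"}


class HeaderParser(HTMLParser):
    """Maps each target class to the text and attributes of its first element."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # class -> collected text
        self.attrs_of = {}    # class -> attribute dict of that element
        self.current = None   # target class being collected, if any
        self.current_tag = None
        self.depth = 0        # open count of current_tag since the target started

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.current:
            if tag == self.current_tag:
                self.depth += 1
            return
        for cls in (attrs.get("class") or "").split():
            if cls in TARGETS and cls not in self.fields:
                self.current, self.current_tag, self.depth = cls, tag, 1
                self.fields[cls] = ""
                self.attrs_of[cls] = attrs
                break

    def handle_endtag(self, tag):
        if self.current and tag == self.current_tag:
            self.depth -= 1
            if self.depth == 0:
                self.current = None

    def handle_data(self, data):
        if self.current:
            self.fields[self.current] += data


header_html = """
<h1 class="title">爬虫课程作业01-解密大数据社群</h1>
<div class="author">
  <a class="avatar" href="/u/40cc6159e5ad"></a>
  <div class="info">
    <span class="tag">作者</span>
    <span class="name"><a href="/u/40cc6159e5ad">在旅途的车</a></span>
    <div class="meta">
      <span class="publish-time" data-toggle="tooltip" data-placement="bottom" title="" data-original-title="最后编辑于 2017.07.04 00:29">2017.07.04 00:26*</span>
      <span class="wordage">字数 387</span>
      <span class="views-count">阅读 33</span><span class="comments-count">评论 2</span><span class="likes-count">喜欢 2</span>
    </div>
  </div>
</div>
"""
parser = HeaderParser()
parser.feed(header_html)
parser.close()

counts = {key: int(re.search(r"\d+", parser.fields[key]).group())
          for key in ("wordage", "views-count", "comments-count", "likes-count")}
# Assumption: the trailing "*" flags a later edit; strip it before parsing.
published = datetime.strptime(parser.fields["publish-time"].rstrip("*"), "%Y.%m.%d %H:%M")
edited_label = parser.attrs_of["publish-time"]["data-original-title"]
edited = datetime.strptime(
    re.search(r"\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}", edited_label).group(), "%Y.%m.%d %H:%M")

print(parser.fields["title"], parser.fields["name"])  # 爬虫课程作业01-解密大数据社群 在旅途的车
print(counts)  # {'wordage': 387, 'views-count': 33, 'comments-count': 2, 'likes-count': 2}
print(published, edited)  # 2017-07-04 00:26:00 2017-07-04 00:29:00
```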
Article body:
<div data-note-content="" class="show-content">
<div class="image-package">
<div class="image-caption">glenn-carstens-peters-203007.jpg</div>
</div>
<p>最近对金融行业的就业情况比较感兴趣,准备从领英网站获取一些数据,做一些分析。</p>
<p>一、要爬取的数据类别</p>
<p>领英网站金融行业的职位数据,包括公司名称、职位名称、薪酬范围、职位要求</p>
<p>二、对应的数据源网站</p>
<p>领英网址 www.linkedin.com</p>
<p>三、爬取数据的URL</p>
<p><a href="https://www.linkedin.com/jobs/search/?keywords=audit&location=%E5%85%A8%E7%90%83&locationId=OTHERS.worldwide" target="_blank">https://www.linkedin.com/jobs/search/?keywords=audit&location=%E5%85%A8%E7%90%83&locationId=OTHERS.worldwide</a></p>
<p>四、数据筛选规则</p>
<p>根据职位的类别、招聘公司、职位所在地域、职位对应工作年限的要求、发布日期、职位要求、薪酬范围等维度,对爬取的数据进行筛选和分析,希望获得以下结论:</p>
<p>某个特定职位的薪酬水平及变化趋势,判断该职位的稀缺程度和就业概率;</p>
<p>某个特定职位的地域分布情况,提供自己发展的区域选择参考依据;</p>
<p>某个特定职位在不同行业的分布情况,和对应的薪酬水平,以审计(audit)为例,该职位具备一定的行业共性,但是不同行业、同一个职位薪酬水平不同,可以为自己做职业转换提供参考;</p>
<p>某个特定职位的工作要求,为自己的职业发展和技能培训提供指导性意见。</p>
</div>
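To collect the article text, a crawler has to know when it has left `div.show-content`, which itself contains nested `<div>`s. The standard-library sketch below (`BodyParser` is an illustrative name) counts `<div>` depth to stay inside the body, gathers each `<p>`'s text and each link's `href`, and then decodes the LinkedIn search URL's query string with `parse_qs`, which shows that the percent-encoded `location` value is 全球 ("worldwide"). The input is an abridged copy of the excerpt above, with the link text shortened.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class BodyParser(HTMLParser):
    """Collects <p> text and <a href> values found inside div.show-content."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside div.show-content
        self.in_p = False
        self.paragraphs = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1   # nested div inside the body
            elif "show-content" in (attrs.get("class") or "").split():
                self.div_depth = 1    # entering the article body
        elif self.div_depth and tag == "p":
            self.in_p = True
            self.paragraphs.append("")
        elif self.div_depth and tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


body_html = """
<div data-note-content="" class="show-content">
  <div class="image-package">
    <div class="image-caption">glenn-carstens-peters-203007.jpg</div>
  </div>
  <p>二、对应的数据源网站</p>
  <p>领英网址 www.linkedin.com</p>
  <p><a href="https://www.linkedin.com/jobs/search/?keywords=audit&location=%E5%85%A8%E7%90%83&locationId=OTHERS.worldwide" target="_blank">LinkedIn jobs</a></p>
</div>
<p>Text after the article body is ignored.</p>
"""
parser = BodyParser()
parser.feed(body_html)
parser.close()

query = parse_qs(urlsplit(parser.links[0]).query)
print(parser.paragraphs[0])  # 二、对应的数据源网站
print(query["location"])     # ['全球']
```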
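Floating side buttons, which provide four functions: back to top, submit the article to a collection, bookmark the article, and share the article: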
<ul><li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="回到顶部"><a class="function-button"><i class="iconfont ic-backtop"></i></a></li> <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="文章投稿"><a class="js-submit-button"><i class="iconfont ic-note-requests"></i></a> </li> <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="收藏文章"><a class="function-button"><i class="iconfont ic-mark"></i></a></li> <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="分享文章"><a tabindex="0" role="button" data-toggle="popover" data-placement="left" data-html="true" data-trigger="focus" href="javascript:void(0);" data-content="<ul class='share-list'>
<li><a class="weixin-share"><i class="social-icon-sprite social-icon-weixin"></i><span>分享到微信</span></a></li>
<li><a href="javascript:void((function(s,d,e,r,l,p,t,z,c){var%20f='http://v.t.sina.com.cn/share/share.php?appkey=1881139527',u=z||d.location,p=['&url=',e(u),'&title=',e(t||d.title),'&source=',e(r),'&sourceUrl=',e(l),'&content=',c||'gb2312','&pic=',e(p||'')].join('');function%20a(){if(!window.open([f,p].join(''),'mb',['toolbar=0,status=0,resizable=1,width=440,height=430,left=',(s.width-440)/2,',top=',(s.height-430)/2].join('')))u.href=[f,p].join('');};if(/Firefox/.test(navigator.userAgent))setTimeout(a,0);else%20a();})(screen,document,encodeURIComponent,'','','', '我写了新文章《爬虫课程作业01-解密大数据社群》( 分享自 @简书 )','http://www.jianshu.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=weibo','页面编码gb2312|utf-8默认gb2312'));"><i class='social-icon-sprite social-icon-weibo'></i><span>分享到微博</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='http://sns.qzone.qq.com/cgi-bin/qzshare/cgi_qzshare_onekey?url='+e('http://www.jianshu.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=qzone')+'&title='+e('我写了新文章《爬虫课程作业01-解密大数据社群》'),x=function(){if(!window.open(r,'qzone','toolbar=0,resizable=1,scrollbars=yes,status=1,width=600,height=600'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-zone'></i><span>分享到QQ空间</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='https://twitter.com/share?url='+e('http://www.jianshu.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=twitter')+'&text='+e('我写了新文章《爬虫课程作业01-解密大数据社群》( 分享自 @jianshucom )')+'&related='+e('jianshucom'),x=function(){if(!window.open(r,'twitter','toolbar=0,resizable=1,scrollbars=yes,status=1,width=600,height=600'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-twitter'></i><span>分享到Twitter</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='https://www.facebook.com/dialog/share?app_id=483126645039390&display=popup&href=http://www.jianshu.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=facebook',x=function(){if(!window.open(r,'facebook','toolbar=0,resizable=1,scrollbars=yes,status=1,width=450,height=330'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-facebook'></i><span>分享到Facebook</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,r='https://plus.google.com/share?url='+e('http://www.jianshu.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=google_plus'),x=function(){if(!window.open(r,'google_plus','toolbar=0,resizable=1,scrollbars=yes,status=1,width=450,height=330'))location.href=r};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})();"><i class='social-icon-sprite social-icon-google'></i><span>分享到Google+</span></a></li>
<li><a href="javascript:void(function(){var d=document,e=encodeURIComponent,s1=window.getSelection,s2=d.getSelection,s3=d.selection,s=s1?s1():s2?s2():s3?s3.createRange().text:'',r='http://www.douban.com/recommend/?url='+e('http://www.jianshu.com/p/7e2fccb4fad9?utm_campaign=maleskine&utm_content=note&utm_medium=reader_share&utm_source=douban')+'&title='+e('爬虫课程作业01-解密大数据社群')+'&sel='+e(s)+'&v=1',x=function(){if(!window.open(r,'douban','toolbar=0,resizable=1,scrollbars=yes,status=1,width=450,height=330'))location.href=r+'&r=1'};if(/Firefox/.test(navigator.userAgent)){setTimeout(x,0)}else{x()}})()"><i class='social-icon-sprite social-icon-douban'></i><span>分享到豆瓣</span></a></li>
</ul>" data-original-title="" title="" class="function-button"><i class="iconfont ic-share"></i></a> <!----></li></ul>
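The four buttons are identified by their `data-original-title` tooltips rather than by visible text, so a parser has to read attributes. A minimal standard-library sketch, assuming each tooltip `<li>` holds one icon whose class starts with `ic-`; `SideButtonParser` is an illustrative name, and the share button's popover markup is left out of the sample. That popover HTML is stored as a string in the `data-content` attribute, so an HTML parser reports it as an attribute value, not as child elements; listing the share targets would mean parsing that value again as a separate document.

```python
from html.parser import HTMLParser


class SideButtonParser(HTMLParser):
    """Pairs each tooltip <li>'s data-original-title with its first ic-* icon class."""

    def __init__(self):
        super().__init__()
        self.buttons = []   # [tooltip, icon class] per button

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("data-toggle") == "tooltip":
            self.buttons.append([attrs.get("data-original-title"), None])
        elif tag == "i" and self.buttons and self.buttons[-1][1] is None:
            icons = [c for c in (attrs.get("class") or "").split() if c.startswith("ic-")]
            if icons:
                self.buttons[-1][1] = icons[0]


side_html = """
<ul>
  <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="回到顶部"><a class="function-button"><i class="iconfont ic-backtop"></i></a></li>
  <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="文章投稿"><a class="js-submit-button"><i class="iconfont ic-note-requests"></i></a></li>
  <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="收藏文章"><a class="function-button"><i class="iconfont ic-mark"></i></a></li>
  <li data-placement="left" data-toggle="tooltip" data-container="body" data-original-title="分享文章"><a tabindex="0" role="button" data-toggle="popover" class="function-button"><i class="iconfont ic-share"></i></a></li>
</ul>
"""
parser = SideButtonParser()
parser.feed(side_html)
parser.close()

buttons = [tuple(b) for b in parser.buttons]
for tooltip, icon in buttons:
    print(tooltip, icon)
```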
Author information at the bottom of the page:
<div class="follow-detail">
<div class="info">
<a class="avatar" href="/u/40cc6159e5ad">
</a> <div data-author-follow-button=""></div>
<a class="title" href="/u/40cc6159e5ad">在旅途的车</a>
<p>写了 39662 字,被 26 人关注,获得了 35 个喜欢</p></div>
</div>
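The footer packs three counts into one sentence. A regular-expression sketch, assuming the sentence keeps the wording seen here (写了 … 字, 被 … 人关注, 获得了 … 个喜欢: words written, followers, likes received); `parse_author_summary` and `user_id_from_href` are hypothetical helper names, and treating the path segment after `/u/` as the author's id is an inference from the two profile links on this page.

```python
import re

SUMMARY_RE = re.compile(
    r"写了\s*(?P<words>\d+)\s*字.*?被\s*(?P<followers>\d+)\s*人关注.*?获得了\s*(?P<likes>\d+)\s*个喜欢"
)


def parse_author_summary(text):
    """Turns the footer sentence into integer counts; returns None if it does not match."""
    m = SUMMARY_RE.search(text)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def user_id_from_href(href):
    """Extracts the path segment after /u/ from a profile link, or None."""
    m = re.fullmatch(r"/u/([^/?#]+)", href)
    return m.group(1) if m else None


print(parse_author_summary("写了 39662 字,被 26 人关注,获得了 35 个喜欢"))
# {'words': 39662, 'followers': 26, 'likes': 35}
print(user_id_from_href("/u/40cc6159e5ad"))  # 40cc6159e5ad
```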