Explanation: RQ4, Java, Python, C++, JavaProcessing|Processing

# Homework 2 - Soccer analytics

Soccer analytics is attracting increasing interest from academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams extracted from every match.

The goal of this assignment is to perform an analysis on the largest open collection of soccer logs ever released, collected by [Wyscout](https://wyscout.com/), containing all the spatio-temporal events (passes, shots, fouls, etc.) that occur during all matches of the entire 2017-2018 season of seven competitions (La Liga, Serie A, Bundesliga, Premier League, Ligue 1, FIFA World Cup 2018, UEFA Euro Cup 2016). A match event contains information about its position, time, outcome, player and characteristics.

In particular, we are curious to answer some specific research questions (RQs) that may help us discover and interpret meaningful patterns in the data.

## Before starting

Among all the good practices a data scientist needs to follow before running any analysis, one is of the utmost importance: get the data and understand it!

Here you find the list of tasks you need to perform before digging into the rich world of soccer.

1. Get your data! Go to this website and download the files related to Coaches, Players, Events, Teams and Matches. Throughout the analysis we focus only on club teams' information, so there is no need to download or use the files relative to the European Cup and the World Cup.
2. Understand your data. Read the legend of each column to understand what it refers to. Additional information about the labels can be found here: Coaches, Players, Events, Teams, Matches. Please be sure that you have understood the data before you start coding.
3. Handling data. The data are provided in multiple .json files, with some of the columns present in more than one file.
   For this reason, in order to answer the RQs, we suggest that you import the .json files as pandas DataFrame objects and then, based on what you want to analyze, perform joins among the DataFrames. Here you can find a quick useful guide. Remember, Google is your best friend!

### VERY VERY IMPORTANT

1. !!! Read the entire homework before coding anything !!!
2. My solution is not better than yours and yours is not better than mine. In any data analysis task, there is no unique way to answer the RQs. For this reason it is crucial (necessary and mandatory) that you describe every single decision you take and all the steps you perform.
3. After completing any exercise, comments about the obtained results are mandatory. We are not always explicit about where to focus your comments, but we will always want some brief sentences about your discoveries.

## Research questions

### Exploratory Data Analysis

General Setup: All the analyses requested from RQ1 to RQ5 must be performed only on the Premier League dataset.

1. [RQ1] Who wants to be a Champion? During a season a team may go through bad periods, for example more than three consecutive games lost, or it may have a positive trend where it seems to be unbeatable. Let's visualize these trends!

   Create a plot where each point (x,y) represents the number of points obtained by team x at game week y. In order to show the trends, points related to the same team must be connected to each other. Reminder: in soccer each team gets 3 points for a win, 1 point for a tied game, and 0 for a loss. Highlight the two teams that got the longest winning streak (# of consecutive wins), and the two teams that got the longest losing streak (# of consecutive losses).

   Below you can see a similar example of what we would like you to show us. Keep in mind that you must create this plot for the entire season (38 game weeks).
2. [RQ2] Is there a home-field advantage? It is generally believed that there is an underlying home-field advantage in sport, i.e.
   a higher probability of winning for the home team. Let's check for this, and see whether the outcome of the game (win, draw, lose) is correlated with the playing side (home or away). For 5 different Premier League teams, show the contingency table (outcome x side). Then perform an overall Chi-squared test in the following way: build a single contingency table that contains all the matches in which only one of the 5 previously selected teams is involved, to see whether there is a home-field advantage. State clearly the tested hypothesis and whether it is accepted or rejected.
3. [RQ3] Which teams have the youngest coaches? Rank all the teams by the age of their coach and show the 10 teams with the youngest coaches. Remember that during a season a team may have more than one coach; in that case pick the youngest of them. Additionally, show the distribution of the ages of all coaches in the Premier League, using a boxplot. (Hint: There's an attribute birthDate.)
4. [RQ4] Find the top 10 players with the highest ratio between completed passes and attempted passes. For this task, consider all the different types of passes and, as specified on the website, a completed pass has tag 1801 (accurate event).

   To avoid meaningless results (e.g. players who played few minutes and completed 2 passes out of 2, achieving a 100% ratio), select an arbitrary threshold of minimum attempted passes, in order to consider only the subset of players that played enough. Justify the choices you make.
5. [RQ5] Does being a tall player mean winning more air duels? Soccer is a physical game, and it often happens in a match that players are involved in air duels (i.e. when two players are contending for the ball while it is not on the ground). Make a plot that shows the dependency between the height of the player and the ratio of air duels won to air duels attempted.
   The visualization should be a scatterplot, where each point (x,y) represents a player whose height is equal to x and whose ratio of air duels won is equal to y. Furthermore, color each point according to an arbitrary selection of height categories (e.g. yellow: 160-165cm, orange: 165-170cm, etc.).

   Remember that the Air Duel is a subevent of the event Duel and that an air duel is said to be won if it has the tag 1801. As in RQ4, choose a threshold of minimum air duels attempted, in order to filter your data and get reliable results, and justify your choice.
6. [RQ6] Free your mind! Go further with the EDA (Exploratory Data Analysis), showing a new interesting result that you found about the dataset.

### Core Research Questions

[CRQ1] What are the time slots of the match with more goals? Let's analyse and visualise the distribution of goals in 9-minute intervals for all the matches, i.e. let's transform the minute of a goal from a continuous variable into a discrete variable (e.g. a goal scored in the 5th minute will end up in the interval [0-9)). Remember that every match usually goes from minute 0 to minute 90, but in football an arbitrary amount of extra time is always added to each half of the match, so consider also the intervals 45+ and 90+.

1. Make a barplot with the absolute frequency of goals in all the time slots.
2. Find the top 10 teams that score the most in the interval 81-90.
3. Show if there are players that were able to score at least one goal in 8 different intervals.

[CRQ2] Visualize movements and passes on the pitch! Here we focus our attention on the zones that a player covers during a match. For each event, we have a pair of coordinates, which are respectively the starting and ending point of that event. It can be helpful to follow this link.

Knowing all the different positions where events happen lets us create different types of visualizations:
1. Considering only the match Barcelona - Real Madrid played on 6 May 2018:
   - visualize with a heatmap the zones where Cristiano Ronaldo was most active. The events to be considered are: passes, shots, duels, free kicks.
   - compare his map with that of Lionel Messi. Comment on the results and point out the main differences (we are not looking for a deep technical analysis, just show us whether there are some clear differences between the 2 plots).

   Here's an example of a heatmap showing all the starting positions of the goals of Arsenal during the entire season.
2. Considering only the match Juventus - Napoli played on 22 April 2018:
   - visualize with arrows the starting point and ending point of each pass made during the match by Jorginho and Miralem Pjanic. Is there a big difference between the map with all the passes and the one with only accurate passes? Comment on the results and point out the main differences.

   Here there's an example of a map with arrows.

## Theoretical Question

You are given the recursive function splitSwap, which accepts an array a, an index l, and a length n.

    function splitSwap(a, l, n):
      if n <= 1:
        return
      splitSwap(a, l, n/2)
      splitSwap(a, l + n/2, n/2)
      swapList(a, l, n)

The subroutine swapList is described here:

    function swapList(a, l, n):
      for i = 1 to n/2:
        tmp = a[l + i]
        a[l + i] = a[l + n/2 + i]
        a[l + n/2 + i] = tmp

1. How much running time does it take to execute splitSwap(a, 0, n)? (We want a Big O analysis.)
2. What does this algorithm do? Is it optimal? Describe the mechanism of the algorithm in detail; we do not want to know only its final result.

## Bonus

1. Repeat the entire analysis for the other leagues (La Liga, Serie A, Bundesliga and Ligue 1), aggregating the results and highlighting the differences you find among the leagues.
2. Make nice visualizations using libraries like Bokeh and Seaborn.

Reposted from: http://www.3daixie.com/contents/11/3444.html
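
To experiment with the splitSwap pseudocode from the Theoretical Question, it can be rendered in Python roughly as follows. This is a minimal sketch under stated assumptions, not the assignment's reference solution: the base case is taken to be `n <= 1`, `n` is assumed to be a power of two, and the loop in swapList is read as `n/2` iterations over 0-based offsets. The names `split_swap`, `swap_list` and the returned swap counter are introduced here for illustration only.

```python
def swap_list(a, l, n):
    """Swap the first half of a[l:l+n] with its second half, element by element."""
    half = n // 2
    for i in range(half):  # assumption: n/2 iterations with 0-based offsets
        a[l + i], a[l + half + i] = a[l + half + i], a[l + i]
    return half  # number of swaps performed by this call


def split_swap(a, l, n):
    """Python rendering of splitSwap; returns the total number of swaps made."""
    if n <= 1:  # assumed base case: a single element needs no rearranging
        return 0
    half = n // 2
    swaps = split_swap(a, l, half)           # process the left half
    swaps += split_swap(a, l + half, half)   # process the right half
    swaps += swap_list(a, l, n)              # exchange the two halves
    return swaps


data = [1, 2, 3, 4, 5, 6, 7, 8]
total_swaps = split_swap(data, 0, len(data))
print(data, total_swaps)  # prints [8, 7, 6, 5, 4, 3, 2, 1] 12
```

Running it on inputs of increasing size and watching how the swap count grows is one way to sanity-check a Big O answer before writing it up; each call does work linear in `n` on top of two recursive calls on halves.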
