Common Pandas Syntax

I recently worked through the pandas introduction on Kaggle and still felt shaky on a few things, so I wrote these notes up for future reference.

pandas cheatsheet

Index, Reference

Select a column 'description'

reviews['description']
reviews.description

Select the first element of the 'description' column
reviews.loc[0, 'description']

Select first row of dataframe

reviews.loc[0, :]
reviews.iloc[0, :]

Select the first 10 elements of the 'description' column

reviews.description.iloc[0:10]
reviews.loc[0:9, 'description']
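
One detail that is easy to trip over in slices like these: iloc excludes the end position, while loc includes the end label. A minimal sketch on a small made-up frame (a hypothetical stand-in, not the real wine reviews data) shows the difference:

```python
import pandas as pd

# Hypothetical stand-in for the wine reviews data (not the real dataset).
reviews = pd.DataFrame({
    "country": ["Italy", "France", "US", "Italy", "France"],
    "description": ["a", "b", "c", "d", "e"],
})

# iloc slices by position and excludes the end, like a Python list slice.
by_position = reviews.iloc[0:3]["description"]

# loc slices by label and includes the end label.
by_label = reviews.loc[0:3, "description"]

print(len(by_position))  # 3
print(len(by_label))     # 4
```

So with the default integer index, the first n rows are iloc[0:n] but loc[0:n-1].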

Select the rows at positions 1, 2, 3, 5, 8
reviews.iloc[[1, 2, 3, 5, 8], :]

Select 'country' and 'variety' of the first 100 records
reviews.loc[0:99, ['country', 'variety']]

Select wines made in 'Italy'
reviews[reviews['country'] == 'Italy']

Select entries whose 'region2' is not empty
reviews[reviews.region2.notnull()]

Select last 1000 entries from points
reviews.iloc[-1000:, 3]

Select points for wines made in Italy
reviews[reviews.country == 'Italy'].points

Who produces more above-averagely good wines, France or Italy? Select the country column, but only when said country is one of those two options, and the points column is greater than or equal to 90.
reviews[reviews.country.isin(['France', 'Italy']) & (reviews.points >= 90)].country
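
When combining conditions like this, note that & binds more tightly than >= in Python, so each comparison needs its own parentheses (or its own variable). A small sketch on made-up data (the rows below are invented for illustration) that names each mask first:

```python
import pandas as pd

# Hypothetical sample rows, not the real wine reviews data.
reviews = pd.DataFrame({
    "country": ["France", "Italy", "US", "Italy", "France"],
    "points": [92, 88, 95, 90, 85],
})

# Build each boolean mask separately so operator precedence cannot bite.
is_fr_or_it = reviews.country.isin(["France", "Italy"])
is_good = reviews.points >= 90

top = reviews[is_fr_or_it & is_good].country

# value_counts then answers "who produces more".
counts = top.value_counts()
print(counts)
```

Without the parentheses, the expression would be parsed as (mask & reviews.points) >= 90, which is not the intended filter.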

Summary and maps

What is the median of the points column?
reviews.points.median()

What countries are represented in the dataset?
reviews.country.unique()

What countries appear in the dataset most often?
reviews.country.value_counts()

Remap the price column by subtracting the median price. Use the Series.map method.

m_val = reviews.price.median()
reviews.price.map(lambda x: x - m_val)

Remap the price column by subtracting the median price. Use the apply method (here Series.apply, since reviews.price is a Series).

def f(x):
    return x - m_val
reviews.price.apply(f)
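
For a simple arithmetic remap like this, map and apply give the same result as plain vectorized subtraction, which is usually the shorter option. A quick sketch to check that on a made-up price column (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical prices, including one missing value.
price = pd.Series([10.0, 20.0, 30.0, None], name="price")

# median() skips NaN by default, so this is the median of 10, 20, 30.
m_val = price.median()

via_map = price.map(lambda x: x - m_val)
via_apply = price.apply(lambda x: x - m_val)
vectorized = price - m_val  # same result without a Python-level loop

print(vectorized.tolist())
```

All three keep the missing price as NaN.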

I'm an economical wine buyer. Which wine is the "best bargain", i.e., which wine has the highest points-to-price ratio in the dataset?
reviews.loc[(reviews.points / reviews.price).idxmax()].title
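
The key step here is Series.idxmax, which returns the index label of the largest value; that label is then passed to loc. A minimal sketch with three invented wines (titles and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical rows, chosen so the ratios are easy to check by hand.
reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [90, 85, 88],
    "price": [30.0, 10.0, 44.0],
})

# Points-to-price ratio per row: 3.0, 8.5, 2.0
ratio = reviews.points / reviews.price

# idxmax gives the index label of the best ratio, not the ratio itself.
best_label = ratio.idxmax()
best_title = reviews.loc[best_label, "title"]

print(best_label, best_title)  # 1 Wine B
```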

There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series counting how many times each of these two words appears in the description column in the dataset.

c_tropical = reviews.description.map(lambda r:'tropical' in r).value_counts()
c_fruity = reviews.description.map(lambda r:'fruity' in r).value_counts()
pd.Series([c_tropical[True], c_fruity[True]], index = ['tropical', 'fruity'])
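
A possible variation: since True counts as 1, summing the boolean Series gives the count directly and avoids a KeyError from value_counts()[True] when a word never appears. A sketch on a few made-up descriptions (the text is invented for illustration):

```python
import pandas as pd

# Hypothetical descriptions, not taken from the real dataset.
description = pd.Series([
    "tropical fruit aromas",
    "fruity and fresh",
    "ripe, fruity, tropical notes",
    "dry and earthy",
    "fruity finish",
])

# Each map yields a boolean Series; sum() counts the True values.
n_tropical = description.map(lambda d: "tropical" in d).sum()
n_fruity = description.map(lambda d: "fruity" in d).sum()

counts = pd.Series([n_tropical, n_fruity], index=["tropical", "fruity"])
print(counts)
```

Note that this counts descriptions containing the word, not the total number of occurrences, same as the version above.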

What combination of countries and varieties are most common?
Create a Series whose index consists of strings of the form "<Country> - <Wine Variety>". For example, a pinot noir produced in the US should map to "US - Pinot Noir". The values should be counts of how many times the given wine appears in the dataset. Drop any reviews with incomplete country or variety data.

df1 = reviews[(reviews.country.notna()&reviews.variety.notna())]
df = df1.apply(lambda s: s.country+ " - "+s.variety, axis = 'columns')
df.value_counts()
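
The same result can also be reached without a row-wise apply, because string columns can be concatenated directly with +. A sketch under the assumption that both columns hold strings, on invented rows:

```python
import pandas as pd

# Hypothetical rows with some missing values.
reviews = pd.DataFrame({
    "country": ["US", "US", "Italy", None, "France"],
    "variety": ["Pinot Noir", "Pinot Noir", "Sangiovese", "Merlot", None],
})

# Drop rows where either column is missing.
complete = reviews.dropna(subset=["country", "variety"])

# Element-wise string concatenation builds "<Country> - <Wine Variety>".
labels = complete.country + " - " + complete.variety

# value_counts sorts by count, most common first.
counts = labels.value_counts()
print(counts)
```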

Group and Sorting
