1. Finding the top N values
import pandas as pd
import numpy as np
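# Use the full option names: in recent pandas the short names 'max_columns'
# and 'max_rows' also match styler.render.* options and raise OptionError
pd.set_option('display.max_columns', 4, 'display.max_rows', 10, 'display.max_colwidth', 12)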
movie = pd.read_csv('../data/movie.csv')
movie2 = movie[['movie_title', 'imdb_score', 'budget']]
movie2.head()
# .nlargest(n, column) returns the n rows with the largest values in that column
# e.g. the 100 movies with the highest imdb_score
movie2.nlargest(100, 'imdb_score').head()
Result
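As a minimal sketch of what `.nlargest` does, the snippet below uses a small hypothetical DataFrame in place of movie.csv (titles, scores and budgets are made up) and checks that `nlargest(n, col)` picks the same rows as a descending sort followed by `.head(n)`. The equivalence is only guaranteed here because the scores have no ties; with ties, row order can differ.

```python
import pandas as pd

# Hypothetical stand-in for movie2 (the real data comes from movie.csv)
movies = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E'],
    'imdb_score': [7.1, 8.4, 6.2, 9.0, 7.8],
    'budget': [30e6, 120e6, 5e6, 60e6, 15e6],
})

# The 3 rows with the highest imdb_score, largest first
top3 = movies.nlargest(3, 'imdb_score')

# The same rows via a full descending sort followed by head(); matches when there are no ties
via_sort = movies.sort_values('imdb_score', ascending=False).head(3)

print(top3['movie_title'].tolist())  # ['D', 'B', 'E']
print(top3.equals(via_sort))         # True
```

`.nlargest` avoids sorting the whole frame, which is why it is the natural choice when only the top few rows are needed.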
2. Finding the bottom N values
Methods can be chained: each call operates on the result of the previous one. For example:
# Chain .nsmallest to get the 3 lowest-budget films among the 100 highest-scoring ones
(movie2
.nlargest(100, 'imdb_score')
.nsmallest(3, 'budget')
)
Result
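The order of the chained calls matters, because the second call only sees the rows the first one kept. This is a small sketch on made-up data (the DataFrame below is hypothetical, not movie.csv) comparing the chained filter with calling `.nsmallest` on the full table.

```python
import pandas as pd

# Hypothetical sample standing in for movie2
movies = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E', 'F'],
    'imdb_score': [7.1, 8.4, 6.2, 9.0, 7.8, 8.1],
    'budget': [30e6, 120e6, 5e6, 60e6, 15e6, 2e6],
})

# Step 1: keep the 4 best-rated films (D, B, F, E); step 2: of those, the 2 cheapest
cheap_hits = (movies
              .nlargest(4, 'imdb_score')
              .nsmallest(2, 'budget'))
print(cheap_hits['movie_title'].tolist())        # ['F', 'E']

# Without the score filter, C (budget 5e6, score 6.2) makes the cut instead of E
cheapest_overall = movies.nsmallest(2, 'budget')
print(cheapest_overall['movie_title'].tolist())  # ['F', 'C']
```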
3. Sorting by value
# Sort all movies by imdb_score, highest first
(movie
[['movie_title', 'title_year', 'imdb_score']]
.sort_values('imdb_score', ascending=False)
)
Result
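Two details of `.sort_values` are easy to miss: missing values are placed last by default (`na_position='last'`) even when sorting in descending order, and the method returns a new DataFrame rather than reordering the original. A minimal sketch on a hypothetical four-row sample:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing score
movies = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D'],
    'title_year': [2010.0, 2012.0, 2010.0, 2012.0],
    'imdb_score': [7.1, np.nan, 8.3, 6.5],
})

by_score = movies.sort_values('imdb_score', ascending=False)
# The NaN row (B) goes last because na_position defaults to 'last'
print(by_score['movie_title'].tolist())  # ['C', 'A', 'D', 'B']

# sort_values returns a new DataFrame; the original row order is unchanged
print(movies['movie_title'].tolist())    # ['A', 'B', 'C', 'D']
```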
4. Removing duplicates
Before deduplication:
# sorted by title_year, then by imdb_score, both descending
(movie
[['movie_title', 'title_year', 'imdb_score']]
.sort_values(['title_year', 'imdb_score'], ascending=False)
)
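When several sort keys are given, a single `ascending=False` applies to every key; passing a list sets the direction per key. A short sketch on hypothetical data (not movie.csv):

```python
import pandas as pd

# Hypothetical sample: two movies in 2010 and 2012, one in 2011
movies = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E'],
    'title_year': [2010, 2012, 2010, 2012, 2011],
    'imdb_score': [7.1, 8.0, 8.3, 6.5, 7.4],
})

# One boolean for all keys: year descending, then score descending
both_desc = movies.sort_values(['title_year', 'imdb_score'], ascending=False)
print(both_desc['movie_title'].tolist())  # ['B', 'D', 'E', 'C', 'A']

# One direction per key: year descending, then score ascending
mixed = movies.sort_values(['title_year', 'imdb_score'], ascending=[False, True])
print(mixed['movie_title'].tolist())      # ['D', 'B', 'E', 'A', 'C']
```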
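After deduplication:
# .drop_duplicates keeps the first row for each title_year; since scores are
# sorted descending within each year, that row is the year's highest-rated movie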
(movie
[['movie_title', 'title_year', 'imdb_score']]
.sort_values(['title_year', 'imdb_score'], ascending=False)
.drop_duplicates(subset='title_year')
)
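The sort-then-deduplicate pattern amounts to "the best-scoring movie per year". The sketch below reproduces it on a hypothetical sample and, for comparison, gets the same rows with a different technique, `groupby(...).idxmax()`, which looks up the index label of the maximum score in each group.

```python
import pandas as pd

# Hypothetical sample standing in for the movie table
movies = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E'],
    'title_year': [2010, 2012, 2010, 2012, 2011],
    'imdb_score': [7.1, 8.0, 8.3, 6.5, 7.4],
})

# Best score first within each year, then keep only that first row per year
best_per_year = (movies
                 .sort_values(['title_year', 'imdb_score'], ascending=False)
                 .drop_duplicates(subset='title_year'))  # keep='first' is the default
print(best_per_year['movie_title'].tolist())  # ['B', 'E', 'C']

# Alternative technique: row label of the maximum score in each year
via_idxmax = movies.loc[movies.groupby('title_year')['imdb_score'].idxmax()]
print(sorted(via_idxmax['movie_title']))      # ['B', 'C', 'E']
```

One difference on real data: `drop_duplicates` keeps one row whose `title_year` is missing, whereas `groupby` drops missing keys by default (`dropna=True`).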