pandas.Series.str.contains tests whether each string in a Series contains a given pattern or regular expression, and returns a boolean Series with the same index.
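As a minimal sketch (the sample strings are hypothetical), calling str.contains on a small Series yields one True/False value per element:

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(['apple', 'banana', 'cherry'])

# For each element, test whether it contains the substring 'an'
mask = s.str.contains('an')
print(mask.tolist())  # [False, True, False]
```

Because the result is aligned with the original index, it can be passed straight to df[...] for boolean indexing, which is how the example below uses it.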
Main parameters:
pat: str. Character sequence or regular expression.
case: bool, default True. If True, the match is case-sensitive.
regex: bool, default True. If True, pat is treated as a regular expression. If False, pat is treated as a literal string.
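The effect of case and regex can be seen on a few hypothetical strings. This sketch relies only on the two parameters listed above; note that with regex=True the "." in "A.B" matches any single character, while with regex=False it only matches a literal dot:

```python
import pandas as pd

# Hypothetical sample data
s = pd.Series(['Action', 'action', 'A.B', 'AxB'])

# Default: case-sensitive, pattern treated as a regular expression
default = s.str.contains('action')
# case=False ignores upper/lower case
ignore_case = s.str.contains('action', case=False)
# regex=True (default): '.' matches any single character
as_regex = s.str.contains('A.B')
# regex=False: '.' is matched literally
as_literal = s.str.contains('A.B', regex=False)

print(default.tolist())      # [False, True, False, False]
print(ignore_case.tolist())  # [True, True, False, False]
print(as_regex.tolist())     # [False, False, True, True]
print(as_literal.tolist())   # [False, False, True, False]
```

The same applies to "|": it means OR in a regular expression, so a pattern containing a literal pipe needs regex=False.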
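The following example, based on movie genres, shows how it is used.
Dataset: 20 movies, each with a title and a genres string whose genres are separated by "|".
Code: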
```python
import pandas as pd

# Each row: [movie title, genres separated by '|']
movie = [['Toy Story (1995)', 'Adventure|Animation|Children|Comedy|Fantasy'],
         ['Jumanji (1995)', 'Adventure|Children|Fantasy'],
         ['Grumpier Old Men (1995)', 'Comedy|Romance'],
         ['Waiting to Exhale (1995)', 'Comedy|Drama|Romance'],
         ['Father of the Bride Part II (1995)', 'Comedy'],
         ['Heat (1995)', 'Action|Crime|Thriller'],
         ['Sabrina (1995)', 'Comedy|Romance'],
         ['Tom and Huck (1995)', 'Adventure|Children'],
         ['Sudden Death (1995)', 'Action'],
         ['GoldenEye (1995)', 'Action|Adventure|Thriller'],
         ['American President, The (1995)', 'Comedy|Drama|Romance'],
         ['Dracula: Dead and Loving It (1995)', 'Comedy|Horror'],
         ['Balto (1995)', 'Adventure|Animation|Children'],
         ['Nixon (1995)', 'Drama'],
         ['Cutthroat Island (1995)', 'Action|Adventure|Romance'],
         ['Casino (1995)', 'Crime|Drama'],
         ['Sense and Sensibility (1995)', 'Drama|Romance'],
         ['Four Rooms (1995)', 'Comedy'],
         ['Ace Ventura: When Nature Calls (1995)', 'Comedy'],
         ['Money Train (1995)', 'Action|Comedy|Crime|Drama|Thriller']]
df = pd.DataFrame(data=movie, columns=['title', 'genres'])

# Rows whose genres contain 'Action'
movie_with_Action = df[df['genres'].str.contains(pat='Action')]
# Rows whose genres contain 'Romance'
movie_with_Romance = df[df['genres'].str.contains(pat='Romance')]
# 'Action|Romance' is a regular expression (regex=True by default):
# '|' means OR, so rows containing either genre are selected
movie_with_Action_Romance = df[df['genres'].str.contains(pat='Action|Romance')]

print(movie_with_Action)
print(movie_with_Romance)
print(movie_with_Action_Romance)
```
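Since "Action|Romance" is a regex alternation, it selects movies with either genre. To require both genres, one option (a sketch, not taken from the original post) is to build two boolean masks and combine them with the element-wise & operator. The DataFrame here is a four-row subset of the data above, used only to keep the example short:

```python
import pandas as pd

# A four-row subset of the movie data above
df = pd.DataFrame({
    'title': ['Grumpier Old Men (1995)', 'Heat (1995)',
              'Cutthroat Island (1995)', 'Money Train (1995)'],
    'genres': ['Comedy|Romance', 'Action|Crime|Thriller',
               'Action|Adventure|Romance', 'Action|Comedy|Crime|Drama|Thriller'],
})

has_action = df['genres'].str.contains('Action')
has_romance = df['genres'].str.contains('Romance')

# Regex alternation: Action OR Romance
either = df[df['genres'].str.contains('Action|Romance')]
# Element-wise AND of two masks: Action AND Romance
both = df[has_action & has_romance]

print(len(either))               # 4
print(both['title'].tolist())    # ['Cutthroat Island (1995)']
```

Use & and | (not the Python keywords and/or) when combining boolean Series; wrap each condition in parentheses if it is written inline as a comparison.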
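Results:
movie_with_Action (5 rows): Heat (1995); Sudden Death (1995); GoldenEye (1995); Cutthroat Island (1995); Money Train (1995)
movie_with_Romance (6 rows): Grumpier Old Men (1995); Waiting to Exhale (1995); Sabrina (1995); American President, The (1995); Cutthroat Island (1995); Sense and Sensibility (1995)
movie_with_Action_Romance (10 rows, every movie containing Action or Romance): Grumpier Old Men (1995); Waiting to Exhale (1995); Heat (1995); Sabrina (1995); Sudden Death (1995); GoldenEye (1995); American President, The (1995); Cutthroat Island (1995); Sense and Sensibility (1995); Money Train (1995)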
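Because True counts as 1 and False as 0, summing the boolean Series is a quick way to check how many rows each pattern matches. The sketch below copies only the genres column of the 20 movies above; the dictionary comprehension is just an illustrative way to loop over the three patterns:

```python
import pandas as pd

# Genres column of the 20 movies in the example above
genres = pd.Series([
    'Adventure|Animation|Children|Comedy|Fantasy',
    'Adventure|Children|Fantasy',
    'Comedy|Romance',
    'Comedy|Drama|Romance',
    'Comedy',
    'Action|Crime|Thriller',
    'Comedy|Romance',
    'Adventure|Children',
    'Action',
    'Action|Adventure|Thriller',
    'Comedy|Drama|Romance',
    'Comedy|Horror',
    'Adventure|Animation|Children',
    'Drama',
    'Action|Adventure|Romance',
    'Crime|Drama',
    'Drama|Romance',
    'Comedy',
    'Comedy',
    'Action|Comedy|Crime|Drama|Thriller',
])

# sum() of a boolean mask counts the True values, i.e. the matching rows
counts = {pat: int(genres.str.contains(pat).sum())
          for pat in ['Action', 'Romance', 'Action|Romance']}
print(counts)  # {'Action': 5, 'Romance': 6, 'Action|Romance': 10}

# Index labels of the rows that contain 'Action'
action_index = genres[genres.str.contains('Action')].index.tolist()
print(action_index)  # [5, 8, 9, 14, 19]
```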
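----------------
Copyright notice: This is an original article by CSDN blogger 「Stephen__W」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/w1301100424/article/details/98620412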