String operations in pandas
pandas lets you apply string methods directly to every element of a Series, without writing an explicit loop.
data = ['peter', 'Paul', 'MARY', 'gUIDO']
[s.capitalize() for s in data]
import pandas as pd
names = pd.Series(data)
names
names.str.capitalize()
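The str accessor exposes string methods on a Series (or Index) of strings. It skips missing values rather than raising an error, so they remain missing in the result.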
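A minimal sketch of the missing-value behaviour, assuming a variant of the list above with one entry replaced by None (data_with_gap is a hypothetical name used only for illustration):

```python
import pandas as pd

# Hypothetical data: the names from above with one missing entry
data_with_gap = ['peter', None, 'MARY', 'gUIDO']

# A plain list comprehension fails because None has no capitalize()
try:
    [s.capitalize() for s in data_with_gap]
    comprehension_failed = False
except AttributeError:
    comprehension_failed = True

# The str accessor skips the missing entry and leaves it missing
capitalized = pd.Series(data_with_gap).str.capitalize()

print(comprehension_failed)
print(capitalized)
```

The comprehension would need its own None check, while the accessor handles the gap without extra code.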
String methods on a pandas Series
len() | lower() | translate() | islower() |
ljust() | upper() | startswith() | isupper() |
rjust() | find() | endswith() | isnumeric() |
center() | rfind() | isalnum() | isdecimal() |
zfill() | index() | isalpha() | split() |
strip() | rindex() | isdigit() | rsplit() |
rstrip() | capitalize() | isspace() | partition() |
lstrip() | swapcase() | istitle() | rpartition() |
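The names are self-explanatory: each one mirrors the Python string method of the same name.
Usage:
# names is the Series defined above
names.str.lower()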
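The methods in the table return different kinds of results depending on what the underlying string method returns. A short sketch, reusing the four names from the start of these notes:

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', 'MARY', 'gUIDO'])

lengths = names.str.len()                   # one integer per element
starts_with_p = names.str.startswith('p')   # one boolean per element (case-sensitive)
padded = names.str.zfill(6)                 # strings left-padded with zeros to width 6
parts = names.str.upper().str.split('A')    # one list of pieces per element

print(lengths.tolist())
print(starts_with_p.tolist())
print(padded.tolist())
print(parts.tolist())
```

Calls can be chained, as in the last line, because each string-valued result is itself a Series with a str accessor.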
Regular expressions
Method | Description |
---|---|
match() | Call re.match() on each element, returning a boolean |
extract() | Call re.search() on each element, returning matched groups as strings |
findall() | Call re.findall() on each element |
replace() | Replace occurrences of pattern with some other string (pass regex=True to treat the pattern as a regexp) |
contains() | Call re.search() on each element, returning a boolean |
count() | Count occurrences of pattern |
split() | Equivalent to str.split(), but accepts regexps |
rsplit() | Equivalent to str.rsplit(); the separator is taken as a literal string |
These methods accept the usual regex syntax, including metacharacters such as . * ? ^ $
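As a rough illustration of the regex methods and metacharacters listed above, the sketch below applies a few patterns to the same four names. The patterns themselves are made up for this example.

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', 'MARY', 'gUIDO'])

# ^ anchors the pattern at the start of the string
starts_lower = names.str.contains(r'^[a-z]')

# . matches any character, * repeats it, $ anchors at the end
ends_upper = names.str.match(r'.*[A-Z]$')

# A capture group selects what extract() returns; expand=False gives a Series
first_letter = names.str.extract(r'^([A-Za-z])', expand=False)

# findall() returns every match as a list per element
vowels = names.str.findall(r'[aeiou]')

# ? makes the preceding character optional; regex=True enables pattern matching
trimmed = names.str.replace(r'e?r$', '', regex=True)

print(starts_lower.tolist())
print(ends_upper.tolist())
print(first_letter.tolist())
print(vowels.tolist())
print(trimmed.tolist())
```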
Other methods
Method | Description |
---|---|
get() | Index each element |
slice() | Slice each element |
slice_replace() | Replace slice in each element with passed value |
cat() | Concatenate strings |
repeat() | Repeat values |
normalize() | Return Unicode form of string |
pad() | Add whitespace to left, right, or both sides of strings |
wrap() | Split long strings into lines with length less than a given width |
join() | Join strings in each element of the Series with passed separator |
get_dummies() | Extract dummy variables as a DataFrame |
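A brief sketch of a few of these methods. The names Series is the one used earlier; the pipe-delimited codes Series is hypothetical data invented to show get_dummies().

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', 'MARY', 'gUIDO'])

first_char = names.str.get(0)         # same result as names.str[0]
first_three = names.str.slice(0, 3)   # same result as names.str[0:3]
repeated = names.str.repeat(2)        # each string doubled
joined = names.str.cat(sep=', ')      # a single string for the whole Series

# Hypothetical data: each row holds one or more codes separated by '|'
codes = pd.Series(['A|B', 'B', 'A|C'])
dummies = codes.str.get_dummies(sep='|')  # one indicator column per code

print(first_char.tolist())
print(first_three.tolist())
print(joined)
print(dummies)
```

get_dummies() is handy when one column packs several categorical flags into a single delimited string.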
Reading data from other sources
The most common formats are JSON and XML.
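from io import StringIO
with open('recipeitems-latest.json') as f:
    line = f.readline()  # read only the first line of the file
# Wrap the text in StringIO: newer pandas versions deprecate passing a raw JSON string
pd.read_json(StringIO(line)).shape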
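Reading a single line suggests that the file stores one JSON object per line. Under that assumption, the whole file could be loaded in one call with lines=True. The sketch below uses a small in-memory string as a stand-in for the file; its field names and values are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a file with one JSON object per line
raw = (
    '{"name": "soup", "minutes": 30}\n'
    '{"name": "bread", "minutes": 60}\n'
)

# lines=True parses each line as a separate record (one row per line)
recipes = pd.read_json(StringIO(raw), lines=True)

print(recipes.shape)
print(recipes)
```

For a real file, the path can be passed in place of the StringIO object.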