网上搜索到的数据,拿来当做pandas学习的练习材料,数据如下:
data = """Edith Mcbride ;,;$1.21 ;,; white ;,;
09/15/17 ,Herbert Tran ;,; $7.29;,;
white&blue;,; 09/15/17 ,Paul Clarke ;,;$12.52
;,; white&blue ;,; 09/15/17 ,Lucille Caldwell
;,; $5.13 ;,; white ;,; 09/15/17,
Eduardo George ;,;$20.39;,; white&yellow
;,;09/15/17 , Danny Mclaughlin;,;$30.82;,;
purple ;,;09/15/17 ,Stacy Vargas;,; $1.85 ;,;
purple&yellow ;,;09/15/17, Shaun Brock;,;
$17.98;,;purple&yellow ;,; 09/15/17 ,
Erick Harper ;,;$17.41;,; blue ;,; 09/15/17,
Michelle Howell ;,;$28.59;,; blue;,; 09/15/17 ,
Carroll Boyd;,; $14.51;,; purple&blue ;,;
09/15/17 , Teresa Carter ;,; $19.64 ;,;
white;,;09/15/17 , Jacob Kennedy ;,; $11.40
;,; white&red ;,; 09/15/17, Craig Chambers;,;
$8.79 ;,; white&blue&red ;,;09/15/17 , Peggy Bell;,; $8.65 ;,;blue ;,; 09/15/17, Kenneth Cunningham ;,; $10.53;,; green&blue ;,;
09/15/17 , Marvin Morgan;,; $16.49;,;
green&blue&red ;,; 09/15/17 ,Marjorie Russell
;,; $6.55 ;,; green&blue&red;,; 09/15/17 ,
Israel Cummings;,; $11.86 ;,;black;,;
09/15/17, June Doyle ;,; $22.29 ;,;
black&yellow ;,;09/15/17 , Jaime Buchanan ;,;
$8.35;,; white&black&yellow ;,; 09/15/17,
Rhonda Farmer;,;$2.91 ;,; white&black&yellow
;,;09/15/17, Darren Mckenzie ;,;$22.94;,;green
;,;09/15/17,Rufus Malone;,;$4.70 ;,; green&yellow
;,; 09/15/17 ,Hubert Miles;,; $3.59
;,;green&yellow&blue;,; 09/15/17 , Joseph Bridges ;,;$5.66 ;,; green&yellow&purple&blue
;,; 09/15/17 , Sergio Murphy ;,;$17.51 ;,;
black ;,; 09/15/17 , Audrey Ferguson ;,;
$5.54;,;black&blue ;,;09/15/17 ,Edna Williams ;,;
$17.13;,; black&blue;,; 09/15/17, Randy Fleming;,; $21.13 ;,;black ;,;09/15/17 ,Elisa Hart;,; $0.35 ;,; black&purple;,; 09/15/17 ,
Ernesto Hunt ;,; $13.91 ;,; black&purple ;,;
09/15/17, Shannon Chavez ;,;$19.26 ;,;
yellow;,; 09/15/17 , Sammy Cain;,; $5.45;,;
yellow&red ;,;09/15/17 , Steven Reeves ;,;$5.50
;,; yellow;,; 09/15/17, Ruben Jones ;,;
$14.56 ;,; yellow&blue;,;09/15/17 , Essie Hansen;,; $7.33 ;,; yellow&blue&red
;,; 09/15/17 , Rene Hardy ;,; $20.22 ;,;
black ;,; 09/15/17 , Lucy Snyder ;,; $8.67
;,;black&red ;,; 09/15/17 ,Dallas Obrien ;,;
$8.31;,; black&red ;,; 09/15/17, Stacey Payne
;,; $15.70 ;,; white&black&red ;,;09/15/17
, Tanya Cox ;,; $6.74 ;,;yellow ;,;
09/15/17 , Melody Moran ;,; $30.84
;,;yellow&black;,; 09/15/17 , Louise Becker ;,;
$12.31 ;,; green&yellow&black;,; 09/15/17 ,
Ryan Webster;,;$2.94 ;,; yellow ;,; 09/15/17
,Justin Blake ;,; $22.46 ;,;white&yellow ;,;
09/15/17, Beverly Baldwin ;,; $6.60;,;
white&yellow&black ;,;09/15/17 , Dale Brady
;,; $6.27 ;,; yellow ;,;09/15/17 ,Guadalupe Potter ;,;$21.12 ;,; yellow;,; 09/15/17 ,
Desiree Butler ;,;$2.10 ;,;white;,; 09/15/17
,Sonja Barnett ;,; $14.22 ;,;white&black;,;
09/15/17, Angelica Garza;,;$11.60;,;white&black
;,; 09/15/17 , Jamie Welch ;,; $25.27 ;,;
white&black&red ;,;09/15/17 , Rex Hudson
;,;$8.26;,; purple;,; 09/15/17 , Nadine Gibbs
;,; $30.80 ;,; purple&yellow ;,; 09/15/17 ,
Hannah Pratt;,; $22.61 ;,; purple&yellow
;,;09/15/17,Gayle Richards;,;$22.19 ;,;
green&purple&yellow ;,;09/15/17 ,Stanley Holland
;,; $7.47 ;,; red ;,; 09/15/17 , Anna Dean;,;$5.49 ;,; yellow&red ;,; 09/15/17 ,
Terrance Saunders ;,; $23.70 ;,;green&yellow&red
;,; 09/15/17 , Brandi Zimmerman ;,; $26.66 ;,;
red ;,;09/15/17 ,Guadalupe Freeman ;,; $25.95;,;
green&red ;,; 09/15/17 ,Irving Patterson
;,;$19.55 ;,; green&white&red ;,; 09/15/17 ,Karl Ross;,; $15.68;,; white ;,; 09/15/17 , Brandy Cortez ;,;$23.57;,; white&red ;,;09/15/17,
Mamie Riley ;,;$29.32;,; purple;,;09/15/17 ,Mike Thornton ;,; $26.44 ;,; purple ;,; 09/15/17,
Jamie Vaughn ;,; $17.24;,;green ;,; 09/15/17 ,
Noah Day ;,; $8.49 ;,;green ;,;09/15/17
,Josephine Keller ;,;$13.10 ;,;green;,; 09/15/17 , Tracey Wolfe;,;$20.39 ;,; red ;,; 09/15/17 ,
Ignacio Parks;,;$14.70 ;,; white&red ;,;09/15/17
, Beatrice Newman ;,;$22.45 ;,;white&purple&red
;,; 09/15/17, Andre Norris ;,; $28.46 ;,;
red;,; 09/15/17 , Albert Lewis ;,; $23.89;,;
black&red;,; 09/15/17, Javier Bailey ;,;
$24.49 ;,; black&red ;,; 09/15/17 , Everett Lyons ;,;$1.81;,; black&red ;,; 09/15/17 ,
Abraham Maxwell;,; $6.81 ;,;green;,; 09/15/17
, Traci Craig ;,;$0.65;,; green&yellow;,;
09/15/17 , Jeffrey Jenkins ;,;$26.45;,;
green&yellow&blue ;,; 09/15/17, Merle Wilson
;,; $7.69 ;,; purple;,; 09/15/17,Janis Franklin
;,;$8.74 ;,; purple&black ;,;09/15/17 ,
Leonard Guerrero ;,; $1.86 ;,;yellow
;,;09/15/17,Lana Sanchez;,;$14.75 ;,; yellow;,;
09/15/17 ,Donna Ball ;,; $28.10 ;,;
yellow&blue;,; 09/15/17 , Terrell Barber ;,;
$9.91 ;,; green ;,;09/15/17 ,Jody Flores;,;
$16.34 ;,; green ;,; 09/15/17, Daryl Herrera
;,;$27.57;,; white;,; 09/15/17 , Miguel Mcguire;,;$5.25;,; white&blue ;,; 09/15/17 ,
Rogelio Gonzalez;,; $9.51;,; white&black&blue
;,; 09/15/17 , Lora Hammond ;,;$20.56 ;,;
green;,; 09/15/17,Owen Ward;,; $21.64 ;,;
green&yellow;,;09/15/17,Malcolm Morales ;,;
$24.99 ;,; green&yellow&black;,; 09/15/17 ,
Eric Mcdaniel ;,;$29.70;,; green ;,; 09/15/17
,Madeline Estrada;,; $15.52;,;green;,; 09/15/17
, Leticia Manning;,;$15.70 ;,; green&purple;,;
09/15/17 , Mario Wallace ;,; $12.36 ;,;green ;,;
09/15/17,Lewis Glover;,; $13.66 ;,;
green&white;,;09/15/17, Gail Phelps ;,;$30.52
;,; green&white&blue ;,; 09/15/17 , Myrtle Morris
;,; $22.66 ;,; green&white&blue;,;09/15/17"""
先将数据中的空格、换行符、$符号等无意义的字符处理掉(名字中间的空格需要保留)
import re
data=re.sub(';,;',';',data)
data=re.sub('\n|\$','',data)
data=re.sub('\s(?![A-Z])','',data)
data=data.split(',')
结果如下:
image.png
现在需要将每条数据转换成list,之后放入dataframe中,方便进行后续处理
import pandas
data2=[]
from pandas import Series,DataFrame,concat
for data1 in data:
data1=data1.split(';')
data2.append(data1)
data3=DataFrame(data2,columns=['name','amount','allcolor','date'])
image.png
现在观察一下数据,如果将所有颜色当做列名,然后用0和1表示每一条数据中是否有这个颜色是不是更好呢?
color=data3['allcolor']
arr=[]
for item in color:
item2=item.split("&")
arr=arr+item2
arr=list(set(arr))
arr2=DataFrame(columns=arr)
i=0
for co in color:
arr3=list()
for co2 in arr:
if re.search(co2,co):
res='1'
else:
res='0'
arr3.append(res)
arr2.loc[i]=arr3
i=i+1
print(arr2)
image.png
最后,和原表join一下,得到最终结果
data3=data3.iloc[:,[0,1,3]].join(arr2)
image.png
对pandas不是很熟练,中间肯定有更简单的实现方法,后面学习更深入之后再过来改吧
另外,处理空格的正则表达式也有问题,有的空格删除不了,写成(?<![a-z])\s(?![A-Z])会导致有的颜色前面有空格,暂时没有找到问题产生的原因