一、pandas读取csv文件
import pandas as pd
f = open("C:\\Users\\matrix\\lz_business_advisor_dashboard_key_metrics_source.csv", encoding = 'utf-8')
pd.read_csv(f)
参考
【1】详解pandas的read_csv方法:https://blog.csdn.net/weixin_37706204/article/details/120827141
二、pandas读取excel文件
import pandas as pd
data = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, index_col = 0, nrows = 3)
data
其中,
sheet_name:第几个sheet页,是从0开始的
header:第几行是header
index_col :其实列
nrows:取多少行
三、pandas在指定列添加一列
import pandas as pd
df = pd.read_excel("C:\\Users\\matrix\\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df
四、pandas写入excel文件
import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df.to_excel(r"C:\Users\matrix\test.xlsx")
五、pandas写入数据库
import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# create conn
conn = create_engine('mysql+pymysql://root:123456@localhost:3306/test', encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace')
print(insert_rows)
# df.to_excel(r"C:\Users\matrix\test.xlsx")
参考:pandas 写入mysql数据库.to_sql方法详解
六、pandas处理表头
import pandas as pd
import pymysql
from sqlalchemy import create_engine
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
df = pd.DataFrame(df, columns = ['date_of_data','Date'])
df.columns = (['date_of_data', 'date'])
df
七、pandas正则
replace方法不支持正则,需要用sub方法
demo: 将所有表头除了数字、字母和下划线以外的字符全部替换为下划线,并且全部字符小写
import pandas as pd
import re
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
# field = field.replace(' ', '')
# field = regex1.sub('_', field)
# transform except 'A-Za-z0-9' char to '_'
field = re.sub(r'[^A-Za-z0-9]', r'_', field)
# transform multiple '_' char to '_'
field = re.sub(r'_+', r'_', field)
# lower all char
field = field.lower()
# field = regex2.sub('#', field)
field_list.append(field)
df.columns = tuple(field_list)
df
八、pandas将所有类改为str类型
import pandas as pd
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df[:] = df[:].astype(str)
df.dtypes
九、pandas写入数据库列类型
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
dtype={col_name: VARCHAR(500) for col_name in df}
)
import pandas as pd
import pymysql
import sqlalchemy
from sqlalchemy import create_engine
import re
from sqlalchemy.types import VARCHAR
df = pd.read_excel(r"C:\Users\matrix\Business Advisor - Dashboard - Key Metrics_2022-08-26_2022-08-26(1).xls", sheet_name = 0, header = 5, nrows = 3)
df.insert(loc = 0, column = "date_of_data", value = "2022-09-01")
# df = pd.DataFrame(df, columns = ['date_of_data','Date'])
field_list = []
# regex1 = re.compile(r'[^A-Za-z0-9]')
# regex2 = re.compile(r'_+')
for field in df.columns:
# field = field.replace(' ', '')
# field = regex1.sub('_', field)
# transform except 'A-Za-z0-9' char to '_'
field = re.sub(r'[^A-Za-z0-9]', r'_', field)
# transform multiple '_' char to '_'
field = re.sub(r'_+', r'_', field)
# lower all char
field = field.lower()
# field = regex2.sub('#', field)
field_list.append(field)
df.columns = tuple(field_list)
# transform all columns to str type, str map to db text
# df[:] = df[:].astype(str)
# df.dtypes
# df[field_list]
# create conn
conn_string = 'mysql+pymysql://root:123456@localhost:3306/test'
conn = create_engine(conn_string, encoding = 'utf8')
# write db
insert_rows = pd.io.sql.to_sql(df, "test_table", conn, if_exists = 'replace', index=False,
dtype={col_name: VARCHAR(500) for col_name in df}
)
print(insert_rows)
参考:
[1] Pandas to_sql将列类型从varchar更改为text
[2] pandas to_sql all columns as nvarchar (可行方案)