Question:
A retail company “ABC Private Limited” wants to understand customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared a purchase summary of various customers for selected high-volume products from last month.
The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category) and the total purchase_amount from last month.
Now they want to build a model to predict customers' purchase amounts for various products, which will help them create personalized offers for customers on different products.
Explanation:
Variable | Definition |
---|---|
User_ID | User ID |
Product_ID | Product ID |
Gender | Sex of User |
Age | Age in bins |
Occupation | Occupation (Masked) |
City_Category | Category of the City (A,B,C) |
Stay_In_Current_City_Years | Number of years of stay in the current city |
Marital_Status | Marital Status |
Product_Category_1 | Product Category (Masked) |
Product_Category_2 | Product may also belong to another category (Masked) |
Product_Category_3 | Product may also belong to another category (Masked) |
Purchase | Purchase Amount (Target Variable) |
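The table describes Occupation and the product-category columns as masked codes, i.e. labels rather than quantities. Feeding such codes to a linear model as plain numbers makes it assume that, say, occupation 16 means "more" than occupation 10. A common alternative is one-hot encoding with `pandas.get_dummies`, sketched below on made-up rows. The sketch also assumes that a product with no secondary category has an empty `Product_Category_2`/`Product_Category_3` cell (read as NaN) and fills it with 0; both the rows and that convention are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical rows that only mimic the masked columns in the table above
df = pd.DataFrame({
    'Occupation': [10, 16, 10, 7],
    'Product_Category_1': [3, 1, 12, 8],
    'Product_Category_2': [None, 6.0, None, 14.0],
    'Product_Category_3': [None, 14.0, None, None],
})

# Assumption: a missing secondary category means "no other category", encoded as 0
secondary = ['Product_Category_2', 'Product_Category_3']
df[secondary] = df[secondary].fillna(0).astype(int)

# Masked codes are labels, not quantities: one 0/1 column per observed code
encoded = pd.get_dummies(df, columns=['Occupation', 'Product_Category_1'], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

One-hot encoding lets the regression learn a separate offset per occupation or category instead of a single slope across arbitrary code values, at the cost of more columns.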
Your model's performance will be evaluated on its predictions of the purchase amount for the test data (test.csv), which contains the same fields as the training data except for the purchase amount.
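Because test.csv has no Purchase column, one way to estimate the score before submitting is to hold out part of the labelled training data. The sketch below does this on synthetic data standing in for the encoded features, and reports RMSE. The problem statement does not name the evaluation metric, so RMSE is an assumption here, as are the random features and coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 7 encoded features and a Purchase-like target (made-up values)
rng = np.random.default_rng(0)
X = rng.integers(0, 7, size=(1000, 7)).astype(float)
y = 5000 + X @ rng.uniform(-300, 300, size=7) + rng.normal(0, 500, size=1000)

# Hold out 20% of the labelled rows to play the role of test.csv
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Assumption: RMSE as the metric; the problem statement does not name one
rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
print(f'Validation RMSE: {rmse:.1f}')
```

On the real data, the same split would be applied to the encoded train.csv features before the final fit on all rows.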
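Approach: first read the training data with pandas, then convert the categorical columns to numbers and normalize them, then fit a model with scikit-learn's linear regression, then load the test data and normalize it with the scaler fitted on the training data, and finally predict Purchase and write the results to a file.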
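The normalize-then-regress steps above can also be chained with scikit-learn's `Pipeline`, which guarantees that the scaler is fitted on the training data only and merely applied to the test data. This is a minimal sketch on a few made-up, already-encoded rows, not the author's code; the answer below performs the same steps by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical encoded features (e.g. Gender, Age bin, City_Category) and purchases
X_train = np.array([[0, 1, 0], [1, 2, 1], [1, 4, 2], [0, 6, 1], [1, 3, 0]], dtype=float)
y_train = np.array([8000, 9500, 12000, 11000, 9000], dtype=float)
X_test = np.array([[0, 2, 2], [1, 5, 1]], dtype=float)

# Normalize to [0, 1], then fit a linear regression; fit() sees only X_train
pipeline = make_pipeline(MinMaxScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# predict() applies the scaler's training min/max to X_test, then the regression
y_predicted = pipeline.predict(X_test)
print(y_predicted)
```

Keeping both steps in one object removes the chance of accidentally re-fitting the scaler on test data.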
Answer:
```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Data conversion: map text-valued columns to integer codes
gender_number = {'F': 0, 'M': 1}
age_number = {'0-17': 0, '18-25': 1, '26-35': 2, '36-45': 3, '46-50': 4, '51-55': 5, '55+': 6}
city_category_number = {'A': 0, 'B': 1, 'C': 2}
stay_in_current_city_years_number = {'0': 0, '1': 1, '2': 2, '3': 3, '4+': 4}

features = ['Gender', 'Age', 'City_Category', 'Occupation',
            'Stay_In_Current_City_Years', 'Marital_Status', 'Product_Category_1']

def encode(frame):
    """Apply the same conversion to train and test data and return the feature columns."""
    frame = frame.copy()
    frame['Gender'] = frame['Gender'].map(gender_number)
    frame['Age'] = frame['Age'].map(age_number)
    frame['City_Category'] = frame['City_Category'].map(city_category_number)
    # astype(str) so the keys match even if pandas parsed the column as numbers
    frame['Stay_In_Current_City_Years'] = (
        frame['Stay_In_Current_City_Years'].astype(str).map(stay_in_current_city_years_number))
    return frame[features]

# Training data
df = pd.read_csv('train.csv')
x = encode(df)
y = df['Purchase']

scaler = MinMaxScaler()             # normalization: scale each feature to [0, 1]
x_scaler = scaler.fit_transform(x)  # learn min/max from the training data only

model = LinearRegression()
model.fit(x_scaler, y)
print('R^2 on training data:', model.score(x_scaler, y))
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)

# Test data: reuse the training scaler (transform, not fit_transform)
df_test = pd.read_csv('test.csv')
x_test_scaler = scaler.transform(encode(df_test))
y_predicted = model.predict(x_test_scaler)

df_result = pd.DataFrame({'User_ID': df_test['User_ID'],
                          'Product_ID': df_test['Product_ID'],
                          'Purchase': y_predicted})
print(df_result)
df_result.to_csv('result.csv', index=False)  # no extra index column in the submission
```
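Why the test features should go through `transform` rather than `fit_transform`: `MinMaxScaler` learns each column's minimum and maximum during fitting, so re-fitting on the test data alone rescales the same raw value differently from what the model saw in training. The toy example below uses hypothetical Age codes (0 to 6) to show the shift.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical Age codes: training covers every bin, the test batch only bins 2..4
age_train = np.array([[0], [1], [2], [3], [4], [5], [6]], dtype=float)
age_test = np.array([[2], [3], [4]], dtype=float)

scaler = MinMaxScaler().fit(age_train)

right = scaler.transform(age_test)              # uses the training min/max
wrong = MinMaxScaler().fit_transform(age_test)  # re-fitted: min/max from test only

print(right.ravel())  # bin 2 -> 2/6, about 0.333
print(wrong.ravel())  # bin 2 -> 0.0, the value the youngest training bin had
```

Since the regression coefficients were learned on the training scale, the re-fitted values would be multiplied by coefficients that no longer match them.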
Link: Black Friday. As I already said, no amount of theory can beat practice. Here is a regression problem you can try your hand at for a deeper understanding.