Machine Learning: Linear Regression

Question:
A retail company, “ABC Private Limited”, wants to understand customer purchase behaviour (specifically, purchase amount) against various products of different categories. They have shared a purchase summary of various customers for selected high-volume products from last month.
The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category) and the total purchase_amount from last month.
Now they want to build a model that predicts the purchase amount of a customer against various products, which will help them create personalized offers for customers against different products.

Explanation:

Variable                    Definition
User_ID                     User ID
Product_ID                  Product ID
Gender                      Sex of user
Age                         Age in bins
Occupation                  Occupation (masked)
City_Category               Category of the city (A, B, C)
Stay_In_Current_City_Years  Number of years of stay in the current city
Marital_Status              Marital status
Product_Category_1          Product category (masked)
Product_Category_2          Product may also belong to another category (masked)
Product_Category_3          Product may also belong to another category (masked)
Purchase                    Purchase amount (target variable)

Your model performance will be evaluated on the basis of your prediction of the purchase amount for the test data (test.csv), which contains similar data-points as train except for their purchase amount.

Test_file
Train_file

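Approach: first read the training data with pandas, then encode the categorical columns as numbers and normalize the features. Next, fit a linear regression model with sklearn, load the test data and apply the same encoding and normalization, and finally predict Purchase and write the results to a file.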
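
The normalization step in this approach can be illustrated on a toy example. This is a minimal sketch with made-up numbers, assuming scikit-learn's MinMaxScaler as used in the answer; the array names are for illustration only. It shows that the scaler learns each column's minimum and maximum from the training data, and that the same fitted scaler can then be applied to test rows so both sets share one scale.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature values: column 0 ranges over 0..6, column 1 over 0..20.
x_train = np.array([[0.0, 0.0], [3.0, 10.0], [6.0, 20.0]])
x_test = np.array([[3.0, 5.0], [6.0, 10.0]])

scaler = MinMaxScaler()
# fit_transform learns each column's min and max from the training data
# and rescales every value to (x - min) / (max - min).
x_train_scaled = scaler.fit_transform(x_train)
# transform reuses the training min and max, so test rows land on the
# same scale the model was trained on.
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled)
print(x_test_scaled)
```

If the scaler were refit on the test rows instead, the value 10.0 in the second column would map to 1.0 rather than 0.5, which is a different scale from the one seen during training.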

Answer:

  import pandas as pd
  from sklearn.linear_model import LinearRegression
  import sklearn
  import sklearn.preprocessing

  df = pd.read_csv("train.csv")

  # Encode categorical values as integers
  gender_number = {'F': 0, 'M': 1}
  age_number = {'0-17': 0, '18-25': 1, '26-35': 2, '36-45': 3, '46-50': 4, '51-55': 5, '55+': 6}
  city_category_number = {'A': 0, 'B': 1, 'C': 2}
  # '4+' maps to 4 so that it stays distinct from '1'
  stay_in_current_city_years_number = {'0': 0, '1': 1, '2': 2, '3': 3, '4+': 4}
  df['Gender'] = df['Gender'].map(gender_number)
  df['Age'] = df['Age'].map(age_number)
  df['Stay_In_Current_City_Years'] = df['Stay_In_Current_City_Years'].map(stay_in_current_city_years_number)
  df['City_Category'] = df['City_Category'].map(city_category_number)

  x = df[['Gender','Age','City_Category','Occupation','Stay_In_Current_City_Years','Marital_Status','Product_Category_1']]
  scaler = sklearn.preprocessing.MinMaxScaler()  # min-max normalization: rescale each feature to [0, 1]
  x_scaler = scaler.fit_transform(x)
  y = df['Purchase']


  model = LinearRegression()
  model.fit(x_scaler,y)
  print('R^2 on training data:', model.score(x_scaler, y))

  print('Coefficient: \n',model.coef_)
  print('Intercept: \n',model.intercept_)

  df_test = pd.read_csv('test.csv')
  df_test['Gender'] = df_test['Gender'].map(gender_number)
  df_test['Age'] = df_test['Age'].map(age_number)
  df_test['Stay_In_Current_City_Years'] = df_test['Stay_In_Current_City_Years'].map(stay_in_current_city_years_number)
  df_test['City_Category'] = df_test['City_Category'].map(city_category_number)
  x_test = df_test[['Gender','Age','City_Category','Occupation','Stay_In_Current_City_Years','Marital_Status','Product_Category_1']]
  x_test_scaler = scaler.transform(x_test)  # reuse the scaler fitted on the training data; do not refit on test
  y_predicted = model.predict(x_test_scaler)

  df_result = pd.DataFrame({'User_ID':df_test['User_ID'],'Product_ID':df_test['Product_ID'],'Purchase':y_predicted})
  print(df_result)
  df_result.to_csv('result.csv', index=False)  # leave the DataFrame index out of the file
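
Because test.csv has no Purchase column, the script above can only report a score on the training data. One way to estimate how the model might do on unseen rows is to hold out part of the labeled data. The following is a minimal sketch on synthetic data, not part of the original answer: the names x_demo, y_demo and demo_model are hypothetical, and RMSE is an assumed metric, since the problem statement does not name one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for seven scaled features and a purchase amount.
rng = np.random.default_rng(0)
x_demo = rng.random((500, 7))
true_coef = np.array([500.0, 300.0, 200.0, 50.0, 20.0, -30.0, -6000.0])
y_demo = 9000.0 + x_demo @ true_coef + rng.normal(0.0, 100.0, size=500)

# Hold out 20% of the labeled rows to stand in for unseen data.
x_fit, x_val, y_fit, y_val = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

demo_model = LinearRegression().fit(x_fit, y_fit)
val_pred = demo_model.predict(x_val)

# RMSE is expressed in the same units as the target.
rmse = np.sqrt(mean_squared_error(y_val, val_pred))
print('Validation R^2:', demo_model.score(x_val, y_val))
print('Validation RMSE:', rmse)
```

With the real data, x_scaler and y from the script above would take the place of x_demo and y_demo.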

Link: Black Friday – Like I already said – No amount of theory can beat practice. Here is a regression problem that you can try your hands on for a deeper understanding.
