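1. Background: what is social computing?
Simply put, social computing uses crowds of people to solve problems that are hard for machines to solve (that is, hard to solve with an algorithm alone).
Social computing usually relies heavily on databases, which makes it feel a bit like big data work. It also involves training a machine, which makes it feel a bit like machine learning. How the code interacts with the database therefore becomes one of the measures of code quality.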
2. Today's feed
Social Computing: prediction, building a recommender system
2.1 Training set:
The training set is the data used to train the machine; the final ratings are predicted from it (set size: about 20,000,000).
2.2 Testing set:
The testing set contains user names and item IDs; the task is to predict each user's rating for the given item (set size: about 10,000,000).
3. Code realization
3.1 SQLite (before running the code)
From the command line (Terminal):
Build the database: sqlite3 dbname
Create the Training table: CREATE TABLE tablename(key1, key2, key3);
Import the CSV files into the Training and Testing tables: .mode csv, then .import filename.csv tablename
3.2 Java for SQLite
(The database and the original tables are assumed to have been built already in 3.1.)
3.3 Algorithm
There are two ways to make the prediction: item-based and user-based.
1. Item-based: compare item-to-item similarity
2. User-based: compare user-to-user similarity
E.g. (I have to complain: this editor won't even let me add a table, can you believe it?)
What we want to predict: user1-item3
Item-based: compare item3 with the other items and get the similarities
User-based: compare user1 with the other users and get the similarities
(Another complaint: there is no way to add formulas either, can you believe it?)
Similarity Function for item-based:
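The formula itself did not make it into the post. Reading it back from the final `sim` function later in this post, it appears to be the adjusted cosine similarity. The symbols are my own labels, not the author's: U is the set of users who rated both item i and item j, r_{u,i} is user u's rating of item i, and \bar{r}_u is user u's average rating.

```latex
\mathrm{sim}(i, j) =
  \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)\,(r_{u,j} - \bar{r}_u)}
       {\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_u)^2}\;
        \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_u)^2}}
```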
Prediction for item-based:
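The prediction formula is also missing from the post, and the code shown here does not include it. A commonly used form for item-based prediction is the weighted sum: the user's own ratings, each weighted by the similarity between the rated item and the target item, divided by the sum of the absolute similarities. The sketch below assumes that form; the class name `ItemBasedPredictor`, the `fallback` parameter, and the map layout are hypothetical and may differ from what the author actually used.

```java
import java.util.Map;

// Hypothetical helper, not from the original post.
class ItemBasedPredictor {
    /**
     * Weighted-sum prediction (an assumed, commonly used form).
     * userRatings: item -> rating given by the user we predict for.
     * sims:        item -> similarity between that item and the target item.
     * fallback:    value returned when no rated item has a usable similarity.
     */
    static double predict(Map<Integer, Float> userRatings,
                          Map<Integer, Double> sims,
                          double fallback) {
        double num = 0;
        double den = 0;
        for (Map.Entry<Integer, Float> e : userRatings.entrySet()) {
            Double s = sims.get(e.getKey());
            if (s == null) continue;      // no similarity known for this item
            num += s * e.getValue();      // similarity-weighted rating
            den += Math.abs(s);           // normalise by total absolute weight
        }
        return den == 0 ? fallback : num / den;
    }
}
```

For example, if the user rated two items 4 and 2 and both are equally similar (0.5) to the target item, the prediction is the plain average, 3.0. The user's average rating is one reasonable choice for `fallback`.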
3.4 Code realization (in Java)
Overview:
Use HashMaps to store the data:
- for the training set, <user, <item, rating>> data
- for the testing set, <item, user> testdata
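A minimal sketch of the nested training-set map, together with the per-user average rating that the similarity code relies on. The class `RatingStore` and its methods are hypothetical names for illustration; only the <user, <item, rating>> layout comes from the post, and `Float` ratings are an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not from the original post.
class RatingStore {
    // user -> (item -> rating), the layout described above
    final Map<Integer, Map<Integer, Float>> data = new HashMap<>();

    // Record one (user, item, rating) row of the training set.
    void add(int user, int item, float rating) {
        data.computeIfAbsent(user, k -> new HashMap<>()).put(item, rating);
    }

    // Mean rating of every user, used later to centre the ratings.
    Map<Integer, Double> userAverages() {
        Map<Integer, Double> avg = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Float>> e : data.entrySet()) {
            double sum = 0;
            for (float r : e.getValue().values()) {
                sum += r;
            }
            avg.put(e.getKey(), sum / e.getValue().size());
        }
        return avg;
    }
}
```

Keying the outer map by user makes "did this user rate item X?" a single `containsKey` call, which is what the final version of the similarity function depends on.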
Algorithm realization:
The hardest part of realizing the algorithm was building the similarity matrix. In the similarity function, U is the set of users who rated both item1 and item2. My first idea was to iterate over the items first, which makes it hard to pick out the users who rated both items.
The function below can find the users who rated a given item, along with all their ratings for other items, but it is hard to filter out the users who rated both of two items together with their ratings for those two.
// First attempt: iterate over items, then over the users who rated each item.
public void createMatrix(){
    createMatrixTable(matrix_table);
    double sim = 0;
    double sumU = 0;
    float sumD1 = 1;
    float sumD2 = 1;
    try{
        SQLiteStatement simstatInsert = c.prepare("INSERT INTO " + matrix_table + " VALUES(?,?,?)");
        System.out.println("Start building similarity matrix...");
        c.exec("BEGIN");
        for(Integer item : testdata.keySet()){
            System.out.println(item);
            int simitem = item;
            if(!selectmap.containsKey(simitem)){
                System.out.println("No such item in data");
            }
            else{
                int item2 = simitem + 1; // declared here so it is still in scope after the loop
                for(Entry<Integer, Float> selectRatings : selectmap.get(item).entrySet()){ // item -> (user, rating)
                    System.out.println(selectRatings);
                    Integer user = selectRatings.getKey();   // user who rated this item
                    Float rating = selectRatings.getValue(); // rating for the item itself
                    item2 = simitem + 1;
                    // Search for the next item this user also rated.
                    // Weak point: item2 can differ from user to user, and the loop never ends if there is none.
                    while(!selectmap.containsKey(item2) || !selectmap.get(item2).containsKey(user)){
                        item2 = ++simitem;
                    }
                    float rating2 = selectmap.get(item2).get(user);
                    System.out.println("rating for next item " + item2 + " is " + rating2);
                    double averRating = useraverage.get(user);
                    System.out.println("average rating for this user is " + averRating);
                    sumU = (rating - averRating) * (rating2 - averRating) + sumU;
                    // squared deviations: Math.pow(base, exponent)
                    sumD1 = (float)Math.pow((rating - averRating), 2) + sumD1;
                    sumD2 = (float)Math.pow((rating2 - averRating), 2) + sumD2;
                    simstatInsert.bind(1, item);
                    simstatInsert.bind(2, item2);
                }
                sim = sumU / (Math.sqrt(sumD1) * Math.sqrt(sumD2));
                simstatInsert.bind(3, sim);
                System.out.println("the sim is: " + sim);
                simstatInsert.stepThrough();
                System.out.println("Processing items " + item + " and " + item2 + " done, and the sim is " + sim);
                simstatInsert.reset();
            }
            System.out.println("Processing item " + item + " done.");
        }
        c.exec("COMMIT");
        simstatInsert.dispose();
        System.out.println("Finish building similarity matrix.");
    }catch(SQLiteException e){
        e.printStackTrace();
    }
}
In the final version:
private double sim(int item1, int item2){
    double s1 = 0;
    double root1 = 0;
    double root2 = 0;
    for(int user : data.keySet()){
        HashMap<Integer, Float> udata = data.get(user); // this user's item -> rating map
        if(udata.containsKey(item1) && udata.containsKey(item2)){ // user is in U: rated both items
            double r1 = udata.get(item1);
            double r2 = udata.get(item2);
            double u_avg = useraverage.get(user);
            s1 += (r1 - u_avg) * (r2 - u_avg);
            root1 += Math.pow((r1 - u_avg), 2);
            root2 += Math.pow((r2 - u_avg), 2);
        }
    }
    // Returns NaN when the denominator is 0, e.g. when no user rated both items.
    return s1 / (Math.sqrt(root1) * Math.sqrt(root2));
}
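The loop now iterates over users instead of items, so checking whether a user rated both items is a simple lookup in that user's map.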
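To try the user-first similarity outside the full program, here is a self-contained sketch on a toy data set. It follows the same arithmetic as `sim` above, with two changes of my own that are not in the original: the maps are passed in as parameters, and a zero denominator returns 0 instead of NaN. The class name `AdjustedCosine` and the toy ratings are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone version, not the author's class.
class AdjustedCosine {
    // data: user -> (item -> rating); userAverage: user -> mean rating
    static double sim(Map<Integer, Map<Integer, Float>> data,
                      Map<Integer, Double> userAverage,
                      int item1, int item2) {
        double s1 = 0, root1 = 0, root2 = 0;
        for (Map.Entry<Integer, Map<Integer, Float>> e : data.entrySet()) {
            Float r1 = e.getValue().get(item1);
            Float r2 = e.getValue().get(item2);
            if (r1 == null || r2 == null) continue; // user did not rate both items
            double avg = userAverage.get(e.getKey());
            s1 += (r1 - avg) * (r2 - avg);
            root1 += (r1 - avg) * (r1 - avg);
            root2 += (r2 - avg) * (r2 - avg);
        }
        double denom = Math.sqrt(root1) * Math.sqrt(root2);
        return denom == 0 ? 0.0 : s1 / denom; // assumption: treat "undefined" as 0
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Float>> data = new HashMap<>();
        data.put(1, Map.of(1, 5f, 2, 3f, 3, 1f)); // user 1, average 3.0
        data.put(2, Map.of(1, 4f, 2, 2f));        // user 2, average 3.0
        Map<Integer, Double> avg = Map.of(1, 3.0, 2, 3.0);
        System.out.println(sim(data, avg, 1, 3)); // prints -1.0
        System.out.println(sim(data, avg, 2, 3)); // prints 0.0 (zero denominator)
    }
}
```

Only user 1 rated both item 1 and item 3, and the two ratings sit on opposite sides of that user's average, hence the similarity of -1.0.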