Slides for 12.20 Presentation


Page 1

Hello, everyone, this is Setsu. In this video, I will mainly talk about the architecture of my proposed method.

Here is the outline of this video. It contains three parts. The first is the OneMax problem for genetic algorithms, which is a very simple use case of a genetic algorithm. The second is the architecture of my proposed method; in this part, I will talk about the parallel strategy and some program flow charts of my proposed method. The last part is future work.

To begin, I will briefly introduce the OneMax problem for genetic algorithms.
The goal of the OneMax problem is to find the maximal individual, which consists entirely of 1s, starting from some initial individuals that are each made up of a series of 0s and 1s.
Let us look at the whole process. First, there is an initial population with a number of individuals, and the fitness of each individual is the number of 1s it contains. After some genetic operations such as crossover, mutation, and selection, there is a new generation of the population. The genetic operations are then repeated until the algorithm finds the maximal individual, which is all 1s.
This simple example is often used to test the efficiency of a genetic algorithm, so I tried running it on Spark to test its efficiency and performance.
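To make the process just described concrete, here is a minimal single-machine sketch of OneMax in plain Python. It is not the code I found on the Internet or the Spark version; the function names, the tournament selection, the one-point crossover, the elitism step, and all parameter values are assumptions chosen only for illustration.

```python
import random

def fitness(individual):
    """OneMax fitness: the number of 1 bits in the individual."""
    return sum(individual)

def tournament_select(population, rng, k=3):
    """Pick the fittest of k randomly chosen individuals (assumed selection scheme)."""
    return max(rng.sample(population, k), key=fitness)

def crossover(parent_a, parent_b, rng):
    """One-point crossover producing a single child."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rng, rate=0.01):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]

def run_onemax(length=20, pop_size=30, max_generations=200, seed=0):
    """Evolve until an all-ones individual appears or the generation limit is hit."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == length:
            return best, generation
        # Elitism: carry the current best individual over unchanged.
        next_population = [best]
        while len(next_population) < pop_size:
            child = crossover(tournament_select(population, rng),
                              tournament_select(population, rng), rng)
            next_population.append(mutate(child, rng))
        population = next_population
    return max(population, key=fitness), max_generations
```

Calling `run_onemax()` returns the best individual found and the generation at which the search stopped. Whether it reaches the all-ones individual within the limit depends on the random seed and the parameters.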

I found some OneMax code on the Internet and modified it to run on Spark.
Then I ran the modified demo on the Spark cluster.
However, I have only run it successfully in local mode, which means running it on just one machine. When I run it in cluster mode, it hits some connection timeout bugs, so I am still debugging and tuning the demo, and I will summarize the tuning and debugging experience later.

The next part is the architecture of my proposed method. First, let's review the previous material.
I am focusing on the WITF model, which stands for Weighted Irregular Tensor Factorization.
The model uses cross-domain data to make recommendations. It treats the cross-domain data as an irregular tensor and then performs tensor factorization. However, tensor factorization requires a regular tensor, so the irregular tensor must first be transformed into a regular one, and the most important point is to minimize the loss incurred by that transformation. Therefore, the model needs to find an optimal configuration of weights over domains, wk, to minimize the loss.
My proposed method is to find the optimal weight configuration with a genetic algorithm instead of the empirical strategy the model currently uses.

The parallel strategy of my proposed method is based on a paper published in 2017. This paper uses a genetic algorithm on Spark to find optimal test cases.
The paper proposes a two-phase parallelization, which consists of parallel fitness evaluation and parallel genetic operations throughout the whole process.
In parallel fitness evaluation, each individual's fitness value is computed in parallel. In parallel genetic operations, crossover, mutation, and selection are each performed in parallel.
According to the paper, using this two-phase parallel strategy on Spark gives a significant speedup.
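The two phases can be sketched roughly as follows. This is not the paper's implementation and it does not use Spark: a thread pool from the Python standard library stands in for the cluster so the sketch runs on one machine, and in Spark each `pool.map` call would correspond to a `map` over an RDD of individuals. The OneMax fitness, the helper names, the tournament size, and the mutation rate are all assumptions for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    """Stand-in fitness (OneMax): the number of 1 bits."""
    return sum(individual)

def make_child(task):
    """Genetic operations for one offspring: selection, crossover, mutation."""
    scored_population, seed = task
    rng = random.Random(seed)  # per-task RNG so parallel tasks share no state

    def select():
        # Tournament of size 3 over (individual, score) pairs.
        return max(rng.sample(scored_population, 3), key=lambda pair: pair[1])[0]

    parent_a, parent_b = select(), select()
    point = rng.randint(1, len(parent_a) - 1)
    child = parent_a[:point] + parent_b[point:]
    return [1 - bit if rng.random() < 0.05 else bit for bit in child]

def next_generation(population, generation, pool):
    """Run one generation; returns the new population and the best old score."""
    # Phase 1: parallel fitness evaluation, one task per individual.
    scores = list(pool.map(fitness, population))
    scored = list(zip(population, scores))
    # Phase 2: parallel genetic operations, one task per offspring.
    tasks = [(scored, generation * len(population) + i)
             for i in range(len(population))]
    return list(pool.map(make_child, tasks)), max(scores)
```

A driver loop would create the pool once with `ThreadPoolExecutor(max_workers=4)` and call `next_generation` repeatedly, passing the generation index so that each offspring task gets its own seed.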

Next is the genetic algorithm for my proposed method. In detail, each individual is one possible configuration of weights over domains.
The genetic operations can be executed in parallel on Spark. The fitness evaluation part is the WITF model, which uses large datasets and adopts a parallel strategy internally.
After iterating for a number of generations, it should find a better configuration of weights over domains.
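As a rough illustration of an individual being a weight configuration, here is a small real-valued genetic algorithm in plain Python. The function `stand_in_rmse` is only a placeholder for a full WITF run and is not the real model. The number of domains, the weight range of 0 to 1, the made-up target, the uniform crossover, and the Gaussian mutation are likewise assumptions for this sketch.

```python
import random

def stand_in_rmse(weights):
    """Placeholder for a WITF train-and-evaluate run (NOT the real model).

    Returns a fake error that is lowest at a made-up target configuration.
    """
    target = [0.2, 0.5, 0.3]
    return (sum((w - t) ** 2 for w, t in zip(weights, target)) / len(target)) ** 0.5

def evolve_weights(num_domains=3, pop_size=20, generations=40, seed=0,
                   evaluate=stand_in_rmse):
    """Search for a weight configuration with a low error under `evaluate`."""
    rng = random.Random(seed)
    # Each individual is one candidate configuration: one weight per domain.
    population = [[rng.random() for _ in range(num_domains)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate)  # lower error is fitter
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism keeps the best configuration so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each weight from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clamped to the assumed range [0, 1].
            child = [min(1.0, max(0.0, w + rng.gauss(0, 0.05))) for w in child]
            children.append(child)
        population = children
    best = min(population, key=evaluate)
    return best, evaluate(best)
```

Because the best configuration is always carried over, the returned error can never be worse than the best error in the initial population. In the real setting each call to `evaluate` would be an expensive WITF run, so the population size and generation count would need to be chosen with that cost in mind.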

Next are the details of the WITF model.
The model uses cross-domain data to compute user vectors, domain vectors, and virtual item vectors.
During the process, some vectors can be computed in parallel. For example, each user's vector can be updated in parallel because it is conditionally independent of the other users' vectors. The domain vectors and the constrict vectors are in a similar situation to the user vectors, so they can all be updated in parallel.
After obtaining those vectors, the model uses the common RMSE measure to compute the accuracy, and the accuracy then serves as the fitness value.
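For reference, RMSE (root mean squared error) is the square root of the mean squared difference between predicted and actual ratings. Below is a minimal sketch in plain Python. The function name and the list-based inputs are assumptions for illustration and are not taken from the WITF code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Since a lower RMSE means better accuracy, a genetic algorithm that maximizes fitness could use the negated RMSE as the fitness value. That is one possible convention, not something the model prescribes.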

Combining WITF and the genetic algorithm on Spark using the two-phase parallelization raises a problem, namely nested Spark RDDs.
The ideal situation is to treat each individual as a Spark RDD element and then execute each individual's fitness evaluation in parallel on Spark, but the fitness function, the WITF model, also uses Spark RDDs internally.
This results in nested Spark RDDs, which Spark does not support, so I have to find an alternative: evaluating fitness sequentially, not in parallel. The efficiency then depends on the speed of WITF on Spark, and this needs further consideration.
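The shape of this alternative can be sketched as below: the outer loop over individuals runs sequentially on the driver, and only the work inside one fitness evaluation is parallel. Again, a standard-library thread pool stands in for Spark so the sketch runs on one machine. The function `parallel_fitness`, its made-up error term, and the `(domain, rating)` record layout are assumptions and do not reflect the real WITF computation.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_fitness(weights, records, pool):
    """Stand-in for one WITF evaluation with internal parallelism.

    The per-record work is distributed (in Spark this would be an RDD
    operation) and then reduced to a single RMSE-like number.
    """
    def squared_error(record):
        domain, rating = record
        return (weights[domain] - rating) ** 2  # made-up error term

    total = sum(pool.map(squared_error, records))
    return (total / len(records)) ** 0.5

def evaluate_population(population, records, pool):
    """Evaluate individuals one after another; no nested parallel collections."""
    # Sequential outer loop on the driver; parallelism only inside each call.
    return [parallel_fitness(individual, records, pool)
            for individual in population]
```

With this layout the time for one generation is roughly the population size multiplied by the time of a single evaluation, which is why the speed of WITF on Spark becomes the deciding factor.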

Summary

Last edited: 2017.12.19 03:46:20