Slides for 12.20 Presentation
Page 1
- Hello, everyone, this is Setsu. In this video, I will mainly talk about the architecture of my proposed method.
Page 2
- Here is the outline of this video. It contains three parts. The first is the OneMax problem solved with a genetic algorithm, which is a very simple use case of genetic algorithms. The second part is the architecture of my proposed method; in this part, I will talk about the parallel strategy and some program flowcharts of my proposed method. The last part is future work.
Page 3
- To begin, I will briefly introduce the OneMax problem solved with a genetic algorithm.
- The goal of the OneMax problem is to find the maximum individual, which consists entirely of 1s, starting from initial individuals that are strings of 0s and 1s.
- Let us look at the whole process. First, there is an initial population with a number of individuals, and the fitness of each individual is the number of 1s it contains. After genetic operations such as crossover, mutation, and selection, a new generation is produced, and the genetic operations are repeated on each new generation until the algorithm finds the maximum individual, which is all 1s.
- This simple example is often used to test the efficiency of a genetic algorithm, so I tried running this genetic algorithm on Spark to test its efficiency and performance.
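The OneMax loop described above can be sketched in a few lines of plain Python. This is a minimal single-machine sketch, not the code I found online or its Spark version; the operator choices (one-point crossover, bit-flip mutation, binary tournament selection, keeping the best individual each generation) and all parameter values are assumptions for illustration.

```python
import random


def fitness(individual):
    # OneMax fitness: the number of 1s in the bit string
    return sum(individual)


def crossover(a, b, rng):
    # One-point crossover: swap the tails after a random cut point
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(individual, rate, rng):
    # Flip each bit independently with probability `rate`
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def select(population, rng, k=2):
    # Tournament selection: the fittest of k randomly drawn individuals
    return max(rng.sample(population, k), key=fitness)


def onemax(length=20, pop_size=30, rate=0.05, max_generations=500, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == length:  # all 1s: the max individual is found
            return generation, best
        next_population = [best]  # elitism: carry the current best forward
        while len(next_population) < pop_size:
            child1, child2 = crossover(select(population, rng), select(population, rng), rng)
            next_population += [mutate(child1, rate, rng), mutate(child2, rate, rng)]
        population = next_population[:pop_size]
    return max_generations, max(population, key=fitness)


generation, best = onemax()
print(generation, best)
```

The function returns the generation at which the all-1s individual appeared (or `max_generations` if it never did) together with the best individual found.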
Page 4
- I found OneMax code on the Internet and adapted it to run on Spark.
- Then I ran the modified demo on the Spark cluster.
- However, I have only run it successfully in local mode, which means running it on a single machine. When I run it in cluster mode, it fails with connection timeout errors, so I am still debugging and tuning the demo, and I will summarize the tuning and debugging experience later.
Page 5
- The next part is the architecture of my proposed method. First, let's review the previous work.
- I'm focusing on the WITF model, which stands for Weighted Irregular Tensor Factorization.
- The model uses cross-domain data for recommendation. It treats the cross-domain data as an irregular tensor and then performs tensor factorization. However, tensor factorization requires a regular tensor, so the irregular tensor must be transformed into a regular one, and the key point is to minimize the loss during this transformation. Therefore, the model needs to find an optimal configuration of weights over domains, wk, that minimizes the loss.
- My proposed method finds the optimal weight configuration with a genetic algorithm instead of the empirical strategy that the model currently uses.
Page 6
- The parallel strategy of my proposed method is based on a paper published in 2017, which uses a genetic algorithm on Spark to find optimal test cases.
- The paper proposes a two-phase parallelization: parallel fitness evaluation and parallel genetic operations throughout the whole process.
- In parallel fitness evaluation, the fitness value of each individual is computed in parallel. In parallel genetic operations, crossover, mutation, and selection are each performed in parallel.
- According to the paper, using this two-phase parallel strategy on Spark speeds up the algorithm significantly.
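The two-phase idea can be illustrated with a small Python sketch. Here a `ThreadPoolExecutor` stands in for Spark executors, so `pool.map` plays the role of an RDD `map`; this substitution is an assumption made to keep the example self-contained, and because of Python's global interpreter lock it shows the data flow rather than a real speedup. It is not the 2017 paper's implementation. Phase 1 scores every individual in parallel; phase 2 runs one independent task per child that performs tournament selection, one-point crossover and mutation with its own seed. The OneMax fitness is used as a placeholder.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(individual):
    # Placeholder fitness (OneMax): number of 1s in the bit string
    return sum(individual)


def breed(task):
    # One independent task of the genetic-operation phase: selection,
    # crossover and mutation for one child, with its own seeded RNG
    scored, seed, rate = task
    rng = random.Random(seed)
    # Binary tournament selection on (fitness, individual) pairs
    parent_a = max(rng.sample(scored, 2), key=lambda pair: pair[0])[1]
    parent_b = max(rng.sample(scored, 2), key=lambda pair: pair[0])[1]
    cut = rng.randint(1, len(parent_a) - 1)  # one-point crossover
    child = parent_a[:cut] + parent_b[cut:]
    return [1 - bit if rng.random() < rate else bit for bit in child]


def next_generation(population, pool, generation, rate=0.05):
    # Phase 1: parallel fitness evaluation, one task per individual
    scores = list(pool.map(fitness, population))
    scored = list(zip(scores, population))
    # Phase 2: parallel genetic operations, one task per child
    seeder = random.Random(generation)
    tasks = [(scored, seeder.randrange(2**32), rate) for _ in population]
    return list(pool.map(breed, tasks))


def run(pop_size=20, length=16, generations=30, workers=4, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for generation in range(generations):
            population = next_generation(population, pool, generation)
        scores = list(pool.map(fitness, population))
    return population, scores
```

Giving every task its own seed keeps the tasks independent and the run reproducible regardless of scheduling order. In a real Spark job the scored population would typically be shipped to the tasks as a broadcast variable (`SparkContext.broadcast`) rather than captured in every task.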
Page 7
- Next is the genetic algorithm for my proposed method. In detail, each individual is one possible configuration of weights over domains.
- The genetic operations can be executed in parallel on Spark. The fitness evaluation is the WITF model itself, which uses large datasets and adopts a parallel strategy internally.
- After iterating for a number of generations, the algorithm can find a better configuration of weights over domains.
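A sketch of how an individual could encode a weight configuration over domains: each individual is a list of K weights in [0, 1], and a lower RMSE means a fitter individual. The function `toy_rmse` and its target weights are purely hypothetical stand-ins for "train WITF with these weights and measure RMSE", and the operators (truncation selection with elitism, blend crossover, clipped Gaussian mutation) are assumptions for illustration, not settled design choices.

```python
import random


def toy_rmse(weights):
    # HYPOTHETICAL stand-in for "train WITF with these domain weights and
    # return the RMSE"; the target weights are invented for the demo
    target = [0.7, 0.2, 0.5]
    return (sum((w - t) ** 2 for w, t in zip(weights, target)) / len(weights)) ** 0.5


def clip(x):
    # Keep each weight inside [0, 1]
    return min(1.0, max(0.0, x))


def evolve_weights(evaluate, num_domains=3, pop_size=16, generations=40, sigma=0.1, seed=0):
    rng = random.Random(seed)
    # Each individual is one configuration of weights over the domains
    population = [[rng.random() for _ in range(num_domains)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate)  # lower RMSE = fitter
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0]]                     # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            # Blend crossover followed by clipped Gaussian mutation
            children.append([clip(alpha * x + (1 - alpha) * y + rng.gauss(0, sigma))
                             for x, y in zip(a, b)])
        population = children
    best = min(population, key=evaluate)
    return best, evaluate(best)


best, error = evolve_weights(toy_rmse)
print(best, error)
```

Because the best configuration is always carried into the next generation, the best RMSE found never gets worse from one generation to the next.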
Page 8
- Next are the details of the WITF model.
- The model uses cross-domain data to compute user vectors, domain vectors, and virtual item vectors.
- During this process, some vectors can be computed in parallel. For example, each user's vector can be updated in parallel, because, given the other factor vectors, it is conditionally independent of the other users' vectors. The domain vectors and the constraint vectors are in a similar situation: they can all be updated in parallel.
- After obtaining those vectors, the model uses the common RMSE measure to compute the accuracy, and this accuracy becomes the fitness value (a lower RMSE means a better individual).
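The per-user independence and the RMSE fitness can be illustrated with a generic ALS-style update. This is a swapped-in technique, not the actual WITF update equations: with the item vectors fixed, each user's vector is the ridge-regression solution u = (V^T V + lam*I)^(-1) V^T r over only that user's ratings, so users can be mapped in parallel (a thread pool stands in for Spark here). The RMSE of dot-product predictions on held-out ratings is then returned directly as the fitness value. The names `update_user`, `update_all_users`, `rmse` and the parameter `lam` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small k x k system a x = b
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_user(ratings, item_vectors, lam=0.1):
    # Ridge solution u = (V^T V + lam*I)^(-1) V^T r, using only this user's
    # ratings {item: rating}; no other user's data is touched
    k = len(next(iter(item_vectors.values())))
    a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, rating in ratings.items():
        v = item_vectors[item]
        for i in range(k):
            b[i] += rating * v[i]
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve(a, b)


def update_all_users(user_ratings, item_vectors, lam=0.1, workers=4):
    # Users are conditionally independent given the item vectors,
    # so their updates can run as independent parallel tasks
    users = list(user_ratings)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(lambda u: update_user(user_ratings[u], item_vectors, lam), users))
    return dict(zip(users, vectors))


def rmse(triples, user_vectors, item_vectors):
    # RMSE over held-out (user, item, rating) triples; prediction = dot(u, v).
    # Used directly as the fitness value: lower is better
    errors = [(sum(x * y for x, y in zip(user_vectors[u], item_vectors[i])) - r) ** 2
              for u, i, r in triples]
    return math.sqrt(sum(errors) / len(errors))
```

The `lam` term keeps each system solvable even for a user who rated only a few items.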
Page 9
- Combining WITF with the genetic algorithm on Spark using the two-phase parallelization leads to a problem, which I call Spark RDD nesting.
- The ideal approach is to treat each individual as a Spark RDD element and evaluate every individual's fitness in parallel on Spark, but the fitness function, the WITF model, also uses Spark RDDs internally.
- This results in nested RDDs, which Spark does not support, so I have to use an alternative: evaluating fitness sequentially rather than in parallel. The efficiency then depends on the speed of WITF on Spark, which needs further consideration.
Here is my presentation video link. The video is short because I have made only limited research progress. Sorry.
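The sequential alternative can be sketched as two levels. The outer loop over individuals is an ordinary loop on the driver, and only the inner fitness evaluation is parallel over data partitions, so no parallel job is ever launched from inside another parallel task. In PySpark the inner step would correspond to something like an RDD `map` followed by `sum()` over the partitioned data; here a thread pool stands in for Spark, and the linear model inside the hypothetical `chunk_sse` function is only a placeholder for WITF.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sse(task):
    # Inner parallel task: sum of squared errors of one data partition.
    # The linear model here is a hypothetical placeholder for WITF.
    weights, partition = task
    return sum((sum(w * x for w, x in zip(weights, features)) - label) ** 2
               for features, label in partition)


def evaluate(weights, partitions, pool):
    # One fitness evaluation, parallel over partitions (like a job on an RDD);
    # returns the RMSE over all data points
    total = sum(pool.map(chunk_sse, [(weights, p) for p in partitions]))
    count = sum(len(p) for p in partitions)
    return (total / count) ** 0.5


def evaluate_population(population, partitions, workers=4):
    # Outer level: individuals are evaluated one after another on the driver,
    # so no parallel job is ever started from inside another parallel task
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [evaluate(weights, partitions, pool) for weights in population]
```

With this structure the total time is roughly the population size times the time of one parallel evaluation, which is why the overall efficiency hinges on how fast a single WITF run is.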
Summary
- Modified a genetic algorithm to run on Spark and ran some tests
- The architecture of the proposed method