Slides for 12.20 Presentation

Page 1

  • Hello, everyone, this is Setsu. In this video, I will mainly talk about the architecture of my proposed method.

Page 2

  • Here is the outline of this video. It contains three parts. The first is the OneMax problem for genetic algorithms, which is a very simple use case of a genetic algorithm. The second is the architecture of my proposed method, where I will talk about the parallel strategy and some program flow charts of the method. The last part is future work.

Page 3

  • To begin, I will briefly introduce the OneMax problem for genetic algorithms.
  • The goal of the OneMax problem is to find the maximal individual, the one made up entirely of 1s, starting from some initial individuals that are each a series of 0s and 1s.
  • Let us look at the whole process. First, there is an initial population with a number of individuals, and the fitness of each individual is the number of 1s it contains. After genetic operations such as crossover, mutation and selection, there is a new generation of the population. The genetic operations are then repeated until the algorithm finds the maximal individual, which is all 1s.
  • This simple example is commonly used to test the efficiency of a genetic algorithm, so I tried running it on Spark to test efficiency and performance.
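
The process described above can be sketched in plain Python. This is only a minimal illustration, not the code I found or the Spark version: the tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter values are assumptions chosen to keep the example short.

```python
import random

def fitness(individual):
    # Fitness is the number of 1 bits in the individual.
    return sum(individual)

def tournament_select(population, rng, k=3):
    # Pick k random individuals and keep the fittest one (assumed selection scheme).
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    # One-point crossover: the head of parent a joined with the tail of parent b.
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(individual, rng, rate):
    # Flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in individual]

def one_max(length=20, pop_size=30, max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == length:
            # Found the all-ones individual.
            return best, generation
        # Elitism: carry the current best into the next generation unchanged.
        next_population = [best]
        while len(next_population) < pop_size:
            child = crossover(tournament_select(population, rng),
                              tournament_select(population, rng), rng)
            next_population.append(mutate(child, rng, 1.0 / length))
        population = next_population
    # Generation budget exhausted: return the best individual found so far.
    return max(population, key=fitness), max_generations

best, generations = one_max()
print(fitness(best), generations)
```

The function returns the best individual together with the generation at which the search stopped, so a run that hits the generation limit can be told apart from one that reached all 1s.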

Page 4

  • I found a OneMax implementation on the Internet and modified the code to run on Spark.
  • Then I ran the modified demo on the Spark cluster.
  • However, I have only run it successfully in local mode, which means running it on just one machine. When I run it in cluster mode, it hits some connection timeout bugs, so I am still debugging and tuning the demo. I will summarize the tuning and debugging experience later.

Page 5

  • The next part is the architecture of my proposed method. First, let's review the previous material.
  • I'm focusing on the WITF model, which stands for Weighted Irregular Tensor Factorization.
  • The model uses cross-domain data to make recommendations. It treats the cross-domain data as an irregular tensor and then performs tensor factorization. However, tensor factorization requires a regular tensor, so the irregular tensor must be transformed into a regular one, and the most important point is to minimize the loss incurred by this transformation. Therefore, the model needs to find an optimal configuration of weights over domains, wk, to minimize the loss.
  • My proposed method finds the optimal weight configuration with a genetic algorithm instead of the empirical strategy the model currently uses.

Page 6

  • The parallel strategy of my proposed method is based on a paper published in 2017. This paper uses a genetic algorithm on Spark to find optimal test cases.
  • The paper proposes a two-phase parallelization: parallel fitness evaluation and parallel genetic operations throughout the whole process.
  • In parallel fitness evaluation, it computes each individual's fitness value in parallel. In parallel genetic operations, it performs crossover, mutation and selection in parallel.
  • By using this two-phase parallel strategy on Spark, the paper achieves a significant speedup.
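
The structure of the two phases can be sketched as follows. This is not the paper's implementation and it does not use Spark: Python's `concurrent.futures.ThreadPoolExecutor` stands in for the cluster only to show where the two parallel maps sit, and the OneMax fitness, the helper names `make_child` and `next_generation`, and the per-task seeding are all assumptions made for this illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    # OneMax fitness, used here only as a cheap placeholder.
    return sum(individual)

def make_child(args):
    # One unit of phase 2: selection, crossover and mutation for a single child.
    scored, seed = args
    rng = random.Random(seed)  # each task gets its own generator

    def pick():
        # Tournament selection over (score, individual) pairs.
        return max(rng.sample(scored, 3), key=lambda pair: pair[0])[1]

    a, b = pick(), pick()
    point = rng.randint(1, len(a) - 1)
    child = a[:point] + b[point:]
    rate = 1.0 / len(child)
    return [1 - bit if rng.random() < rate else bit for bit in child]

def next_generation(population, executor, seed):
    # Phase 1: evaluate every individual's fitness in parallel.
    scores = list(executor.map(fitness, population))
    scored = list(zip(scores, population))
    # Phase 2: build every child in parallel.
    tasks = [(scored, seed * len(population) + i)
             for i in range(len(population))]
    children = list(executor.map(make_child, tasks))
    return children, max(scores)

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
with ThreadPoolExecutor(max_workers=4) as executor:
    for generation in range(5):
        population, best_score = next_generation(population, executor, generation)
print(best_score)
```

Threads in Python do not speed up CPU-bound work like this, so the sketch shows only the shape of the strategy; the speedup reported in the paper comes from running the two maps on a Spark cluster.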

Page 7

  • Next is the genetic algorithm for my proposed method. In detail, each individual is one possible configuration of weights over domains.
  • The genetic operations can be executed in parallel on Spark. The fitness evaluation is the WITF model itself, which uses large datasets and adopts a parallel strategy internally.
  • After iterating for a number of generations, the algorithm can find a better configuration of weights over domains.
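
A rough sketch of this encoding is shown below, where each individual is a list with one weight per domain. Everything specific here is an assumption for illustration: the number of domains, the [0, 1] weight range, the operators, and especially `placeholder_loss`, which is a made-up function standing in for "train WITF with these weights and measure its error". It is not the WITF model.

```python
import random

NUM_DOMAINS = 4  # hypothetical number of domains

def placeholder_loss(weights):
    # HYPOTHETICAL stand-in for running WITF with these weights.
    # The target values are invented so the example has something to minimize.
    target = [0.1, 0.4, 0.3, 0.2]
    return sum((w - t) ** 2 for w, t in zip(weights, target))

def random_individual(rng):
    # One individual = one configuration of weights over domains.
    return [rng.random() for _ in range(NUM_DOMAINS)]

def crossover(a, b, rng):
    # Uniform crossover: each weight is taken from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(weights, rng, sigma=0.05):
    # Gaussian perturbation, clipped to the assumed [0, 1] range.
    return [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in weights]

def search_weights(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half as parents.
        population.sort(key=placeholder_loss)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=placeholder_loss)

best_weights = search_weights()
print(best_weights, placeholder_loss(best_weights))
```

Because the better half of each generation is kept, the best loss never gets worse from one generation to the next, which matches the idea of finding a better configuration after a number of generations.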

Page 8

  • Next are the details of the WITF model.
  • The model uses cross-domain data to compute user vectors, domain vectors and virtual item vectors.
  • During this process, some vectors can be computed in parallel. For example, each user's vector can be updated in parallel because it is conditionally independent of the other users. The domain vectors and the constrict vectors are in a similar situation to the user vectors, so they can all be updated in parallel.
  • After obtaining those vectors, the model uses the common RMSE metric to measure accuracy, and this accuracy is then used as the fitness value.
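
For reference, RMSE over predicted and actual ratings can be computed as in the sketch below. The `fitness_from_rmse` helper is an assumption of this example: since a lower RMSE means better accuracy, one possible convention is to negate it so that a genetic algorithm that maximizes fitness prefers lower error.

```python
import math

def rmse(predicted, actual):
    # Root mean squared error between two equally long, non-empty sequences.
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def fitness_from_rmse(error):
    # Assumed convention: lower error is better, so negate it for maximization.
    return -error

error = rmse([3.5, 4.0, 2.0], [4.0, 4.0, 1.0])
print(error, fitness_from_rmse(error))
```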

Page 9

  • Combining WITF with the genetic algorithm on Spark under the two-phase parallelization leads to a problem: nested Spark RDDs.
  • Ideally, each individual would be treated as an element of a Spark RDD, and each individual's fitness evaluation would then run in parallel on Spark. However, the fitness function, the WITF model, also uses Spark RDDs internally.
  • This results in nested Spark RDDs, which Spark does not support. I have to find an alternative, which is to evaluate fitness sequentially, not in parallel. The efficiency then depends on the speed of WITF on Spark, and this needs further consideration.
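
The shape of this alternative can be sketched as follows: the outer loop over individuals runs sequentially, and only the work inside one fitness evaluation is parallel. Again, this does not use Spark; `ThreadPoolExecutor` stands in for it. The function `user_squared_error` and its toy predictor (the mean of the weights) are hypothetical placeholders for the per-user work inside WITF, invented only so that the example runs.

```python
from concurrent.futures import ThreadPoolExecutor

def user_squared_error(args):
    # HYPOTHETICAL per-user task standing in for one parallel unit inside WITF.
    weights, ratings = args
    prediction = sum(weights) / len(weights)  # toy predictor, not WITF
    return sum((r - prediction) ** 2 for r in ratings), len(ratings)

def evaluate_individual(weights, ratings_by_user, executor):
    # Inner level: per-user work is distributed in parallel.
    tasks = [(weights, ratings) for ratings in ratings_by_user]
    parts = list(executor.map(user_squared_error, tasks))
    total = sum(squared for squared, _ in parts)
    count = sum(n for _, n in parts)
    return (total / count) ** 0.5  # RMSE over all ratings

def evaluate_population(population, ratings_by_user, executor):
    # Outer level: individuals are evaluated one after another,
    # so no parallel job is ever launched from inside another one.
    return [evaluate_individual(weights, ratings_by_user, executor)
            for weights in population]

ratings_by_user = [[1.0, 1.0], [1.0]]
population = [[1.0, 1.0], [0.0, 0.0]]
with ThreadPoolExecutor(max_workers=4) as executor:
    print(evaluate_population(population, ratings_by_user, executor))
```

With this layout, the total time is roughly the number of individuals multiplied by the time of one evaluation, which is why the efficiency depends on how fast WITF itself runs.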

Here is the link to my presentation video. The video is short because I have made limited research progress. Sorry.

Summary

  1. Modified a genetic algorithm to run on Spark and ran some tests
  2. Designed the architecture of the proposed method