Code Tools, Stream Data Analysis, Huawei

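Albert Bifet, a senior researcher at Noah's Ark Lab, and his colleagues recently released a stream data analysis toolkit for Spark Streaming on GitHub. This open-source software, called StreamDM, currently includes five algorithms, and more will be added in the future. (web link) Please take a look. Albert is also the main developer of Samoa, a stream data analysis toolkit for Storm.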

Big Data Stream Learning
Big Data stream learning is more challenging than batch or offline learning because the data may not keep the same distribution over the lifetime of the stream. Moreover, each example arriving in a stream can be processed only once, or must be summarized with a small memory footprint, so the learning algorithms must be very efficient.
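To make the "processed once, small memory footprint" constraint concrete, here is a minimal sketch in plain Python. It keeps a running mean and variance using Welford's method, so each example is seen once and then discarded. The class name RunningStats and the sample values are illustrative assumptions and are not part of StreamDM.

```python
class RunningStats:
    """One-pass mean and variance (Welford's method) in constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Fold one example into the summary; the example itself is not stored
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one example has been seen
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(stats.mean, stats.variance)
```

The summary stays at three numbers no matter how long the stream runs, which is the property a stream learner needs from its internal statistics.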
Spark Streaming
Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is an extensible and programmable framework for massive distributed processing of datasets, called Resilient Distributed Datasets (RDDs). Spark Streaming receives input data streams and divides the data into batches, which are then processed by the Spark engine to generate the results.
Spark Streaming data is organized into DStreams, each represented internally as a sequence of RDDs.
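The batching idea can be sketched without Spark at all. The plain-Python example below cuts an incoming stream into small batches and runs the same job on each one. It is only an analogy: Spark Streaming forms batches by time interval, whereas this sketch uses a fixed item count for simplicity, and the names micro_batches and process_batch are hypothetical.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


def process_batch(batch):
    # Stand-in for the per-batch job the engine would run
    return sum(batch)


results = [process_batch(b) for b in micro_batches(range(10), batch_size=4)]
print(results)
```

Each batch here plays the role of one RDD in a DStream: a bounded chunk of the stream that can be handed to an ordinary batch computation.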
Included Methods
In this first pre-release of StreamDM, we have implemented:
SGD Learner and Perceptron
Naive Bayes
CluStream
Hoeffding Decision Trees
Stream KM++
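
As an illustration of the kind of incremental update such learners perform, here is a minimal online perceptron in plain Python. It is a sketch of the textbook algorithm, not StreamDM's implementation or API. The class name OnlinePerceptron, its methods, and the toy data are all assumptions made for this example.

```python
class OnlinePerceptron:
    """Binary perceptron that updates its weights one example at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        score = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        # error is -1, 0 or +1; weights change only on a mistake
        error = y - self.predict(x)
        if error != 0:
            step = self.learning_rate * error
            self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
            self.bias += step


# Toy stream (four examples repeated): label is 1 when the first
# feature is larger than the second
stream = [([2.0, 1.0], 1), ([1.0, 3.0], 0),
          ([4.0, 2.0], 1), ([0.5, 2.5], 0)] * 5

model = OnlinePerceptron(n_features=2)
for x, y in stream:
    model.learn_one(x, y)  # each arriving example is used once, then dropped

print(model.predict([3.0, 1.0]), model.predict([1.0, 4.0]))
```

The model's state is just the weight vector and bias, so memory use does not grow with the number of examples seen.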

In the next releases we plan to add:
Random Forests
Frequent Itemset Miner: IncMine

Going Further
For a quick introduction to running StreamDM, refer to the Getting Started document. The StreamDM Programming Guide presents a detailed view of StreamDM. The full API documentation can be consulted here.
