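First, install R (and Python, if you also want to use the Python client). H2O runs on the JVM, so Java must be installed as well. Then, in R, install the package with
install.packages("h2o")
load it with library(h2o), and initialize the H2O platform with h2o.init().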
You can also install h2o for Python:
pip install -U h2o
and then start it from a Python session:
import h2o
h2o.init()
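Since h2o.init() has to launch a JVM, it can help to confirm that Java is visible before starting. The following is a minimal sketch using only the Python standard library; the helper names java_available and java_version_banner are made up for this illustration and are not part of h2o.

```python
import shutil
import subprocess


def java_available() -> bool:
    """Return True if a `java` executable is found on PATH."""
    return shutil.which("java") is not None


def java_version_banner() -> str:
    """Return the text printed by `java -version`, or "" if Java is missing."""
    if not java_available():
        return ""
    # `java -version` usually writes its banner to stderr, so look at both streams.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=False
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    if java_available():
        print(java_version_banner())
    else:
        print("No java executable on PATH; install Java before calling h2o.init().")
```

If the banner prints, h2o.init() should at least be able to find a JVM; whether that Java version is supported depends on the H2O release you installed.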
A quick start
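library(h2o)
library(dplyr)  # provides %>% and filter()
h2o.init()
# Keep only versicolor and virginica so that the task is binary
irish2o <- as.h2o(iris %>% filter(Species != 'setosa'))
y <- 'Species'                         # response column
x <- setdiff(names(irish2o), y)        # every other column is a predictor
parts <- h2o.splitFrame(irish2o, 0.8)  # roughly 80% train, 20% test
train <- parts[[1]]
test <- parts[[2]]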
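Note that h2o.splitFrame does not promise an exact 80/20 split: the rows are assigned at random, and in the run logged in this post the 100 remaining rows ended up as 74 training rows and 26 test rows. The sketch below illustrates that kind of per-row random assignment in plain Python. It is a simplified illustration under that assumption, not H2O's actual implementation, and split_rows is a hypothetical name.

```python
import random


def split_rows(n_rows, ratio, seed=None):
    """Assign each row index to train with probability `ratio`, else to test."""
    rng = random.Random(seed)
    train, test = [], []
    for i in range(n_rows):
        (train if rng.random() < ratio else test).append(i)
    return train, test


if __name__ == "__main__":
    # iris has 100 rows left once setosa is dropped.
    train_idx, test_idx = split_rows(100, 0.8, seed=42)
    # The two sizes always sum to 100, but they are not guaranteed to be 80 and 20.
    print(len(train_idx), len(test_idx))
```

In R, passing a seed argument to h2o.splitFrame makes the split reproducible in the same way the seed does here.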
----------------------------------------------------------------------
Your next step is to start H2O:
> h2o.init()
For H2O package documentation, ask for help:
> ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
----------------------------------------------------------------------
Attaching package: ‘h2o’
The following objects are masked from ‘package:stats’:
cor, sd, var
The following objects are masked from ‘package:base’:
&&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
colnames<-, ifelse, is.character, is.factor, is.numeric,
log, log10, log1p, log2, round, signif, trunc
> h2o.init()
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/var/folders/jz/qf7zhsc97f71slzzf59mvs2w0000gn/T//RtmpujsoRp/h2o_milin_started_from_r.out
/var/folders/jz/qf7zhsc97f71slzzf59mvs2w0000gn/T//RtmpujsoRp/h2o_milin_started_from_r.err
java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)
Starting H2O JVM and connecting: ... Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 3 seconds 560 milliseconds
H2O cluster timezone: Asia/Shanghai
H2O data parsing timezone: UTC
H2O cluster version: 3.20.0.8
H2O cluster version age: 1 month and 20 days
H2O cluster name: H2O_started_from_R_milin_jhc047
H2O cluster total nodes: 1
H2O cluster total memory: 2.00 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.3 (2017-11-30)
> m <- h2o.randomForest(x = x, y = y, training_frame = train)
|=============================================================| 100%
> m
Model Details:
==============
H2OBinomialModel: drf
Model ID: DRF_model_R_1541858573921_1
Model Summary:
number_of_trees number_of_internal_trees model_size_in_bytes
1 50 50 6827
min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
1 2 5 3.34000 3 10 5.88000
H2OBinomialMetrics: drf
** Reported on training data. **
** Metrics reported on Out-Of-Bag training samples **
MSE: 0.05615946
RMSE: 0.2369799
LogLoss: 0.2136178
Mean Per-Class Error: 0.05441176
AUC: 0.9779412
Gini: 0.9558824
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
versicolor virginica Error Rate
versicolor 38 2 0.050000 =2/40
virginica 2 32 0.058824 =2/34
Totals 40 34 0.054054 =4/74
Maximum Metrics: Maximum metrics at their respective thresholds
metric threshold value idx
1 max f1 0.476190 0.941176 30
2 max f2 0.260952 0.953757 33
3 max f0point5 0.937500 0.966667 25
4 max accuracy 0.476190 0.945946 30
5 max precision 1.000000 1.000000 0
6 max recall 0.004662 1.000000 49
7 max specificity 1.000000 1.000000 0
8 max absolute_mcc 0.476190 0.891176 30
9 max min_per_class_accuracy 0.476190 0.941176 30
10 max mean_per_class_accuracy 0.476190 0.945588 30
Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
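Several of the numbers in this model summary can be checked by hand from the others. The sketch below recomputes the per-class error, mean per-class error, and overall error from the confusion matrix counts, and checks the standard relations Gini = 2 * AUC - 1 and RMSE = sqrt(MSE) against the printed values. The counts and metric values are copied from the output above; the helper name per_class_error is made up for this illustration.

```python
import math

# Counts copied from the training confusion matrix above (rows are actual classes).
confusion = {
    "versicolor": {"versicolor": 38, "virginica": 2},
    "virginica": {"versicolor": 2, "virginica": 32},
}


def per_class_error(matrix):
    """Fraction of each actual class that was predicted as another class."""
    errors = {}
    for actual, row in matrix.items():
        total = sum(row.values())
        errors[actual] = (total - row[actual]) / total
    return errors


errors = per_class_error(confusion)
mean_per_class_error = sum(errors.values()) / len(errors)

total_rows = sum(sum(row.values()) for row in confusion.values())
wrong = sum(n for a, row in confusion.items() for p, n in row.items() if a != p)
overall_error = wrong / total_rows

gini = 2 * 0.9779412 - 1      # Gini = 2 * AUC - 1, using the AUC printed above
rmse = math.sqrt(0.05615946)  # RMSE = sqrt(MSE), using the MSE printed above

print(errors)
print(mean_per_class_error, overall_error)
print(gini, rmse)
```

Up to rounding, these agree with the Error column, the Mean Per-Class Error, the Gini, and the RMSE lines in the output.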
> p <- h2o.predict(m,test)
|=============================================================| 100%
> p
predict versicolor virginica
1 versicolor 0.9679487 0.032051282
2 versicolor 0.8779487 0.122051282
3 versicolor 0.9979487 0.002051282
4 versicolor 0.9679487 0.032051282
5 versicolor 0.9979487 0.002051282
6 versicolor 0.9979487 0.002051282
[26 rows x 3 columns]
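The predict column is derived from the probability columns. As far as I know, for binomial models H2O does not cut at 0.5 but compares the probability of the second class with the max-F1 threshold (0.476190 in the Maximum Metrics table above). The sketch below shows that rule in plain Python under that assumption; label_from_probability is a hypothetical helper, not an h2o function.

```python
# Max-F1 threshold copied from the "Maximum Metrics" table in the model output.
F1_THRESHOLD = 0.476190


def label_from_probability(p_virginica, threshold=F1_THRESHOLD):
    """Return the predicted class for one row given P(virginica)."""
    return "virginica" if p_virginica >= threshold else "versicolor"


# P(virginica) values taken from the first rows of the prediction frame above.
sample = [0.032051282, 0.122051282, 0.002051282]
for p in sample:
    print(p, label_from_probability(p))
```

All three sample probabilities fall below the threshold, which is consistent with the versicolor labels shown for those rows.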
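Performance versus predictions
h2o.predict() returns one row of class probabilities per test row, whereas h2o.performance() summarizes how well the model does on the whole test frame. Note that the sample output below is multinomial and still contains setosa (28 test rows), so it appears to come from a run on the full three-class iris data rather than the two-class frame built above.
> h2o.performance(m, test)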
H2OMultinomialMetrics: drf
Test Set Metrics:
=====================
MSE: (Extract with `h2o.mse`) 0.08837984
RMSE: (Extract with `h2o.rmse`) 0.2972875
Logloss: (Extract with `h2o.logloss`) 0.2452472
Mean Per-Class Error: 0.1623932
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
setosa versicolor virginica Error Rate
setosa 6 0 0 0.0000 = 0 / 6
versicolor 0 11 2 0.1538 = 2 / 13
virginica 0 3 6 0.3333 = 3 / 9
Totals 6 14 8 0.1786 = 5 / 28
Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>, <data>)`
=======================================================================
Top-3 Hit Ratios:
k hit_ratio
1 1 0.821429
2 2 1.000000
3 3 1.000000
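The top-1 hit ratio is simply the fraction of test rows on the diagonal of the confusion matrix. The sketch below recomputes it, together with the overall error and the mean per-class error, from the counts printed above. It is plain Python for checking the arithmetic, not an h2o call.

```python
# Counts copied from the test-set confusion matrix above (rows are actual classes).
classes = ["setosa", "versicolor", "virginica"]
matrix = [
    [6, 0, 0],
    [0, 11, 2],
    [0, 3, 6],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))

top1_hit_ratio = correct / total    # share of rows whose top prediction is right
overall_error = 1 - top1_hit_ratio  # the "Totals" error in the matrix

# Average of the per-row error rates, so each class counts equally.
mean_per_class_error = sum(
    (sum(row) - row[i]) / sum(row) for i, row in enumerate(matrix)
) / len(classes)

print(correct, total)
print(top1_hit_ratio, overall_error, mean_per_class_error)
```

Up to rounding, the results match the k = 1 hit ratio, the Totals error, and the Mean Per-Class Error reported above. The top-2 and top-3 hit ratios cannot be recovered from the confusion matrix alone, because they depend on the ranked probabilities of each row.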
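H2O Flow
H2O Flow is H2O's web interface. From it you can upload or download data, browse every model you have built, create models, and run predictions directly.
There are a few ways to open H2O Flow. The first is to initialize h2o from R or Python and then open http://127.0.0.1:54321 in your browser.
The other is to deploy H2O on a server and open it from there: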
1. Download H2O. This is a zip file that contains everything you need to get started.
2. From your terminal, run:
cd ~/Downloads
unzip h2o-3.22.0.1.zip
cd h2o-3.22.0.1
java -jar h2o.jar
3. Point your browser to http://<your-host-address>:54321
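Before opening the browser, you can check from a script whether anything is answering at the Flow address. The following is a minimal sketch using only the Python standard library; flow_url and flow_is_reachable are hypothetical helper names, and the check only confirms that an HTTP server responds on that host and port, not that it is H2O.

```python
import http.client
import urllib.error
import urllib.request


def flow_url(host="localhost", port=54321):
    """Build the address of the H2O Flow web UI for a given host and port."""
    return f"http://{host}:{port}"


def flow_is_reachable(host="localhost", port=54321, timeout=3.0):
    """Return True if an HTTP server answers successfully at the Flow address."""
    try:
        with urllib.request.urlopen(flow_url(host, port), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print(flow_url(), flow_is_reachable())
```

On a remote server, replace the host with that machine's address, and make sure port 54321 is reachable from where you run the check.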
For how to use H2O Flow, see my earlier article:
https://www.jianshu.com/p/74d12c682af7