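The previous articles set up, on a GN7 virtual host with an Nvidia T4 running Ubuntu 18, a Python 3.9 development environment and an R 4.2.1 development environment. Because most R packages are written in C/C++, gcc/g++ was upgraded to version 11 (some R packages require C++14 or C++17), a few small C programs were compiled to test the OpenGL development setup, and Python was called from R through the reticulate package. Java is a slightly different case. Much of the server-side software commonly run in production on the internet is written in Java on the J2EE architecture, for example Weblogic and Tomcat, as well as services on Alibaba Cloud and Huawei Cloud. In my data analysis study and research, Java is therefore configured mainly as the runtime for J2EE servers rather than as an environment for writing my own code. In this experiment it runs Tomcat at the application layer to integrate the middle-layer Shiny Server, and Neo4j at the data layer to provide graph database services. R code can nevertheless call Java programs directly through the rJava package, just as it calls Python through reticulate. For Java development itself I generally use a heavyweight desktop IDE on the PC, Eclipse or IntelliJ IDEA, because that is fairly heavyweight work; I have not yet looked into lightweight browser-based Java IDEs comparable to RStudio Server or JupyterLab.
When I started learning Java 20 years ago I used WebSphere/WSAD, then Weblogic/Eclipse, then Tomcat/Eclipse, and after moving to the cloud a few years ago I also tried IntelliJ IDEA. Java and the J2EE specifications have changed a lot over the years, and the JDK has gone from 1.2 to 19. I have not kept up with much of the new material, but the skills I accumulated are still enough to solve small problems, such as developing a Neo4j plugin that implements the Chu-Liu algorithm for the minimum arborescence of a directed graph. Tomcat is used because the J2EE application layer is not the focus here: for a demonstration only the smallest and simplest wrapper and integration is needed, so a lightweight web container is sufficient.
In a multi-tier integration, Java programs running in a J2EE container can also call R programs directly. This article demonstrates that with a Weibo word cloud example.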
1. Install OpenJDK 11; Neo4j 4.x requires JDK 11.
root@VM-0-14-ubuntu:~# apt-get install openjdk-11-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
at-spi2-core ca-certificates-java fonts-dejavu-extra java-common libasound2 libasound2-data libatk-bridge2.0-0 libatk-wrapper-java libatk-wrapper-java-jni libatk1.0-0 libatk1.0-data
libatspi2.0-0 libpcsclite1 openjdk-11-jdk-headless openjdk-11-jre openjdk-11-jre-headless
Suggested packages:
default-jre libasound2-plugins alsa-utils pcscd openjdk-11-demo openjdk-11-source visualvm libnss-mdns fonts-ipafont-gothic fonts-ipafont-mincho fonts-indic
The following NEW packages will be installed:
at-spi2-core ca-certificates-java fonts-dejavu-extra java-common libasound2 libasound2-data libatk-bridge2.0-0 libatk-wrapper-java libatk-wrapper-java-jni libatk1.0-0 libatk1.0-data
libatspi2.0-0 libpcsclite1 openjdk-11-jdk openjdk-11-jdk-headless openjdk-11-jre openjdk-11-jre-headless
0 upgraded, 17 newly installed, 0 to remove and 0 not upgraded.
Need to get 261 MB of archives.
After this operation, 413 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
root@VM-0-14-ubuntu:~# java --version
openjdk 11.0.17 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu218.04, mixed mode, sharing)
root@VM-0-14-ubuntu:~# which java
/usr/bin/java
root@VM-0-14-ubuntu:~# export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
root@VM-0-14-ubuntu:~# echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64
2. Set the environment variables.
root@VM-0-14-ubuntu:~# vi /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export JRE_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
3. Install Tomcat and start it once to verify the installation.
A. Download, unpack, and install.
root@VM-0-14-ubuntu:~# wget https://dlcdn.apache.org/tomcat/tomcat-10/v10.0.27/bin/apache-tomcat-10.0.27.tar.gz
root@VM-0-14-ubuntu:~# tar -xzvf apache-tomcat-10.0.27.tar.gz
root@VM-0-14-ubuntu:~# mv /home/ubuntu/apache-tomcat-10.0.27 /usr/local/apache-tomcat-10.0.27
root@VM-0-14-ubuntu:~# cd /usr/local/apache-tomcat-10.0.27/bin
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/bin# export CATALINA_HOME=/usr/local/apache-tomcat-10.0.27
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/bin# ./startup.sh &
B. Export the environment variables:
root@VM-0-14-ubuntu:~# vi /etc/profile
# Added for Tomcat
export CATALINA_HOME=/usr/local/apache-tomcat-10.0.27
export CATALINA_BASE=/usr/local/apache-tomcat-10.0.27
C. Configure the management accounts:
root@VM-0-14-ubuntu:~# cd /usr/local/apache-tomcat-10.0.27/conf
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/conf# vi tomcat-users.xml
<role rolename="admin-gui"/>
<role rolename="manager-gui"/>
<user username="tomcat" password="tomcat" roles="admin-gui"/>
<user username="admin" password="1234" roles="manager-gui"/>
The user tomcat (password tomcat) is for logging in to Tomcat's Host Manager, and the user admin (password 1234) is for logging in to Tomcat's App Manager.
D. Starting and stopping:
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/bin# ./startup.sh
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/bin# ./shutdown.sh
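E. Allow remote login to the Tomcat App Manager and Host Manager:
In ./META-INF/context.xml of each of these two Tomcat web apps, find the following section that restricts client IP addresses and comment it out.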
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/webapps/manager/META-INF# vi context.xml
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/webapps/host-manager/META-INF# vi context.xml
<!--
<Valve className="org.apache.catalina.valves.RemoteAddrValve"
allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />
-->
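F. Notes on Tomcat 10.
Starting with Tomcat 10, Java EE becomes Jakarta EE; see the Tomcat 10 home page:
1) The Java EE API packages beginning with javax (javax.servlet and so on) are renamed to begin with jakarta, so every class referenced later in this article under javax.servlet must be changed to jakarta.servlet.
2) The Java web module specification is updated to Dynamic Web Module 5.0, which requires Eclipse 2021-03 or later; see reference 1 and reference 2. To upgrade the Dynamic Web Module version of an existing Java web project by changing its project facet, see that post and edit org.eclipse.wst.common.project.facet.core.xml.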
<?xml version="1.0" encoding="UTF-8"?>
<faceted-project>
<runtime name="Apache Tomcat v10.0"/>
<fixed facet="java"/>
<fixed facet="jst.web"/>
<fixed facet="wst.jsdt.web"/>
<installed facet="jst.web" version="5.0"/>
<installed facet="wst.jsdt.web" version="1.0"/>
<installed facet="java" version="11"/>
</faceted-project>
If you do not want to modify the source code, use Tomcat 9 or an earlier version.
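The package rename in note 1) can be sketched as a small source rewrite. The class below is a hypothetical helper written for this article, not part of Tomcat or Eclipse, and it assumes that only javax.servlet references need changing, which is the only Java EE package the code in this article imports. JDK packages such as javax.net keep their names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: rewrites servlet API references for Tomcat 10. */
final class JakartaImportRewriter {
    // Match javax.servlet as a whole package prefix only, so that
    // javax.net, javax.crypto and other JDK packages are left alone.
    private static final Pattern SERVLET_PKG = Pattern.compile("\\bjavax\\.servlet\\b");

    /** Rewrites one source line; lines without javax.servlet come back unchanged. */
    static String rewriteLine(String line) {
        return SERVLET_PKG.matcher(line).replaceAll("jakarta.servlet");
    }

    /** Rewrites every line of a source file held in memory. */
    static List<String> rewrite(List<String> lines) {
        return lines.stream()
                .map(JakartaImportRewriter::rewriteLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> src = List.of(
                "import javax.servlet.http.HttpServlet;",
                "import javax.net.ssl.SSLContext;");
        rewrite(src).forEach(System.out::println);
    }
}
```

For a whole project, a batch migration tool is probably more practical than a hand-written rewrite; this sketch only shows the shape of the change.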
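4. Configure SSL encrypted connections for Tomcat.
A. Generate a keystore in JKS format. This uses the same self-signed certificate and key as Nginx and the other servers set up earlier: first package them in P12 format, then convert the P12 file to JKS. The self-signed CA's certificate is bundled in as well, so that the keystore holds the complete certificate chain.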
# cd /root/cert
# openssl pkcs12 -export -inkey server.key -in server.crt -chain -CAfile \
./demoCA/cacert.pem -out server.p12 -name 106.52.33.185 -passout pass:123456
# keytool -importkeystore -v -srckeystore server.p12 -srcstoretype \
pkcs12 -srcstorepass 123456 -destkeystore server.jks \
-deststoretype jks -deststorepass 123456
B. Configure an HTTPS connector for Tomcat.
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/conf# vi server.xml
Comment out the original HTTP Connector on port 8080 and add an HTTPS Connector on port 8443 that uses the JKS keystore just generated. See the reference.
<!--
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" />
-->
<Connector
protocol="org.apache.coyote.http11.Http11NioProtocol"
port="8443"
maxThreads="150"
SSLEnabled="true">
<SSLHostConfig>
<Certificate
certificateKeystoreFile="/root/cert/server.jks"
certificateKeystorePassword="123456"
type="RSA"
/>
</SSLHostConfig>
</Connector>
C. Restart Tomcat and visit https://106.52.33.185:8443/ .
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/bin# ./shutdown.sh
root@VM-0-14-ubuntu:/usr/local/apache-tomcat-10.0.27/bin# ./startup.sh
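5. Install Neo4j-4.4.6-CHS.
Neo4j 5.1 has already been released, but 4.4.6 is used here because all the examples were tested on 4.2.6 and depend on specific versions of the Graph Data Science Library (GDSL); the minimum arborescence algorithm for directed graphs has only been tested on GDSL 2.0. Upgrading GDSL would mean assessing whether the minimum arborescence plugin needs code changes, which is no small job.
A. Download and unpack.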
# wget --no-check-certificate https://we-yun.com/doc/neo4j-chs/4.4.6/neo4j-chs-community-4.4.6-unix.tar.gz
# tar -xf neo4j-chs-community-4.4.6-unix.tar.gz
# mv neo4j-chs-community-4.4.6-unix /opt/neo4j-chs-community-4.4.6-unix
B. Start it once to create the server's directory structure.
# cd /opt/neo4j-chs-community-4.4.6-unix/bin
# ./neo4j console
This command creates the server's directory structure.
root@VM-0-14-ubuntu:/opt/neo4j-chs-community-4.4.6-unix/bin# ./neo4j console
Directories in use:
home: /opt/neo4j-chs-community-4.4.6-unix
config: /opt/neo4j-chs-community-4.4.6-unix/conf
logs: /opt/neo4j-chs-community-4.4.6-unix/logs
plugins: /opt/neo4j-chs-community-4.4.6-unix/plugins
import: /opt/neo4j-chs-community-4.4.6-unix/import
data: /opt/neo4j-chs-community-4.4.6-unix/data
certificates: /opt/neo4j-chs-community-4.4.6-unix/certificates
licenses: /opt/neo4j-chs-community-4.4.6-unix/licenses
run: /opt/neo4j-chs-community-4.4.6-unix/run
Starting Neo4j.
2022-11-15 03:20:27.633+0000 INFO Starting...
2022-11-15 03:20:28.048+0000 INFO This instance is ServerId{7a81ce5a} (7a81ce5a-e54a-47e2-9eec-628a84b3f33c)
2022-11-15 03:20:29.168+0000 INFO ======== Neo4j 4.4.6 ========
2022-11-15 03:20:35.443+0000 INFO Initializing system graph model for component 'security-users' with version -1 and status UNINITIALIZED
2022-11-15 03:20:35.449+0000 INFO Setting up initial user from defaults: neo4j
2022-11-15 03:20:35.449+0000 INFO Creating new user 'neo4j' (passwordChangeRequired=true, suspended=false)
2022-11-15 03:20:35.464+0000 INFO Setting version for 'security-users' to 3
2022-11-15 03:20:35.466+0000 INFO After initialization of system graph model component 'security-users' have version 3 and status CURRENT
2022-11-15 03:20:35.469+0000 INFO Performing postInitialization step for component 'security-users' with version 3 and status CURRENT
2022-11-15 03:20:36.679+0000 INFO Called db.clearQueryCaches(): Query cache already empty.
2022-11-15 03:20:36.776+0000 INFO Bolt enabled on localhost:7687.
2022-11-15 03:20:37.382+0000 INFO Remote interface available at http://localhost:7474/
2022-11-15 03:20:37.385+0000 INFO id: 37DBFB9339E01F74BE1F4295B1D871ECB678F865B7597DD67F58A120A61DABB7
2022-11-15 03:20:37.385+0000 INFO name: system
2022-11-15 03:20:37.385+0000 INFO creationDate: 2022-11-15T03:20:30.077Z
2022-11-15 03:20:37.385+0000 INFO Started.
^C2022-11-15 03:20:45.691+0000 INFO Neo4j Server shutdown initiated by request
2022-11-15 03:20:45.692+0000 INFO Stopping...
2022-11-15 03:20:50.944+0000 INFO Stopped.
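C. Copy the blank neo4j database as a backup. The Community Edition does not support multiple concurrent databases, so to create a new database later, copy this backup under a new name and then switch to the new database.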
# cd /opt/neo4j-chs-community-4.4.6-unix/data/databases
# cp -R neo4j blank
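D. Edit the configuration to open the network listen address and so on; by default Neo4j listens only on the loopback address 127.0.0.1. Here the directory from which CSV data may be imported is restricted to /home/ubuntu/data. Neo4j HTTPS still uses the same self-signed server certificate as Shiny Server and the other servers set up earlier, but the bolt+s protocol needs a certificate file in which the server certificate and the self-built CA certificate are concatenated into a complete chain, otherwise Chrome cannot connect over bolt+s. For details see reference 1, reference 2, and reference 3.
# cd /opt/neo4j-chs-community-4.4.6-unix/conf
# vi neo4j.conf
# Open the network listen address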
dbms.default_listen_address=0.0.0.0
# Database opened at startup. The Community Edition can have only one database online; switch by changing the name below.
# Change to the database you want
dbms.default_database=neo4j
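# Rebuild the transaction log when switching; otherwise the server will not start, because the database has changed and the log no longer matches.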
# Create a new transaction log when change to a new database
dbms.recovery.fail_on_missing_files=false
# Allow importing CSV data files; this must be configured on Linux.
# Restrict imports from other directories to close a security hole.
# This setting constrains all `LOAD CSV` import files to be under the `import` directory.
dbms.directories.import=/home/ubuntu/data
# Determines if Cypher will allow using file URLs when loading data using
# `LOAD CSV`. Setting this value to `false` will cause Neo4j to fail `LOAD CSV`
# clauses that load data from the file system.
dbms.security.allow_csv_import_from_file_urls=true
# Lift the security restrictions on APOC and GDS procedures.
# A comma separated list of procedures and user defined functions that are allowed
# full access to the database through unsupported/insecure internal APIs.
#dbms.security.procedures.unrestricted=my.extensions.example,my.procedures.*
dbms.security.procedures.unrestricted=jwt.security.*,gds.*,apoc.*
# Enable user authentication; with the line below commented out it is enabled by default.
# Whether requests to Neo4j are authenticated.
# To disable authentication, uncomment this line
#dbms.security.auth_enabled=false
# Network protocol settings. By default http is on 7474 and bolt on 7687.
# Here http is disabled, https is enabled on the default 7473, and bolt uses SSL.
# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=REQUIRED
#dbms.connector.bolt.listen_address=:7687
#dbms.connector.bolt.advertised_address=:7687
# HTTP Connector. There can be zero or one HTTP connectors.
dbms.connector.http.enabled=false
# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=:7473
#dbms.connector.https.advertised_address=:7473
# Bolt SSL configuration
dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.bolt.base_directory=/root/cert
dbms.ssl.policy.bolt.private_key=server.key
dbms.ssl.policy.bolt.public_certificate=server.crt
dbms.ssl.policy.bolt.client_auth=NONE
dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.base_directory=/root/cert
dbms.ssl.policy.https.private_key=server.key
dbms.ssl.policy.https.public_certificate=server.crt
dbms.ssl.policy.https.client_auth=NONE
E. Add environment variables.
# vi /etc/profile
# Added for Neo4j
export NEO4J_HOME=/opt/neo4j-chs-community-4.4.6-unix
export NEO4J_CONF=$NEO4J_HOME/conf
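F. Restart Neo4j and visit https://106.52.33.185:7473 to test. The default username/password is neo4j/neo4j, and the password must be changed at first login.
G. Install plug-ins such as APOC and GDSL by copying them into the plugins directory. They must match the Neo4j version; see the APOC and GDSL home pages for details. The *-SNAPSHOT.jar files listed below are algorithm plugins that I developed.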
root@VM-0-14-ubuntu:/opt/neo4j-chs-community-4.4.6-unix# cd plugins
root@VM-0-14-ubuntu:/opt/neo4j-chs-community-4.4.6-unix/plugins# ls
apoc-4.4.0.1-all.jar neo4j-graph-data-science-2.0.2.jar open-gds-extend-0.0.1-SNAPSHOT.jar rs-4.0.0.jar
neo4j-functions-1.0.0-SNAPSHOT.jar ojdbc8-19.8.0.0.jar README.txt rs-license-4.0.0.Trial.00000000-0000-0000-0000-000000000000.20211231.txt
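Restart Neo4j and test the GDSL and APOC installation.
To test the minimum arborescence algorithm, run the statements one at a time in Neo4j Browser.
// Clear the graph: delete all nodes and relationships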
match(n) detach delete n;
// Create the test graph
CREATE(a:Node {name: 'a'}),
(b:Node {name: 'b'}),
(c:Node {name: 'c'}),
(d:Node {name: 'd'}),
(e:Node {name: 'e'}),
(b)<-[:TYPE {cost:17}]-(a), (c)<-[:TYPE {cost:16}]-(a), (d)<-[:TYPE {cost:19}]-(a), (e)<-[:TYPE {cost:16}]-(a),
(c)<-[:TYPE {cost:3}]-(b), (d)<-[:TYPE {cost:3}]-(b), (e)<-[:TYPE {cost:11}]-(b),
(b)<-[:TYPE {cost:3}]-(c), (d)<-[:TYPE {cost:4}]-(c), (e)<-[:TYPE {cost:8}]-(c),
(b)<-[:TYPE {cost:3}]-(d), (c)<-[:TYPE {cost:4}]-(d), (e)<-[:TYPE {cost:12}]-(d),
(b)<-[:TYPE {cost:11}]-(e), (c)<-[:TYPE {cost:8}]-(e), (d)<-[:TYPE {cost:12}]-(e);
// Create the graph projection named graph
CALL gds.graph.project(
'graph',
'Node',
{
LINK: {
type: 'TYPE',
properties: 'cost',
orientation: 'NATURAL'
}
}
) ;
// Call the minimum arborescence algorithm
MATCH (n:Node {name: 'a'})
CALL gds.alpha.spanningArborescenceReverse.minimum.write('graph', {
startNodeId: id(n),
relationshipWeightProperty: 'cost',
writeProperty: 'MINSA',
weightWriteProperty: 'cost'
})
YIELD preProcessingMillis, computeMillis, writeMillis, effectiveNodeCount
RETURN preProcessingMillis, computeMillis, writeMillis, effectiveNodeCount;
// Output the minimum arborescence
MATCH path = (n:Node {name: 'a'})-[:MINSA*]-()
WITH relationships(path) AS rels
UNWIND rels AS rel
WITH DISTINCT rel AS rel
RETURN startNode(rel) as source, endNode(rel) AS destination, rel;
// Drop the graph projection graph (if it exists)
CALL gds.graph.drop( 'graph');
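As a cross-check for the test graph above, here is a minimal sketch of the textbook Chu-Liu/Edmonds procedure in plain Java. It is not the code of the Neo4j plugin, and it assumes the ordinary definition of a minimum arborescence (every node reachable from the root a along edge directions); the exact semantics of the plugin's spanningArborescenceReverse procedure are not described here and may differ. The node numbering a=0 through e=4 is chosen only for this illustration.

```java
import java.util.Arrays;

/** Illustrative Chu-Liu/Edmonds sketch; this is not the Neo4j plugin's code. */
final class MinArborescence {
    // Test graph from the Cypher above, a=0, b=1, c=2, d=3, e=4; rows are {from, to, cost}.
    static final int[][] TEST_EDGES = {
        {0, 1, 17}, {0, 2, 16}, {0, 3, 19}, {0, 4, 16},
        {1, 2, 3}, {1, 3, 3}, {1, 4, 11},
        {2, 1, 3}, {2, 3, 4}, {2, 4, 8},
        {3, 1, 3}, {3, 2, 4}, {3, 4, 12},
        {4, 1, 11}, {4, 2, 8}, {4, 3, 12}
    };

    /** Total cost of a minimum arborescence rooted at root, or -1 if some node is unreachable. */
    static long minCost(int n, int root, int[][] input) {
        long[][] e = new long[input.length][];
        for (int i = 0; i < input.length; i++) {
            e[i] = new long[] {input[i][0], input[i][1], input[i][2]}; // work on a copy
        }
        long total = 0;
        while (true) {
            // 1. Pick the cheapest incoming edge of every node (self-loops ignored).
            long[] in = new long[n];
            int[] pre = new int[n];
            Arrays.fill(in, Long.MAX_VALUE);
            for (long[] x : e) {
                int u = (int) x[0], v = (int) x[1];
                if (u != v && x[2] < in[v]) { in[v] = x[2]; pre[v] = u; }
            }
            in[root] = 0;
            for (int v = 0; v < n; v++) if (in[v] == Long.MAX_VALUE) return -1;
            // 2. Add those costs and look for cycles among the chosen edges.
            int[] id = new int[n], vis = new int[n];
            Arrays.fill(id, -1);
            Arrays.fill(vis, -1);
            int cnt = 0;
            for (int i = 0; i < n; i++) {
                total += in[i];
                int v = i;
                while (vis[v] != i && id[v] == -1 && v != root) { vis[v] = i; v = pre[v]; }
                if (v != root && id[v] == -1) { // walked back into this pass: a new cycle
                    for (int u = pre[v]; u != v; u = pre[u]) id[u] = cnt;
                    id[v] = cnt++;
                }
            }
            if (cnt == 0) return total; // no cycle: the chosen edges form the arborescence
            // 3. Contract each cycle into one node; subtract the already-counted in[v]
            //    from every remaining edge that enters v.
            for (int i = 0; i < n; i++) if (id[i] == -1) id[i] = cnt++;
            for (long[] x : e) {
                int v = (int) x[1];
                x[0] = id[(int) x[0]];
                x[1] = id[v];
                if (x[0] != x[1]) x[2] -= in[v];
            }
            n = cnt;
            root = id[root];
        }
    }

    public static void main(String[] args) {
        System.out.println(minCost(5, 0, TEST_EDGES)); // prints 30
    }
}
```

One arborescence with that cost is a→c (16), c→b (3), b→d (3), c→e (8); under the assumption above, the costs on the MINSA relationships written by the plugin can be summed and compared with this value.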
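6. Start Tomcat and Neo4j at boot
This image already has the rc.local service enabled, so it is enough to add the startup commands to /etc/rc.d/rc.local. The environment variables are exported again there because processes started at boot do not go through a login and therefore never run /etc/profile or similar settings.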
# vi /etc/rc.d/rc.local
# Added by Jean for java, 2022/11/12
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
export JRE_HOME=$JAVA_HOME
export CLASS_PATH=$JAVA_HOME/lib:$CLASS_PATH
# Added for Tomcat
export CATALINA_HOME=/usr/local/apache-tomcat-10.0.27
export CATALINA_BASE=/usr/local/apache-tomcat-10.0.27
# Added for Neo4j
export NEO4J_HOME=/opt/neo4j-chs-community-4.4.6-unix
export NEO4J_CONF=$NEO4J_HOME/conf
# Added for rJava
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH
# Startup Tomcat
cd /usr/local/apache-tomcat-10.0.27/bin
./startup.sh
# Startup Neo4j
cd /opt/neo4j-chs-community-4.4.6-unix/bin
./neo4j start
Test it:
# reboot now
7. Configure rJava. Like the reticulate package for Python, the rJava package lets R call Java programs directly.
A. Update R's Java configuration.
root@VM-0-14-ubuntu:~# R CMD javareconf
Java interpreter : /usr/lib/jvm/java-11-openjdk-amd64/bin/java
Java version : 11.0.17
Java home path : /usr/lib/jvm/java-11-openjdk-amd64
Java compiler : /usr/lib/jvm/java-11-openjdk-amd64/bin/javac
Java headers gen.:
Java archive tool: /usr/lib/jvm/java-11-openjdk-amd64/bin/jar
trying to compile and link a JNI program
detected JNI cpp flags : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
detected JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
gcc -I"/usr/lib64/R-4.2.1/lib/R/include" -DNDEBUG -I/usr/lib/jvm/java-11-openjdk-amd64/include -I/usr/lib/jvm/java-11-openjdk-amd64/include/linux -I/usr/local/include -fpic -g -O2 -c conftest.c -o conftest.o
gcc -shared -L/usr/lib64/R-4.2.1/lib/R/lib -L/usr/local/lib -o conftest.so conftest.o -L/usr/lib/jvm/java-11-openjdk-amd64/lib/server -ljvm -L/usr/lib64/R-4.2.1/lib/R/lib -lR
JAVA_HOME : /usr/lib/jvm/java-11-openjdk-amd64
Java library path: $(JAVA_HOME)/lib/server
JNI cpp flags : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
Updating Java configuration in /usr/lib64/R-4.2.1/lib/R
Done.
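See that post for reference. The command above updates /usr/lib64/R-4.2.1/lib/R/etc/ldpaths; the machine must be restarted before the change takes effect.
B. Install the rJava package.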
> install.packages("rJava")
library(rJava)
.jinit()
s <- .jnew("java/lang/String", "Hello World!")
print(s)
.jcall(s,"I","length")
s$length()
J("java.lang.Double", "parseDouble", "10.2")
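8. Call R from Java through Rserve.
Rserve exposes R through a standard TCP/IP service port, which gives traditional systems such as J2EE web applications the ability to run big-data analysis and produce plots. It extends those systems and makes effective use of existing software, hardware, and data assets, so as a system integration approach it is very cost-effective; data science companies such as Tableau use it too.
References: a detailed guide to Rserve, the R language server program, for configuring user authentication.
How to enable T.L.S 1.2 in R-Serve, for configuring SSL.
Rserve TLS SSL Support, for choosing the protocols and parameters to use.
Configuration directives supported by latest Rserve version.
A. Install; see the Rserve GitHub home page: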
> install.packages("Rserve")
B. Configure:
root@VM-0-14-ubuntu:/home/ubuntu# vi /etc/Rserv.conf
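The SSL connection uses the same self-signed certificate and key as before.
//Allow remote connections
remote enable
//Require user authentication for connections
auth required
//Do not allow plaintext passwords
plaintext disable
//Use UTF-8 encoding
encoding utf8
//Allow remote control of the background R process
control enable
//Use the qap+tls protocol
qap.tls.port 6311
//Rserve server key
tls.key /root/cert/server.key
//Rserve server certificate; here a self-signed certificate
tls.cert /root/cert/server.crt
//Optional: server CA certificate
tls.ca /root/cert/demoCA/cacert.pem
//Disable the unencrypted qap protocol
qap disable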
C. Configure automatic start at boot.
root@VM-0-14-ubuntu:/home/ubuntu# vi /etc/rc.d/rc.local
# Startup Rserve with /etc/Rserv.conf
cd /usr/lib64/R-4.2.1/bin
./R CMD Rserve --no-save
D. Test the SSL connection locally; to confirm, it can be compared with the Nginx SSL port configured earlier.
#openssl s_client -connect localhost:6311 -tls1_2
#openssl s_client -connect localhost:443 -tls1_2
root@VM-0-14-ubuntu:/usr/lib64/R-4.2.1/bin# openssl s_client -connect localhost:6311 -tls1_2
CONNECTED(00000003)
......
SSL-Session:
Protocol : TLSv1.2
Cipher : ECDHE-RSA-AES256-GCM-SHA384
Session-ID: D42CB7A107EDBAAC42CF290542C5A68CC8B364BAB837CC414C68A600C6F758D4
Session-ID-ctx:
Master-Key: 3C0166A753FB4D68B0068B3D068C8963FF32BEE82FD15765D73FA390350A87AB259AC0F11F4462E7110B6DE1BEBC1563
PSK identity: None
PSK identity hint: None
SRP username: None
TLS session ticket lifetime hint: 7200 (seconds)
TLS session ticket:
......
Start Time: 1668394246
Timeout : 7200 (sec)
Verify return code: 19 (self signed certificate in certificate chain)
Extended master secret: yes
---
Rsrv0103QAP1
ARucK9E ------
E. Test an R client connection locally, first with localhost and then with the public IP.
# R
> install.packages("RSclient")
> library(RSclient)
> conn<-RS.connect(host="127.0.0.1",tls=TRUE, verify=FALSE)
INFO: peer nas NO cert
> RS.login(conn,"ubuntu","password",authkey=RS.authkey(conn))
[1] TRUE
> RS.eval(conn,rnorm(5))
[1] 1.047752e+00 2.513512e+00 -7.279324e-01 -6.308483e-05 1.698458e-01
> conn<-RS.connect(host="106.52.33.185",tls=TRUE, verify=FALSE)
INFO: peer nas NO cert
> RS.login(conn,"ubuntu","password",authkey=RS.authkey(conn))
[1] TRUE
> RS.eval(conn,rnorm(5))
[1] -0.1932565 -1.7393902 1.7531404 0.1870313 1.7429810
>
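F. Test the R client connection from RStudio on the client machine, in the same way.
G. Add a dedicated Linux system user for remote access to Rserve, to improve security.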
#adduser rserve
#passwd rserve
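H. Call Rserve from Java.
For now, you can first look at the example deployed on another ordinary virtual host, a Tomcat web app in which Java calls R to generate a Weibo word cloud image. Because of Weibo's authentication mechanism, the locally stored Weibo token is only valid for a limited time. Once it expires, the app can no longer connect to Weibo to read data and the word cloud comes out blank; the program then has to be run in RStudio to authenticate again and obtain a valid token.
On the Java client side, the connection to Rserve goes mainly through the REngine package. Note that because Rserve is configured with SSL, the root certificate of the CA that issued the server's self-signed certificate must be added to the JDK's list of trusted root certificates: the JDK on the PC for development, and the JDK on the server for deployment. JDK 9 and later, including JDK 11, no longer ship a jre directory (one can be created by hand for compatibility); in any case, locate the JDK's lib/security directory, which contains the cacerts file. It is the same on Windows. Run the following commands to import the root certificate: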
root@VM-0-14-ubuntu:/usr/lib/jvm/java-11-openjdk-amd64/lib/security# ls
blacklisted.certs blocked.certs cacerts default.policy public_suffix_list.dat
root@VM-0-14-ubuntu:/usr/lib/jvm/java-11-openjdk-amd64/lib/security# keytool -import -alias Jean -file /root/cert/demoCA/cacert.pem -keystore cacerts -storepass changeit
Warning: use -cacerts option to access cacerts keystore
Owner: CN=RootCA, OU=Study, O=Jean, L=ZhuHai, ST=GD, C=CN
Issuer: CN=RootCA, OU=Study, O=Jean, L=ZhuHai, ST=GD, C=CN
Serial number: 40c0c7317eacfdaf07c9e692bded553d0806bce1
Valid from: Wed Nov 02 17:40:39 CST 2022 until: Sat Oct 30 17:40:39 CST 2032
Certificate fingerprints:
SHA1: EB:40:C8:EC:BA:D1:6B:21:CD:CA:89:CE:1B:54:42:C6:9F:FB:89:36
SHA256: CD:DE:44:9D:7F:3B:D8:D1:6A:0E:69:E3:AB:F7:89:3E:E5:C7:28:C0:6E:E2:7E:8F:DC:0F:44:C6:85:EE:CF:A3
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
......
Trust this certificate? [no]: y
Certificate was added to keystore
root@VM-0-14-ubuntu:/usr/lib/jvm/java-11-openjdk-amd64/lib/security# keytool -list -keystore cacerts -storepass changeit
Warning: use -cacerts option to access cacerts keystore
Keystore type: JKS
Keystore provider: SUN
Your keystore contains 128 entries
debian:ac_raiz_fnmt-rcm.pem, Nov 12, 2022, trustedCertEntry,
Certificate fingerprint (SHA-256): EB:C5:57:0C:29:01:8C:4D:67:B1:AA:12:7B:AF:12:F7:03:B4:61:1E:BC:17:B7:DA:B5:57:38:94:17:9B:93:FA
......
jean, Nov 14, 2022, trustedCertEntry,
Certificate fingerprint (SHA-256): CD:DE:44:9D:7F:3B:D8:D1:6A:0E:69:E3:AB:F7:89:3E:E5:C7:28:C0:6E:E2:7E:8F:DC:0F:44:C6:85:EE:CF:A3
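A later article will build a Tomcat web app project in Eclipse to demonstrate calling an R function through Rserve with parameters, having rmarkdown generate a PDF data analysis report from those parameters, and then showing a download link on the web app's result page. The Java project and code are fairly long and will probably need an article of their own, which will also take some time to write, so only the key points of the word cloud example are covered here.
1) First write a function CiYunTongJi() in RStudio that generates the word cloud image:
The function takes one parameter, and words whose frequency is below that minimum are left out of the word cloud. It calls the wordcloud2 package to generate the word cloud, which is an interactive web page that runs in a browser. Because there is no display window in a server-side background process, the htmlwidgets package is used to save the page to disk and the webshot package then takes a snapshot of the page, which yields the word cloud image. The rest of the code connects to Weibo, fetches some of the day's data, cleans and segments it, and arranges it into the data frame order2. When the R source is loaded, the code outside the functions runs first; later, when the Java side calls CiYunTongJi(), the function simply filters the entries in order2. The word cloud image is returned as the path of a temporary file, and the result page deletes that temporary file after the image has been sent.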
library(openxlsx)
library(Rweibo)
library(jiebaR)
library(tm)
library(wordcloud2)
library(webshot)
library(htmlwidgets)
freq<-15
CiYunTongJi<- function(freq1){
cp<-tryCatch({
as.integer(freq1)
}, error = function(e) {
15
})
if (cp<=0)
cp<-15;
order3<-order2[which(order2$freq>= cp),]
# Write to disk
fn<<-paste(getwd(),"/WeiBoCiYun",as.character(as.numeric(Sys.time())),".png",sep="")
tempfn = paste(getwd(),"/WeiBoCiYun",as.character(as.numeric(Sys.time())),".html",sep="")
ciyun<-wordcloud2(order3,size = 1,minRotation = -pi/3, maxRotation = pi/3,rotateRatio = 0.8,
fontFamily = "微软雅黑", color = "random-light", shape = 'star')
# Save the page temporarily
saveWidget(ciyun,tempfn , selfcontained = F)
# Take a screenshot of the page
webshot(tempfn, fn, delay = 0.5, vwidth = 800, vheight = 800)
results <<- list(f1 = fn)
return(results)
}
# Convert Weibo posts to a data frame
weibo2dataframe<-function(res){
#i<-1
dt<- data.frame(uid= character(), name= character(),screen_name= character(),
id= character(),text= character(), created_at= character())
for(i in 1:length(res)){
tmp<- data.frame(uid= res[[i]]$user$idstr, name= res[[i]]$user$name,screen_name= res[[i]]$user$screen_name,
id= res[[i]]$idstr,text= res[[i]]$text, created_at= res[[i]]$created_at)
dt<-rbind(dt,tmp)
}
return (dt)
}
roauth <- createOAuth("RJean", "rweibo")
# Fetch some Weibo posts in a loop and convert them to a data frame for a text-mining test
wdt<- data.frame(uid= character(), name= character(),screen_name= character(),
id= character(),text= character(), created_at= character())
for(i in 1:10){
res<- statuses.friends_timeline(roauth,page=i, count = 100)
if(length(res)>0){
tmp<-weibo2dataframe(res)
wdt<-rbind(wdt,tmp)
cat(length(res))
cat("\n")
}else{cat(i); cat("\n");break}
}
# Only look at topics from central media accounts; they must be followed first
#wdt2<-wdt[which(wdt$name %in% c("央视新闻","人民日报","人民网","新浪新闻","新华网","头条新闻","中国新闻网","环球时报","新浪博客","广东税务","珠海税务")),]
wdt2<-wdt
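# Convert the vector of Weibo texts into a corpus
ovid <- Corpus(VectorSource(wdt2$text))
# Then process each post: use regular expressions to strip @ mentions, punctuation, embedded images, and so on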
s1 <- gsub('[a-zA-Z0-9]','',ovid)
s1 <- gsub('[\\pP+~$`^=|<>~`$^+=|<>¥×]','',s1)
s1 <- gsub('①|②|③|④|⑤|⑥|⑦|⑧|⑨|℃|↓|→|丨','',s1)
# Remove common function words and filler characters
s1<-gsub("[的|和|了|来|与|到|由|等|从|以|一|为|在|上|各|去|对|侧|多|并|千|万|年|更|向|这是]","",s1)
# Word segmentation
seg<-worker()
seg<=s1
# Build word frequencies
freq2<-freq(segment(s1,seg))
# Sort by frequency
index <- order(-freq2[,2])
order2<<-freq2[index, ]
# Test call
# CiYunTongJi("20")
2) Call the CiYunTongJi() function from Java.
Define a page for entering the parameter; it calls the servlet /testR/CiYunRserve.
<%@ page language="java" contentType="text/html; charset=GBK"
pageEncoding="GBK"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=GBK">
<title>Insert title here</title>
</head>
<body>
<center>
<form action="/testR/CiYunRserve" method="get">
Minimum word frequency for returned Weibo terms<br/>
Frequency threshold: <input type="number" name="freq" value=20><br/>
<input type="submit" value="Start">
</form>
</center>
</body>
</html>
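Define a servlet, CiYunRserve.java, to handle the call to the R function. It first initializes a connection to Rserve through a helper class, then runs rc.voidEval("source(fn)") to load the source file CiYunTongJi.R, passes the parameter with rc.eval("freq<-'"+freq+"'"), and runs x = (REXP)rc.eval("CiYunTongJi(freq)") to call the R function and obtain the returned result list. It then closes the Rserve connection, stores the result in the session object, and redirects to the result page, which extracts the result, sends the image, and deletes the temporary file. These interface functions come from the Rsession package, a further wrapper around the REngine package that is convenient to use; see the Rsession home page for details.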
package test;
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
import org.math.R.Rsession;
import org.rosuda.REngine.REXP;
import org.rosuda.REngine.RList;
/**
* Servlet implementation class CiYunRserve
*/
//@WebServlet("/CiYunRserve")
public class CiYunRserve extends HttpServlet {
private static final long serialVersionUID = 1L;
/**
* @see HttpServlet#HttpServlet()
*/
public CiYunRserve() {
super();
// TODO Auto-generated constructor stub
}
/**
* @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
*/
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// TODO Auto-generated method stub
callRServe(request, response);
}
/**
* @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse response)
*/
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// TODO Auto-generated method stub
doGet(request, response);
}
public void callRServe(HttpServletRequest request, HttpServletResponse response) {
HttpSession session = request.getSession();
// if (session.getAttribute("rs") == null)
try {
String freq=request.getParameter("freq");
long t1 = System.currentTimeMillis();
Rsession rc =RServeHelper.getRsessionInstance();
String source = RServeHelper.prefix+"CiYunTongJi.R";
System.out.println(source);
rc.set("fn",source);
//rc.eval("source(fn)") always fails when loading the source; rc.voidEval("source(fn)") works
rc.voidEval("source(fn)");
rc.eval("freq<-'"+freq+"'");
REXP x ;
RList excels=null;
try{
x = (REXP)rc.eval("CiYunTongJi(freq)");
excels = x.asList();
} catch(Exception e){
e.printStackTrace();
}
// parse the result returned
for (int i = 0; i < excels.size(); i++) {
String gf = excels.at(i).asString();
System.out.println(gf);
}
long t2 = System.currentTimeMillis();
long t= (t2 - t1) / 1000 ;
System.out.println("Elapsed: " + t + " s");
RServeHelper.endRsession(rc);
session.setAttribute("rs", excels);
session.setAttribute("time", t);
} catch (Exception e) {
e.printStackTrace();
}
try {
System.out.println("Showing result");
response.sendRedirect("./cy/rcjg.jsp");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
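The servlet above builds the R statement by concatenating the raw request parameter into rc.eval("freq<-'"+freq+"'"), so whatever the browser sends ends up inside R code. A minimal sketch of validating the value first is shown below; FreqParam is a hypothetical helper introduced for illustration, and the fallback of 15 is taken from the R function's own default.

```java
/** Hypothetical helper: validates the freq request parameter before it reaches R. */
final class FreqParam {
    static final int DEFAULT_FREQ = 15; // same fallback that CiYunTongJi() uses

    /** Returns a positive integer, or DEFAULT_FREQ for null, non-numeric or non-positive input. */
    static int parse(String raw) {
        if (raw == null) {
            return DEFAULT_FREQ;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_FREQ;
        } catch (NumberFormatException e) {
            return DEFAULT_FREQ;
        }
    }

    /** Builds the R assignment from the validated integer, never from the raw string. */
    static String toRAssignment(String raw) {
        return "freq <- " + parse(raw) + "L";
    }

    public static void main(String[] args) {
        System.out.println(toRAssignment("20"));
        System.out.println(toRAssignment("1'); unlink('x') #"));
    }
}
```

In the servlet, the call would then look like rc.eval(FreqParam.toRAssignment(freq)); CiYunTongJi() still applies as.integer() to the value, so passing an R integer instead of a quoted string should not change its behavior.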
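In the result page, the word cloud image is actually sent by a separate image-serving servlet; the page only needs to give its URL in the link, and that servlet can delete the temporary file after sending the image.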
<%@ page language="java" contentType="text/html; charset=GBK"
pageEncoding="GBK"%>
<%@ page import="org.rosuda.REngine.RList,java.util.List"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=GBK">
<title>Word cloud</title>
</head>
<body>
<center>
Word cloud<br/>
<%
if (session.getAttribute("rs")!=null){
//RList rs=(RList)session.getAttribute("rs");
RList rs=(RList)session.getAttribute("rs");
for (int i = 0; i < rs.size(); i++) {
String gf = rs.at(i).asString();
System.out.println(gf);
}
%>
<center><img src="/testR/ServePic?keep=true&filename=<%=rs.at("f1").asString()%>"></center>
<br/>
<center>Elapsed: <%=session.getAttribute("time") %> s</center>
<%} %>
</center>
</body>
</html>
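The result page above places the file path returned by R directly into the query string of the image URL. If that path ever contains spaces or other reserved characters, the link breaks, so it may be worth URL-encoding it. The sketch below uses a hypothetical PicUrl helper; the /testR context path and the ServePic servlet name are taken from the page above, and it assumes the container decodes query parameters as UTF-8.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds the image URL used by the result page. */
final class PicUrl {
    /** Encodes the file name so that it survives as a single query parameter. */
    static String build(String contextPath, String filename, boolean keep) {
        String encoded = URLEncoder.encode(filename, StandardCharsets.UTF_8);
        return contextPath + "/ServePic?keep=" + keep + "&filename=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(build("/testR", "/tmp/Rpics/WeiBoCiYun1668394246.12.png", true));
    }
}
```

In the JSP, the img tag's src attribute would then be filled from PicUrl.build("/testR", rs.at("f1").asString(), true) instead of the hand-built string.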
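This Rserve connection helper class mainly handles the directory differences between the Windows development environment and the Linux deployment environment.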
package test;
import java.io.IOException;
import java.util.Properties;
import org.math.R.RserveSession;
import org.math.R.RserverConf;
import org.math.R.Rsession;
public class RServeHelper {
// private static boolean isWindows = false;
private static Rsession rsession = null;
private static String host = "124.223.110.20";
//public static String host="127.0.0.1";
public static String prefix = "../../../../home/jean/R/";
//public static String prefix="C:/Users/lenovo/Documents/Rscripts/";
public static String rpics = "/tmp/Rpics";
//public static String rpics="C:/Users/lenovo/Documents/Rpics";
/**
* Initialize RServe through Rsession
*
* @return
* @throws IOException
*/
public static Rsession initRserve() throws IOException {
// Read the Rserve settings (IP, username, password) from a configuration file
// Properties prop = PropertieHelper.getPropInstance("ssh.properties");
// String hostname = prop.getProperty("host");
// String username = prop.getProperty("username");
// String password = prop.getProperty("password");
// RserverConf rconf=new RserverConf(host,6311,username,password,new
// Properties());
Properties prop = new Properties();
prop.setProperty("tls", "true");
RserverConf rconf = new RserverConf(host, 6311, "rserve", "rserve@2022", prop);
rsession = RserveSession.newInstanceTry(System.out, rconf);
return rsession;
}
/**
* Create the Rsession singleton
*
* @return
* @throws IOException
*/
public static Rsession getRsessionInstance() throws IOException {
if (rsession == null) {
rsession = initRserve();
}
return rsession;
}
public static void endRsession(Rsession rs) {
rs.end();
rsession = null;
}
}
The image-serving servlet reads the image from Rserve and can delete the temporary file when required.
package test;
import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.math.R.Rsession;
/**
* Servlet implementation class ServePic
*/
@WebServlet("/ServePic")
public class ServePic extends HttpServlet {
private static final long serialVersionUID = 1L;
/**
* @see HttpServlet#HttpServlet()
*/
public ServePic() {
super();
}
/**
* @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse
* response)
*/
protected void doGet(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
String JPG = "image/png;charset=GB2312";
String gf = request.getParameter("filename");
String keep = request.getParameter("keep");
response.setContentType(JPG);// set the response content type
OutputStream out = response.getOutputStream();// get the output stream
callRServe(gf, out, keep);
out.flush();
out.close();
}
public void callRServe(String fn, OutputStream out, String keep) throws IOException {
Rsession rc =RServeHelper.getRsessionInstance();
try {
System.out.println(fn);
rc.getFile(out, fn);
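if(keep==null||!keep.equalsIgnoreCase("true")) {
// this is a new Rserve session, so pass the path in before deleting the file
rc.set("fn", fn);
rc.voidEval("unlink(fn)");
}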
RServeHelper.endRsession(rc);
} catch (Exception e) {
e.printStackTrace();
try{
RServeHelper.endRsession(rc);
}catch(Exception ex){
ex.printStackTrace();
}
}
}
/**
* @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse
* response)
*/
protected void doPost(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
// TODO Auto-generated method stub
}
}
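ServePic above sends back whatever path arrives in the filename parameter, so a request could ask for any file that the Rserve user can read. One way to narrow this, sketched below with a hypothetical PicPathGuard class, is to accept only names shaped like the ones CiYunTongJi() generates (WeiBoCiYun followed by a timestamp and .png) inside one allowed directory. The directory is an assumption: it should be whatever working directory Rserve actually writes the images to.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether a requested image path may be served. */
final class PicPathGuard {
    // File names produced by CiYunTongJi(): WeiBoCiYun<timestamp>.png
    private static final Pattern NAME = Pattern.compile("WeiBoCiYun[0-9.]+\\.png");

    /**
     * The check is purely lexical (normalize only, no file system access),
     * because the file lives on the Rserve host rather than the Tomcat host.
     */
    static boolean isAllowed(String requested, String allowedDir) {
        if (requested == null) {
            return false;
        }
        try {
            Path base = Paths.get(allowedDir).toAbsolutePath().normalize();
            Path target = Paths.get(requested).toAbsolutePath().normalize();
            Path name = target.getFileName();
            return name != null
                    && target.startsWith(base)
                    && NAME.matcher(name.toString()).matches();
        } catch (InvalidPathException e) {
            return false; // malformed input is rejected rather than served
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("/tmp/Rpics/WeiBoCiYun1668394246.12.png", "/tmp/Rpics"));
        System.out.println(isAllowed("/tmp/Rpics/../../etc/passwd", "/tmp/Rpics"));
    }
}
```

doGet could call PicPathGuard.isAllowed(gf, RServeHelper.rpics) and answer with HttpServletResponse.SC_FORBIDDEN when it returns false, provided rpics really is the directory the images are written to.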
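9. Deploy the web app in Tomcat.
Once the Tomcat web app has been developed and tested in Eclipse, it can be exported as a war file, uploaded and deployed to the server remotely, and then tested in a browser. This is fairly simple, so no more is said here.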