Virtual machine (CentOS 7): 192.168.198.131
Java 1.8
I. Hadoop installation
1. Set the hostname to master
vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
vim /etc/hosts
192.168.198.131 master
Reboot so the change takes effect: reboot
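As far as I know, CentOS 7 persists the hostname in /etc/hostname (written by hostnamectl), and the HOSTNAME line in /etc/sysconfig/network may not be honored there. The sketch below keeps the /etc/hosts mapping idempotent; add_host_entry is a hypothetical helper name, not part of any tool used in these notes.

```shell
#!/usr/bin/env bash
# Sketch: add "IP NAME" to a hosts file only if NAME is not already mapped.
# add_host_entry is a hypothetical helper, not a system command.
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Look for NAME as a whole field after the address column, ignoring comment lines.
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Usage on the VM (as root), commented out so the file is safe to source:
# hostnamectl set-hostname master
# add_host_entry 192.168.198.131 master
```

Running the helper twice leaves a single entry, so it can be repeated safely.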
2. Stop the firewall
systemctl stop firewalld
firewall-cmd --state
3. Set up passwordless SSH login. The start scripts need it later; it is configured in step 10.
4. Extract Hadoop 2.7.4
[root@master tools]# tar -zxvf hadoop-2.7.4.tar.gz
5. Check the JDK
[root@master hadoop-2.7.4]# java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
6. Check the Hadoop version
[root@master bin]# ./hadoop version
Error: JAVA_HOME is not set and could not be found.
Fix: set JAVA_HOME in the Hadoop environment file (etc/hadoop/hadoop-env.sh)
vim hadoop-env.sh
export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop
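The error above appears even though java -version works, because the Hadoop scripts read the JAVA_HOME variable rather than looking up java on PATH. A minimal sketch for checking a candidate path before writing it into hadoop-env.sh; is_java_home is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if DIR looks like a Java home (contains an executable bin/java).
# is_java_home is a hypothetical helper name.
is_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Usage:
# is_java_home /usr/local/tools/jdk1.8.0_161 || echo "JAVA_HOME path is wrong" >&2
```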
Check the version again:
[root@master bin]# ./hadoop version
Hadoop 2.7.4
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r cd915e1e8d9d0131462a0b7301586c175728a282
Compiled by kshvachk on 2017-08-01T00:29Z
Compiled with protoc 2.5.0
From source with checksum 50b0468318b4ce9bd24dc467b7ce1148
This command was run using /usr/local/tools/hadoop-2.7.4/share/hadoop/common/hadoop-common-2.7.4.jar
7. Edit the configuration files
[root@master hadoop]# pwd
/usr/local/tools/hadoop-2.7.4/etc/hadoop
vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop_repo</value>
</property>
</configuration>
vim hdfs-site.xml
Replication factor (1, since there is only one DataNode)
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
Run MapReduce jobs on YARN
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
vim yarn-site.xml
Auxiliary service run by the NodeManager (the MapReduce shuffle) and the environment variable whitelist
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
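8. Format HDFS before first use (similar to formatting a disk). Do not run this repeatedly; if formatting fails, delete the hadoop_repo directory and format again.
Make sure the path /data/hadoop_repo exists.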
bin/hdfs namenode -format
20/05/05 19:44:45 INFO common.Storage: Storage directory /data/hadoop_repo/dfs/name has been successfully formatted.
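Since formatting should not be repeated, a small guard can help. This is a sketch under one assumption: a formatted NameNode storage directory contains current/VERSION. namenode_formatted is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether a NameNode storage directory has already been formatted.
# Assumption: a formatted name directory contains current/VERSION.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage (path taken from the log line above):
# if namenode_formatted /data/hadoop_repo/dfs/name; then
#     echo "already formatted, skipping" >&2
# else
#     bin/hdfs namenode -format
# fi
```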
9. Environment variables
vim /etc/profile
export HADOOP_HOME=/usr/local/tools/hadoop-2.7.4
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}
10. Set up passwordless SSH login (ssh-keygen -t rsa)
Without it, start-all.sh keeps asking for host confirmation and a password:
[root@master sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
The authenticity of host 'master (192.168.198.131)' can't be established.
ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.
ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,192.168.198.131' (ECDSA) to the list of known hosts.
root@master's password:
master: starting namenode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-namenode-master.out
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.
ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
root@localhost's password: localhost: Permission denied, please try again.
localhost: starting datanode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:1C+x54x8j+iiBi/mdjnk8mcbfYEH0ilFTHe8DMFohNw.
ECDSA key fingerprint is MD5:db:63:90:91:26:0d:40:d2:61:f2:56:23:b9:75:db:3a.
Are you sure you want to continue connecting (yes/no)? yuH[[3~[[D[[D^[[C
Please type 'yes' or 'no': yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
root@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /data/hadoop_repo/logs/hadoop/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/tools/hadoop-2.7.4/logs/yarn-root-resourcemanager-master.out
root@localhost's password:
localhost: starting nodemanager, logging to /usr/local/tools/hadoop-2.7.4/logs/yarn-root-nodemanager-master.out
Before passwordless SSH is configured, ssh still asks for a password:
[root@master sbin]# ssh 192.168.198.131
root@192.168.198.131's password:
Last failed login: Tue May 5 19:51:21 PDT 2020 from localhost on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Tue May 5 19:28:21 2020 from 192.168.198.1
Configure passwordless SSH login:
[root@master sbin]# ssh-keygen -t rsa
Press Enter three times to accept the defaults (default key path, empty passphrase).
Then run:
[root@master ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
After that, ssh logs in without a password.
Type exit to leave the SSH session.
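The cat >> command above adds the key again every time it is rerun, and sshd (with its default StrictModes setting) rejects keys when ~/.ssh or authorized_keys is writable by others. A minimal sketch that handles both; authorize_key is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: append a public key to authorized_keys once and tighten permissions.
# authorize_key is a hypothetical helper name.
authorize_key() {
    local pub="$1" auth="$2" dir
    dir=$(dirname "$auth")
    mkdir -p "$dir" && chmod 700 "$dir"
    touch "$auth" && chmod 600 "$auth"
    # -x matches whole lines, -F treats the key as a fixed string,
    # so a key that is already present is not added a second time.
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}

# Usage:
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
# ssh master true    # should return without asking for a password
```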
11. Start
[root@master sbin]# ./start-all.sh
[root@master sbin]# jps
7202 Jps
6836 NodeManager
6709 ResourceManager
6536 SecondaryNameNode
6201 NameNode
6347 DataNode
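Reading the jps list by eye is easy to get wrong, so the check can be scripted. The sketch below reads jps output on stdin and prints any of the five daemons shown above that is missing; missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output on stdin and print each expected daemon that is absent.
# missing_daemons is a hypothetical helper; the daemon list is taken from the jps output above.
missing_daemons() {
    local out name
    out=$(cat)
    for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the name against the whole second field,
        # so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$out" | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
            || printf '%s\n' "$name"
    done
}

# Usage:
# jps | missing_daemons    # prints nothing when all five daemons are running
```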
12. Access the web UIs
http://localhost:8088 (YARN ResourceManager) and http://localhost:50070 (HDFS NameNode)
II. Hive installation
1. Extract the archive
2. Configure the environment variables
[root@master apache-hive-2.3.7]# hive --version
Hive 2.3.7
Git git://Alans-MacBook-Air.local/Users/gates/git/hive -r cb213d88304034393d68cc31a95be24f5aac62b6
Compiled by gates on Tue Apr 7 12:42:45 PDT 2020
From source with checksum 9da14e8ac4737126b00a1a47f662657e
3. Create hive-site.xml and set the metastore connection properties
[root@master conf]# cp hive-default.xml.template hive-site.xml
[root@master conf]# vim hive-site.xml
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.198.131:3306/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
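4. Copy the MySQL JDBC driver jar into hive/lib
5. Create the hive database in MySQL, then initialize the schema
Start MySQL and create the hive database, rather than using Hive's built-in metastore database (embedded Derby)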
[root@master mysql]# service mysqld start
/etc/init.d/mysqld: line 239: my_print_defaults: command not found
/etc/init.d/mysqld: line 259: cd: /usr/local/mysql: No such file or directory
Starting MySQL ERROR! Couldn't find MySQL server (/usr/local/mysql/bin/mysqld_safe)
CREATE DATABASE hive;
[root@master bin]# schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://192.168.198.131:3306/hive
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
[root@master bin]#
6. Run the hive command
[root@master apache-hive-2.3.7]# hive
which: no hbase in (/usr/local/tools/apache-hive-2.3.7/bin:/usr/local/tools/hadoop-2.7.4/bin:/usr/local/tools/hadoop-2.7.4/sbin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/tools/apache-hive-2.3.7/lib/hive-common-2.3.7.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
7. Check the database created in step 5; it now contains many tables
mysql -uroot -p
The two main tables to look at appear to be TBLS and COLUMNS_V2
8. Test
[root@master apache-hive-2.3.7]# hive
which: no hbase in (/usr/local/tools/apache-hive-2.3.7/bin:/usr/local/tools/hadoop-2.7.4/bin:/usr/local/tools/hadoop-2.7.4/sbin:/usr/local/tools/node/bin:/usr/local/tools/apache-maven-3.6.3/bin:/usr/local/tools/jdk1.8.0_161/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/tools/apache-hive-2.3.7/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/tools/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/tools/apache-hive-2.3.7/lib/hive-common-2.3.7.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
OK
Time taken: 4.426 seconds
hive> create database hive_1;
OK
Time taken: 0.198 seconds
hive> show databases;
OK
default
hive_1
Time taken: 0.03 seconds, Fetched: 2 row(s)
hive>
Check what Hive stored in HDFS:
[root@master ~]# hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwx-wx-wx - root supergroup 0 2020-05-05 22:29 /tmp
drwx-wx-wx - root supergroup 0 2020-05-05 22:29 /tmp/hive
drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root
drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root/1fed50ca-d9f6-4b5c-b80b-a81a66679812
drwx------ - root supergroup 0 2020-05-06 00:42 /tmp/hive/root/1fed50ca-d9f6-4b5c-b80b-a81a66679812/_tmp_space.db
drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user
drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive
drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive/warehouse
drwxr-xr-x - root supergroup 0 2020-05-06 00:53 /user/hive/warehouse/hive_1.db
III. Kafka installation (pseudo-distributed); ZooKeeper must be installed first
1. Install
# tar zxvf kafka_2.11-2.2.1.tgz
# mv kafka_2.11-2.2.1 kafka
# cd kafka
Start the Kafka service
# nohup bin/kafka-server-start.sh config/server.properties &
Create a topic
# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List topics
# bin/kafka-topics.sh --list --zookeeper localhost:2181
2. Test
Send messages with kafka-console-producer.sh
# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Consume messages with kafka-console-consumer.sh
# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
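3. Kafka cluster
Create several server properties files under config, each with a different broker.id (on a single host each broker also needs its own listener port and log.dirs)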
bin/kafka-server-start.sh config/server-1.properties &
ZooKeeper must be started first
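The per-broker files can be derived from the stock config/server.properties. The following is a minimal sketch: make_broker_config is a hypothetical helper, and the port numbers and the /tmp/kafka-logs-N directory scheme are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive a per-broker properties file from a base server.properties.
# make_broker_config is a hypothetical helper; the log-dir naming is an assumption.
make_broker_config() {
    local base="$1" id="$2" port="$3" out="$4"
    # Drop the settings that must be unique per broker (active or commented out), then re-add them.
    grep -Ev '^#?(broker\.id|listeners|log\.dirs)=' "$base" > "$out"
    {
        printf 'broker.id=%s\n' "$id"
        printf 'listeners=PLAINTEXT://:%s\n' "$port"
        printf 'log.dirs=/tmp/kafka-logs-%s\n' "$id"
    } >> "$out"
}

# Usage, from the kafka directory:
# make_broker_config config/server.properties 1 9093 config/server-1.properties
# make_broker_config config/server.properties 2 9094 config/server-2.properties
# bin/kafka-server-start.sh config/server-1.properties &
```

All other settings, such as zookeeper.connect, are copied unchanged from the base file.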
IV. HBase installation
1. Extract the archive and configure the environment variables
[root@master hbase-1.4.13]# vim /etc/profile
HBASE_HOME=/usr/local/tools/hbase-1.4.13
export PATH=${HBASE_HOME}/bin:${PATH}
[root@master hbase-1.4.13]# source /etc/profile
2. Edit the configuration files
Add to hbase-env.sh:
export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export HBASE_MANAGES_ZK=false
Change hbase-site.xml to:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/tools/hbase-1.4.13/data</value>
</property>
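<!-- ZooKeeper location. With a pseudo-cluster on one host there is a single value; for a real cluster, list the hostnames separated by commas. -->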
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
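<!--
false means standalone mode, true means distributed mode.
This must be true here; otherwise HBase still starts its bundled ZooKeeper, which conflicts with the external ZooKeeper that is already running and prevents HBase from starting.
-->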
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
</configuration>
3. Start HBase
[root@master bin]# ./start-hbase.sh
Access the HBase web UI on port 16010
4. Troubleshooting
1)
running master, logging to /usr/local/tools/hbase-1.4.13/bin/../logs/hbase-root-master-master.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Solution:
As the warnings say, hbase-env.sh passes JVM options that no longer exist in JDK 8. The file contains the following:
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
Comment out these two export lines.
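The same change can be scripted with sed. This is a sketch, not the only way to do it: disable_permsize is a hypothetical helper, and it keeps a .bak copy of the original file next to it.

```shell
#!/usr/bin/env bash
# Sketch: comment out every active line that mentions PermSize.
# disable_permsize is a hypothetical helper; it leaves FILE.bak as a backup.
disable_permsize() {
    # For lines that are not already comments, prefix "# " when they contain PermSize.
    sed -i.bak '/^[[:space:]]*#/!{/PermSize/s/^/# /;}' "$1"
}

# Usage:
# disable_permsize /usr/local/tools/hbase-1.4.13/conf/hbase-env.sh
```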
2)
HMaster and HRegionServer are the two HBase processes, but jps showed that neither had started, so check the error in the configured logs directory. The log says:
Could not start ZK at requested port of 2181. ZK was started at port: 2182. Aborting as clients (e.g. shell) will not be able to find this ZK quorum.
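However, hbase-env.sh already sets export HBASE_MANAGES_ZK=false.
That setting tells HBase not to manage its own ZooKeeper, so in principle the standalone ZooKeeper should be used, yet the error persists. The cause is a conflict between the bundled ZooKeeper and the standalone one: when hbase.cluster.distributed is false, HBase runs in standalone mode and still starts its bundled ZooKeeper.
So set the value to true as follows: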
vim conf/hbase-site.xml
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
3)
2020-05-07 02:57:17,302 INFO [main-SendThread(192.168.181.131:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.181.131/192.168.181.131:2181. Will not attempt to authenticate using SASL (unknown error)
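Fix: in hbase-site.xml, set the ZooKeeper host to the hostname (master); it was previously an IP address.
V. Solr cluster (solr-7.5.0); prerequisite: the ZooKeeper pseudo-cluster is already configured
1. Extract the archive
2. Start it as a test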
solr start
3. Configure the ZooKeeper ensemble and SOLR_PORT. The ZooKeeper pseudo-cluster is already set up; Solr needs ZK_HOST and SOLR_PORT set in solr.in.sh
[root@master bin]# vim solr.in.sh
ZK_HOST="192.168.198.131:2181,192.168.198.131:2182,192.168.198.131:2183"
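The same host:port,host:port,... string is needed again in the Atlas configuration, so it can be worth generating it rather than typing it. A minimal sketch; zk_connect_string is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: join one host with several client ports into a ZooKeeper connect string.
# zk_connect_string is a hypothetical helper name.
zk_connect_string() {
    local host="$1" port result=""
    shift
    for port in "$@"; do
        # Add a comma only between entries, never before the first one.
        result="${result:+$result,}$host:$port"
    done
    printf '%s\n' "$result"
}

# Usage:
# ZK_HOST="$(zk_connect_string 192.168.198.131 2181 2182 2183)"
```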
4. Create the Solr collections
bash $SOLR_HOME/bin/solr create -c vertex_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2
bash $SOLR_HOME/bin/solr create -c edge_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2
bash $SOLR_HOME/bin/solr create -c fulltext_index -d $SOLR_HOME/apache-atlas-conf -shards 2 -replicationFactor 2
5. Start the Solr cluster
ZooKeeper pseudo-cluster
Solr pseudo-cluster
/usr/local/tools/solr-cloud/solr1/bin/solr start -force
/usr/local/tools/solr-cloud/solr2/bin/solr start -force
/usr/local/tools/solr-cloud/solr3/bin/solr start -force
/usr/local/tools/solr-cloud/solr4/bin/solr start -force
/usr/local/tools/solr-cloud/solr1/bin/solr stop
/usr/local/tools/solr-cloud/solr2/bin/solr stop
/usr/local/tools/solr-cloud/solr3/bin/solr stop
/usr/local/tools/solr-cloud/solr4/bin/solr stop
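[root@master bin]# ./solr create_collection -c test_collection -shards 2 -replicationFactor 2 -force
-c: name of the collection
-shards: number of shards (short form -s); the index data is distributed across them
-replicationFactor: number of replicas per shard
-force: needed because Solr refuses to run as root by default; other accounts can omit it
The Solr cluster is now complete
Reference: https://blog.csdn.net/qq_37936542/article/details/83113083
VI. Apache Atlas standalone deployment
Build that uses the HBase and Solr embedded in Atlas: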
/usr/local/project/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server
Build that does not use the embedded HBase and Solr:
/usr/local/project/apache-atlas-sources-2.0.0-alone
[root@master apache-atlas-sources-2.0.0-alone]# mvn clean -DskipTests package -Pdist
After the build completes, use distro/target/apache-atlas-2.0.0-server
Integrate Solr with Apache Atlas
cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf
[root@master conf]# cp -r solr/ /usr/local/tools/solr-7.5.0/apache-atlas-conf
Standalone deployment: mainly a matter of editing the configuration files
atlas-env.sh
export HBASE_CONF_DIR=/usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf
atlas-application.properties
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######### Graph Database Configs #########
# Graph Database
#Configures the graph database to use. Defaults to JanusGraph
atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase
# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus
#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=master:2181,master:2182,master:2183
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=
# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true
# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1
# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1
# Graph Search Index
atlas.graph.index.search.backend=solr
#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=master:2181,master:2182,master:2183
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: http://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true
# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150
######### Notification Configs #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=master:2181/kafka,master:2182/kafka,master:2183/kafka
atlas.kafka.bootstrap.servers=master:9092,master:9093,master:9094
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443
######### Security Properties #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none
#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true
######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>
######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>
######### JAAS Configuration ########
#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM
######### Server Properties #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
#atlas.server.run.setup.on.start=false
######### Entity Audit Configs #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=master:2181,master:2182,master:2183
######### High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
######### Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=
######### Performance Configs #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000
######### CSRF Configs #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=
############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=
######### Compiled Query Cache Configuration #########
# The size of the compiled query cache. Older queries will be evicted from the cache
# when we reach the capacity.
#atlas.CompiledQueryCache.capacity=1000
# Allows notifications when items are evicted from the compiled query
# cache because it has become full. A warning will be issued when
# the specified number of evictions have occurred. If the eviction
# warning threshold <= 0, no eviction warnings will be issued.
#atlas.CompiledQueryCache.evictionWarningThrottle=0
######### Full Text Search Configuration #########
#Set to false to disable full text search.
#atlas.search.fulltext.enable=true
######### Gremlin Search Configuration #########
#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false
########## Add http headers ###########
#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
######### Sqoop Hook Configs #######
atlas.hook.sqoop.synchronous=false
atlas.hook.sqoop.numRetries=3
atlas.hook.sqoop.queueSize=10000
storage.cql.protocol-version=3
storage.cql.local-core-connections-per-host=10
storage.cql.local-max-connections-per-host=20
storage.cql.local-max-requests-per-connection=2000
storage.buffer-size=1024
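Several keys in this file must point at the same ZooKeeper ensemble (atlas.graph.storage.hostname, atlas.graph.index.search.solr.zookeeper-url, atlas.audit.hbase.zookeeper.quorum). The sketch below reads a key so the values can be compared; prop_get is a hypothetical helper that handles plain key=value lines only, not line continuations or ':' separators.

```shell
#!/usr/bin/env bash
# Sketch: print the value of KEY from a Java-style .properties file (last assignment wins).
# prop_get is a hypothetical helper; it only understands plain key=value lines.
prop_get() {
    local file="$1" key="$2"
    awk -v k="$key" '
        /^[[:space:]]*[#!]/ { next }    # skip comment lines
        {
            i = index($0, "=")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
            if (name == k) { val = substr($0, i + 1); found = 1 }
        }
        END { if (found) print val; exit !found }
    ' "$file"
}

# Usage:
# f=conf/atlas-application.properties
# [ "$(prop_get "$f" atlas.graph.storage.hostname)" = "$(prop_get "$f" atlas.audit.hbase.zookeeper.quorum)" ] \
#     || echo "ZooKeeper settings differ" >&2
```

The function returns a non-zero status when the key is absent or only present in a commented-out line.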
VII. Atlas standalone deployment troubleshooting
1)
Could not find hbase-site.xml in %s. Please set env var HBASE_CONF_DIR to the hbase client conf dir
Is the symlink wrong?
cd /usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0
[root@cdh632-worker03 atlas]# ln -s /etc/hbase/conf /opt/module/apache-atlas-sources-2.0.0/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf
[root@cdh632-worker03 atlas]# pwd
vim atlas-env.sh
export HBASE_CONF_DIR=/usr/local/project/apache-atlas-sources-2.0.0-alone/distro/target/apache-atlas-2.0.0-server/apache-atlas-2.0.0/conf/hbase/conf
2)
Error at startup:
2020-05-23 10:30:46,794 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)
java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
Fix: edit the configuration file and set the ZooKeeper addresses to:
master:2181,master:2182,master:2183
Reference:
https://blog.csdn.net/qq_34024275/article/details/105393745
The graph database is set up as follows:
The configuration file sets where the graph data and the index are stored:
atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase
# Graph Storage
atlas.graph.storage.backend=hbase
atlas.graph.storage.port=2181
atlas.graph.storage.hbase.table=atlas-test
atlas.graph.storage.hostname=docker2,docker3,docker4
# Graph Search Index Backend
atlas.graph.index.search.backend=elasticsearch
atlas.graph.index.search.hostname=127.0.0.1
atlas.graph.index.search.index-name=atlas_test
3)
at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily
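Atlas expects HBase 2.0, but the locally running HBase is 1.4.13
Solution: run a matching HBase 2.x (the hbase-site.xml below uses hbase-2.2.4):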
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/usr/local/tools/hbase-2.2.4/data</value>
</property>
<!-- ZooKeeper location -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>master</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
4)
2020-05-23 23:52:00,415 WARN - [main:] ~ JanusGraphException: Could not open global configuration (AtlasJanusGraphDatabase:167)
2020-05-23 23:52:00,432 WARN - [main:] ~ Unexpected exception during getDeployment() (HBaseStoreManager:399)
java.lang.RuntimeException: org.janusgraph.diskstorage.TemporaryBackendException: Temporary failure in storage backend
Add to the configuration file:
storage.cql.protocol-version=3
storage.cql.local-core-connections-per-host=10
storage.cql.local-max-connections-per-host=20
storage.cql.local-max-requests-per-connection=2000
storage.buffer-size=1024
5)
Caused by: org.apache.solr.common.SolrException: Cannot connect to cluster at master:2181,master:2182,master:2183/solr: cluster not found/not ready
at org.apache.solr.common.cloud.ZkStateReader.createClusterStateWatchersAndUpdate(ZkStateReader.java:385)
at org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.connect(ZkClientClusterStateProvider.java:141)
at org.apache.solr.client.solrj.impl.CloudSolrClient.connect(CloudSolrClient.java:383)
at org.janusgraph.diskstorage.solr.Solr6Index.<init>(Solr6Index.java:218)
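The ZooKeeper URL was master:2181,master:2182,master:2183/solr
Changing it to master:2181,master:2182,master:2183 (without the /solr suffix) fixes this.
Appendix:
1. Java environment variables
vim /etc/profile
Add the following: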
export JAVA_HOME=/usr/local/tools/jdk1.8.0_161
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Save and exit, then apply the settings:
source /etc/profile
2. ZooKeeper pseudo-distributed setup
[root@master zookeeper-01]# cd data/
[root@master data]# touch myid
[root@master data]# echo 1 >> myid
Edit the configuration file: rename zoo_sample.cfg in the conf directory to zoo.cfg (remember to change the IP to your own)
server.1=192.168.198.131:2881:3881
server.2=192.168.198.131:2882:3882
server.3=192.168.198.131:2883:3883
Error when starting the ZooKeeper cluster:
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException
dataDir in zoo.cfg must point to the data directory you created, because that is where the myid file lives
dataDir=/usr/local/tools/zk-cloud/zookeeper01/data
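The per-instance myid and zoo.cfg files can be generated in one pass, which avoids the dataDir mismatch described above. This is a sketch under stated assumptions: setup_zk_instances is a hypothetical helper, the zookeeper01..0N directory names and client ports 2181 upward follow the paths and ports used elsewhere in these notes, and tickTime, initLimit and syncLimit are the values shipped in zoo_sample.cfg.

```shell
#!/usr/bin/env bash
# Sketch: write myid and zoo.cfg for COUNT ZooKeeper instances on one host.
# setup_zk_instances is a hypothetical helper; directory names and ports are assumptions.
setup_zk_instances() {
    local base="$1" ip="$2" count="$3" i j dir
    for i in $(seq 1 "$count"); do
        dir="$base/zookeeper0$i"
        mkdir -p "$dir/data" "$dir/conf"
        # myid must match the N of this instance's server.N line.
        echo "$i" > "$dir/data/myid"
        {
            echo "tickTime=2000"
            echo "initLimit=10"
            echo "syncLimit=5"
            echo "dataDir=$dir/data"
            echo "clientPort=$((2180 + i))"
            for j in $(seq 1 "$count"); do
                echo "server.$j=$ip:$((2880 + j)):$((3880 + j))"
            done
        } > "$dir/conf/zoo.cfg"
    done
}

# Usage:
# setup_zk_instances /usr/local/tools/zk-cloud 192.168.198.131 3
```

Writing myid with > instead of >> keeps the file to a single line even if the setup is rerun.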
Create a start script so the instances do not have to be started one by one
vim zk-start.sh
cd zookeeper01/bin
./zkServer.sh start
cd ../../
cd zookeeper02/bin
./zkServer.sh start
cd ../../
cd zookeeper03/bin
./zkServer.sh start
cd ../../
chmod -R 755 zk-start.sh
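A loop version of the script above can also serve for stop and status. A minimal sketch: zk_all is a hypothetical helper, and BASE is assumed to be the directory that holds zookeeper01, zookeeper02 and so on.

```shell
#!/usr/bin/env bash
# Sketch: run zkServer.sh with the same action in every instance directory.
# zk_all is a hypothetical helper; BASE holds zookeeper01, zookeeper02, ...
zk_all() {
    local base="$1" action="$2" dir rc=0
    for dir in "$base"/zookeeper0*; do
        # The subshell keeps the cd local, so no "cd ../../" is needed afterwards.
        ( cd "$dir/bin" && ./zkServer.sh "$action" ) || rc=1
    done
    return "$rc"
}

# Usage:
# zk_all /usr/local/tools/zk-cloud start
# zk_all /usr/local/tools/zk-cloud status
```

The function returns non-zero if any instance fails, which makes it usable in other scripts.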
Once ZooKeeper has started, check it with zkServer.sh status