ClickHouse Sharded Single-Replica Cluster Deployment

Environment

  • Every machine already has standalone ClickHouse installed; the official recommendation is the rpm install, though you can also install one node and copy the files to the rest
  • CentOS Linux release 7.8.2003 (Core)
  • ClickHouse 20.11.6.6
  • Firewall and SELinux are disabled
Shard IPs (single replica only)
Shard 01 192.168.66.101:9000
Shard 02 192.168.66.102:9000
Shard 03 192.168.66.103:9000

Limitations

The configuration here uses 3 shards with 1 replica each; two machines with 2 shards would also work.
With a single replica per shard, data is split across the shards, but if any one machine goes down, the whole cluster becomes unusable.



Configure /etc/hosts [all machines]

echo "192.168.66.101 yqtest1" >> /etc/hosts
echo "192.168.66.102 yqtest2" >> /etc/hosts
echo "192.168.66.103 yqtest3" >> /etc/hosts

Installation [all machines]

Installation steps are omitted here; see the previous article.

Cluster configuration

The cluster configuration takes effect immediately once added to the configuration files.

Configure shard and replica information [required on all machines]

You can finish the configuration on one node and copy the file to the others (a copy command sketch follows the XML below), or configure each node separately.
vi /etc/clickhouse-server/config.d/metrika.xml

<?xml version="1.0"?>
<yandex>
    <!-- Cluster-related configuration -->
    <clickhouse_remote_servers>
        <!-- Custom cluster name: ckcluster_3shards_1replicas -->
        <ckcluster_3shards_1replicas>
            <!-- Shard 1 -->
            <shard>
                <internal_replication>false</internal_replication>
                <!-- yqtest1, single replica -->
                <replica>
                    <host>yqtest1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Shard 2 -->
            <shard>
                <internal_replication>false</internal_replication>
                <!-- yqtest2, single replica -->
                <replica>
                    <host>yqtest2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <!-- Shard 3 -->
            <shard>
                <internal_replication>false</internal_replication>
                <!-- yqtest3, single replica -->
                <replica>
                    <host>yqtest3</host>
                    <port>9000</port>
                </replica>
            </shard>
        </ckcluster_3shards_1replicas>
    </clickhouse_remote_servers>
    <!-- Compression settings -->
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
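
If you finish the file on one node first, copying it to the other two can look like the following (a sketch; it assumes root SSH access between the machines):

scp /etc/clickhouse-server/config.d/metrika.xml root@yqtest2:/etc/clickhouse-server/config.d/
scp /etc/clickhouse-server/config.d/metrika.xml root@yqtest3:/etc/clickhouse-server/config.d/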

Note: the internal_replication setting matters a great deal once the cluster has multiple replicas per shard.

  • Non-replicated local tables
    <internal_replication>false</internal_replication>
    Writing to the distributed table: data is inserted into the local table of every replica, so the replicas stay in sync. Writing to a single local table: data only lands on that server, so different servers return different query results.
    <internal_replication>true</internal_replication>
    Writing to the distributed table: data is inserted into the local table of only one replica and is never synchronized, so the replicas diverge. Officially not recommended.
  • Replicated local tables (ReplicatedMergeTree)
    <internal_replication>false</internal_replication>
    Writing to the distributed table: data is sharded and inserted into every local table, producing duplicates; the replicated table deduplicates them automatically, but at a performance cost.
    <internal_replication>true</internal_replication>
    Writing to the distributed table or a local table: data is written once to one replica of the shard and then replicated automatically to the other replicas. This is the officially recommended combination.
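
For comparison only (this deployment keeps a single replica per shard), a shard with two replicas using the recommended <internal_replication>true</internal_replication> would look roughly like the fragment below; the hostnames are placeholders, and the local tables would need to be ReplicatedMergeTree for the automatic replication to apply:

<shard>
    <internal_replication>true</internal_replication>
    <replica>
        <host>yqtest1</host>
        <port>9000</port>
    </replica>
    <replica>
        <host>yqtest2</host>
        <port>9000</port>
    </replica>
</shard>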

Reference the configuration from the config file [all servers]

vi /etc/clickhouse-server/config.xml
Search for metrika and add the include directive below that comment:

<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->
![image.png](https://upload-images.jianshu.io/upload_images/24929690-70704ecd1c4a6af5.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
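
The line added below that comment is presumably an include_from element pointing at the metrika.xml created earlier, along the lines of:

<include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>

The clickhouse_remote_servers and clickhouse_compression sections in metrika.xml are then pulled in through the substitution mechanism described in the comment.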

View cluster information [any node]

clickhouse-client -h 192.168.66.101 --port 9000 --user default --query "select * from system.clusters";
clickhouse-client -h 192.168.66.102 --port 9000 --user default --query "select * from system.clusters";
clickhouse-client -h 192.168.66.103 --port 9000 --user default --query "select * from system.clusters";
  • A strange issue came up here: no matter how node 103 was queried, the is_local column kept pointing at 101; restarting clickhouse-server on 103 fixed it.
  • Take this kind of inconsistency seriously, otherwise queries against the distributed table may return inconsistent data; a narrower query to check it is sketched after this list.
  • The built-in test clusters that also appear in the output can be adjusted by editing the corresponding entries in clickhouse-server/config.xml.
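
To look at just the columns that matter for this cluster (a narrower variant of the queries above):

clickhouse-client -h 192.168.66.101 --port 9000 --user default --query "SELECT cluster, shard_num, replica_num, host_name, host_address, port, is_local FROM system.clusters WHERE cluster = 'ckcluster_3shards_1replicas'"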

Data import test

Import test data [101]

Official test dataset download:
[405 MB] https://datasets.clickhouse.tech/visits/tsv/visits_v1.tsv.xz
The file only needs to be uploaded to one of the servers; in this example it lives on 101. The database must be created on all nodes in advance.
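
If 101 has outbound network access, the file can also be fetched directly on that machine (a sketch; otherwise just upload it by hand):

curl -O https://datasets.clickhouse.tech/visits/tsv/visits_v1.tsv.xz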

# Decompress; about 2.5 GB after decompression
unxz visits_v1.tsv.xz

-- Create the database [all nodes]
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS yqtest"

-- Create the table [101]
clickhouse-client --query "CREATE TABLE yqtest.visits_v1 ( CounterID UInt32,  StartDate Date,  Sign Int8,  IsNew UInt8,  VisitID UInt64,  UserID UInt64,  StartTime DateTime,  Duration UInt32,  UTCStartTime DateTime,  PageViews Int32,  Hits Int32,  IsBounce UInt8,  Referer String,  StartURL String,  RefererDomain String,  StartURLDomain String,  EndURL String,  LinkURL String,  IsDownload UInt8,  TraficSourceID Int8,  SearchEngineID UInt16,  SearchPhrase String,  AdvEngineID UInt8,  PlaceID Int32,  RefererCategories Array(UInt16),  URLCategories Array(UInt16),  URLRegions Array(UInt32),  RefererRegions Array(UInt32),  IsYandex UInt8,  GoalReachesDepth Int32,  GoalReachesURL Int32,  GoalReachesAny Int32,  SocialSourceNetworkID UInt8,  SocialSourcePage String,  MobilePhoneModel String,  ClientEventTime DateTime,  RegionID UInt32,  ClientIP UInt32,  ClientIP6 FixedString(16),  RemoteIP UInt32,  RemoteIP6 FixedString(16),  IPNetworkID UInt32,  SilverlightVersion3 UInt32,  CodeVersion UInt32,  ResolutionWidth UInt16,  ResolutionHeight UInt16,  UserAgentMajor UInt16,  UserAgentMinor UInt16,  WindowClientWidth UInt16,  WindowClientHeight UInt16,  SilverlightVersion2 UInt8,  SilverlightVersion4 UInt16,  FlashVersion3 UInt16,  FlashVersion4 UInt16,  ClientTimeZone Int16,  OS UInt8,  UserAgent UInt8,  ResolutionDepth UInt8,  FlashMajor UInt8,  FlashMinor UInt8,  NetMajor UInt8,  NetMinor UInt8,  MobilePhone UInt8,  SilverlightVersion1 UInt8,  Age UInt8,  Sex UInt8,  Income UInt8,  JavaEnable UInt8,  CookieEnable UInt8,  JavascriptEnable UInt8,  IsMobile UInt8,  BrowserLanguage UInt16,  BrowserCountry UInt16,  Interests UInt16,  Robotness UInt8,  GeneralInterests Array(UInt16),  Params Array(String),  Goals Nested(ID UInt32, Serial UInt32, EventTime DateTime,  Price Int64,  OrderID String, CurrencyID UInt32),  WatchIDs Array(UInt64),  ParamSumPrice Int64,  ParamCurrency FixedString(3),  ParamCurrencyID UInt16,  ClickLogID UInt64,  ClickEventID Int32,  ClickGoodEvent Int32,  ClickEventTime DateTime,  ClickPriorityID Int32,  ClickPhraseID Int32,  ClickPageID Int32,  ClickPlaceID Int32,  ClickTypeID Int32,  ClickResourceID Int32,  ClickCost UInt32,  ClickClientIP UInt32,  ClickDomainID UInt32,  ClickURL String,  ClickAttempt UInt8,  ClickOrderID UInt32,  ClickBannerID UInt32,  ClickMarketCategoryID UInt32,  ClickMarketPP UInt32,  ClickMarketCategoryName String,  ClickMarketPPName String,  ClickAWAPSCampaignName String,  ClickPageName String,  ClickTargetType UInt16,  ClickTargetPhraseID UInt64,  ClickContextType UInt8,  ClickSelectType Int8,  ClickOptions String,  ClickGroupBannerID Int32,  OpenstatServiceName String,  OpenstatCampaignID String,  OpenstatAdID String,  OpenstatSourceID String,  UTMSource String,  UTMMedium String,  UTMCampaign String,  UTMContent String,  UTMTerm String,  FromTag String,  HasGCLID UInt8,  FirstVisit DateTime,  PredLastVisit Date,  LastVisit Date,  TotalVisits UInt32,  TraficSource    Nested(ID Int8,  SearchEngineID UInt16, AdvEngineID UInt8, PlaceID UInt16, SocialSourceNetworkID UInt8, Domain String, SearchPhrase String, SocialSourcePage String),  Attendance FixedString(16),  CLID UInt32,  YCLID UInt64,  NormalizedRefererHash UInt64,  SearchPhraseHash UInt64,  RefererDomainHash UInt64,  NormalizedStartURLHash UInt64,  StartURLDomainHash UInt64,  NormalizedEndURLHash UInt64,  TopLevelDomain UInt64,  URLScheme UInt64,  OpenstatServiceNameHash UInt64,  OpenstatCampaignIDHash UInt64,  OpenstatAdIDHash UInt64,  OpenstatSourceIDHash UInt64,  UTMSourceHash UInt64,  UTMMediumHash 
UInt64,  UTMCampaignHash UInt64,  UTMContentHash UInt64,  UTMTermHash UInt64,  FromHash UInt64,  WebVisorEnabled UInt8,  WebVisorActivity UInt32,  ParsedParams    Nested(Key1 String,  Key2 String,  Key3 String,  Key4 String, Key5 String, ValueDouble    Float64),  Market Nested(Type UInt8, GoalID UInt32, OrderID String,  OrderPrice Int64,  PP UInt32,  DirectPlaceID UInt32,  DirectOrderID  UInt32,  DirectBannerID UInt32,  GoodID String, GoodName String, GoodQuantity Int32,  GoodPrice Int64),  IslandID FixedString(16)) ENGINE = CollapsingMergeTree(Sign) PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"

-- Import the data [101]
cat visits_v1.tsv | clickhouse-client --query "INSERT INTO yqtest.visits_v1 FORMAT TSV" --max_insert_block_size=100000

-- Check the imported row count; the data was imported only on 101, so it exists only there [101]
yqtest1 :) select count(*) from visits_v1;
┌─count()─┐
│ 1681077 │
└─────────┘

Create the local table and the distributed table [all machines]

Create the local table

clickhouse-client --query "CREATE TABLE yqtest.test_local ( CounterID UInt32,  StartDate Date,  Sign Int8,  IsNew UInt8,  VisitID UInt64,  UserID UInt64,  StartTime DateTime,  Duration UInt32,  UTCStartTime DateTime,  PageViews Int32,  Hits Int32,  IsBounce UInt8,  Referer String,  StartURL String,  RefererDomain String,  StartURLDomain String,  EndURL String,  LinkURL String,  IsDownload UInt8,  TraficSourceID Int8,  SearchEngineID UInt16,  SearchPhrase String,  AdvEngineID UInt8,  PlaceID Int32,  RefererCategories Array(UInt16),  URLCategories Array(UInt16),  URLRegions Array(UInt32),  RefererRegions Array(UInt32),  IsYandex UInt8,  GoalReachesDepth Int32,  GoalReachesURL Int32,  GoalReachesAny Int32,  SocialSourceNetworkID UInt8,  SocialSourcePage String,  MobilePhoneModel String,  ClientEventTime DateTime,  RegionID UInt32,  ClientIP UInt32,  ClientIP6 FixedString(16),  RemoteIP UInt32,  RemoteIP6 FixedString(16),  IPNetworkID UInt32,  SilverlightVersion3 UInt32,  CodeVersion UInt32,  ResolutionWidth UInt16,  ResolutionHeight UInt16,  UserAgentMajor UInt16,  UserAgentMinor UInt16,  WindowClientWidth UInt16,  WindowClientHeight UInt16,  SilverlightVersion2 UInt8,  SilverlightVersion4 UInt16,  FlashVersion3 UInt16,  FlashVersion4 UInt16,  ClientTimeZone Int16,  OS UInt8,  UserAgent UInt8,  ResolutionDepth UInt8,  FlashMajor UInt8,  FlashMinor UInt8,  NetMajor UInt8,  NetMinor UInt8,  MobilePhone UInt8,  SilverlightVersion1 UInt8,  Age UInt8,  Sex UInt8,  Income UInt8,  JavaEnable UInt8,  CookieEnable UInt8,  JavascriptEnable UInt8,  IsMobile UInt8,  BrowserLanguage UInt16,  BrowserCountry UInt16,  Interests UInt16,  Robotness UInt8,  GeneralInterests Array(UInt16),  Params Array(String),  Goals Nested(ID UInt32, Serial UInt32, EventTime DateTime,  Price Int64,  OrderID String, CurrencyID UInt32),  WatchIDs Array(UInt64),  ParamSumPrice Int64,  ParamCurrency FixedString(3),  ParamCurrencyID UInt16,  ClickLogID UInt64,  ClickEventID Int32,  ClickGoodEvent Int32,  ClickEventTime DateTime,  ClickPriorityID Int32,  ClickPhraseID Int32,  ClickPageID Int32,  ClickPlaceID Int32,  ClickTypeID Int32,  ClickResourceID Int32,  ClickCost UInt32,  ClickClientIP UInt32,  ClickDomainID UInt32,  ClickURL String,  ClickAttempt UInt8,  ClickOrderID UInt32,  ClickBannerID UInt32,  ClickMarketCategoryID UInt32,  ClickMarketPP UInt32,  ClickMarketCategoryName String,  ClickMarketPPName String,  ClickAWAPSCampaignName String,  ClickPageName String,  ClickTargetType UInt16,  ClickTargetPhraseID UInt64,  ClickContextType UInt8,  ClickSelectType Int8,  ClickOptions String,  ClickGroupBannerID Int32,  OpenstatServiceName String,  OpenstatCampaignID String,  OpenstatAdID String,  OpenstatSourceID String,  UTMSource String,  UTMMedium String,  UTMCampaign String,  UTMContent String,  UTMTerm String,  FromTag String,  HasGCLID UInt8,  FirstVisit DateTime,  PredLastVisit Date,  LastVisit Date,  TotalVisits UInt32,  TraficSource    Nested(ID Int8,  SearchEngineID UInt16, AdvEngineID UInt8, PlaceID UInt16, SocialSourceNetworkID UInt8, Domain String, SearchPhrase String, SocialSourcePage String),  Attendance FixedString(16),  CLID UInt32,  YCLID UInt64,  NormalizedRefererHash UInt64,  SearchPhraseHash UInt64,  RefererDomainHash UInt64,  NormalizedStartURLHash UInt64,  StartURLDomainHash UInt64,  NormalizedEndURLHash UInt64,  TopLevelDomain UInt64,  URLScheme UInt64,  OpenstatServiceNameHash UInt64,  OpenstatCampaignIDHash UInt64,  OpenstatAdIDHash UInt64,  OpenstatSourceIDHash UInt64,  UTMSourceHash UInt64,  
UTMMediumHash UInt64,  UTMCampaignHash UInt64,  UTMContentHash UInt64,  UTMTermHash UInt64,  FromHash UInt64,  WebVisorEnabled UInt8,  WebVisorActivity UInt32,  ParsedParams    Nested(Key1 String,  Key2 String,  Key3 String,  Key4 String, Key5 String, ValueDouble    Float64),  Market Nested(Type UInt8, GoalID UInt32, OrderID String,  OrderPrice Int64,  PP UInt32,  DirectPlaceID UInt32,  DirectOrderID  UInt32,  DirectBannerID UInt32,  GoodID String, GoodName String, GoodQuantity Int32,  GoodPrice Int64),  IslandID FixedString(16)) ENGINE = CollapsingMergeTree(Sign) PARTITION BY toYYYYMM(StartDate) ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID) SAMPLE BY intHash32(UserID) SETTINGS index_granularity = 8192"

Create the distributed table

clickhouse-client --query "create table yqtest.test_all as yqtest.test_local ENGINE = Distributed(ckcluster_3shards_1replicas,yqtest,test_local,rand())"

Insert data

# Current tables (row counts annotated)
yqtest1 :) show tables;
┌─name───────┐
│ test_all   │ # 0
│ test_local │ # 0
│ visits_v1  │ # 1681077
└────────────┘
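
The per-table row counts annotated above can also be read from system.parts; Distributed tables such as test_all have no data parts of their own and therefore do not appear (a sketch):

clickhouse-client --query "SELECT table, sum(rows) AS total_rows FROM system.parts WHERE database = 'yqtest' AND active GROUP BY table"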

# Insert data into the distributed table
insert into yqtest.test_all select * from visits_v1 limit 30000;

Check the data distribution

  • Querying the local tables shows the data scattered across the 3 machines
  • Querying the distributed table returns the same total row count from any node
-- Run on each node
SELECT count(*) FROM yqtest.test_local
SELECT count(*) FROM yqtest.test_all
Server   Local table [test_local]   Distributed table [test_all]
101      9983                       30000
102      9966                       30000
103      10051                      30000
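
The same split can be seen from a single node by grouping a distributed-table query by the host that executes each piece (hostName() is evaluated on the remote shards):

clickhouse-client --query "SELECT hostName() AS shard_host, count(*) AS cnt FROM yqtest.test_all GROUP BY shard_host ORDER BY shard_host"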

Failure test

Simulate a crash of [102] by running poweroff on it directly.

  • If any one node goes down, the whole cluster becomes unusable and the distributed table can no longer be queried
  • The local table on each surviving machine can still be queried, but if the data on 102 is lost, what remains is effectively useless
  • Once 102 comes back up, the cluster returns to normal
-- Run on each node
SELECT count(*) FROM yqtest.test_local
SELECT count(*) FROM yqtest.test_all
Server   Local table [test_local]   Distributed table [test_all]
101      9983                       error
102      down                       down
103      10051                      error

Querying the distributed table fails with:

← Progress: 2.00 rows, 8.21 KB (19.16 rows/s., 78.62 KB/s.) 
Received exception from server (version 20.11.6):
Code: 279. DB::Exception: Received from localhost:9000. DB::Exception: All connection tries failed. Log: 

Code: 32, e.displayText() = DB::Exception: Attempt to read after eof (version 20.11.6.6 (official build))
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 192.168.66.102:9000 (yqtest2:9000) (version 20.11.6.6 (official build))
Code: 209, e.displayText() = DB::NetException: Timeout: connect timed out: 192.168.66.102:9000 (yqtest2:9000) (version 20.11.6.6 (official build))

: While executing Remote. 

0 rows in set. Elapsed: 0.299 sec. 
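
If partial results are acceptable while a shard is unreachable, the query can be told to skip it (a sketch using the skip_unavailable_shards setting; the returned count will then be missing 102's share):

SELECT count(*) FROM yqtest.test_all SETTINGS skip_unavailable_shards = 1;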