clickhouse 配置过程记录
参考资料
ubuntu 18.04 ×64 clickhouse安装流程
-
检查当前CPU是否支持SSE 4.2
$ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"- 检测后支持,继续下一步
-
执行以下命令开始安装
sudo apt-get install apt-transport-https ca-certificates dirmngrsudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.listsudo apt-get updatesudo apt-get install -y clickhouse-server clickhouse-client
由于安装时间超过3小时,暂停安装,改采用下载deb包安装。
-
下载deb安装包
- 下载3个文件 clickhouse-client,clickhouse-server,clickhouse-common-static
-
执行安装命令
sudo dpkg -i clickhouse-common-static_20.6.3.28_amd64.debsudo dpkg -i clickhouse-client_20.6.3.28_all.debsudo dpkg -i clickhouse-server_20.6.3.28_all.deb
-
启动
sudo service clickhouse-server startclickhouse-client
-
启动失败,错误信息如下
Code: 516. DB::Exception: Received from localhost:9000. DB::Exception:default: Authentication failed: password is incorrect or there is no user with such name.
-
重新执行命令
clickhouse-client --password
-
启动成功,输出语句
ClickHouse client version 20.6.3.28 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.6.3 revision 54436. -
创建数据库 BITest
CREATE DATABASE IF NOT EXISTS BITEST
-
创建TAXI_TRIPS 表
CREATE TABLE BITEST.TAXI_TRIPS
(
Trip IDString,
Taxi IDString,
Trip Start TimestampDateTime,
Trip End TimestampNullable(DateTime),
Trip SecondsNullable(UInt64),
Trip MilesNullable(Float64),
Pickup Census TractNullable(UInt64),
Dropoff Census TractNullable(UInt64),
Pickup Community AreaNullable(UInt64),
Dropoff Community AreaNullable(UInt64),
FareNullable(String),
TipsNullable(String),
TollsNullable(String),
ExtrasNullable(String),
Trip TotalNullable(String),
Payment TypeNullable(String),
CompanyNullable(String),
Pickup Centroid LatitudeNullable(Float64),
Pickup Centroid LongitudeNullable(Float64),
Pickup Centroid Location Nullable(String),
Dropoff Centroid Latitude Nullable(Float64),
Dropoff Centroid Longitude Nullable(Float64),
Dropoff Centroid Location Nullable(String),
Community Areas Nullable(UInt64)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(toDate(Trip Start Timestamp))
ORDER BY Trip ID
SETTINGS index_granularity = 8192;
11 退出client ,执行导入数据的命令行
- clickhouse-client --password --query='INSERT INTO BITEST.TAXI_TRIPS FORMAT CSVWithNames' < Taxi_Trips.csv --date_time_input_format='best_effort' --max_insert_block_size=1024
-
执行报错,显示数据导入值不对,报错位置在13572634行。打开文件后发现最后五行数据值不合理。删除最后五行。
- err.png
- errordata.png
-
删除表中上一次导入的数据,重新导入csv文件,然后检证数据条数。
- [图片上传失败...(image-d482ca-1598519252266)]
- (在执行第三个SQL时发现问题)在把字段Trip Start Timestamp作为DateTime类型导入时,凌晨的“00:00:00~00:00:59"的数据全部读入错误,导致计算时一天只有1~23小时的时间段。
- err_output3.png
- 重新建表,数据引擎选为Log,用string类型读入时间。
CREATE TABLE BITEST.TAXI_TRIPS
(
Trip IDString,
Taxi IDString,
Trip Start TimestampString,
Trip End TimestampNullable(String),
Trip SecondsNullable(UInt64),
Trip MilesNullable(Float64),
Pickup Census TractNullable(UInt64),
Dropoff Census TractNullable(UInt64),
Pickup Community AreaNullable(UInt64),
Dropoff Community AreaNullable(UInt64),
FareNullable(String),
TipsNullable(String),
TollsNullable(String),
ExtrasNullable(String),
Trip TotalNullable(String),
Payment TypeNullable(String),
CompanyNullable(String),
Pickup Centroid LatitudeNullable(Float64),
Pickup Centroid LongitudeNullable(Float64),
Pickup Centroid LocationNullable(String),
Dropoff Centroid LatitudeNullable(Float64),
Dropoff Centroid LongitudeNullable(Float64),
Dropoff Centroid LocationNullable(String),
Community AreasNullable(UInt64)
)
ENGINE = Log()
16.再次执行导入数据命令行
- clickhouse-client --password --query='INSERT INTO BITest1.TAXI_TRIPS FORMAT CSVWithNames' < Taxi_Trips.csv --max_insert_block_size=1024
17.再次检证条数和Trip Start Timestamp`字段,读入成功。
报表分析
- 问题一: ⾏⻋⾥程最多的出租⻋Top 100
SELECT sum(
Trip Miles) AS Miles ,Taxi IDAS ID
FROM BITEST.TAXI_TRIPS
GROUP BYTaxi ID
order by Miles DESC
limit 100
- output1.png
-
问题二:搭载乘客最多的出租车Top 100
- 提供的数据中并没乘客数量字段,改为计算最多的出租车
SELECT count(1) as num ,
Taxi IDAS ID
FROM BITEST.TAXI_TRIPS
GROUP BYTaxi ID
order by num DESC
limit 100
-output2.png -
问题三:分析出租⻋每⽇⾼峰期和低峰期 (⼩时级别)
SELECT
AVG(t1.num) AS avg_num,
t1.hour
FROM
(
SELECT
substring(Trip Start Timestamp, 1, 10) AS date,
if(substring(Trip Start Timestamp, 21, 2) = 'AM', concat(substring(Trip Start Timestamp, 12, 2), 'AM'), concat(substring(Trip Start Timestamp, 12, 2), 'PM')) AS hour,
count(Taxi ID) AS num
FROM BITEST.TAXI_TRIPS
GROUP BY
date,
hour
) AS t1
GROUP BY t1.hour
ORDER BY avg_num ASC
-output3.png -
问题四:分析出出租⻋平均搭载时⻓。
SELECT
Taxi IDAS ID,
avg(Trip Seconds) AS avg_time
FROM BITEST.TAXI_TRIPS
GROUP BYTaxi ID
ORDER BY avg_time DESC
LIMIT 1000
-output4.png






