方法一:使第三方工具导入
来源:https://ask2md.medium.com/migrate-replicate-mysql-to-clickhouse-81c7ee5fce46
1.1 安装依赖软件(Installation and Configuration)
apt-get install python3-dev default-libmysqlclient-dev build-essential
apt-get install python3-pymysql
apt-get install pkg-config
apt install python3
pip3 install mysqlclient
pip3 install mysql-replication
pip3 install clickhouse-mysql
如果报错了,需要降级 pymysql
:pip3 install --user 'pymysql<0.9'
1.1 Create MySQL users required by Clickhouse
mysql> CREATE USER 'new_user'@'%' IDENTIFIED BY 'My@123' ;
mysql> CREATE USER 'new_user'@'127.0.0.1' IDENTIFIED BY 'My@123' ;
mysql> CREATE USER 'new_user'@'localhost' IDENTIFIED BY 'My@123' ;
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'@'%' ;
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'@'127.0.0.1' ;
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'@'localhost' ;
1.2 Make sure to have MySQL settings
server_id = <your id>
binlog_format = ROW
binlog_row_image = FULL
查看方式
# To Check the details, type following commands in MySQL console
mysql> Select @@server_id;
mysql> Select @@binlog_format
mysql> Select @@binlog_row_image
1.3 I - Migrate data from MySQL to Clickhouse
Create schema of MySQL table for Clickhouse
语法
clickhouse-mysql \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-passowrd> \
--create-table-sql-template \
--with-create-database \
--src-tables=<dbname>.<table-name> > <table-name>.sql
例子
clickhouse-mysql \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--create-table-sql-template \
--with-create-database \
--src-tables=mysql_DB.my_tbl > my_tbl.sql
注意⚠️:Before proceeding next steps, change some schema values like Engine, DataType, and remove params from MergeTree(<param1>,<param2>)
as convenient.
把生成的 SQL 语句的最后面的 MergeTree()
括号里面的东西删除了就可以了,只留下 MergeTree()
1.4 Create Table in Clickhouse
clickhouse-client \
--host=<clichouse_host> \
--password=<clickhouse_password> \
-mn < /file/path/to/<table_name>.sql
- 可以使用
ReplacingMergeTree
进行重复信息的过滤,过滤依据的是ORDER BY
。
例子
clickhouse-client \
--host=127.0.0.1 \
--password=My@123 \
-mn < /file/path/to/my_tbl.sql
If facing the error of merge engine, edit the .sql file and add “ORDER BY (<col name>)
” after Engine.
如果报错了,那么需要增加 ORDER BY
,例如在 MergeTree()
后面增加 ORDER BY id
1.5 Migrate Table data from MySQL to Clickhouse
语法
clickhouse-mysql \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--migrate-table \
--src-tables=<mysql-dbname>.<mysql-table_name> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse-dbname> \
--dst-table <clickhouse-table_name> \
--log-file=/file/path/to/<table_name_migt>.log
--dst-password
和--log-file=
可以不写。--dst-host=
为必填--dst-password
默认 9000
举例
clickhouse-mysql \
--src-host=127.0.0.1 \
--src-port=3306 \
--src-user=new_user \
--src-password=My@123 \
--migrate-table \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-file=/file/path/to/my_tbl_migt.log
The above command is connecting to mysql on host ‘127.0.0.1’ with given credentials, it will copy the data from mysql_DB.my_tbl to the clickhouse table clickhouse_DB.my_tbl. Logfile print all logs of migrations.
如果只是复制已有的数据,那么到这里就可以了。
1.6 II - Replicate MySQL Table to Clickhouse (副本)
As the data is migrated to clickhouse, but what about the new data in mysql. For that replication of the mysql table, data is needed to be replicated in the clickhouse database. It's an additional step where any new entry is inserted in mysql, it will automatically reflect in clickhouse if the table is syncing with clickhouse. The below command is used for syncing the table.
Syntax:
clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse_DB> \
--dst-table <clickhouse_tbl> \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--log-file=/file/path/to/<table_name_repl>.log
Example:
clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--log-file=/file/path/to/my_tbl_repl.log
The above command connects the mysql host 127.0.0.1 with the given credentials and replicates the mysql table mysql_DB.my_tbl in the clickhouse table clickhouse_DB.my_tbl
NOTE: To run this command in the background use & after the command, and for any reason to kill this background command, use grep method to find and kill the process id.
$ sudo ps aux | grep clickhouse $ Kill -9
III - Create Schema & Migrate table in Clickhouse
The scenario of creating a mysql table schema and then migrating the table data in clickhouse, it can be done using only one command.
Syntax:
clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-create-table \
--migrate-table
Example:
clickhouse-mysql \
--src-server-id=1 \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=root \
--src-password=root \
--src-tables=myDB.tbl_dnd \
--dst-host=127.0.0.1 \
--dst-create-table \
--migrate-table
The above command will create a table schema and migrate the table data into clickhouse.
IV - Migration & Replication in One command
If it needs to migrate and replicate the mysql table data to clickhouse using one command, we can follow the below command. So with the migration of the table data, it will start replicating the new insertion value of the mysql table into the clickhouse table.
Syntax:
clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse_DB> \
--dst-table <clickhouse_tbl> \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--migrate-table \
--log-file=/file/path/to/<table_name_migt_repl>.log
Example:
clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--migrate-table \
--log-file=/file/path/to/my_tbl_migt_repl.log
Validate the successful data import by counting the records of a table
Running the normal select query in both database mysql & clickhouse, As mysql select query is giving the result in minutes while in clickhouse it returns within a seconds.
MySQL
mysql> SELECT COUNT(*) FROM mysql_DB.my_tbl\G *************************** 1. row *************************** COUNT(*): 935623858
1 row in set (5 min 17.12 sec)
Clickhouse
ClickHouse-LAB :) SELECT COUNT(*) FROM clickhouse_DB.my_tbl\G
SELECT COUNT(*)
FROM clickhouse_DB.my_tbl
Row 1:
────────────
COUNT(): 935623858
1 rows in set. Elapsed: 0.233 sec. Processed 935.62 million rows, 491.10 MB (4.91 billion
rows/s., 4.91 GB/s.)
The above count and timing may vary in the different systems.
方法二:导出 csv 然后导入
来源:https://stackoverflow.com/questions/47708424/importing-from-mysql-dump-to-clickhouse
mysql -u user -ppass --compress -ss -e "SELECT * FROM table WHERE id >0 AND id <=1000000" db_name | sed 's/\"//g;s/\t/","/g;s/^/"/;s/$/"/' | clickhouse-client --query="INSERT INTO db_name.table FORMAT CSV"
方法三:创建虚拟 table(clickhouse 导入到一半就自动断开了)
来源:https://medium.com/datadenys/how-to-load-data-directly-from-mysql-to-clickhouse-744c02218b49
- Create proxy table based on Mysql Engine
CREATE TABLE mysql_data
(
`entity_id` UInt64,
`object` Nullable(String),
`property` Nullable(String),
`value` Float64,
`at` DateTime
)
ENGINE = MySQL('127.0.0.1:3306', 'db', 'table', 'usr', 'pwd')
- Copy data from Mysql to Clickhouse
CREATE TABLE default.data
(
`entity_id` UInt64,
`object` String,
`property` String,
`value` Float64,
`at` DateTime
)
ENGINE = MergeTree
ORDER BY (entity_id, object, property, value, at)
SETTINGS index_granularity = 8192
INSERT INTO data SELECT * FROM mysql_data
这个命令执行了几秒钟就中断了。查了一下,貌似低版本的 clickhouse 不会出现这个问题。