mysql 数据导入 clickhouse

创建日期: 2024-11-25 19:58 | 作者: 风波 | 浏览次数: 26 | 分类: ClickHouse

方法一:使第三方工具导入

来源:https://ask2md.medium.com/migrate-replicate-mysql-to-clickhouse-81c7ee5fce46

1.1 安装依赖软件(Installation and Configuration)

apt-get install python3-dev default-libmysqlclient-dev build-essential
apt-get install python3-pymysql
apt-get install pkg-config

apt install python3
pip3 install mysqlclient
pip3 install mysql-replication
pip3 install clickhouse-mysql

如果报错了,需要降级 pymysqlpip3 install --user 'pymysql<0.9'

1.1 Create MySQL users required by Clickhouse

mysql>​ CREATE USER ​ 'new_user'​@'%'​ IDENTIFIED BY ​ 'My@123'​ ;
mysql>​ CREATE USER ​ 'new_user'​@'127.0.0.1'​ IDENTIFIED BY ​'My@123'​ ;
mysql>​ CREATE USER ​ 'new_user'​@'localhost'​ IDENTIFIED BY ​ 'My@123'​ ;
mysql>​ GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'​@'%'​ ;
mysql>​ GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'​@'127.0.0.1'​ ;
mysql>​ GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'​@'localhost'​ ;

1.2 Make sure to have MySQL settings

查看方式

# To Check the details, type following commands in MySQL console
mysql> Select @@server_id;
mysql> Select @@binlog_format
mysql> Select @@binlog_row_image

1.3 I - Migrate data from MySQL to Clickhouse

Create schema of MySQL table for Clickhouse

语法

clickhouse-mysql \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-passowrd> \
--create-table-sql-template \
--with-create-database \
--src-tables=<dbname>.<table-name> > <table-name>.sql

例子

clickhouse-mysql \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--create-table-sql-template \
--with-create-database \
--src-tables=mysql_DB.my_tbl > my_tbl.sql

注意⚠️:Before proceeding next steps, change some schema values like Engine, DataType, and remove params from MergeTree(<param1>,<param2>) as convenient.

把生成的 SQL 语句的最后面的 MergeTree() 括号里面的东西删除了就可以了,只留下 MergeTree()

1.4 Create Table in Clickhouse

clickhouse-client \
--host=<clichouse_host> \
--password=<clickhouse_password> \
-mn < /file/path/to/<table_name>.sql

例子

clickhouse-client \
--host=127.0.0.1 \
--password=My@123 \
-mn < /file/path/to/my_tbl.sql 

If facing the error of merge engine, edit the .sql file and add “ORDER BY (<col name>)” after Engine.

如果报错了,那么需要增加 ORDER BY,例如在 MergeTree() 后面增加 ORDER BY id

1.5 Migrate Table data from MySQL to Clickhouse

语法

clickhouse-mysql \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--migrate-table \
--src-tables=<mysql-dbname>.<mysql-table_name> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse-dbname> \
--dst-table <clickhouse-table_name> \
--log-file=/file/path/to/<table_name_migt>.log

举例

clickhouse-mysql \
--src-host=127.0.0.1 \
--src-port=3306 \
--src-user=new_user \
--src-password=My@123 \
--migrate-table \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-file=/file/path/to/my_tbl_migt.log

The above command is connecting to mysql on host ‘127.0.0.1’ with given credentials, it will copy the data from mysql_DB.my_tbl to the clickhouse table clickhouse_DB.my_tbl. Logfile print all logs of migrations.

如果只是复制已有的数据,那么到这里就可以了。

1.6 II - Replicate MySQL Table to Clickhouse (副本)

As the data is migrated to clickhouse, but what about the new data in mysql. For that replication of the mysql table, data is needed to be replicated in the clickhouse database. It's an additional step where any new entry is inserted in mysql, it will automatically reflect in clickhouse if the table is syncing with clickhouse. The below command is used for syncing the table.

Syntax:

clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse_DB> \
--dst-table <clickhouse_tbl> \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--log-file=/file/path/to/<table_name_repl>.log

Example:

clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--log-file=/file/path/to/my_tbl_repl.log

The above command connects the mysql host 127.0.0.1 with the given credentials and replicates the mysql table mysql_DB.my_tbl in the clickhouse table clickhouse_DB.my_tbl

NOTE: To run this command in the background use & after the command, and for any reason to kill this background command, use grep method to find and kill the process id.

$ sudo ps aux | grep clickhouse $ Kill -9

III - Create Schema & Migrate table in Clickhouse

The scenario of creating a mysql table schema and then migrating the table data in clickhouse, it can be done using only one command.

Syntax:

clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-create-table \
--migrate-table

Example:

clickhouse-mysql \
--src-server-id=1 \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=root \
--src-password=root \
--src-tables=myDB.tbl_dnd \
--dst-host=127.0.0.1 \
--dst-create-table \
--migrate-table

The above command will create a table schema and migrate the table data into clickhouse.

IV - Migration & Replication in One command

If it needs to migrate and replicate the mysql table data to clickhouse using one command, we can follow the below command. So with the migration of the table data, it will start replicating the new insertion value of the mysql table into the clickhouse table.

Syntax:

clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse_DB> \
--dst-table <clickhouse_tbl> \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--migrate-table \
--log-file=/file/path/to/<table_name_migt_repl>.log

Example:

clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--migrate-table \
--log-file=/file/path/to/my_tbl_migt_repl.log

Validate the successful data import by counting the records of a table

Running the normal select query in both database mysql & clickhouse, As mysql select query is giving the result in minutes while in clickhouse it returns within a seconds.

MySQL

mysql>​ SELECT COUNT(*) FROM mysql_DB.my_tbl\G *************************** 1. row *************************** COUNT(*): 935623858
 1 row in set (5 min 17.12 sec)

Clickhouse

ClickHouse-LAB :) SELECT COUNT(*) FROM clickhouse_DB.my_tbl\G
SELECT COUNT(*)
FROM clickhouse_DB.my_tbl
Row 1:
────────────
COUNT(): 935623858
1 rows in set. Elapsed: 0.233 sec. Processed 935.62 million rows, 491.10 MB (4.91 billion
rows/s., 4.91 GB/s.)

The above count and timing may vary in the different systems.

方法二:导出 csv 然后导入

来源:https://stackoverflow.com/questions/47708424/importing-from-mysql-dump-to-clickhouse

mysql -u user  -ppass --compress -ss -e "SELECT * FROM table WHERE id >0  AND id <=1000000" db_name | sed 's/\"//g;s/\t/","/g;s/^/"/;s/$/"/' | clickhouse-client --query="INSERT INTO db_name.table FORMAT CSV"

方法三:创建虚拟 table(clickhouse 导入到一半就自动断开了)

来源:https://medium.com/datadenys/how-to-load-data-directly-from-mysql-to-clickhouse-744c02218b49

  1. Create proxy table based on Mysql Engine
CREATE TABLE mysql_data
(
    `entity_id` UInt64,
    `object` Nullable(String),
    `property` Nullable(String),
    `value` Float64,
    `at` DateTime
)
ENGINE = MySQL('127.0.0.1:3306', 'db', 'table', 'usr', 'pwd')
  1. Copy data from Mysql to Clickhouse
CREATE TABLE default.data
(
    `entity_id` UInt64,
    `object` String,
    `property` String,
    `value` Float64,
    `at` DateTime
)
ENGINE = MergeTree
ORDER BY (entity_id, object, property, value, at)
SETTINGS index_granularity = 8192
  1. INSERT INTO data SELECT * FROM mysql_data 这个命令执行了几秒钟就中断了。查了一下,貌似低版本的 clickhouse 不会出现这个问题。
26 浏览
15 爬虫
0 评论