          Relational Database Service

          Using the utf8mb4 Character Set on Baidu AI Cloud

          Background

          When you create an instance on Baidu AI Cloud, the default character set is utf8. However, MySQL's "utf8" character set is not true UTF-8: it supports at most three bytes per character, whereas true UTF-8 uses up to four bytes per character.

          MySQL's "utf8" character set is a proprietary encoding that covers only part of Unicode (the characters that fit in three bytes, that is, the Basic Multilingual Plane), so the data it can store is limited.

          MySQL 5.5 and later versions provide the "utf8mb4" character set, which implements true UTF-8 and is backward compatible with MySQL's utf8 character set.
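
To make the three-byte limit concrete, here is a minimal Python sketch (illustrative only, not part of the RDS product) that prints how many bytes a few sample characters occupy in standard UTF-8. The helper name `utf8_byte_length` and the sample characters are assumptions chosen for this example. Any character that needs four bytes cannot be stored in MySQL's "utf8" and requires "utf8mb4".

```python
def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes one character occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


# Illustrative samples: ASCII letter, accented Latin letter, CJK character, emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F603"]

for ch in samples:
    size = utf8_byte_length(ch)
    # MySQL's "utf8" stores at most 3 bytes per character.
    fits = size <= 3
    print(f"U+{ord(ch):04X}: {size} byte(s), fits in MySQL utf8: {fits}")
```

Only the last sample, an emoji, needs four bytes, which is why emoji are the typical trigger for the error described in the next section.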

          Symptom

          When a user inserts an emoji, the following error is reported:

          Incorrect string value: '\xF0\x9F\x98\x83 <…' for column 'xxxxx' at row 1
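
The escaped bytes in this message can be decoded to see which character MySQL rejected. The short Python sketch below is illustrative; the variable names are assumptions made for this example, and only the first four bytes shown in the message are used.

```python
# The first four escaped bytes shown in the error message above.
rejected_bytes = b"\xF0\x9F\x98\x83"

# Decode them as standard UTF-8 to recover the rejected character.
rejected_char = rejected_bytes.decode("utf-8")
code_point = ord(rejected_char)

print(f"bytes: {len(rejected_bytes)}, code point: U+{code_point:04X}")
```

The four bytes decode to a single character, U+1F603, an emoji outside the range that a three-byte encoding can represent.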

          Cause Analysis

          The "Incorrect string value" error occurs because some special characters can only be stored in "utf8mb4", which is a superset of utf8. You therefore need to make sure that the MySQL client, the database connection, and the objects that will store the emoji (database, table, and field) all use the "utf8mb4" character set.

          Solution

          1. First, make sure the database, table, and field all use the "utf8mb4" character set:

          (1) Modify the database-level character set:

          alter database db_name default character set utf8mb4;

          (2) Modify the table-level character set:

          alter table tb_01 character set utf8mb4;

          Use CHECKSUM TABLE to confirm that the table's checksum is the same before and after the table-level character set change.

          (3) Modify the field-level character set:

          alter table tb_01 change column1 column1 varchar(50) character set utf8mb4;

          Use CHECKSUM TABLE to confirm that the table's checksum is the same before and after the field-level character set change.

          2. To correctly write and read "utf8mb4" characters, you also need to:

          (1) Specify the "utf8mb4" character set when establishing the connection:

          set names utf8mb4;

          (2) Modify the character set parameters of the MySQL instance:

          Set character-set-server = utf8mb4 and default-character-set = utf8mb4. These two parameters take effect only after the MySQL instance is restarted.

          Conclusions & suggestions

          1. When you first create an RDS instance, we strongly recommend defining an appropriate character set for the database, tables, and fields.
          2. If a table in your service stores data in which a character exceeds 3 bytes, such as emoji, we recommend using the "utf8mb4" character set directly.
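
One way to decide whether a service's data falls under the second suggestion is to scan sample values for characters above U+FFFF, since these are the ones that take four bytes in UTF-8. The following Python sketch is a hypothetical pre-check, not an RDS feature; the function names and sample values are assumptions made for illustration.

```python
from typing import List


def chars_needing_utf8mb4(text: str) -> List[str]:
    """Return the characters in text whose code point is above U+FFFF.

    Such characters take 4 bytes in UTF-8 and do not fit in MySQL's "utf8".
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


def needs_utf8mb4(values: List[str]) -> bool:
    """Return True if any value contains a character that needs utf8mb4."""
    return any(chars_needing_utf8mb4(value) for value in values)


# Hypothetical sample values; the last one ends with an emoji (U+1F603).
sample_values = ["hello", "caf\u00e9", "nice \U0001F603"]

print(needs_utf8mb4(sample_values))
```

If the check reports a match for representative data, creating the database, tables, and fields with "utf8mb4" from the start avoids the later conversion steps.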