When using MySQL as you database engine, either locally or on the cloud (AWS RDS), there might be UnicodeEncodeError
raising from MySQL, rather than from Python/Django:
Internal Server Error: /url/path/operation/id
UnicodeEncodeError at /url/path/operation/id
'ascii' codec can't encode characters in position 109-112: ordinal not in range(128)
This is due to the charset and collation problem on MySQL database on the server side. For example, if you deployed your website to AWS EC2, the default character set and collation are latin1
and latin1_swedish_ci
. Any character sending from the client side with Unicode encoding, MySQL will raise the above error. To resolve this problem, we need to change the encoding to utf8/utf8mb4
. utf8mb4
(including astral symbols) is highly recommended to avoid any Unicode errors, though it is a little slower than utf8
.
Before doing anything operation to your MySQL database, BACKUP! BACKUP! BACKUP! .
Assume that, you are using MySQL on you EC2 instance, rather than the RDS service, to save money. You should use SSH to connect to your AWS EC2 instance first for could hosting first:
ssh -i your-pem-file.pem uname@your.domain.com
Then, backup your database.
mysqldump -u root -p dbName > dbName_bak_date_keywords.sql
To check the character set of your MySQL database, use the following command:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
You will be prompt with the following table
+--------------------------+-------------------+
| Variable_name | Value |
| ------------------------ | ----------------- |
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+--------------------------+-------------------+
10 rows in set (0.00 sec)
If this is your case, you need to change them to support Unicode. To make the change, you will be using the following ALTER TABLE
command.
# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a VARCHAR
column.
After applying to each table, you should be getting the following table:
+--------------------------+-------------------+
| Variable_name | Value |
| ------------------------ | ----------------- |
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci|
| collation_database | utf8mb4_unicode_ci|
| collation_server | utf8mb4_unicode_ci|
+--------------------------+-------------------+
10 rows in set (0.00 sec)
Please be aware, the character_set_filesystem
and character_set_system
cannot be changed. Of course, you can use utf8
for some settings, as long as you are sure there is no astral symbols involved.
Similarly, a very long VARCHAR
column may need to be changed to one of the longer TEXT
types if you want to convert it from utf8
to utf8mb4
.
InnoDB has a maximum index length of 767
bytes, so for utf8
or utf8mb4
columns, you can index a maximum of 255
or 191
characters, respectively. If you currently have utf8
columns with indexes longer than 191
characters, you will need to index a smaller number of characters. In an InnoDB table, these column and index definitions are legal:
col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
To use utf8mb4
instead, the index must be smaller:
col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
The preceding types of changes are most likely to be required only if you have very long columns or indexes. Otherwise, you should be able to convert your tables from utf8
to utf8mb4
without problems.