
Fixing Japanese Character Sorting in PostgreSQL and Ruby: A Guide to Locale Collation
Table of Contents
ToggleSorting multilingual data, especially when it includes Japanese and English characters, poses unique challenges when relying on default database and language settings. By default, PostgreSQL collation settings and Ruby’s internal sort mechanisms do not handle Japanese character sorting correctly, often resulting in inconsistent or incorrect orderings across different environments.
In one of our projects in Japan, we encountered this exact issue in production: Japanese and English strings of customers’ names were being sorted randomly. The solution required a deep dive into database collation and understanding how Ruby’s .sort compares to ActiveRecord’s .order when it comes to multilingual sorting in PostgreSQL.
Problem Use Case: Incorrect Sorting of Mixed Japanese and English Characters
When using the default locale settings, Japanese (ja_JP) and English (en_EN) characters were not sorted correctly in our production environment. Ruby’s .sort method also produced unexpected results, as it does not consider locale-aware collation. The root cause of this issue stemmed from database-level collation settings, which were not optimized for multilingual sorting.
Solution: Setting DB Collation for Locale Support
What is DB Collation?
Database collation defines how text data is sorted and compared in a database. It determines the order of characters based on language and locale rules. For example, English and Japanese have different sorting rules, and using the wrong collation can lead to incorrect ordering in queries.
To correctly sort Japanese and English characters together, we need to configure PostgreSQL’s collation settings. The following steps illustrate how to install and set the proper locale, and update the settings.
Step 1: Install Japanese Locale on the System
Ensure that the necessary Japanese locale is installed and configured:
Step 2: Verify Current PostgreSQL Collation
Run the following command to check your database’s current collation settings:
If it’s not ja_JP.UTF-8, update your database to fix Japanese-English sorting issues.
Step 3: Update Collation for Japanese Character Sorting
To modify the collation settings, follow these steps:
This ensures proper database collation for Japanese and English characters combined.
⚠️ Troubleshooting: Potential Errors
- In case, you face any role connection error like below
- psql: error: connection to server on socket “/var/run/postgresql/.s.PGSQL.5432” failed: FATAL: role “username” does not exist
- Solution: Execute above commands with prefix $ sudo -u database_username
- For such icu locale provider specific error
- ERROR: invalid LC_COLLATE locale name: “ja_JP.utf8”
- HINT: If the locale name is specific to ICU, use ICU_LOCALE.
- Solution: Use below command instead of mentioned above to create a database as PostgreSQL 13+ gives an option of ICU locale support and not libc by default.
Ruby .sort vs ActiveRecord .order: Understanding the Difference
After resolving collation issues at the database level, it is important to understand how Ruby handles sorting at the application level.
.sort Method in Ruby
Ruby’s .sort is used to order an array in ascending order but does not leverage database-level collation settings. Instead, it sorts using Ruby’s internal comparison mechanisms.
Example:
The output is based on Ruby’s internal string comparison, which does not always follow linguistic rules.
.order Method in ActiveRecord
Unlike .sort, ActiveRecord’s .order method sorts records at the database level using the configured collation settings.
Example:
Since .order runs at the database level, it ensures proper sorting based on the locale settings, making it the recommended approach for multilingual sorting.
To configure and debug collation settings in PostgreSQL, you can also use following visual tools rather than command-line:
- DBeaver (GUI-based database management tool)
- pgAdmin (Web-based PostgreSQL management tool)
Conclusion
To sort Japanese characters in a database accurately, configure PostgreSQL collation with either ja_JP.UTF-8 or an ICU locale such as ja-JP. Doing this ensures consistent multilingual sorting in PostgreSQL.
Also, prefer using ActiveRecord .order instead of Ruby .sort to delegate sorting responsibility to the database engine, which is now locale-aware.
Ready to fix your sorting issues?
Don’t let incorrect character ordering affect your user experience. Configure PostgreSQL with the right locale and optimize your Ruby app for multilingual sorting.
Need help implementing locale-aware collation in your project? Contact us to talk to our experts today.
⚡ Bonus Tip: Double-check your application’s sorting logic by comparing it with Google Sheets or Excel’s default sorting behaviour to spot anomalies.
By setting up proper locale collation and understanding how Ruby ActiveRecord methods work, you’ll resolve issues around character sorting in your application.