MySQL feature request/proposition: Croatian utf8 collation (utf8_croatian_ci)
We use the MySQL database for pretty much everything nowadays. It’s the de facto standard for horizontally scaled websites and it’s used by the biggest players in the industry. But one thing that is lacking, and which is very important for our regional market, is proper Croatian collation support for utf8 charsets. Without it, MySQL server can’t be considered a choice for, e.g., a government migration to an open-source platform in the near future.
We tried implementing it on our own a couple of times, but without any luck. The problem lies in the fact that the Croatian language (Serbian and Bosnian too) has digraphs (single letters written as two characters: lj, nj and dž). Without proper support for those, we will never be able to sort things right (a-b-c-č-ć-d-dž-đ-…i-j-k-l-lj-m-n-nj-…u-v-z-ž).
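To make the ordering problem concrete, here is a minimal Python sketch of the sort order we are after. It is only an illustration done outside the database, not how MySQL implements collations; the names `CROATIAN_ALPHABET` and `croatian_sort_key` are made up for this example, and it assumes every "lj", "nj" and "dž" sequence is a digraph and that characters outside the alphabet should simply sort last.

```python
import unicodedata

# Croatian Latin alphabet in dictionary order; the digraphs count as single letters.
CROATIAN_ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
    "z", "ž",
]
RANK = {letter: index for index, letter in enumerate(CROATIAN_ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")


def croatian_sort_key(word):
    """Return a list of letter ranks, treating each digraph as one letter."""
    # Normalize so that "c" + combining caron becomes the single character "č".
    text = unicodedata.normalize("NFC", word).lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            ch = text[i]
            # Assumption: anything outside the alphabet sorts after all letters.
            key.append(RANK.get(ch, len(CROATIAN_ALPHABET) + ord(ch)))
            i += 1
    return key


words = ["ljeto", "luk", "lopta", "njuška", "nos", "džem", "dom", "đak",
         "čaj", "cesta", "ćup"]
print(sorted(words, key=croatian_sort_key))
```

With this key "ljeto" lands after "luk" and "džem" lands between "dom" and "đak", which is the order a proper utf8_croatian_ci collation would have to produce inside the server.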
What does it take to implement a Croatian utf8 collation? It takes modifying source code beyond our knowledge (we tried creating a new collation with Vietnamese as a base, treating each digraph as a pair of a basic Latin letter + an accented Latin letter).
AFAIK the countries which would benefit from the same implementation (alongside Croatia) are: Bosnia, Serbia (for the Latin script) and Montenegro (for the Latin script). So please, if you can – spread the word! I think that support for this would be appreciated by thousands of MySQL developers in our region who are now forced to use hacks from the ’90s to get the correct sort order. :)
I’ve submitted an S4 feature request to MySQL (http://bugs.mysql.com/44523) and posted a feature request/proposition on the official MySQL dev forum, so we will see what happens. It certainly wouldn’t hurt if you signed in to bugs.mysql.com and the MySQL dev forum and replied to my feature request and topic with “Yes please” or something similar. It’s free, and it can make a difference. :)
5 thoughts on “MySQL feature request/proposition: Croatian utf8 collation (utf8_croatian_ci)”
April 28, 2009 at 15:15
The solution already exists here; the only issue is “dž”, as only two-byte combinations are allowed. I have personally tested the solution and it works as advertised (apart from the above-mentioned “dž”, which was not an issue in my case).
April 28, 2009 at 16:45
Hi Berislav,
Look who created that thread (me) and the problems we’ve encountered with this solution – http://forums.mysql.com/read.php?103,192187,216993#msg-216993
Believe me, that’s not working as we would want it to. The only solution is to implement proper support.
April 29, 2009 at 05:44
I agree and have added a comment on your feature request – the first, but I hope not the only one :)
April 29, 2009 at 08:21
Much obliged, puzz!