Language support in phpBB 2.2
Posted on phpBB 2.2 support forum 05-Sep-03
Hi,
This is my first post here, so first of all I'd like to thank all phpBB
developers and supporters for their great work. I especially like the new
pricing scheme you've announced...
I use phpBB 2.0.4 on a web site that serves an Israeli hiking group that I
also manage. It's a small message board by any standard, but it serves
as a crucial information center for the group.
I set up the board to work in Hebrew, and it works nicely, but the task
wasn't as smooth and easy as it could have been, partly because of
missing features in the code and partly because of inadequate support
files such as templates and translations.
My intention isn't to criticize - on the contrary I wish to describe a
few problems and how they can be fixed in 2.2, for the benefit of all
Right-to-Left (RTL) languages, not just Hebrew but also Arabic and
maybe others. For some of the issues I may be able to contribute
knowledge and actual work. This is a long (but IMHO interesting) post,
so take a deep breath or just skip on to something else...
1. Multiple languages (with different directionality) in the
same board.
By this I mean not the user interface in different languages, but forums
and topics that use different languages. Note that the only practical way
to support multiple languages in a web page is using UTF-8 (there's only
one charset declaration for the entire HTML page).
What I wish is to have a language attribute for the entire board, for
each forum and maybe even for each topic. Forums default to the board's
language, and topics default to the forum's language. The default UI
language (e.g. for guests and new users) is of course the board's
language, but can be changed by users at will.
When a user enters a forum, the forum is displayed using the forum's
language, including proper layout (LTR or RTL). The UI layout and
language don't change, only the layout of the table that contains the
forum's data.
When a user posts a new topic, the default language of
the topic is the forum's language, but the user may change the language
(if permitted by a configuration setting). When a user replies to an
existing topic, the default language is the topic's language, but again,
if permitted for the forum, the user may change the language.
When a topic is displayed, the layout (LTR or RTL) for the topic is
determined by the language of the first message. The contents of
messages in other languages are displayed with the correct directionality
for each language, but within the layout determined by the first
message.
I implemented something that more or less works along these lines,
using a smart but not very elegant trick. I add a special character
combination to a forum's name which determines whether the forum is LTR
or RTL. Same for topic titles. Then I hooked in a piece of code that
checks for this signature and modifies the rendering of the contents.
You may wish to look at this on my site http://hug-elad.org/forum - note
that most of the forums are in Hebrew, which will probably look like
gibberish to you. There are two (almost empty) English forums at the
bottom. Look at a message in a Hebrew forum and then at a message in an
English forum, and see how the layout is switched. Also note that the
forum names in the jumpbox contain ~R~ or ~L~, which signify either an
RTL or an LTR forum (it works at the topic level too).
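To illustrate the marker trick, here is a minimal sketch in Python (not
the actual code running on the site, which is PHP inside phpBB). It
assumes the marker is the literal text ~R~ or ~L~ somewhere in the name;
the function name split_direction is hypothetical.

```python
import re

# Assumed marker format: "~R~" (RTL) or "~L~" (LTR) anywhere in the name.
MARKER = re.compile(r"~([RL])~")


def split_direction(name, default="ltr"):
    """Return (clean_name, direction) for a forum name or topic title."""
    match = MARKER.search(name)
    if match is None:
        # No marker: keep the name and fall back to the caller's default.
        return name.strip(), default
    direction = "rtl" if match.group(1) == "R" else "ltr"
    # Remove the first marker and collapse the whitespace it leaves behind.
    clean = MARKER.sub("", name, count=1)
    return " ".join(clean.split()), direction


if __name__ == "__main__":
    for raw in ("~R~ Trips", "General ~L~", "No marker here"):
        print(split_direction(raw))
```

The returned direction would then be used as the DIR value of the table
that holds the forum or topic contents.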
What's needed to implement this:
- A language setting field in the board's main config record.
- A language field in the forum record (default = 0 = main board
lang).
- A language field in the topic header record (default = 0 = forum
lang).
- A language field in the message header record (default = 0 =
topic lang).
- Of course add language configuration / selection at the proper
screens.
- In the new message form - a change of language should reload the
form using the new language and layout.
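The "default = 0" fields above form a simple fallback chain: message,
then topic, then forum, then board. A minimal Python sketch of that
lookup follows; the language ids and the names resolve_language and
LANG_DIRECTION are made up for illustration and are not phpBB code.

```python
INHERIT = 0  # "default = 0" in the list above: use the parent's language

# Hypothetical language ids mapped to their layout direction.
LANG_DIRECTION = {1: "ltr", 2: "rtl"}  # 1 = English, 2 = Hebrew (illustrative)


def resolve_language(board_lang, forum_lang=INHERIT,
                     topic_lang=INHERIT, message_lang=INHERIT):
    """Return the most specific language that is explicitly set."""
    for lang in (message_lang, topic_lang, forum_lang):
        if lang != INHERIT:
            return lang
    # Nothing set below board level, so the board language applies.
    return board_lang


if __name__ == "__main__":
    # Hebrew board, English forum, topic and message left at their defaults.
    lang = resolve_language(board_lang=2, forum_lang=1)
    print(lang, LANG_DIRECTION[lang])
```

Storing 0 instead of copying the parent's value means that changing the
board or forum language later automatically applies to everything that
never overrode it.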
2. Improve templates
The current templates do not fully account for directionality. Even
though directionality itself is included, right/left alignment is in
many cases hard coded, making the output mixed up. Not much to say
except:
- Add language layout directives (i.e. DIR=) to all the relevant
templates. Note that this is needed not just to set the page layout
(that more or less works today), but also at the level of forum / topic
tables.
3. Take into account UTF-8 string lengths!
I converted the Hebrew language to UTF-8, to make phpBB compatible with
the rest of my site, and also to be ready in case I need to support
other languages (eg. Arabic, Russian).
It works, but there are problems with string length limits. Most
language encodings use 1 byte per character, so it's easy to match the
input size limit to the column size in the database.
However UTF-8 uses a variable number of bytes per character: 1 for
ASCII, 2 for Hebrew, Arabic and Cyrillic, 3 for most
Chinese/Japanese/Korean characters, and at most 4 under the current
standard.
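The byte counts are easy to check. A minimal Python sketch follows; the
sample words are arbitrary illustrations, and bytes_per_char is a
hypothetical helper name.

```python
# Arbitrary sample words, one per script.
SAMPLES = {
    "English": "trail",
    "Hebrew": "שלום",
    "Russian": "привет",
    "Japanese": "山道",
}


def bytes_per_char(text):
    """Average number of UTF-8 bytes used per character of text."""
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        encoded = word.encode("utf-8")
        print(language, len(word), "chars,", len(encoded), "bytes")
```

A column sized for N single-byte characters therefore holds only about
N/2 Hebrew characters or N/3 Japanese characters once the data is UTF-8.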
Currently phpBB code and DB schemas don't take this into account. In
one case I typed a very long topic title, which was within the character
limit, but in UTF-8 it exceeded the byte limit. This led to a corruption
of the topic and I had a hard time fixing it with phpMyAdmin.
What's needed to correct this:
- Increase sizes of string columns in the DB schema. Some databases
do not recognize the distinction between chars and bytes, so there's no
escape from just making the columns a bit wider - I suggest by 40-60%.
- In forms, if the current encoding is UTF-8, limit the size of
input strings to half of the size declared in the DB. While it's true
that there's no way to predict exactly how many bytes will be used, a
ratio of 2 bytes per char will prevent many overruns.
- It may be wise to derive the bytes per char ratio from the board
main language setting mentioned in point 1. For English and most Latin
languages it's 1:1, Hebrew and Arabic are 2:1, Chinese/Japanese are
mostly 3:1, etc.
- Make sure there are no buffer overruns. I'm not sure if the
corruption I experienced is due to a DB bug or a PHP code bug, but the
entire supply chain should be checked - for security reasons as well.
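One way to guard against such overruns is to check the encoded byte
length before the value reaches the database, and to cut on a character
boundary if it is too long. A minimal Python sketch under those
assumptions follows (fits_column and truncate_utf8 are hypothetical
names; the real fix would live in phpBB's PHP code).

```python
def fits_column(text, max_bytes):
    """True if text, encoded as UTF-8, fits in a column of max_bytes."""
    return len(text.encode("utf-8")) <= max_bytes


def truncate_utf8(text, max_bytes):
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # A raw byte slice may end in the middle of a multi-byte character;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    title = "שלום"  # 4 characters, 8 bytes in UTF-8
    print(fits_column(title, 7))
    print(truncate_utf8(title, 5))
```

Rejecting the input with an error message is probably friendlier than
silent truncation, but either way the stored value stays valid UTF-8.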
4. Enforce English / LTR layout in some places
The previous points were all about flexibility of using different
languages and layouts, but in some places (mostly admin stuff),
non-English languages and non-LTR layout may cause serious problems.
Why?
Most boards are installed on hosted web sites, where the board admin
has no control at all over the underlying locale settings, filesystem
character support, etc. So admins must be very careful not to use for
example file and directory names that contain exotic characters.
It's perfectly possible to type English text (e.g. a file name) into
an input field even when the page is in Hebrew. However, due to the BiDi
algorithm at work on the browser side, the text doesn't appear as it
should. This is crucial for paths and file names, but other stuff may
get confused as well. For example:
In LTR (normal) layout: forum/mydir/
In RTL layout will show: /forum/mydir
In LTR (normal) layout: 1+2=3
In RTL layout will show: 3=1+2
Note how the slash at the end is shown as if it's at the beginning of
the path string, while it's actually still at the end. Even for me, an
experienced and knowledgeable user, this is confusing and causes
typing mistakes.
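The reason is that characters such as "/" and "=" have no strong
direction of their own in the Unicode BiDi algorithm, so where they are
drawn depends on the surrounding text and the paragraph direction. A
minimal Python sketch that lists such characters follows; it uses the
standard unicodedata module, and the helper name without_strong_direction
is hypothetical.

```python
import unicodedata

# BiDi classes with a strong direction: Left-to-right, Right-to-left,
# and Arabic Letter.
STRONG = {"L", "R", "AL"}


def without_strong_direction(text):
    """Return the characters of text that have no strong BiDi direction."""
    return [ch for ch in text if unicodedata.bidirectional(ch) not in STRONG]


if __name__ == "__main__":
    for sample in ("forum/mydir/", "1+2=3"):
        print(sample, without_strong_direction(sample))
```

In "forum/mydir/" only the slashes qualify. The trailing one has no
Latin letter after it, so in an RTL page it takes the page's direction
and ends up drawn at the left edge.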
By the way, as it is today the admin CP is language aware but almost
completely layout UNaware - which causes even more
serious confusion. For example radio selections are inverted (looks
like Yes is selected while actually it's No,
etc.). Imagine what this can do for settings such as "Board Active"...
Possible approaches:
- Alternative #1: Lock certain pages to English, or at least to LTR
layout, regardless of the user's and/or the board's default language.
The user of these pages is usually the admin or a moderator, and they
should be able to handle some English.
- Alternative #2 (recommended): Lock language and layout only for
specific input fields which are prone to BiDi typing mistakes, such as
path and file names, email, password, etc. In this case, the admin CP
templates should also be checked to make sure they include layout
support.
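Alternative #2 could be sketched roughly as follows. This is Python that
only builds the HTML string, not a phpBB template; the field names in
FORCE_LTR_FIELDS and the function render_input are assumptions made for
illustration. The dir attribute itself is standard HTML.

```python
from html import escape

# Hypothetical list of fields that should always be typed and shown LTR.
FORCE_LTR_FIELDS = {"path", "filename", "email", "password"}


def render_input(name, value="", page_dir="rtl"):
    """Render a text input, forcing LTR for BiDi-sensitive fields."""
    direction = "ltr" if name in FORCE_LTR_FIELDS else page_dir
    return '<input type="text" name="{}" value="{}" dir="{}">'.format(
        escape(name, quote=True), escape(value, quote=True), direction
    )


if __name__ == "__main__":
    print(render_input("path", "forum/mydir/"))
    print(render_input("subject", "", page_dir="rtl"))
```

The rest of the page keeps the user's layout; only the listed fields are
pinned to LTR, so a path typed as forum/mydir/ is also displayed that way.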
That's it, at least for now.
Thank you for taking the time to read all this.
I hope there will be a fruitful discussion following this message, and
I expect that whoever is in charge will explain how to submit these
enhancement requests to the developers.
As I said I'm willing to help - mostly I can help with translation,
fixing templates, testing, etc.
Regards,
E.Z