Free Software

Software Engineering with FOSS and Linux

Selecting language of multilingual web sites

Multilingual sites usually offer their users a way to switch between content languages, either through a link on their pages or through user preferences. For first-time visitors, however, a site needs a way to determine their preferred language(s). The standard way to do this is to inspect the Accept-Language HTTP header that the user's browser sends to the site.

According to the HTTP/1.1 specification (RFC 2616), the Accept-Language header can assign a weight (the quality value q, between 0 and 1) to each language, expressing the user's preferred order of natural languages for multilingual content. For example,

Accept-Language: el,en;q=0.5,fr;q=0.4

means that the user prefers Greek content (el carries the implicit weight q=1), but if it is not available, English and French are also acceptable, with English having higher priority than French.
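To make the weights concrete, here is a minimal sketch that turns such a header into a map of language tag to weight. The function name parse_accept_language() is hypothetical; it assumes the simple "tag;q=value" form, tolerates extra spaces, and treats a missing q parameter as q=1, so the example above maps el to 1.0, en to 0.5 and fr to 0.4.

```php
<?php
// Hypothetical helper: turn an Accept-Language value into a map of
// language tag => weight (q-value). A missing q parameter means q=1.
function parse_accept_language(string $header): array
{
    $weights = [];
    foreach (explode(',', $header) as $item) {
        // Each item looks like "tag" or "tag;q=value"; tolerate spaces.
        $parts = array_map('trim', explode(';', $item));
        $tag = strtolower($parts[0]); // language tags are case-insensitive
        if ($tag === '') {
            continue;
        }
        $q = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $weights[$tag] = $q;
    }
    return $weights;
}

print_r(parse_accept_language('el,en;q=0.5,fr;q=0.4'));
```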

You can parse the Accept-Language header to determine the appropriate language. Although simple parsing works in most cases, handling all the gritty details of the specification can be tricky. Here is an implementation of the parsing algorithm in PHP. It may be overkill, but it gives a pretty good idea of what is involved:


<?php
$default_lang = "en";

function sort_descending_weights( $a, $b )
{
    global $default_lang;

    # Each array element is a (lang, weight) pair
    if ( $a[1] != $b[1] )
    {
        return ( $a[1] < $b[1] ) ? 1 : -1;
    }

    # If two languages have the same weight, then we might want to impose
    # our own precedence. Put your own ordering code here. For simplicity, we
    # just assume that the default language takes priority.
    if ( $a[0] == $default_lang ) return -1;
    else if ( $b[0] == $default_lang ) return 1;
    else return 0;
}

function is_language_available( $lang )
{
    # Stub: replace with a check against the languages your site supports
    return true;
}

function get_prefered_language( )
{
    global $default_lang;

    # If no Accept-Language header exists, use the site's default language
    if ( !isset( $_SERVER['HTTP_ACCEPT_LANGUAGE'] ) )
    {
        return $default_lang;
    }

    # Parse the header. A * indicates any language not explicitly specified
    $h = $_SERVER['HTTP_ACCEPT_LANGUAGE'];
    $list = explode( ",", $h );
    if ( count( $list ) == 0 ) return $default_lang;

    $prefs = array();
    foreach ( $list as $langs )
    {
        # Split "lang;q=weight"; a missing q parameter means weight 1.0
        $tmp = explode( ";q=", trim( $langs ) );
        $lang = trim( $tmp[0] );
        $weight = count( $tmp ) == 1 ? 1.0 : (float) $tmp[1];
        # A weight of 0 means "not acceptable", so skip such entries
        if ( $lang == "" || $weight <= 0 ) continue;
        array_push( $prefs, array( $lang, $weight ) );
    }

    # The specification doesn't enforce weights to be in descending order.
    # Sort the parsed values.
    usort( $prefs, 'sort_descending_weights' );

    # Pick an available language. is_language_available() is a stub.
    foreach ( $prefs as $pref )
    {
        list( $lang, $weight ) = $pref;
        # "*" matches any language, so fall back to the site's default
        if ( $lang == "*" ) return $default_lang;
        if ( is_language_available( $lang ) ) return $lang;
    }
    return $default_lang;
}

echo get_prefered_language();
?>
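Because get_prefered_language() reads $_SERVER directly and relies on the is_language_available() stub, it is awkward to test. The following is a minimal sketch of a variant that takes the header and the list of supported languages as parameters; the name choose_language() is hypothetical, and $available is assumed to contain lowercase tags. It also swaps in a different tie-break: on equal weights it keeps the order in which the languages appear in the header, instead of favouring the default language.

```php
<?php
// Hypothetical, testable variant of get_prefered_language(): the header and
// the list of languages the site supports are passed in explicitly.
function choose_language(?string $header, array $available, string $default): string
{
    if ($header === null || trim($header) === '') {
        return $default;
    }

    $prefs = [];
    foreach (explode(',', $header) as $pos => $item) {
        $parts = array_map('trim', explode(';', $item));
        $tag = strtolower($parts[0]);
        $q = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        // q=0 means "not acceptable", so such entries are dropped.
        if ($tag !== '' && $q > 0) {
            $prefs[] = [$tag, $q, $pos];
        }
    }

    // Highest weight first; on equal weights keep the header's order.
    usort($prefs, function ($a, $b) {
        return [$b[1], $a[2]] <=> [$a[1], $b[2]];
    });

    foreach ($prefs as [$tag]) {
        if ($tag === '*') {
            // "*" accepts any language not listed explicitly.
            return $default;
        }
        if (in_array($tag, $available, true)) {
            return $tag;
        }
    }
    return $default;
}

echo choose_language('el,en;q=0.5,fr;q=0.4', ['en', 'fr'], 'fr'), "\n";
```

With the example header from above, Greek is not supported, so English wins over French on weight even though French is the site default here.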


In Firefox 3.0, users may specify their preferred content languages in the Preferences / Content / Languages menu:

[Screenshot: Firefox language preferences]

For ease of use, exact weights don’t need to be specified. When users change the order of the languages in the list, the browser will calculate and send the appropriate weights in the HTTP request.
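As an illustration of how an ordered list can be turned into weights, here is a minimal sketch using one simple scheme: the first language keeps the implicit q=1 and the rest get evenly spaced, decreasing weights 1 - i/n, written with at most three decimals. The function name build_accept_language() and the formula are assumptions for illustration only; Firefox's actual calculation may differ.

```php
<?php
// Hypothetical sketch: build an Accept-Language value from an ordered list,
// giving the first language an implicit q=1 and evenly spaced, decreasing
// weights to the rest. Real browsers may use a different formula.
function build_accept_language(array $ordered): string
{
    $n = count($ordered);
    $items = [];
    foreach (array_values($ordered) as $i => $tag) {
        if ($i === 0) {
            $items[] = $tag; // q=1 is the default, so it can be omitted
            continue;
        }
        // Weight 1 - i/n with three decimals, trailing zeros trimmed.
        $q = rtrim(rtrim(sprintf('%.3f', 1 - $i / $n), '0'), '.');
        $items[] = $tag . ';q=' . $q;
    }
    return implode(',', $items);
}

echo build_accept_language(['el', 'en', 'fr']), "\n";
```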

September 3, 2009 - Posted in Programming

Comments

  1. This looks stupidly and overly verbosely coded. Why do you need a regexp?

    For reference, here’s the above coded in Python:
    def acceptedlangs(http_a_l):
        # "en" is the default language with the lowest weight
        # however, if defined in our input, it'll be overridden
        langs = {'en': 0}

        # first construct a dict of lang => weight
        for lang in http_a_l.split(','):
            lang = lang.strip()
            weight = 1

            try:
                (lang, q) = lang.split(';')
                weight = float(q.split('=')[1])
            except ValueError:
                # no weight on that language
                pass

            langs[lang] = weight

            # if language is something like es-BR, also add es
            # down the queue with a weight of 0.1
            if '-' in lang:
                baselang = lang.split('-')[0]
                if baselang not in langs:
                    langs[baselang] = 0.1

        # sort the dict by value and return only the keys back
        sortedlangs = sorted(langs, key=langs.__getitem__, reverse=True)
        return sortedlangs

    Comment by Faidon Liambotis | September 3, 2009

    • You are right. Updated :P However you might want to adjust your ordering for equal weights, so sorting by weight alone may not be enough.

      Comment by xpapad | September 3, 2009

  2. Thanks for the update. I have created an array for each variable that contains the translations for each language, in the same order as here: $welcome = array("hello", "bonjour")

    Comment by ganaysa | August 16, 2014
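The base-language fallback from the first comment (adding es for es-BR) can also be expressed in PHP. This is a minimal sketch with a hypothetical function match_with_fallback(), which takes tags already ranked by weight and a lowercase list of supported languages. Unlike the comment's version, which appends the base language with weight 0.1 after all explicit choices, it tries the base language immediately after its regional variant, loosely in the spirit of the "Lookup" matching scheme of RFC 4647.

```php
<?php
// Hypothetical sketch of the base-language fallback from the first comment:
// if a regional tag such as "pt-BR" is not supported, try its base "pt"
// before moving on to the next preference. $available must be lowercase.
function match_with_fallback(array $ranked_tags, array $available, string $default): string
{
    foreach ($ranked_tags as $tag) {
        $tag = strtolower(trim($tag));
        if (in_array($tag, $available, true)) {
            return $tag;
        }
        // Keep only the primary subtag, e.g. "pt-br" -> "pt".
        $base = explode('-', $tag)[0];
        if ($base !== $tag && in_array($base, $available, true)) {
            return $base;
        }
    }
    return $default;
}

echo match_with_fallback(['pt-BR', 'en'], ['pt', 'en'], 'en'), "\n";
```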

