Closed Bug 1337069 Opened 8 years ago Closed 6 years ago

Improve the use of Unix API in OSPreferences

Categories

(Core :: Internationalization, defect, P3)

RESOLVED FIXED

People

(Reporter: zbraniecki, Assigned: zbraniecki)

In bug 1333184 we're introducing OSPreferences API. In this bug we'll work on using more of the Posix API to handle more of the information from the OS.
Depends on: 1333184
Mentor: gandalf
Keywords: good-first-bug
Priority: -- → P3
Hi.. I am a beginner (have worked only on 7-8 moz bugs).
Can I help with this bug?
Flags: needinfo?(gandalf)
Sure!

My first thought is that you'll need to investigate what abilities POSIX gives us to learn about user preferences in terms of regional settings and languages.

I know about the "LANG" env, but there seem to be many more POSIX envs like "LOCALE" etc. that maybe should take precedence for us.

I also know of at least one user who said that he changed "en_US.UTF8" to "en_US.ISO" in "LANG" env and expected us to show date/time in ISO format.

So, the first task is to read about the standard and how various envs should be used (or if there are any other ways in unix/linux that we should learn about user language and regional settings preferences).

Once you have this, we can talk about how to work with intl/locale/unix/OSPreferences_unix.cpp to implement those customizations.
Flags: needinfo?(gandalf)
I found the following info regarding precedence:

> The values of locale categories are determined by a precedence order; the first condition met below determines the value:
> 
>    1. If the LC_ALL environment variable is defined and is not null, the value of LC_ALL is used.
> 
>    2. If the LC_* environment variable (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) is defined and is not null, the value of the environment variable is used to initialise the category that corresponds to the environment variable.
> 
>    3. If the LANG environment variable is defined and is not null, the value of the LANG environment variable is used.
> 
>    4. If the LANG environment variable is not set or is set to the empty string, the implementation-dependent default locale is used.
> 
> If the locale value is "C" or "POSIX", the POSIX locale is used and the standard utilities behave in accordance with the rules in POSIX Locale, for the associated category.

(Retrieved from http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html)
Usage info is also present in the above link.

Do let me know if I need to explore further.
Flags: needinfo?(gandalf)
That looks good.

Next question - is ICU's uloc_getDefault [0] doing all the heavy lifting for us here? Does it go through all the LC_* env values?

If so, what's the best way to retrieve the ".%" value (like "en_US.UTF8" vs. "en_US.ISO" vs. "en_US.POSIX")? Is it standardized? What does the postfix mean? Should we alter our date/time formatting based on this? Anything else?


[0] http://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/intl/locale/gtk/OSPreferences_gtk.cpp#19
Flags: needinfo?(gandalf) → needinfo?(swapneshks)
Hi Zibi,

Sorry for the late notification. Some important college-related work has come up, so I won't be able to give much time to moz bugs until next week. I am totally fine if someone wants to work on this bug until then.
I'll surely continue working on this bug after next week if no one volunteers to work on this by that time.

Thanks.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #4)
> Next question - is ICU's uloc_getDefault [0] doing all the heavy lifting for
> us here? Does it go through all the LC_* env values?
> 
> If so, what's the best way to retrieve the ".%" value (like "en_US.UTF8" vs.
> "en_US.ISO" vs. "en_US.POSIX"), is it standardized? What does the postfix mean?
> Should we alter our date/time formatting based on this? Anything else?

After doing a bit of back-tracking (starting from [0]), I found [1] and there it looks like LC_ALL and LC_* are being covered. For going through LC_*, [2] is used.

[3] has info regarding the postfix. I think the default_loc column in [4] might give us some info about formatting.

[0] http://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/intl/locale/gtk/OSPreferences_gtk.cpp#19
[1] http://searchfox.org/mozilla-central/source/intl/icu/source/common/putil.cpp#1489
[2] http://searchfox.org/mozilla-central/source/intl/icu/source/common/putil.cpp#1404
[3] http://searchfox.org/mozilla-central/source/intl/icu/source/common/putil.cpp#1523
[4] http://searchfox.org/mozilla-central/source/intl/icu/source/common/putil.cpp#1494
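
On the postfix question, a small sketch may help. POSIX-style locale names commonly follow the form `language[_territory][.codeset][@modifier]`, so the part after the dot names a character encoding rather than a formatting style, and by itself it would not select something like ISO date formatting. The `PosixLocale` struct and `ParsePosixLocale` function below are hypothetical names for illustration only; they are not part of Gecko or ICU.

```cpp
#include <string>

// Hypothetical container for the four parts of a POSIX-style locale name.
struct PosixLocale {
  std::string language;
  std::string territory;
  std::string codeset;
  std::string modifier;
};

// Splits "language[_territory][.codeset][@modifier]" into its parts.
// Missing parts are left as empty strings. No validation is attempted.
PosixLocale ParsePosixLocale(const std::string& name) {
  PosixLocale out;
  std::string rest = name;

  // The modifier comes last, so cut it off first.
  std::string::size_type at = rest.find('@');
  if (at != std::string::npos) {
    out.modifier = rest.substr(at + 1);
    rest.erase(at);
  }

  // Then the codeset, which follows the dot.
  std::string::size_type dot = rest.find('.');
  if (dot != std::string::npos) {
    out.codeset = rest.substr(dot + 1);
    rest.erase(dot);
  }

  // Then the territory, which follows the underscore.
  std::string::size_type underscore = rest.find('_');
  if (underscore != std::string::npos) {
    out.territory = rest.substr(underscore + 1);
    rest.erase(underscore);
  }

  out.language = rest;
  return out;
}
```

For `en_US.UTF-8` this yields language `en`, territory `US` and codeset `UTF-8`, with an empty modifier.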
Flags: needinfo?(swapneshks) → needinfo?(gandalf)
Wondering if for example we should consider using LC_TIME locale for LocaleService::GetRegionalPrefsLocales if `intl.regional_prefs.use_os_locales` is set to true.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #7)
> Wondering if for example we should consider using LC_TIME locale for
> LocaleService::GetRegionalPrefsLocales if
> `intl.regional_prefs.use_os_locales` is set to true.

I think so. At least that always used to be respected before.
Blocks: 1404166
Depends on: 1394470
Blocks: 1394470
No longer depends on: 1394470
Taking.
Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Flags: needinfo?(gandalf)
Keywords: good-first-bug
There are two pieces to this puzzle. I'm going to tackle them both in this bug, unless the patch grows beyond reason, in which case I may split them:

1) Following the correct POSIX env conventions
2) Picking up Gnome conventions


The first one is about making decisions around LANG, LC_ALL, LC_MESSAGES, LC_COLLATE, etc.
The latter is about picking up correct Gtk/Gnome settings (see bug 1389972 for example) for regional preferences.
Mentor: gandalf
For the POSIX, we're in the following situation:

=========== POSIX =========================

POSIX defines a set of variables that are meant to instruct software on what locale to use for various operations. The ones relevant to our platform:

 - LC_ALL - catch-all top-priority env variable used for all localization and internationalization
 - LC_COLLATE - for collation
 - LC_CTYPE - for character classification and upper/lower casing
 - LC_MESSAGES - for localization
 - LC_MONETARY - currencies
 - LC_NUMERIC - numbers
 - LC_TIME - date and time
 - LANG - catch-all last-fallback locale to use for localization and internationalization

POSIX recommends the following precedence for selecting the right locale for an operation:

1) LC_ALL
2) LC_*
3) LANG
4) App-dependent default locale (en-US for Gecko)
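
The precedence above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `EnvLookup`, `ProcessEnv` and `ResolvePosixLocale` are hypothetical names, not existing Gecko or ICU API, and the `en_US` fallback stands in for the app default.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Returns the value of an environment variable, or nullptr when it is unset.
// Taking the lookup as a parameter keeps the function easy to test.
using EnvLookup = std::function<const char*(const char*)>;

// Default lookup backed by the real process environment.
inline const char* ProcessEnv(const char* name) { return std::getenv(name); }

// Hypothetical helper: resolves the locale for one category (for example
// "LC_TIME") by checking LC_ALL, then the category itself, then LANG.
// A variable that is unset or empty is skipped, as POSIX describes.
std::string ResolvePosixLocale(const std::string& category,
                               const EnvLookup& env = ProcessEnv,
                               const std::string& fallback = "en_US") {
  const char* names[] = {"LC_ALL", category.c_str(), "LANG"};
  for (const char* name : names) {
    const char* value = env(name);
    if (value && *value) {
      return value;
    }
  }
  // Assumption: nothing usable was set, so use the app default.
  return fallback;
}
```

Calling `ResolvePosixLocale("LC_TIME")` would read the real environment; tests can pass their own lookup instead.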

============== Gecko ============================

How does it apply to Gecko and in particular LocaleService and OSPreferences?

LocaleService manages locale selection for Gecko, and OSPreferences manages locale selection from the OS environment.

OSPreferences maintains two locale lists:

 - SystemLocales              - locales used to localize the operating system in which the app operates
 - RegionalPrefsLocales       - locales selected in the OS to internationalize regional preferences

LocaleService maintains four locale lists:

 - AppLocales                 - locales used to localize the Gecko application
 - AvailableLocales           - locales which are available for selection for the given Gecko application
 - RequestedLocales           - locales which are requested by the user for app localization
 - RegionalPrefsLocales       - locales which are requested by the user for regional preferences internationalization


=========== Alignment ===========================

There are multiple differences between Gecko and POSIX:

1) Gecko operates on locale fallback lists, while POSIX uses a single locale per category
2) POSIX LC_MESSAGES aligns with OSPreferences::SystemLocales
3) POSIX env per intl category (COLLATE, CTYPE, MONETARY, NUMERIC, TIME) does not align with Gecko's OSPreferences::RegionalPrefsLocales

The first two are easy to bind.

1) Since a list is a superset of a single item, we can easily express POSIX data in Gecko as single-element lists.
2) We should follow the POSIX priority fallback in retrieving the OSPreferences::SystemLocales using LC_MESSAGES in step (2)
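
To illustrate point (1), here is a rough sketch of turning one POSIX locale value into a single-element list of BCP 47-style tags. `PosixToLocaleList` is a hypothetical helper, not the Gecko implementation. It assumes that dropping the codeset and modifier is acceptable (which loses information for values such as `sr_RS@latin`) and that "C"/"POSIX" should map to the en-US default.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: converts e.g. "en_US.UTF-8" into {"en-US"}.
std::vector<std::string> PosixToLocaleList(const std::string& posix) {
  // Keep only "language[_territory]"; drop ".codeset" and "@modifier".
  std::string tag = posix.substr(0, posix.find_first_of(".@"));

  // Assumption: the POSIX locale and empty values fall back to en-US.
  if (tag.empty() || tag == "C" || tag == "POSIX") {
    return {"en-US"};
  }

  // BCP 47 separates subtags with '-', POSIX uses '_'.
  std::replace(tag.begin(), tag.end(), '_', '-');
  return {tag};
}
```

The result is always a list with exactly one element, which is the shape the list-based locale APIs expect.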

The third one will require additional changes to our API:

3) Aligning locale per category will require us to (re)introduce the categories as arguments to OSPreferences::GetRegionalPrefsLocales and LocaleService::GetRegionalPrefsLocales.

There are multiple things we'll have to consider here. We can relatively easily extend mozIntl to pick up the right locale for the right formatter:

  - Intl.DateTimeFormat => LC_TIME
  - Intl.NumberFormat => LC_NUMERIC
  - Intl.NumberFormat[type=currency] => LC_MONETARY
  - String.toLowerCase/toUpperCase => LC_CTYPE
  - Intl.Collator => LC_COLLATE
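
The mapping above could be written down as a simple lookup. The `IntlOperation` enum and `CategoryFor` function are hypothetical names used only to make the table concrete; they do not exist in mozIntl.

```cpp
#include <string>

// Hypothetical enumeration of the formatters listed above.
enum class IntlOperation {
  DateTimeFormat,
  NumberFormat,
  CurrencyFormat,  // Intl.NumberFormat with type=currency
  CaseMapping,     // String.toLowerCase / toUpperCase
  Collator
};

// Returns the POSIX category whose locale the given operation would use.
std::string CategoryFor(IntlOperation op) {
  switch (op) {
    case IntlOperation::DateTimeFormat:
      return "LC_TIME";
    case IntlOperation::NumberFormat:
      return "LC_NUMERIC";
    case IntlOperation::CurrencyFormat:
      return "LC_MONETARY";
    case IntlOperation::CaseMapping:
      return "LC_CTYPE";
    case IntlOperation::Collator:
      return "LC_COLLATE";
  }
  // Not reached for valid enum values; LANG is the generic fallback.
  return "LANG";
}
```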

The tricky part will be to make it work for the Intl object, rather than mozIntl. It may require us to extend XPCLocale::LocalizeRuntime to handle a selector for a locale, rather than a single "JS Environment locale", and this comes with a fingerprinting consequence (instead of a single bit of information, a tracker would get a bit per category).
Erratum to comment 12: there seem to be even more variables now:

LC_PAPER="en_US.UTF-8" - could be used for printing related UIs if we use Gecko's widgets for that
LC_NAME="en_US.UTF-8" - could be used for formatting of any names (name, last name, title etc.)
LC_ADDRESS="en_US.UTF-8" - address formatting
LC_TELEPHONE="en_US.UTF-8" - phone number formatting
LC_MEASUREMENT="en_US.UTF-8" - measurements (potentially useful for Intl.UnitFormat)
LC_IDENTIFICATION="en_US.UTF-8" - some metadata for address purposes?

Out of those, I see only clear use for LC_MEASUREMENT to be aligned with the upcoming Intl.UnitFormat, but it's good to list them here.
Another piece of data: `LANGUAGE` env variable is provided for gettext apps to take a fallback list of locales for localization purposes. We could use it together with LC_MESSAGES to get a fallback list of SystemLocales.
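
A rough sketch of how that combination might look. In gettext, `LANGUAGE` is a colon-separated priority list that is ignored when the locale is "C". The function name `BuildSystemLocales` is hypothetical. Treating "POSIX" like "C" and appending the LC_MESSAGES locale as the last fallback are assumptions made here. The values are passed in as strings so the sketch does not depend on the real environment, and no normalization (such as stripping `.UTF-8`) is attempted.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds a fallback list from the LANGUAGE value and
// the locale already resolved for LC_MESSAGES.
std::vector<std::string> BuildSystemLocales(const std::string& language,
                                            const std::string& messagesLocale) {
  // gettext ignores LANGUAGE in the "C" locale; "POSIX" is assumed equivalent.
  if (messagesLocale == "C" || messagesLocale == "POSIX") {
    return {messagesLocale};
  }

  std::vector<std::string> result;
  // Appends a locale unless it is empty or already present.
  auto push = [&result](const std::string& locale) {
    if (!locale.empty() &&
        std::find(result.begin(), result.end(), locale) == result.end()) {
      result.push_back(locale);
    }
  };

  // LANGUAGE entries come first, in the order the user listed them.
  std::istringstream in(language);
  std::string item;
  while (std::getline(in, item, ':')) {
    push(item);
  }

  // The LC_MESSAGES locale is the last fallback.
  push(messagesLocale);
  return result;
}
```

With `LANGUAGE=sv:de_DE` and an LC_MESSAGES locale of `sv`, this produces `sv` followed by `de_DE`.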
For Gnome, we're in the following situation:

========== Gnome =============

With the deprecation of Unity we can now focus only on the pure Gnome experience which will be shared by Ubuntu and other Gnome distributions.

Gnome 3.26 provides a single "manual" option to customize - hourCycle 12/24.

Gnome 3.26 also provides a UI for selecting Language and Formats in Settings>Region and Language. It seems that the "Language" setting is bound to "LC_MESSAGES" and the "Formats" setting is bound to a list of variables "LC_TIME, LC_NUMERIC, LC_MONETARY, etc".

That means that the only option beyond POSIX is the hourCycle option, while everything else is just a UI for selecting POSIX environment variables.


========= Gecko ==============

Gecko allows for retrieving a customized value for date/time formatting pattern via OSPreferences, which allows us to reach to GTK for hourCycle settings.


=========== Alignment ===========================

Gecko currently supports retrieving Gnome's hourCycle settings and applies it onto date/time format patterns used by mozIntl.
Gecko does that irrespective of the language subtag of the locale selected for LC_*.
This is possible for two reasons:

 - GTK allows us to distinguish between an "unset" value (following defaults) and a "set" value (user manually selected 12/24).
 - hourCycle is not interfering with localization

I conclude that at the moment there's nothing else we have to do that would be specific to Gnome, and all effort should go into better alignment between Gecko and POSIX.
I'd like to note that, at the moment, the core expectation of POSIX, which is that we will follow the selected environment variable for localization, cannot be met in Gecko's localization model.

We started conversations about building a multi-lingual Firefox (bug 1358824) and some conversations about runtime language selection in Firefox (and Firefox for Android) (see bug 1325870 for UX discussion).

This means that at the moment the user downloads Firefox in a single language, and from our perspective we cannot really follow LC_MESSAGES.

Action plan:

1) Improve the Intl part:

 - extend the GetRegionalPrefsLocales to accept an optional category: DATETIME, NUMBER, MONETARY, COLLATE, MEASUREMENT, CTYPE
 - for UNIX, add ability to retrieve the correct category if provided, using LC_ALL/LC_${category}/LANG fallback
 - change how JS Env locale works to retrieve a correct language, at least for chrome context (content may remain in a single locale for fingerprinting reasons)
   - this will allow us to remove the mozIntl language selection

2) Improve the OSPreferences::SystemLocales part

 - Use LC_MESSAGES and LANGUAGE env. variable to retrieve the fallback list of SystemLocales

This has a caveat. On other platforms, we separate the OSLocale, the locale that the operating system uses, from the "Locale that user requested for Firefox". SystemLocales can be used for example by telemetry to say "User is on Windows en-US" or "User is on MacOS de-AT". This is different from stating "User requests Firefox to be in it-IT", but it doesn't seem like UNIX has such a separation available. There's no difference between the two and Gnome is following LC_MESSAGES.

The good news here for us is that one of the primary uses of SystemLocales is for us to be able to pick the default Firefox locale once we gain ability to do so, based on the OSLocale, which happens to align with LC_MESSAGES nicely.

So I conclude that for our purpose, using LC_MESSAGES+LANGUAGE for OSPreferences::SystemLocales is OK, but in the future we may want to introduce a separation (it seems that Windows also allows for separating those two bits, but the UX of that is very messy at the moment).
A correction to :swapneshks' analysis from comment 6:

the `uloc_getDefault` which we currently use to retrieve SystemLocales is in fact retrieving a locale for LC_MESSAGES [0], which is correct.

[0] http://searchfox.org/mozilla-central/rev/01970ed92d74f82d4e94a1e4365892bbcc593889/intl/icu/source/common/putil.cpp#1565

We could extend it to retrieve `LANGUAGE` to get a better fallback, but that would be a bit of a stretch, since the env. variable is meant for gettext only.

ICU provides a method to retrieve POSIX env variables for other categories, but we do not currently use them.
The `format` setting in Gnome switches:

LC_NUMERIC
LC_TIME
LC_MONETARY
LC_PAPER
LC_MEASUREMENT

I think it would be a good start to take any of those five as the RegionalLangs from OSPreferences, and LC_TIME is the most relevant to our codebase.
Depends on: 1409158
Depends on: 1409185
No longer blocks: 1391411
Update - with the last fixes, we have now solved the most common scenarios reported for Linux+GTK and Gecko.

There's still an opportunity to do more sophisticated things, but the core is there and I think we don't have to rush to invest more time now into it.

I'll keep the bug open and if any new bugs show up, we'll look into adding features, but for now, I think we're a good citizen of the Gnome ecosystem :)
Let's close it since no new dependencies popped up. Any new bugs should be opened separately.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED