Closed Bug 382398 Opened 17 years ago Closed 14 years ago

checksetup.pl localized messages should be output in the console's charset

Categories

(Bugzilla :: Installation & Upgrading, enhancement)

enhancement
Not set
normal

RESOLVED FIXED
Bugzilla 4.0

People

(Reporter: vitaly.fedrushkov, Assigned: mkanat)

Details

(Keywords: intl)

Attachments

(1 file, 3 obsolete files)

There is a problem running checksetup.pl from a console that is not UTF-8 capable. We have messages.html.tmpl in UTF-8, which is right, but Windows users (other than Cygwin bash users) use different text charsets, for example Windows-1251 here in Russia. Keeping messages in different charsets within a single file is not good. [based on bug 352608 comment 3]
As a note, in case I don't fix this -- the solution is for Bugzilla::Install::Util::install_string to encode things into the console charset if the console charset is not UTF-8.
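The re-encoding step suggested above could look roughly like the following. This is only a minimal sketch, not the code that was eventually committed: `to_console_charset` is a hypothetical helper name, and the console charset is assumed to be already known and passed in.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not Bugzilla's actual code): re-encode a message
# stored as UTF-8 bytes into the console's charset.
sub to_console_charset {
    my ($utf8_bytes, $console_charset) = @_;
    # A UTF-8 console can take the bytes unchanged.
    return $utf8_bytes if $console_charset =~ /^utf-?8$/i;
    my $text = decode('UTF-8', $utf8_bytes);
    # With no CHECK argument, characters the target charset lacks are
    # replaced with a substitution character instead of raising an error.
    return encode($console_charset, $text);
}

# The Russian word for "hello" as UTF-8 bytes, converted for a
# Windows-1251 console.
my $utf8   = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
my $cp1251 = to_console_charset($utf8, 'cp1251');
printf "%d bytes in UTF-8, %d bytes in cp1251\n",
    length($utf8), length($cp1251);
```

Converting per string like this is one option; setting an encoding layer on the output handles, as discussed later in the thread, moves the same conversion into the I/O layer.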
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Summary: localized checksetup.pl charset → checksetup.pl localized messages should be output in the console's charset
Target Milestone: --- → Bugzilla 3.2
Keywords: l12y
Keywords: l12y → intl
How about checking the 'LANG' environment variable? If the user runs Windows cmd.exe, we can assume that the console can display UTF-8. Elsewhere, I think we should assume that the shell cannot display UTF-8 when $ENV{LANG} doesn't include '.UTF-8'.
No, you can just use the POSIX locale functions for it, and they should return something sensible on Windows, I think. The only difficult part is that POSIX locales don't map to Encode's understanding of character sets, necessarily.
(In reply to comment #2)
> If user uses cmd.exe of Windows, we can assume that user can use utf-8.

Wrong assumption: Russian Windows uses codepage 1251 for text windows. Cygwin bash works well, however.
So we're talking about

binmode(STDOUT, ":encoding($charset)");
binmode(STDERR, ":encoding($charset)");

here, and we're trying to find a way to determine $charset?
(In reply to comment #5)
> So we're talking about
>
> binmode(STDOUT, ":encoding($charset)");
> binmode(STDERR, ":encoding($charset)");
>
> here, and we're trying to find a way to determine $charset?

If :encoding($charset) will properly translate utf-8 into that charset, then yeah. I'm not sure if the POSIX locale functions work on Windows or not, but if they do, that would possibly give us the info we need on all platforms. Otherwise there might be some Win32:: function we can use.
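A rough sketch of the approach discussed in the last few comments: read the LC_CTYPE locale through POSIX, pull the codeset out of the locale name, and hand it to binmode. `charset_from_locale` is a hypothetical helper, and the assumption that the codeset is whatever follows the dot in the locale name is mine; as noted above, POSIX locale names do not necessarily map to names Encode understands, which is why the result goes through `Encode::resolve_alias`.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use Encode ();

# Hypothetical helper: pull the codeset out of a locale name such as
# "ru_RU.KOI8-R" or "French_Switzerland.1252". Returns undef if the
# name has no codeset or Encode does not recognise it.
sub charset_from_locale {
    my ($locale, $on_windows) = @_;
    return undef unless defined $locale && $locale =~ /\.([^.@]+)/;
    my $codeset = $1;
    # Windows locale names carry a bare code page number.
    $codeset = "cp$codeset" if $on_windows && $codeset =~ /^\d+$/;
    # Map to a name Encode knows; resolve_alias returns false otherwise.
    return Encode::resolve_alias($codeset) || undef;
}

setlocale(LC_CTYPE, '');    # adopt the locale from the environment
my $charset = charset_from_locale(setlocale(LC_CTYPE), $^O eq 'MSWin32');
if ($charset) {
    binmode(STDOUT, ":encoding($charset)");
    binmode(STDERR, ":encoding($charset)");
}
```

When no codeset can be determined (for example in the "C" locale), the sketch leaves the handles untouched rather than guessing.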
Can we rely on console windows using True Type fonts? Then we could enforce codepage 65001.
I don't think so. Even worse, I suspect codepages may not be a subset of cp 65001.
Yeah, I'm pretty sure all Windows consoles use bitmap fonts by default.
Workaround, tested on Russian Windows:

1. Select Lucida Console as the cmd window font.
2. Run chcp 65001 before checksetup.pl.
Bugzilla 3.2 is restricted to security bugs only. Moreover, this bug is either assigned to nobody or got no traction for several months now. Rather than retargeting it at each new release, I'm clearing the target milestone and the bug will be retargeted to some sensible release when someone starts fixing this bug for real (Bugzilla 3.8 more likely).
Target Milestone: Bugzilla 3.2 → ---
Severity: normal → enhancement
Target Milestone: --- → Bugzilla 3.8
Attached patch v1 (obsolete) (deleted)
Okay, this does it. I didn't test it on Windows, but I did test that the POSIX::setlocale function works on Windows (which it does). If you try to print out a character that your encoding doesn't support, Perl throws warnings.
Assignee: installation → mkanat
Status: NEW → ASSIGNED
Attachment #434739 - Flags: review?(LpSolit)
To work correctly, it requires the patch from bug 550765.
Depends on: 550765
Comment on attachment 434739 [details] [diff] [review]
v1

This is a huge improvement over what we have currently, but there are still a few bits which are displayed incorrectly; see the output of checksetup.pl below, with French templates installed. Problems are:

1) Wide character in print at Bugzilla/Install/Requirements.pm line 340.
Vérification des modules Perl DBD disponiblesâ?¦

2) ATTENTION : Vous devez définir le paramètre max_allowed_packet dans votre configuration MySQL Ã au moins 3276750. Actuellement, il est défini Ã 1048576. Vous pouvez définir ce paramètre dans la section [mysqld] de votre fichier de configuration MySQL.

-----------

C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\bugzilla>checksetup.pl
* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7100

Vérification des modules Perl.
Vérification de CGI.pm (v3.33) ok: v3.45 trouvé
Vérification de Digest-SHA (tout) ok: v5.48 trouvé
Vérification de TimeDate (v2.21) ok: v2.24 trouvé
Vérification de DateTime (v0.28) ok: v0.53 trouvé
Vérification de DateTime-TimeZone (v0.79) ok: v1.11 trouvé
Vérification de DBI (v1.41) ok: v1.609 trouvé
Vérification de Template-Toolkit (v2.22) ok: v2.22 trouvé
Vérification de Email-Send (v2.16) ok: v2.198 trouvé
Vérification de Email-MIME (v1.861) ok: v1.863 trouvé
Vérification de Email-MIME-Encodings (v1.313) ok: v1.313 trouvé
Vérification de Email-MIME-Modifier (v1.442) ok: v1.444 trouvé
Vérification de URI (tout) ok: v1.52 trouvé
Wide character in print at Bugzilla/Install/Requirements.pm line 340.

Vérification des modules Perl DBD disponiblesâ?¦
Vérification de DBD-Pg (v1.45) non trouvé
Vérification de DBD-mysql (v4.00) ok: v4.011 trouvé
Vérification de DBD-Oracle (v1.19) non trouvé

Les modules Perl suivants sont optionnels :
Vérification de GD (v1.20) ok: v2.44 trouvé
Vérification de Chart (v2.1) ok: v2.4.1 trouvé
Vérification de Template-GD (tout) ok: v1.56 trouvé
Vérification de GDTextUtil (tout) ok: v0.86 trouvé
Vérification de GDGraph (tout) ok: v1.44 trouvé
Vérification de XML-Twig (tout) ok: v3.34 trouvé
Vérification de MIME-tools (v5.406) ok: v5.427 trouvé
Vérification de libwww-perl (tout) ok: v5.829 trouvé
Vérification de PatchReader (v0.9.4) ok: v0.9.5 trouvé
Vérification de perl-ldap (tout) ok: v0.39 trouvé
Vérification de Authen-SASL (tout) ok: v2.13 trouvé
Vérification de RadiusPerl (tout) ok: v0.17 trouvé
Vérification de SOAP-Lite (v0.710.06) ok: v0.710.10 trouvé
Vérification de JSON-RPC (tout) ok: v0.96 trouvé
Vérification de Test-Taint (tout) ok: v1.04 trouvé
Vérification de HTML-Parser (v3.40) ok: v3.64 trouvé
Vérification de HTML-Scrubber (tout) ok: v0.08 trouvé
Vérification de Email-MIME-Attachment-Stripper (tout) ok: v1.316 trouvé
Vérification de Email-Reply (tout) ok: v1.202 trouvé
Vérification de TheSchwartz (tout) non trouvé
Vérification de Daemon-Generic (tout) non trouvé
Vérification de mod_perl (v1.999022) non trouvé

***********************************************************************
* MODULES OPTIONNELS *
***********************************************************************
* Certains modules Perl ne sont pas indispensables pour Bugzilla, *
* mais en installant la dernière version, vous pourrez accéder Ã des *
* fonctionnalités supplémentaires. *
* *
* Les modules optionnels que vous n'avez pas installés sont listés *
* ci-dessous, avec le nom de la fonctionnalité qu'ils activent. Sous *
* ce tableau se trouvent les commandes pour installer chaque module. *
***********************************************************************
* MODULE NAME * ENABLES FEATURE(S) *
***********************************************************************
* TheSchwartz * File d'attente de courrier *
* Daemon-Generic * File d'attente de courrier *
* mod_perl * mod_perl *
***********************************************************************
* Note pour les utilisateurs Windows *
***********************************************************************
* Pour installer les modules listés ci-dessous, vous devez d'abord *
* exécuter la commande suivante en tant qu'administrateur : *
* *
* ppm repo add theory58S http://cpan.uwinnipeg.ca/PPMPackages/10xx/
***********************************************************************
COMMANDES POUR INSTALLER LES MODULES OPTIONNELS :

TheSchwartz: ppm install TheSchwartz
Daemon-Generic: ppm install Daemon-Generic
mod_perl: ppm install mod_perl

Reading ./localconfig...

OPTIONAL NOTE: If you want to be able to use the 'difference between two patches' feature of Bugzilla (which requires the PatchReader Perl module as well), you should install patchutils from: http://cyberelk.net/tim/patchutils/

Vérification de DBD-mysql (v4.00) ok: v4.011 trouvé
Checking for MySQL (v4.1.2) ok: found v5.5.1-m2-community

ATTENTION : Vous devez définir le paramètre max_allowed_packet dans votre configuration MySQL Ã au moins 3276750. Actuellement, il est défini Ã 1048576. Vous pouvez définir ce paramètre dans la section [mysqld] de votre fichier de configuration MySQL.

Suppression des modèles compilés existants.
Précompilation des modèles.terminé.
Checking for GraphViz (any) ok: found
Attachment #434739 - Flags: review?(LpSolit) → review-
Unless there is a technical limitation, we should really take it for 3.6. Otherwise the output is unreadable, with all lines being of the form:

Vérification des modules Perl�
Vérification de CGI.pm (v3.33) ok: v3.45 trouvé
Vérification de Digest-SHA (tout) ok: v5.48 trouvé
Vérification de TimeDate (v2.21) ok: v2.24 trouvé
Flags: blocking3.6?
Target Milestone: Bugzilla 3.8 → Bugzilla 3.6
It's too much of an enhancement and refactoring at this point to take for 3.6. Bug 550765 should resolve the issues with checksetup.pl, provided that the templates are stored in UTF-8 and the user's terminal encoding is UTF-8 (which should be the most common encoding for modern terminals).
Flags: blocking3.6? → blocking3.6-
Target Milestone: Bugzilla 3.6 → Bugzilla 3.8
(In reply to comment #16)
> templates are stored in UTF-8 and the user's terminal encoding is UTF-8 (which
> should be the most common encoding for modern terminals).

It's not on Windows, which is what comment 15 is about.
Ahh. Well, that's been a problem for quite some time (since Bugzilla 3.2), and it's what this bug is about. This patch affects every command-line script in Bugzilla, though, not just checksetup, so I don't want to mess around with that while we're in an RC stage. FWIW, there are many languages (Russian, CJK, anything that isn't ISO-8859-1) that will do nothing but throw warnings on Windows's default charset, so checksetup.pl will become entirely a string of warnings. I think that's not a safe thing to do post-RC, also, but it's probably OK for 3.8 because we will have some time to test and get feedback and see if it really is a problem in practical situations.
Attached patch v2 (obsolete) (deleted)
Okay, I figured it out. There were two problems: 1) We were calling init_console twice, which was leading to double-encoding characters. 2) We didn't set encoding() on STDERR.
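For readers unfamiliar with the double-encoding symptom mentioned above, here is a small standalone illustration. It is not taken from the patch; it only shows, with Encode, what happens when text that has already been encoded to UTF-8 is encoded a second time.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "\x{E9}";                 # e-acute as a single character
my $once = encode('UTF-8', $text);   # correct: bytes C3 A9

# Encoding the already-encoded bytes again treats each byte as a
# Latin-1 character and encodes it separately, which is what produces
# the two-character garbage in place of one accented letter.
my $twice = encode('UTF-8', $once);  # bytes C3 83 C2 A9

printf "once:  %s\n", join(' ', map { sprintf('%02X', ord($_)) } split(//, $once));
printf "twice: %s\n", join(' ', map { sprintf('%02X', ord($_)) } split(//, $twice));
```

Stacking two encoding layers on the same handle has a comparable effect, which is why calling the console setup twice matters.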
Attachment #434739 - Attachment is obsolete: true
Attachment #440395 - Flags: review?(LpSolit)
Without your patch, the output on Windows 7 is:

* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7600
V├®rification des modules PerlÔǪ
V├®rification de CGI.pm (v3.33) ok: v3.48 trouv├®

With your patch:

* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7600
VÚrification des modules Perlà
VÚrification de CGI.pm (v3.33) ok: v3.48 trouvÚ

This is only slightly better; all letters with accents are still rendered incorrectly.
Hmm, although the shell uses cp1252, the last few lines of checksetup.pl are displayed correctly when using cp850.
(In reply to comment #20)
> VÚrification des modules Perlà
> VÚrification de CGI.pm (v3.33) ok: v3.48 trouvÚ

I can't reproduce this issue. Using the current French templates, the lines appear correctly for me using Windows's default terminal settings. Do you have something unusual about your terminal configuration?

I do see a problem with a single message in checksetup.pl: the one printed about the DBD modules. But that's it.
Also, you might want to try throwing some debug code into set_output_encoding to see what Bugzilla thinks your terminal's encoding is. Mine says cp1252.
Okay, so the problem that I was experiencing (and possibly that you were experiencing as well) is that CGI.pm sets binmode on STDOUT, but only on Windows! I'm going to report it to them as a bug.
I've reported the CGI.pm bug here: https://rt.cpan.org/Ticket/Display.html?id=57524
Attached patch v3 (obsolete) (deleted)
Okay, this works around the CGI.pm bug. Calling set_output_encoding over and over is harmless, because it does nothing if the output encodings are already correct.
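The "harmless to call repeatedly" property described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: `ensure_encoding` is a hypothetical name, and an in-memory filehandle stands in for STDOUT so the example does not alter the real console.

```perl
use strict;
use warnings;

# Hypothetical helper: add an :encoding layer to a handle only if it
# does not already have one. Returns 1 if a layer was added, else 0.
sub ensure_encoding {
    my ($fh, $charset) = @_;
    return 0 if grep { /^encoding/ } PerlIO::get_layers($fh);
    binmode($fh, ":encoding($charset)");
    return 1;
}

# An in-memory handle stands in for STDOUT here.
open(my $out, '>', \my $buffer) or die "cannot open in-memory handle: $!";
ensure_encoding($out, 'cp1252');   # adds the layer
ensure_encoding($out, 'cp1252');   # no-op: a layer is already present
print {$out} "\x{E9}";             # e-acute
close($out);
printf "%d byte(s) written\n", length($buffer);
```

Checking the existing layers before calling binmode is what keeps a second call from stacking another encoding layer on the handle.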
Attachment #440395 - Attachment is obsolete: true
Attachment #445575 - Flags: review?(LpSolit)
Attachment #440395 - Flags: review?(LpSolit)
Hum, this change doesn't help. The output remains the same.
I added some debug code into set_output_encoding() as follows:

sub set_output_encoding {
    # If we've already set an encoding layer on STDOUT, don't
    # add another one.
    my @stdout_layers = PerlIO::get_layers(STDOUT);
    print "\nSTDOUT layers are " . join("/", @stdout_layers) . "\n";
    return if grep(/^encoding/, @stdout_layers);

    my $encoding;
    my $locale = setlocale(LC_CTYPE);
    print "LC_CTYPE = $locale\n";
    if ($locale =~ /\.([^\.]+)$/) {
        $encoding = $1;
        print "found encoding $encoding\n";
        if (ON_WINDOWS) {
            $encoding = "cp$encoding";
            print "Windows detected. Setting encoding to $encoding\n";
        }
    }
    $encoding = Encode::resolve_alias($encoding) if $encoding;
    print "encoding alias is $encoding\n";
    ...
}

And now the output of checksetup.pl becomes:

C:\Program Files\Bugzilla\bugzilla>..\perl\perl\bin\perl.exe checksetup.pl -t

STDOUT layers are unix/crlf
LC_CTYPE = French_Switzerland.1252
found encoding 1252
Windows detected. Setting encoding to cp1252
encoding alias is cp1252
* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7600
VÚrification des modules Perlà
STDOUT layers are unix/crlf
LC_CTYPE = French_Switzerland.1252
found encoding 1252
Windows detected. Setting encoding to cp1252
encoding alias is cp1252
VÚrification de CGI.pm (v3.33) ok: v3.48 trouvÚ
STDOUT layers are unix/crlf/encoding(cp1252)/utf8
VÚrification de Digest-SHA (tout) ok: v5.48 trouvÚ
STDOUT layers are unix/crlf/encoding(cp1252)/utf8
VÚrification de TimeDate (v2.21) ok: v2.24 trouvÚ

Is the mix encoding(cp1252)/utf8 expected?
Have you tried explicit 'chcp 65001' before checksetup.pl? Any changes in output?
(In reply to comment #29)
> Have you tried explicit 'chcp 65001' before checksetup.pl? Any changes in
> output?

What's that?
chcp returns 850, despite LC_CTYPE says 1252.
Oh, and chcp 65001 before checksetup.pl has no effect, with or without the patch applied.
(In reply to comment #32)
> Oh, and chcp 65001 before checksetup.pl has no effect, with or without the
> patch applied.

I take that back. I changed the font used by cmd.exe to Lucida, and now your trick works great, without mkanat's patch!
Ohhh, I think maybe we have to use a different function to get the console encoding, on Windows. I know what it is, I'll provide another patch and see if it makes a difference.
Attached patch v4 (deleted)
Okay, this patch uses OutputCP instead of setlocale, now, on Windows. Does this fix your problem?
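The switch described above addresses the mismatch seen earlier in the thread, where chcp reported 850 while LC_CTYPE reported 1252. A minimal sketch of the idea, assuming the Win32::Console module is installed on Windows; `console_encoding` is a hypothetical name and the non-Windows branch is a simplified locale lookup, not the committed code.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: name the console's output encoding, or return
# undef when it cannot be determined.
sub console_encoding {
    if ($^O eq 'MSWin32') {
        # Win32::Console is loaded only on Windows. OutputCP() reports
        # the console's active output code page, e.g. 850 or 1252.
        require Win32::Console;
        return 'cp' . Win32::Console::OutputCP();
    }
    # Elsewhere, fall back to the codeset part of the LC_CTYPE locale.
    my $locale = setlocale(LC_CTYPE) // '';
    return $locale =~ /\.([^.@]+)/ ? $1 : undef;
}

my $encoding = console_encoding();
print defined $encoding
    ? "console encoding: $encoding\n"
    : "console encoding could not be determined\n";
```

Asking the console for its code page reflects what the terminal will actually render, whereas the locale only describes the user's regional settings.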
Attachment #445575 - Attachment is obsolete: true
Attachment #445601 - Flags: review?(LpSolit)
Attachment #445575 - Flags: review?(LpSolit)
As I said in comment 33, I see no difference now that I set the font to Lucida, so I cannot review your patch as "ok, this fixes my problem". Vitaly, does this patch help in your case?
Attachment #445601 - Flags: review?(LpSolit) → review?(timello)
Attachment #445601 - Flags: review?(timello) → review+
Comment on attachment 445601 [details] [diff] [review]
v4

It works! I tested it using cp1252. I printed some Portuguese words with accents which were written in UTF-8. They were all printed the way they should be. I suppose it will work for other languages too.
Flags: approval?
Flags: approval? → approval+
Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla.pm
modified checksetup.pl
modified Bugzilla/Install/Requirements.pm
modified Bugzilla/Install/Util.pm
Committed revision 7257.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Keywords: relnote
Added to the release notes in bug 604256.
Keywords: relnote → ---