Henrik Gemal

Comment 5

•

23 years ago

"Content-type" should be "Content-Type"
"Content-transfer-encoding" should be "Content-Transfer-Encoding"

Tobias Burnus

Reporter

Comment 6

•

23 years ago

Attached patch defparams.pl patch (v3): Send per default the encoding for HTML (header) and for emails (obsolete) (deleted)

> "Content-type" should be "Content-Type"
> "Content-transfer-encoding" should be "Content-Transfer-Encoding"

Fixed. I should really go to bed ...
("Obsolet" marking is bug 97729 by the way)

Tobias Burnus

Reporter

Updated

•

23 years ago

Keywords: patch, review

Dave Miller [:justdave]

Comment 7

•

23 years ago

Comment on attachment 70148 [details] [diff] [review] defparams.pl patch (v3): Send per default the encoding for HTML (header) and for emails Can't use a META tag for content-type. It's broken and makes Bad Things happen in Netscape 4.x. Need to actually send a charset parameter on the Content-Type header being spit out in the HTTP. Please see the discussion on bug 38856 for why this was refused entry the last time it was presented, and take anything into consideration from that bug that we need to do to keep everyone happy.

Attachment #70148 - Flags: review-

Dave Miller [:justdave]

•

23 years ago

Attached patch Bigger patch for text/html (v4) (obsolete) (deleted) — Details — Splinter Review

This patch addresses the problems by replacing the print "Content-Type: text/html\n\n" with a function. This patch was _not_ thoroughly tested; the %...% part in the email settings is untested (to come...). Additionally, the documentation (3.5.5) needs to be updated if this is checked in.

Tobias Burnus

Reporter

Comment 9

•

23 years ago

Attached patch Bigger patch for text/html (v5) (obsolete) (deleted) — Details — Splinter Review

Now tested. Changes to previous version:
- HTMLencoding -> encoding (since used by mail and needed for %encoding% substitution)
- %encoding% substitution works now

Tobias Burnus

Reporter

Comment 10

•

23 years ago

Attached patch Bigger patch for text/html (v5) (obsolete) (deleted) — Details — Splinter Review

Now tested. Changes to previous version:
- HTMLencoding -> encoding (since used by mail and needed for %encoding% substitution)
- %encoding% substitution works now (in the email params)

Tobias Burnus

Reporter

Updated

•

23 years ago

Keywords: patch, review

Tobias Burnus

Reporter

•

23 years ago

Attached patch Bigger patch for text/html (v7) (obsolete) (deleted) — Details — Splinter Review

Minor changes to PutHTMLContentType (I confess I forgot to initialise a variable with ''; plus: call Param('encoding') only once).

Tobias Burnus

Reporter

Comment 18

•

23 years ago

Attached patch Bigger patch for text/html (v8) (obsolete) (deleted) — Details — Splinter Review

Rediff after relogin.cgi and defparams.pl had been changed.

Tobias Burnus

Reporter

Comment 19

•

23 years ago

I think this would greatly accompany bug 126456 (2.16 blocker/"Fix our error handling"). Regarding this: Would it make sense to set $vars->{'header_done'} in the PutHTMLContentType once bug 126456 is checked in or is this the wrong place to do so?

Severity: normal → major

Gervase Markham [:gerv]

Comment 20

•

23 years ago

No, $vars->{'header_done'} should be set only after the global/header template has been printed.

I'm still not convinced about the way you are doing things in this bug, but I need more time to look at it to work out why. ;-)

Gerv

Tobias Burnus

Reporter

Comment 21

•

23 years ago

--- post_bug.cgi 2002/02/05 00:20:08 1.39
|+++ post_bug.cgi 2002/02/24 15:43:09
|-print "Content-type: text/html\n\n";
|+pPutHTMLContentType();

s/pP/P/

> I'm still not convinced about the way you are doing things in this bug, but I
> need more time to look at it to work out why. ;-)

Hmm. I thought it wasn't that bad ;-) As long as you can come up with something else which sends the email and the HTML pages with the right charset, I'm fine with that.

Gervase Markham [:gerv]

Comment 22

•

23 years ago

Comment on attachment 71202 [details] [diff] [review] Bigger patch for text/html (v8) This is the way I think it should work. The function should be called SendHTTPHeader(), and take an array of strings, including a Content-Type. It should print them all, in the order given, with \n separating, but if it spots an HTML Content-Type, it should slyly insert the charset into it. It prints \n\n at the end. This seems to me to be a much cleaner interface, and it works for different content-types too, and is extensible. Gerv
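A minimal sketch of the interface proposed in this comment, written in Python rather than Bugzilla's Perl so it can be run in isolation. The name `build_http_header` and the explicit `charset` argument are illustrative assumptions (the patch reads the value from a param), and the sketch returns the header block as a string instead of printing it. Treating an empty argument list as `Content-Type: text/html` follows the convention mentioned later in comment 28.

```python
def build_http_header(lines, charset=""):
    """Return an HTTP header block, adding a charset to HTML content types.

    `lines` is a list of complete header lines, emitted in the order given.
    `charset` stands in for the admin-configured value; empty means "unset".
    """
    if not lines:
        # No arguments: assume a plain HTML page.
        lines = ["Content-Type: text/html"]

    out = []
    for line in lines:
        name, _, value = line.partition(":")
        is_html = (
            name.strip().lower() == "content-type"
            and value.strip().lower().startswith("text/html")
        )
        # Only touch HTML content types, and never add a second charset.
        if is_html and charset and "charset=" not in value.lower():
            line = f"{line}; charset={charset}"
        out.append(line)

    # Lines are separated by \n; an empty line ends the header block.
    return "\n".join(out) + "\n\n"
```

Other content types pass through untouched, which is what makes the list-of-strings interface extensible.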

Attachment #71202 - Flags: review-

Tobias Burnus

Reporter

Comment 23

•

23 years ago

Attached patch Bigger patch for text/html (v9) (obsolete) (deleted) — Details — Splinter Review

Fix pPut... error and rediff after xml.cgi and userprefs.cgi got checked in.

Dave Miller [:justdave]

Comment 24

•

23 years ago

*** Bug 128609 has been marked as a duplicate of this bug. ***

Dave Miller [:justdave]

Comment 25

•

23 years ago

altering the summary of this bug to more closely match what the patch is actually accomplishing.

Summary: Bugzilla should send encoding ISO-8859-1 per default → Allow administrator to set charset encoding for pages and email

Tobias Burnus

Reporter

Comment 26

•

23 years ago

Attached patch v10: Encoding patch for mail and html (obsolete) (deleted) — Details — Splinter Review

-needs work-

Goal:
- Provide new function "PutHTMLContentHeader" which sends the content-transfer-encoding for HTML
- Provide option sending emails with content-transfer-encoding 8bit or quoted-printable (This can be set via the editparams.cgi)

Done:
- Options are in defparams.pl
- PutHTMLContentType is used.
- Default email setting uses MIME with %encoding% and %transportencoding%
- The email body is either sent as 8bit or quoted-printable

Todo:
- Honour RFC 2047 for the encoding of the header
- Use MIME encoding and other features for the other emails which presently are not affected by editparams.cgi
- Check whether we need to change something for 16bit characters. I fear that MIME:QuotedPrint doesn't do the right thing in this case.
- Do some clean up
- Testing: I haven't yet tested the changes between v9 and v10.

I'd be glad if someone could assure me that I'm on the right road.

Gervase Markham [:gerv]

Comment 27

•

23 years ago

burnus: if you disagree with my assessment of how this would most cleanly be implemented (as written in comment #22), could you at least say why? :-) Gerv

Tobias Burnus

Reporter

Comment 28

•

23 years ago

Attached patch v11/v1h: Encoding patch for HTML (obsolete) (deleted) — Details — Splinter Review

I split the two areas mail and html output. This only contains the changes needed for HTML and tries to address all issues given in comment #22. The only difference is that a "SendHTTPHeader()" is equivalent to a SendHTTPHeader("Content-Type: text/html").

I think this patch is rather clean and independent of addressing the email encoding. Checking this in first reduces the size of the more complicated email patch.

(Does someone know a lightweight perl implementation for the encoding of the header of emails? I have a slight idea how to write it, but it is going to be ugly and lengthy :-( )

> burnus: if you disagree with my assessment of how this would most cleanly be
> implemented (as written in comment #22), could you at least say why? :-)

Well the reason is simple: I overlooked this comment :-(

Gervase Markham [:gerv]

Comment 29

•

23 years ago

Comment on attachment 72387 [details] [diff] [review] v11/v1h: Encoding patch for HTML

>+# This sends a HTTP header
>+# It takes an list as argument and prints them \n separated

"a list as an argument" :-)

>+# If it finds "Content-Type: text/html" and the param "encoding" is set
>+# it adds the charsetencoding
>+# If called without an argument it assumes that "Content-Type: text/html" is
>+# ment.

"meant".

>+sub SendHTTPHeader(@){
>+    my $header = join("\n",@_);
>+    my $encoding = Param('encoding');
>+    if($header eq "") {
>+        $header = "Content-Type: text/html";
>+    }

$header ||= "Content-Type: text/html" is neater :-)

>+DefParam("encoding",
>+         "Character encoding used for the HTML documents. (This should match the encoding used by the database.)",
>+         "t",
>+         'iso-8859-1');
>+

Please default this to nothing. See long arguments in other bugs for the reason.

Other than those nits, r=gerv :-)

Gerv

Attachment #72387 - Flags: review+

Tobias Burnus

Reporter

Comment 30

•

23 years ago

Attached patch 72387: v12/v2h: Encoding patch for HTML (obsolete) (deleted) — Details — Splinter Review

Fixed the issues raised in comment 29.

Gervase Markham [:gerv]

Comment 31

•

23 years ago

Comment on attachment 72860 [details] [diff] [review] 72387: v12/v2h: Encoding patch for HTML r=gerv. Gerv

Attachment #72860 - Flags: review+

Gervase Markham [:gerv]

•

22 years ago

*** Bug 152190 has been marked as a duplicate of this bug. ***

robinson

Comment 39

•

22 years ago

Attached patch Refresh of patch 77394 to apply to 2_16-BRANCH (obsolete) (deleted) — Details — Splinter Review

I've refreshed this again for the benefit of people who will be running 2.16, as well as for the benefit of anyone who wants to review it for inclusion in 2.17 when it opens (hint, hint).

Attachment #77394 - Attachment is obsolete: true

Bradley Baetz (:bbaetz)

Updated

•

22 years ago

Blocks: 160096

Bradley Baetz (:bbaetz)

Comment 40

•

22 years ago

*** Bug 160097 has been marked as a duplicate of this bug. ***

Myk Melez [:myk] [@mykmelez]

Updated

•

Comment 77

•

22 years ago

Comment on attachment 72387 [details] [diff] [review] v11/v1h: Encoding patch for HTML one year worth of bitrot...

Attachment #72387 - Flags: review-

Dave Miller [:justdave]

Comment 78

•

•

•

21 years ago

:: sigh :: :) This is very high on my list, btw, we'll have this in 2.20 or bust. :)

Whiteboard: i18n

Target Milestone: Bugzilla 2.18 → Bugzilla 2.20

Nicolas Mailhot

Comment 138

•

21 years ago

BTW, improperly encoded headers are one criterion that filters like amavisd-new use to detect spam. This means that if you use strict filters, they'll also eat some of your Bugzilla traffic (this is bad).

I vote for UTF-8 everywhere with database conversion on upgrade. Other encodings might work better with old tools, but variable encodings cause so many problems everywhere it's not even fun to write about here. One encoding to rule them all, period. Screw ancient tools - if they can't grok UTF-8 that means they also need massive amounts of handholding everywhere to work anyway.

Dave Miller [:justdave]

Comment 139

•

•

20 years ago

*** Bug 256665 has been marked as a duplicate of this bug. ***

:glob ✱

Assignee

Comment 166

•

20 years ago

Attached patch updated patch (obsolete) (deleted) — Details — Splinter Review

ok, here's a different patch:
- updated against the tip
- uses CGI's charset() method, which simplifies things a lot
- charset is always utf-8, so the param is now a boolean
- param defaults to false (for existing installs), with checksetup setting it to true for new installs

todo:
- encode email headers as per rfc 2047
- encode email body as base64 (depending on percentage of non-8-bit chars)
- set accept-encoding attribute on all forms (template function?)
- add test to ensure accept-encoding attribute is always present

for a new bug (imho):
- migration tools for existing installs

Christian Reis

Comment 167

•

20 years ago

How about splitting the email bits out into a separate bug to make it easier to get traction on this one?

:glob ✱

Assignee

Comment 168

•

20 years ago

Attached patch utf-8 patch with initial email support (obsolete) (deleted) — Details — Splinter Review

here's *preliminary* email encoding, with the following issues:
- not applied to all instances of sendmail
- need to move the call to Param
- doesn't encode to/from/reply-to
- needs loads of testing :)

but it's a start.

:glob ✱

Assignee

Updated

•

20 years ago

Attachment #70148 - Attachment is obsolete: true

Attachment #70262 - Attachment is obsolete: true

Attachment #70270 - Attachment is obsolete: true

Attachment #70271 - Attachment is obsolete: true

Attachment #70410 - Attachment is obsolete: true

Attachment #70804 - Attachment is obsolete: true

Attachment #71202 - Attachment is obsolete: true

Attachment #71859 - Attachment is obsolete: true

Attachment #72246 - Attachment is obsolete: true

Attachment #72387 - Attachment is obsolete: true

Comment 178

•

20 years ago

I don't know where this bug is going, but let me repeat one of the two points from the initial report: "Presently the bugzilla webpages don't contain an encoding header."

Doing a "grep header *.pl *.cgi", I see many occurrences of "print $cgi->header();". There's nothing wrong with it, except that the CGI.pm documentation says: "The -charset parameter can be used to control the character set sent to the browser. If not provided, defaults to ISO-8859-1."

Currently Firefox 1.0 (just to name one example) still thinks pages are ISO-8859-1, and I'll have to switch the charset for every page I visit to UTF-8.

Well, someone stated bugzilla uses UTF-8 internally, and exclusively. But please: Why don't you tell the browser? If this sounds easy to fix, would you please change those "$cgi->header()" to "$cgi->header('-charset' => 'utf-8')" in the near future? Did I miss something important?

Sorry if I sound a bit impatient after almost two years...

Anne (:annevk)

Comment 179

•

20 years ago

Perhaps it would help if you read the bug. Especially the parts about legacy content and such.

Ulrich Windl

Comment 180

•

20 years ago

(In reply to comment #179)
> Perhaps it would help if you read the bug. Especially the parts about legacy
> content and such.

If you are starting with a new and empty bugzilla and (just for example) German translation, all the "Umlaute" are mis-displayed. This has nothing to do with legacy contents. The longer you wait to decide on the right character encoding, the more "legacy content" you will get.

To make things worse (or maybe better) Microsoft's Internet Explorer sets the page encoding (automatically?) to UTF-8, so even when I enter the same words on the same machine with two different browsers, they will use two different character encodings. This has nothing to do with legacy contents. Content is already misdisplayed _now_.

Despite that, you could make the character encoding a configurable parameter, so those who think that everything is OK with ISO-8859-1 can leave everything as it is now.

To summarize: It is a bug to use UTF-8 character coding in the HTML when saying the page is encoded as ISO-8859-1 in the HTTP header.

Brodie

•

20 years ago

> You meant IDN in the domain-name part of an email address?

no, as part of the real name. eg.

From: =?UTF-8?B?RnLDqWTDqXJpYyBCdWNsaW4=?= <lpsolit@gmail.com>

i think we should ignore IDNs for now.

i've started working on getting this patch up to date and in a more workable state.

Jungshik Shin

•

20 years ago

Attached patch utf-8 v3 (obsolete) (deleted) — Details — Splinter Review

updated utf-8 patch.

this patch:
- adds a boolean utf-8 parameter, which is enabled by default on new installs, and disabled by default on existing installs

if utf-8 is enabled:
- page's charset is set to utf-8
- encoding attribute added to xml pages
- all emails are encoded:
  - email charset is set to utf-8
  - subjects are utf-8 quotedprint'ed if required
  - name component of email addresses is qp'ed if required
  - the body is encoded as:
    - quotedprint if less than 50% characters require encoding
    - base64 otherwise
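The body-encoding rule at the end of this list could be sketched as follows. This is Python, not the patch's Perl; `encode_body` and `threshold` are hypothetical names, and measuring the ratio over UTF-8 bytes is an assumption, since the comment only says "characters".

```python
import base64
import quopri


def encode_body(text, threshold=0.5):
    """Pick a transfer encoding for a mail body and return (name, payload).

    Quoted-printable stays readable when most of the text is ASCII;
    base64 is more compact once non-ASCII bytes dominate.
    """
    raw = text.encode("utf-8")
    # Bytes >= 0x80 are the ones quoted-printable has to escape as =XX.
    needs_encoding = sum(1 for byte in raw if byte >= 0x80)
    ratio = needs_encoding / len(raw) if raw else 0.0

    if ratio < threshold:
        return "quoted-printable", quopri.encodestring(raw)
    return "base64", base64.encodebytes(raw)
```

The returned name is what would go into the `Content-Transfer-Encoding` header of the message.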

Comment 200

•

20 years ago

(In reply to comment #198)
> A while back someone proposed on IRC that we turned on UTF-8 for every bug ID
> that was greater than a certain number.

That's not new :-) It was first proposed by Markus Kuhn in 2002(?) and I repeated it a couple of times here (e.g. see comment #195 point #2 ).

Anyway, it'll be my last comment here on the migration. If there's anything new to add, I'll add in bug 280633

Gervase Markham [:gerv]

Comment 201

•

20 years ago

Anne: that breaks things when you have content from multiple bugs on the same page, such as buglists or longlist.cgi output. I think a Bugzilla needs to be either UTF-8, or a specific charset, or "no charset" (undefined, as now). Having it as > 1 specific charset, or part as some charset and part as no charset sounds like a nightmare. Gerv

David Baron :dbaron:

Comment 202

•

20 years ago

(In reply to comment #201)
> Anne: that breaks things when you have content from multiple bugs on the same
> page, such as buglists or longlist.cgi output.

So does the current solution of allowing any encoding.

Sending pages that have content from multiple bugs as UTF-8 and sending all bugs with ID > current bug number as UTF-8 seems like a reasonable start to me. We'll stop accumulating content of unknown encoding in new bugs, and there will still be a way to view the content on the older bugs (by viewing the bug as its own page) if there's some content that isn't UTF-8.

Jungshik Shin

•

20 years ago

i'll do an updated patch when i have the chance, however i can answer a few of your queries now:

> >+    $value =~ s/[\r\n]+$//;
>
> I think that any given header should only have one line-ending, right?

on windows, get() was returning the fields with CRLF, so chomp wasn't stripping them. looking at the code, maybe i don't need to do that anymore. i'll have a play.

> >+    push @addresses, '=?UTF-8?Q?' . encode_qp($name) . '?= <' . $addr->address . '>';
>
> Names can have commas in them. Does this deal with that?

yes. Mail::Address->parse() returns an array of addresses, splitting in the correct location. however i just realised that Mail::Address->name() flips the order of comma separated names. ie. "jones, byron" becomes "byron jones". i should be using phrase().

> Also, does this QP encode the entire name?

no, only characters that require QP'ing

> >+    $changed = 1;
> >+  } else {
> >+    push @addresses, $addr->format;
>
> Why do we even call format on the address, if we haven't changed it? Couldn't
> we just output it as a raw string? (Or is there some other problem with that
> that I'm not aware of?)

there may be more than one address on the field, so we can't use the raw string.

> >+    $self->charset(Param('utf8') ? 'UTF-8' : '');
>
> I think that usually the charset is lowercase, in HTTP headers.

ok, i'll make it lowercase

> >+<?xml version="1.0" [% IF Param('utf8') %]encoding="UTF-8" [% END %]standalone="yes" ?>
>
> And the same, here. Although here I'm pretty sure it doesn't matter.

it's case insensitive, but the xml specs use uppercase, so that's what most people use.
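To illustrate why an address field has to be parsed rather than split on commas, here is a rough Python analogue using the standard library's `email.utils` in place of Perl's Mail::Address. The function name `encode_address_field` and the example.com addresses are made up for the illustration, and the standard library chooses the encoded-word form itself, so the output will not match the patch byte for byte.

```python
from email.utils import formataddr, getaddresses


def encode_address_field(value):
    """Re-emit an address header field, encoding non-ASCII display names.

    getaddresses() understands quoted display names, so a comma inside
    "Jones, Byron" does not split the address in two.
    """
    formatted = []
    for name, addr in getaddresses([value]):
        # formataddr() quotes names containing specials such as commas and
        # wraps non-ASCII names in an RFC 2047 encoded-word.
        formatted.append(formataddr((name, addr), charset="utf-8"))
    return ", ".join(formatted)
```

Because every address is re-formatted after parsing, a field with several recipients stays well-formed even when only one display name needed encoding.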

Status: NEW → ASSIGNED

:glob ✱

Assignee

Comment 206

•

20 years ago

Attached patch utf-8 v4 (obsolete) (deleted) — Details — Splinter Review

this version addresses issues raised.

notes:

in the parameter description i didn't want to include the bug number, as that's more for the documentation, and it would be confusing if the local bugzilla install had a bug number 280633

i've set MIME::Parser to not use temp files. while the MIME::Parser docs indicate there's a performance hit, as we only parse the header, the temporary objects are always empty.

i've added "sender" and "errors-to" to the list of fields to encode email addresses on. the other two x- headers are added by mail servers, so there's no reason to check them here.

:glob ✱

Assignee

Updated

•

20 years ago

Attachment #173337 - Attachment is obsolete: true

Attachment #173713 - Flags: review?

Brodie

Comment 207

•

20 years ago

Would there be a performance gain to check for 7-bit clean outside calling the encode_message function? Would that save copying the (header, body) pair a number of times?

e.g.

    ($header, $body) = encode_message($header, $body) if Param('utf8');

becomes

    # make sure there's work to be done
    if (Param('utf8') and (!is_7bit_clean($header) or !is_7bit_clean($body))) {
        ($header, $body) = encode_message($header, $body);
    }
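The guard suggested here can be sketched in Python as below. `is_7bit_clean` mirrors the helper named in the comment; `maybe_encode`, the `utf8_enabled` flag, and passing the encoder in as a callable are assumptions made so the example runs on its own.

```python
def is_7bit_clean(text):
    """Return True when every character fits in 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in text)


def maybe_encode(header, body, utf8_enabled, encode_message):
    """Call encode_message only when there is something to encode.

    `encode_message` is any callable taking (header, body) and returning
    the encoded pair; it is skipped entirely for pure-ASCII mail.
    """
    if utf8_enabled and not (is_7bit_clean(header) and is_7bit_clean(body)):
        return encode_message(header, body)
    return header, body
```

For the common case of an all-ASCII message this costs one scan of the header and body and avoids the encoder call altogether.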

Cédric Caron

Comment 208

•

20 years ago

The full name of the administrator user created by checksetup need to be converted to UTF-8 if this name contains non ASCII chars.

Cédric Caron

Updated

•

20 years ago

Blocks: 280905

:glob ✱

Assignee

Comment 209

•

20 years ago

(In reply to comment #208)
> The full name of the administrator user created by checksetup need to be
> converted to UTF-8 if this name contains non ASCII chars.

that's tricky as i can't tell what charset the console is running in.

how about i update checksetup to only allow 7-bit clean characters in the admin name, with a comment saying that once bugzilla is running the name can be updated via the webpages?

Max Kanat-Alexander

Comment 210

•

20 years ago

(In reply to comment #209)
> how about i update checksetup to only allow 7-bit clean characters in the admin
> name, with a comment saying that once bugzilla is running the name can be
> updated via the webpages?

I think that's an acceptable solution. Just put the is_7bit_clean function in Bugzilla::Util, and don't "require" Bugzilla::Util until you need it. (Don't "use" it -- that will break checksetup. But you probably know that. :-))

Of course, I think you can pull out the "locale" information from the console, somehow. You could preserve that environment variable, the same way that we currently preserve $ENV{'PATH'}. I'm not sure it would work on Win32, though.

Cédric Caron

Comment 211

•

20 years ago

why not use the utf8 perl support? http://search.cpan.org/dist/perl/lib/utf8.pm

Christian :Biesinger (don't email me, ping me on IRC)

•

20 years ago

Attached patch utf-8 v5 (obsolete) (deleted) — Details — Splinter Review

adds the administrator name checking to checksetup, and the optimisation suggested by brodie.

Attachment #173713 - Attachment is obsolete: true

Attachment #174022 - Flags: review?

Håvard Wigtil

Comment 216

•

20 years ago

Note that this patch doesn't apply for BugMail.pm cleanly due to one line changed in bug 280973. Sorry for not providing an updated patch, but it's a bit hard to do on the setup I'm on right now.

:glob ✱

Assignee

Comment 217

•

20 years ago

Attached patch utf-8 v6 (obsolete) (deleted) — Details — Splinter Review

fixed bitrot; thanks Håvard

Attachment #174022 - Attachment is obsolete: true

:glob ✱

Assignee

Updated

•

20 years ago

Attachment #174022 - Flags: review?

:glob ✱

Assignee

Updated

•

20 years ago

Attachment #174458 - Flags: review?

Håvard Wigtil

Comment 218

•

20 years ago

This patch does not apply cleanly for me against yesterday's CVS, there are problems in checksetup.pl and BugMail.pm. It seems to be simple bitrot issues, but I can't get at CVS from here to fix them properly at the moment.

:glob ✱

Assignee

Comment 219

•

20 years ago

Attached patch utf-8 v7 (obsolete) (deleted) — Details — Splinter Review

Comment 223

•

20 years ago

Comment on attachment 176018 [details] [diff] [review] utf-8 v7

>Index: checksetup.pl
>+        # As it's a new install, enable UTF-8
>+        SetParam('utf8', 1);

I'm not sure if the new admin check is the best place to check this. Lots of folks upgrading from 2.16 or earlier are going to get nailed with this dialog even when it's not a new install because they twiddled with the bits on their admin account. We should try to find some other way to ensure that it's a new install.

>Index: Bugzilla/CGI.pm
>+    $self->charset(Param('utf8') ? 'utf-8' : '');

Nit: 'UTF-8' should be all uppercase in the header, to follow RFC 3629 section 8. (technically the field isn't case-sensitive, but since it's defined that way in the RFC we should follow it)

Rest of this looks good to me. Find a better way to detect a new install and this has an r+ from me.

Attachment #176018 - Flags: review? → review-

:glob ✱

Assignee

Comment 224

•

20 years ago

Attached patch utf-8 v8 (obsolete) (deleted) — Details — Splinter Review

improved "new install" detection -- if we have to create data/nomail, it's a new install.

Attachment #176018 - Attachment is obsolete: true

Attachment #177578 - Flags: review?

Dave Miller [:justdave]

Comment 225

•

20 years ago

Comment on attachment 177578 [details] [diff] [review] utf-8 v8

ok, all code style and architecture nits addressed, actually tried testing it now...

The summary encoding is not happening correctly. I'm not sure what's wrong with it, but Eudora is showing decoded summaries with an extra = on the end, and Thunderbird is outright refusing to decode them.

Subject: =?UTF-8?Q?[Bug 579] This is a s=C3=BCmm=C3=A1ry= ?=

Attachment #177578 - Flags: review? → review-

Dave Miller [:justdave]

Comment 226

•

20 years ago

We need to require MIME::Base64 v3.03 also. MIME::Tools doesn't explicitly prereq it, but it won't install due to test failures if you don't have at least that version. It'll be fewer tech support problems for us if we just outright require it to save people from getting the install errors on MIME::Tools.

:glob ✱

Assignee

Comment 227

•

20 years ago

Attached patch utf-8 v9 (obsolete) (deleted) — Details — Splinter Review

fixes subject encoding
requires MIME::Base64 (version 3.01 on windows, 3.03 on unix)

Attachment #177578 - Attachment is obsolete: true

Attachment #177580 - Flags: review?

Dave Miller [:justdave]

Comment 228

•

20 years ago

Comment on attachment 177580 [details] [diff] [review] utf-8 v9 woot!

Attachment #177580 - Flags: review? → review+

:glob ✱

Assignee

Comment 229

•

20 years ago

Attached patch utf-8 v10 (obsolete) (deleted)

<justdave> hmmm.....
<justdave> actually, can we swap the order of MIME::Tools and MIME::Base64 in the modules list?
<justdave> MIME::Base64 is the prereq and since people tend to work top to bottom...

Attachment #177580 - Attachment is obsolete: true

Attachment #177581 - Flags: review?

Dave Miller [:justdave]

Updated

•

20 years ago

Attachment #177581 - Flags: review? → review+

Dave Miller [:justdave]

Comment 230

•

20 years ago

woot! woot!!

Flags: approval+

David Baron :dbaron:

Comment 231

•

20 years ago

(In reply to comment #225)
> The summary encoding is not happening correctly. I'm not sure what's wrong
> with it, but Eudora is showing decoded summaries with an extra = on the end,
> and Thunderbird is outright refusing to decode them.
>
> Subject: =?UTF-8?Q?[Bug 579] This is a s=C3=BCmm=C3=A1ry= ?=

If my memory is correct, that form is not allowed to have spaces within it. All the words that are ASCII should be passed through outside the =?-escaped form, and all the words that are not ASCII should be escaped separately.

For the details, see ftp://ftp.rfc-editor.org/in-notes/rfc2047.txt (which I haven't really looked at while writing this comment; the above is from memory).
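A small Python sketch of the per-word approach described here: ASCII words pass through untouched, and each non-ASCII word becomes its own RFC 2047 "Q" encoded-word. The function names and the conservative safe-character set are choices made for this illustration, not code from any attachment on this bug.

```python
# Bytes that may appear literally inside a Q-encoded word in any header
# context: letters, digits and ! * + - / (RFC 2047 section 5, rule 3).
SAFE_BYTES = set(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/"
)


def q_encode_word(word, charset="UTF-8"):
    """Encode one whitespace-free word as an RFC 2047 Q encoded-word."""
    encoded = "".join(
        chr(byte) if byte in SAFE_BYTES else f"={byte:02X}"
        for byte in word.encode(charset)
    )
    return f"=?{charset}?Q?{encoded}?="


def encode_subject(subject):
    """Encode only the words of a subject that contain non-ASCII text."""
    return " ".join(
        word if word.isascii() else q_encode_word(word)
        for word in subject.split(" ")
    )
```

One caveat with this scheme: RFC 2047 tells decoders to ignore whitespace between two adjacent encoded-words, so two consecutive non-ASCII words would run together on display unless the space is carried inside one of the encoded-words.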

David Baron :dbaron:

•

20 years ago

> Do we not have more XML outputs than just show.xml.tmpl which need the
> encoding defined?

ahhh, you're correct.

template/en/default/bug/show.xml.tmpl
template/en/default/config.rdf.tmpl
template/en/default/list/list.rdf.tmpl
template/en/default/list/list.rss.tmpl
template/en/default/reports/duplicates.rdf.tmpl

> Surely the best test for a new install is if we are creating localconfig

no, because when we create localconfig, data/params hasn't been created; it's created in the second phase of first-time checksetup.

> creating the database?

i normally manually create an empty database before kicking off checksetup, as i have to set the access permissions for the bugzilla account anyhow. so i have new installs with an existing, but empty, database.

Gervase Markham [:gerv]

Comment 237

•

20 years ago

> no, because when we create localconfig, data/params hasn't been created; it's
> created in the second phase of first-time checksetup.

Right then - so let's do it in the second phase, when we create data/params.

Gerv

:glob ✱

Assignee

Comment 238

•

20 years ago

i've hit a snag. even if the header is folded correctly, Mail::Mailer strips \n\s* from the lines, removing the folding, so lines can break rfc by exceeding the max length. grr

Christian :Biesinger (don't email me, ping me on IRC)

•

20 years ago

Depends on: 287064

Ulrich Windl

Comment 241

•

20 years ago

(In reply to comment #240)
I don't quite understand some parts:

+sub encode_qp_words($) {
+    my ($line) = (@_);
+
+    my $line = encode_qp($line, '');
+    $line =~ s/ /=20/g;

Shouldn't you replace SPC with '_'?

+    return "=?UTF-8?Q?$line?=";

Is this an unconditional return? Looks like it.
Will the rest ever be considered?

+
+    my @encoded;
+    foreach my $word (split / /, $line) {

Are there any SPCs left?

+        if (!is_7bit_clean($word)) {
+            push @encoded, '=?UTF-8?Q?' . encode_qp($word, '') . '?=';
+        } else {
+            push @encoded, $word;
+        }
+    }
+    return join(' ', @encoded);
+}

:glob ✱

Assignee

Comment 242

•

20 years ago

Comment on attachment 178110 [details] [diff] [review] utf-8 v11

> +    $line =~ s/ /=20/g;
>
> Shouldn't you replace SPC with '_'?

the rfc allows for _ or =20 :

    The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
    represented as "_" (underscore, ASCII 95.)

> +    return "=?UTF-8?Q?$line?=";
>
> Is this an unconditional return? Looks like it.
> Will the rest ever be considered?

d'oh, those three lines are debug code and shouldn't be there. thanks for pointing that out.

Attachment #178110 - Attachment is obsolete: true

Attachment #178110 - Flags: review?

:glob ✱

Assignee

Comment 243

•

20 years ago

Attached patch utf-8 v12 (obsolete) (deleted) — Details — Splinter Review

Mail::Mailer version 1.67 fixes the bugs that were stopping us from using it. This patch bumps up the minimum version, and addresses the other outstanding issues.

Attachment #179244 - Flags: review?

Frédéric Buclin

Updated

•

20 years ago

Blocks: 281522

:glob ✱

Assignee

Updated

•

20 years ago

Blocks: 287684

:glob ✱

Assignee

Updated

•

20 years ago

Blocks: 287682

:glob ✱

Assignee

Comment 244

•

20 years ago

note that it's still possible for us to generate emails with lines greater than 75 characters, if the subject doesn't contain any spaces we don't have a point to wrap it at. i know how to fix this, but it's a fair amount of work, so i'd prefer for that to be covered in another bug. note that the current bugzilla code can also generate >75 char lines as there's no checks in place to stop this .. for example if the url is too long, eg "http://you-havent-visited-editparams.cgi-yet/userprefs.cgi" the "Configure bugmail" line in the message footer will be more than 75 characters.

Kelley Cook

Comment 245

•

20 years ago

(In reply to comment #244)
> note that it's still possible for us to generate emails with lines greater than
> 75 characters, if the subject doesn't contain any spaces we don't have a point
> to wrap it at.

Why would you consider this to be a problem?

From RFC 2822:

    2.1.1. Line Length Limits

    There are two limits that this standard places on the number of
    characters in a line. Each line of characters MUST be no more than
    998 characters, and SHOULD be no more than 78 characters, excluding
    the CRLF.

So IMO, you are following the spirit of the RFC and are wrapping when possible; sometimes, as you pointed out, that is not possible.

I would find the alternative of MIME encoding the subject lines that are >75 chars to be a much worse solution, as I would have to assume that the only mail agents still susceptible to being bitten by the 78 character recommended limit are extremely old and therefore wouldn't understand MIME encoding anyway.

Stephen Lee

Comment 246

•

20 years ago

(In reply to comment #245)
> > [...] generate emails with lines greater than 75 characters
> Why would you consider this to be a problem?
> From RFC 2822: [snip]

See RFC 2047, specifically (from its section 2):

    An 'encoded-word' may not be more than 75 characters long, including
    'charset', 'encoding', 'encoded-text', and delimiters. If it is
    desirable to encode more text than will fit in an 'encoded-word' of
    75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
    be used.

    While there is no limit to the length of a multiple-line header
    field, each line of a header field that contains one or more
    'encoded-word's is limited to 76 characters.

Current patch would fail to meet this requirement if someone creates a summary with too many consecutive non-spaces so that an 'encoded-word' longer than 75 characters is created (which mail programs etc. may not recognise).

Solution (as described in the RFC) is to break up the text into smaller chunks creating multiple encoded-word entities each <= 75 characters, but this can just as well be done after this patch lands.
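The chunking described in the last paragraph might look like the Python sketch below. It uses the "B" (base64) form so the length arithmetic is simple, and it cuts only between whole characters so that no UTF-8 sequence is split across two encoded-words. The function names and constants are assumptions for this example; none of this is taken from the patch.

```python
import base64

MAX_ENCODED_WORD = 75                           # RFC 2047 section 2 limit
OVERHEAD = len("=?UTF-8?B?") + len("?=")        # 12 delimiter characters
# Base64 turns every 3 bytes into 4 characters, so this many bytes fit.
MAX_BYTES = ((MAX_ENCODED_WORD - OVERHEAD) // 4) * 3   # 45


def split_encoded_words(text):
    """Return a list of encoded-words, each at most 75 characters long."""
    chunks, current = [], b""
    for ch in text:
        encoded_char = ch.encode("utf-8")
        # Start a new chunk before a whole character would overflow it.
        if current and len(current) + len(encoded_char) > MAX_BYTES:
            chunks.append(current)
            current = b""
        current += encoded_char
    if current:
        chunks.append(current)
    return [
        "=?UTF-8?B?" + base64.b64encode(chunk).decode("ascii") + "?="
        for chunk in chunks
    ]


def fold_header_value(text):
    """Join the encoded-words with CRLF SPACE, as the RFC suggests."""
    return "\r\n ".join(split_encoded_words(text))
```

Each folded line is then one leading space plus at most 75 characters, which also satisfies the 76-character line limit quoted above.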

Myk Melez [:myk] [@mykmelez]

Comment 247

•

Marc Schumann [:Wurblzap]

Comment 253

•

19 years ago

*** Bug 298243 has been marked as a duplicate of this bug. ***

Dave Miller [:justdave]

Comment 254

•

19 years ago

OK, we've branched, and the trunk is open. Let's get this thing reviewed and landed! :)

Marc Schumann [:Wurblzap]

Comment 255

•

19 years ago

Comment on attachment 179244 [details] [diff] [review] utf-8 v12

Hit by bitrot, but trivial unrotting -- r=wurblzap on an unrotted patch.

In a follow-up bug, we need to find a way to stop substr() from splitting UTF-8 characters in half :/ Glitches in standards compliance should imho be handled in post-checkin fixes.

Tested on Windows, using smtp and testfile as mail_delivery_method. Couldn't get my hands on MIME-tools 5.417, but it works for me with 5.411a just as well. Works for newchangedmail, passwordmail, flag mail. Tested both quoted-printable and base64 encodings (forced base64 by turning the 8-bit-content check around). Tested 7-bit-only mails.

Let's do it :)

Attachment #179244 - Flags: review? → review+

Dave Miller [:justdave]

Comment 256

•

19 years ago

*** Bug 175782 has been marked as a duplicate of this bug. ***

Dave Miller [:justdave]

•

19 years ago

Checking in checksetup.pl;
/cvsroot/mozilla/webtools/bugzilla/checksetup.pl,v <-- checksetup.pl
new revision: 1.420; previous revision: 1.419
done
Checking in defparams.pl;
/cvsroot/mozilla/webtools/bugzilla/defparams.pl,v <-- defparams.pl
new revision: 1.163; previous revision: 1.162
done
Checking in Bugzilla/BugMail.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/BugMail.pm,v <-- BugMail.pm
new revision: 1.42; previous revision: 1.41
done
Checking in Bugzilla/CGI.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/CGI.pm,v <-- CGI.pm
new revision: 1.18; previous revision: 1.17
done
Checking in Bugzilla/Util.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Util.pm,v <-- Util.pm
new revision: 1.34; previous revision: 1.33
done
Checking in template/en/default/config.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/config.rdf.tmpl,v <-- config.rdf.tmpl
new revision: 1.5; previous revision: 1.4
done
Checking in template/en/default/bug/show.xml.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/show.xml.tmpl,v <-- show.xml.tmpl
new revision: 1.8; previous revision: 1.7
done
Checking in template/en/default/list/list.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rdf.tmpl,v <-- list.rdf.tmpl
new revision: 1.5; previous revision: 1.4
done
Checking in template/en/default/list/list.rss.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rss.tmpl,v <-- list.rss.tmpl
new revision: 1.4; previous revision: 1.3
done
Checking in template/en/default/reports/duplicates.rdf.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/reports/duplicates.rdf.tmpl,v <-- duplicates.rdf.tmpl
new revision: 1.2; previous revision: 1.1
done

Status: ASSIGNED → RESOLVED

Closed: 19 years ago

Resolution: --- → FIXED

Tim Thome

Comment 259

•

19 years ago

FYI - if a site admin has changed CGI.pm to force UTF-8 as described in the UTF-8 security fixes, a CVS update will fail with the conflict shown here. bugzilla/docs/html/security-bugzilla.html should be changed to reflect this change.
--------------
<<<<<<< CGI.pm
# Make sure that we don't send any charset headers
$self->charset('UTF-8');
=======
# Send appropriate charset
$self->charset(Param('utf8') ? 'UTF-8' : '');
>>>>>>> 1.18
--------------
thx
tim
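For readers who don't have the CGI.pm code in front of them, the effect of the checked-in line (the lower half of the conflict) can be sketched like this. It is a Python illustration rather than Bugzilla code; `content_type_header` and its `use_utf8` argument are hypothetical stand-ins for the `Param('utf8')` switch, and it assumes that an empty charset means no charset parameter is sent.

```python
def content_type_header(use_utf8: bool, mime_type: str = "text/html") -> str:
    """Build a Content-Type header, adding a charset only when the switch is on."""
    charset = "UTF-8" if use_utf8 else ""
    if charset:
        return f"Content-Type: {mime_type}; charset={charset}"
    # Assumption: an empty charset means the parameter is left out entirely.
    return f"Content-Type: {mime_type}"
```

A site that hard-coded 'UTF-8' locally behaves like `use_utf8=True` regardless of the parameter, which is why the local edit and the new revision touch the same line.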

Dave Miller [:justdave]

•

19 years ago

(In reply to comment #261)
> Wasn't this checked in on trunk-only - therefor it is Bugzilla version 2.22 or
> later?

True.

> This sentence doesn't read correctly to me...

Ok. I'm no native speaker -- please give me a good sentence, and I'll put it into a patch.

Colin Ogilvie [:cso]

Comment 263

•

19 years ago

(In reply to comment #262)
> Ok. I'm no native speaker -- please give me a good sentence, and I'll put it
> into a patch.

Apparently it does make sense to others... so just fix the first bit :)

Marc Schumann [:Wurblzap]

Comment 264

•

19 years ago

Attached patch Documentation patch 1.2 (deleted) — Details — Splinter Review

Attachment #192066 - Attachment is obsolete: true

Attachment #195374 - Flags: review?(documentation)

Colin Ogilvie [:cso]

Comment 265

•

19 years ago

Comment on attachment 195374 [details] [diff] [review] Documentation patch 1.2 r=me by inspection....

Attachment #195374 - Flags: review?(documentation) → review+

Marc Schumann [:Wurblzap]

Comment 266

•

19 years ago

Added to the Bugzilla 2.22 Release Notes in bug 322960.

Keywords: relnote

Renat Sabitov

Comment 275

•

19 years ago

There are some issues when I use Bugzilla with UTF-8.

With MySQL db:

1) The MySQL connection should be UTF-8; this is enabled by this patch:

Index: Bugzilla/DB/Mysql.pm
===================================================================
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v
retrieving revision 1.36
diff -r1.36 Mysql.pm
70a71,76
> $self->do ("set session character_set_results=utf8");
> $self->do ("set session character_set_client=utf8");
> $self->do ("set session character_set_connection=utf8");
> $self->do ("set session character_set_database=utf8");
> $self->do ("set session character_set_server=utf8");
>

2) The Summary field in search results (aka bug list) gets trimmed right between the bytes of a Unicode character, and the length of a Summary is about 30 characters for Russian, not 60. Before the "..." you can see a bad symbol. See patch:

Index: Bugzilla/Template.pm
===================================================================
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Template.pm,v
retrieving revision 1.41
diff -r1.41 Template.pm
272a273,275
>
> my $utf8_string = $string;
> utf8::decode ($utf8_string);
274c277
< return $string if !$length || length($string) <= $length;
---
> return $string if !$length || length($utf8_string) <= $length;
277c280,283
< my $newstr = substr($string, 0, $strlen) . $ellipsis;
---
> my $newstr = substr($utf8_string, 0, $strlen) . $ellipsis;
>
> utf8::encode ($newstr);

3) Comments are wrapped not as Unicode text, but as bytes. As a result, the length of each line of Russian text is about 40 characters, not 80. Patch for this:

Index: Bugzilla/Util.pm
===================================================================
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Util.pm,v
retrieving revision 1.45
diff -r1.45 Util.pm
30a31
> use utf8;
230a232,233
> utf8::decode($comment);
>
247a251
> utf8::encode($wrappedcomment);

5) Search with Russian text is impossible, only ASCII.

With PostgreSQL db, created with "-E UNICODE":

1) Perl strings are not marked as Unicode strings; Perl does not use UTF-8. Patch:

Index: Bugzilla.pm
===================================================================
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla.pm,v
retrieving revision 1.29
diff -r1.29 Bugzilla.pm
26a27
> use encoding 'utf8';
Index: Bugzilla/DB/Pg.pm
===================================================================
RCS file: /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Pg.pm,v
retrieving revision 1.18
diff -r1.18 Pg.pm
69,70c69,78
<
< my $self = $class->db_new($dsn, $user, $pass);
---
> my $attributes = { RaiseError => 0,
> AutoCommit => 1,
> PrintError => 0,
> ShowErrorStatement => 1,
> HandleError => \&_handle_error,
> TaintIn => 1,
> FetchHashKeyName => 'NAME',
> pg_enable_utf8 => 1};
>
> my $self = $class->db_new($dsn, $user, $pass, $attributes);

After this patch, the imported DB is displayed correctly and searches with Russian text work, but the Bugzilla installation is not usable: new comments, bugs and other strings are saved in the db as bad Unicode strings.

My point of view is that if UTF-8 is declared as the encoding for a new installation, then all Bugzilla Perl scripts MUST work with strings as Unicode strings, not as bytes. All the issues above are about this.
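The difference between byte-oriented and character-oriented truncation (the summary-trimming issue above) can be shown with a small sketch. It is written in Python rather than Perl and does not reproduce the Template.pm code; both function names are hypothetical, and the second one follows the same decode, cut, encode order as the Template.pm diff.

```python
def truncate_bytes(raw: bytes, length: int, ellipsis: bytes = b"...") -> bytes:
    """Byte-oriented truncation: may cut a multi-byte UTF-8 character in half."""
    if len(raw) <= length:
        return raw
    return raw[: length - len(ellipsis)] + ellipsis


def truncate_chars(raw: bytes, length: int, ellipsis: str = "...") -> bytes:
    """Decode first, truncate by characters, then encode again."""
    text = raw.decode("utf-8")
    if len(text) <= length:
        return raw
    return (text[: length - len(ellipsis)] + ellipsis).encode("utf-8")
```

For Cyrillic text, where each letter takes two bytes in UTF-8, the byte-oriented version keeps only about half as many letters as requested and can end on a dangling lead byte, which matches the "bad symbol" before the "..." described above.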

Renat Sabitov

Comment 276

•

19 years ago

As examples of the issues I described, please see some bugs on landfill:
http://landfill.bugzilla.org/bugzilla-2.22-branch/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=srr%40stacksoft.ru&emailtype1=exact&emailassigned_to1=1&emailreporter1=1
http://landfill.bugzilla.org/bugzilla-tip-pg/buglist.cgi?bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&email1=srr%40stacksoft.ru&emailtype1=exact&emailassigned_to1=1&emailreporter1=1

yvon

Comment 278

•

17 years ago

Attached file test, please ignore (obsolete) (deleted) — Details

please ignore, just test attach zip

Marc Schumann [:Wurblzap]

Updated

•

17 years ago

Attachment #283019 - Attachment is obsolete: true

Dave Miller [:justdave]

Comment 279

•

17 years ago

The content of attachment 283019 [details] has been deleted by Dave Miller <justdave@bugzilla.org> who provided the following reason: irrelevant to this bug The token used to delete this attachment was generated at 2007-10-01 09:55:04 PDT.

Nobody; OK to take it and work on it

Updated

•

12 years ago

QA Contact: matty_is_a_geek → default-qa

defparams.pl patch: Send per default the encoding for HTML (header) and for emails 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
defparams.pl patch (v2): Send per default the encoding for HTML (header) and for emails 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
defparams.pl patch (v3): Send per default the encoding for HTML (header) and for emails 23 years ago Tobias Burnus (deleted), patch	justdave : review-	Details \| Diff \| Splinter Review
Bigger patch for text/html (v4) 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
Bigger patch for text/html (v5) 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
Bigger patch for text/html (v5) 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
Bigger patch for text/html (v6) 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
Bigger patch for text/html (v7) 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
Bigger patch for text/html (v8) 23 years ago Tobias Burnus (deleted), patch	gerv : review-	Details \| Diff \| Splinter Review
Bigger patch for text/html (v9) 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
v10: Encoding patch for mail and html 23 years ago Tobias Burnus (deleted), patch		Details \| Diff \| Splinter Review
v11/v1h: Encoding patch for HTML 23 years ago Tobias Burnus (deleted), patch	gerv : review+ justdave : review-	Details \| Diff \| Splinter Review
72387: v12/v2h: Encoding patch for HTML 23 years ago Tobias Burnus (deleted), patch	gerv : review+ justdave : review-	Details \| Diff \| Splinter Review
Refresh of 72860 to apply to current HEAD 23 years ago robinson (deleted), patch		Details \| Diff \| Splinter Review
Refresh of patch 77394 to apply to 2_16-BRANCH 22 years ago robinson (deleted), patch		Details \| Diff \| Splinter Review
This page is rendered in quirks mode, as are all bug pages. 22 years ago Felix Miata (deleted), image/png		Details
updated patch 20 years ago :glob ✱ (deleted), patch		Details \| Diff \| Splinter Review
utf-8 patch with initial email support 20 years ago :glob ✱ (deleted), patch	justdave : review-	Details \| Diff \| Splinter Review
utf-8 v3 20 years ago :glob ✱ (deleted), patch	mkanat : review-	Details \| Diff \| Splinter Review
utf-8 v4 20 years ago :glob ✱ (deleted), patch		Details \| Diff \| Splinter Review
utf-8 v5 20 years ago :glob ✱ (deleted), patch		Details \| Diff \| Splinter Review
utf-8 v6 20 years ago :glob ✱ (deleted), patch		Details \| Diff \| Splinter Review
utf-8 v7 20 years ago :glob ✱ (deleted), patch	justdave : review-	Details \| Diff \| Splinter Review
utf-8 v8 20 years ago :glob ✱ (deleted), patch	justdave : review-	Details \| Diff \| Splinter Review
utf-8 v9 20 years ago :glob ✱ (deleted), patch	justdave : review+	Details \| Diff \| Splinter Review
utf-8 v10 20 years ago :glob ✱ (deleted), patch	justdave : review-	Details \| Diff \| Splinter Review
utf-8 v11 20 years ago :glob ✱ (deleted), patch		Details \| Diff \| Splinter Review
utf-8 v12 20 years ago :glob ✱ (deleted), patch	Wurblzap : review+	Details \| Diff \| Splinter Review
utf-8 v12 unrotted 19 years ago Marc Schumann [:Wurblzap] (deleted), patch	Wurblzap : review+	Details \| Diff \| Splinter Review
Documentation patch 19 years ago Marc Schumann [:Wurblzap] (deleted), patch	cso : review-	Details \| Diff \| Splinter Review
Documentation patch 1.2 19 years ago Marc Schumann [:Wurblzap] (deleted), patch	cso : review+	Details \| Diff \| Splinter Review
test, please ignore 17 years ago yvon (deleted), text/plain		Details