Closed Bug 410902 Opened 17 years ago Closed 17 years ago

Some characters are mangled in diff and interdiff modes when viewing attachment

Categories

(Bugzilla :: Attachments & Requests, defect)

3.1.2
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 3.2

People

(Reporter: LpSolit, Assigned: LpSolit)

References

Details

(Keywords: regression)

Attachments

(1 file, 5 obsolete files)

Same issue as in bug 408446. Bugzilla::Attachment::PatchReader::(inter)diff also need to be fixed. Less critical than the original bug, but still needs to be fixed ASAP.
Flags: blocking3.1.3?
This isn't so critical that it *must* be fixed in a devel release (patches are usually mostly ASCII), but it should definitely be fixed before 3.2.
Flags: blocking3.2+
Flags: blocking3.1.3?
Flags: blocking3.1.3-
Some user of Bugzilla-jp says, this bug might be critical for documents and web-site based projects.

# But, i think it might be ok with blocking3.2+ :)
Attached patch working(?) patch (obsolete) (deleted) — Splinter Review
it seems working with this..
Assignee: attach-and-request → shimono
Status: NEW → ASSIGNED
I thought about fixing it this way too, but I think the problem is that you only want the diff of the patch to use RAW, but the footer of the page should still use UTF8, else some e.g. saved search names may be rendered incorrectly. Is what I'm saying crazy or correct?
Comment on attachment 297800 [details] [diff] [review]
working(?) patch

Right, what LpSolit said is correct.

We should have a "utf8_disable" and "utf8_enable" function in Bugzilla::Util, probably, so that we don't have to keep adding these :raw and :utf8 statements manually. The functions can do the check on the utf8 param themselves, so we can also eliminate that from the callers.
Attachment #297800 - Flags: review-
(In reply to comment #4)
> I thought about fixing it this way too, but I think the problem is that you
> only want the diff of the patch to use RAW, but the footer of the page should
> still use UTF8, else some e.g. saved search names may be rendered incorrectly.
> Is what I'm saying crazy or correct?

With the patch v.1, i could get correct output from localized Japanese templates.
# I've localized only two of them to test this..
Have someone tested with localized templates on current tip? Of cource, i should file another bug for this, if this is a REAL problem..

(In reply to comment #5)
> (From update of attachment 297800 [details] [diff] [review])
> Right, what LpSolit said is correct.
> 
> We should have a "utf8_disable" and "utf8_enable" function in Bugzilla::Util,
> probably, so that we don't have to keep adding these :raw and :utf8 statements
> manually. The functions can do the check on the utf8 param themselves, so we
> can also eliminate that from the callers.

I agree with this..
But, it seems that we should have another bug for localized template processing..
Attached patch patch rev.2 (obsolete) (deleted) — Splinter Review
working(?) patch without 'binmode raw'.
But, PatchReader/DiffPrinter/raw.pm causes many errors for 'Wide Charactor'.
# We should set utf8 for PerlIO.
Attachment #297800 - Attachment is obsolete: true
Attached patch patch, v3 (obsolete) (deleted) — Splinter Review
Some methods in PatchReader itself need to be overriden to support utf8.
Attached patch patch, v4 (obsolete) (deleted) — Splinter Review
This patch avoids to override PatchReader methods, and works fine with:
- diff mode
- raw diff mode
- raw interdiff mode
- view mode (already working without this patch)

But it still fails with:
- interdiff mode

I'm pretty sure the problem comes from:

    open my $interdiff_fh, "$lc->{interdiffbin} $old_filename $new_filename|";
    binmode $interdiff_fh;

For some reason, $interdiff_fh remains unsensitive to FILTER utf8. I tried to pass different MODE flags to open(), without success. This last case still needs investigation. I would like to avoid overriding PatchReader methods, if possible.
Attachment #311909 - Attachment is obsolete: true
Attached patch patch, v5 (obsolete) (deleted) — Splinter Review
OK, I finally managed to fix the problem in a nice way, which doesn't break non-patch attachments and which works with all modes.
Assignee: shimono → LpSolit
Attachment #298277 - Attachment is obsolete: true
Attachment #311987 - Attachment is obsolete: true
Attachment #312086 - Flags: review?(mkanat)
Attachment #312086 - Flags: review?(shimono)
Comment on attachment 312086 [details] [diff] [review]
patch, v5

>+        $attachment->data; # Initialize ->{data}.
>+        utf8::decode($attachment->{data}) if Bugzilla->params->{'utf8'};

  That assumes that the attachment data is valid UTF8. utf8::decode only handles valid utf8. Any other encoding will probably cause an error to be thrown.

>Index: Bugzilla/Template.pm
>-                    $var =~ s/[\x{202a}-\x{202e}]//g;
>+                    # If there are invalid utf8 characters (for instance in
>+                    # attachments) leave the string alone.
>+                    eval { $var =~ s/[\x{202a}-\x{202e}]//g; };

  Why put that in an eval? Can it actually die somehow??
(In reply to comment #11)
>   That assumes that the attachment data is valid UTF8. utf8::decode only
> handles valid utf8. Any other encoding will probably cause an error to be
> thrown.

Errors never happened in my case. We can hardly make a better guess as the whole page is displayed as UTF-8. That's why I asked himorin for review, to know if this triggers errors with another encoding.


>   Why put that in an eval? Can it actually die somehow??

Yes, everytime the patches being compared in interdiff have another encoding that UTF-8. e.g. é and à in ISO-8859-1 triggers an error. By ignoring them, the output is fine (i.e. ISO-8859-1 characters are displayed as they used to be when the page encoding is UTF-8).
(In reply to comment #12)
> >   That assumes that the attachment data is valid UTF8. utf8::decode only
> > handles valid utf8. Any other encoding will probably cause an error to be
> > thrown.
> 
> Errors never happened in my case. We can hardly make a better guess as the
> whole page is displayed as UTF-8. That's why I asked himorin for review, to
> know if this triggers errors with another encoding.

I've tested with patch file written in shift_jis encoding, but bugzilla doesn't crash with displaying in the 'diff' mode.
I could not read the text (in the 'diff' mode) with setting encoding from ff menu, but it might be acceptable.
# should we promote writting files in utf-8 :p

Should i test more about using non-utf8 encodings with utf8? (like on interdiff)
from apache error log

Malformed UTF-8 character (4 bytes, need 3, after start byte 0xf0) in substitution (s///) at /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/Template/Context.pm line 359, <$interdiff_fh> line 12.

This is thrown by a sjis to sjis interdiff page. But i could view the interdiff page.
(In reply to comment #14)
> from apache error log

Bah, I see these warnings as well. :(
(In reply to comment #12)
> >   Why put that in an eval? Can it actually die somehow??
> 
> Yes, everytime the patches being compared in interdiff have another encoding
> that UTF-8. e.g. é and à in ISO-8859-1 triggers an error. By ignoring them,
> the output is fine (i.e. ISO-8859-1 characters are displayed as they used to be
> when the page encoding is UTF-8).

  Okay. Could you put that in a comment?


(In reply to comment #13)
> I could not read the text (in the 'diff' mode) with setting encoding from ff
> menu, but it might be acceptable.

  Yes, that's to be expected--we don't know the encoding of patches, so we assume they're all in UTF-8 and if they're not they just won't display correctly (that is, in Diff mode).
(In reply to comment #14)
> Malformed UTF-8 character (4 bytes, need 3, after start byte 0xf0) in
> substitution (s///) at
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi/Template/Context.pm line
> 359, <$interdiff_fh> line 12.

This seems to happen with interdiff only, not with diff, which makes me think that $reader->iterate_fh may not like UTF-8. I will try to slurp the output of interdiff into a variable, decode it, and pass it to $reader->iterate_string instead, and see if this works.
Attached patch patch, v6 (deleted) — Splinter Review
OK, I managed to fix all errors thrown, including warnings in the apache error log. The problem was that get_unified_diff() was calling
  new PatchReader::DiffPrinter::raw($fh);
without calling binmode ':utf8', $fh first, which was responsible for the errors being thrown. This way, I no longer need to hack FILTER html in Template.pm and the eval() can go away.
Attachment #312086 - Attachment is obsolete: true
Attachment #313109 - Flags: review?(shimono)
Attachment #313109 - Flags: review?(mkanat)
Attachment #312086 - Flags: review?(shimono)
Attachment #312086 - Flags: review?(mkanat)
Comment on attachment 313109 [details] [diff] [review]
patch, v6

for log.

some errors when i queried 'interdiff' mode.
[Thu Apr 03 11:37:42 2008] [error] [client 130.54.52.3] [Thu Apr  3 11:37:42 2008] attachment.cgi: Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/site_perl/5.8.5/PatchReader/Raw.pm line 250., referer: http://bugzilla-trunk.test.mozilla.gr.jp/attachment.cgi?id=3671&action=diff
[Thu Apr 03 11:38:06 2008] [error] [client 130.54.52.3] [Thu Apr  3 11:38:06 2008] attachment.cgi: Use of uninitialized value in string ne at Bugzilla/Attachment/PatchReader.pm line 212., referer: http://bugzilla-trunk.test.mozilla.gr.jp/attachment.cgi?id=3671&action=diff

Version informations.
* This is Bugzilla 3.1.3+ on perl 5.8.5
* Running on Linux 2.6.9-67.0.4.ELsmp #1 SMP Sun Feb 3 07:08:57 EST 2008
Cecking for         PatchReader (v0.9.4)  ok: found v0.9.5

diff and interdiff modes worked fine, too.
Attachment #313109 - Flags: review?(shimono) → review+
Comment on attachment 313109 [details] [diff] [review]
patch, v6

This all looks fine, code-wise.
Attachment #313109 - Flags: review?(mkanat) → review+
Flags: approval+
Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.144; previous revision: 1.143
done
Checking in Bugzilla/Util.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Util.pm,v  <--  Util.pm
new revision: 1.69; previous revision: 1.68
done
Checking in Bugzilla/Attachment/PatchReader.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Attachment/PatchReader.pm,v  <--  PatchReader.pm
new revision: 1.5; previous revision: 1.4
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Blocks: 441921
we probably have a regression again:

https://bugzilla.mozilla.org/attachment.cgi?id=361327&action=diff

is broken while 

https://bugzilla.mozilla.org/attachment.cgi?id=361360&action=diff

looks good.  Could somebody confirm?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: