Closed Bug 229010 Opened 21 years ago Closed 16 years ago

Non-ASCII URI recognition problem

Categories

(Bugzilla :: Bugzilla-General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 3.0

People

(Reporter: jshin1987, Unassigned)

Details

(Keywords: intl)

Bugzilla does better than Mozilla-mail in recognizing URIs with non-ASCII
characters (see bug 228543 comment #3 and attachment 137453), but it fails to
anchor the whole URI if the 'path' part has non-ASCII characters. Well, they're
not supposed to be there, but people use them as follows:

http://株式会社アスキー.jp/ の 「株」
http://だいぎん.jp/ の 「だ」
http://トッパンフォームズ.jp/ の 「ム」
http://昭栄.jp/ の 「栄」
Depends on: bz-charset
Oops. Sorry, this bug is invalid unless recognition of the following URI
(without a space) fails as well. I didn't realize that there are space
characters following '/' in all four cases.

http://株式会社アスキー.jp/の株
Even without a space following '/', the path part is not recognized as part of
the URI, so this bug is valid.
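
To illustrate the symptom described above, here is a minimal Python sketch. It is not Bugzilla's actual linkification code (Bugzilla is written in Perl, and its real pattern is not quoted in this report). Both regular expressions and the `first_uri` helper are hypothetical, chosen only to show how a pattern that accepts non-ASCII characters in the host but only ASCII in the path would cut the link off after the '/'.

```python
import re

# Hypothetical pattern: any non-space characters in the host,
# but only printable ASCII ('!' through '~') in the path.
HOST_ONLY = re.compile(r"https?://[^\s/]+(?:/[!-~]*)?")

# Hypothetical pattern: any non-whitespace characters in the path as well.
HOST_AND_PATH = re.compile(r"https?://[^\s/]+(?:/\S*)?")


def first_uri(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


sample = "http://株式会社アスキー.jp/の株"

# The ASCII-only path pattern stops right after the slash.
truncated = first_uri(HOST_ONLY, sample)

# The permissive pattern covers the whole URI.
complete = first_uri(HOST_AND_PATH, sample)

# With a space after the slash, even the permissive pattern stops there,
# which is the case the first four examples actually exercised.
spaced = first_uri(HOST_AND_PATH, "http://株式会社アスキー.jp/ の 「株」")
```

A permissive `\S*` path is only one possible choice; it will also swallow trailing punctuation, so a real linkifier would likely need extra rules for that.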
jshin:
BTW: What about filing a bug to make the default charset for
bugzilla.mozilla.org UTF-8?
It's already filed by Markus Kuhn a long long time ago :-) See bug 135762.

Jungshik Shin wrote:
> It's already filed by Markus Kuhn a long long time ago :-) See bug 135762.

Cool... let's PUSH for that change... :)
Turning it on is a one-line change in the current version of Bugzilla (see bug
126266).  It won't happen on bugzilla.mozilla.org until someone comes up with a
feasible means of migrating the existing data, which is in different charsets in
different bugs (and sometimes even in different comments in the same bug).  See
bug 126266 comment 123.
This is certainly off-topic here, but ...
Markus Kuhn, I, and others already came up with a reasonable (in our opinion)
migration path (bug 126266 comment #112, bug 135762 comment #7), but the trouble
is that we haven't reached a consensus on it. The consequence is that as time
goes by, we accumulate more and more 'legacy' data that will be hard to deal
with when we finally decide to migrate (unless somebody conjures up a character
encoding detector that is guaranteed to work even for a very short chunk of
text, such as several bytes; Markus's and my migration path doesn't call for
such magic, but not everyone is satisfied with it). Hopefully, this will change
once bug 126266 is fixed (as the first step toward the migration, in a sense).
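
As a rough illustration of why a detector "guaranteed to work" on a few bytes is unrealistic, the Python sketch below decodes the same two bytes under several encodings. The `plausible_decodings` helper and the choice of candidate encodings are assumptions made for this example; they are not part of the migration path proposed in the bugs cited above.

```python
def plausible_decodings(raw, encodings):
    """Return {encoding: text} for every encoding that decodes raw without error."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            # This encoding rejects the byte sequence outright.
            pass
    return results


# A single Korean syllable stored in a legacy charset: just two bytes.
raw = "한".encode("euc-kr")

candidates = ["utf-8", "euc-kr", "iso-8859-1", "koi8-r"]
decoded = plausible_decodings(raw, candidates)

# UTF-8 rejects these bytes, but the three legacy encodings all accept them
# and each yields different text, so the bytes alone cannot settle the charset.
for encoding, text in decoded.items():
    print(encoding, repr(text))
```

The one useful asymmetry is that UTF-8 is largely self-validating: text that decodes cleanly as UTF-8 is very likely to be UTF-8, whereas the legacy single-byte and double-byte charsets accept many of each other's byte sequences.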
Can't we simply add a small hack in Bugzilla which allows per-bug-report charset
ids and marks all NEW bug reports as UTF-8?
And then you have non-UTF-8, non-ASCII data showing up in buglists....

which I suppose is already toasted now if you get more than one in the same
list, so I guess that wouldn't be such a bad thing...
Reassigning bugs that I'm not actively working on to the default component owner
in order to try to make some sanity out of my personal buglist.  This doesn't
mean the bug isn't being dealt with, just that I'm not the one doing it.  If you
are dealing with this bug, please assign it to yourself.
Assignee: justdave → general
QA Contact: mattyt-bugzilla → default-qa
Link in comment #1 is now recognised correctly so I'd say this got fixed by the dependent bugs.
Status: NEW → RESOLVED
Closed: 16 years ago
Depends on: bz-recode
Resolution: --- → FIXED
Target Milestone: --- → Bugzilla 3.0