Closed Bug 70838 Opened 24 years ago Closed 22 years ago

Mozilla needs a consistent way to handle non-ASCII characters in form input [form sub]

Categories

(Core :: DOM: Core & HTML, defect, P3)

Tracking

()

VERIFIED WONTFIX
Future

People

(Reporter: mozilla.org, Assigned: alexsavulov)

References

(Depends on 1 open bug)

Details

(Keywords: compat, intl)

The bug database contains a number of bugs relating to non-ASCII characters in form input. The HTML 4.01 spec says that form input is limited to US-ASCII when enctype is not specified. See:

http://www.w3.org/TR/html4/interact/forms.html#h-17.13.3
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4

However, there may be forms out in the field (especially pages in languages other than English) that do not specify enctype but are expecting to receive characters outside US-ASCII (e.g. they may expect the form input to use the charset of the original document containing the form).

Mozilla basically needs to decide how to handle these cases. Currently, there are some cases where Mozilla simply encodes non-ASCII characters as %3F, which seems somewhat arbitrary.

See also the following bugs: bug 18643, bug 29271, bug 58033, bug 60043, bug 65697, bug 67090
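For readers unfamiliar with where %3F comes from, here is a minimal Python sketch. It is only an illustration of the general mechanism, not Mozilla's actual code: `encode_field` is a hypothetical helper, and the assumption is that a character the target charset cannot represent gets replaced with "?" before percent-encoding.

```python
from urllib.parse import quote


def encode_field(value, charset):
    """Percent-encode a form value after converting it to `charset`.

    Hypothetical helper: characters the charset cannot represent are
    replaced with '?', which percent-encodes to %3F.
    """
    raw = value.encode(charset, errors="replace")
    return quote(raw, safe="")


print(encode_field("é", "ascii"))       # %3F  (the character is lost)
print(encode_field("é", "iso-8859-1"))  # %E9
print(encode_field("é", "utf-8"))       # %C3%A9
```

The same input yields three different byte sequences depending on the charset chosen, which is why a single documented rule matters.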
Keywords: compat, correctness
Summary: Mozilla needs consistent way to hanlde non-ASCII characters in form input → Mozilla needs a consistent way to handle non-ASCII characters in form input
Setting status to new. This seems to be something like a tracking bug, and as such it should probably depend on the individual bugs, but from a semantic point of view, this decision probably needs to be made _before_ the individual bugs can be resolved, so making this bug block the others would make sense, too. I vote for the tracking bug interpretation. If you agree, can you add all the individual bugs as dependencies?
Status: UNCONFIRMED → NEW
Ever confirmed: true
I don't have an opinion on the dependency/blocking question, but I agree the decisions should be made before action is taken on some of the other bugs. Some of the decisions have probably already been made, but when I looked on www.mozilla.org a while ago I couldn't find any place where they had been publicly documented.
Keywords: intl
Sounds like a job for pollmann
Assignee: rods → pollmann
Target Milestone: --- → mozilla0.9.3
Shouldn't the target be ASAP, before milestone 0.9.3? This bug does not by itself involve any code; it is just for making and/or documenting a decision about what the desired behavior should be, so as to provide guidance on what should be done with a number of other bugs.
Depends on: 18643, 29271, 60043, 65697, 67090
Missed 0.9.3.
Target Milestone: mozilla0.9.3 → mozilla0.9.4
> The HTML 4.01 spec says that form input is limited to US-ASCII when
> enctype is not specified

This means that data submitted via the "GET" method can only contain ASCII characters (because the enctype for GET is always "application/x-www-form-urlencoded", and this content type is limited to US-ASCII). I have been working as a web developer for 3 years, and I didn't know this until now; I believe most other web developers don't know it either. Many forms on the WWW expect to submit non-ASCII data via the GET method; e.g. the Russian search engine "Rambler.ru" would not be able to search for Russian words if only ASCII characters were allowed in forms submitted by GET.

> (e.g. they may expect the form input to use the
> charset of the original document containing the form)

This is probably what most applications and developers expect. (I, too, believed this was the correct behavior...) IMHO Internet Explorer handles things this way (though IE has an option "always send URL as UTF-8"; is this related to this problem?), and the developers of CGI scripts like the above-mentioned Russian search engine probably expect the data they get by GET or POST to be in the encoding of the page that contains the form.

That's why I vote for Mozilla to handle the problem this way: if no enctype is specified in the form, form data is encoded in the encoding of the page containing the form. (Isn't it working this way right now? I think it is; at least I have no problems submitting Russian queries to rambler.ru.)
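A rough Python sketch of the behavior proposed above (encode GET data in the charset of the page containing the form) might look like the following. The helper `build_query`, the field name `words`, and the choice of KOI8-R as the page charset are all assumptions made for illustration; nothing here is taken from Mozilla or from rambler.ru.

```python
from urllib.parse import parse_qs, urlencode


def build_query(fields, page_charset):
    """Build a GET query string using the charset of the page holding the form.

    Hypothetical helper; `page_charset` stands in for whatever encoding
    the document containing the form was served in.
    """
    return urlencode(fields, encoding=page_charset, errors="replace")


query = build_query({"words": "мир"}, "koi8-r")
print(query)

# A server that assumes the same charset recovers the original text.
print(parse_qs(query, encoding="koi8-r"))

# A server that assumes a different charset gets garbage instead.
print(parse_qs(query, encoding="utf-8"))
```

The query string itself carries no charset label, so this approach only works when the server and the browser agree on the encoding out of band.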
I read that some Asian organisations are starting to extend the DNS namespace with Asian characters. Does anyone know whether this works with Mozilla? It could be a related problem :)
Target Milestone: mozilla0.9.4 → mozilla0.9.6
Bulk reassigning form bugs to Alex
Assignee: pollmann → alexsavulov
Summary: Mozilla needs a consistent way to handle non-ASCII characters in form input → Mozilla needs a consistent way to handle non-ASCII characters in form input [form sub]
*** This bug has been marked as a duplicate of 81203 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
This is a tracking bug, how can it be a dupe of that bug?
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Pushing to 0.9.7 because this is a metabug and I have to wait and see how the still-open bugs develop.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
*** Bug 109226 has been marked as a duplicate of this bug. ***
retargeting since dependencies still unresolved:
29271: 0.9.8 waiting on Naoki
18643: future addition of a hidden field _charset_ to post charset information as soon as 29271 is done
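To illustrate how a hidden _charset_ field could help the receiving side, here is a hedged Python sketch of a server-side decoder. It assumes the browser fills the field's value with the name of the charset it used for the submission; `decode_form` and the field name `q` are hypothetical, and this is not code from any of the bugs mentioned.

```python
from urllib.parse import parse_qsl


def decode_form(body, default_charset="utf-8"):
    """Decode a urlencoded form body using its `_charset_` field, if present.

    Hypothetical helper. An unknown charset name would raise LookupError;
    a real handler would need to deal with that.
    """
    # Pass 1: decode as latin-1, which maps each byte to exactly one code
    # point, so the original bytes can be recovered losslessly afterwards.
    pairs = parse_qsl(body, keep_blank_values=True, encoding="latin-1")
    charset = next((v for k, v in pairs if k == "_charset_" and v), default_charset)

    def redecode(text):
        # Pass 2: get the raw bytes back and decode them in the declared charset.
        return text.encode("latin-1").decode(charset, errors="replace")

    return {redecode(k): redecode(v) for k, v in pairs}


form = decode_form("_charset_=koi8-r&q=%CD%C9%D2")
print(form["q"])
```

The two-pass approach is needed because the charset is only known after the body has been split into fields.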
Target Milestone: mozilla0.9.7 → mozilla1.0.1
The point of this bug report was to make sure the solutions to all the other bugs were consistent with each other *before* people got to doing technical work on them. Since most of the other bugs have been marked fixed, this bug should be marked resolved. There's really no point in addressing this bug report after all the other bugs have already been fixed. It would be nice to see some documentation of how Mozilla handles non-ASCII form input, because some of the solutions (such as the _charset_ hack) are not standards-based.
Agreed. Comment #13 is only for folks tracking milestone movements.
Priority: -- → P2
Severity: major → normal
Priority: P2 → P3
Target Milestone: mozilla1.0.1 → Future
Since various hacks and fixes have already been applied to handle certain cases of non-ASCII form input, this bug is somewhat irrelevant now. (See comment 14.) I will mark this bug "won't fix."
Status: REOPENED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → WONTFIX
verifying
Status: RESOLVED → VERIFIED
Component: HTML: Form Submission → DOM: Core & HTML