Closed Bug 19954 Opened 25 years ago Closed 25 years ago

[4.xP][PP][dogfood]{html40}submission of newlines in textareas

Categories

(Core :: DOM: Core & HTML, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: sfraser_bugs)

References

Details

(Whiteboard: [PDT+][12/10])

DESCRIPTION: Newlines in textareas should be submitted as CR-LF (i.e., %0D%0A). This is specified in section 17.13.4 of HTML 4.0 and is the behavior of NN 4.x (on Linux, anyway). Currently they are being sent as LF (%0A) on Linux, and I believe (based on sfraser's comments in bug 16813) they are being sent as CR (%0D) on Mac. (I'd guess they're correct on Windows, but that should be checked.) STEPS TO REPRODUCE: * load http://bugzilla.mozilla.org/enter_bug.cgi * Type the following into the textarea: ZZZZZZZ MMMMMMM * click "Remember values as bookmarkable template" * examine the URL that is presented to be bookmarked ACTUAL RESULTS: * the text is encoded as ZZZZZZZ%0AMMMMMMM (on Linux), and something else on other platforms... EXPECTED RESULTS: * it should be encoded as ZZZZZZZ%0D%0AMMMMMMM DOES NOT WORK CORRECTLY ON: * Linux, mozilla, 1999-11-22-08-M12 WORKS CORRECTLY ON: * Linux NN 4.61 ADDITIONAL INFORMATION: This should be tested with both GET and POST form submission, and for both content types described in HTML 4.0. See HTML 4.0: http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.4
Assignee: karnaze → pollmann
Eric, reassign to Steve if appropriate.
Actually, I think this should apply anytime there are newlines in anything to be submitted. This could happen in lots of ways. It could be caused by entities in HTML (CR, LF, or CR-LF) or through JS. It could occur in textareas, hidden inputs, control values for radio/checkbox inputs, etc. So it should probably be handled at the level of form submission...
cc self.
I think the solution here is to add a flag to nsIDocumentEncoder that says 'I want HTML spec line breaks', and use that in nsGfxTextControlFrame::GetTextControlFrameState() when we get the value.
Assignee: pollmann → sfraser
I'm fixing this.
Status: NEW → ASSIGNED
Target Milestone: M12
setting to M12 and setting assigned
Target Milestone: M12
I believe that we should be doing linefeed conversions at the same time we handle conversion to Unicode, which happens in CNavDTD::CollectSkippedContent(). That way, we are converting line breaks before stuff gets inserted into teh content model, which sounds right. rickg, is this good? Here's the diff: Index: CNavDTD.cpp =================================================================== RCS file: /cvsroot/mozilla/htmlparser/src/CNavDTD.cpp,v retrieving revision 3.241 diff -u -2 -r3.241 CNavDTD.cpp --- CNavDTD.cpp 1999/11/23 03:07:07 3.241 +++ CNavDTD.cpp 1999/11/24 00:06:45 @@ -1919,4 +1919,5 @@ int aIndex=0; int aMax=mSkippedContent.GetSize(); + PRBool aMustConvertLinebreaks = PR_FALSE; for(aIndex=0;aIndex<aMax;aIndex++){ @@ -1932,4 +1933,8 @@ if((eHTMLTag_textarea==theNodeTag) && (eToken_entity==theTokenType)) { ((CEntityToken*)theNextToken)->TranslateToUnicodeStr(mScratch); + // since this is an entity, we know that it's only one character. + // check to see if it's a CR, in which case we'll need to do line + // termination conversion at the end. + aMustConvertLinebreaks |= (mScratch[0] == kCR); } else theNextToken->GetSource(mScratch); @@ -1939,4 +1944,17 @@ mTokenRecycler->RecycleToken(theNextToken); } + + // if the string contained CRs (hence is either CR, or CRLF terminated) + // we need to convert line breaks + if (aMustConvertLinebreaks) + { + PRInt32 offset; + while ((offset = aNode.mSkippedContent.Find("\r\n")) != kNotFound) + aNode.mSkippedContent.Cut(offset, 1); // remove the CR + + // now replace remaining CRs with LFs + aNode.mSkippedContent.ReplaceChar("\r", kNewLine); + } + // Let's hope that this does not hamper the PERFORMANCE!! mLineNumber += aNode.mSkippedContent.CountChar(kNewLine);
Ack! Here's a non-doublespaced version of the patch. Index: CNavDTD.cpp =================================================================== RCS file: /cvsroot/mozilla/htmlparser/src/CNavDTD.cpp,v retrieving revision 3.241 diff -u -2 -r3.241 CNavDTD.cpp --- CNavDTD.cpp 1999/11/23 03:07:07 3.241 +++ CNavDTD.cpp 1999/11/24 00:09:04 @@ -1919,4 +1919,5 @@ int aIndex=0; int aMax=mSkippedContent.GetSize(); + PRBool aMustConvertLinebreaks = PR_FALSE; for(aIndex=0;aIndex<aMax;aIndex++){ @@ -1932,4 +1933,8 @@ if((eHTMLTag_textarea==theNodeTag) && (eToken_entity==theTokenType)) { ((CEntityToken*)theNextToken)->TranslateToUnicodeStr(mScratch); + // since this is an entity, we know that it's only one character. + // check to see if it's a CR, in which case we'll need to do line + // termination conversion at the end. + aMustConvertLinebreaks |= (mScratch[0] == kCR); } else theNextToken->GetSource(mScratch); @@ -1939,4 +1944,17 @@ mTokenRecycler->RecycleToken(theNextToken); } + + // if the string contained CRs (hence is either CR, or CRLF terminated) + // we need to convert line breaks + if (aMustConvertLinebreaks) + { + PRInt32 offset; + while ((offset = aNode.mSkippedContent.Find("\r\n")) != kNotFound) + aNode.mSkippedContent.Cut(offset, 1); // remove the CR + + // now replace remaining CRs with LFs + aNode.mSkippedContent.ReplaceChar("\r", kNewLine); + } + // Let's hope that this does not hamper the PERFORMANCE!! mLineNumber += aNode.mSkippedContent.CountChar(kNewLine);
Looking for a review on that patch. Parser people?
Adding Harish, as he's the parser people. Unfortunately, he's out on vacation for a while. I don't know dates.
Waiting for resolution on this and 16813 which are related; waiting for parser people to return from vacation. Will address on 11/29
Target Milestone: M12
Summary: [4.xP][PP]{html40}submission of newlines in textareas → [4.xP][PP][dogfood]{html40}submission of newlines in textareas
M12
Whiteboard: [PDT+]
Putting on PDT+ radar.
OK, ignore that patch above: it's a fix for 16813, and not for this bug.
I'm not sure how to fix this one. The alternatives are: 1. In nsGfxTextControlFrame::GetTextControlFrameState(), editor->OutputToString() with a flag that says to output text with CRLF breaks. This means that any time someone gets the value of a textarea (e.g. from JS) they'll get CRLF termination. That might be bad for JS. 2. Do the conversion in nsFormFrame::URLEncode(). We'd have to convert from platform line breaks back to CRLF, so this would be less efficient. Thoughts?
Because of my comment on this bug at 11/23/99 13:23, I think I'd prefer (2), if I understand them correctly.
If we do add the new output flag, it sounds like we should also change the nsGfxTextControlFrame's API so that nsFormFrame can pass a flag down into it telling it whether to use CRLF mode or not. This is a little ugly, but no more so than having to add the flag in the first place.
I agree with David. Solution 2) sounds better than solution 1). I think that the logic that will have to be added to URLEncode() will be just as efficient as what would have to added to GetTextControlFrameState(). However, if you do decide to make the change to URLEncode(), please check to see if you need to make it for multipart forms as well (nsFormFrame::ProcessAsMultipart()). A testcase for Multipart might look like: <HTML> <BODY> <FORM ENCTYPE="multipart/form-data" ACTION="http://pollmann.net/echo.cgi"> <TEXTAREA NAME="foo">Test Case</TEXTAREA> </FORM> </BODY> </HTML> The reason you'll need to test this separately is that ProcessAsMultipart doesn't call URLEncode on the form values. I'm not sure what Nav and IE do with multipart forms regarding CR/LF.
I have fixed for line termination conversion in nsFormFrame, but there is still a problem when you go Back to a form page; the FrameManger's state for that frame holds a string that also needs to go through line termination conversion. My first solution above would fix that case too; the second solution requires more line termination conversion code in nsTextControlFrame::SaveState() and maybe RestoreState(). So I'm tending towards that solution.
Sorry, "that solution" == the first one (always get CRLF data from textareas).
Whiteboard: [PDT+] → [PDT+][12/3]
Still awaiting input on some issues, from pollman and a DOM person, in particular, what line termination would be expected when you get the value of a textarea in JS.
Blocks: 12658
linking to 12658, composer pdt+ bug tracking
I tested the linebreaks gotten from textareas in Nav 4.x, and IE 5.0 on various platforms. In all cases, platform-native line breaks are being used.
Tested in what? JS? Form submission? (It's certainly not true in NN 4.61 Linux for form submission.) If the answer is JS, then I think that's further reason to fix this bug at the level of form submission encoding.
Sorry, yes, at the level of getting textarea.value from JS. I am working on a solution to fix this bug at the form submission level.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Checked in fixes, in nsFormFrame.cpp. This will need to be verified on all 3 platforms. QA should verify that CRLF data is being sent to the server in each case.
QA Contact update.
Status: RESOLVED → REOPENED
Okay, the good news is it works on: - WinNT 1999120516 mozilla & - MacOS86 1999120508 netscape. The bad news is it still doesn't work on Linux6 1999120608 mozilla. Reopening bug.
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Whiteboard: [PDT+][12/3] → [PDT+][12/10]
ckritzer: When you say 'it still doesn't work', what problem do you observe?
Oh yeah, details...those would be good... Specificly, there is an extra line feed after every carriage return in a bugzilla form submission on Linux. If you type: AAAAAAA BBBBBBB CCCCCCC and submit the bugzilla form in 1999120708 Linux6 mozilla, here's what it will look like: AAAAAAA BBBBBBB CCCCCCC Does that make sense?
This is a test.
Chris: I just tried submitting a few lines with a return (like this) between them to this bug using today's Linux build, and they seem to have submitted correctly. (another return) Could you please explain in painstaking detail exactly what you're doing, and what the result is?
AFAICT, this bug is fixed in Linux mozilla 1999-12-06-13-M12. Bug 16813 may not be.
Bug 16813 is also mostly fixed. I just filed bug 21121 for the remaining issues. I filed it using a template with proper CR-LF encoding (bug 16813), and it worked fine. I also created a bug template, and that gave the proper encoding in the URL (this bug). However, there are two seemingly random newlines in bug 21121 that I did not type. They need to be investigated. Perhaps I should open another bug...
Based on dbaron's commens, marking fixed. The 'extra newline' problem is known (the output system is doing extra wrapping when it should not). It is covered by but 20603.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago25 years ago
Resolution: --- → FIXED
Marking fixed, because sfraser said he would, but didn't.
Dang! I'm *still* seeing this (with the 1999120715 build)... Akkana, Dave, Simon, what version of linux are you running? Do you install over previous builds, or use a new directory? Do you toss out mozilla registry & profiles, or keep them build to build? My answers to the above are Linux6 (gnome) on an HP Vectra XA, new directory, and keep them build to build...
**What** are you still seeing? I think that's the key question here. (I install builds in new directories, but keep my ~/.mozilla/ directory. I use Linux 2.2.5 (which comes with RedHat 6.0) and gnome 1.0.53.)
Status: RESOLVED → VERIFIED
Marking VERIFIED FIXED. Tested on Linux6 1999120808 mozilla. Also verified (and behaves correctly) on: - Win98 1999120808 mozilla - MacOS86 1999120808 commercial
Blocks: 21564
No longer blocks: 21564
Component: HTML: Form Submission → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.