Closed Bug 651072 Opened 14 years ago Closed 13 years ago

Support HTML parsing in XMLHttpRequest per XMLHttpRequest Level 2

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla11

People

(Reporter: hsivonen, Assigned: hsivonen)

References

Details

(Keywords: dev-doc-complete)

Attachments

(4 files, 4 obsolete files)

(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
mossop
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
(deleted), patch
smaug
: review+
Details | Diff | Splinter Review
Add support for parsing text/html into a DOM in XMLHttpRequest per the XMLHttpRequest Level 2 spec: http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#document-response-entity-body
Attached patch First attempt at implementation (obsolete) (deleted) — Splinter Review
This is going to need a lot of testing...
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attached patch Support HTML in XHR, with tests (obsolete) (deleted) — Splinter Review
Attachment #562026 - Attachment is obsolete: true
Blocks: 102699
As discussed earlier, HTML is not supported for multipart responses, because the current multipart infrastructure doesn't work with a parser that doesn't finish synchronously when OnStopRequest fires and it's unclear if there's demand for multipart support.
Attachment #571005 - Attachment is obsolete: true
Attachment #573822 - Flags: review?(bugs)
Currently, the extension updater relies on error pages getting a null responseXML. When HTML parsing is supported, text/html error pages result in a non-null responseXML. However, it's not OK to check the XHR status field for the success code 200, because the extension manager also does requests whose responses don't expose a status code (I did check if this was due to security concerns or due to the channel not being an HTTP channel).
Attachment #573824 - Flags: review?(dtownsend)
(In reply to Henri Sivonen (:hsivonen) from comment #5) > Created attachment 573824 [details] [diff] [review] [diff] [details] [review] > Make extension updater treat text/html responses as errors > > Currently, the extension updater relies on error pages getting a null > responseXML. When HTML parsing is supported, text/html error pages result in > a non-null responseXML. However, it's not OK to check the XHR status field > for the success code 200, because the extension manager also does requests > whose responses don't expose a status code (I did check if this was due to > security concerns or due to the channel not being an HTTP channel). Actually mostly what we expect for non-XML pages is that the responseXML has the namespace XMLURI_PARSE_ERROR. What is the content of a failed html parse? Can we just turn off html parsing in this case? Html is not a valid response here.
(In reply to Dave Townsend (:Mossop) from comment #6) > Actually mostly what we expect for non-XML pages is that the responseXML has > the namespace XMLURI_PARSE_ERROR. With text/html responses, the root element with have the http://www.w3.org/1999/xhtml namespace. > What is the content of a failed html parse? HTML parsing can't fail (except for Out of Memory), so if the response is of type text/html, responseXML will be non-null. > Can we just turn off html parsing in this case? I tried to make the extension manager override the MIME type to application/xml, but browser_manualupdates.js failed anyway. I didn't investigate further, though maybe I should.
(In reply to Henri Sivonen (:hsivonen) from comment #7) > with have will have
(In reply to Henri Sivonen (:hsivonen) from comment #7) > I tried to make the extension manager override the MIME type to > application/xml, but browser_manualupdates.js failed anyway. I didn't > investigate further, though maybe I should. Looking at the source, it seems like there has to be a channel type that is involved in browser_manualupdates.js and that really honor nsIChannel::SetContentType().
(In reply to Henri Sivonen (:hsivonen) from comment #9) > that really honor nsIChannel::SetContentType(). that *does not* I can't type English correctly today, it seems.
Comment on attachment 573824 [details] [diff] [review] Make extension updater treat text/html responses as errors (In reply to Henri Sivonen (:hsivonen) from comment #10) > (In reply to Henri Sivonen (:hsivonen) from comment #9) > > that really honor nsIChannel::SetContentType(). > > that *does not* > > I can't type English correctly today, it seems. Strange, everything I can see is using http there. That said I've realised which bit of the code this is now (more caffeine helped!) and it might actually be ok to allow http in this case, but I'd probably need to test that a bit better. This is fine for now.
Attachment #573824 - Flags: review?(dtownsend) → review+
Comment on attachment 573822 [details] [diff] [review] Support HTML in XHR, with tests and with sync XHR support and non-support for multipart Btw, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=701787 But as of now, it doesn't affect to this bug. > static nsresult CreateDocument(const nsAString& aNamespaceURI, > const nsAString& aQualifiedName, > nsIDOMDocumentType* aDoctype, > nsIURI* aDocumentURI, > nsIURI* aBaseURI, > nsIPrincipal* aPrincipal, > nsIScriptGlobalObject* aScriptObject, > bool aSVGDocument, >- nsIDOMDocument** aResult); >+ nsIDOMDocument** aResult, >+ bool aForceHtml = false); ok, now it is happening... Adding another "force this document". Could you perhaps replace 'bool aSVGDocument' with some enum. It could have values none, html, svg for now. IIRC, there aren't really that many nsContentUtils::CreateDocument calls in the tree, so the change should be pretty small. >- mLoadLengthComputable(false), mLoadTotal(0), >+ mLoadLengthComputable(false), >+ mIsHtml(false), >+ mLoadTotal(0), Shouldn't you initialize also mWarnAboutMultipartHtml >+ if (mWarnAboutMultipartHtml) { >+ mWarnAboutMultipartHtml = false; >+ nsContentUtils::ReportToConsole(nsContentUtils::eDOM_PROPERTIES, >+ "HTMLMultipartXHRWarning", >+ nsnull, >+ 0, >+ nsnull, // Response URL not kept around >+ EmptyString(), >+ 0, >+ 0, >+ nsIScriptError::warningFlag, >+ "DOM Events", Is there some better string for this than "DOM Events". >@@ -2041,49 +2087,67 @@ nsXMLHttpRequest::OnStopRequest(nsIReque > } > } > > channel->SetNotificationCallbacks(nsnull); > mNotificationCallbacks = nsnull; > mChannelEventSink = nsnull; > mProgressEventSink = nsnull; > >- mState &= ~XML_HTTP_REQUEST_SYNCLOOPING; >- > if (NS_FAILED(status)) { > // This can happen if the server is unreachable. Other possible > // reasons are that the user leaves the page or hits the ESC key. > > mErrorLoad = true; > mResponseXML = nsnull; > } > >- NS_ASSERTION(!parser || parser->IsParserEnabled(), >- "Parser blocked somehow?"); >- > // If we're uninitialized at this point, we encountered an error > // earlier and listeners have already been notified. Also we do > // not want to do this if we already completed. > if (mState & (XML_HTTP_REQUEST_UNSENT | > XML_HTTP_REQUEST_DONE)) { >+ mState &= ~XML_HTTP_REQUEST_SYNCLOOPING; > return NS_OK; > } This change makes the method a bit more error prone. It is easier to forget to remove XML_HTTP_REQUEST_SYNCLOOPING. So, add a comment to be careful with XML_HTTP_REQUEST_SYNCLOOPING handling. >+class nsXHRParseEndListener : public nsIDOMEventListener >+{ >+public: >+ NS_DECL_ISUPPORTS >+ nsresult HandleEvent(nsIDOMEvent *event) { NS_IMETHOD HandleEvent and { should be in the next line. >+ mXHR->ChangeStateToDone(); >+ mXHR = nsnull; >+ return NS_OK; >+ } >+ nsXHRParseEndListener(nsXMLHttpRequest* aXHR) : mXHR(aXHR) {} >+ virtual ~nsXHRParseEndListener() {} >+private: >+ nsRefPtr<nsXMLHttpRequest> mXHR; >+}; So, XHR keeps the document alive, and document keeps event listener manager alive, and event listener manager keeps nsXHRParseEndListener alive which keeps XHR alive... Could you have a raw reference to XHR here, and XHR could also have a raw reference to this listener, and if either one is being deleted, the variables would be set to null. (I wonder why the patch doesn't leak right now, but I assume it could leak in some cases) >@@ -694,17 +694,17 @@ nsHTMLDocument::StartDocumentLoad(const > > if (loadAsHtml5 && !viewSource && > (!(contentType.EqualsLiteral("text/html") || plainText) && > aCommand && !nsCRT::strcmp(aCommand, "view"))) { > loadAsHtml5 = false; > } > > // TODO: Proper about:blank treatment is bug 543435 >- if (loadAsHtml5 && !viewSource) { >+ if (loadAsHtml5 && aCommand && !nsCRT::strcmp(aCommand, "view")) { Could you explain this change.
Attachment #573822 - Flags: review?(bugs) → review+
Attached patch Address review comments (deleted) — Splinter Review
(In reply to Olli Pettay [:smaug] from comment #12) > ok, now it is happening... Adding another "force this document". > Could you perhaps replace 'bool aSVGDocument' with some enum. > It could have values none, html, svg for now. I added an enum, but I used a more descriptive value instead of "none". :-) > >- mLoadLengthComputable(false), mLoadTotal(0), > >+ mLoadLengthComputable(false), > >+ mIsHtml(false), > >+ mLoadTotal(0), > Shouldn't you initialize also mWarnAboutMultipartHtml Yes. > >+ if (mWarnAboutMultipartHtml) { > >+ mWarnAboutMultipartHtml = false; > >+ nsContentUtils::ReportToConsole(nsContentUtils::eDOM_PROPERTIES, > >+ "HTMLMultipartXHRWarning", > >+ nsnull, > >+ 0, > >+ nsnull, // Response URL not kept around > >+ EmptyString(), > >+ 0, > >+ 0, > >+ nsIScriptError::warningFlag, > >+ "DOM Events", > Is there some better string for this than "DOM Events". I don't know. I don't know what consumes that string and what strings it recognizes. > So, add a comment to be careful with XML_HTTP_REQUEST_SYNCLOOPING handling. Added. > >+class nsXHRParseEndListener : public nsIDOMEventListener > >+{ > >+public: > >+ NS_DECL_ISUPPORTS > >+ nsresult HandleEvent(nsIDOMEvent *event) { > NS_IMETHOD HandleEvent > and { should be in the next line. OK. > >+ mXHR->ChangeStateToDone(); > >+ mXHR = nsnull; > >+ return NS_OK; > >+ } > >+ nsXHRParseEndListener(nsXMLHttpRequest* aXHR) : mXHR(aXHR) {} > >+ virtual ~nsXHRParseEndListener() {} > >+private: > >+ nsRefPtr<nsXMLHttpRequest> mXHR; > >+}; > > So, XHR keeps the document alive, and document keeps event listener manager > alive, > and event listener manager keeps nsXHRParseEndListener alive which keeps XHR > alive... > Could you have a raw reference to XHR here, and XHR could also have a raw > reference to this listener, > and if either one is being deleted, the variables would be set to null. Wouldn't raw references be more error-prone than having an owning reference and a guarantee that the event will fire, so the reference will get nulled out? > (I wonder why the patch doesn't leak right now, but I assume it could leak > in some cases) When the listener is added, it's clear that the parser has gotten far enough that DOMContentLoaded *will* fire. And nsXHRParseEndListener drops its XHR reference after the event has been handled. > >- if (loadAsHtml5 && !viewSource) { > >+ if (loadAsHtml5 && aCommand && !nsCRT::strcmp(aCommand, "view")) { > Could you explain this change. Now that this line of code can see a broader range of parser commands, it's important to limit the special case to loading about:blank to a docshell. Otherwise, if about:blank is loaded via XHR, the old parser would be activated, which wouldn't be good.
Attachment #574575 - Flags: review?(bugs)
Comment on attachment 574575 [details] [diff] [review] Address review comments > class nsXHRParseEndListener : public nsIDOMEventListener > { > public: > NS_DECL_ISUPPORTS >- nsresult HandleEvent(nsIDOMEvent *event) { >+ NS_IMETHOD HandleEvent(nsIDOMEvent *event) >+ { >+ NS_ASSERTION(mXHR, "This event is supposed to fire exactly once."); > mXHR->ChangeStateToDone(); >- mXHR = nsnull; >+ mXHR = nsnull; // Break the reference cycle > return NS_OK; > } > nsXHRParseEndListener(nsXMLHttpRequest* aXHR) : mXHR(aXHR) {} > virtual ~nsXHRParseEndListener() {} > private: > nsRefPtr<nsXMLHttpRequest> mXHR; This feels still risky. Could you use nsWeakPtr. (XHR happens to extend nsSupportsWeakReference already)
Attachment #574575 - Flags: review?(bugs) → review+
Backed out 75ace50df008 in an attempt to fix test_XHRSendData.html crashes after a bad merge from mozilla-central to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/d0c677daedff Example crash: https://tbpl.mozilla.org/php/getParsedLog.php?id=7432104&tree=Mozilla-Inbound
Target Milestone: mozilla11 → ---
The problem is (in the redirect test) if (NS_FAILED(status)) { // This can happen if the server is unreachable. Other possible // reasons are that the user leaves the page or hits the ESC key. mErrorLoad = true; mResponseXML = nsnull; } and then nsCOMPtr<nsIDOMEventTarget> eventTarget = do_QueryInterface(mResponseXML); nsEventListenerManager* manager = eventTarget->GetListenerManager(true); where status is NS_ERROR_NOT_AVAILABLE. (Which comes from nsInputStreamPump::mStatus.) Haven't investigated further.
Attached patch Deal with null mResponseXML after error (obsolete) (deleted) — Splinter Review
(In reply to Ms2ger from comment #17) > The problem is (in the redirect test) > > if (NS_FAILED(status)) { > // This can happen if the server is unreachable. Other possible > // reasons are that the user leaves the page or hits the ESC key. > > mErrorLoad = true; > mResponseXML = nsnull; > } > > and then > > nsCOMPtr<nsIDOMEventTarget> eventTarget = > do_QueryInterface(mResponseXML); > nsEventListenerManager* manager = eventTarget->GetListenerManager(true); I changed this part to deal with null mResponseXML, but... > where status is NS_ERROR_NOT_AVAILABLE. (Which comes from > nsInputStreamPump::mStatus.) > > Haven't investigated further. The error still propagates via mErrorLoad in ways that tests don't expect. What did your patches change to make the NS_ERROR_NOT_AVAILABLE occur where it didn't before?
(In reply to Henri Sivonen (:hsivonen) from comment #19) > What did your patches change to make the NS_ERROR_NOT_AVAILABLE occur where > it didn't before? I think I eyeballed through all the patches that came from the m-c side of the merge, but I didn't see anything that'd explain why something now says NS_ERROR_NOT_AVAILABLE but didn't when I pushed to try before the merge.
Ms2ger, whatever is making this fail is the same thing that made you see the "cannot SetMetaDataElement" warning a couple of days ago. That's where the bad rv starts propagating. Right about here: http://mxr.mozilla.org/mozilla-central/source/content/html/document/src/nsHTMLDocument.cpp#947 Do you recall which change made you see that warning?
This supplemental patch: 1) Deals with null mResponseXML after error 2) Avoids propagating nsresult from the cache when a late write to the cache entry fails and the error gets overwritten in the XML case anyway. 3) Makes the synchronous mode behave as if HTML parsing support had never been added (except for a console warning).
Attachment #575157 - Attachment is obsolete: true
Attachment #575813 - Flags: review?(bugs)
Comment on attachment 575813 [details] [diff] [review] Supplemental patch to fix the problems that lead to backout and to unsupport the synchronous mode >@@ -733,16 +734,30 @@ nsXMLHttpRequest::GetResponseXML(nsIDOMD > nsnull, // Response URL not kept around > EmptyString(), > 0, > 0, > nsIScriptError::warningFlag, > "DOM Events", > mOwner->WindowID()); > } >+ if (mWarnAboutSyncHtml) { >+ mWarnAboutSyncHtml = false; >+ nsContentUtils::ReportToConsole(nsContentUtils::eDOM_PROPERTIES, >+ "HTMLSyncXHRWarning", >+ nsnull, >+ 0, >+ nsnull, // Response URL not kept around >+ EmptyString(), >+ 0, >+ 0, >+ nsIScriptError::warningFlag, >+ "DOM Events", >+ mOwner->WindowID()); >+ } "DOM Events" is just wrong. Just use "DOM". It is used on some case. Fix also the other error reporting.
Attachment #575813 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #23) > "DOM Events" is just wrong. Just use "DOM". It is used on some case. > Fix also the other error reporting. Thanks. Landed with s/DOM Events/DOM/: https://hg.mozilla.org/integration/mozilla-inbound/rev/0c4d3b7be17e
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 704760
Depends on: 704761
Notes for documentation: * HTML parsing is not supported in the synchronous mode (because the synchronous mode fits badly into the architecture of Gecko and the Web platform and we don't want to let people use it more than what the legacy requires) * HTML support when responseType == "" should be considered to be at risk. If supporting HTML in the default mode causes too much trouble with legacy uses of XHR (particularly because error pages with text/html response bodies now get a non-null responseXML), we might have to disable HTML support for responseType == "" * HTML support in the asynchronous mode with responseType == "document" can be safely be treated as a feature that will appear in Firefox 11 (unless something totally unforeseeable happens). * If there is no declared encoding for incoming text/html content (on the HTTP level, by using a BOM or in a <meta> tag within the first 1024 bytes), the default encoding is UTF-8.
Depends on: 705072
Drafted some initial docs: https://developer.mozilla.org/en/HTML_in_XMLHttpRequest The documentation assumes that the fix for bug 705072 will land.
Depends on: 705216
This is triggering for HTTP server errors with HTML error pages. :-( The caller is expecting XML or JSON, but you're parsing the HTML error page now. I don't think it's a good idea to parse the HTML at all unless the caller requests it, e.g. by calling .responseHTML or similar. XHR is mostly a protocol, but many servers respond with an HTML page when they are confused/broken, but most of the time, this error HTML page is useless for the calling app. Parsing it automatically is misguided, in most cases.
BenB, bug 705072 limits HTML parsing to responseType == "document". Legacy usage is resposeType == "". (The spec changed accordingly, too, yesterday.)
That's good, thanks. This also fixes us (bug 705216) and avoids that normal XMLHttpRequests are not affected. Good, thanks.
Target Milestone: --- → mozilla11
(In reply to Henri Sivonen (:hsivonen) from comment #27) > Drafted some initial docs: > https://developer.mozilla.org/en/HTML_in_XMLHttpRequest > > The documentation assumes that the fix for bug 705072 will land. That documentation has been overhauled and cleaned up by lots of people; it should be good.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: