Closed Bug 1286911 Opened 8 years ago Closed 8 years ago

Crash in msvcr120.dll@0xf608 | nsHtml5TreeBuilder::accumulateCharacters

Categories

(Core :: DOM: HTML Parser, defect)

43 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 50+ fixed
firefox50 + fixed
firefox51 + fixed
firefox52 + fixed

People

(Reporter: philipp, Assigned: hsivonen)

References

Details

(Keywords: crash, regression, sec-critical, Whiteboard: [post-critsmash-triage][adv-main50+][adv-esr45.5+])

Crash Data

Attachments

(2 files, 5 obsolete files)

(deleted), patch
wchen
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
(deleted), patch
wchen
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is report bp-6463f3bc-e6b8-4127-a8de-f777c2160714. ============================================================= Crashing Thread (31) Frame Module Signature Source 0 msvcr120.dll msvcr120.dll@0xf608 1 xul.dll nsHtml5TreeBuilder::accumulateCharacters(wchar_t const*, int, int) parser/html/nsHtml5TreeBuilderCppSupplement.h:960 2 xul.dll nsHtml5TreeBuilder::characters(wchar_t const*, int, int) parser/html/nsHtml5TreeBuilder.cpp:454 3 xul.dll nsHtml5Tokenizer::flushChars(wchar_t*, int) parser/html/nsHtml5Tokenizer.cpp:273 4 xul.dll nsHtml5Tokenizer::stateLoop<nsHtml5SilentPolicy>(int, wchar_t, int, wchar_t*, bool, int, int) parser/html/nsHtml5Tokenizer.cpp:3448 5 xul.dll nsHtml5Tokenizer::tokenizeBuffer(nsHtml5UTF16Buffer*) parser/html/nsHtml5Tokenizer.cpp:405 6 xul.dll nsHtml5StreamParser::ParseAvailableData() parser/html/nsHtml5StreamParser.cpp:1414 7 xul.dll nsHtml5StreamListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int) parser/html/nsHtml5StreamListener.cpp:80 8 xul.dll nsDocumentOpenInfo::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int) uriloader/base/nsURILoader.cpp:303 crashes with this and similar signatures seem to have gotten far more frequent since firefox 43, so maybe this is related to the change in bug 489820?
Component: General → HTML: Parser
Any thoughts, William?
Flags: needinfo?(wchen)
In ParseAvailableData, we call EnsureBufferSpace right before we tokenize the input and the crash looks like a buffer overflow. EnsureBufferSpace assumes that the worse case buffer size is strBufLen + inputLength + charRefBufLen + 2 (for LT_GT, LT_SOLIDUS and RSQB_RSQB). So I suspect that there are cases in the tokenizer where we are outputting more than the assumed worst case. My guess is that the tokenizer ends up in a state that generates 2 characters multiple times given some input and the worst case assumes assumes we end up in such a state at most once for the input. If this can happen then a very loose upper bound worst case buffer length would be strBufLen + 2 * inputLength + charRefBufLen. Alternatively we can take the patch from bug 489820 [1] that dynamically grows the buffer length. [1] https://bug489820.bmoattachments.org/attachment.cgi?id=8626084
Flags: needinfo?(wchen) → needinfo?(hsivonen)
Crash Signature: nsHtml5TreeBuilder::accumulateCharacters] → nsHtml5TreeBuilder::accumulateCharacters] [@ memcpy | nsHtml5TreeBuilder::accumulateCharacters ]
Group: dom-core-security
(In reply to William Chen [:wchen] from comment #2) > My guess is that the tokenizer ends up in a state that generates 2 > characters multiple times given some input and the worst case assumes > assumes we end up in such a state at most once for the input. If this can > happen then a very loose upper bound worst case buffer length would be > strBufLen + 2 * inputLength + charRefBufLen. I'll try to see where the worst-case calculation went wrong. > Alternatively we can take the patch from bug 489820 [1] that dynamically > grows the buffer length. Except when we'd be doing a large allocation without a way to communicate OOM, so let's try to fix this by fixing the worst-case calculation.
Assignee: nobody → hsivonen
So far, I've been unable to identify a case where LT_GT, LT_SOLIDUS or RSQB_RSQB would be emitted in a loop. The two other possible explanations are: * An accumulation buffer is accidentally emitted twice due to the the first time it's emitted not clearing it. (That emitting doesn't automatically clear a buffer is inherently bug-prone and worth fixing.) * A cstart index getting set when tokenizing one buffer and getting used when tokenizing another buffer. I'll investigate both of these now, even though if either of these was at fault, we should expect people to report missing or duplicate content on pages often and we haven't seen such reports.
(In reply to Henri Sivonen (:hsivonen) from comment #5) > * A cstart index getting set when tokenizing one buffer and getting used > when tokenizing another buffer. This seems impossible.
Next, I'll have to look at the state that can be loaded at speculation boundaries. The worst-case calculation doesn't look at state lurking there!
(In reply to Henri Sivonen (:hsivonen) from comment #7) > Next, I'll have to look at the state that can be loaded at speculation > boundaries. The worst-case calculation doesn't look at state lurking there! Nope, nothing wrong-looking here. The best guesses are still wchen's original guess (which wouldn't have visible symptoms when the buffer happens to be large enough, which is often is due to rounding to the next power of two) and my guess of double emission of a buffer (which would have visible symptoms and is, therefore, less likely).
Mike(tm) Smith found a manifestation of this bug in the W3C Validator thanks to exception logging. This in itself means that the speculation machinery isn't to blame. Additionally, the stack shows that the bug can be seen as a pure tokenizer bug (whereas the Firefox stack does not say whether the tokenizer or the tree builder doesn't match the space calculation). java.lang.ArrayIndexOutOfBoundsException: 2149 at nu.validator.htmlparser.impl.Tokenizer.appendStrBuf(Tokenizer.java:856) at nu.validator.htmlparser.impl.Tokenizer.stateLoop(Tokenizer.java:2527) at nu.validator.htmlparser.impl.Tokenizer.tokenizeBuffer(Tokenizer.java:1322) at nu.validator.htmlparser.io.Driver.runStates(Driver.java:321) at nu.validator.htmlparser.io.Driver.tokenize(Driver.java:219) at nu.validator.htmlparser.sax.HtmlParser.tokenize(HtmlParser.java:503) at nu.validator.htmlparser.sax.HtmlParser.parse(HtmlParser.java:423) at nu.validator.xml.LanguageDetectingXMLReaderWrapper.parse(LanguageDetectingXMLReaderWrapper.java:868) at nu.validator.xml.WiretapXMLReaderWrapper.parse(WiretapXMLReaderWrapper.java:158) at nu.validator.xml.AttributesPermutingXMLReaderWrapper.parse(AttributesPermutingXMLReaderWrapper.java:303) at nu.validator.servlet.VerifierServletTransaction.validate(VerifierServletTransaction.java:1029) at nu.validator.servlet.VerifierServletTransaction.service(VerifierServletTransaction.java:840) at nu.validator.servlet.VerifierServlet.doPost(VerifierServlet.java:270) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669) at nu.validator.servlet.MultipartFormDataFilter.doFilter(MultipartFormDataFilter.java:185) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at nu.validator.servlet.InboundGzipFilter.doFilter(InboundGzipFilter.java:60) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at nu.validator.servlet.InboundSizeLimitFilter.doFilter(InboundSizeLimitFilter.java:63) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:300) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:497) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745)
Henri suggested this be sec-crit. He's working on a patch.
Keywords: sec-critical
This patch lowers the probability of this sort of problem arising from accidental reuse of buffer contents. We should probably do this anyway even if it didn't fix this bug.
I've deployed a version of the parser that's been patched to be more prone to this bug to validator.nu in the hope of increasing the probability of getting an exception logged there. I don't have my hopes high, though.
Findings: TL;DR: I didn't find the bug. * I found a mismatch between the buffer size and use when commentPolicy is ALTER_INFOSET. Neither Firefox nor the validator run the parser in that mode, so that doesn't explain the stack traces. * As noted earlier, zeroing the buffer length bookkeeping once there's a need to clear the buffer as opposed to zeroing it after the previous contents have been consumed makes the dangers greater in the case of a bug, so we should change that. (I have a patch.) * Some users had volunteered URLs in crash reports. I ran the Java version of the parser using data fetched from a handful these URLs. I had patched the parser to make it extremely likely for the problem to show up (using a 2-char input buffer to the tokenizer to maximize buffer boundaries and reallocating the parser's internal buffers every time as tightly as possible). This didn't show an ArrayIndexOutOfBoundsException. * I think it's not reasonable to keep Firefox in a vulnerable state while researching this further. I think we should either MOZ_CRASH when writing past buffer capacity or reallocate the buffer. The former is worse UX while the latter means that the true bug is more likely to remain undiscovered and unaddressed. I think we should go for the better UX (which may still be bad UX on 32-bit systems due to a resulting OOM crash). We can add telemetry to see if the change in when buffer length is zeroed fixes the bug without the bug having been pinpointed.
Attached patch Java patch (includes translator changes) (obsolete) (deleted) — Splinter Review
Attachment #8783621 - Attachment is obsolete: true
Flags: needinfo?(hsivonen)
Attachment #8788146 - Flags: review?(wchen)
Attached patch Gecko patch (obsolete) (deleted) — Splinter Review
I'm pretty unhappy about settling for defensive code when I couldn't locate the actual bug so that I'd know I've fixed it. Three things are changed: * C++ and Java: The length of intermediate buffers is eagerly set to zero after the buffer has been used are deemed unnecessary to use. * C++ only: Checks for writes past the end of buffer are added back. This restores memory-safety. When appropriate, buffer resizing is attempted, but if it fails we MOZ_CRASH. (The difficulty of propagating OOM status was part of the reason of changing this in the other direction in the first place.) * Java only: Double the size of strBuf when commentPolicy is ALTER_INFOSET.
Attachment #8788149 - Flags: review?(wchen)
Oh, and I realize that it's logically unnecessary to make the translator capable of outputting both MOZ_RELEASE_ASSERT and MOZ_CRASH, but relying on the side effects of the assertion condition seemed stylistically dodgy even when the assertion is never omitted.
Comment on attachment 8788146 [details] [diff] [review] Java patch (includes translator changes) Review of attachment 8788146 [details] [diff] [review]: ----------------------------------------------------------------- It's hard to verify that we're calling clearStrBufAfterUse() everywhere that we need to, but I could find any obvious omissions. ::: src/nu/validator/htmlparser/impl/Tokenizer.java @@ +846,5 @@ > * @param c > * the UTF-16 code unit to append > */ > + @Inline private void appendStrBuf(char c) { > + // CPPONLY: if (strBufLen == strBuf.length) { If we still care about trying to catch the bug regarding worst case buffer length, then we should add a debug assertion here (and in the other appendStrBuf). I think it would be OK for debug builds to crash and non-debug builds to attempt to grow the buffer. @@ +1352,5 @@ > + if (commentPolicy == XmlViolationPolicy.ALTER_INFOSET) { > + // When altering infoset, if the comment contents are consecutive > + // hyphens, each hyphen generates a space, too. These buffer > + // contents never get emitted as characters() to the tokenHandler, > + // whith is why this calculation happens after the call to type-o
Attachment #8788146 - Flags: review?(wchen) → review+
Attachment #8788149 - Flags: review?(wchen) → review+
(In reply to William Chen [:wchen] from comment #18) > Comment on attachment 8788146 [details] [diff] [review] > Java patch (includes translator changes) > > Review of attachment 8788146 [details] [diff] [review]: > ----------------------------------------------------------------- > > It's hard to verify that we're calling clearStrBufAfterUse() everywhere that > we need to, but I could find any obvious omissions. I take it you meant "couldn't". > ::: src/nu/validator/htmlparser/impl/Tokenizer.java > @@ +846,5 @@ > > * @param c > > * the UTF-16 code unit to append > > */ > > + @Inline private void appendStrBuf(char c) { > > + // CPPONLY: if (strBufLen == strBuf.length) { > > If we still care about trying to catch the bug regarding worst case buffer > length, then we should add a debug assertion here (and in the other > appendStrBuf). I think it would be OK for debug builds to crash and > non-debug builds to attempt to grow the buffer. > > @@ +1352,5 @@ > > + if (commentPolicy == XmlViolationPolicy.ALTER_INFOSET) { > > + // When altering infoset, if the comment contents are consecutive > > + // hyphens, each hyphen generates a space, too. These buffer > > + // contents never get emitted as characters() to the tokenHandler, > > + // whith is why this calculation happens after the call to > > type-o Thanks. I've fixed these.
Attachment #8788149 - Attachment is obsolete: true
Attachment #8789252 - Flags: review+
Comment on attachment 8789252 [details] [diff] [review] Gecko patch with review comments addressed, r=wchen [Security approval request comment] > How easily could an exploit be constructed based on the patch? I failed to figure out how to trigger the bug myself, but an attacker might notice something that I missed. > Do comments in the patch, the check-in comment, or tests included > in the patch paint a bulls-eye on the security problem? Two potential types of bugs, yes. How to trigger either condition, no. > Which older supported branches are affected by this flaw? Everything from 43 onwards, including 45-esr. > If not all supported branches, which bug introduced the flaw? Bug 489820. > Do you have backports for the affected branches? If not, how > different, hard to create, and risky will they be? The same patch applies. > How likely is this patch to cause regressions; how much testing does it need? The patch does not introduce regressions in the html5lib test suite, which does a good job at exercising various tokenizer states. However, this does not prove that the patch doesn't break anything. However, due to the way the patch has been constructed, flaws in the patch (around setting buffer fill status to empty) are more likely to manifest in debug builds only. In that sense, the regression risk for release builds should be fairly small.
Attachment #8789252 - Flags: sec-approval?
Comment on attachment 8789253 [details] [diff] [review] Java patch (includes translator changes) with review comments addressed, r=wchen [Security approval request comment] This patch discloses the same change but via a different (non-product) repo: https://hg.mozilla.org/projects/htmlparser/ . I'm not sure if I'm supposed to request sec-approval on this one, but since landing this would disclose the changes, I'm setting the request flag just in case. See comment 21 for details. Note that this de facto needs to land at the same time as the other patch. Otherwise, further changes to the parser would be blocked until this patch landed.
Attachment #8789253 - Flags: sec-approval?
(In reply to Al Billings [:abillings] from comment #17) > Is ESR45 is affected? Yes.
We're waaaay too late for 49. This can go into trunk on 9/27, which is now two weeks into the next cycle.
Whiteboard: [checkin on 9/27]
Attachment #8789252 - Flags: sec-approval? → sec-approval+
Attachment #8789253 - Flags: sec-approval? → sec-approval+
Whiteboard: [checkin on 9/27]
I can't redo the investigation/fix right now. Here's the old patch with the mitigations only (that put an end to the vulnerability) and with the parts that tried to address a potential root cause simply deleted (the parts that turned out to be broken when exposed to a larger set of tests than I ran locally).
Attachment #8789253 - Attachment is obsolete: true
Flags: needinfo?(hsivonen)
Attachment #8795633 - Flags: review?(wchen)
Attachment #8795633 - Flags: review?(wchen) → review+
Attachment #8795634 - Flags: review?(wchen) → review+
This is fine for now, but we need to remember to add those assertions back and investigate the failure.
Comment on attachment 8795634 [details] [diff] [review] Only mitigate, don't try to fix the root cause for now, Gecko version [Security approval request comment] > How easily could an exploit be constructed based on the patch? I failed to figure out how to trigger the bug myself, but an attacker might notice something that I missed. > Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? General type of problem, yes. How to trigger it, no. > Which older supported branches are affected by this flaw? Everything from 43 onwards, including 45-esr. > If not all supported branches, which bug introduced the flaw? Bug 489820. > Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? The same patch should apply as-is. (If not, the rebase is trivial.) > How likely is this patch to cause regressions; how much testing does it need? Unlikely to cause regressions. This patch just adds buffer bound checks and doesn't try to address the root cause. Attempting to address a guess of a root cause is what caused the tree to burn on previous attempt.
Attachment #8795634 - Flags: sec-approval?
Comment on attachment 8795633 [details] [diff] [review] Only mitigate, don't try to fix the root cause for now, Java version [Security approval request comment] See comment 31. This is the htmlparser repo patch for the same thing.
Attachment #8795633 - Flags: sec-approval?
Comment on attachment 8795634 [details] [diff] [review] Only mitigate, don't try to fix the root cause for now, Gecko version sec-approval+ for trunk. We'll want this on Aurora, Beta, and ESR45, right?
Attachment #8795634 - Flags: sec-approval? → sec-approval+
Attachment #8795633 - Flags: sec-approval? → sec-approval+
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment on attachment 8795634 [details] [diff] [review] Only mitigate, don't try to fix the root cause for now, Gecko version [Approval Request Comment] > If this is not a sec:{high,crit} bug, please state case for ESR consideration: sec-critical > User impact if declined: Write past the end of buffer. > Fix Landed on Version: 52 > Risk to taking this patch (and alternatives if risky): Very low. > String or UUID changes made by this patch: None. Approval Request Comment > [Feature/regressing bug #]: Bug 489820. > [User impact if declined]: Write past the end of buffer. > [Describe test coverage new/current, TreeHerder]: The test suite parses a lot of HTML. However, the actual OOM situation isn't tested. > [Risks and why]: Very low. > [String/UUID change made/needed]: None. NOTE TO SHERIFF: This patch contains an accidental whitespace-only change to parser/html/javasrc/HtmlAttributes.java and that change doesn't apply cleanly to old branches. Letting that change be rejected on branches is harmless. Sorry about the inconvenience. (I'm not creating a new patch in order to make it easier to keep track of the flags already set on this patch.)
Attachment #8795634 - Flags: approval-mozilla-esr45?
Attachment #8795634 - Flags: approval-mozilla-beta?
Attachment #8795634 - Flags: approval-mozilla-aurora?
Comment on attachment 8795634 [details] [diff] [review] Only mitigate, don't try to fix the root cause for now, Gecko version Sec-crit, Aurora51+, Beta50+, ESR45+
Attachment #8795634 - Flags: approval-mozilla-esr45?
Attachment #8795634 - Flags: approval-mozilla-esr45+
Attachment #8795634 - Flags: approval-mozilla-beta?
Attachment #8795634 - Flags: approval-mozilla-beta+
Attachment #8795634 - Flags: approval-mozilla-aurora?
Attachment #8795634 - Flags: approval-mozilla-aurora+
Depends on: 1307645
No longer depends on: 1307645
hi, the patch landed in 50.0b5, but this crash still appears to show up under the [@ nsHtml5TreeBuilder::accumulateCharacters ] signature now, like in https://crash-stats.mozilla.com/report/index/598b2498-f9b3-4890-bfc7-a30922161010
Crash Signature: nsHtml5TreeBuilder::accumulateCharacters] [@ memcpy | nsHtml5TreeBuilder::accumulateCharacters ] → nsHtml5TreeBuilder::accumulateCharacters] [@ memcpy | nsHtml5TreeBuilder::accumulateCharacters] [@ nsHtml5TreeBuilder::accumulateCharacters]
Flags: needinfo?(hsivonen)
(In reply to [:philipp] from comment #41) > hi, the patch landed in 50.0b5, but this crash still appears to show up > under the [@ nsHtml5TreeBuilder::accumulateCharacters ] signature now, like > in > https://crash-stats.mozilla.com/report/index/598b2498-f9b3-4890-bfc7- > a30922161010 This is expected. Now the crash isn't exploitable (except as denial of service). Removing the crash needs a follow-up, i.e. clean-up of the parts that burned the tree in comment 27 and were omitted from the later re-landing. Filed bug 1309195 as the follow-up. The bug description is intentionally vague pending the disclosure of this one.
Crash Signature: nsHtml5TreeBuilder::accumulateCharacters] [@ memcpy | nsHtml5TreeBuilder::accumulateCharacters] [@ nsHtml5TreeBuilder::accumulateCharacters] → nsHtml5TreeBuilder::accumulateCharacters] [@ memcpy | nsHtml5TreeBuilder::accumulateCharacters]
Flags: needinfo?(hsivonen)
Group: dom-core-security → core-security-release
Blocks: 489820
Keywords: regression
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main50+][adv-esr45.5+]
(In reply to Henri Sivonen (:hsivonen) from comment #9) > Mike(tm) Smith found a manifestation of this bug in the W3C Validator thanks > to exception logging. FWIW, that turned out to be a separate bug that only existed in the validator branch of the parser and never existed in Firefox.
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: