(line numbers based on revision 6f8f9de6efb7, i think)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 3

•

15 years ago

Attached patch Address review comments in nsHtml5Parser.cpp (obsolete) (deleted) — Details — Splinter Review

Attachment #432131 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

•

15 years ago

nsHtml5TreeOpExecutor.cpp Again there are lots of lines over 80 chars wide. I'd say for now lets leave that as it's harder to review patches that contain whitespace cleanup together with actual changes. But we should at some point go through and fix up this. And please make sure to not use over long lines in new code. nsHtml5TreeOpExecutor::InitializeStatics 101 if (sTreeOpQueueMinLength <= 0) { 102 sTreeOpQueueMinLength = 200; 103 } 104 if (sTreeOpQueueLengthLimit < sTreeOpQueueMinLength) { 105 sTreeOpQueueLengthLimit = sTreeOpQueueMinLength; 106 } 107 if (sTreeOpQueueMaxLength < sTreeOpQueueMinLength) { 108 sTreeOpQueueMaxLength = sTreeOpQueueMinLength; 109 } 110 if (sTreeOpQueueMaxTime <= 0) { 111 sTreeOpQueueMaxTime = 200; 112 } These adjustments won't help very much given that if the pref is changed they won't be rerun, right? Maybe you should install a observer for "html5.opqueue" and always update the prefs and run these adjustments if anything under that changes. nsHtml5TreeOpExecutor::DidBuildModel 139 if (!aTerminated) { 140 // Break out of update batch if we are in one 141 // and aren't forcibly terminating 142 EndDocUpdate(); For my own understanding, why should we not exit the current patch if we are forcably terminated? When can we get terminated during a bach, aren't we preventing scripts from running during a batch? 153 // This is comes from nsXMLContentSink Remove the "is". 180 nsHtml5TreeOpExecutor::WillInterrupt() 181 { 182 return WillInterruptImpl(); 183 } So it sounds like all the WillInterrupt stuff is going away from html5 parsing, is this correct? nsHtml5TreeOpExecutor::SetDocumentCharset 207 if (mDocShell) { 208 // the following logic to get muCV is copied from 209 // nsHTMLDocument::StartDocumentLoad 210 // We need to call muCV->SetPrevDocCharacterSet here in case 211 // the charset is detected by parser DetectMetaTag 212 nsCOMPtr<nsIMarkupDocumentViewer> muCV; Please call this variable 'mucv' or some such. As it's named now it looks confusingly similar to an mFoo member variable. nsHtml5TreeOpExecutor::UpdateStyleSheet This function should probably be renamed to HandleLinkElement or some such. 277 // Break out of the doc update created by Flush() to zap a runnable 278 // waiting to call UpdateStyleSheet without the right observer This comment doesn't make sense to me. Sounds like someone is intentionally waiting to "call UpdateStyleSheet without the right observer". nsHtml5TreeOpExecutor::Flush 380 if (aForceWholeQueue) { 381 if (numberOfOpsToFlush > (PRUint32)sTreeOpQueueMinLength) { 382 flushStart = PR_IntervalNow(); // compute averages only if enough ops 383 } 384 } else { 385 if (numberOfOpsToFlush > (PRUint32)sTreeOpQueueMinLength) { 386 flushStart = PR_IntervalNow(); // compute averages only if enough ops 387 if (numberOfOpsToFlush > (PRUint32)sTreeOpQueueLengthLimit) { 388 numberOfOpsToFlush = (PRUint32)sTreeOpQueueLengthLimit; 389 reflushNeeded = PR_TRUE; 390 } 391 } 392 } You can simplify this to: if (numberOfOpsToFlush > (PRUint32)sTreeOpQueueMinLength) { flushStart = PR_IntervalNow(); // compute averages only if enough ops if (!aForceWholeQueue && numberOfOpsToFlush > (PRUint32)sTreeOpQueueLengthLimit) { numberOfOpsToFlush = (PRUint32)sTreeOpQueueLengthLimit; reflushNeeded = PR_TRUE; } } Though see below. Editorial: There is whitespace at the end of line 383 here. 398 for (nsHtml5TreeOperation* iter = (nsHtml5TreeOperation*)start; iter < end; ++iter) { 399 if (NS_UNLIKELY(!mParser)) { 400 // The previous tree op caused a call to nsIParser::Terminate(); 401 break; 402 } How could we end up terminating here? Isn't the idea that we won't be running script while inside the doc-update? Is this due to charset switching or something else? 413 if (flushStart) { 414 PRUint32 delta = PR_IntervalToMilliseconds(PR_IntervalNow() - flushStart); 415 sTreeOpQueueLengthLimit = delta ? 416 (PRUint32)(((PRUint64)sTreeOpQueueMaxTime * (PRUint64)numberOfOpsToFlush) 417 / delta) : 418 sTreeOpQueueMaxLength; // if the delta is less than one ms, use max 419 if (sTreeOpQueueLengthLimit < sTreeOpQueueMinLength) { 420 // both are signed and sTreeOpQueueMinLength is always positive, so this 421 // also takes care of the theoretical overflow of sTreeOpQueueLengthLimit 422 sTreeOpQueueLengthLimit = sTreeOpQueueMinLength; 423 } 424 if (sTreeOpQueueLengthLimit > sTreeOpQueueMaxLength) { 425 sTreeOpQueueLengthLimit = sTreeOpQueueMaxLength; 426 } 427 #ifdef DEBUG_NS_HTML5_TREE_OP_EXECUTOR_FLUSH 428 printf("FLUSH DURATION (millis): %d\n", delta); 429 printf("QUEUE NEW MAX LENGTH: %d\n", sTreeOpQueueLengthLimit); 430 #endif 431 } I'm definitely nervous about changing the way we back off to the main event loop. Specifically things like checking if there are pending user events seems like a good idea but are no longer happening. While we can't duplicate exactly what the old parser does, since we can't have callbacks for each parsed token, I don't see why we couldn't keep approximately the same strategy. I.e. start measuring time when entering nsHtml5TreeOpExecutor::Flush or some such, then treat iteration through the "perform" loop do what we currently do in nsContentSink::DidProcessATokenImpl. I need to review flushing more in detail to see how well this matches. I definitely agree that we likely will want to change things there, but I'd rather do that orthogonally to changing the parser so that we don't risk improvements in one patch masking regressions in another. We also don't have a good way to measure responsiveness automatically, so messing around with this stuff means that we need to put in a lot more manual QAing. 441 if (scriptElement) { 442 NS_ASSERTION(!reflushNeeded, "Got scriptElement when queue not fully flushed."); 443 RunScript(scriptElement); // must be tail call when mFlushState is eNotFlushing What's preventing that assertion from firing? I.e. what is preventing finding a script to execute after we've set reflushNeeded to true. nsHtml5TreeOpExecutor::ProcessBASETag(nsIContent* aContent) 459 if (mHasProcessedBase) { 460 return NS_OK; 461 } 462 mHasProcessedBase = PR_TRUE; Hmm.. won't this mean that: <base target="..."> <base href="..."> won't work? I.e. we'll ignore the second base element here, despite that being the first one with a href. Steps 3 and 4 of http://www.whatwg.org/specs/web-apps/current-work/#document-base-url seems to indicate otherwise. Same if the elements are reversed but for the target attribute. nsHtml5TreeOpExecutor::DocumentMode Maybe call this SetDocumentMode or UpdateDocumentMode or some such.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 8

•

15 years ago

Attached patch Address review comments in nsHtml5TreeOpExecutor.cpp (deleted) — Details — Splinter Review

•

15 years ago

You're also using the charset detector off the main thread, which I suspect isn't safe. In fact, since you are creating it on the main thread, and then calling it from the parser thread, I'm surprised this doesn't assert as it doesn't seem to be implemented using threadsafe refcounting macros. This is also something we need to fix before we can turn on by default. Do you want me to file a separate bug on this charset threadsafety stuff?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 19

•

15 years ago

nsHtml5StreamParser::SniffStreamBytes 340 for (PRUint32 i = 0; i < aCount; i++) { 341 switch (mBomState) { ... 390 case SEEN_UTF_8_SECOND_BYTE: 391 if (aFromSegment[i] == 0xBF) { 392 rv = SetupDecodingFromBom("UTF-8", "UTF-8"); // upper case is the raw form 393 NS_ENSURE_SUCCESS(rv, rv); 394 PRUint32 count = aCount - (i + 1); 395 rv = WriteStreamBytes(aFromSegment + (i + 1), count, &writeCount); 396 NS_ENSURE_SUCCESS(rv, rv); 397 *aWriteCount = writeCount + (i + 1); 398 return rv; 399 } 400 mBomState = BOM_SNIFFING_OVER; 401 break; 402 default: 403 goto bom_loop_end; 404 } 405 } 406 // if we get here, there either was no BOM or the BOM sniffing isn't complete yet 407 bom_loop_end: Change the loop-condition to be |i < aCount && mBomState != BOM_SNIFFING_OVER| and change the default condition to set |mBomState = BOM_SNIFFING_OVER| and get rid of the goto? http://xkcd.com/292/ :) 418 if (mUnicodeDecoder) { 419 mUnicodeDecoder->SetInputErrorBehavior(nsIUnicodeDecoder::kOnError_Recover); It seems like the responsibility for this call varies with where we instantiate mUnicodeDecoder. Would be nice to make it consistent and always make it the responsibility of whoever created the decoder. Or do we need to set it to different ™ nsHtml5StreamParser::WriteStreamBytes 475 if (NS_FAILED(convResult)) { Could you add a comment describing what we do on failure. Took a while for me to figure out. 476 if (totalByteCount < aCount) { // mimicking nsScanner even though this seems wrong 477 ++totalByteCount; 478 ++aFromSegment; Could you assert here? I agree this seems wrong. 479 } 480 mLastBuffer->getBuffer()[end] = 0xFFFD; 481 ++end; 482 mLastBuffer->setEnd(end); 483 if (end == NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE) { 484 mLastBuffer = (mLastBuffer->next = new nsHtml5UTF16Buffer(NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE)); No need for the parenthesis. a = b = c; is ok. 485 } 486 mUnicodeDecoder->Reset(); 487 if (totalByteCount == aCount) { 488 *aWriteCount = totalByteCount; 489 return NS_OK; 490 } 491 } else if (convResult == NS_PARTIAL_MORE_OUTPUT) { 492 mLastBuffer = (mLastBuffer->next = new nsHtml5UTF16Buffer(NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE)); 493 NS_ASSERTION(totalByteCount < aCount, "The Unicode decoder has consumed too many bytes."); Same here nsHtml5StreamParser::internalEncodingDeclaration The name of this function seems very strange to me. Also, it's not documented anywhere. Is this called any time we find a meta-charset? If so, would calling it something like MetaCharsetFound make sense? 700 NS_ASSERTION(IsParserThread(), "Wrong thread!"); 701 if (mCharsetSource >= kCharsetFromMetaTag) { // this threshold corresponds to "confident" in the HTML5 spec 702 return; 703 } 704 705 // The encodings are different. This comment threw me off for a while. It doesn't appear to be correct, we haven't done any equality checks at that point, have we. Shouldn't it be moved until after the calias->Equals(newEncoding, mCharset, &eq); call? 739 // we still want to reparse 740 mTreeBuilder->NeedsCharsetSwitchTo(preferred); 741 mTreeBuilder->Flush(); The 'still' here threw me off a little bit too. Isn't this the first point where we've determined that we want to reparse? nsHtml5StreamParser::ParseAvailableData 762 if (!mFirstBuffer->hasMore()) { 763 if (mFirstBuffer == mLastBuffer) { 764 switch (mStreamState) { 765 case STREAM_BEING_READ: 766 // never release the last buffer. 767 if (!mSpeculating) { 768 // reuse buffer space if not speculating 769 mFirstBuffer->setStart(0); 770 mFirstBuffer->setEnd(0); 771 } 772 mTreeBuilder->FlushLoads(); 773 // Dispatch this runnable unconditionally, because the loads 774 // that need flushing may have been flushed earlier even if the 775 // flush right above here did nothing. 776 if (NS_FAILED(NS_DispatchToMainThread(mLoadFlusher))) { 777 NS_WARNING("failed to dispatch load flush event"); 778 } 779 return; // no more data for now but expecting more 780 case STREAM_ENDED: 781 if (mAtEOF) { 782 return; 783 } 784 mAtEOF = PR_TRUE; 785 mTokenizer->eof(); 786 mTreeBuilder->StreamEnded(); 787 mTreeBuilder->Flush(); 788 if (NS_FAILED(NS_DispatchToMainThread(mExecutorFlusher))) { 789 NS_WARNING("failed to dispatch executor flush event"); 790 } 791 return; // no more data and not expecting more 792 default: 793 NS_NOTREACHED("It should be impossible to reach this."); 794 return; 795 } 796 } else { 797 mFirstBuffer = mFirstBuffer->next; 798 continue; 799 } 800 } The else on line 796 is actually an else-after-return. Would probably be clearer to change those 'return's into 'break's and put a return after the switch statement and remove the else. The optimizer should turn it into the same flow anyway. The timer stuff is very strange. It seems like it keeps refiring on the main event loop, weather we've parsed anything or not, until we happen to be speculating at the time when the timer fires, or we finish parsing. (This means that the "just in case" comments at the various points when we cancel and restart the timer isn't really true, as we have no idea if the timer is currently pending or not.) It seems to me much better to have the timer running on the parser thread first of all. Also, to save batteries, only start the timer if we have pending things that need flushing. I.e. when we reach the end of a network buffer without getting blocked by a script (and maybe when we get unblocked with a successful speculation, or do we always flush what we have parsed in that case?) I think I'd like to have this timer situation solved before turning the parser on by default.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 20

•

15 years ago

Attached patch Address review comments in nsHtml5StreamParser.cpp (deleted) — Details — Splinter Review

(In reply to comment #9) > Another thing is to get rid of the the reconsume stuff, it should be faster to > rewind the reading iterator one char. I refreshed my memory. The reason why the reconsume pattern exists in the tokenizer is that in the error reporting mode (that the validator uses but Gecko doesn't), reading a character isn't idompotent, because reading a character can have the side effect of emitting an error. In the case of the meta scanner, the data source abstraction currently in use doesn't provide a way to rewind. In both cases, the reconsume pattern will go away in C++ on bug 482920 is fixed. I hope this isn't considered a blocker to turning the parser on by default. (In reply to comment #16) > I still think we need to change how we're doing element/attribute/entity names, > however I'll take that to a separate bug. FWIW, I wanted to put element and attribute names in the data segment of the shared lib but couldn't because static atoms aren't in the data segment and, therefore, don't have memory addresses at link time. > More review comments: > > What is the purpose of nsHtml5RefPtr? Why not just use nsRefPtr? I see that bnewman answered this question already. > We need to get rid of nsHtml5AtomTable.cpp and the "fake" nsIAtoms. Do you mean that we just need to make the fake nsIAtoms not implement nsIAtom or do you want to get rid of the nsHtml5AtomTable mechanism more generally? (I'm OK with the former but want to avoid the latter.) > It's very > confusing that we in part of the code have atom like things that aren't really > atoms. If you don't need these things to be unique (which my understanding is > that you don't?), Even dynamically allocated local names need to be "unique" as far as an nsHtml5Tokenizer instance or an nsHtml5TreeBuilder instance can tell: to deal with duplicate attribute names and to make end tag names match start tag names. > then simply using pointers to nsStrings or even > nsStringBuffers seems like a cleaner solution. We could use a class similar to > nsAttrName that wraps either an nsStringBuffer or an nsIAtom depending on if > the low-bit is set. Or some such. This can be done after turning on by default > though. This is bug 539433. > nsHtml5StateSnapshot: > Might be cleaner to just make all those members public and avoid all the getter > methods. The getters are there so that the getters can be part of a Java interface / nsAHtml5TreeBuilderState that both the state snapshot and the tree builder implement. > Would be nice if getListLength had something about formatting elements in its > name (unless you remove it entirely, see previous point). I'll address this in a follow-up patch. > 157 if (!!listOfActiveFormattingElements[i]) { > > No need for the !! That's generated code. The translator makes the C++ code avoid !== and == expressions where one of the operands is zero. However, it doesn't check if the resulting expression is used in a context where the potentially resulting !! is useless. The translator uses the visitor pattern, so the visits for the !== and == expressions don't know about the context of the expression. In order to avoid spending excessive time on making the translator more elegant than it needs to be to work, I haven't tried to get rid of useless but harmless !!. (There's even |if (!!this)| in one generated file...) > nsHtml5StreamParser: > 197 nsHtml5StreamParser::~nsHtml5StreamParser() > 198 { > 199 NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > 200 mTokenizer->end(); > 201 mRequest = nsnull; > 202 mObserver = nsnull; > 203 mUnicodeDecoder = nsnull; > 204 mSniffingBuffer = nsnull; > 205 mMetaScanner = nsnull; > 206 mFirstBuffer = nsnull; > 207 mExecutor = nsnull; > 208 mTreeBuilder = nsnull; > 209 mTokenizer = nsnull; > 210 mOwner = nsnull; > 211 if (mFlushTimer) { > 212 mFlushTimer->Cancel(); > 213 mFlushTimer = nsnull; > 214 } > 215 } > > Is there a reason to null all these things out? Mainly to distinguish an already-deleted object more easily in a debugger. Shall I make these #ifdef DEBUG? > 238 > nsHtml5StreamParser::SetupDecodingAndWriteSniffingBufferAndCurrentSegment(const > PRUint8* aFromSegment, // can be null > 239 > PRUint32 aCount, > 240 > PRUint32* aWriteCount) > 241 { > 242 NS_ASSERTION(IsParserThread(), "Wrong thread!"); > 243 nsresult rv = NS_OK; > 244 nsCOMPtr<nsICharsetConverterManager> convManager = > do_GetService(NS_CHARSETCONVERTERMANAGER_CONTRACTID, &rv); > 245 NS_ENSURE_SUCCESS(rv, rv); > 246 rv = convManager->GetUnicodeDecoder(mCharset.get(), > getter_AddRefs(mUnicodeDecoder)); > > I don't think that the charset converter manager is threadsafe yet. I believe I made the part of the charset converter manager that the HTML5 parser uses thread-safe in the off-the-main-thread parser landing. https://hg.mozilla.org/mozilla-central/rev/7cda86954b4c > Nor are > some of the decoders I looked at. I suspect the simplest fix is to make them > so, it shouldn't be too hard. In the changeset referenced above, I made the instantiation of nsOneByteDecoderSupport objects protected by a mutex. Once a given one-byte converter has loaded its tables, it has no mutable state and is, therefore, thread-safe. When I looked at the multibyte decoders, it seemed to me that each instance was safe for using from a single thread and there was no issue with loading data tables because the data tables were baked into code ahead of compile time. > This we need to fix before turning it on by default as to avoid random crashes. Are there particular decoders that I have missed? > 248 mCharset.Assign("windows-1252"); // lower case is the raw form > > Use AssignLiteral (This might apply elsewhere too) Fixed. (Both occurrences.) > 979 nsHtml5StreamParser::ContinueAfterFailedCharsetSwitch() > 980 { > 981 NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > 982 mozilla::MutexAutoLock tokenizerAutoLock(mTokenizerMutex); > 983 Uninterrupt(); > 984 nsCOMPtr<nsIRunnable> event = new nsHtml5StreamParserContinuation(this); > 985 if (NS_FAILED(mThread->Dispatch(event, nsIThread::DISPATCH_NORMAL))) { > 986 NS_WARNING("Failed to dispatch nsHtml5StreamParserContinuation"); > 987 } > 988 } > > Is there a reason to uninterrupt from the main thread? Why not do it from the > parser thread as to avoid blocking the main thread. When calling Uninterrupt(), the main thread already owns mTokenizerMutex and Uninterrupt() doesn't acquire an additional mutex, so no additional blocking occurs. (See the comment in the implementation of Uninterrupt().) > nsHtml5StreamParser::ContinueAfterScripts > > 864 if (mSpeculations.IsEmpty()) { > 865 // Not quite sure how exactly this happens... > 866 // Maybe an artifact of defer scripts? > 867 return; > 868 } > > Sounds like an assertion would be good. I put an NS_WARNING here. > 902 mozilla::MutexAutoLock tokenizerAutoLock(mTokenizerMutex); > 903 #ifdef DEBUG > 904 nsCOMPtr<nsIThread> mainThread; > 905 NS_GetMainThread(getter_AddRefs(mainThread)); > 906 mAtomTable.SetPermittedLookupThread(mainThread); > 907 #endif > > Please don't indent unless there is an actual block. Might actually be good to > add a block to avoid mainThread existing only in debug builds. Added a block. (In reply to comment #18) > You're also using the charset detector off the main thread, which I suspect > isn't safe. In fact, since you are creating it on the main thread, and then > calling it from the parser thread, I'm surprised this doesn't assert as it > doesn't seem to be implemented using threadsafe refcounting macros. > > This is also something we need to fix before we can turn on by default. Do you > want me to file a separate bug on this charset threadsafety stuff? Yes, a separate bug would be good to have if there really is a thread-safety problem left. As far as I can tell, there is no problem: After bug , the detector is refcounted only from the main thread. The sniffing is run only from the parser thread. The sniffing seems to affect only the internal state of the object. The data tables shared by the instances are baked into code ahead of compile time. (In reply to comment #19) > nsHtml5StreamParser::SniffStreamBytes > > 340 for (PRUint32 i = 0; i < aCount; i++) { > 341 switch (mBomState) { > ... > 390 case SEEN_UTF_8_SECOND_BYTE: > 391 if (aFromSegment[i] == 0xBF) { > 392 rv = SetupDecodingFromBom("UTF-8", "UTF-8"); // upper case is the > raw form > 393 NS_ENSURE_SUCCESS(rv, rv); > 394 PRUint32 count = aCount - (i + 1); > 395 rv = WriteStreamBytes(aFromSegment + (i + 1), count, > &writeCount); > 396 NS_ENSURE_SUCCESS(rv, rv); > 397 *aWriteCount = writeCount + (i + 1); > 398 return rv; > 399 } > 400 mBomState = BOM_SNIFFING_OVER; > 401 break; > 402 default: > 403 goto bom_loop_end; > 404 } > 405 } > 406 // if we get here, there either was no BOM or the BOM sniffing isn't > complete yet > 407 bom_loop_end: > > Change the loop-condition to be |i < aCount && mBomState != BOM_SNIFFING_OVER| > and change the default condition to set |mBomState = BOM_SNIFFING_OVER| and get > rid of the goto? http://xkcd.com/292/ :) Changed. I wish C++ had break with label. > 418 if (mUnicodeDecoder) { > 419 > mUnicodeDecoder->SetInputErrorBehavior(nsIUnicodeDecoder::kOnError_Recover); > > It seems like the responsibility for this call varies with where we instantiate > mUnicodeDecoder. Would be nice to make it consistent and always make it the > responsibility of whoever created the decoder. Or do we need to set it to > different ™ Good catch. Fixed. > nsHtml5StreamParser::WriteStreamBytes > > 475 if (NS_FAILED(convResult)) { > > Could you add a comment describing what we do on failure. Took a while for me > to figure out. Added comment. > 476 if (totalByteCount < aCount) { // mimicking nsScanner even though > this seems wrong > 477 ++totalByteCount; > 478 ++aFromSegment; > > Could you assert here? I agree this seems wrong. I added an NS_NOTREACHED in an else branch instead of merely asserting to avoid a buffer overrun if a decoder misbehaves. FWIW, the docs at http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/intl/uconv/public/nsIUnicodeDecoder.h#123 are not good enough. > 479 } > 480 mLastBuffer->getBuffer()[end] = 0xFFFD; > 481 ++end; > 482 mLastBuffer->setEnd(end); > 483 if (end == NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE) { > 484 mLastBuffer = (mLastBuffer->next = new > nsHtml5UTF16Buffer(NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE)); > > No need for the parenthesis. a = b = c; is ok. Changed. > 485 } > 486 mUnicodeDecoder->Reset(); > 487 if (totalByteCount == aCount) { > 488 *aWriteCount = totalByteCount; > 489 return NS_OK; > 490 } > 491 } else if (convResult == NS_PARTIAL_MORE_OUTPUT) { > 492 mLastBuffer = (mLastBuffer->next = new > nsHtml5UTF16Buffer(NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE)); > 493 NS_ASSERTION(totalByteCount < aCount, "The Unicode decoder has > consumed too many bytes."); > > Same here Changed. > nsHtml5StreamParser::internalEncodingDeclaration > > The name of this function seems very strange to me. Also, it's not documented > anywhere. The documentation is over here: http://hg.mozilla.org/projects/htmlparser/file/7df2f945c5b7/src/nu/validator/htmlparser/common/EncodingDeclarationHandler.java#l36 > Is this called any time we find a meta-charset? Yes, it is. > If so, would calling it something like MetaCharsetFound make sense? "Meta charset" is not HTML5 spec terminology. "[character] encoding declaration" is spec terminology. That's why the method is named the way it is named. In general, I have tried to stick to spec terminology when naming things even when the spec terminology deviates from colloquial usage. (Consider "entities" vs. "named character refernces".) > 700 NS_ASSERTION(IsParserThread(), "Wrong thread!"); > 701 if (mCharsetSource >= kCharsetFromMetaTag) { // this threshold > corresponds to "confident" in the HTML5 spec > 702 return; > 703 } > 704 > 705 // The encodings are different. > > This comment threw me off for a while. It doesn't appear to be correct, we > haven't done any equality checks at that point, have we. Shouldn't it be moved > until after the calias->Equals(newEncoding, mCharset, &eq); call? Removed the comment. > 739 // we still want to reparse > 740 mTreeBuilder->NeedsCharsetSwitchTo(preferred); > 741 mTreeBuilder->Flush(); > > The 'still' here threw me off a little bit too. Isn't this the first point > where we've determined that we want to reparse? My thinking here was that we presumptively want to reparse but decide not to if the new encoding is an alias for the current encoding. Removed the comment. > nsHtml5StreamParser::ParseAvailableData > > 762 if (!mFirstBuffer->hasMore()) { > 763 if (mFirstBuffer == mLastBuffer) { > 764 switch (mStreamState) { > 765 case STREAM_BEING_READ: > 766 // never release the last buffer. > 767 if (!mSpeculating) { > 768 // reuse buffer space if not speculating > 769 mFirstBuffer->setStart(0); > 770 mFirstBuffer->setEnd(0); > 771 } > 772 mTreeBuilder->FlushLoads(); > 773 // Dispatch this runnable unconditionally, because the loads > 774 // that need flushing may have been flushed earlier even if the > 775 // flush right above here did nothing. > 776 if (NS_FAILED(NS_DispatchToMainThread(mLoadFlusher))) { > 777 NS_WARNING("failed to dispatch load flush event"); > 778 } > 779 return; // no more data for now but expecting more > 780 case STREAM_ENDED: > 781 if (mAtEOF) { > 782 return; > 783 } > 784 mAtEOF = PR_TRUE; > 785 mTokenizer->eof(); > 786 mTreeBuilder->StreamEnded(); > 787 mTreeBuilder->Flush(); > 788 if (NS_FAILED(NS_DispatchToMainThread(mExecutorFlusher))) { > 789 NS_WARNING("failed to dispatch executor flush event"); > 790 } > 791 return; // no more data and not expecting more > 792 default: > 793 NS_NOTREACHED("It should be impossible to reach this."); > 794 return; > 795 } > 796 } else { > 797 mFirstBuffer = mFirstBuffer->next; > 798 continue; > 799 } > 800 } > > The else on line 796 is actually an else-after-return. Would probably be > clearer to change those 'return's into 'break's and put a return after the > switch statement and remove the else. The optimizer should turn it into the > same flow anyway. Removed the else. I tried changing the returns into breaks, but when I looked at the result, it looked less obvious than just returning instead of breaking first. > The timer stuff is very strange. It seems like it keeps refiring on the main > event loop, weather we've parsed anything or not, until we happen to be > speculating at the time when the timer fires, or we finish parsing. The timer is supposed to keep refiring whenever the parser is not speculating. When the parser is not speculating, it's useful to move data to the main thread every now and then even when there's no </script> in sight. > (This means that the "just in case" comments at the various points when we > cancel and restart the timer isn't really true, as we have no idea if the timer > is currently pending or not.) Removed the comments. > It seems to me much better to have the timer running on the parser thread first > of all. I put the timer on the main thread, because I couldn't find a timer implementation that'd fire on a non-main thread. Also, initially I wanted also to check if the document is in the frontmost tab when the timer fires, but I gave up on that idea. > Also, to save batteries, only start the timer if we have pending things > that need flushing. I.e. when we reach the end of a network buffer without > getting blocked by a script For practical purposes, this is approximated by sTimerStartDelay being longer than the subsequent delays. > (and maybe when we get unblocked with a successful > speculation, or do we always flush what we have parsed in that case?) The parser always flushes when a speculation succeeds. The timer is strictly about non-speculative parsing. > I think I'd like to have this timer situation solved before turning the parser > on by default. That seems problematic if we don't already have off-the-main-thread timer infrastructure in place.

Attachment #437849 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

•

15 years ago

nsHtml5TreeBuilder.cpp The big thing here is that I'm worried that all the switch statements with loads of fallthroughs and gotos are going to make it incredibly hard to work with this code in the future. Is there any evidence that the fallthroughs are saving any cycles? If not I think I'd prefer to keep the code sane so that we can modify it in the future when needed. The gotos are harder due to the deep nesting of switch statements and loops making 'break' largely useless. I think there is a general over use of switch statements though. In many cases switch statements could be converted to normal if-else statements. In some cases the 'else' part isn't even needed. I've commented on a few specific instances below, but it's a more general problem. I do like the function names in this class though, in general much easier to understand what's going on without reading all the callers and/or function implementations. nsHtml5TreeBuilder::doctype 124 needToDropLF = PR_FALSE; 125 for (; ; ) { 126 switch(foreignFlag) { 127 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { 128 goto doctypeloop_end; 129 } 130 default: { 131 switch(mode) { 132 case NS_HTML5TREE_BUILDER_INITIAL: { 133 nsString* emptyString = nsHtml5Portability::newEmptyString(); 134 appendDoctypeToDocument(!name ? nsHtml5Atoms::emptystring : name, !publicIdentifier ? emptyString : publicIdentifier, !systemIdentifier ? emptyString : systemIdentifier); 135 nsHtml5Portability::releaseString(emptyString); 136 if (isQuirky(name, publicIdentifier, systemIdentifier, forceQuirks)) { 137 138 documentModeInternal(QUIRKS_MODE, publicIdentifier, systemIdentifier, PR_FALSE); 139 } else if (isAlmostStandards(publicIdentifier, systemIdentifier)) { 140 141 documentModeInternal(ALMOST_STANDARDS_MODE, publicIdentifier, systemIdentifier, PR_FALSE); 142 } else { 143 documentModeInternal(STANDARDS_MODE, publicIdentifier, systemIdentifier, PR_FALSE); 144 } 145 mode = NS_HTML5TREE_BUILDER_BEFORE_HTML; 146 return; 147 } 148 default: { 149 goto doctypeloop_end; 150 } 151 } 152 } 153 } 154 } 155 doctypeloop_end: ; 156 157 return; What is the purpose of the loop here? It doesn't seem like we actually ever loop. Also, why the 'goto's rather than simply returning? nsHtml5TreeBuilder::comment 164 for (; ; ) { 165 switch(foreignFlag) { 166 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { 167 goto commentloop_end; 168 } 169 default: { 170 switch(mode) { 171 case NS_HTML5TREE_BUILDER_INITIAL: 172 case NS_HTML5TREE_BUILDER_BEFORE_HTML: 173 case NS_HTML5TREE_BUILDER_AFTER_AFTER_BODY: 174 case NS_HTML5TREE_BUILDER_AFTER_AFTER_FRAMESET: { 175 appendCommentToDocument(buf, start, length); 176 return; 177 } 178 case NS_HTML5TREE_BUILDER_AFTER_BODY: { 179 flushCharacters(); 180 appendComment(stack[0]->node, buf, start, length); 181 return; 182 } 183 default: { 184 goto commentloop_end; 185 } 186 } 187 } 188 } 189 } 190 commentloop_end: ; Same here. It seems much cleaner written as if (foreignFlag != NS_HTML5TREE_BUILDER_IN_FOREIGN) { switch(mode) { case NS_HTML5TREE_BUILDER_INITIAL: case NS_HTML5TREE_BUILDER_BEFORE_HTML: case NS_HTML5TREE_BUILDER_AFTER_AFTER_BODY: case NS_HTML5TREE_BUILDER_AFTER_AFTER_FRAMESET: { appendCommentToDocument(buf, start, length); return; } case NS_HTML5TREE_BUILDER_AFTER_BODY: { flushCharacters(); appendComment(stack[0]->node, buf, start, length); return; } default: { } } } nsHtml5TreeBuilder::eof 436 switch(foreignFlag) { 437 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { 438 439 while (stack[currentPtr]->ns != kNameSpaceID_XHTML) { 440 popOnEof(); 441 } 442 foreignFlag = NS_HTML5TREE_BUILDER_NOT_IN_FOREIGN; 443 } 444 default: 445 ; // fall through 446 } Doesn't seem to be any point in using a switch here. An if-statement would be much more readable. 464 case NS_HTML5TREE_BUILDER_IN_HEAD: { 465 if (currentPtr > 1) { 466 467 } 468 while (currentPtr > 0) { 469 popOnEof(); 470 } Remove the if-statement. nsHtml5TreeBuilder::startTag 764 if (!isCurrent(nsHtml5Atoms::table)) { 765 766 } Same here 831 if (currentPtr != eltPos) { 832 833 } There seems to be many more of these, I haven't commented on any more of them. This finishes the review of the bulk of the code. There are a few specific items I've got left to review, such as the flushing policy and notification batches. There's also a bunch more places that need more comments, even in the java code. In general every class function and member needs a description IMHO, though in some cases groups of functions/members can be described together. This is especially true in the tokenizer and tree builder which by nature implements very complicated algorithms, which makes it very hard to figure out what a variable is used for by looking at the code. We also need the comments in the java code to be carried over to the Cpp code, though I think there's a separate bug on that. I haven't found any real blockers for turning the parser on by default though, other than the ones I've explicitly noted already.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 25

•

15 years ago

Attached patch Rename the getter for the length of list of active formatting elements (deleted) — Details — Splinter Review

(In reply to comment #20) > > Would be nice if getListLength had something about formatting elements in its > > name (unless you remove it entirely, see previous point). > > I'll address this in a follow-up patch. Addressed by this patch.

Attachment #438967 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 26

•

15 years ago

Attached patch Address more review comments in C++ only parts (obsolete) (deleted) — Details — Splinter Review

Attachment #439220 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 27

•

15 years ago

Attached patch Address more review comments in Java parts (deleted) — Details — Splinter Review

(In reply to comment #22) > What we've done in the past is to create static pointers to pointers to atoms. > I.e. things like &nsGkAtoms::foo. Dunno if that will work here. In any case, > I'll file a separate bug on this stuff. In that case, the type of the field would be nsIAtom** instead of nsIAtom*, which would mess up types in the code produced by the translator. Worse, simple pointer compares would be lost and be turned into a pointer dereference plus compare. > > > nsHtml5StreamParser: > > > 197 nsHtml5StreamParser::~nsHtml5StreamParser() > > > 198 { > > > 199 NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > > > 200 mTokenizer->end(); > > > 201 mRequest = nsnull; > > > 202 mObserver = nsnull; > > > 203 mUnicodeDecoder = nsnull; > > > 204 mSniffingBuffer = nsnull; > > > 205 mMetaScanner = nsnull; > > > 206 mFirstBuffer = nsnull; > > > 207 mExecutor = nsnull; > > > 208 mTreeBuilder = nsnull; > > > 209 mTokenizer = nsnull; > > > 210 mOwner = nsnull; > > > 211 if (mFlushTimer) { > > > 212 mFlushTimer->Cancel(); > > > 213 mFlushTimer = nsnull; > > > 214 } > > > 215 } > > > > > > Is there a reason to null all these things out? > > > > Mainly to distinguish an already-deleted object more easily in a debugger. > > Shall I make these #ifdef DEBUG? > > Please do. Done. > > > 979 nsHtml5StreamParser::ContinueAfterFailedCharsetSwitch() > > > 980 { > > > 981 NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > > > 982 mozilla::MutexAutoLock tokenizerAutoLock(mTokenizerMutex); > > > 983 Uninterrupt(); > > > 984 nsCOMPtr<nsIRunnable> event = new nsHtml5StreamParserContinuation(this); > > > 985 if (NS_FAILED(mThread->Dispatch(event, nsIThread::DISPATCH_NORMAL))) { > > > 986 NS_WARNING("Failed to dispatch nsHtml5StreamParserContinuation"); > > > 987 } > > > 988 } > > > > > > Is there a reason to uninterrupt from the main thread? Why not do it from the > > > parser thread as to avoid blocking the main thread. > > > > When calling Uninterrupt(), the main thread already owns mTokenizerMutex and > > Uninterrupt() doesn't acquire an additional mutex, so no additional blocking > > occurs. > > This doesn't appear to be the case. You're here explicitly grabbing the lock in > order to be able to safely call Uninterrupt. If you instead do that while on > the parser thread, there is no need for the main thread to block. Oh, right. In this case, that's true. I changed it the way you suggested. However, I'm nervous that this optimization for a rare case (the case where the shell is unable to fulfill a request to switch charsets fails) comes back to bite us later, because previously Interrupt() and Uninterrupt() were both called from the same thread so Uninterrupt() was guaranteed to cancel one Interrupt(). Now, Interrupt() is, as far as I can tell, called only in places that can't start rerunning until the parser thread has uninterrupted itself and posted more data to the main thread. However, if we later change how Interrupt() is called, it would be easy to introduce a situation where the main thread first posts a runnable to the parser thread, then interrupts and then the runnable cancels that new interrupt when it gets a chance to run on the parser thread. > > > nsHtml5StreamParser::ContinueAfterScripts > > > > > > 864 if (mSpeculations.IsEmpty()) { > > > 865 // Not quite sure how exactly this happens... > > > 866 // Maybe an artifact of defer scripts? > > > 867 return; > > > 868 } > > > > > > Sounds like an assertion would be good. > > > > I put an NS_WARNING here. > > Why not NS_ASSERTION? Because I'm not sure if there's a legitimate situation where that condition holds true. (I realize that not being sure about that isn't exactly good.) > > (In reply to comment #18) > > > You're also using the charset detector off the main thread, which I suspect > > > isn't safe. In fact, since you are creating it on the main thread, and then > > > calling it from the parser thread, I'm surprised this doesn't assert as it > > > doesn't seem to be implemented using threadsafe refcounting macros. > > > > > > This is also something we need to fix before we can turn on by default. Do you > > > want me to file a separate bug on this charset threadsafety stuff? > > > > Yes, a separate bug would be good to have if there really is a thread-safety > > problem left. As far as I can tell, there is no problem: After bug , the > > detector is refcounted only from the main thread. The sniffing is run only from > > the parser thread. The sniffing seems to affect only the internal state of the > > object. The data tables shared by the instances are baked into code ahead of > > compile time. > > I'll file a separate bug. If we're lucky (which it sounds like based on your > description) all we need to do is to move the creation of the detector to the > thread where its used. What would be the benefit except being able to avoid the detector creation in some cases and being able to delete it earlier in others? It seems to me that it's not a thread/ problem that it's allocated on one thread, used on another and destroyed on the creator thread but only after the other thread isn't going to use the object anyway. Moving the destruction of the detector off-the-main-thread would be annoying in the nsIParser::Terminate and cycle collector cases. Wouldn't those then require creating a runnable solely to clean up the detector? > > > 476 if (totalByteCount < aCount) { // mimicking nsScanner even though > > > this seems wrong > > > 477 ++totalByteCount; > > > 478 ++aFromSegment; > > > > > > Could you assert here? I agree this seems wrong. > > > > I added an NS_NOTREACHED in an else branch instead of merely asserting to avoid > > a buffer overrun if a decoder misbehaves. > > > > FWIW, the docs at > > http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/intl/uconv/public/nsIUnicodeDecoder.h#123 > > are not good enough. > > Would be great if you could fix them. I tried to fix the docs. I hope I wrote the right things! > > > If so, would calling it something like MetaCharsetFound make sense? > > > > "Meta charset" is not HTML5 spec terminology. "[character] encoding > > declaration" is spec terminology. That's why the method is named the way it is > > named. In general, I have tried to stick to spec terminology when naming things > > even when the spec terminology deviates from colloquial usage. (Consider > > "entities" vs. "named character refernces".) > > How about EncodingDeclarationFound? Function names that are simple nouns sound > like getters to me. The other callback names, for better or worse, use SAX-style naming and SAX uses spec-wise correct nouns, so having this one not be a noun would be inconsistent with the naming of the rest of the callbacks. > > I put the timer on the main thread, because I couldn't find a timer > > implementation that'd fire on a non-main thread. Also, initially I wanted also > > to check if the document is in the frontmost tab when the timer fires, but I > > gave up on that idea. > > Timers by default fire on the thread that created them. So if you simply use > the parser thread to create the timer, then the timer will fire using the > parser thread. So you use the same API on any thread, as you are on the main > thread. > > Worst case you can simply set the 'target' property on the nsITimer if for some > reason you really can't create the timer using the parser thread. I'm amazed that I've managed to overlook the target property. :-( I moved the timer to the parser thread. I have a question about the thread-safety of nsITimer::Cancel on multicores, though: nsTimerImpl::Cancel assigns to mCanceled without acquiring a mutex and nsTimerImpl::Fire() reads mCanceled without acquiring a mutex. However, before nsTimerImpl::Cancel returns, it calls TimerThread::RemoveTimer which acquires a mutex. Is that mutex acquisition enough to invalidate caches on other cores so that when nsTimerImpl::Cancel returns on one thread, another thread sees mCanceled as true in nsTimerImpl::Fire()? Or is nsTimerImpl even supposed to be thread-safe in the sense that you are allowed to Cancel from non-target thread? If Cancel isn't actually thread-safe, I need to do something more complicated with a runnable for canceling the timer when the parser is released early due to nsIParser::Terminate and the timer is armed. (In reply to comment #23) > Tokenizer comments > > nsHtml5Tokenizer::setContentModelFlag > > 111 nsHtml5Tokenizer::setContentModelFlag(PRInt32 contentModelFlag, nsIAtom* > contentModelElement) > > Please use an enum for the 'contentModelFlag' argument, that way you can also > scope it to the tokenizer and given it shorter names. I realize that it would be better code style to use enums both in Java and in C++. I actually used to have an enum there before the C++ porting effort even started. However, the switching on the tokenizer state is very performance-critical to the parser and the performance characteristics of enums in Java are surprisingly bad (well, at least of the PowerPC port of the HotSpot client VM that I benchmarked on back then). http://krijnhoetmer.nl/irc-logs/whatwg/20080427#l-256 So for perf reasons, I don't want to go back to enums on the Java side. As for the C++ side, I've viewed the generated code as a way of talking to the C++ compiler--not as much a way of expressing things to human readers--and when talking to the compiler, it shouldn't matter either way. Therefore, in the interest of not pouring effort excessively into the translator, I haven't tried to make the translator turn groups of ints into enums in the translation. > Also, is "tokenizerState" > or "stateFlag" a better name? I take it this what the spec refers to as the > 'state' of the tokenizer? When I wrote that piece of code, the spec had a concept of "content model flag". The concept has since been removed from the spec. Renamed various things in this area. > nsHtml5Tokenizer::contentModelElementToArray > > 175 default: { > 176 > 177 return; > 178 } > > Shouldn't you clear contentModelElementNameAsArray here? Or if it should never > reach the default, please assert. The Java code already asserts there. (That's why you see the empty line there.) Carrying the assertions over to C++ is bug 503190. I added a message to the assertion, though. > 193 > 194 void > 195 nsHtml5Tokenizer::clearStrBufAndAppendCurrentC(PRUnichar c) > 196 { > 197 strBuf[0] = c; > 198 strBufLen = 1; > 199 } > 200 > 201 void > 202 nsHtml5Tokenizer::clearStrBufAndAppendForceWrite(PRUnichar c) > 203 { > 204 strBuf[0] = c; > 205 strBufLen = 1; > 206 } > > Why two functions tht do the same thing? These are artifacts of an earlier attempt to boost performance by using the input buffer as the backing buffer for things that now go into strBuf and copying data into strBuf lazily if an input buffer boundary fell inside a name. It turned out that checking if it's necessary to copy data was slower than just unconditionally copying it, so the overall design didn't stay. However, I didn't clean up the redundant methods immediately in order to make it easier to experiment more later. I guess the experimentation isn't ever going to happen. Flattened these into one. > 208 void > 209 nsHtml5Tokenizer::clearStrBufForNextState() > 210 { > 211 strBufLen = 0; > 212 } > > Why the 'ForNextState' part? If the buffer were used in the current state, the variant that puts a single PRUnichar into the buffer would be appropriate. Renamed anyway. Also inlined methods like this. > nsHtml5Tokenizer::emitStrBuf > 241 if (strBufLen > 0) { > 242 tokenHandler->characters(strBuf, 0, strBufLen); > 243 } > > Is the tokenHandler ever anything other than the treebuilder? If not, we should > rename the member to 'treeBuilder'. In the Java test harness for the tokenizer, it's not the tree builder. I expect it not to be always be the tree builder even in C++ once I've fixed bug 482909. When the Java version has interfaces but the C++ doesn't, I've made the translator flatten the types but not rename the members. > 246 void > 247 nsHtml5Tokenizer::clearLongStrBufForNextState() > 248 { > 249 longStrBufLen = 0; > 250 } > 251 > 252 void > 253 nsHtml5Tokenizer::clearLongStrBuf() > 254 { > 255 longStrBufLen = 0; > 256 } > 257 > 258 void > 259 nsHtml5Tokenizer::clearLongStrBufAndAppendCurrentC(PRUnichar c) > 260 { > 261 longStrBuf[0] = c; > 262 longStrBufLen = 1; > 263 } > 264 > 265 void > 266 nsHtml5Tokenizer::clearLongStrBufAndAppendToComment(PRUnichar c) > 267 { > 268 longStrBuf[0] = c; > 269 longStrBufLen = 1; > 270 } > > Lots of duplication here, is there really a need for that? Not any longer as mentioned above. Removed redundancy and marked as inline. > strBuf and longStrBuf is somewhat poor naming. It's unclear when to use which. > Maybe identifierBuf and charBuf is better? Though why do we need both? Is the > idea that we'll pass out the long buffer instead of copying it out? That > doesn't appear to be happening currently. We don't need two buffers that are capable of growing without bound. (A second buffer whose length is capped at the length of the longest named name (plus 1?) would be practical to have around, though, now that the named character name tables no longer contain the full names.) I thought I had filed a bug about this long ago, but now I can't find it, so I filed bug 559303. > 285 nsHtml5Tokenizer::appendSecondHyphenToBogusComment() > 286 { > 287 appendLongStrBuf('-'); > 288 } > 289 > 290 void > 291 nsHtml5Tokenizer::adjustDoubleHyphenAndAppendToLongStrBufAndErr(PRUnichar > c) > 292 { > 293 > 294 appendLongStrBuf(c); > 295 } > > These functions seem excessive. These methods are part of the XML Infoset coercion feature of the Java version of the parser. The latter is also part of error reporting. I marked them inline in C++ to make their invocation overhead compile away for sure. > nsHtml5Tokenizer::flushChars > > 342 cstart = 0x7fffffff; > > PR_INT32_MAX? Curious. I though I was already using that there. I had translator support and everything already in place. Fixed. > nsHtml5Tokenizer::emitCurrentTagToken > > 363 nsHtml5HtmlAttributes* attrs = (!attributes ? > nsHtml5HtmlAttributes::EMPTY_ATTRIBUTES : attributes); > 364 if (endTag) { > 365 > 366 tokenHandler->endTag(tagName); > 367 delete attributes; > 368 } else { > 369 tokenHandler->startTag(tagName, attrs, selfClosing); > 370 } > > Move 'attrs' into the 'else'. You could possibly even get rid of the temporary. 'attrs' is used on the blank line of the 'if' branch but only when error reporting is enabled. > 362 stateSave = NS_HTML5TOKENIZER_DATA; > ... > 374 return stateSave; > > It seems like this function always returns the same value. If so, just get rid > of the return value and use the value inline where needed. If not, add a > comment explaining what happens. The method calls into the token handler, and the token handler can call back into the tokenizer and change stateSave. Added comment (though it doesn't show up in C++ because comments aren't copied over yet). > 391 void > 392 nsHtml5Tokenizer::addAttributeWithoutValue() > 393 { > 394 > 395 if (!!attributeName) { > 396 attributes->addAttribute(attributeName, > nsHtml5Portability::newEmptyString()); > 397 attributeName = nsnull; > 398 } > 399 } > 400 > 401 void > 402 nsHtml5Tokenizer::addAttributeWithValue() > 403 { > 404 if (!!attributeName) { > 405 nsString* val = longStrBufToString(); > 406 attributes->addAttribute(attributeName, val); > 407 attributeName = nsnull; > 408 } > 409 } > > No need for the '!!'. See comment 20. > Also, can these functions really be called without an > attribute name? If that shouldn't happen, just assert and assume it's there. They can in the end tag case. > 411 void > 412 nsHtml5Tokenizer::startErrorReporting() > 413 { > 414 } > > Remove this function? Removed. > nsHtml5Tokenizer::tokenizeBuffer > > 460 if (pos == buffer->getEnd()) { > 461 buffer->setStart(pos); > 462 } else { > 463 buffer->setStart(pos + 1); > 464 } > > Why is this needed? Shouldn't we treat buffers as empty if the start == end? The first is about getting ready to return because the tokenizer loop hit the end of the buffer. The else branch is about two things: Either the parser is suspending after a </script> or a charset <meta> and pos is the index of '>' or the tokenizer is burping at a CR and pos is the index of the CR. In either of those cases, when the buffer is examined the next time, the position should point to the character after the '>' or the CR (which may actually mean that the buffer gets marked as empty if pos pointed to the last character). > 3427 void > 3428 nsHtml5Tokenizer::rememberAmpersandLocation(PRUnichar add) > 3429 { > 3430 additional = add; > 3431 } > > This doesn't seem to remember a location at all, but rather a character. I > guess this makes more sense in the validator code :( I remembers the ampersand location in the Java code. The argument 'add' was introduced to align with a later spec change and I failed to make the name of the method more elaborate at that point. Renamed. > 3447 void > 3448 nsHtml5Tokenizer::emitOrAppendStrBuf(PRInt32 returnState) > 3449 { > 3450 if ((returnState & (~1))) { > 3451 appendStrBufToLongStrBuf(); > 3452 } else { > 3453 emitStrBuf(); > 3454 } > 3455 } > > Please create a #define for the '~1' and put it together with the definitions > for the values that can be passed in here. That should reduce the risk that > someone changes the flag values without updating this value. Fixed. > 3867 void > 3868 nsHtml5Tokenizer::requestSuspension() > 3869 { > 3870 shouldSuspend = PR_TRUE; > 3871 } > 3872 > 3873 void > 3874 nsHtml5Tokenizer::becomeConfident() > 3875 { > 3876 confident = PR_TRUE; > 3877 } > 3878 > 3879 PRBool > 3880 nsHtml5Tokenizer::isNextCharOnNewLine() > 3881 { > 3882 return PR_FALSE; > 3883 } > 3884 > 3885 PRBool > 3886 nsHtml5Tokenizer::isPrevCR() > 3887 { > 3888 return lastCR; > 3889 } > 3890 > 3891 PRInt32 > 3892 nsHtml5Tokenizer::getLine() > 3893 { > 3894 return -1; > 3895 } > 3896 > 3897 PRInt32 > 3898 nsHtml5Tokenizer::getCol() > 3899 { > 3900 return -1; > 3901 } > > These all seem worthy of removal. More effects of validator code I guess :( requestSuspension() is used in the </script> and charset <meta> cases in Gecko. The others are Java-only, so I removed them from C++. (In reply to comment #24) > nsHtml5TreeBuilder.cpp > > The big thing here is that I'm worried that all the switch statements with > loads of fallthroughs and gotos are going to make it incredibly hard to work > with this code in the future. Is there any evidence that the fallthroughs are > saving any cycles? In the tokenizer the fallthroughs are an attempt at saving cycles in the Java case (saving cycles in the C++ case is bug 482920). In the tree builder the fallthroughs are there mainly to avoid code duplication while keeping the structure of the spec recognizable in the code structure. The spec has a lot of fall throughs to other modes, so they are implemented as fall throughs in the code. Given the structure of the spec, I think the fallthroughs actually make things maintainable in the first place if switch is used at all. Whether it's a good idea to use switch instead of having a class per mode inheriting from a pure-virtual class is another question, but I'd be very, very reluctant to change the overall structure of the tree builder that radically at this point in time. I do realize that fixing bug 552908 in the current architecture is going to lead to ugly code, but that spec feature didn't exist when I designed the architecture. Finding out how a bunch of virtual calls would compare to huge switches from the perf POV would involve refactoring the whole thing to use the other way and then benchmarking. As for the gotos, they are all breaks and continues in the Java code. They aren't used in random ways--they are used in 3 conceptually simple ways, so they should be fine for maintenance. Unfortunately, C++ doesn't have the feature of break and continue by label, so I had to make the translator emulate those with goto. The three concepts are: 1) Done processing this token: break starttagloop; 2) Reprocess the token: continue starttagloop; 3) Process the token according to the rules of another mode (later in the order of modes): break statenameloop; > If not I think I'd prefer to keep the code sane so that we > can modify it in the future when needed. I believe the code is quite modifiable by modifying the Java source and rerunning the translator. > The gotos are harder due to the deep nesting of switch statements and loops > making 'break' largely useless. I think the inability to break the a from a switch inside the loop is a defect in C++. Anyway, the place to make modifications is the TreeBuilder.java and nsHtml5TreeBuilder.cpp is just an intermediate file for feeding it into the C++ compiler. > I think there is a general over use of switch statements though. In many cases > switch statements could be converted to normal if-else statements. In some > cases the 'else' part isn't even needed. I've commented on a few specific > instances below, but it's a more general problem. For consistency, when certain variables are switched on in the startTag method, I've also switched on them when handling other tokens. Maybe that wasn't a great idea in retrospect. It does make the different token handler methods more consistent among themselves, though. > nsHtml5TreeBuilder::doctype > > 124 needToDropLF = PR_FALSE; > 125 for (; ; ) { > 126 switch(foreignFlag) { > 127 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { > 128 goto doctypeloop_end; > 129 } > 130 default: { > 131 switch(mode) { > 132 case NS_HTML5TREE_BUILDER_INITIAL: { > 133 nsString* emptyString = nsHtml5Portability::newEmptyString(); > 134 appendDoctypeToDocument(!name ? nsHtml5Atoms::emptystring : > name, !publicIdentifier ? emptyString : publicIdentifier, !systemIdentifier ? > emptyString : systemIdentifier); > 135 nsHtml5Portability::releaseString(emptyString); > 136 if (isQuirky(name, publicIdentifier, systemIdentifier, > forceQuirks)) { > 137 > 138 documentModeInternal(QUIRKS_MODE, publicIdentifier, > systemIdentifier, PR_FALSE); > 139 } else if (isAlmostStandards(publicIdentifier, > systemIdentifier)) { > 140 > 141 documentModeInternal(ALMOST_STANDARDS_MODE, publicIdentifier, > systemIdentifier, PR_FALSE); > 142 } else { > 143 documentModeInternal(STANDARDS_MODE, publicIdentifier, > systemIdentifier, PR_FALSE); > 144 } > 145 mode = NS_HTML5TREE_BUILDER_BEFORE_HTML; > 146 return; > 147 } > 148 default: { > 149 goto doctypeloop_end; > 150 } > 151 } > 152 } > 153 } > 154 } > 155 doctypeloop_end: ; > 156 > 157 return; > > What is the purpose of the loop here? It doesn't seem like we actually ever > loop. The purpose is to be able to be break of it to jump forward. > Also, why the 'goto's rather than simply returning? In the error reporting cases, there's the line err("Stray doctype."); before the return where you see a blank line in C++. To use return directly in all the cases that now break out of the loop, the error emission line would have to be duplicated everywhere. > nsHtml5TreeBuilder::comment > > 164 for (; ; ) { > 165 switch(foreignFlag) { > 166 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { > 167 goto commentloop_end; > 168 } > 169 default: { > 170 switch(mode) { > 171 case NS_HTML5TREE_BUILDER_INITIAL: > 172 case NS_HTML5TREE_BUILDER_BEFORE_HTML: > 173 case NS_HTML5TREE_BUILDER_AFTER_AFTER_BODY: > 174 case NS_HTML5TREE_BUILDER_AFTER_AFTER_FRAMESET: { > 175 appendCommentToDocument(buf, start, length); > 176 return; > 177 } > 178 case NS_HTML5TREE_BUILDER_AFTER_BODY: { > 179 flushCharacters(); > 180 appendComment(stack[0]->node, buf, start, length); > 181 return; > 182 } > 183 default: { > 184 goto commentloop_end; > 185 } > 186 } > 187 } > 188 } > 189 } > 190 commentloop_end: ; > > Same here. It seems much cleaner written as > > if (foreignFlag != NS_HTML5TREE_BUILDER_IN_FOREIGN) { > switch(mode) { > case NS_HTML5TREE_BUILDER_INITIAL: > case NS_HTML5TREE_BUILDER_BEFORE_HTML: > case NS_HTML5TREE_BUILDER_AFTER_AFTER_BODY: > case NS_HTML5TREE_BUILDER_AFTER_AFTER_FRAMESET: { > appendCommentToDocument(buf, start, length); > return; > } > case NS_HTML5TREE_BUILDER_AFTER_BODY: { > flushCharacters(); > appendComment(stack[0]->node, buf, start, length); > return; > } > default: { > } > } > } Yeah, that switch on foreignFlag is mainly about being pointlessly consistent with startTag. Fixed. > nsHtml5TreeBuilder::eof > > 436 switch(foreignFlag) { > 437 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { > 438 > 439 while (stack[currentPtr]->ns != kNameSpaceID_XHTML) { > 440 popOnEof(); > 441 } > 442 foreignFlag = NS_HTML5TREE_BUILDER_NOT_IN_FOREIGN; > 443 } > 444 default: > 445 ; // fall through > 446 } > > Doesn't seem to be any point in using a switch here. An if-statement would be > much more readable. Same here. Fixed. > 464 case NS_HTML5TREE_BUILDER_IN_HEAD: { > 465 if (currentPtr > 1) { > 466 > 467 } > 468 while (currentPtr > 0) { > 469 popOnEof(); > 470 } > > Remove the if-statement. > > nsHtml5TreeBuilder::startTag > 764 if (!isCurrent(nsHtml5Atoms::table)) { > 765 > 766 } > > Same here > > 831 if (currentPtr != eltPos) { > 832 > 833 } > > There seems to be many more of these, I haven't commented on any more of them. In all these cases, there's a method call for reporting an error inside the if. With the exception of the one that calls isCurrent(), it should be safe to assume that the C++ compiler optimizes these away. Apart from relying on the C++ compiler's optimizer, I haven't tried to put effort into making the translator produce more elegant output here, because I have been assumming we might actually want to report tree builder errors to the error console eventually (bug 512229). > This finishes the review of the bulk of the code. Thank you! I'd appreciate it if you could r+ the patches on this bug, too, to get them landed in order to avoid merge conflicts. > There's also a bunch more places that need more comments, even in the java > code. In general every class function and member needs a description IMHO, OK. I'll write JavaDocs for every member. > We also need the comments in the java code to be carried over to the Cpp code, > though I think there's a separate bug on that. There was only bug 539213. I filed bug 559547. > I haven't found any real blockers for turning the parser on by default though, > other than the ones I've explicitly noted already. That is, the flush timer and chardet creation?

Attachment #439222 - Flags: review?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 28

•

15 years ago

> > > > 979 nsHtml5StreamParser::ContinueAfterFailedCharsetSwitch() > > > > 980 { > > > > 981 NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > > > > 982 mozilla::MutexAutoLock tokenizerAutoLock(mTokenizerMutex); > > > > 983 Uninterrupt(); > > > > 984 nsCOMPtr<nsIRunnable> event = new nsHtml5StreamParserContinuation(this); > > > > 985 if (NS_FAILED(mThread->Dispatch(event, nsIThread::DISPATCH_NORMAL))) { > > > > 986 NS_WARNING("Failed to dispatch nsHtml5StreamParserContinuation"); > > > > 987 } > > > > 988 } > > > > > > > > Is there a reason to uninterrupt from the main thread? Why not do it from the > > > > parser thread as to avoid blocking the main thread. > > > > > > When calling Uninterrupt(), the main thread already owns mTokenizerMutex and > > > Uninterrupt() doesn't acquire an additional mutex, so no additional blocking > > > occurs. > > > > This doesn't appear to be the case. You're here explicitly grabbing the lock in > > order to be able to safely call Uninterrupt. If you instead do that while on > > the parser thread, there is no need for the main thread to block. > > Oh, right. In this case, that's true. > > I changed it the way you suggested. However, I'm nervous that this optimization > for a rare case (the case where the shell is unable to fulfill a request to > switch charsets fails) comes back to bite us later, because previously > Interrupt() and Uninterrupt() were both called from the same thread so > Uninterrupt() was guaranteed to cancel one Interrupt(). Now, Interrupt() is, as > far as I can tell, called only in places that can't start rerunning until the > parser thread has uninterrupted itself and posted more data to the main thread. > However, if we later change how Interrupt() is called, it would be easy to > introduce a situation where the main thread first posts a runnable to the > parser thread, then interrupts and then the runnable cancels that new interrupt > when it gets a chance to run on the parser thread. I think i'm following you, but not completely sure :) I agree that if things get too complicated and we start getting concerned that a call to Uninterrupt might start Uninterrupting an different Interript than it was scheduled for, then we might need to roll back this change. Ideally we won't get there though, but if we do, lets cross that bridge then. > > > > nsHtml5StreamParser::ContinueAfterScripts > > > > > > > > 864 if (mSpeculations.IsEmpty()) { > > > > 865 // Not quite sure how exactly this happens... > > > > 866 // Maybe an artifact of defer scripts? > > > > 867 return; > > > > 868 } > > > > > > > > Sounds like an assertion would be good. > > > > > > I put an NS_WARNING here. > > > > Why not NS_ASSERTION? > > Because I'm not sure if there's a legitimate situation where that condition > holds true. (I realize that not being sure about that isn't exactly good.) That's fine. If the assertion starts firing in legitimate situations we'll see what that situation is and can replace the current comment with a description of why this is needed. Assertions aren't in release builds so no risk that millions of users will be affected. > > > (In reply to comment #18) > > > > You're also using the charset detector off the main thread, which I suspect > > > > isn't safe. In fact, since you are creating it on the main thread, and then > > > > calling it from the parser thread, I'm surprised this doesn't assert as it > > > > doesn't seem to be implemented using threadsafe refcounting macros. > > > > > > > > This is also something we need to fix before we can turn on by default. Do you > > > > want me to file a separate bug on this charset threadsafety stuff? > > > > > > Yes, a separate bug would be good to have if there really is a thread-safety > > > problem left. As far as I can tell, there is no problem: After bug , the > > > detector is refcounted only from the main thread. The sniffing is run only from > > > the parser thread. The sniffing seems to affect only the internal state of the > > > object. The data tables shared by the instances are baked into code ahead of > > > compile time. > > > > I'll file a separate bug. If we're lucky (which it sounds like based on your > > description) all we need to do is to move the creation of the detector to the > > thread where its used. > > What would be the benefit except being able to avoid the detector creation in > some cases and being able to delete it earlier in others? It seems to me that > it's not a thread/ problem that it's allocated on one thread, used on another > and destroyed on the creator thread but only after the other thread isn't going > to use the object anyway. The problem with the current situation is that if someone changes the implementation of the detector to do an extra addref/release inside one of the functions that you are calling on the separate thread, then that person will start hitting a bunch of threadsafty warnings. Despite that person not doing anything wrong, or there really being a problem anywhere. However that person won't be able to check in his/her changes until they've figured out how the HTML5 parser works and how to change it to not fire the assertions. > Moving the destruction of the detector > off-the-main-thread would be annoying in the nsIParser::Terminate and cycle > collector cases. Wouldn't those then require creating a runnable solely to > clean up the detector? One solution would be to only do this in debug builds, as that is the only place where we'll hit the refcounting assertions. > > > > If so, would calling it something like MetaCharsetFound make sense? > > > > > > "Meta charset" is not HTML5 spec terminology. "[character] encoding > > > declaration" is spec terminology. That's why the method is named the way it is > > > named. In general, I have tried to stick to spec terminology when naming things > > > even when the spec terminology deviates from colloquial usage. (Consider > > > "entities" vs. "named character refernces".) > > > > How about EncodingDeclarationFound? Function names that are simple nouns sound > > like getters to me. > > The other callback names, for better or worse, use SAX-style naming and SAX > uses spec-wise correct nouns, so having this one not be a noun would be > inconsistent with the naming of the rest of the callbacks. My impression is that right now it's sort of a mix, which is what makes things hard to understand. Yes, some "interfaces" use SAX-style naming, but they are implemented on classes which implemented both sax-style naming as well as other interfaces, thus creating the mix. The main problem was that it was very hard to find docs. One solution is to group functions together based on which interface they implement and then create clear headers showing which interface is being implemented. That makes it easier to find your way to the interface where the function should be documented. For example like: http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericElement.h#351 > I have a question about the thread-safety of nsITimer::Cancel on multicores, > though: > nsTimerImpl::Cancel assigns to mCanceled without acquiring a mutex and > nsTimerImpl::Fire() reads mCanceled without acquiring a mutex. However, before > nsTimerImpl::Cancel returns, it calls TimerThread::RemoveTimer which acquires a > mutex. Is that mutex acquisition enough to invalidate caches on other cores so > that when nsTimerImpl::Cancel returns on one thread, another thread sees > mCanceled as true in nsTimerImpl::Fire()? > > Or is nsTimerImpl even supposed to be thread-safe in the sense that you are > allowed to Cancel from non-target thread? If Cancel isn't actually thread-safe, > I need to do something more complicated with a runnable for canceling the timer > when the parser is released early due to nsIParser::Terminate and the timer is > armed. I'm not sure if aquiring a mutex will flush all cores or just the current one. However in any case, when calling Cancel() you're always running the risk that the other thread is already passed the are-we-cancelled check and thus will fire. In fact, it might already be in the midst of firing already. So if you just call cancel, you need to be aware that the parser might already be currently flushing. Not sure if that's a problem yet. Haven't yet looked at the patch. > So for perf reasons, I don't want to go back to enums on the Java side. As for > the C++ side, I've viewed the generated code as a way of talking to the C++ > compiler--not as much a way of expressing things to human readers--and when > talking to the compiler, it shouldn't matter either way. Therefore, in the > interest of not pouring effort excessively into the translator, I haven't tried > to make the translator turn groups of ints into enums in the translation. I think this is where we differ in view :) I think the C++ code needs to be readable to humans too, for a few reasons. The most obvious reason is that during profiling and debugging, it's the C++ code gecko developers will be looking at. Also, I wouldn't be surprised if we'll eventually end up forking the java and C++ code. Though by no means I want that to happen anytime soon. If that happens we'll be working with the C++ code directly. I am however fine with using #defines until we deem forking necessary, if that ever happens. > > nsHtml5Tokenizer::tokenizeBuffer > > > > 460 if (pos == buffer->getEnd()) { > > 461 buffer->setStart(pos); > > 462 } else { > > 463 buffer->setStart(pos + 1); > > 464 } > > > > Why is this needed? Shouldn't we treat buffers as empty if the start == end? > > The first is about getting ready to return because the tokenizer loop hit the > end of the buffer. The else branch is about two things: Either the parser is > suspending after a </script> or a charset <meta> and pos is the index of '>' or > the tokenizer is burping at a CR and pos is the index of the CR. In either of > those cases, when the buffer is examined the next time, the position should > point to the character after the '>' or the CR (which may actually mean that > the buffer gets marked as empty if pos pointed to the last character). Ah, so pos always points to the last character consumed? > (In reply to comment #24) > > nsHtml5TreeBuilder.cpp > > > > The big thing here is that I'm worried that all the switch statements with > > loads of fallthroughs and gotos are going to make it incredibly hard to work > > with this code in the future. Is there any evidence that the fallthroughs are > > saving any cycles? > > In the tokenizer the fallthroughs are an attempt at saving cycles in the Java > case (saving cycles in the C++ case is bug 482920). > > In the tree builder the fallthroughs are there mainly to avoid code duplication > while keeping the structure of the spec recognizable in the code structure. The > spec has a lot of fall throughs to other modes, so they are implemented as fall > throughs in the code. Given the structure of the spec, I think the fallthroughs > actually make things maintainable in the first place if switch is used at all. Ok, if the fallthroughs follow the spec then that makes things better I agree. > Whether it's a good idea to use switch instead of having a class per mode > inheriting from a pure-virtual class is another question, but I'd be very, very > reluctant to change the overall structure of the tree builder that radically at > this point in time. I do realize that fixing bug 552908 in the current > architecture is going to lead to ugly code, but that spec feature didn't exist > when I designed the architecture. Finding out how a bunch of virtual calls > would compare to huge switches from the perf POV would involve refactoring the > whole thing to use the other way and then benchmarking. 100% agreed. We definitely shouldn't do any such major refactorings right now. > As for the gotos, they are all breaks and continues in the Java code. They > aren't used in random ways--they are used in 3 conceptually simple ways, so > they should be fine for maintenance. Unfortunately, C++ doesn't have the > feature of break and continue by label, so I had to make the translator emulate > those with goto. > > The three concepts are: > 1) Done processing this token: break starttagloop; > 2) Reprocess the token: continue starttagloop; > 3) Process the token according to the rules of another mode (later in the > order of modes): break statenameloop; I agree, the lack of labeled breaks in C++ sucks. But the limitation is there and we have to live with it. Like I said above, I think the C++ code needs to be readable to a human. I don't have a suggestion for how to improve the current situation, other than trying to use switch-statements less and if-statements more. There are quite a few switch statements with only one or two cases other than the default. In these situations if-statements would likely be cleaner and might make it possible to use plain breaks in a few more places. I think this is something that we can deal with separately though. > > nsHtml5TreeBuilder::doctype > > > > 124 needToDropLF = PR_FALSE; > > 125 for (; ; ) { > > 126 switch(foreignFlag) { > > 127 case NS_HTML5TREE_BUILDER_IN_FOREIGN: { > > 128 goto doctypeloop_end; > > 129 } > > 130 default: { > > 131 switch(mode) { > > 132 case NS_HTML5TREE_BUILDER_INITIAL: { > > 133 nsString* emptyString = nsHtml5Portability::newEmptyString(); > > 134 appendDoctypeToDocument(!name ? nsHtml5Atoms::emptystring : > > name, !publicIdentifier ? emptyString : publicIdentifier, !systemIdentifier ? > > emptyString : systemIdentifier); > > 135 nsHtml5Portability::releaseString(emptyString); > > 136 if (isQuirky(name, publicIdentifier, systemIdentifier, > > forceQuirks)) { > > 137 > > 138 documentModeInternal(QUIRKS_MODE, publicIdentifier, > > systemIdentifier, PR_FALSE); > > 139 } else if (isAlmostStandards(publicIdentifier, > > systemIdentifier)) { > > 140 > > 141 documentModeInternal(ALMOST_STANDARDS_MODE, publicIdentifier, > > systemIdentifier, PR_FALSE); > > 142 } else { > > 143 documentModeInternal(STANDARDS_MODE, publicIdentifier, > > systemIdentifier, PR_FALSE); > > 144 } > > 145 mode = NS_HTML5TREE_BUILDER_BEFORE_HTML; > > 146 return; > > 147 } > > 148 default: { > > 149 goto doctypeloop_end; > > 150 } > > 151 } > > 152 } > > 153 } > > 154 } > > 155 doctypeloop_end: ; > > 156 > > 157 return; > > > > What is the purpose of the loop here? It doesn't seem like we actually ever > > loop. > > The purpose is to be able to be break of it to jump forward. But the switch statement is right inside it, so any break will break out of that. Also, there aren't actually any break statements in the loop/switch. > Yeah, that switch on foreignFlag is mainly about being pointlessly consistent > with startTag. > > Fixed. For what it's worth, the switch in startTag seems unneeded too and better written as an if-statement. > > 464 case NS_HTML5TREE_BUILDER_IN_HEAD: { > > 465 if (currentPtr > 1) { > > 466 > > 467 } > > 468 while (currentPtr > 0) { > > 469 popOnEof(); > > 470 } > > > > Remove the if-statement. > > > > nsHtml5TreeBuilder::startTag > > 764 if (!isCurrent(nsHtml5Atoms::table)) { > > 765 > > 766 } > > > > Same here > > > > 831 if (currentPtr != eltPos) { > > 832 > > 833 } > > > > There seems to be many more of these, I haven't commented on any more of them. > > In all these cases, there's a method call for reporting an error inside the if. > With the exception of the one that calls isCurrent(), it should be safe to > assume that the C++ compiler optimizes these away. Apart from relying on the > C++ compiler's optimizer, I haven't tried to put effort into making the > translator produce more elegant output here, because I have been assumming we > might actually want to report tree builder errors to the error console > eventually (bug 512229). I really don't think we'll want to do that anytime soon. Netscape tried this back in the days and the result was a ton of added complexity to the parser which we're still paying for a decade later and yet have never taken advantage of. I wouldn't be surprised if some of the quirky parsing behaviors in gecko which have restricted HTML5 was also a result of this. Code is changeable, if we'll ever want to do error reporting for parse errors we can modify the code to deal with that then. > > I haven't found any real blockers for turning the parser on by default though, > > other than the ones I've explicitly noted already. > > That is, the flush timer and chardet creation? Yeah. Reviewing the patches here now.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

15 years ago

Attachment #432525 - Flags: review?(jonas) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 29

•

15 years ago

Comment on attachment 437849 [details] [diff] [review] Address review comments in nsHtml5StreamParser.cpp >- if (totalByteCount < aCount) { // mimicking nsScanner even though this seems wrong >+ // There's an illegal byte in the input. It's now the responsibility >+ // of this calling code to output a U+FFFD REPLACEMENT CHARACTER and >+ // reset the decoder. >+ >+ if (totalByteCount < aCount) { >+ // advance over the bad byte > ++totalByteCount; > ++aFromSegment; > } >+#ifdef DEBUG >+ else { >+ NS_NOTREACHED("The decoder signaled an error but consumed all input."); >+ } >+#endif Simply putting NS_ASSERTION(totalByteCount < aCount, "...") before the if statement would avoid the #ifdef r=me either way.

Attachment #437849 - Flags: review?(jonas) → review+

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 30

•

15 years ago

Comment on attachment 439220 [details] [diff] [review] Address more review comments in C++ only parts > However in any case, when calling Cancel() you're always running the risk that > the other thread is already passed the are-we-cancelled check and thus will > fire. In fact, it might already be in the midst of firing already. OK. Not just me thinking Cancel/Fire don't look right to be thread-safe, then. (I heard nsITimer was thread-safe, tried to verify it, it didn't look right, so I started doubting myself and thinking it was thread-safe in some non-obvious way.)

Attachment #439220 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 31

•

15 years ago

(In reply to comment #29) > Simply putting NS_ASSERTION(totalByteCount < aCount, "...") before the if > statement would avoid the #ifdef > > r=me either way. Thanks. I made that change. http://hg.mozilla.org/mozilla-central/rev/47327da4d134 http://hg.mozilla.org/mozilla-central/rev/a8327985f6e4

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 32

•

15 years ago

Attached patch Address more review comments in C++ only parts (obsolete) (deleted) — Details — Splinter Review

This patch works around the thread-unsafety of nsITimer::Cancel.

Attachment #439220 - Attachment is obsolete: true

Attachment #439510 - Flags: review?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 33

•

15 years ago

Comment on attachment 432403 [details] [diff] [review] Address review comments in nsHtml5Parser.cpp Looks good. Though please include a diff created with -w as well as the full patch when you're doing indentation changes.

Attachment #432403 - Flags: review?(jonas) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

15 years ago

Attachment #438967 - Flags: review?(jonas) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

15 years ago

Attachment #439222 - Flags: review?(jonas) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 34

•

15 years ago

Comment on attachment 439222 [details] [diff] [review] Address more review comments in Java parts >@@ -793,39 +793,33 @@ >+ if (foreignFlag != IN_FOREIGN) { >+ switch (mode) { >+ case INITIAL: >+ case BEFORE_HTML: >+ case AFTER_AFTER_BODY: >+ case AFTER_AFTER_FRAMESET: Indentation here is a bit strange, elsewhere you've indented the cases. Also, there's tabs here, which isn't used elsewhere. >+ /* >+ * A comment token Append a Comment node to the Document object >+ * with the data attribute set to the data given in the comment >+ * token. >+ */ >+ appendCommentToDocument(buf, start, length); >+ return; >+ case AFTER_BODY: >+ /* >+ * A comment token Append a Comment node to the first element in >+ * the stack of open elements (the html element), with the data >+ * attribute set to the data given in the comment token. >+ */ >+ flushCharacters(); >+ appendComment(stack[0].node, buf, start, length); >+ return; >+ default: >+ break; This used to break out differently. Was that change intentional or am I reading it wrong? Looks good otherwise.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 35

•

15 years ago

Comment on attachment 439510 [details] [diff] [review] Address more review comments in C++ only parts >+class nsHtml5TimerKungFu : public nsRunnable >+{ >+private: >+ nsHtml5RefPtr<nsHtml5StreamParser> mStreamParser; >+public: >+ nsHtml5TimerKungFu(nsHtml5StreamParser* aStreamParser) >+ : mStreamParser(aStreamParser) >+ {} >+ NS_IMETHODIMP Run() >+ { >+ if (mStreamParser->mFlushTimer) { >+ mStreamParser->mFlushTimer->Cancel(); >+ mStreamParser->mFlushTimer = nsnull; >+ } >+ return NS_OK; >+ } >+}; >+ >+void >+nsHtml5StreamParser::DropTimer() >+{ >+ if (mFlushTimer) { >+ nsCOMPtr<nsIRunnable> event = new nsHtml5TimerKungFu(this); >+ if (NS_FAILED(mThread->Dispatch(event, nsIThread::DISPATCH_NORMAL))) { >+ NS_WARNING("Failed to dispatch TimerKungFu event"); >+ } >+ } >+} I'm not sure I understand the purpose of canceling the timer on the parser thread. The problem that was brought up earlier was that calling cancel from the main thread might not prevent the timer from just having started firing. However this solution has the same problem. The timer might fire between the call to nsHtml5StreamParser::DropTimer and the call to nsHtml5TimerKungFu::Run. For that matter, the problem exists in the code that is in the tree already. By the time we cancel the main-thread-only timer, nsHtml5StreamParserTimerFlusher::Run might have just started running on the parser thread. Also, it would be great if you could diff patches with -P. Not a huge deal though.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 36

•

15 years ago

(In reply to comment #35) > I'm not sure I understand the purpose of canceling the timer on the parser > thread. The problem that was brought up earlier was that calling cancel from > the main thread might not prevent the timer from just having started firing. > > However this solution has the same problem. The timer might fire between the > call to nsHtml5StreamParser::DropTimer and the call to nsHtml5TimerKungFu::Run. The problem being addressed is the timer firing after the nsHtml5StreamParser instance it is associated with has been deleted. The timer doesn't hold a strong reference back to the nsHtml5StreamParser for two reasons: 1) chardet already takes nsHtml5StreamParser::Notify and 2) nsHtml5StreamParser is a CC participant and can't be refcounted off the main thread. nsHtml5TimerKungFu addrefs nsHtml5StreamParser (hence kungFu) so it stays alive until ::Run cancels the timer on the parser thread. This way, the timer may fire after the stream parser has been terminated but not after the stream parser has been deleted.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 37

•

15 years ago

(In reply to comment #34) > (From update of attachment 439222 [details] [diff] [review]) > >@@ -793,39 +793,33 @@ > >+ if (foreignFlag != IN_FOREIGN) { > >+ switch (mode) { > >+ case INITIAL: > >+ case BEFORE_HTML: > >+ case AFTER_AFTER_BODY: > >+ case AFTER_AFTER_FRAMESET: > > Indentation here is a bit strange, elsewhere you've indented the cases. Also, > there's tabs here, which isn't used elsewhere. Fixed. I had failed to import my old Eclipse code formatter settings when I migrated from Mac to Ubuntu. > >+ /* > >+ * A comment token Append a Comment node to the Document object > >+ * with the data attribute set to the data given in the comment > >+ * token. > >+ */ > >+ appendCommentToDocument(buf, start, length); > >+ return; > >+ case AFTER_BODY: > >+ /* > >+ * A comment token Append a Comment node to the first element in > >+ * the stack of open elements (the html element), with the data > >+ * attribute set to the data given in the comment token. > >+ */ > >+ flushCharacters(); > >+ appendComment(stack[0].node, buf, start, length); > >+ return; > >+ default: > >+ break; > > This used to break out differently. Was that change intentional or am I reading > it wrong? There's no intentional change. After re-reading this several times, I fail to see the problem. The non-default cases return in both old and new version, so no change there. The default case in the old case jumps out of the pointless loop. In the new version, there's no loop, so the default case has a no-op break just to silence compiler warnings. > Looks good otherwise. Thanks. http://hg.mozilla.org/mozilla-central/rev/2ba94ee301bf http://hg.mozilla.org/mozilla-central/rev/715e63b786d4 http://hg.mozilla.org/mozilla-central/rev/46be92d24873

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 38

•

15 years ago

Attached patch Address more review comments in C++ only parts, diffed differently (obsolete) (deleted) — Details — Splinter Review

(In reply to comment #35) > Also, it would be great if you could diff patches with -P. Not a huge deal > though. Did you mean -p? Here's the same patch with -p from hg diff. The previous patch was from hg export. (It would be so nice if hg export formatted the patches according to the hg diff config in .hgrc...)

Attachment #439510 - Attachment is obsolete: true

Attachment #439894 - Flags: review?(jonas)

Attachment #439510 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 39

•

15 years ago

Attached patch Address even more review comments (vol6) (obsolete) (deleted) — Details — Splinter Review

This patch is diffed with -p. Unfortunately, startTag() became mostly unreviewable due to formatting changes. I tried diffing with various whitespace-related flags, but the result is just as bad. I'll attach the changed .java file, too. (In reply to comment #28) > > Moving the destruction of the detector > > off-the-main-thread would be annoying in the nsIParser::Terminate and cycle > > collector cases. Wouldn't those then require creating a runnable solely to > > clean up the detector? > > One solution would be to only do this in debug builds, as that is the only > place where we'll hit the refcounting assertions. Do you actually want to hit refcounting assertions, though? What about making chardet implement addref/release using the thread-safe variants? > > > > I put an NS_WARNING here. > > > > > > Why not NS_ASSERTION? > > > > Because I'm not sure if there's a legitimate situation where that condition > > holds true. (I realize that not being sure about that isn't exactly good.) > > That's fine. If the assertion starts firing in legitimate situations we'll see > what that situation is and can replace the current comment with a description > of why this is needed. > > Assertions aren't in release builds so no risk that millions of users will be > affected. Changed to NS_NOTREACHED. > > > > > If so, would calling it something like MetaCharsetFound make sense? > > > > > > > > "Meta charset" is not HTML5 spec terminology. "[character] encoding > > > > declaration" is spec terminology. That's why the method is named the way it is > > > > named. In general, I have tried to stick to spec terminology when naming things > > > > even when the spec terminology deviates from colloquial usage. (Consider > > > > "entities" vs. "named character refernces".) > > > > > > How about EncodingDeclarationFound? Function names that are simple nouns sound > > > like getters to me. > > > > The other callback names, for better or worse, use SAX-style naming and SAX > > uses spec-wise correct nouns, so having this one not be a noun would be > > inconsistent with the naming of the rest of the callbacks. > > My impression is that right now it's sort of a mix, which is what makes things > hard to understand. Yes, some "interfaces" use SAX-style naming, but they are > implemented on classes which implemented both sax-style naming as well as other > interfaces, thus creating the mix. > > The main problem was that it was very hard to find docs. One solution is to > group functions together based on which interface they implement and then > create clear headers showing which interface is being implemented. That makes > it easier to find your way to the interface where the function should be > documented. For example like: > > http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericElement.h#351 I've actually used this pattern in this particular case, but in this case, it didn't help locate the documentation in the .java files. Anyway, I'd rather add more comments pointing to the docs than rename stuff inconsistently. > > So for perf reasons, I don't want to go back to enums on the Java side. As for > > the C++ side, I've viewed the generated code as a way of talking to the C++ > > compiler--not as much a way of expressing things to human readers--and when > > talking to the compiler, it shouldn't matter either way. Therefore, in the > > interest of not pouring effort excessively into the translator, I haven't tried > > to make the translator turn groups of ints into enums in the translation. > > I think this is where we differ in view :) > > I think the C++ code needs to be readable to humans too, for a few reasons. The > most obvious reason is that during profiling and debugging, it's the C++ code > gecko developers will be looking at. > > Also, I wouldn't be surprised if we'll eventually end up forking the java and > C++ code. Though by no means I want that to happen anytime soon. If that > happens we'll be working with the C++ code directly. > > I am however fine with using #defines until we deem forking necessary, if that > ever happens. Added #defines. > > > nsHtml5Tokenizer::tokenizeBuffer > > > > > > 460 if (pos == buffer->getEnd()) { > > > 461 buffer->setStart(pos); > > > 462 } else { > > > 463 buffer->setStart(pos + 1); > > > 464 } > > > > > > Why is this needed? Shouldn't we treat buffers as empty if the start == end? > > > > The first is about getting ready to return because the tokenizer loop hit the > > end of the buffer. The else branch is about two things: Either the parser is > > suspending after a </script> or a charset <meta> and pos is the index of '>' or > > the tokenizer is burping at a CR and pos is the index of the CR. In either of > > those cases, when the buffer is examined the next time, the position should > > point to the character after the '>' or the CR (which may actually mean that > > the buffer gets marked as empty if pos pointed to the last character). > > Ah, so pos always points to the last character consumed? Yes. > > > nsHtml5TreeBuilder::doctype ... > > > What is the purpose of the loop here? It doesn't seem like we actually ever > > > loop. > > > > The purpose is to be able to be break of it to jump forward. > > But the switch statement is right inside it, so any break will break out of > that. Also, there aren't actually any break statements in the loop/switch. Simplified. > > Yeah, that switch on foreignFlag is mainly about being pointlessly consistent > > with startTag. > > > > Fixed. > > For what it's worth, the switch in startTag seems unneeded too and better > written as an if-statement. Whoa. I wonder if this code looked different when I put the switch in there. Simplified. > > In all these cases, there's a method call for reporting an error inside the if. > > With the exception of the one that calls isCurrent(), it should be safe to > > assume that the C++ compiler optimizes these away. Apart from relying on the > > C++ compiler's optimizer, I haven't tried to put effort into making the > > translator produce more elegant output here, because I have been assumming we > > might actually want to report tree builder errors to the error console > > eventually (bug 512229). > > I really don't think we'll want to do that anytime soon. Netscape tried this > back in the days and the result was a ton of added complexity to the parser > which we're still paying for a decade later and yet have never taken advantage > of. I wouldn't be surprised if some of the quirky parsing behaviors in gecko > which have restricted HTML5 was also a result of this. > > Code is changeable, if we'll ever want to do error reporting for parse errors > we can modify the code to deal with that then. I've changed the translator to output nicer code in this case. TANSTAAFL, though: The translator is now uglier.

Attachment #440199 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 40

•

15 years ago

Attached file nsHtml5TreeBuilder.cpp as of the "vol 6" patch (obsolete) (deleted) — Details

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 41

•

15 years ago

Attached file TreeBuilder.java as of the "vol 6" patch (obsolete) (deleted) — Details

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 42

•

15 years ago

(In reply to comment #28) > > > I'll file a separate bug. If we're lucky (which it sounds like based on your > > > description) all we need to do is to move the creation of the detector to the > > > thread where its used. > > > > What would be the benefit except being able to avoid the detector creation in > > some cases and being able to delete it earlier in others? It seems to me that > > it's not a thread/ problem that it's allocated on one thread, used on another > > and destroyed on the creator thread but only after the other thread isn't going > > to use the object anyway. > > The problem with the current situation is that if someone changes the > implementation of the detector to do an extra addref/release inside one of the > functions that you are calling on the separate thread, then that person will > start hitting a bunch of threadsafty warnings. Despite that person not doing > anything wrong, or there really being a problem anywhere. However that person > won't be able to check in his/her changes until they've figured out how the > HTML5 parser works and how to change it to not fire the assertions. I recalled why I didn't just make the chardet addref/release thread-safe: The detector addrefs/releases its observer. The observer is the stream parser, which is a cycle collection participant and, therefore, can't be addrefed or released off the main thread. Since your problem scenario is someone changing the detector, since there isn't a current thread-safety problem and since fixing the anticipated problem ahead of time is complex due to cycle collection, can we consider this a bridge we cross when someone actually wants to change the detector? The detector is about as stable as code in Gecko gets. > > In all these cases, there's a method call for reporting an error inside the if. > > With the exception of the one that calls isCurrent(), it should be safe to > > assume that the C++ compiler optimizes these away. Apart from relying on the > > C++ compiler's optimizer, I haven't tried to put effort into making the > > translator produce more elegant output here, because I have been assumming we > > might actually want to report tree builder errors to the error console > > eventually (bug 512229). > > I really don't think we'll want to do that anytime soon. Netscape tried this > back in the days and the result was a ton of added complexity to the parser > which we're still paying for a decade later and yet have never taken advantage > of. I wouldn't be surprised if some of the quirky parsing behaviors in gecko > which have restricted HTML5 was also a result of this. Adding error reporting to the parser wouldn't add a ton of complexity. Since the Java version already supports error reporting, the required complexity is already known. I'd expect the perf cost in the tree builder to be tiny. In the tokenizer, I'd expect the perf cost to be notable but smaller than the cost of supporting view source highlighting. My current plan is to generate a second copy of the main tokenizer loop for view source. One option would be putting tokenizer error reporting only in that copy and reporting errors only if the user views source (possibly by baking it into the source highlighting and not reporting to the console at all).

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 43

•

15 years ago

Attached patch Address more review comments in C++ only parts, with an added null check (deleted) — Details — Splinter Review

Attachment #439894 - Attachment is obsolete: true

Attachment #440481 - Flags: review?(jonas)

Attachment #439894 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 44

•

15 years ago

Attached patch Interdiff for the null check addition in attachment 440481 (deleted) — Details — Splinter Review

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 45

•

15 years ago

Comment on attachment 440481 [details] [diff] [review] Address more review comments in C++ only parts, with an added null check >+nsHtml5StreamParser::FlushTreeOpsAndDisarmTimer() >+{ Please add an assertion to check that this is called on the right (main?) thread. Those assertions were awesome to have when reviewing and trying to grokk the code. > nsHtml5StreamParser::ContinueAfterFailedCharsetSwitch() > { > NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > mozilla::MutexAutoLock tokenizerAutoLock(mTokenizerMutex); >- Uninterrupt(); > nsCOMPtr<nsIRunnable> event = new nsHtml5StreamParserContinuation(this); > if (NS_FAILED(mThread->Dispatch(event, nsIThread::DISPATCH_NORMAL))) { > NS_WARNING("Failed to dispatch nsHtml5StreamParserContinuation"); > } > } Is grabbing the mutex still needed? >+class nsHtml5TimerKungFu : public nsRunnable >+{ >+private: >+ nsHtml5RefPtr<nsHtml5StreamParser> mStreamParser; >+public: >+ nsHtml5TimerKungFu(nsHtml5StreamParser* aStreamParser) >+ : mStreamParser(aStreamParser) >+ {} >+ NS_IMETHODIMP Run() >+ { >+ if (mStreamParser->mFlushTimer) { >+ mStreamParser->mFlushTimer->Cancel(); >+ mStreamParser->mFlushTimer = nsnull; >+ } >+ return NS_OK; >+ } >+}; Ok, now I see what is happening. The important part here isn't that the cancel call is happening on any particular thread. But that you addref the stream parser on the main thread, and then post a message to the parser thread, and then post a message back again to the main thread and only then release the stream parser. This ensures that the stream parser is being kept alive until after we're done processing the timer, or cancelling it. And the nsHtml5RefPtr is what's making it all happen. Is this correct? And it seems like the same setup is being used to keep the stream parser alive when sending necko data to it? Please document this thoroughly. Also document the fact that while we're calling various functions on the stream parser on the parser thread, it's critical that it's never addreffed or released there. r=me with that.

Attachment #440481 - Flags: review?(jonas) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Comment 46

•

15 years ago

Comment on attachment 440199 [details] [diff] [review] Address even more review comments (vol6) Ugh, yeah, this patch unreadable :( I sort of skimmed the resulting java file, but if there was anything in particular you'd like me to look at let me know. I still see a few overuses of switch statements, for example when looking for whitespace and in generateImpliedEndTagsExceptFor and generateImpliedEndTags. But we can always deal with that over time.

Attachment #440199 - Flags: review?(jonas) → review+

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 47

•

15 years ago

(In reply to comment #45) > (From update of attachment 440481 [details] [diff] [review]) > >+nsHtml5StreamParser::FlushTreeOpsAndDisarmTimer() > >+{ > > Please add an assertion to check that this is called on the right (main?) > thread. Those assertions were awesome to have when reviewing and trying to > grokk the code. Added NS_ASSERTION(IsParserThread(), "Wrong thread!"); > > nsHtml5StreamParser::ContinueAfterFailedCharsetSwitch() > > { > > NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > > mozilla::MutexAutoLock tokenizerAutoLock(mTokenizerMutex); > >- Uninterrupt(); > > nsCOMPtr<nsIRunnable> event = new nsHtml5StreamParserContinuation(this); > > if (NS_FAILED(mThread->Dispatch(event, nsIThread::DISPATCH_NORMAL))) { > > NS_WARNING("Failed to dispatch nsHtml5StreamParserContinuation"); > > } > > } > > Is grabbing the mutex still needed? No. Removed. > >+class nsHtml5TimerKungFu : public nsRunnable > >+{ > >+private: > >+ nsHtml5RefPtr<nsHtml5StreamParser> mStreamParser; > >+public: > >+ nsHtml5TimerKungFu(nsHtml5StreamParser* aStreamParser) > >+ : mStreamParser(aStreamParser) > >+ {} > >+ NS_IMETHODIMP Run() > >+ { > >+ if (mStreamParser->mFlushTimer) { > >+ mStreamParser->mFlushTimer->Cancel(); > >+ mStreamParser->mFlushTimer = nsnull; > >+ } > >+ return NS_OK; > >+ } > >+}; > > Ok, now I see what is happening. The important part here isn't that the cancel > call is happening on any particular thread. But that you addref the stream > parser on the main thread, and then post a message to the parser thread, and > then post a message back again to the main thread and only then release the > stream parser. > > This ensures that the stream parser is being kept alive until after we're done > processing the timer, or cancelling it. And the nsHtml5RefPtr is what's making > it all happen. > > Is this correct? Correct. > And it seems like the same setup is being used to keep the stream parser alive > when sending necko data to it? Yes. > Please document this thoroughly. Also document the fact that while we're > calling various functions on the stream parser on the parser thread, it's > critical that it's never addreffed or released there. > > r=me with that. Thanks. I'll land the patch with these changes tomorrow so that I can watch the tinderbox.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 48

•

15 years ago

Attached patch Address even more review comments (vol6), rev 2 (deleted) — Details — Splinter Review

(In reply to comment #46) > (From update of attachment 440199 [details] [diff] [review]) > Ugh, yeah, this patch unreadable :( > > I sort of skimmed the resulting java file, but if there was anything in > particular you'd like me to look at let me know. There was a bug in the unreviewable part. I've uploaded a corrected patch here, but it's unreviewable, too. And the interdiff is unreviewable, so I just say what changed since the previous patch: The big switch in startTag was in the |else| branch of |if (inForeign)|. In this patch, there's no |else| branch and the big switch is just after the |if|. Sorry about bothering you a second time with this patch. This didn't get caught earlier, because we don't have full html5lib tree builder test mochitest coverage (bug 559023) and because when I ran the full test suite with the Java version of the parser, I failed to notice that the test harness got stuck in an infinite loop instead of stopping running.

Attachment #440199 - Attachment is obsolete: true

Attachment #440200 - Attachment is obsolete: true

Attachment #440201 - Attachment is obsolete: true

Attachment #440769 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 49

•

15 years ago

Attached patch One-line change to make CC unlinking safer (deleted) — Details — Splinter Review

This patch (on top of attachment 440481 [details] [diff] [review]) should make the case where the cycle collector unlinks nsHtml5Parser before nsHtml5StreamParser safer.

Attachment #440978 - Flags: review?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

15 years ago

Attachment #440769 - Flags: review?(jonas) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Reporter

Updated

•

15 years ago

Attachment #440978 - Flags: review?(jonas) → review+

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 50

•

15 years ago

Thanks for the reviews. Pushed two patches: http://hg.mozilla.org/mozilla-central/rev/9732d2b16662 http://hg.mozilla.org/mozilla-central/rev/a1c53e4881ca The "vol 6" is the only one unlanded patch on this bug. I didn't land it in the hopes that I might get r+ on bug 548232 soon, which would avoid merge conflicts from reordering the "vol 6" patch and the patch from bug 548232.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 51

•

15 years ago

Pushed: http://hg.mozilla.org/mozilla-central/rev/cdacbfe66545

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Address review comments in nsHtml5Parser.cpp 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch		Details \| Diff \| Splinter Review
Address review comments in nsHtml5Parser.cpp 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
Address review comments in nsHtml5TreeOpExecutor.cpp 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
Address review comments in nsHtml5StreamParser.cpp 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
Rename the getter for the length of list of active formatting elements 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
Address more review comments in C++ only parts 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch		Details \| Diff \| Splinter Review
Address more review comments in Java parts 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
Address more review comments in C++ only parts 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch		Details \| Diff \| Splinter Review
Address more review comments in C++ only parts, diffed differently 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch		Details \| Diff \| Splinter Review
Address even more review comments (vol6) 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
nsHtml5TreeBuilder.cpp as of the "vol 6" patch 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), text/plain		Details
TreeBuilder.java as of the "vol 6" patch 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), text/plain		Details
Address more review comments in C++ only parts, with an added null check 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
Interdiff for the null check addition in attachment 440481 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch		Details \| Diff \| Splinter Review
Address even more review comments (vol6), rev 2 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review
One-line change to make CC unlinking safer 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11) (deleted), patch	sicking : review+	Details \| Diff \| Splinter Review