Closed Bug 76722 Opened 24 years ago Closed 23 years ago

Gecko blocks the main thread for > 1 sec processing events

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.3

People

(Reporter: phil, Assigned: harishd)

References

()

Details

(Keywords: perf, Whiteboard: [pdt+][nsBranch+][fixed and verified on the trunk])

Attachments

(16 files)

(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
One of our embedding customers tells us that they see Gecko blocking the main thread while processing events. Sometimes one event takes more than 1 sec to process. kmcclusk, can you look into this and see what you can find?
related to hyatt/darin work on flow control analysis???? do we have bugs filed on that analysis and possible fixes yet?
Is there a particular site where this is happening?
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.1
A single content flush and a single reflow on www.cnn.com typically exceed 1 second on a 433Mhz PC running WINNT. When a single content flush or reflow is processed we do not return to the message pump until they have completed. There are two possible solutions: 1) Implement interruptible reflow and content creation (A big job) 2) Limit the amount of data that processed by a single content sink flush I'll explore alternative 2.
If you've cvs add'ed nsDelayAlarm.*, you can include them in the patch via cvs diff -u -N .... /be
In my profiles, it isn't content creation that takes a lot of time. It's frame creation (which right now happens in the same event as content creation). It is during frame creation that style resolution and image load kickoffs happen, and these are the expensive operations. I don't see a problem with doing many more ContentAppended and ContentInserted operations than we do now as long as we fix the O(n^2) problems with appending content and get a little smarter about coalescing reflows that occur as a result of appending and inserting content.
I instrumented: nsParser::OnDataAvailable nsHTMLContentSink::FlushTags nsPresShell:ProcessReflowCommand nsCSSFrameConstructor.cpp => frame construction. On www.cnn.comI saw the sequence of OnDataAvailable -> Chunk of Markup FlushTags OnDataAvailable -> Chunk of Markup FlushTags .... Until almost the entire page loads then I finally saw almost all of the frames constructed as the result of a single call to FlushTags, followed by a single reflow. Almost all of the frames are created and reflowed in one shot which leads to frame construction and reflow delay's for a single event which are greater than one second If I save www.cnn.com locally and remove the links to external style sheets.I see the following sequence OnDataAvailable==> Chunk of Markup FlushTags Frame Construction=> Frames corresponding to the Markup created Process Reflow Command ... OnDataAvailable==> Chunk of Markup FlushTags Frame Construction=> Frames corresponding to the Markup created Process Reflow Command I discussed this with rickg@netscape.com and attinasi@netscape.com The inclusion of the external sheets forces the parser to block until all of the external style sheets have been loaded. Once they have loaded, the parser is unblocked and it sends all of the data it was caching while in the blocked state over to the content sink. This is what causes the majority of the frame construction and reflowing to be done in one shot. We also seem to spent too much time away for the message pump when processing long documents. I will try modifying how much data necko passes to the parser's stream listener by changing nsStorageTransport.cpp's MAX_IO_CHUNK from 8192 to 1024.
To prevent the parser from sending all of the data that has been cached while it is blocked in one shot we would need to limit the amount of data that it could pass to the content sink and schedule messages to complete the flushing of the parsers cache.
"In my profiles, it isn't content creation that takes a lot of time". I agree. The content sink flush which results in the creation of the frames is what takes the majority of the time. Thats what I meant when I referred to content flush, not the creation of the content itself, although I erroneously referred to content creation in solution 1) :-)
Changing the buffer size is dangerous, since you don't want to stall Necko. If Necko can keep loading that's a good thing. IMO it's better to make the content sink only flush small amounts to the frame constructor and still allow the parser to queue up 8192 blocks. I think your original idea of tuning the content sink is the way to go here.
Although I guess we should tune the parser a little bit to make sure it doesn't synchronously parse a whole huge amount of content. :) I assume the parser can literally grab the entire page from Necko (e.g., 80K) while blocked, and that it does something like parse the entire 80K and send a bunch of synchronous notifications to the content sink? If that's what it does, you're right that we should tune the parser to post events instead of parsing in one fell swoop.
As I told Kevin and Attinasi, it would be very easy to gate the parser on how much content to toss to the content model in any one cycle. We will be discussing this a bit more tomorrow if folks want to join in.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Blocks: 71668
kevin, how are we doing with your patch? Are we able to run through I-BENCH test or jrgm's page-load test?
"Are we able to run through I-BENCH test or jrgm's page-load test?" I was able to run through I-BENCH and jrgm's page loader test on Linux with a release build. The page load times with jrgm's test were within the range of error for me of approx 4% so I couldn't see any impact from the patch. The I-BENCH scores for the initial page load time were not consistent. I need to tune the prefs which control how frequently the parser is interruptibted and how often the system clock is sampled. After I have done some tuning I'll re-run I-BENCH and jrgm's page loader test.
Just to clarify. The solution that rickg and I have been working on is really not the correct solution to this problem in the long term. The correct solution would to be to make reflow and frame construction interruptible. Making them interruptible would be a large task so we have been exploring using the parser's pre-existing incremental/interruptible capability to limit the number of frames that are created and reflowed in one shot. If both reflow and frame construction were interruptible then there would no need to interrupt the parser's processing of tokens. The patch I have attached interrupts the parser based on how much time has been spent processing tokens. In addition the patch sets up a PL_Event to continue calling the parser to continue processing if the document has been completely loaded, but the parser has been interrupted.
So you can do something like process some tokens, schedule an event, then during the processing of that event you parse some more tokens and possible schedule another event, and so on? That sounds good to me.
Thats right. The continue parsing PL_event only needs to be scheduled if the parser's OnDataAvailable has been called for the last time. If necko is still loading a document the tokens that were left around when the parser was interrupted will be processed the next time the parser's OnDataAvailable is called (Provided it doesn't interrupt again). I also added a dummy parser request to the load group to prevent JavaScript onload handlers and other document done processing from happening until the final chunk of data has been tokenized and passed over to the content sink.
Let me know if/when you're ready for a review. I'd be happy to look at this.
Hyatt: you're close. As KM said, the parser is already incremental and interruptable. All we're really doing is using that in conjuction with a gating function that tells us to quit scanning/tokenizing/model-buildling once a given threshold has been exceeded. The trick now is to find the right threshold that: 1) provides improved user responsiveness; 2) doesn't slow normal page construction; 3) performs the max possible work per iteration.
I am not sure if i miss some info, but i think that logic in HTMLContentSink::DidProcessAToken is wrong. First that == should be >=, but if you want to check time after every 20 token, it should be more like this: HTMLContentSink::DidProcessAToken(void) { mTokenCount++; if ((mInterruptParsing) && (mTokenCount >= mMaxTokenCount)) { mTokenCount = 0; if (mDelayTimer.HasExceeded(mMaxProcessingTime)) return NS_ERROR_HTMLPARSER_INTERRUPTED; } return NS_OK; }
Do not be discouraged if this patch causes a slight slowdown on page load times. I would expect it to. By not hogging the event queue, we're going to make paint suppression really unsuppress painting at the right time, and we're going to get a few more cycles of animation and progress. The tradeoff is that pages with CSS should show some content earlier, so you may have a positive perception gain even though the overall time slows.
actually, from the work bbaetz has been doing, you'll most likely see a big win on pages with many images. (turns out that this is related to why we seem to do better with fewer connections per server per page.) because the ui thread is hogged for so long (and the ui thread's event queue essentially blocked), http connections are not reclaimed by the http handler (since this happens on the ui thread) as early as they really could be. new connections are not started until the old ones are reclaimed. in fact, bbaetz has found that this contributes to about a 35% slowdown on pages like www.netscape.com. overall, it amounts to about a 5% slowdown on jrgm's page-loader test. dropping the number of http connections actually helped with this problem because it essentially reduced the number of active consumers of http, and hence kept the ui thread's event queue free'er... this is just an educated guess, though. we're working on a patch that will enable http to reclaim connections on the socket transport thread to get around this problem.
Well, good. Let's hope for the best! :)
My bug is bug 83772. Comments welcome...
I tried this patch, and I crash on the ibench tests. I can't seem to get a backtrace out of gdb though. I can run ibench without the patch applied.
The patch is ready for review/super-review I will check it in disabled by setting mCanInterruptParsing to false. This will prevent the content sink from ever returning NS_ERROR_HTMLPARSER_INTERRUPTED so it will never schedule ParserContinue events and the Dummy parser requests will not get added to the load group. I tuned the values for: mMaxTokenProcessingTime - New throttle I added to control Maximumum time spent processing tokens. mNotificationInterval - Existing throttle for controlling how often the content sink flushes. I tuned the values by loading: http://lxr.mozilla.org/seamonkey/source/view/src/nsViewManager.cpp and http://lxr.mozilla.org/seamonkey/find?string=nsCSSFrameConstructor.cpp then I reloading them from the cache to get consistent load times. I set the default values for mMaxTokenProcessingTime and mNotificationInterval as low as possible without affecting page load time. I set mMaxTokenProcessingTime to be a multiple of mNotificationInterval by default because the two are linearly related. Both default values can be overridden using prefs. I did this on the following machines with release builds: 500Mhz Linux build. 750Mhz WINNT system 433Hmz WINNT system 200Mhz WIN95 system I also ran I-BENCH and jrgm's page-load test. The current settings do not affect page load times but they DO NOT make Mozilla super-responsive during the load of large pages. They make Mozilla more responsive but its not as responsive as I would like. We need to improve the performance of incremental reflows so we can reduce the value for mNotificationInterval and mMaxTokenProcessingTime without affecting page performance. As a test to see that it is working you can set the user pref for mNotificationInterval to an absurdly low value user_pref("content.notify.interval", 1); and load: http://lxr.mozilla.org/seamonkey/find?string=nsCSSFrameConstructor.cpp Mozilla will be extremly responsive during the load. During the page load, you can click on menus and scroll as if the document was completed loaded. You will see small chunks of content added and reflowed. The default values are: user_pref("content.max.tokenizing.time", 360000); user_pref("content.notify.interval", 120000); You can turn off the interrupting of the parser using the following pref: user_pref("content.interrupt.parsing", true);
Whiteboard: waiting for review/super-review
"I tried this patch, and I crash on the ibench tests. I can't seem to get a backtrace out of gdb though. I can run ibench without the patch applied." I just applied the latest patch to a pull from this morning on Linux. I ran the I-Bench performance tests with a release build and it worked fine, it didn't crash. I ran both the specific HTML page load test http://i-bench.zdnet.com/ibench/performance_tests/perf_loadspeed.php as well as running the performance tests with all of the default settings on http://i-bench.zdnet.com/ibench/testlist/run.php If you think the parser interruption code is causing a problem you can turn off the interruption of the code either through the pref user_pref("content.interrupt.parsing", false); or you can modify mozilla/content/html/document/src/nsHTMLContentSink.cpp at line 2492 and change mCanInterruptParsing = PR_TRUE; to: mCanInterruptParsing = PR_FALSE;
I should have noted earlier: I had applied the patch 'attach_id=36855' after bbaetz comment, to a win2k build and could run the ibench stuff with no problem.
Drat. I thought I updated my comment here, but obviously not. The crash was repeatable, going away when I reverted the patch, and then returning when I reapplied it, but then (without updating from cvs) it just disappeared. I had no luck in getting a backtrace out of gdb, so I have no idea what caused it. I was seeing slower ibench times from home -> ibench.zdnet.com (ie higher latency than to dogspit interanlly) with the patch, though, but if noone else is seeing those then I'll just blame that as pilot error.
With the latest patch 37679, I improved performance by replacing PR_Now calls with PR_Interval_Now. I ran I-BENCH on Linux with a release build and I am not seeing any change in page load times.
Keywords: perf
OS: Linux → All
Hardware: PC → All
I see one problem with this patch, steps to reproduce: Go bugzilla query page, enter "perf" to keywords-field, and submit query -> Page loads ok, but throbber keeps spinning forever. When i set pref user_pref("content.interrupt.parsing", false); problem goes away. This is on linux. I have some other changes on my tree, so not 100% sure is it this patch.
I don't even have the patch installed (and never have) and I have the problem of the throbber never stopping, but only with the classic skin. CVS build from 06/11, RedHat 7.1
I tried Kevin's patch on my Mac OS 9.1 400 MHz w/ 384 MB of memory and I did not see any side-effects. However, using the default setting, I could not tell any change in responsiveness. Then again, it wasn't that bad without it. More performance tuning needs to be done as cranking the setting up the max did show more responsiveness. Let's get this checked in to get more eyes looking at it.
Comments on patch: 1) in nsParser::PostContinueEvent() you should check for (nsnull != mEventQueue) before creating the new event - if null, ev will leak (and it is a waste of time creating it). 2)new data members in nsParser.h / nsHTMLContentSink.cpp: should we use PRPackedBool instead of PRBool? Probably no big deal sinec there are very few parser or content sink instances. 3) Initialize the statics for DummyParserRequest? These are all really minor comments. The code look good, and it looks like the testing has been thorough too. I did not see any problems with the use of the mCanInterruptParsing data member, but did you check to make sure that disabling interrupt-of-parsing does indeed disable it? If so, then sr=attinasi
The patch works fine for me too (but I'm biased). :) Couple of small nits: 1. In nsParserContinueEvent::HandleEvent(), make sure the parser argument is non-null. 2. This whole process needs to be documented. The parser was always incremental, but this is a new kind of incrementalism, and should be described. To attinasi: packed bool won't help. If you need 1 bit, it still costs you eight. Rick
[Nit on a nit: PRPackedBool is eight bits (unsigned char). PRBool is 32 (int).]
This sounds very related to bug 15122.
Whiteboard: waiting for review/super-review → waiting for approval
patch id=36536 looks good. r=harishd
patch id=38536 looks good. r=harishd
sr=vidur@netscape.com per phone conversation with Vidur
Blocks: 83989
a=chofmann
Checked in patch 38536 with the default for content.interrupt.parsing set to false. To test interrupting the parser to improve user interactivity during long page loads set the pref user_pref("content.interrupt.parsing", true); You can increase the responsiveness by changing content.notify.interval. The default is user_pref("content.notify.interval", 120000); if you set it to user_pref("content.notify.interval", 1000); Mozilla will get very responsive during long page loads, but overall page load times will increase. The current defaults are tuned to not affect overall page load times.
Whiteboard: waiting for approval → Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js
Whiteboard: Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js → Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js, critical for 0.9.2
I am still seeing problem with bugzilla queries. When setting user_pref("content.interrupt.parsing", true); after bugzilla query throbber just keep spinning. When setting pref to false, throbber stops after query. Tested with build 2001062121 in linux and new profile.
I think we have what we need for mozilla 0.9.2 with the addtion of the prefs to get us some help with testing. moving this off the critical for 0.9.2 list. adding the nsbranch keyword so we can track if we want to take it for the netscape release later on down the road.
Keywords: nsBranch
Whiteboard: Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js, critical for 0.9.2 → Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js,
Bugzilla queries are likely suffering from another problem. See bug 81524.
No its not that ui freezes, mozilla works fine, just trobber keeps spinning, AND this only happens when i enable content.interrupt.parsing.
Harish and I looked at the problem with the bugzilla query. The problem is that necko loads the entire document then starts a new load of the same document for some reason. Necko never passes the flag to indicate the document has completed for the second load so the DummyParserRequest that is added when content.interrupt.parsing is enabled never gets removed from the load request so the throbber keeps going.
The bugzilla query page is multipart/x-mixed-replace, so the document should be loading twice, shouldn't it? Or something like that.
Depends on: 87370
With Darin's help I traced into the multi-part converter code and the problem is that the nsMultiMixedConv implementation doesn't pass the OnStopRequest through to the parser for the second document. I filed bug 87370 for this problem.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
The patch I attached will dynamically switch between two different settings for the content.notify.interval and max.token.processing.interval based on the users actions. The basic algorithm is if a user event (mouse mouse, keypress etc.) has occurred in the last second then use values which maximizes interactive responsiveness. If a user event has not occurred in the last second then return to the values which maximize page-load performance. The idea is that if the user is going to grab the thumb of the scrollbar or make a menu selection they would prefer to emphasize the processing of the user-events over page-load.
sr=attinasi for the patch ID 40220 The patch makes interaction really sweet, and actually makes the app look 'smart' when a large page is loading and you reach for the scroll-thumb because the mouse-movement slows down the load and makes grabbing the thumb easier. Also, the menus are way more responseive when you want to access them, but page load is full-speed when the mouse is left alone. This is a great improvement over the static inverval!
Remove + printf("Enable interrupting parser %d\n", enableInterruptParsing); with that r=harishd.
I did some runs with a 06/26 trunk build on linux and win98. (I didn't get to mac, since I was chasing down a different issue [bug 88166] at the same time). What I did was to create a new profile and set it up with sidebar collapsed, in modern skin, and then after a few restarts, run the page-loader test. The first run was with the default settings, which makes this turned off at the moment. Between each test, I delete NewCache, history.dat, cookies.txt and restarted a few times to get to the same initial state. I ran tests with the pref set to true and timing values of 120000, 10000, 40000, 80000, and then back to 120000, and finally again as 'off' (disabled). Linux/500MHz/128MB/rh9.1 times in msec. cached uncached 1) off 1775 1949 2) 120000 1795 1925 3) 10000 1998 2107 4) 40000 1795 1914 5) 80000 1777 1896 6) 120000 1776 1933 7) off 1765 1890 win98/500MHz/128MB times in msec. cached uncached 1) off 1404 1680 2) 120000 1442 1656 3) 10000 1471 1713 4) 40000 1456 1684 5) 80000 1431 1675 6) 120000 1454 1665 7) off 1413 1640 So, it is effectively neutral (according to my testing) on Linux [i.e., can't differentiate the changes from 1% noise]. However, I do see a measurable difference of about ~3% for cached load times on win98, for a setting of 120000 (the default). (I should note that I think this patch has much different dynamics on win98 than on win NT variants: when I ran on win2k with a setting of 10000 usec, the DOM L2 spec shot up from ~4 seconds to ~30 seconds to finish loading. But on win98, I only went from ~4 to 5.5 seconds (I think).) It would still be good to run the ibench test, and to check the Mac (my bad). Sorry not getting to this earlier. (p.s. I am away until Monday afternoon).
I found a problem: when loading http://i-bench.zdnet.com/ibench/i-bench.htm if I have the lastest patch installed and I move the pointer around rapidly to cause the parser to be interrupted very early on in the load process the page (sometimes) doesn't load at all. At the beginning of the i-bench document it does a document.write of a script tag which has an href to external file that contains JavaScript to execute. I believe this triggers the problem. I also get an assertion: 0[ba2c40]: ###!!! ASSERTION: nsHTMLDocument::Reset() - dummy doc write request s till exists!: 'mDocWriteDummyRequest == nsnull', file S:\mozilla\content\html\do cument\src\nsHTMLDocument.cpp, line 371 The following is the stack that triggers the assertion: nsDebug::Assertion(const char * 0x01cf2a9c, const char * 0x01cf2a7c, const char * 0x01cf2a44, int 371) line 290 + 13 bytes nsHTMLDocument::Reset(nsHTMLDocument * const 0x028f7a40, nsIChannel * 0x02de5920, nsILoadGroup * 0x00ed4ee0) line 371 + 49 bytes nsHTMLDocument::OpenCommon(nsIURI * 0x028e0760) line 2137 + 39 bytes nsHTMLDocument::Open(nsHTMLDocument * const 0x028f7b80, nsIDOMDocument * * 0x0012e390) line 2244 + 18 bytes nsHTMLDocument::Open(nsHTMLDocument * const 0x028f7b7c) line 2212 + 40 bytes nsHTMLDocument::WriteCommon(const nsAString & {...}, int 0) line 2309 + 31 bytes nsHTMLDocument::ScriptWriteCommon(int 0) line 2425 + 19 bytes nsHTMLDocument::Write(nsHTMLDocument * const 0x028f7b80) line 2452 XPTC_InvokeByIndex(nsISupports * 0x028f7b80, unsigned int 19, unsigned int 0, nsXPTCVariant * 0x0012e83c) line 139 XPCWrappedNative::CallMethod(XPCCallContext & {...}, XPCWrappedNative::CallMode CALL_METHOD) line 1881 + 42 bytes XPC_WN_CallMethod(JSContext * 0x00f6f420, JSObject * 0x00dab2b0, unsigned int 1, long * 0x024636ec, long * 0x0012ea70) line 1252 + 11 bytes js_Invoke(JSContext * 0x00f6f420, unsigned int 1, unsigned int 0) line 807 + 23 bytes js_Interpret(JSContext * 0x00f6f420, long * 0x0012f888) line 2702 + 15 bytes js_Execute(JSContext * 0x00f6f420, JSObject * 0x00dab138, JSScript * 0x02dffe40, JSStackFrame * 0x00000000, unsigned int 0, long * 0x0012f888) line 986 + 13 bytes JS_EvaluateUCScriptForPrincipals(JSContext * 0x00f6f420, JSObject * 0x00dab138, JSPrincipals * 0x028f60b0, const unsigned short * 0x02420028, unsigned int 741, const char * 0x02dfe630, unsigned int 1, long * 0x0012f888) line 3273 + 25 bytes nsJSContext::EvaluateString(nsJSContext * const 0x00f6e920, const nsAString & {...}, void * 0x00dab138, nsIPrincipal * 0x028f60ac, const char * 0x02dfe630, unsigned int 1, const char * 0x00000000, nsAString & {...}, int * 0x0012f8f4) line 608 + 85 bytes nsScriptLoader::EvaluateScript(nsScriptLoadRequest * 0x029bde90, const nsAFlatString & {...}) line 565 nsScriptLoader::ProcessRequest(nsScriptLoadRequest * 0x029bde90) line 477 + 22 bytes nsScriptLoader::OnStreamComplete(nsScriptLoader * const 0x028f50c4, nsIStreamLoader * 0x029bdbd0, nsISupports * 0x029bde90, unsigned int 0, unsigned int 741, const char * 0x02dfe750) line 756 nsStreamLoader::OnStopRequest(nsStreamLoader * const 0x029bdbd4, nsIRequest * 0x029bd8d0, nsISupports * 0x00000000, unsigned int 0) line 120 + 81 bytes nsStreamListenerTee::OnStopRequest(nsStreamListenerTee * const 0x02dfeac0, nsIRequest * 0x029bd8d0, nsISupports * 0x00000000, unsigned int 0) line 25 nsHttpChannel::OnStopRequest(nsHttpChannel * const 0x029bd8d4, nsIRequest * 0x029b9bf0, nsISupports * 0x00000000, unsigned int 0) line 2111 nsOnStopRequestEvent::HandleEvent() line 161 nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x029cc9e4) line 64 PL_HandleEvent(PLEvent * 0x029cc9e4) line 590 + 10 bytes PL_ProcessPendingEvents(PLEventQueue * 0x00f32f50) line 520 + 9 bytes _md_EventReceiverProc(HWND__ * 0x00050778, unsigned int 49363, unsigned int 0, long 15937360) line 1071 + 9 bytes USER32! 77e7124c() 00f32f50() My guess is that we have some code that assumes that when the stream is complete that all of the document has been tokenized and the content model is complete. The interrupting of the parser is disabled if we are in a sub-parser context so it will not interrupt the parser when it's in the middle of the parsing the result of a document.write, but there may be some other places where we need to disallow the interrupting of the parser.
Kevin, does that page fail to load with the interval set to a very low number _without_ the patch that makes it dynamic? I would think that it would, since the dynamic stuff just sets the interval really low...
If I set initial value for the mNotificationInterval to be the same as the value that it is lowered to when there is a user event it does not fail. The assertion only appears when the mNotificationInterval is dynamically switched between two values.
Ok, I found the problem. I missed one of the places where I should have replaced mNotificationInterval with GetNotificationInterval: nsHTMLContentSink.cpp PRBool HTMLContentSink::IsTimeToNotify() { ... LL_I2L(interval, mNotificationInterval); Should be LL_I2L(interval, GetNotificationInterval());
Great, I'm glad you found it. [s]r=attinasi on the inline fix.
I checked patch 40583 into the trunk. Dynamic interrupting of the parser is now turned on by default on the trunk. To test this, try loading a large document such as http://lxr.mozilla.org/seamonkey/find?string=nsCSSFrameConstructor.cpp. As you move the mouse to grab the scrollbar or make a menu selection Mozilla should be extremely responsive. If you stop moving the mouse Mozilla will go back to the defaults to enable fast page load.
Whiteboard: Checked in a fix, but it is disabled. To enable the fix add user_pref("content.interrupt.parsing", true); to pref.js, → Checked in a fix to the trunk
I ran I-BENCH with a release build on WINNT 750Mhz CableModem and the times were unchanged with the dynamic switching. However, the times are slower with dynamic switching turned on if the mouse pointer is inside the window when the I-BENCH tests are run. This is because windows generates a mouse move event when a new window is constructed under the mouse pointer, this mouse move event causes the lower content.notification.interval and max.token.processing intervals to be used for 3/4 of second when the page begins loading. I can improve the performance on I-BENCH when the pointer is in the window by increasing the lower values returned for content.notification.interval and max.token.processing intervals in the GetNotificationInterval() and GetMaxTokenProcessingTime(). However, Doing this makes Mozilla less interactive during long page loads. Suppressing the mouse move events generated when the mouse just happens to be over a window when it is created would be a better solution.
cc: saari for the mouse event on page load issue.
3/4 second added on to the total i-bench time, or to the per-run time?
With the latest patch installed there isn't any impact on I-Bench load times. The 3/4 of second is how long the dynamic switching waits after a user-event before it switches to lower settings to improve user responsiveness. If another user-event occurs within 3/4 of second of the previous user event it will stick with the lower settings. If 3/4 of second goes by without a user event it switches back to the higher settings which maximimizes page load performance. The problem I was having with I-BENCH was that when each page was loaded and the mouse pointer was in the window it was jumping to the lower settings because the creation of the window was generating a mouse move event. It should be sticking with higher settings which maximize page load I wasn't actually moving the mouse or typing. The lastest patch waits 2 seconds after the beginning of the document load before it looks at the user event time when determining if it should jump to the lower setting. This eliminates the jumping to the lower value when Mozilla cycles through pages. It also eliminates the jumping to the lower value immediately after the user hits return in the URL bar, or clicks on a link if it results in the loading of a new document.
I'd like to suggest that the comment be reworded a little (it is a confusing sentance). From: + // should be used. but only consider using the lower + // value if the document has already been loading + // for 2 seconds. 2 seconds was chosen because it is + // greater than the default 3/4 of second that is used + // to determine when to switch between the modes and it + // gives the document a little time to create windows + // result in user events because on some systems creating + // a window under the mouse pointer will generate a + // mouse move event. to + // should be used. But only consider using the lower + // value if the document has already been loading + // for 2 seconds. 2 seconds was chosen because it is + // greater than the default 3/4 of second that is used + // to determine when to switch between the modes and it + // gives the document a little time to create windows. + // This is important because on some systems (Windows, + // for example) when a window is created and the mouse + // is over it, a mouse move event is sent, which will kick + // us into interactive mode otherwise. It also supresses + // reaction to pressing the ENTER key in the URL bar... Or something like that. Anyway, the information is there, I think you just had a typo or something. Also, since the value 2 seconds is really related to the 3/4 second default value (2X + fuzz-factor), why not make them consts or defines and put the relationship into the code (in the odd event that the default interval is changed, the wait-time would be changed automatically that way). Not critical, but it seems cleaner to me. sr=attinasi.
r=harishd
Checked in patch 40719 into the trunk. I-BENCH scores should be back to what they were previously.
If you want to bring the Mozilla0.9.2 branch up to date with the patches I've installed on the trunk apply patches 40583 and 40719
Assignee: kmcclusk → harishd
Status: ASSIGNED → NEW
Reassigning this bug to Harish since I will be gone until Aug 29.
If you encounter problems and you want to turn off the interrupting of the parser as the default: change line 2509 in nsHTMLContentSink.cpp from: PRBool enableInterruptParsing = PR_TRUE; to be PRBool enableInterruptParsing = PR_FALSE;
adding vtrunk keyword to indicate the bug has been fixed on the trunk. Bug left open for tracking checkin to branch (nsbranch) when appropriate. Once bug has been fixed on the branch also, pls remove vtrunk keyword.
Keywords: vtrunk
This is a high impact fix for user experience during long page loads. It has also been on the trunk for a week now. Marking nsBranch+.
Whiteboard: Checked in a fix to the trunk → [nsBranch+] Checked in a fix to the trunk
Ran browser buster, with build ID 2001070604, and did not see anything abnormal.
Verified FIXED on today's trunk builds on Win2k and Linux. I checked this by comparing the interactivity of the application while loading nsCSSFrameConstructor.cpp via LXR.
Whiteboard: [nsBranch+] Checked in a fix to the trunk → [nsBranch+][fixed and verified on the trunk]
Whiteboard: [nsBranch+][fixed and verified on the trunk] → [pdt+][nsBranch+][fixed and verified on the trunk]
Checked into branch on behalf of Harish. Marking fixed.
Really marking fixed.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
chris - pls verify this on the branch builds. Thanks.
Keywords: vtrunkvbranch
Marking verified with the July 16th branch builds (20010716) with Linux Redhat, Windows ME, and Mac OS 9.1.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: