Closed Bug 51938 Opened 24 years ago Closed 21 years ago

Getting the frame for a window coordinate is inefficient (GetFrameForPoint)

Categories

(Core :: Layout: Block and Inline, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: sfraser_bugs, Assigned: roc)

References

()

Details

(Keywords: perf, Whiteboard: [dogfood+])

Attachments

(3 files, 4 obsolete files)

Select and mouseover code require that we get the nsFrame that 'owns' a point on the screen an awful lot, and the process of getting the correct frame is very inefficient. Quantify data for selecting text in large documents (e.g. LXR blame documents) show that nsFrame::GetFrameForPoint() is being called hundreds of thousands of times for each move or click event that comes in, and this takes considerable time. Even processing mouse move events on the sample page (see URL) eats up 100% of the CPU time on a Windoze box. This is a serious performance problem for selection.
See bug 49772 which describes the selection problems to which this bug contribute.
Keywords: dogfood, perf
Blocks: 49772
marking [dogfood+] per karnaze.
Status: NEW → ASSIGNED
Whiteboard: [dogfood+]
Adding kin@netscape.com to cc list.
One way to solve this would be to keep line boxes sorted by their y-coordinate once we passed a threshold (say, 100 line boxes in a block frame). In this case, we'd use the sorted data structure to find reasonable points to start and stop searching for a frame that contains the point. Does this seem reasonable?
If you do something like that you have to be careful of line boxes that contain blocks that contain floats or absolutely positioned elements (? - they may not be a problem b/c of views) or blocks with negative margins.
Completely off the wall idea. As we render into the back buffer, or to the screen, whatever, we keep a map for each pixel and what frame rendered it. A classic memory vs speed trade off. Probably a post 6.0 thing, but I thought I'd throw it out there.
For *each pixel*? Isn't that rather a lot of memory?
You don't have to keep an entry for each and every pixel, just regions/rects. And even if you did, 32bits per address is the same as the 32bits per pixel, so it is just like having a second buffer. Is that a lot of memory? Yes. In compairison to our footprint? No comment. The idea is that you can make a much smaller search space because you already know what frame a pixel corresponds to since you did the work to put it there in the first place.
It looks like nsContrainerFrame::GetFrameForPointUsing() does most the heavy lifting, and it does a simple depth-first search. If instead of this, it tried to be smart about which child to examine, it might speed up considerably. We know the bounding rect of each child, and I think we have a flag on each child that says whether it's children all live within that rect or whether they can extend outside of it (which I think it rare.) So, we should be able to make a good guess about which child is likely to contain the point, rather than doing the blind tree walk. It might be reasonable to keep an auxillary data structure to facilitate this, but offhand I can't think of anything that adds considerable value to the existing frame tree. The space manager exists to map regions to data, so if we wanted to try saari's suggestion, the space manager could be the foundation for mapping regions to frames.
We already use the flag that says whether frames have outside children. See, for example, nsContainerFrame::GetFrameForPoint, which begins: 244 PRBool inThisFrame = mRect.Contains(aPoint); 245 246 if (! ((mState & NS_FRAME_OUTSIDE_CHILDREN) || inThisFrame ) ) { 247 return NS_ERROR_FAILURE; 248 }
Yo, Chris: if you're busy, I'm happy to take this one from your plate. I've done this code before in prior layout systems.
go for it.
Assignee: waterson → rickg
Status: ASSIGNED → NEW
Ooh-wee! Let's see what I can do...
Status: NEW → ASSIGNED
Buster and I discussed the frame system (again), and concluded that there may be cases where a child is floated, extending beyond it's parent/grandparent. There are bits (which may not be getting propagated) which indicate this case. In a nutshell -- it's very possible that layout isn't yet smart enough for a solution based on this state to succeed. Therefore, I'm proceeding with a sneaky hack (I can sense the snide remarks forming already). I'm going to build a tiny little MRU frame cache, upon the premise that there is locality of scope to selection and targeting events. The hit rate will be a function of 1) cache size; 2) locality of event. I'll post more after I've tried it.
The bit that says a frame has outside children *does* work. If it didn't, there would be a lot of cases where events didn't get through, because we use that bit now. (What we don't know is the extent of the overflow.)
Trying cache now...
The simple cache approach looks promising so far (on visual tests); I'm trying to run quanity now to prove the performance improvement.
*** Bug 53676 has been marked as a duplicate of this bug. ***
Add me to cc: This impacts me in a very noticable way when a <textarea> multiline textfield has LOTS of lines in it. Adding a printf("blah"); to nsFrame::GetFrameForPoint() showed it just going through time and again and again and again and again...
Quantify shows a 2x speedup in the most trivial case. It's my hope that we'll do even better on more complex pages (just guessing though). I've handed the (rather simple) diffs off to attinasi to review and stress test (he's more familiar with frame code then I am).
Assignee: rickg → attinasi
Status: ASSIGNED → NEW
Accepting: I'll be testing Rick's fixes for starters.
Status: NEW → ASSIGNED
I have the code changes that Rick put together, however I may not get to this very soon due to higer prioroty RTM+ bugs in my list, so if somebody else has time and wants to shepard this through, please let me know and I'll happliy send the package over to you to finish testing and shepard throught the approval / checkin process.
'Cause he forced me, back to Rick.
Assignee: attinasi → rickg
Status: ASSIGNED → NEW
My TEXTAREA example seems to be MUCH improved since my last post (tried with current tree sources). I have not tried the printf() debug again, but I would almost dare to say my problem (as observed) has been fixed. Have any changes landed from this bug ? Status anyone ? Has anyone else tried their test set lately ?
check url from bug 52005, "poor performance scrolling large plain text files": ftp://ftp.funet.fi/pub/doc/dictionaries/Finnish/words.finnish on this page, mouse movements generate an _extraordinary_ amount of GetFrameForPoint() calls. i put a printf in that function to see how often it gets called, and unlike on other pages i tried, on this large text file, xterm can't even keep up. you really want to check that out, it's quite drastic.... :I
This bug was marked to be fixed in a previous milestone but it didn't get fixed properly. Nominated for beta1.
Keywords: nsbeta1
perf issues bubbling to the top. the mru cache never got checked in,and unfortunately, rickg and marc both seem to have lost the code. :( so I'm starting over with that idea
Assignee: rickg → buster
Severity: normal → major
OS: Mac System 8.5 → All
Priority: P3 → P2
Target Milestone: --- → mozilla0.9
In the future, please to be attaching the proposed patches to the bug report.
that's ok, my new implementation is undoubtedly better than whatever hack those two clowns cobbled together :) fix to be attached soon. testing and measuring now...
Status: NEW → ASSIGNED
I have the cache for GetFrameForPoint(). Attaching diff. attinasi, please review. waterson, please sr. still a work in progress, but it is currently good enough to remove GetFrameForPoint() as an issue in perf measurements.
Attached patch diffs for new cache (obsolete) (deleted) — Splinter Review
Could you attach unified diffs?
What dbaron said! ;-) From what I could read... - In the MRU cache's Search() method, don't you want to move the `found frame' to the front of the cache? - How did you arrive at `20' for the magic number? Seems a bit high to me, mostly because that's a lot of deque entries to grovel through on a miss...
the diffs I've attached greatly reduce the cost for a hit. With a cache of size 10 (approximately 200 bytes), I consistently get a hit rate of 50-60% for simple mouse and selection actions over text and standard content blocks. But, this gives us no improvement whatsoever in the case where the point is over the "dead space" of a container. That is, when the point is not over a leaf frame, the cache is useless.
Summary: Getting the frame for a window coordinate is inefficient → Getting the frame for a window coordinate is inefficient (GetFrameForPoint)
So does this cache make the assumption that if a point is within a cached leaf frame, then that leaf frame is the correct result? That's a bad assumption.
about to attach -u diffs. concerning Chris' questions: - In the MRU cache's Search() method, don't you want to move the `found frame' to the front of the cache? The cache as a FIFO queue, and the "found frame" is being put at the "back", so that it will live the longest. Is that what you meant? You can verify that by running MRUFrameCache::SelfTest() and looking at the console output. For example: added first MRU_FRAME_CACHE_SIZE items item 0 = 00000001 (0, 0, 10, 10) item 1 = 00000002 (0, 10, 10, 10) item 2 = 00000003 (0, 20, 10, 10) item 3 = 00000004 (0, 30, 10, 10) item 4 = 00000005 (0, 40, 10, 10) item 5 = 00000006 (0, 50, 10, 10) item 6 = 00000007 (0, 60, 10, 10) item 7 = 00000008 (0, 70, 10, 10) item 8 = 00000009 (0, 80, 10, 10) item 9 = 0000000A (0, 90, 10, 10) added n+1th item item 0 = 00000002 (0, 10, 10, 10) item 1 = 00000003 (0, 20, 10, 10) item 2 = 00000004 (0, 30, 10, 10) item 3 = 00000005 (0, 40, 10, 10) item 4 = 00000006 (0, 50, 10, 10) item 5 = 00000007 (0, 60, 10, 10) item 6 = 00000008 (0, 70, 10, 10) item 7 = 00000009 (0, 80, 10, 10) item 8 = 0000000A (0, 90, 10, 10) item 9 = 0000000B (0, 100, 10, 10) added n+2th item item 0 = 00000003 (0, 20, 10, 10) item 1 = 00000004 (0, 30, 10, 10) item 2 = 00000005 (0, 40, 10, 10) item 3 = 00000006 (0, 50, 10, 10) item 4 = 00000007 (0, 60, 10, 10) item 5 = 00000008 (0, 70, 10, 10) item 6 = 00000009 (0, 80, 10, 10) item 7 = 0000000A (0, 90, 10, 10) item 8 = 0000000B (0, 100, 10, 10) item 9 = 0000000C (0, 110, 10, 10) - How did you arrive at `20' for the magic number? Seems a bit high to me, mostly because that's a lot of deque entries to grovel through on a miss... Yeah, the comment by the number says: #define MRU_FRAME_CACHE_SIZE 10 // fairly arbitrary, needs to be tuned Notice I have it set at 10 now. It's easy to fool around with the markup and the user action (this is, after all, an optimization primarily targeted at speeding up interactive mouse events) to get cache hit rates all over the board. By far, the biggest payoff is at cachesize=1, since mouse move events come in fast enough for us to see a hit rate of nearly 50% based on just remembering the latest hit. That's true when we consider only mouse events over leaf elements, see my previous comments for the problems of "dead space" points.
David: I haven't yet found a case where it isn't true. But I'm very early on in my testing. Please elaborate, what case are you thinking about? Positioned elements?
buster: typically an MRU algorithm will promote items that it finds in the cache so that they tend to stay in the cache. I don't think that Search() does this, does it?
no, not yet. maybe never, it might be that FIFO is good enough. still testing. good catch, though. I should have documented that, especially given the name I chose for the class!
Just for reference, bug 26785 used a similar testcase.
this cache works great for simple documents. as david points out, it fails miserably for documents with overlapping elements, as may be caused by negative margins or positioned elements. as a first step, I will look into what it would take to determine whether the document contains any of these conditions, basically a bool that says whether to use the cache nor not. The most expediant way to know this might be a hook in the style system, so if the style system ever resolves a rule that might cause an overlap, we just don't use the cache (regardless of whether that rule is ever actually ever used by an instantiated element.) This would give us the performance boost we're looking for on the vast majority of pages in the short term, and we could investigate alternatives post-6.5.
Another approach to speeding up GetFrameForPoint might be to add a state bit on nsLineBox indicating whether there were overflowing children and then overriding GetFrameForPointUsing on nsBlockFrame (for the aList==null case, and use the inherited one for the others). This would avoid descending into 99%+ of the lines in cases with a single block with a huge child list (like the one above), which are probably the only cases where we have significant GetFrameForPoint performance problems.
taking
Target Milestone: mozilla0.9 → mozilla0.9.1
really taking.
Assignee: buster → waterson
Status: ASSIGNED → NEW
Keywords: patch
Status: NEW → ASSIGNED
I wonder if there's something weird going on in the test case for this bug, like every frame having 6 twips overflow...
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla1.0
Blocks: 104166
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 (you can query for this string to delete spam or retrieve the list of bugs I've moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Attachment #21692 - Attachment is obsolete: true
Target Milestone: mozilla1.0.1 → Future
*** Bug 235517 has been marked as a duplicate of this bug. ***
roc had some ideas about this in bug 235517: In particular, can't we make nsBlockFrame::GetFrameForPoint to iterate through the lines, scanning the frames within a line only if the point is inside the line's combinedArea? I bet that would speed things up a lot. Actually I'm sure this would work because that's how nsBlockFrame::Paint works. If we were really cool we would keep a bit in each nsBlockFrame, and set it to true if we detect that the sequence of 'line->combinedArea.y's is nondecreasing and the sequence of 'line->combinedArea.YMost's is nondecreasing. We would also cache somewhere a pointer to the most recently accessed line. Then we can quickly find the complete set of line(s) for a given point by starting at the cached pointer and searching forwards or backwards. I remember at one time thinking this could speed up painting on large pages too. [editorial note -- the painting issue is bug 52005]
Blocks: 52005
Component: Layout → Layout: Block and Inline
Priority: P2 → --
Target Milestone: Future → ---
Attached patch Implement by-line event dispatch (obsolete) (deleted) — Splinter Review
This implements the first part of my suggestion. It's quite straightforward and seems to work correctly. Any suggestions for how to measure the performance effect of this?
BTW, a note about the cursor idea I proposed in bug 52005 and here: It's probably not hard to do. We make the line cursor a frame property. We keep a state bit that tells us quickly whether we have such a property. When there is no property, Paint or GetFrameForPointUsing iterate through all lines, just like they do now. During this iteration they also check whether the line boxes combinedArea.ys and combinedArea.YMosts are nondecreasing. If so, and there are many lines (say more than 50), then they set the line cursor property to some value, say the first line. right?) When the property is available (and thus the nondecreasing invariants hold), Paint and GetFrameForPointUsing will locate the lines they're interested in based on coordinates by searching from the cursor forward or backward and storing the last line they touch back into the cursor property. Reflow just always clears the property. (We can't create or destroy lines, or change their combinedareas, except during reflow and frame teardown, right?) This would make for fairly simple code and I think negligible space and time overhead.
I just tried this patch on the testcase in bug 235517 (seeing how many profiler hits I got in a minute of moving the mouse, so it's kinda crude). I see at best a 10% improvement and the cpu still gets pegged when I move the mouse; with your patch 80% of the profiler hits are in (not under) nsBlockFrame::GetFrameForPointUsing. So the real problem is still the O(N) loop over all lines....
Attached patch line cursor cache (obsolete) (deleted) — Splinter Review
This patch implements the previous stuff plus adds the line cursor cache that I suggested. Seems to work OK once I explicitly skipped lines with empty combinedAreas when determining whether the invariant holds. My tests seem to show that it's effective on the Postgres manual testcase; instead of scanning >18K lines in the main DIV on every mouse move, we usually scan 3. bz, would you mind checking out the profile performance of this one? If it looks good I'll finish the patch by using/setting the cursor in nsBlockFrame::PaintChildren.
Assignee: waterson → roc
Attachment #142397 - Attachment is obsolete: true
With that patch on the testcase in bug 235517 (same test as in comment 53 -- 1 minute wall-clock time) I see a factor of 6 reduction in the number of profiler hits (since this is a non-realtime profile, that's actually meaningful). The top of the flat profile now looks like: Total hit count: 322 Count %Total Function Name 10 3.1 nsXULElement::HandleDOMEvent 8 2.5 nsQueryInterface::operator() 7 2.2 __libc_poll 6 1.9 SearchTable 5 1.6 _end 5 1.6 __pthread_alt_unlock 5 1.6 nsRuleNode::GetStyleData and there are only 4 total hits under nsBlockFrame::GetFrameForPoint. So certainly looks much better. ;) I haven't looked at the patch itself; just applied it blindly and profiled. Once we have a finalized patch, we should file some followup bugs on what's left after this one is fixed -- of those 322 hits 154 are handling mousing on and off anchors (and 133 of those are updating the status bar!).
Excellent. I'll finish the patch shortly. Will we really need to optimize this further? If you're waving the mouse around I'd expect to spend some time processing events and updating the status bar.
Sure. There are just some issues that I wanted to look at (like the fact that setting the status flushes pending notifications). Probably not a big deal either way... ;)
Attached patch v2 (obsolete) (deleted) — Splinter Review
Here's the version which optimizes painting as well.
Comment on attachment 143083 [details] [diff] [review] v2 let me know if you can handle this review
Attachment #143083 - Flags: superreview?(bzbarsky)
Attachment #143083 - Flags: review?(bzbarsky)
Note: if we ever find important test cases where this doesn't work because the line combinedArea.ys/ymosts decrease at some non-empty line (e.g., because some line has some relative-positioned stuff), then we can still save the day. One way would be to actually grow the line combined areas as necessary to force that invariant to hold.
*** Bug 52005 has been marked as a duplicate of this bug. ***
roc, I'm not sure I understand how lines, blockframes, out-of-flows all interact well enough to review this.... I could probably sr, but dbaron should probably do the r=.
Comment on attachment 143083 [details] [diff] [review] v2 no problem. reassigning to dbaron.
Attachment #143083 - Flags: superreview?(dbaron)
Attachment #143083 - Flags: superreview?(bzbarsky)
Attachment #143083 - Flags: review?(dbaron)
Attachment #143083 - Flags: review?(bzbarsky)
Comment on attachment 143083 [details] [diff] [review] v2 OK, I did a little self-improvement... (and asked roc what was up with the check for an empty combined area ;) ) >Index: layout/html/base/src/nsBlockFrame.cpp > nsBlockFrame::PaintChildren(nsIPresContext* aPresContext, >+ for (line_iterator line = mLines.begin(cursor); >+ line != line_end; Weird indent there. >+ if (lineArea.Intersects(aDirtyRect)) { This if/else statement is duplicated in the "no cursor" case. Perhaps it should move into a helper function? That can be a nice inline function and all if we want, but it's bad to have identical code in two different spots... >+ for (line_iterator line = begin_lines(); >+ line != line_end; Weird indent. >+ if (lineArea.y < lastY >+ || lineArea.YMost() < lastYMost) { >+ nonDecreasingYs = PR_FALSE; >+ } >+ lastY = lineArea.y; >+ lastYMost = lineArea.YMost(); I assume there is no really good reason to wrap all that in a check that nonDecreasingYs is still true since in most cases it _will_ be true and all that is pretty cheap, right? >+nsLineBox* nsBlockFrame::GetFirstLineContaining(nscoord y) { This seems to assume that "y" is within the YMost of one of our lines (hence that it's in our rect). We should probably assert that up front. >+nsBlockFrame::GetFrameForPointUsing(nsIPresContext* aPresContext, >+ nsIFrame *hit; >+ nsPoint tmp; Move declarations to first use? Eg move the decl of tmp down to where the MoveTo is right now? >+ if (lineArea.Contains(tmp)) { Again, this block is identical to the one in the "no cursor" case... >+ PRBool nonDecreasingYs = PR_TRUE; I really wonder whether we can pull the logic for all this out into a function that will get that inner bit (the part starting with if (lineArea.Contains(tmp)) as a function pointer param or something.... I guess that's hard because here we actually need to return a value. <sigh>. >Index: layout/html/base/src/nsBlockFrame.h >+ // Get the lines containing y-coord 'y', or nsnull if you must search >+ // all lines. If nonnull is returned then we guarantee that the lines' >+ // combinedArea.ys and combinedArea.yMosts are non-decreasing. >+ nsLineBox* GetFirstLineContaining(nscoord y); Comment should probably say "Get the first line containing..." r+sr=bzbarsky with the changes to factor out shared code and other nits addressed.
Attachment #143083 - Flags: superreview?(dbaron)
Attachment #143083 - Flags: superreview+
Attachment #143083 - Flags: review?(dbaron)
Attachment #143083 - Flags: review+
One other thing. The line >+const int MIN_LINES_NEEDING_CURSOR = 20; In that patch is inside a DEBUG ifdef. I don't think you want that. I _do_ think you want to make that a static const int, though.
> I assume there is no really good reason to wrap all that in a check that > nonDecreasingYs is still true since in most cases it _will_ be true and all > that is pretty cheap, right? That's my guess. > This seems to assume that "y" is within the YMost of one of our lines (hence > that it's in our rect). We should probably assert that up front. That's not necessarily true (e.g., a DIV with CSS width and height but a small number of small lines), but it doesn't matter. If y is below the YMost of the last line, we'll leave the cursor at the last line and return that. The loops that use the cursor will still work fine. I'll comment GetFirstLineContaining to indicate that sometimes it may return the last line even if that line doesn't contain the y. > I really wonder whether we can pull the logic for all this out into a function > that will get that inner bit (the part starting with if > (lineArea.Contains(tmp)) as a function pointer param or something.... I guess > that's hard because here we actually need to return a value. <sigh>. Yeah, that's a bit of a hassle. Also lots of parameters need to be threaded down into the enclosed code. My inline functions that I created to extract the common inner code are PaintLine with 8 parameters and GetFrameFromLine with 6 parameters. I'll attach the patch for checkin.
Attached patch patch to check in (deleted) — Splinter Review
Attachment #143083 - Attachment is obsolete: true
roc, see comment 66....
Oops, thanks for that catch. Fix checked in :-)
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: