Bugzilla

Comment 2

•

24 years ago

marking [dogfood+] per karnaze.

Status: NEW → ASSIGNED

Whiteboard: [dogfood+]

kinmoz

Comment 3

•

24 years ago

Adding kin@netscape.com to cc list.

Comment 4

•

24 years ago

One way to solve this would be to keep line boxes sorted by their y-coordinate once we passed a threshold (say, 100 line boxes in a block frame). In this case, we'd use the sorted data structure to find reasonable points to start and stop searching for a frame that contains the point. Does this seem reasonable?

Simon Fraser [no longer active]

Comment 5

•

24 years ago

If you do something like that you have to be careful of line boxes that contain blocks that contain floats or absolutely positioned elements (? - they may not be a problem b/c of views) or blocks with negative margins.

saari (gone)

Comment 6

•

24 years ago

Completely off the wall idea. As we render into the back buffer, or to the screen, whatever, we keep a map for each pixel and what frame rendered it. A classic memory vs speed trade off. Probably a post 6.0 thing, but I thought I'd throw it out there.

Reporter

Comment 7

•

24 years ago

For *each pixel*? Isn't that rather a lot of memory?

saari (gone)

Comment 8

•

24 years ago

You don't have to keep an entry for each and every pixel, just regions/rects. And even if you did, 32bits per address is the same as the 32bits per pixel, so it is just like having a second buffer. Is that a lot of memory? Yes. In compairison to our footprint? No comment. The idea is that you can make a much smaller search space because you already know what frame a pixel corresponds to since you did the work to put it there in the first place.

Comment 9

•

24 years ago

It looks like nsContrainerFrame::GetFrameForPointUsing() does most the heavy lifting, and it does a simple depth-first search. If instead of this, it tried to be smart about which child to examine, it might speed up considerably. We know the bounding rect of each child, and I think we have a flag on each child that says whether it's children all live within that rect or whether they can extend outside of it (which I think it rare.) So, we should be able to make a good guess about which child is likely to contain the point, rather than doing the blind tree walk. It might be reasonable to keep an auxillary data structure to facilitate this, but offhand I can't think of anything that adds considerable value to the existing frame tree. The space manager exists to map regions to data, so if we wanted to try saari's suggestion, the space manager could be the foundation for mapping regions to frames.

Comment 10

•

24 years ago

We already use the flag that says whether frames have outside children. See, for example, nsContainerFrame::GetFrameForPoint, which begins: 244 PRBool inThisFrame = mRect.Contains(aPoint); 245 246 if (! ((mState & NS_FRAME_OUTSIDE_CHILDREN) || inThisFrame ) ) { 247 return NS_ERROR_FAILURE; 248 }

Comment 11

•

24 years ago

Yo, Chris: if you're busy, I'm happy to take this one from your plate. I've done this code before in prior layout systems.

Comment 12

•

24 years ago

go for it.

Assignee: waterson → rickg

Status: ASSIGNED → NEW

Comment 13

•

24 years ago

Ooh-wee! Let's see what I can do...

Status: NEW → ASSIGNED

Comment 14

•

24 years ago

Buster and I discussed the frame system (again), and concluded that there may be cases where a child is floated, extending beyond it's parent/grandparent. There are bits (which may not be getting propagated) which indicate this case. In a nutshell -- it's very possible that layout isn't yet smart enough for a solution based on this state to succeed. Therefore, I'm proceeding with a sneaky hack (I can sense the snide remarks forming already). I'm going to build a tiny little MRU frame cache, upon the premise that there is locality of scope to selection and targeting events. The hit rate will be a function of 1) cache size; 2) locality of event. I'll post more after I've tried it.

Comment 15

•

24 years ago

The bit that says a frame has outside children *does* work. If it didn't, there would be a lot of cases where events didn't get through, because we use that bit now. (What we don't know is the extent of the overflow.)

Comment 16

•

24 years ago

Trying cache now...

Comment 17

•

24 years ago

The simple cache approach looks promising so far (on visual tests); I'm trying to run quanity now to prove the performance improvement.

Arthur Barrett

Comment 18

•

24 years ago

*** Bug 53676 has been marked as a duplicate of this bug. ***

Arthur Barrett

Comment 19

•

24 years ago

Add me to cc: This impacts me in a very noticable way when a <textarea> multiline textfield has LOTS of lines in it. Adding a printf("blah"); to nsFrame::GetFrameForPoint() showed it just going through time and again and again and again and again...

Comment 20

•

24 years ago

Quantify shows a 2x speedup in the most trivial case. It's my hope that we'll do even better on more complex pages (just guessing though). I've handed the (rather simple) diffs off to attinasi to review and stress test (he's more familiar with frame code then I am).

Assignee: rickg → attinasi

Status: ASSIGNED → NEW

Marc Attinasi

Comment 21

•

24 years ago

Accepting: I'll be testing Rick's fixes for starters.

Status: NEW → ASSIGNED

Marc Attinasi

Comment 22

•

24 years ago

I have the code changes that Rick put together, however I may not get to this very soon due to higer prioroty RTM+ bugs in my list, so if somebody else has time and wants to shepard this through, please let me know and I'll happliy send the package over to you to finish testing and shepard throught the approval / checkin process.

Marc Attinasi

Comment 23

•

24 years ago

'Cause he forced me, back to Rick.

Assignee: attinasi → rickg

Status: ASSIGNED → NEW

Arthur Barrett

Comment 24

•

24 years ago

My TEXTAREA example seems to be MUCH improved since my last post (tried with current tree sources). I have not tried the printf() debug again, but I would almost dare to say my problem (as observed) has been fixed. Have any changes landed from this bug ? Status anyone ? Has anyone else tried their test set lately ?

janne hayrynen

Comment 25

•

24 years ago

check url from bug 52005, "poor performance scrolling large plain text files": ftp://ftp.funet.fi/pub/doc/dictionaries/Finnish/words.finnish on this page, mouse movements generate an _extraordinary_ amount of GetFrameForPoint() calls. i put a printf in that function to see how often it gets called, and unlike on other pages i tried, on this large text file, xterm can't even keep up. you really want to check that out, it's quite drastic.... :I

gerardok

Comment 26

•

24 years ago

This bug was marked to be fixed in a previous milestone but it didn't get fixed properly. Nominated for beta1.

Keywords: nsbeta1

Comment 27

•

24 years ago

perf issues bubbling to the top. the mru cache never got checked in,and unfortunately, rickg and marc both seem to have lost the code. :( so I'm starting over with that idea

Assignee: rickg → buster

Severity: normal → major

OS: Mac System 8.5 → All

Priority: P3 → P2

Target Milestone: --- → mozilla0.9

Comment 28

•

24 years ago

In the future, please to be attaching the proposed patches to the bug report.

Comment 29

•

24 years ago

that's ok, my new implementation is undoubtedly better than whatever hack those two clowns cobbled together :) fix to be attached soon. testing and measuring now...

Status: NEW → ASSIGNED

Comment 30

•

24 years ago

I have the cache for GetFrameForPoint(). Attaching diff. attinasi, please review. waterson, please sr. still a work in progress, but it is currently good enough to remove GetFrameForPoint() as an issue in perf measurements.

Comment 31

•

24 years ago

Attached patch diffs for new cache (obsolete) (deleted) — Details — Splinter Review

Comment 32

•

24 years ago

Could you attach unified diffs?

Comment 33

•

24 years ago

What dbaron said! ;-) From what I could read... - In the MRU cache's Search() method, don't you want to move the `found frame' to the front of the cache? - How did you arrive at `20' for the magic number? Seems a bit high to me, mostly because that's a lot of deque entries to grovel through on a miss...

Comment 34

•

24 years ago

the diffs I've attached greatly reduce the cost for a hit. With a cache of size 10 (approximately 200 bytes), I consistently get a hit rate of 50-60% for simple mouse and selection actions over text and standard content blocks. But, this gives us no improvement whatsoever in the case where the point is over the "dead space" of a container. That is, when the point is not over a leaf frame, the cache is useless.

Summary: Getting the frame for a window coordinate is inefficient → Getting the frame for a window coordinate is inefficient (GetFrameForPoint)

Comment 35

•

24 years ago

So does this cache make the assumption that if a point is within a cached leaf frame, then that leaf frame is the correct result? That's a bad assumption.

Comment 36

•

24 years ago

about to attach -u diffs. concerning Chris' questions: - In the MRU cache's Search() method, don't you want to move the `found frame' to the front of the cache? The cache as a FIFO queue, and the "found frame" is being put at the "back", so that it will live the longest. Is that what you meant? You can verify that by running MRUFrameCache::SelfTest() and looking at the console output. For example: added first MRU_FRAME_CACHE_SIZE items item 0 = 00000001 (0, 0, 10, 10) item 1 = 00000002 (0, 10, 10, 10) item 2 = 00000003 (0, 20, 10, 10) item 3 = 00000004 (0, 30, 10, 10) item 4 = 00000005 (0, 40, 10, 10) item 5 = 00000006 (0, 50, 10, 10) item 6 = 00000007 (0, 60, 10, 10) item 7 = 00000008 (0, 70, 10, 10) item 8 = 00000009 (0, 80, 10, 10) item 9 = 0000000A (0, 90, 10, 10) added n+1th item item 0 = 00000002 (0, 10, 10, 10) item 1 = 00000003 (0, 20, 10, 10) item 2 = 00000004 (0, 30, 10, 10) item 3 = 00000005 (0, 40, 10, 10) item 4 = 00000006 (0, 50, 10, 10) item 5 = 00000007 (0, 60, 10, 10) item 6 = 00000008 (0, 70, 10, 10) item 7 = 00000009 (0, 80, 10, 10) item 8 = 0000000A (0, 90, 10, 10) item 9 = 0000000B (0, 100, 10, 10) added n+2th item item 0 = 00000003 (0, 20, 10, 10) item 1 = 00000004 (0, 30, 10, 10) item 2 = 00000005 (0, 40, 10, 10) item 3 = 00000006 (0, 50, 10, 10) item 4 = 00000007 (0, 60, 10, 10) item 5 = 00000008 (0, 70, 10, 10) item 6 = 00000009 (0, 80, 10, 10) item 7 = 0000000A (0, 90, 10, 10) item 8 = 0000000B (0, 100, 10, 10) item 9 = 0000000C (0, 110, 10, 10) - How did you arrive at `20' for the magic number? Seems a bit high to me, mostly because that's a lot of deque entries to grovel through on a miss... Yeah, the comment by the number says: #define MRU_FRAME_CACHE_SIZE 10 // fairly arbitrary, needs to be tuned Notice I have it set at 10 now. It's easy to fool around with the markup and the user action (this is, after all, an optimization primarily targeted at speeding up interactive mouse events) to get cache hit rates all over the board. By far, the biggest payoff is at cachesize=1, since mouse move events come in fast enough for us to see a hit rate of nearly 50% based on just remembering the latest hit. That's true when we consider only mouse events over leaf elements, see my previous comments for the problems of "dead space" points.

Comment 37

•

24 years ago

David: I haven't yet found a case where it isn't true. But I'm very early on in my testing. Please elaborate, what case are you thinking about? Positioned elements?

Comment 38

•

24 years ago

buster: typically an MRU algorithm will promote items that it finds in the cache so that they tend to stay in the cache. I don't think that Search() does this, does it?

Comment 39

•

24 years ago

no, not yet. maybe never, it might be that FIFO is good enough. still testing. good catch, though. I should have documented that, especially given the name I chose for the class!

Comment 40

•

24 years ago

Attached file a testcase where that assumption is not valid (deleted) — Details

Comment 41

•

24 years ago

Attached patch better diffs, -u, more comments and some fixes from feedback (not dbaron's yet) (deleted) — Details — Splinter Review

Comment 42

•

24 years ago

Just for reference, bug 26785 used a similar testcase.

Comment 43

•

24 years ago

this cache works great for simple documents. as david points out, it fails miserably for documents with overlapping elements, as may be caused by negative margins or positioned elements. as a first step, I will look into what it would take to determine whether the document contains any of these conditions, basically a bool that says whether to use the cache nor not. The most expediant way to know this might be a hook in the style system, so if the style system ever resolves a rule that might cause an overlap, we just don't use the cache (regardless of whether that rule is ever actually ever used by an instantiated element.) This would give us the performance boost we're looking for on the vast majority of pages in the short term, and we could investigate alternatives post-6.5.

Comment 44

•

24 years ago

Another approach to speeding up GetFrameForPoint might be to add a state bit on nsLineBox indicating whether there were overflowing children and then overriding GetFrameForPointUsing on nsBlockFrame (for the aList==null case, and use the inherited one for the others). This would avoid descending into 99%+ of the lines in cases with a single block with a huge child list (like the one above), which are probably the only cases where we have significant GetFrameForPoint performance problems.

Comment 45

•

24 years ago

taking

Target Milestone: mozilla0.9 → mozilla0.9.1

Comment 46

•

24 years ago

really taking.

Assignee: buster → waterson

Status: ASSIGNED → NEW

Keywords: patch

Updated

•

24 years ago

Status: NEW → ASSIGNED

Comment 47

•

24 years ago

I wonder if there's something weird going on in the test case for this bug, like every frame having 6 twips overflow...

Updated

•

24 years ago

Target Milestone: mozilla0.9.1 → mozilla0.9.2

Updated

•

24 years ago

Target Milestone: mozilla0.9.2 → mozilla1.0

Asa Dotzler [:asa]

Updated

•

23 years ago

Blocks: 104166

Asa Dotzler [:asa]

Comment 48

•

23 years ago

Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 (you can query for this string to delete spam or retrieve the list of bugs I've moved)

Target Milestone: mozilla1.0 → mozilla1.0.1

Christopher Hoess (gone)

Updated

•

23 years ago

Attachment #21692 - Attachment is obsolete: true

Updated

•

23 years ago

Target Milestone: mozilla1.0.1 → Future

Comment 49

•

21 years ago

*** Bug 235517 has been marked as a duplicate of this bug. ***

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 50

•

21 years ago

roc had some ideas about this in bug 235517: In particular, can't we make nsBlockFrame::GetFrameForPoint to iterate through the lines, scanning the frames within a line only if the point is inside the line's combinedArea? I bet that would speed things up a lot. Actually I'm sure this would work because that's how nsBlockFrame::Paint works. If we were really cool we would keep a bit in each nsBlockFrame, and set it to true if we detect that the sequence of 'line->combinedArea.y's is nondecreasing and the sequence of 'line->combinedArea.YMost's is nondecreasing. We would also cache somewhere a pointer to the most recently accessed line. Then we can quickly find the complete set of line(s) for a given point by starting at the cached pointer and searching forwards or backwards. I remember at one time thinking this could speed up painting on large pages too. [editorial note -- the painting issue is bug 52005]

Blocks: 52005

Component: Layout → Layout: Block and Inline

Priority: P2 → --

Target Milestone: Future → ---

Assignee

Comment 51

•

21 years ago

Attached patch Implement by-line event dispatch (obsolete) (deleted) — Details — Splinter Review

This implements the first part of my suggestion. It's quite straightforward and seems to work correctly. Any suggestions for how to measure the performance effect of this?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 52

•

21 years ago

BTW, a note about the cursor idea I proposed in bug 52005 and here: It's probably not hard to do. We make the line cursor a frame property. We keep a state bit that tells us quickly whether we have such a property. When there is no property, Paint or GetFrameForPointUsing iterate through all lines, just like they do now. During this iteration they also check whether the line boxes combinedArea.ys and combinedArea.YMosts are nondecreasing. If so, and there are many lines (say more than 50), then they set the line cursor property to some value, say the first line. right?) When the property is available (and thus the nondecreasing invariants hold), Paint and GetFrameForPointUsing will locate the lines they're interested in based on coordinates by searching from the cursor forward or backward and storing the last line they touch back into the cursor property. Reflow just always clears the property. (We can't create or destroy lines, or change their combinedareas, except during reflow and frame teardown, right?) This would make for fairly simple code and I think negligible space and time overhead.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 53

•

21 years ago

I just tried this patch on the testcase in bug 235517 (seeing how many profiler hits I got in a minute of moving the mouse, so it's kinda crude). I see at best a 10% improvement and the cpu still gets pegged when I move the mouse; with your patch 80% of the profiler hits are in (not under) nsBlockFrame::GetFrameForPointUsing. So the real problem is still the O(N) loop over all lines....

Assignee

Comment 54

•

21 years ago

Okay. thanks

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 55

•

21 years ago

Attached patch line cursor cache (obsolete) (deleted) — Details — Splinter Review

This patch implements the previous stuff plus adds the line cursor cache that I suggested. Seems to work OK once I explicitly skipped lines with empty combinedAreas when determining whether the invariant holds. My tests seem to show that it's effective on the Postgres manual testcase; instead of scanning >18K lines in the main DIV on every mouse move, we usually scan 3. bz, would you mind checking out the profile performance of this one? If it looks good I'll finish the patch by using/setting the cursor in nsBlockFrame::PaintChildren.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

21 years ago

Assignee: waterson → roc

Attachment #142397 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 56

•

21 years ago

With that patch on the testcase in bug 235517 (same test as in comment 53 -- 1 minute wall-clock time) I see a factor of 6 reduction in the number of profiler hits (since this is a non-realtime profile, that's actually meaningful). The top of the flat profile now looks like: Total hit count: 322 Count %Total Function Name 10 3.1 nsXULElement::HandleDOMEvent 8 2.5 nsQueryInterface::operator() 7 2.2 __libc_poll 6 1.9 SearchTable 5 1.6 _end 5 1.6 __pthread_alt_unlock 5 1.6 nsRuleNode::GetStyleData and there are only 4 total hits under nsBlockFrame::GetFrameForPoint. So certainly looks much better. ;) I haven't looked at the patch itself; just applied it blindly and profiled. Once we have a finalized patch, we should file some followup bugs on what's left after this one is fixed -- of those 322 hits 154 are handling mousing on and off anchors (and 133 of those are updating the status bar!).

Assignee

Comment 57

•

21 years ago

Excellent. I'll finish the patch shortly. Will we really need to optimize this further? If you're waving the mouse around I'd expect to spend some time processing events and updating the status bar.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 58

•

21 years ago

Sure. There are just some issues that I wanted to look at (like the fact that setting the status flushes pending notifications). Probably not a big deal either way... ;)

Assignee

Comment 59

•

21 years ago

Attached patch v2 (obsolete) (deleted) — Details — Splinter Review

Here's the version which optimizes painting as well.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

21 years ago

Attachment #142976 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 60

•

21 years ago

Comment on attachment 143083 [details] [diff] [review] v2 let me know if you can handle this review

Attachment #143083 - Flags: superreview?(bzbarsky)

Attachment #143083 - Flags: review?(bzbarsky)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 61

•

21 years ago

Note: if we ever find important test cases where this doesn't work because the line combinedArea.ys/ymosts decrease at some non-empty line (e.g., because some line has some relative-positioned stuff), then we can still save the day. One way would be to actually grow the line combined areas as necessary to force that invariant to hold.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 62

•

21 years ago

*** Bug 52005 has been marked as a duplicate of this bug. ***

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 63

•

21 years ago

roc, I'm not sure I understand how lines, blockframes, out-of-flows all interact well enough to review this.... I could probably sr, but dbaron should probably do the r=.

Assignee

Comment 64

•

21 years ago

Comment on attachment 143083 [details] [diff] [review] v2 no problem. reassigning to dbaron.

Attachment #143083 - Flags: superreview?(dbaron)

Attachment #143083 - Flags: superreview?(bzbarsky)

Attachment #143083 - Flags: review?(dbaron)

Attachment #143083 - Flags: review?(bzbarsky)

Comment 65

•

21 years ago

Comment on attachment 143083 [details] [diff] [review] v2 OK, I did a little self-improvement... (and asked roc what was up with the check for an empty combined area ;) ) >Index: layout/html/base/src/nsBlockFrame.cpp > nsBlockFrame::PaintChildren(nsIPresContext* aPresContext, >+ for (line_iterator line = mLines.begin(cursor); >+ line != line_end; Weird indent there. >+ if (lineArea.Intersects(aDirtyRect)) { This if/else statement is duplicated in the "no cursor" case. Perhaps it should move into a helper function? That can be a nice inline function and all if we want, but it's bad to have identical code in two different spots... >+ for (line_iterator line = begin_lines(); >+ line != line_end; Weird indent. >+ if (lineArea.y < lastY >+ || lineArea.YMost() < lastYMost) { >+ nonDecreasingYs = PR_FALSE; >+ } >+ lastY = lineArea.y; >+ lastYMost = lineArea.YMost(); I assume there is no really good reason to wrap all that in a check that nonDecreasingYs is still true since in most cases it _will_ be true and all that is pretty cheap, right? >+nsLineBox* nsBlockFrame::GetFirstLineContaining(nscoord y) { This seems to assume that "y" is within the YMost of one of our lines (hence that it's in our rect). We should probably assert that up front. >+nsBlockFrame::GetFrameForPointUsing(nsIPresContext* aPresContext, >+ nsIFrame *hit; >+ nsPoint tmp; Move declarations to first use? Eg move the decl of tmp down to where the MoveTo is right now? >+ if (lineArea.Contains(tmp)) { Again, this block is identical to the one in the "no cursor" case... >+ PRBool nonDecreasingYs = PR_TRUE; I really wonder whether we can pull the logic for all this out into a function that will get that inner bit (the part starting with if (lineArea.Contains(tmp)) as a function pointer param or something.... I guess that's hard because here we actually need to return a value. <sigh>. >Index: layout/html/base/src/nsBlockFrame.h >+ // Get the lines containing y-coord 'y', or nsnull if you must search >+ // all lines. If nonnull is returned then we guarantee that the lines' >+ // combinedArea.ys and combinedArea.yMosts are non-decreasing. >+ nsLineBox* GetFirstLineContaining(nscoord y); Comment should probably say "Get the first line containing..." r+sr=bzbarsky with the changes to factor out shared code and other nits addressed.

Attachment #143083 - Flags: superreview?(dbaron)

Attachment #143083 - Flags: superreview+

Attachment #143083 - Flags: review?(dbaron)

Attachment #143083 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 66

•

21 years ago

One other thing. The line >+const int MIN_LINES_NEEDING_CURSOR = 20; In that patch is inside a DEBUG ifdef. I don't think you want that. I _do_ think you want to make that a static const int, though.

Assignee

Comment 67

•

21 years ago

> I assume there is no really good reason to wrap all that in a check that > nonDecreasingYs is still true since in most cases it _will_ be true and all > that is pretty cheap, right? That's my guess. > This seems to assume that "y" is within the YMost of one of our lines (hence > that it's in our rect). We should probably assert that up front. That's not necessarily true (e.g., a DIV with CSS width and height but a small number of small lines), but it doesn't matter. If y is below the YMost of the last line, we'll leave the cursor at the last line and return that. The loops that use the cursor will still work fine. I'll comment GetFirstLineContaining to indicate that sometimes it may return the last line even if that line doesn't contain the y. > I really wonder whether we can pull the logic for all this out into a function > that will get that inner bit (the part starting with if > (lineArea.Contains(tmp)) as a function pointer param or something.... I guess > that's hard because here we actually need to return a value. <sigh>. Yeah, that's a bit of a hassle. Also lots of parameters need to be threaded down into the enclosed code. My inline functions that I created to extract the common inner code are PaintLine with 8 parameters and GetFrameFromLine with 6 parameters. I'll attach the patch for checkin.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 68

•

21 years ago

Attached patch patch to check in (deleted) — Details — Splinter Review

Attachment #143083 - Attachment is obsolete: true