97157 - Find in page with a large <pre> is very very slow

Reporter

Description

•

23 years ago

This is a companion to bug 31770 and to bug 95461.

Try to do a Find in the lxr page above....

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

23 years ago

Severity: normal → major

Keywords: mozilla1.0

Kathleen :Brade

Updated

•

23 years ago

Keywords: perf

Cathleen

Updated

•

23 years ago

Blocks: 91351

kinmoz

Comment 1

•

23 years ago

Cc sfraser.

Priority: -- → P3

Target Milestone: --- → mozilla0.9.5

kinmoz

Updated

•

23 years ago

Status: NEW → ASSIGNED

G. Todd

Comment 2

•

23 years ago

Posting a URL for reference:

http://www.mozilla.org/performance/mallocsites.html

Right on mozilla org ;-)  

On an older machine (PII-233 mhz) on Linux 2001083016 this page causes the
browser to black out the windows and become unresponsive for several minutes. At
several intervals the page renders and appears for a few seconds intermittently.

kinmoz

Comment 3

•

23 years ago

0.9.6 for now.

Target Milestone: mozilla0.9.5 → mozilla0.9.6

Dean Tessman

Comment 4

•

23 years ago

This works for me now that the fix for 31770 was checked in.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 5

•

23 years ago

This bug is still there; Kin is planning on a rewrite of TextServicesDocument to
solve it.  On an lxr of nsCSSFrameConstructor.cpp (monster file), Finding the
second occurance of "HTMLAttribute" (after finding the first) takes ~20 seconds
on my build.  (The first occurance is on line 20 or so, the second is on line
1200ish.  Finding the first takes about as long as the second.)  The fix for
31770 will have made this faster than it was, perhaps, but it's still REALLY
slow on large <pre>s.

Dean Tessman

Comment 6

•

23 years ago

Huh.  It only takes two seconds to go from the first instance, at line 32, to
the second instance, at line 1256.  I'm using 2001092403 on Win2K, on a PII-500.
 Considering this is a 12000-line file, I find this speed to be acceptable.  Not
ideal, a vast improvement.  If there's more work coming to speed this up even
more then great, I guess.

kinmoz

Updated

•

23 years ago

Blocks: 106961

kinmoz

Comment 7

•

23 years ago

--> Mozilla0.9.9

Trying to load-balance bug list across milestones. I will pull this bug in to a 
sooner milestone if I can.

Target Milestone: mozilla0.9.6 → mozilla0.9.9

Akkana Peck

Assignee

Comment 8

•

23 years ago

On Kin's request, I'm looking into this.

Assignee: kin → akkana

Status: ASSIGNED → NEW

Target Milestone: mozilla0.9.9 → mozilla0.9.8

Akkana Peck

Assignee

Comment 9

•

23 years ago

Attached patch Preliminary patch: Much faster find (obsolete) (deleted) — Details — Splinter Review

Akkana Peck

Assignee

Comment 10

•

23 years ago

It's time to start soliciting comments.  It seems to be working fairly well for
my small tests so far.  In a debug build, my worst-case test (search to the end
of the lxr page for nsCSSFrameConstructor.cpp), it takes 1/4 the time of the old
find (3 seconds vs. 12 on an Athlon 1600); for incremental finds the gain is
substantially larger.

As is, this replaces the find used by the browser.  XUL/JS code to make the
editor use the new find will be showing up momentarily in a separate bug.

Comment about the implementation, which reviewers need to know:
First, the code I used for IsBlock gets that information from the parser
(technically the "right" owner of this information).  However, to do this
introduces a dependency from embedding/components/find to parser and necko. 
This may not be kosher.  It's easy enough to do what everybody else does and
just duplicate the list of nodes thought to be block, and maintain it ourselves;
this is ugly but introduces no dependencies, and I can easily rewrite it that
way if the dependencies are a problem.

All the TEXT_SVCS_TEST stuff in nsWebBrowserFind is code I'm using for
performance testing -- I can switch back and forth between old and new find by
setting an environment variable (the timing stuff will probably only work on
Unix).  My plan is to remove it all before checking in, but until then I want to
leave it in to be able to get easy performance comparison numbers.

Other limitations/issues as noted in comments in the code.

Simon, I know it's a lot of code, but do you have any cycles to glance at this?
 Even if you don't have time for a code review, I'd appreciate comments on the
embedding dependency issue and any other concerns you might have; if you do have
time to look at the code it would be much appreciated.

Of course, anyone else on the cc list is also welcome to review or offer comments.

Status: NEW → ASSIGNED

Joe Francis

Comment 11

•

23 years ago

I will be glad to review this when I get back in early January.

Akkana Peck

Assignee

Comment 12

•

23 years ago

Note: the string APIs have changed again, so line 357 of nsFind must change to read:
    ToLowerCase(patStr);
I'll assume I don't have to make a new patch just for that.
Still looking for reviews ...  This needs to go in by next Wednesday if it's
going to make the milestone.  Please?

Simon Fraser [no longer active]

Comment 13

•

23 years ago

How final is the patch? I'll review when you say it's ready to go in  ;)

Akkana Peck

Assignee

Comment 14

•

23 years ago

Thanks!  As far as I know, it's ready (modulo the #ifdef'ed testing code which
should probably be removed).  There may be issues I've missed, though, hence the
plea for reviews to get more sets of eyes on it.

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 15

•

23 years ago

I'll try to look at it also this weekend

Akkana Peck

Assignee

Comment 16

•

23 years ago

Attached patch Better patch: should build on windows, includes mac build xml (obsolete) (deleted) — Details — Splinter Review

Here's another patch which I think combines Kin's changes and mine.
I've included the mods to the macbuild xml file, and Kin's changes to make it
build on windows.
Kin says it's not working right on Windows.  Unfortunately my build isn't
working today (for possibly unrelated reasons) so we haven't gotten to the
bottom of this yet.  Reviews are ongoing and I do still need reviews, and would
love to have an indication of whether these XML files do the right thing on the
mac build.

Attachment #62247 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 17

•

23 years ago

I want to see this in, if at all possible, for 0.9.8.

Comments:
--   Use nsnull instead of 0 in things like "mRange = 0" (for any pointer,
especially an nsCOMPtr).

--   nsFind::SetRange() doesn't create an iterator if it wasn't already created
(could crash).  Since you have code elsewhere for creating the iterator, you may
want to break that into a private method.

--   Minor: don't we have simpler ways of defining straightforward attribute
methods?

--   In AddIterNode(), you drop whitespace-only nodes.  Won't this cause
problems if you have something like this: <b>foo</b> <i>bar</i> and search for
"foo bar"?  Similarly for cases of searching for multiple whitespaces where
something breaks the whitespace into a separate node.

+    // nsIContentIterator.h says Next() will return error at end,
+    // but it doesn't really, so we have to check:
-- You should patch the nsContentIterator.h file if it lies

-- There are a couple of tabs; kill them

Other than that, looks quite sane.  Correct those, and r=rjesup@wgate.com.  We
should get r/sr's from Kin and JFrancis too I think.

Akkana Peck

Assignee

Updated

•

23 years ago

Blocks: 116405

Joe Francis

Comment 18

•

23 years ago

Attached patch updated diffs for EmbedComponents.xml (deleted) — Details — Splinter Review

I had some trouble with the mac project canges, so I created a new diff for the
.xml file.

Akkana Peck

Assignee

Comment 19

•

23 years ago

Attached patch Latest patch (deleted) — Details — Splinter Review

Here's a new patch, based on inputs from rjesup, kin, jfrancis.
It addresses:
- the issues rjesup brought up
- some build issues on mac and windows
- Joe's mac build xml file
- check pref "browser.new_find" (off by default) and only use the new behavior
if that's set
- don't save state between finds; instead, find from the selection position
- climb up to find nearest block parent and compare with previous one, and
clearQ if they're different (because iterator doesn't give us the parent node
in both directions)
- comment out iterator pre order since it doesn't currently work, and so don't
SkipNode in forward direction

I think this one should be safe to check in.  Joe's going to look at it after
dinner.

Attachment #64913 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Reporter

Comment 20

•

23 years ago

Looks good.  What's the answer to my paranoid worry:

--   In AddIterNode(), you drop whitespace-only nodes.  Won't this cause
problems if you have something like this: <b>foo</b> <i>bar</i> and search for
"foo bar"?  Similarly for cases of searching for multiple whitespaces where
something breaks the whitespace into a separate node.


So long as you have an answer for that (I'm probably mis-understanding the
effect fo dropping whitespace nodes), r=rjesup@wgate.com

Randell Jesup [:jesup] (needinfo me)

Reporter

Updated

•

23 years ago

Attachment #65181 - Flags: review+

Joe Francis

Comment 21

•

23 years ago

r=jfrancis for initial landing, with preference set to disable new find code
pending further fixes.

Joe Francis

Comment 22

•

23 years ago

i meant to add to above comment that i did build testing and runtime testing on
mac for this.

Boris Zbarsky [:bzbarsky]

Comment 23

•

23 years ago

A thought.  You need to find whether the _node_ is a block or whether the node's
_frame_ is a block?  Would it not make more sense to get the primary frame for
the content and check whether it's a percentage base?

Akkana Peck

Assignee

Updated

•

23 years ago

Blocks: 80805

Akkana Peck

Assignee

Comment 24

•

23 years ago

The patch (with a few last-minute tweaks for problems Joe discovered) has gone
in, preffed out by default (because it has some problems that still need to be
worked out).  To turn it on, user_pref("browser.new_find", true).  On Unix
platforms it will print out timing information, so you can do timing tests by
flipping the pref and noting the times.

Bumping the bug to the next milestone for getting it turned on, getting all the
issues worked out and removing the old code and the old files.

Target Milestone: mozilla0.9.8 → mozilla0.9.9

Akkana Peck

Assignee

Updated

•

23 years ago

Blocks: 95461

kinmoz

Comment 25

•

23 years ago

Some problems I'm seeing with new find turned on:

1. It seems to only work with the page that exists when you *first*
   bring up the find dialog ... if you visit another page and leave
   the find dialog up, it stops working, probably due to the fact that
   nsWebBrowserFind::SearchInFrame() doesn't reset the doc used in mFind.

2. Searching for a string like "read Mozilla" on www.mozilla.org finds
   the sub string in:

      For more info about us, read <a href="foo">Mozilla at a Glance</a>

   but only the word "read" is hilited.

3. If I load the lxr page for nsCSSFrameConstructor.cpp, and search for
   "pseudo", it seems to find the first occurrence, but subsequent finds do
   nothing ... no not found dialog, no hiliting or scrolling of the next
   "pseudo" word. The first "pseudo" word remains hilited. This is probably
   because find is not starting it's search after the end of the current
   selection when going forward. (Note it should start before the current
   selection when searching backwards too.)

kinmoz

Comment 26

•

23 years ago

Just jotting down some of the notes I mentinoed to akk when looking at the first
pass impl of the nsIFind interface ...

- I really think we should remove dependencies on doc, presshell, and selection
  from this low-level interface, it would allow us to use the implementation
  on different types of DOM based trees like the ones used in text and other
  types of widgets. It would also allow things like spellcheckers to find all
  occurrences of a specific word, without disturbing the current selection.
  It would also open up a bunch of possibilities for using find via JS for
  things that we haven't thought of yet.

- We should remove wrapping from nsIFind and let the app/web-browser-find
  handle it, or provide an app-level convenience interface that implemented
  it. I say this because some apps want more control or notification before
  a search actually wraps within a particular doc. The example that comes to
  mind would be searching through a multi-frame set of docs ... some apps may
  prefer to search from a certain point within a one of docs in a frame, let
  the find proceed through all the frames after it, then wrap to the top frame,
  continuing till you hit the doc we started in. 

- The find method should probably take 2 ranges, one that says only search
  between these 2 endpoints, and the other an indicator of where to start the
  search from.

- Lose the references to replace and let the app handle it.

Akkana Peck

Assignee

Comment 27

•

23 years ago

Randell: your paranoid worry is probably correct.  In fact, that attempt at
optimization is probably not safe and should be removed.  I'll do that in the
next iteration while tracking down the other issues that have come up.

Re the document not being changed: nsWebBrowserFind has code that supposedly
changes the document if the frame isn't the same as before ... but it doesn't
work (that != never actually fires).  I wonder if that's what Simon was trying
to debug when he put the big #ifdef DEBUG_smfr right near there?  The old code
works because it resets the document for each find anyway; I hope to avoid that
and only reset when the document has actually changed (which I can probably do
by storing a weak pointer to the document in nsWebBrowserFind if storing the
frame doesn't do the trick).

Agree with Kin that we should offer a low-level interface that doesn't require
knowing about the selection, pres shell, etc.  I'll be doing that in the next
iteration (there wasn't time last night).  I also agree about removing the
wrapping from the low-level interface.  The tentative plan is that
nsIWebBrowserFind will be the easy-to-use interface that takes docshells and
handles wrapping, and nsIFind will be the low-level interface that just returns
a range.

Akkana Peck

Assignee

Comment 28

•

23 years ago

Boris: I've been warned that GetPrimaryFrameFor is very slow and should be
avoided when possible.  But of course, one should always test assertions like
that rather than believing them blindly, so it may be worth trying.

Boris Zbarsky [:bzbarsky]

Comment 29

•

23 years ago

GetPrimaryFrameFor can be slow, yes.... Especially on table elements.

My concern is what happens if the page contains something like 

<span style="display:block">
  Some random
</span>
<span style="display:block">
  text in here
</span>

Will Find end up finding "random text" in such a page?  :)

Joe Francis

Comment 30

•

23 years ago

just documenting some old news so we don't forget: for a good time, search
www.mozilla.org/start for "a page".  The behavior here may shed some light on
difficulties with the find algorithm.

Chris

Comment 31

•

23 years ago

Could this in some way be releated to a bug i reported here:
http://bugzilla.mozilla.org/show_bug.cgi?id=98835

Some of the behaviour between this and my bug is identical. The high CPU usage,
eating up a _lot_ of ram. 

Also after visiting the page that was given as test case for this bug, i also
noticed the same behaviour as with my bug. Memory is not released, lots of CPU
cycles spend on seemingly nothing (nothing visual).

Also when going to the test page, back here, and to the test page again, keeps
increasing the memory usage of the browser.

As i wrtite this, mozilla is using 140Mb of memory, and i've only visited 2
pages plus twice the test page here, plus once the test page i included in my
bug report.

sairuh (rarely reading bugmail)

Comment 32

•

23 years ago

mass moving open bugs pertaining to find in page/frame to pmac@netscape.com as
qa contact.

to find all bugspam pertaining to this, set your search string to
"AppleSpongeCakeWithCaramelFrosting".

QA Contact: sairuh → pmac

Akkana Peck

Assignee

Updated

•

23 years ago

Blocks: 122061

Akkana Peck

Assignee

Comment 33

•

23 years ago

The problem with the mozilla start page was a problem with the way the algorithm
skipped multiple adjacent whitespace.  When we hit the first non-space after a
block of whitespace, we weren't advancing to the next non-whitespace character
in the pattern string.

While I was there, I noticed a possible buffer over/underrun in the "search to
the  last space" code, so I also fixed that.

Can I get a review for that?  I'll work separately on the problems moving
between pages -- I need to restructure the high level/low level code as Kin and
I discussed.

Akkana Peck

Assignee

Comment 34

•

23 years ago

Attached patch Patch: fix problems skipping over whitespace (deleted) — Details — Splinter Review

Akkana Peck

Assignee

Comment 35

•

23 years ago

For anyone interested: issues with the API splitting will be addressed in newly
opened bug 123087 (this bug was getting cumbersome and it was hard to keep track
of what needed to be done).

Syd Logan

Updated

•

23 years ago

Keywords: nsbeta1+

Akkana Peck

Assignee

Comment 36

•

23 years ago

The API split is in.  This bug is now tracking turning the feature on by default.

Akkana Peck

Assignee

Comment 37

•

23 years ago

It's on by default!

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Akkana Peck

Assignee

Comment 38

•

23 years ago

Clarification: it's on by default in the browser. Working on the editor
find/replace ... bug 80805.

Alec Flett

Comment 39

•

23 years ago

is there another bug on ripping out the old find code then?

Akkana Peck

Assignee

Comment 40

•

23 years ago

Just filed bug 126312 -- thanks for the reminder.

Patty Mac

Comment 41

•

23 years ago

Akkana, is this one really fixed? 
1. Click on the url that lists on comment#2,
http://www.mozilla.org/performance/mallocsites.html
it still hangs. (windows 2000, commercial build: 2002-02-22-10-trunk).

2. Click on the orginal url:
http://lxr.mozilla.org/seamonkey/source/layout/html/style/src/nsCSSFrameConstructor.cpp
 Then, quickly, hit ^g for several times in "find in this page" dialog with this
text: nsobject, it crashes. This probably is a different issue so I already 
filed it:  http://bugzilla.mozilla.org/show_bug.cgi?id=127295

Let me know. Thanks!

Akkana Peck

Assignee

Comment 42

•

23 years ago

Patty: can you check your prefs (prefs.js and user.js) and make sure you don't
have "browser.new_find" set to false somewhere?

With a debug build from today, nsCSSFrameCnstructor.cppI get:
new find, first nsobject: 0.1 sec
new find, second (not found): 1.3 sec
old find, first nsobject: 5.8 sec
old find, second (not found): 11.1 sec

I don't see the crash, but I'm not sure I'm trying the right way to reproduce it
-- accel-G doesn't seem to do anything for me, either in the find dialog or in
the browser window when the find dialog is up, so I just tried hitting return in
the find dialog several times.  Any chance you can get a stack trace for the
crash?  Does it happen repeatably when new_find is true (the default now) but
not when it's false, or does it happen either way?  Does it happen on earlier
builds?

Patty Mac

Comment 43

•

23 years ago

Thanks Akkana! Verified on windows (commercial netscape build: 2002-02-27-06-trunk)
new find, first nsobject: 0.01 sec
new find, second (not found): 0.09 sec

Status: RESOLVED → VERIFIED

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Core → Mozilla Application Suite

timeless

Updated

•

16 years ago

Component: XP Apps: GUI Features → UI Design

Preliminary patch: Much faster find 23 years ago Akkana Peck (deleted), patch		Details \| Diff \| Splinter Review
Better patch: should build on windows, includes mac build xml 23 years ago Akkana Peck (deleted), patch		Details \| Diff \| Splinter Review
updated diffs for EmbedComponents.xml 23 years ago Joe Francis (deleted), patch		Details \| Diff \| Splinter Review
Latest patch 23 years ago Akkana Peck (deleted), patch	jesup : review+	Details \| Diff \| Splinter Review
Patch: fix problems skipping over whitespace 23 years ago Akkana Peck (deleted), patch		Details \| Diff \| Splinter Review