1303749 - Spell checker might be using sync IPC messages for every word.

Reporter

Description

•

8 years ago

jrmuizel just noticed this today: https://dxr.mozilla.org/mozilla-central/rev/f0f15b7c6aa77a0c5750918aa0a1cb3dc82185bc/extensions/spellcheck/hunspell/glue/PRemoteSpellcheckEngine.ipdl#15 which is called here: http://searchfox.org/mozilla-central/rev/f6c298b36db67a7109079c0dd7755f329c1d58e2/extensions/spellcheck/src/mozSpellChecker.cpp#139 I'm not too familiar with our spell checking infrastructure, but the implication seems to be that we're sending a sync IPC message per word within an editor. I don't actually know if this is the case, but I think it's probably something we want to investigate and understand. If we are actually sending sync IPC messages per word, we should probably try finding a way to not do that for performance reasons.

Mike Conley (:mconley) (:⚙️)

Reporter

Comment 1

•

8 years ago

ally, I remember you working on this stuff, and ehsan, I remember you reviewing it - is my read on this correct? Is it sync IPC message per word? Or something else?

Flags: needinfo?(ehsan)

Flags: needinfo?(a.m.naaktgeboren)

Ben Kelly [:bkelly, not reviewing]

Comment 2

•

8 years ago

This could be the cause of bug 1297725.

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1297725

Jeff Muizelaar [:jrmuizel]

Updated

•

8 years ago

Blocks: 1303751

Mike Conley (:mconley) (:⚙️)

Reporter

Updated

•

8 years ago

Blocks: e10s-perf

(no longer active)

Comment 3

•

8 years ago

Yes, this is one sync IPC message per word. This is dictated by this method being synchronous: <http://searchfox.org/mozilla-central/source/editor/txtsvc/nsISpellChecker.h#54> In general, our entire spell checking interface is composed of synchronous APIs, so doing anything else in the e10s case will require rewriting our spell checker to make it async friendly. Ally wasn't in a rush to do that (understandably so), and since this component is essentially unowned, I'm not aware of anybody else who would be up to do this task.

Flags: needinfo?(ehsan)

(no longer active)

Comment 4

•

8 years ago

Note that the other thing that hurts us here is our spell checker regularly over-checking larger areas of the DOM for spelling mistakes in response to smaller changes (such as changing one character.) It's not uncommon for us to double check paragraphs or more in that case. This is easily visible by watching the squiggly lines disappear and reappear in that etherpad test case, for example.

Jeff Muizelaar [:jrmuizel]

Comment 5

•

8 years ago

Why don't we spell check in the content process?

Mike Conley (:mconley) (:⚙️)

Reporter

Comment 6

•

8 years ago

Thanks for the context, ehsan. Just spitballing here (mostly re-saying stuff bkelly suggested), but is it possible to store our dictionaries (which I suspect do not change much) in shared memory such that content processes can synchronously read as often as they want? blassey - I remember you did some of the shared memory stuff for tab switching way back in the day (at least for OS X). Do you know if that stuff is set up for something like what bkelly suggests?

Mike Conley (:mconley) (:⚙️)

Reporter

Comment 7

•

8 years ago

ni for comment 6

Flags: needinfo?(a.m.naaktgeboren) → needinfo?(blassey.bugs)

Nathan Froyd [:froydnj]

Comment 8

•

8 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #5) > Why don't we spell check in the content process? Bug 693555 comment 6 suggests one reason.

Mike Conley (:mconley) (:⚙️)

Reporter

Comment 9

•

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #8) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #5) > > Why don't we spell check in the content process? > > Bug 693555 comment 6 suggests one reason. Good context there. It seems to me that a shared memory region that's writable by the parent process (but read-only by all content processes) might be super useful for this - and maybe other things. Like, I can see such a thing helping out with bug 1303096, for example.

Mike Conley (:mconley) (:⚙️)

Reporter

Comment 10

•

8 years ago

(In reply to Mike Conley (:mconley) - (needinfo me!) from comment #6) > Just spitballing here (mostly re-saying stuff bkelly suggested) For those who are wondering where this stuff bkelly suggested is, he actually suggested it in IRC, and it's this: "mconley: that sounds bad from child process memory footprint... can we keep the dictionary data in a shared memory page or something?"? >

Mike Conley (:mconley) (:⚙️)

Reporter

Comment 11

•

8 years ago

Just capturing some chatter with billm in IRC: 4:03 PM <bkelly> seems like typing is one of those things where people are highly sensitive to responsiveness... so any small gains to remove sync IPC, etc would probably be worth it 4:03 PM <&billm> yeah, the easiest fix here is some sort of batch checking 4:04 PM but I don't know if that fits into the spell check code well or not 4:04 PM <mconley> ehsan_ probably knows for sure 4:04 PM I get the impression that it's kinda crufty stuff 4:04 PM not well loved. 4:05 PM <&billm> well, whatever we do is going to involve messing with it I think

(no longer active)

Comment 12

•

8 years ago

As I said on IRC, doing things such as "batch checking" while it sounds trivial requires massive changes to the spell checking code. The shared memory idea is also easier said than done, since the data structures in question are created and managed by hunspell, and I'm assuming that we don't want to change the guts of that library and maintain it. Before multi-e10s becomes a thing, one quick option would be to actually move all of the spell checking to happen in the *child* process. In a sane world, we should not have anything that requires spell checking in the parent. Grepping around seems to suggest that we don't have any <xul:textbox spellcheck="true"> except in tests. This will still require changes to hunspell since it normally just wants to fopen the dictionary files, but we've already done hacks around that (see hunspell_fopen_readahead, for example.) Something that opens the files in the parent and sends fds down to the child and makes hunspell_fopen_readahead return FILE*s on top of the said fds should alleviate most of the e10s specific regressions here. I can't think of any good and easy options that doesn't involve rewriting spellchecker code for e10s-multi though. :(

Ben Kelly [:bkelly, not reviewing]

Comment 13

•

8 years ago

(In reply to :Ehsan Akhgari (intermittently available until 9/21) from comment #12) > As I said on IRC, doing things such as "batch checking" while it sounds > trivial requires massive changes to the spell checking code. > > The shared memory idea is also easier said than done, since the data > structures in question are created and managed by hunspell, and I'm assuming > that we don't want to change the guts of that library and maintain it. Well, if we could arrange for hunspell to load this data in particular memory pages (instead of mixed in with the rest of the heap) then wouldn't the OS share the page automatically when we fork the child process? Of course, if words are added to the dictionary, then the pages would diverge. But that raises the question of how to handle such changes in multiple child processes regardless of memory sharing.

Allison Naaktgeboren :ally

Comment 14

•

8 years ago

The spellchecker code base is some of the oldest, and according to Ehsan, most gnarly and least maintained in firefox. I'm sorry to see that it's still unowned, though no one sane wants to touch or maintain the guts of hunspell. As it was, billm had to rework IPIDL for the original e10s fix happen. I strongly suspect this is similar. Some significant technical capability, such as mconley suggests in comment 9, will need to made available, and a significant chunk of crufty spellchecker code will still need to be rewritten. At one point Ehsan estimated that rewriting the spellchecker would take longer than (single)e10s itself, so that idea was deemed not worth the investment to perf improvement ratio. I did raise the issue of multiple child processes at the time, but was told by blassey not to worry about it; multi-e10s might not even be a thing, and if did it would be years in the future. No one wants to hear it, but it may be time to throw out the old spellchecker. For an added dose of heresy, it may be wise to consider ditching the suggestions in the context menu UX in favor of a display method that is within the child process.

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Updated

•

8 years ago

Blocks: 1244482

Blake Kaplan (:mrbkap) (inactive)

Comment 15

•

8 years ago

I have some ideas that I would like to try out here.

Assignee: nobody → mrbkap

(no longer active)

Comment 16

•

8 years ago

(In reply to Ben Kelly [:bkelly] from comment #13) > (In reply to :Ehsan Akhgari (intermittently available until 9/21) from > comment #12) > > As I said on IRC, doing things such as "batch checking" while it sounds > > trivial requires massive changes to the spell checking code. > > > > The shared memory idea is also easier said than done, since the data > > structures in question are created and managed by hunspell, and I'm assuming > > that we don't want to change the guts of that library and maintain it. > > Well, if we could arrange for hunspell to load this data in particular > memory pages (instead of mixed in with the rest of the heap) then wouldn't > the OS share the page automatically when we fork the child process? Only on Linux, I think. On Windows you get a fresh process (there's no fork.) Not sure what happens on OS X. > Of course, if words are added to the dictionary, then the pages would > diverge. But that raises the question of how to handle such changes in > multiple child processes regardless of memory sharing. The words added to your personal dictionary are dealt with separately, they won't be added to the hunspell dictionaries (thankfully!)

Brad Lassey [:blassey] (use needinfo?)

Updated

•

8 years ago

Flags: needinfo?(blassey.bugs)

Jim Mathies [:jimm]

Updated

•

8 years ago

tracking-e10s: ? → +

Priority: -- → P2

Andrew Overholt [:overholt]

Comment 17

•

8 years ago

Have you had time to investigate here, Blake?

Flags: needinfo?(mrbkap)

Blake Kaplan (:mrbkap) (inactive)

Comment 18

•

8 years ago

Yes. I have I have a work in progress patch but I got pulled off it for a bit. I'll try to get back to this in the next couple of days.

Flags: needinfo?(mrbkap)

(no longer active)

Comment 19

•

8 years ago

Ping?

Blake Kaplan (:mrbkap) (inactive)

Comment 20

•

8 years ago

Sorry, I got distracted again. I'm back on this.

Blake Kaplan (:mrbkap) (inactive)

Comment 21

•

8 years ago

Attached patch wip (deleted) — Details — Splinter Review

This is an extremely rough WIP. There's one huge problem with it: if the text is changed between when we ask the parent for the spellchecking results and when we get the responses, we could get confused. One possible solution would be to keep track of changes in the spell-checked range and deal with them when we get the responses. Another would be to return a map of word -> correct/incorrect. There's also a bunch of cleanup to do: I just wanted to get this to work at all.

Attachment #8816345 - Flags: feedback?(ehsan)

(no longer active)

Comment 22

•

8 years ago

Comment on attachment 8816345 [details] [diff] [review] wip Review of attachment 8816345 [details] [diff] [review]: ----------------------------------------------------------------- I only skimmed this as a careful review will take a significant amount of time. I saw a few red flags below, but overall this is on the right track. One thing which I think we may want to do now is to better keep track of which async spell checking jobs we have scheduled. Right now, mozInlineSpellStatus is sort of tracking a "job" but it's a very loose notion. We should probably start to change the number of pending spell checker jobs with a queue of mozInlineSpellStatus to be processed. We can give each job a unique ID and send that across the IPC channel to be able to correlate the async results with what we had asked to be spell checked before. Once we keep track of this stuff we can also analyze the ranges to be spelled to detect overlapping ranges, and either cancel or ignore everything but the latest result for a range. What do you think? ::: extensions/spellcheck/src/mozInlineSpellChecker.cpp @@ -1439,5 @@ > Selection *aSpellCheckSelection, > mozInlineSpellStatus* aStatus, > bool* aDoneChecking) > { > - *aDoneChecking = true; Hmm, so now you're never setting this? @@ -1586,5 @@ > - return NS_OK; > - } > - *aDoneChecking = false; > - return NS_OK; > - } I'm not quite sure removing this is a good idea... ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp @@ +327,5 @@ > +mozInlineSpellWordUtil::CheckSoftTextUsing(nsISpellChecker* aSpellChecker, > + SoftTextCallback* aCallback) > +{ > + if (XRE_IsParentProcess()) { > + return NS_ERROR_NOT_IMPLEMENTED; So given this change, how is the spell checker supposed to work in the parent process? ::: extensions/spellcheck/src/mozSpellChecker.cpp @@ +454,5 @@ > + const nsTArray<RealWord> &aRealWords) > +{ > + MOZ_ASSERT(!mEngine); > + nsTArray<bool> results(aRealWords.Length()); > + for (auto &realWord : aRealWords) { We should probably have some kind of an upper bound of the number of words we check here, since this loop can take an arbitrary long amount of time on the main thread of the parent process. I think we should do this incrementally.

Attachment #8816345 - Flags: feedback?(ehsan) → feedback+

(no longer active)

Updated

•

8 years ago

Blocks: SyncIPC

(no longer active)

Comment 23

•

8 years ago

Blake, are you still working on this? Have you made any more progress? This is so far the 7th worst sync IPC message that we have in Gecko per https://docs.google.com/spreadsheets/d/1x_BWVlnQPg0DHbsrvPFX7g89lnFGa3lAIHWD_pLa_dE/edit#gid=738048355.

Flags: needinfo?(mrbkap)

Blake Kaplan (:mrbkap) (inactive)

Comment 24

•

8 years ago

(In reply to :Ehsan Akhgari from comment #23) > Blake, are you still working on this? Have you made any more progress? Yes I'm still working on this. My current strategy is to leave the sync message but only send one per turn of the spellchecking machinery. It'll be more work to make the spell checking fully async. I got a bit distracted by the performance of the spell checker on large textareas. I'll speed myself up.

Flags: needinfo?(mrbkap)

(no longer active)

Comment 25

•

8 years ago

Thanks! I think we ultimately want to get rid of all sync messages here, but I guess we can reduce the number of them as a first step. I still think my suggestion in comment 22 should work so I hope that we can give it a try after this bug.

Erin Lancaster [:elan]

Updated

•

8 years ago

Status: NEW → ASSIGNED

Iteration: --- → 54.2 - Feb 20

Erin Lancaster [:elan]

Updated

•

8 years ago

Whiteboard: [e10s-multi:?]

(no longer active)

Updated

•

8 years ago

Depends on: 1340573

(no longer active)

Comment 26

•

8 years ago

Blake, this page seems to be a great test case for this issue. To reproduce, just load <https://public.etherpad-mozilla.org/p/wr-plan>. Here is a profile on the current Nightly: <https://perf-html.io/public/cb2ce87d9c4bf0af4fda2a8391c31652831080ad/calltree/?range=3.8410_28.9421>

Blake Kaplan (:mrbkap) (inactive)

Updated

•

8 years ago

Whiteboard: [e10s-multi:?] → [e10s-multi:-]

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Updated

•

8 years ago

Whiteboard: [e10s-multi:-] → [e10s-multi:-][qf:p1]

Kan-Ru Chen [:kanru] (UTC+9)

Comment 28

•

8 years ago

Thanks for the wip patch, Blake! I'll take care the rest of the work.

Assignee: mrbkap → kchen

Wayne Mery (:wsmwk)

Updated

•

8 years ago

Keywords: perf

Kan-Ru Chen [:kanru] (UTC+9)

Comment 29

•

8 years ago

https://perfht.ml/2mZ5Bet Moving the spellchecker parent to its own thread seems to work well. Spellchecking in the content process still takes a long time, uninterruptible, but at least it won't jank with main process main thread.

Kan-Ru Chen [:kanru] (UTC+9)

Comment 30

•

8 years ago

Ehsan, do you think the profile in comment 29 looks good to you? If so I'll go ahead and polish the patch.

Flags: needinfo?(ehsan)

Kan-Ru Chen [:kanru] (UTC+9)

Comment 31

•

8 years ago

I get the profiles before and after my patch before: https://perfht.ml/2oA5d62 after: https://perfht.ml/2nIhIZo There is some improvement but not as huge as I thought. We still need to split the work into smaller chunks. Note in the second profile there is still sync IPC calls. It's just harder too see in the default view. If you zoom in then you will find them on the timeline.

Kan-Ru Chen [:kanru] (UTC+9)

Updated

•

8 years ago

Blocks: 1354641

Kan-Ru Chen [:kanru] (UTC+9)

Comment 32

•

8 years ago

Better profile gathered from a MacBook Pro Before: https://perfht.ml/2oJwkw7 After: https://perfht.ml/2oJtmrj

Kan-Ru Chen [:kanru] (UTC+9)

Comment 33

•

8 years ago

Attached file spellcheck-test.html (deleted) — Details

Kan-Ru Chen [:kanru] (UTC+9)

Comment 34

•

8 years ago

Attached patch patch-spellchecker-thread.patch (deleted) — Details — Splinter Review

For the record, this is the patch to move spellchecker to a spellcheck thread on main process.

Evelyn Hung

Comment 35

•

8 years ago

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #26) > Blake, this page seems to be a great test case for this issue. To > reproduce, just load <https://public.etherpad-mozilla.org/p/wr-plan>. > FWIW, the test case in Nightly 55.0al on my Mac Pro works fine on page loading. My test scenario is: with one content process preference set, having a Facebook live video in a background tab and open this etherpad in a foreground tab. On Firefox 52, you will hear the audio of the video is paused because content main thread is busy on spell checking, but on Nightly 55, the audio isn't affected. Not sure by which part of performance improvement we've made that fixed the problem, but sounds great!

(no longer active)

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1343551

Evelyn Hung

Comment 36

•

8 years ago

(In reply to Evelyn Hung [:evelyn] from comment #35) > Not sure by which part of performance improvement we've made that fixed the > problem, but sounds great! Credit goes to bug 1330912! \o/

Kan-Ru Chen [:kanru] (UTC+9)

Comment 37

•

8 years ago

I have a simpler WIP patch to turn the spell checker async. I'll post it shortly after I sort out some lifetime issue.

Kan-Ru Chen [:kanru] (UTC+9)

Comment 38

•

8 years ago

Attached patch kanru-wip.patch (deleted) — Details — Splinter Review

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [e10s-multi:-][qf:p1] → [e10s-multi:-][qf:p2]

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

7 years ago

Blocks: 1370754

Andrew Overholt [:overholt]

Comment 39

•

7 years ago

(In reply to (Catching up emails) Kan-Ru Chen [:kanru] (UTC+8) from comment #30) > Ehsan, do you think the profile in comment 29 looks good to you? If so I'll > go ahead and polish the patch. It's probably worth asking others involved here (OTTOMH: Evelyn, mconley, Masayuki, mrbkap, etc.)

Flags: needinfo?(ehsan) → needinfo?(kchen)

Naveed Ihsanullah [:naveed]

Updated

•

7 years ago

Whiteboard: [e10s-multi:-][qf:p2] → [e10s-multi:-][qf:p1]

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [e10s-multi:-][qf:p1] → [e10s-multi:-][qf:i60][qf:p1]

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [e10s-multi:-][qf:i60][qf:p1] → [e10s-multi:-][qf:f60][qf:p1]

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

7 years ago

Assignee: kanru → bugs

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [e10s-multi:-][qf:f60][qf:p1] → [e10s-multi:-][qf:f61][qf:p1]

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

7 years ago

Whiteboard: [e10s-multi:-][qf:f61][qf:p1] → [e10s-multi:-][qf:f64][qf:p1]

Jean Gong :jgong

Updated

•

6 years ago

Whiteboard: [e10s-multi:-][qf:f64][qf:p1] → [e10s-multi:-][qf:p1:f64]

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

6 years ago

Flags: needinfo?(kanru)

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

6 years ago

Priority: P2 → P1

Jean Gong :jgong

Updated

•

6 years ago

Whiteboard: [e10s-multi:-][qf:p1:f64] → [e10s-multi:-][qf:p2:responsiveness]

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 41

•

6 years ago

Need to ensure also https://bugzilla.mozilla.org/show_bug.cgi?id=1340573#c0 is fixed here.

Makoto Kato [:m_kato]

Comment 42

•

6 years ago

Spellchecker won't use sync IPC by bug 1502661.

Makoto Kato [:m_kato]

Comment 43

•

6 years ago

(I will remove PRemoteSpellcheckEngine.Check by bug 1503491)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 44

•

6 years ago

oh, superb. Makoto, want to make this a dup of one of those bugs then.

Assignee: bugs → nobody

Status: ASSIGNED → NEW

Flags: needinfo?(m_kato)

Makoto Kato [:m_kato]

Comment 45

•

6 years ago

Let's dup of bug 1502661.

Status: NEW → RESOLVED

Closed: 6 years ago

Flags: needinfo?(m_kato)

Resolution: --- → DUPLICATE

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P2

Keywords: perf:responsiveness

Whiteboard: [e10s-multi:-][qf:p2:responsiveness] → [e10s-multi:-]

wip 8 years ago Blake Kaplan (:mrbkap) (inactive) (deleted), patch	ehsan.akhgari : feedback+	Details \| Diff \| Splinter Review
spellcheck-test.html 8 years ago Kan-Ru Chen [:kanru] (UTC+9) (deleted), text/html		Details
patch-spellchecker-thread.patch 8 years ago Kan-Ru Chen [:kanru] (UTC+9) (deleted), patch		Details \| Diff \| Splinter Review
kanru-wip.patch 8 years ago Kan-Ru Chen [:kanru] (UTC+9) (deleted), patch		Details \| Diff \| Splinter Review