Closed
Bug 525579
Opened 15 years ago
Closed 15 years ago
topcrash [@ nsAccessibilityService::GetAccessible(nsIDOMNode*, nsIPresShell*, nsIWeakReference*, nsIFrame**, int*, nsIAccessible**)]
Categories
(Core :: Disability Access APIs, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | beta4-fixed |
People
(Reporter: dbaron, Assigned: davidb)
References
(Blocks 1 open bug)
Details
(Keywords: access, crash, topcrash, Whiteboard: [crashkill][#7 Firefox 3.6b1 topcrash][crashkill-fix])
Crash Data
Attachments
(2 files, 9 obsolete files)
(deleted),
patch
|
roc
:
review+
surkov
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
roc
:
review+
dbaron
:
approval1.9.2+
|
Details | Diff | Splinter Review |
The #14 topcrash for Firefox 3.6b1 is currently crashes at nsAccessibilityService::GetAccessible(nsIDOMNode*, nsIPresShell*, nsIWeakReference*, nsIFrame**, int*, nsIAccessible**)
See reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6b1&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsAccessibilityService%3A%3AGetAccessible%28nsIDOMNode%2A%2C%20nsIPresShell%2A%2C%20nsIWeakReference%2A%2C%20nsIFrame%2A%2A%2C%20int%2A%2C%20nsIAccessible%2A%2A%29
Updated•15 years ago
|
Severity: normal → critical
Whiteboard: [crashkill]
Comment 1•15 years ago
|
||
Alex, any idea?
Comment 2•15 years ago
|
||
88 crashers like this within the last 7 days. Requesting blocking. Note that I haven't seen this crash myself at all yet.
Flags: blocking1.9.2?
Assignee | ||
Comment 3•15 years ago
|
||
The stacks seem to indicate we walk up an accessible parent chain and then look at siblings. The source for frame 0 is:
if (!frame || content != frame->GetContent())
Could the frame be defunct? We should probably call: FlushPendingNotifications(Flush_Layout) near the top of GetAccessible.
Also, this might indicate that we are allowing the call run at an unsafe time -- not sure.
Assignee | ||
Comment 4•15 years ago
|
||
Hoping for steps to reproduce in order to debug and fix this properly. I guess wallpapering it for 3.6 might be our best option for now.
Assignee | ||
Comment 5•15 years ago
|
||
Hmm I wonder if frame poisoning bug 497495 uncovered this bug.
Assignee | ||
Comment 6•15 years ago
|
||
The address is 0xfffffffff0dea813 so the frame is poisoned. Zack, what is the right approach here? We used to just check for a null frame.
Comment 7•15 years ago
|
||
for 2009 11 03 about half the crashes are near startup.
37 total crashes for nsAccessibilityService::GetAccessible on
20091103-crashdata.csv
19 start up crashes inside 3 minutes
distribution of all versions where the nsAccessibilityService::GetAccessible
crash was found on 20091103-crashdata.csv
29 Firefox 3.6b1
4 Firefox 3.6b2pre
2 Firefox 3.7a1pre
2 Firefox 3.5.4
____________number of uniq sites found with this signature:
16
3 about:blank
1 about:blank
1 http://www.youtube.com/results?search_query=GAME&page=&utm_source=opense
arch
1 http://www.youtube.com/republicandebate
1 http://www.yandex.ru/?edit=1&ncrnd=2395080453#widgetAdded
1 http://www.photohunters.de/meist-gesehen/natur/landschaft/sonnenaufgang-
in-sassenberg-349.html
1 http://www.google.it/#hl=it&source=hp&q=testo+legge+328+00&meta=&aq=0&oq
=testo+legge++328&fp=8170e646eb90a16c
1 http://www.google.fr/ig
1 http://www.bild.de/
1 http://windows.microsoft.com/en-US/windows/downloads/personalize?T1=desk
topgadgets
1 http://news.google.de/news/search?um=1&cf=all&ned=de&hl=de&q=t-home+ente
rtain&cf=all&scoring=n
1 http://mail2.daum.net/hanmail/spam/SpamReport.daum
1 http://kirutoku-rublog.seesaa.net/article/124938451.html#comment
some other possible related signatures
signature list
35 nsAccessibilityService::GetAccessible(nsIDOMNode*, nsIPresShell*,
nsIWeakReference*, nsIFrame**, int*, nsIAccessible**)
1 nsAccessibilityService::GetAccessibleByType(nsIDOMNode*, nsIAccessible**)
1 @0x0 | nsAccessibilityService::GetAccessible(nsIDOMNode*, nsIPresShell*,
nsIWeakReference*, nsIFrame**, int*, nsIAccessible**)
Assignee | ||
Comment 8•15 years ago
|
||
Chris thanks! While some of those reports seem to pre-date frame poisoning, they are a separate kind of crash, happening on a different line.
Note a proper fix for bug 423737 might solve this bug.
Marco, can you try those sites to see if you can recreate?
Comment 9•15 years ago
|
||
Unfortunately, simply calling these links will crash me. NONE. BTW I always start Firefox with about:blank, and new tabs are also blank when I open them. I've never seen this crash happen ever myself.
Comment 10•15 years ago
|
||
(In reply to comment #6)
> The address is 0xfffffffff0dea813 so the frame is poisoned. Zack, what is the
> right approach here? We used to just check for a null frame.
You need to find the place that is freeing a frame that still has live pointers, and make it not do that, or else null out those pointers when it does.
Comment 11•15 years ago
|
||
David, this bug should be your #1 priority.
Updated•15 years ago
|
Assignee: nobody → bolterbugz
Reporter | ||
Updated•15 years ago
|
Whiteboard: [crashkill] → [crashkill][#7 Firefox 3.6b1 topcrash]
Comment 12•15 years ago
|
||
Still no luck reproducing using any of the above links with either Minefield or Namoroka, with either JAWS or NVDA.
Assignee | ||
Comment 13•15 years ago
|
||
(In reply to comment #11)
> David, this bug should be your #1 priority.
Yep. Let's fix this.
(In reply to comment #10)
> (In reply to comment #6)
> > The address is 0xfffffffff0dea813 so the frame is poisoned. Zack, what is the
> > right approach here? We used to just check for a null frame.
>
> You need to find the place that is freeing a frame that still has live
> pointers, and make it not do that, or else null out those pointers when it
> does.
Thanks, this makes sense; so far I haven't pin-pointed it.
Updated•15 years ago
|
Blocks: PoisonFrameCrash
I am very suspicious of the way that nsAccessibleTreeWalker caches a frame in mState.frame. This means if any layout flush happens while an nsAccessibleTreeWalker is alive, we could crash or do something random and unsafe (before frame poisoning) or crash or do something random but safe (after frame poisoning).
(Were you guys already aware that flushing is not permitted while an nsAccessibleTreeWalker is alive?)
So far I can't find a spot where such a flush could have happened, but this is an awful lot of code to secure against flushes. Can we make mState.frame go away?
Another option is to expose nsWeakFrame and use it here, although then an unexpected flush would cause the frame to go away, and recovering from that reliably would probably be harder than just not using a frame in the first place.
On another topic, I wonder why our #4 topcrash is in accessibility at all. Why are so many people running with accessibility on? Is it because of that antivirus software that turns it on?
Comment 15•15 years ago
|
||
3.6b1 doesn't have any users or much crash volume yet so the top crash list can be influenced by a few users that are crashing a lot.
Comment 16•15 years ago
|
||
It might be reasonable to use IsDefunct inside of CacheChildren() instead of mWeakShell check because IsDefunct() returns true if presshell died which could happen before we'll process delayed event for example. It's totally safe and easy, we could try and see if it helps.
IsDefunct won't protect you from all frame destruction/reconstruction situations, at best it's just a stop-gap measure.
Comment 18•15 years ago
|
||
(In reply to comment #17)
> IsDefunct won't protect you from all frame destruction/reconstruction
> situations, at best it's just a stop-gap measure.
Sure, but nsAccTreeWalker is always local, used to create accessibles and accessible tree and cache them. I can't see anything dangerous we could launch destruction/reconstruction frame process. What should we call from layout to start this process? How is this process usually started?
Anything that calls nsIPresShell::FlushPendingNotifications can create or destroy frames. For example, nsAccessible::GetState(). Are you sure that nothing in nsAccessibilityService::GetAccessible can ever call nsAccessible::GetState, directly or indirectly? That's an awful lot of code.
Comment 20•15 years ago
|
||
Yes, it could be, for example, nsAccessibleService::GetAccessible() -> nsLinkableAccessible::Init() -> nsLinkableAccessible::CacheActionContent().
Assignee | ||
Comment 21•15 years ago
|
||
(In reply to comment #14)
> I am very suspicious of the way that nsAccessibleTreeWalker caches a frame in
> mState.frame.
Yes me too. I've been looking at tree walker quite a bit today -- it is fairly
new to me since that code has been mostly untouched for a while.
> (Were you guys already aware that flushing is not permitted while an
> nsAccessibleTreeWalker is alive?)
I only had a hunch, but now I'm aware thanks. I will not add a flush during
tree walking.
> So far I can't find a spot where such a flush could have happened, but this is
> an awful lot of code to secure against flushes. Can we make mState.frame go
> away?
I don't know yet.
>
> Another option is to expose nsWeakFrame and use it here, although then an
> unexpected flush would cause the frame to go away, and recovering from that
> reliably would probably be harder than just not using a frame in the first
> place.
Good to know. This might be an option if it isn't a flush that is biting us.
But if it isn't a flush I wonder what it is... how are we walking a tree that
has a poisoned frame?
> On another topic, I wonder why our #4 topcrash is in accessibility at all. Why
> are so many people running with accessibility on? Is it because of that
> antivirus software that turns it on?
It surprises me. As Chris says in comment #15 it could be sample size. I didn't
notice the virtual keyboard (mzvkbd.dll) that Kaspersky installs listed in the
modules. Not sure if there are others. I've started a list on
https://wiki.mozilla.org/Accessibility/Crash.
(In reply to comment #20)
> Yes, it could be, for example, nsAccessibleService::GetAccessible() ->
> nsLinkableAccessible::Init() -> nsLinkableAccessible::CacheActionContent().
OK, so that's one way we could get a random crash.
I looked pretty hard for a possible cause and failed to notice that code path, because there's just a lot of code here. I think that's a strong reason to make nsAccessibleTreeWalker not cache the frame. Can't we just call GetPrimaryFrameFor whenever we need a frame? I.e., get rid of the frame hints?
(In reply to comment #21)
> Good to know. This might be an option if it isn't a flush that is biting us.
> But if it isn't a flush I wonder what it is... how are we walking a tree that
> has a poisoned frame?
I'm guessing not. Bad frame trees are very rare in real Web content and would cause crashes outside accessibility a lot more often than in it, I would think, since accessibility code is normally not run.
By the way, let me explain why this wouldn't have crashed (much) before frame poisoning. From the crash stack it looks like we're doing
mState.frame = mState.frame->GetNextSibling();
and then (where frame is mState.frame)
if (!frame || content != frame->GetContent()) {
frame = aPresShell->GetRealPrimaryFrameFor(content);
If the first mState.frame has been deleted, then mState.frame->GetNextSibling() is probably going to return null, or a pointer to another frame (dead or alive). If it's null, then !frame is false. If it's a pointer to another frame --- dead or alive --- "content != frame->GetContent()" will almost certainly return false. So, unless we're very unlucky, we'll go ahead and reset 'frame' to the real frame.
Assignee | ||
Comment 23•15 years ago
|
||
(In reply to comment #22)
> (In reply to comment #20)
> > Yes, it could be, for example, nsAccessibleService::GetAccessible() ->
> > nsLinkableAccessible::Init() -> nsLinkableAccessible::CacheActionContent().
>
> OK, so that's one way we could get a random crash.
>
> I looked pretty hard for a possible cause and failed to notice that code path,
> because there's just a lot of code here. I think that's a strong reason to make
> nsAccessibleTreeWalker not cache the frame. Can't we just call
> GetPrimaryFrameFor whenever we need a frame? I.e., get rid of the frame hints?
Seems reasonable to me. Alexander do you know if we were keeping mState.frame for performance only or was there more to it?
> (In reply to comment #21)
> > Good to know. This might be an option if it isn't a flush that is biting us.
> > But if it isn't a flush I wonder what it is... how are we walking a tree that
> > has a poisoned frame?
>
> I'm guessing not. Bad frame trees are very rare in real Web content and would
> cause crashes outside accessibility a lot more often than in it, I would think,
> since accessibility code is normally not run.
>
> By the way, let me explain why this wouldn't have crashed (much) before frame
> poisoning.
(...)
Yes thanks for the explanation, I suspected frame poisoning was unveiling some hidden badness.
I should mention that I'm not married to our two calls to FlushPendingNotifications and would consider (in a pinch) pulling them at least for 1.9.2.
First let's try your recommended approach of not storing mState.frame.
Comment 24•15 years ago
|
||
(In reply to comment #22)
> (In reply to comment #20)
> > Yes, it could be, for example, nsAccessibleService::GetAccessible() ->
> > nsLinkableAccessible::Init() -> nsLinkableAccessible::CacheActionContent().
>
> OK, so that's one way we could get a random crash.
>
> I looked pretty hard for a possible cause and failed to notice that code path,
> because there's just a lot of code here.
nsAccessibleTreeWalker calls nsAccessibilityService::GetAccessible when it needs to create an accessible, which calls nsAccessibilityService::InitAccessible for every new created accessible and then the patch I described above.
> I think that's a strong reason to make
> nsAccessibleTreeWalker not cache the frame. Can't we just call
> GetPrimaryFrameFor whenever we need a frame? I.e., get rid of the frame hints?
Technically this might work. But it will make impossible to use nsAccTreeWalker to deal with generated content (after and before styles) and list bullet frames. I should say now we don't create accessible for generated content (it's our bug) and we use hacks to create accessible for list bullet frame. Therefore I said this might work but this is not good in perspective. However I'm afraid to touch nsAccTreeWalker without good set of mochitests because I don't know this code well.
Comment 25•15 years ago
|
||
(In reply to comment #23)
> Seems reasonable to me. Alexander do you know if we were keeping mState.frame
> for performance only or was there more to it?
I don't think it's performance issue. I would guess (I didn't look at history) we traversed frame tree to create generated content and etc and therefore we cached frame. But I might be wrong. We stopped to traverse frame tree because of huge performance problem.
Assignee | ||
Comment 26•15 years ago
|
||
I dug up the old performance bug 242589 where the frame hint was added.
TreeWalker changes: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla%2Faccessible%2Fsrc%2Fbase%2FnsAccessibleTreeWalker.cpp&rev=&cvsroot=%2Fcvsroot
I can essentially undo that... feels dangerous.
Backing out the frame hint sounds like the right thing to do, to me.
If you come up with a testcase that shows a significant performance loss after the backout, we can have a new bug on getting that performance back, safely. It's probably not too hard.
Assignee | ||
Comment 28•15 years ago
|
||
This is a stopgap with low risk just in cased we are uncomfortable with the right fix which changes code that has been around longer. Note we don't have test coverage that will fail with this patch, so it just a code change. The only known possible regression is regarding accessibility in chatzilla + screen reader usage, though there may be others.
Do we want to push this ASAP into a beta to make sure it works?
Next stop, frame hint removal...
Assignee | ||
Comment 29•15 years ago
|
||
Comment on attachment 410488 [details] [diff] [review]
192-fix (don't flush layout)
Seeking r+ just in case we go with this stopgap.
Attachment #410488 -
Flags: review?(surkov.alexander)
Attachment #410488 -
Flags: review?(roc)
Assignee | ||
Comment 30•15 years ago
|
||
Comment on attachment 410488 [details] [diff] [review]
192-fix (don't flush layout)
Marco, your assessment on the stopgap would be great. I recall you found the flushing to help the chatzilla bug.
Attachment #410488 -
Flags: review?(marco.zehe)
Comment 31•15 years ago
|
||
Comment on attachment 410488 [details] [diff] [review]
192-fix (don't flush layout)
r=me. This reverts bug 444644. I don't really expect any negative side effects because there were more factors playing in with bug 444644 than this one. This will bring us back to a level of 3.5, which is fine.
Attachment #410488 -
Flags: review?(marco.zehe) → review+
Assignee | ||
Comment 32•15 years ago
|
||
Comments on the approach encouraged. Try build started. Marco will give it a spin.
Assignee | ||
Comment 33•15 years ago
|
||
Marco, try build is here: https://build.mozilla.org/tryserver-builds/dbolter@mozilla.com-rm_framehint/rm_framehint-win32.zip
Comment 34•15 years ago
|
||
Comment on attachment 410512 [details] [diff] [review]
WIP (framehint removal)
With this patch, both fileupload widget and the embedded HTML5 audio/video controls stop working completely. They simply disappear from the accessible tree alltogether.
Assignee | ||
Comment 35•15 years ago
|
||
OK sounds like we might need to walk into anonymous child frames.
Assignee | ||
Comment 36•15 years ago
|
||
I'm now fuzzy on what "back(In reply to comment #27)
> Backing out the frame hint sounds like the right thing to do, to me.
I think I might have gone a little too far on my "framehint removal" patch.
Just to be clear, did you mean do not walk frames? Or did you mean grab the frame as we need it in the walker? Or?
Assignee | ||
Comment 37•15 years ago
|
||
Or this?
Assignee | ||
Comment 38•15 years ago
|
||
What is the best way to walk the children of an <input type="file" />?
I think this is anonymous non XBL content?
Blocks: 526767
Comment on attachment 410488 [details] [diff] [review]
192-fix (don't flush layout)
Looks OK to me but I guess it might break stuff for accessibility.
And it's definitely just a stopgap, since it's difficult to verify there aren't other places that flush.
Attachment #410488 -
Flags: review?(roc) → review+
(In reply to comment #36)
> Just to be clear, did you mean do not walk frames? Or did you mean grab the
> frame as we need it in the walker? Or?
You can work with frames, but you can't hang onto an nsIFrame* between treewalker steps.
If you need to find anonymous content by walking the frame tree, you probably need to collect that anonymous content into an array, by getting the primary frame for the parent content, then walking its frame subtree *all at once* to collect all the anonymous content nodes into the array. Does that make sense?
Assignee | ||
Comment 41•15 years ago
|
||
(In reply to comment #40)
> (In reply to comment #36)
> > Just to be clear, did you mean do not walk frames? Or did you mean grab the
> > frame as we need it in the walker? Or?
>
> You can work with frames, but you can't hang onto an nsIFrame* between
> treewalker steps.
OK.
>
> If you need to find anonymous content by walking the frame tree, you probably
> need to collect that anonymous content into an array, by getting the primary
> frame for the parent content, then walking its frame subtree *all at once* to
> collect all the anonymous content nodes into the array. Does that make sense?
Yes I think so; instead of as a state machine.
Assignee | ||
Comment 42•15 years ago
|
||
(I mean during sibling traversal)
Assignee | ||
Comment 43•15 years ago
|
||
I'd sleep better if we can get the stopgap out there and tested. When is the next beta?
I like Roc's direction for the longer term more resilient fix but it feels like it is going to be higher risk to me, given the timing.
It's already too late for beta2. I don't think there is a next beta.
Comment 45•15 years ago
|
||
Comment on attachment 410488 [details] [diff] [review]
192-fix (don't flush layout)
r=me as for temporary solution
Attachment #410488 -
Flags: review?(surkov.alexander) → review+
Assignee | ||
Comment 46•15 years ago
|
||
I'm still building the big picture in my mind. Why doesn't content let me walk down into native anonymous content?
Elements do not have references to their anonymous children. For XBL anonymous children there are auxiliary data structures you can use to look up the XBL anonymous children given a parent, but for native anonymous children only the element's frame has references to the children. It would be possible to create a way to find the native anonymous children for an element, but we haven't needed it yet.
Assignee | ||
Comment 48•15 years ago
|
||
Comment on attachment 410488 [details] [diff] [review]
192-fix (don't flush layout)
Renaming 192 fix for book-keeping.
Attachment #410488 -
Attachment description: stopgap → 192-fix (don't flush layout)
Assignee | ||
Comment 49•15 years ago
|
||
Needed to tweak it for 1.9.2. (pushed to try).
Attachment #410488 -
Attachment is obsolete: true
Assignee | ||
Updated•15 years ago
|
Attachment #410805 -
Flags: approval1.9.2?
Updated•15 years ago
|
Attachment #410805 -
Flags: approval1.9.2? → approval1.9.2+
Comment 50•15 years ago
|
||
Comment on attachment 410805 [details] [diff] [review]
updated 192 fix
a192=beltzner
Comment 52•15 years ago
|
||
Do we need this fix on the older branches? The 1.9.1 branch code doesn't seem to have the problem in the two places that were patched, but maybe code was just moved around a bit.
blocking1.9.1: --- → ?
status1.9.1:
--- → ?
Keywords: testcase-wanted
Whiteboard: [crashkill][#7 Firefox 3.6b1 topcrash] → [needs landing][crashkill][#7 Firefox 3.6b1 topcrash]
Assignee | ||
Comment 53•15 years ago
|
||
No need to backport a fix pre 1.9.2.; thanks for checking.
blocking1.9.1: ? → ---
status1.9.1:
? → ---
Assignee | ||
Comment 54•15 years ago
|
||
(Note to self: I could flush layout in the treewalker ctor for further protection)
Assignee | ||
Comment 55•15 years ago
|
||
Pushed 192 fix (stopgap, speculative)
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/da89184d1f05
Assignee | ||
Comment 56•15 years ago
|
||
Different crash signature, but probably the same root cause:
http://crash-stats.mozilla.com/report/index/cdf9f4bc-e10c-4c10-b158-9f1bf2091105
Possible STR: "From Facebook's private message view, I click on the Updates tab on the left, and Firefox crashes every time."
Comment 57•15 years ago
|
||
(In reply to comment #56)
> Possible STR: "From Facebook's private message view, I click on the Updates tab
> on the left, and Firefox crashes every time."
I cannot reproduce this with either JAWS or NVDA and Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a1pre) Gecko/20091029 Minefield/3.7a1pre (.NET CLR 3.5.30729)
David, did you have a screen reader running while doing this, or what did you do to turn on accessibility while reproducing this?
Comment 58•15 years ago
|
||
Also no luck reproducing with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b2pre) Gecko/20091029 Namoroka/3.6b2pre (.NET CLR 3.5.30729) and either JAWS or NVDA.
Assignee | ||
Comment 59•15 years ago
|
||
(In reply to comment #57)
> David, did you have a screen reader running while doing this, or what did you
> do to turn on accessibility while reproducing this?
Those aren't my steps; I just pasted them from a user comment in a crash report.
Assignee | ||
Comment 60•15 years ago
|
||
Closing as fixed. I landed the 192 fix on the weekend before beta2 was tagged. I spun off bug 527466 to make sure we don't hold onto frames between calls as per Roc's suggestion.
Please reopen if the 192 fix doesn't resolve (or at least significantly reduce) these crashes in FF3.6beta2.
Status: NEW → RESOLVED
Closed: 15 years ago
status1.9.2:
--- → beta2-fixed
Flags: blocking1.9.2?
Resolution: --- → FIXED
Assignee | ||
Comment 61•15 years ago
|
||
Heavy sigh. Still seeing the crash frequently in 3.6b2.
E.g. http://crash-stats.mozilla.com/report/index/57fa3fb4-615b-4cef-9e46-e05e52091111
Unfortunately, this is going to be really tricky to get right without regressing some a11y.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 62•15 years ago
|
||
Roc is there a way we can freeze the frame tree while we walk it and unfreeze when we're done?
Adding a script blocker (e.g. with nsAutoScriptBlocker) will stop FlushPendingNotifications from flushing notifications. But it won't block all changes to the frame tree. For example, removing elements from the DOM will still destroy frames.
You can't call nsContentUtils from accessibility I guess, so you'd have to add some API to make that work.
Another option would just be to add a flag to your accessibility code to disable the calls to FlushPendingNotifications from accessibility.
Hmm, but I guess you've already removed those completely, and that wasn't enough.
Perhaps the best quick fix is to use nsWeakFrame. Although you can't do that directly, since again it's not linkable from accessibility.
Maybe you should make nsWeakFrame usable externally by adding ClearInternal/InitInternal non-virtual methods that do what Init and Clear currently do, add ClearExternal/InitExternal that are virtual, and make Init and Clear inline methods that use _IMPL_NS_LAYOUT #ifdefs to choose either the internal or external versions.
Assignee | ||
Comment 65•15 years ago
|
||
Roc thanks so much for the leads!
(In reply to comment #63)
> Adding a script blocker (e.g. with nsAutoScriptBlocker) will stop
> FlushPendingNotifications from flushing notifications. But it won't block all
> changes to the frame tree. For example, removing elements from the DOM will
> still destroy frames.
This might cover us actually, since I can't imagine our calls into layout during frame walking causing element removal but I'm not sure. (?)
(In reply to comment #64)
> Perhaps the best quick fix is to use nsWeakFrame. Although you can't do that
> directly, since again it's not linkable from accessibility.
>
> Maybe you should make nsWeakFrame usable externally by adding
> ClearInternal/InitInternal non-virtual methods that do what Init and Clear
> currently do, add ClearExternal/InitExternal that are virtual, and make Init
> and Clear inline methods that use _IMPL_NS_LAYOUT #ifdefs to choose either the
> internal or external versions.
This sounds like a decent plan too. It sounds like you are recommending this approach over the nsAutoScriptBlocker? Your advice is appreciated.
Yes, it's more robust, lower risk, and about as much work as making script blocking usable from accessibility.
Assignee | ||
Comment 67•15 years ago
|
||
OK great thanks, I have made the layout changes, now looking at accessibility.
Assignee | ||
Comment 68•15 years ago
|
||
Roc, anyone, seen this error before?
(attached is what I currently have changed in layout)
In our a11y code I use the nsWeakFrame in this way:
nsWeakFrame weakFrame = nsWeakFrame(mState.frame);
nsIFrame *frame = weakFrame.GetFrame();
..
But problems with linking (when building a11y):
Undefined symbols:
"vtable for nsWeakFrame", referenced from:
__ZTV11nsWeakFrame$non_lazy_ptr in nsAccessibleTreeWalker.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make[2]: *** [libaccessibility.dylib] Error 1
make[1]: *** [libs] Error 2
make: *** [default] Error 2
Is it not seeing the vtable at all? After googling I think it means I haven't defined all my virtual (nsWeakFrame) methods but I'm just not seeing that to be true.
Assignee | ||
Comment 69•15 years ago
|
||
Word on #developers is that this means we are out of luck for this approach.
Assignee | ||
Comment 70•15 years ago
|
||
Hmm does this have something to do with NS_NO_VTABLE?
Reporter | ||
Comment 71•15 years ago
|
||
What didn't compile when you left nsWeakFrame as-is? Was it just the call to nsIPresShell::RemoveWeakFrame? Would it work to, instead of modifying nsWeakFrame, split nsIPresShell::RemoveWeakFrame into Internal and External variants much like this. (But the External one should be implemented in the .cpp file.)
Reporter | ||
Comment 72•15 years ago
|
||
Oh... you'd also need to add an external version of Init; probably easiest if it also just calls a virtual method on the pres shell. (It would have to be inline, call Clear first, and then, if aFrame, call a method on the presshell that just calls nsWeakFrame::Init.)
Assignee | ||
Comment 73•15 years ago
|
||
(In reply to comment #71)
> What didn't compile when you left nsWeakFrame as-is?
Undefined symbols:
"nsIPresShell::RemoveWeakFrame(nsWeakFrame*)", referenced from:
nsWeakFrame::Clear(nsIPresShell*) in nsAccessibleTreeWalker.o
"nsWeakFrame::Init(nsIFrame*)", referenced from:
nsWeakFrame::nsWeakFrame(nsIFrame*)in nsAccessibleTreeWalker.o
Hmmm yeah, I'll try your suggestions thanks.
Comment 74•15 years ago
|
||
David, do we have to remove the beta2-fixed flag too? It looks like.
Comment 75•15 years ago
|
||
David, over Twitter I received several reports from people using Windows Vista and ZoomText by AISquared and reporting big instability problems with Firefox 3.6. Since ZT is a screen magnifier, I obviously can't use it like a low-vision user would. But this might be something to go on as well.
Assignee | ||
Comment 76•15 years ago
|
||
I've removed the beta2-fixed flag (thanks Henrik).
Marco, yes, I'm going to be heads down on making our frame usage safer, but if anyone can help diagnose the the main usage cause (with STR) in the meantime that would be great.
status1.9.2:
beta2-fixed → ---
Assignee | ||
Comment 77•15 years ago
|
||
(In reply to comment #72)
> Oh... you'd also need to add an external version of Init; probably easiest if
> it also just calls a virtual method on the pres shell. (It would have to be
> inline, call Clear first, and then, if aFrame, call a method on the presshell
> that just calls nsWeakFrame::Init.)
To followup here, unfortunately I got the same linkage error from comment #68.
I'm going to cut my losses and create API (have patch locally) that will throw an nsWeakFrame on the heap and return a pointer to it (caller cleans up). A bit sad but it should work.
Assignee | ||
Comment 78•15 years ago
|
||
It was pointed out to me that you need to have a viable frame in order to create a weakframe (otherwise creating a weakframe would crash right away in the current implementation). It isn't exactly what I want but I can work with that.
Can someone reasure me that if I have a weakframe and I do weakframe->IsAlive() it will return false if the actual frame is poisoned?
Assignee | ||
Comment 79•15 years ago
|
||
(I lied, trying one more thing on dbaron's approach)
Assignee | ||
Comment 80•15 years ago
|
||
(In reply to comment #72)
> Oh... you'd also need to add an external version of Init; probably easiest if
> it also just calls a virtual method on the pres shell. (It would have to be
> inline, call Clear first, and then, if aFrame, call a method on the presshell
> that just calls nsWeakFrame::Init.)
The gotcha here is that the init chain is called during nsWeakFrame construction, so I can't pass 'this' to the presShell method for a subsequent call to nsWeakFrame::Init (which is not static).
Assignee | ||
Comment 81•15 years ago
|
||
An idea (since my changes are not small so far)...
Could we turn off frame poisoning if accessibility is invoked? (For 3.6 only)
Reporter | ||
Comment 82•15 years ago
|
||
No.
Also, it seems like an awful lot of the users who hit this crash (over a third) are using the "Tab Mix Plus" extension at https://addons.mozilla.org/addon/1122
Reporter | ||
Comment 83•15 years ago
|
||
(In reply to comment #80)
> (In reply to comment #72)
> > Oh... you'd also need to add an external version of Init; probably easiest if
> > it also just calls a virtual method on the pres shell. (It would have to be
> > inline, call Clear first, and then, if aFrame, call a method on the presshell
> > that just calls nsWeakFrame::Init.)
>
> The gotcha here is that the init chain is called during nsWeakFrame
> construction, so I can't pass 'this' to the presShell method for a subsequent
> call to nsWeakFrame::Init (which is not static).
I don't follow. Could you attach the patch you had so I can take a look?
Assignee | ||
Comment 84•15 years ago
|
||
(In reply to comment #83)
> (In reply to comment #80)
> > (In reply to comment #72)
> > > Oh... you'd also need to add an external version of Init; probably easiest if
> > > it also just calls a virtual method on the pres shell. (It would have to be
> > > inline, call Clear first, and then, if aFrame, call a method on the presshell
> > > that just calls nsWeakFrame::Init.)
> >
> > The gotcha here is that the init chain is called during nsWeakFrame
> > construction, so I can't pass 'this' to the presShell method for a subsequent
> > call to nsWeakFrame::Init (which is not static).
>
> I don't follow. Could you attach the patch you had so I can take a look?
Sure. Here's where I left off (not sure if there's kruft).
I'm moved along quite far along the heap alloc approach (which I can switch back to using this approach if you can help me with the linking).
Assignee | ||
Comment 85•15 years ago
|
||
I probably misunderstood you and I really appreciate you taking a look.
Assignee | ||
Comment 86•15 years ago
|
||
Hmm that patch unfortunately went further with some trial and error stuff (duplicating the body of InitInternal for example). Sorry.
Assignee | ||
Comment 87•15 years ago
|
||
To be clear here's where I am:
I gave up on trying to expose nsWeakFrame such that I could stack allocate an nsWeakFrame via nsWeakFrame wf = nsWeakFrame(aFrame); due to the linkage pain.
I proceeded with adding nsIFrame::GetWeakFrame API that heap allocates an nsWeakFrame (with cleanup responsibility current on the caller). I have proceeded to use this API in accessibility code and am just debugging and fixing some errors I have.
If you can solve the linkage problem with the first way (above) I'll happily take that and combine/rework what I've done in the second approach.
Do you want me to clean up that patch I posted for you to look at? (IRC me?)
(In reply to comment #78)
> It was pointed out to me that you need to have a viable frame in order to
> create a weakframe (otherwise creating a weakframe would crash right away in
> the current implementation). It isn't exactly what I want but I can work with
> that.
The idea is that right after getting a valid nsIFrame pointer, before there's any possibility of the frame being deleted, you stash it in an nsWeakFrame. Then you can go ahead and flush and do whatever else might cause frames to be deleted. Later you come back to your nsWeakFrame and you can test if the frame is still alive.
Assignee | ||
Comment 89•15 years ago
|
||
Yep, got it Roc but...
I'm finding that deleting my (new local api) heap allocated nsWeakFrame (in stack frame 3 below) doesn't clean up well inside nsIFrame.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xc785b9b3
0x15122abd in nsIFrame::GetStyleContext (this=0xc785b99b) at nsIFrame.h:642
642 nsStyleContext* GetStyleContext() const { return mStyleContext; }
(gdb) bt
#0 0x15122abd in nsIFrame::GetStyleContext (this=0xc785b99b) at nsIFrame.h:642
#1 0x15122ad3 in nsIFrame::PresContext (this=0xc785b99b) at nsIFrame.h:499
#2 0x15137644 in nsWeakFrame::~nsWeakFrame (this=0x1e1717e0) at nsIFrame.h:2537
#3 0x151493fa in nsAccessibleTreeWalker::PopState (this=0xbfffc85c) at /Users/dtb/mozworkspace/src/accessible/src/base/nsAccessibleTreeWalker.cpp:149
#4 0x151498ba in nsAccessibleTreeWalker::~nsAccessibleTreeWalker (this=0xbfffc85c) at /Users/dtb/mozworkspace/src/accessible/src/base/nsAccessibleTreeWalker.cpp:70
#5 0x15143fbc in nsAccessible::CacheChildren (this=0x1e38ae10) at /Users/dtb/mozworkspace/src/accessible/src/base/nsAccessible.cpp:770
#6 0x151801b9 in nsHyperTextAccessible::CacheChildren (this=0x1e38ae10) at /Users/dtb/mozworkspace/src/accessible/src/html/nsHyperTextAccessible.cpp:217
Could the mStyleContext go bad before nsWeakFrame dies? There is probably some life-cycle here detail I don't know.
Assignee | ||
Comment 90•15 years ago
|
||
No where near polished, but posting because of weird crash (see comment 89)
Reporter | ||
Comment 91•15 years ago
|
||
Basically, what you did to layout was exactly right, except you need to do the same thing for AddWeakFrame that you did for RemoveWeakFrame. :-)
I also wrote something more like what I think the accessibility changes would need to look like -- using nsWeakFrame to store the frame pointer. (The point of nsWeakFrame is that it's a frame pointer that gets notified and set to null if the frame gets deleted, so you have to store the problematic long-lived pointer in an nsWeakFrame.) But I did them rather quickly, and I'm not very familiar with the accessibility code?
Could you continue work on this based on this?
Reporter | ||
Comment 92•15 years ago
|
||
And, to clarify, attachment 412286 [details] [diff] [review] is a modified version of attachment 412263 [details] [diff] [review]; I didn't see attachment 412284 [details] [diff] [review] until after I attached it.
Assignee | ||
Comment 93•15 years ago
|
||
(In reply to comment #91)
> Created an attachment (id=412286) [details]
> WIP (linkage problems fixed)
>
> Basically, what you did to layout was exactly right, except you need to do the
> same thing for AddWeakFrame that you did for RemoveWeakFrame. :-)
Thanks!
> I also wrote something more like what I think the accessibility changes would
> need to look like -- using nsWeakFrame to store the frame pointer. (The point
> of nsWeakFrame is that it's a frame pointer that gets notified and set to null
> if the frame gets deleted, so you have to store the problematic long-lived
> pointer in an nsWeakFrame.)
Ah yes, this patch just had dummy usage of the call to make sure I could link. The more elaborate usage was in the patch you noticed later ;)
> But I did them rather quickly, and I'm not very
> familiar with the accessibility code?
>
> Could you continue work on this based on this?
Yes, thanks a bunch!
Assignee | ||
Comment 94•15 years ago
|
||
(In reply to comment #91)
> Created an attachment (id=412286) [details]
> WIP (linkage problems fixed)
>
David Baron, FF crashes on startup (not in accessibility) with your patch -- did this happen for you? (I'm updating my tree and will rebuild).
Reporter | ||
Comment 95•15 years ago
|
||
(In reply to comment #94)
> David Baron, FF crashes on startup (not in accessibility) with your patch --
> did this happen for you? (I'm updating my tree and will rebuild).
It didn't happen for me. Maybe you didn't rebuild enough and some things are expecting the old layout of nsIPresShell's vtable?
Assignee | ||
Comment 96•15 years ago
|
||
Yep, full clobber build launches -- sorry for the noise and thanks again :)
Assignee | ||
Comment 97•15 years ago
|
||
roc, dbaron, can you look this one over? I probably should sr one of you?
dbaron, it should be easy to r+ the changes to layout ;), and I like what you had started with the accessibility tree walker (it closely matches an earlier local patch I have and we can't both be wrong), I just added one small addition to pass the frame(hint) to GetAccessible since it is an in/out param.
I added some frame paranoia to nsAccessibilityService where we were getting bitten on this bug and a few other spots. (Overkill?)
Unsurprisingly, we pass all accessibility mochitests, including file uploader Marco :)
Attachment #410512 -
Attachment is obsolete: true
Attachment #410578 -
Attachment is obsolete: true
Attachment #410805 -
Attachment is obsolete: true
Attachment #411920 -
Attachment is obsolete: true
Attachment #412263 -
Attachment is obsolete: true
Attachment #412284 -
Attachment is obsolete: true
Attachment #412286 -
Attachment is obsolete: true
Attachment #412368 -
Flags: review?(surkov.alexander)
Attachment #412368 -
Flags: review?(roc)
Attachment #412368 -
Flags: review?(dbaron)
Assignee | ||
Updated•15 years ago
|
Attachment #412368 -
Flags: superreview?(roc)
Attachment #412368 -
Flags: review?(roc)
Attachment #412368 -
Flags: review-
+ weakFrame = *aFrameHint;
#ifdef DEBUG_A11Y
static int frameHintFailed, frameHintTried, frameHintNonexistant, frameHintFailedForText;
++frameHintTried;
#endif
- if (!frame || content != frame->GetContent()) {
+ if (!weakFrame.GetFrame() || content != weakFrame.GetFrame()->GetContent()) {
What is the point of weakFrame here? the frame can't be destroyed between setting weakFrame and testing weakFrame.GetFrame(). weakFrame.GetFrame() is guaranteed to be exactly *aFrameHint, whether that's bogus or not.
You need to rev the IID on nsIPresShell. That's OK on trunk, but unfortunately we can't do that on the 1.9.2 branch. For the branch you'll need to create a new interface, say nsIPresShell_1_9_2_BRANCH, that contains the methods you need and that PresShell can be QIed to.
Assignee | ||
Updated•15 years ago
|
Attachment #412368 -
Flags: review-
Assignee | ||
Comment 99•15 years ago
|
||
(In reply to comment #98)
> + weakFrame = *aFrameHint;
> #ifdef DEBUG_A11Y
> static int frameHintFailed, frameHintTried, frameHintNonexistant,
> frameHintFailedForText;
> ++frameHintTried;
> #endif
> - if (!frame || content != frame->GetContent()) {
> + if (!weakFrame.GetFrame() || content != weakFrame.GetFrame()->GetContent())
> {
>
> What is the point of weakFrame here? the frame can't be destroyed between
> setting weakFrame and testing weakFrame.GetFrame(). weakFrame.GetFrame() is
> guaranteed to be exactly *aFrameHint, whether that's bogus or not.
Right, the assignment there is actually unnecessary. Also, we could just null check *aFrameHint here but it is low cost to leave this guard in. (?)
> You need to rev the IID on nsIPresShell. That's OK on trunk, but unfortunately
> we can't do that on the 1.9.2 branch. For the branch you'll need to create a
> new interface, say nsIPresShell_1_9_2_BRANCH, that contains the methods you
> need and that PresShell can be QIed to.
OK I'll simply rev for this (trunk) patch. Hmmm I might not get to the 192 interface until Sunday night, Monday morning.
Assignee | ||
Comment 100•15 years ago
|
||
Note there are some other places in accessible code (e.g. nsHyperTextAccessible) where which we'll need to investigate in follow up bugs, but I haven't addressed them here. We still aren't certain when the frames where getting deallocated right? (Since we removed our calls to flush layout).
Assignee | ||
Comment 101•15 years ago
|
||
Rev'ed the PresShell uuid, removed two bogus msWeakFrame assignments and added some comments.
Attachment #412368 -
Attachment is obsolete: true
Attachment #412421 -
Flags: review?(surkov.alexander)
Attachment #412421 -
Flags: review?(roc)
Attachment #412368 -
Flags: superreview?(roc)
Attachment #412368 -
Flags: review?(surkov.alexander)
Attachment #412368 -
Flags: review?(dbaron)
Comment 102•15 years ago
|
||
i just experienced this crash twice during firefox startup when the addon-update notifier was showing up. some other users report the same via the crash reporter...
Attachment #412421 -
Flags: review?(roc) → review+
Comment 103•15 years ago
|
||
(In reply to comment #97)
> I added some frame paranoia to nsAccessibilityService where we were getting
> bitten on this bug and a few other spots. (Overkill?)
Yes, until you point the code what can destroy the frame. We don't flush layout (we can flush layout in nsAccessibilityService::InitAccessible but we don't use frame at this point already), we don't change the DOM and we don't have crash reports pointing to nsAccessibilityService::GetAccessible(). I think this protection is unnecessary.
Comment 104•15 years ago
|
||
I would prefer to see method GetFrame or GetStateFrame instead of mState.frame.GetFrame().
> http://lxr.mozilla.org/seamonkey/source/accessible/accessible-docs.html
that's a bit out of date. Either remove this comment or change it on something more valid.
Comment 105•15 years ago
|
||
(In reply to comment #102)
> i just experienced this crash twice during firefox startup when the
> addon-update notifier was showing up. some other users report the same via the
> crash reporter...
I also was able to finally reproduce this top crasher. But ONLY on my Win 7 64-bit installation with the same profile that I previously used on Windows XP 32 bit, where I never saw this crash even with add-on updates coming up.
Right now, Firefox 3.6b2 never starts for me with that profile, it imeediately crashes. See these reports:
bp-bb3549f2-c8bb-432d-8767-fe56a2091114
bp-ae9a8022-04e6-4685-b623-a188a2091114
bp-beff5628-ea92-41d5-aa3f-e2d6f2091114
bp-43c9a35f-69d4-4749-9d8e-251da2091114
Comment 106•15 years ago
|
||
Comment on attachment 412421 [details] [diff] [review]
patch 2 (trunk version)
I think I can give conditional r+
Attachment #412421 -
Flags: review?(surkov.alexander) → review+
Assignee | ||
Comment 107•15 years ago
|
||
(In reply to comment #103)
> (In reply to comment #97)
>
> > I added some frame paranoia to nsAccessibilityService where we were getting
> > bitten on this bug and a few other spots. (Overkill?)
>
> Yes, until you point the code what can destroy the frame. We don't flush layout
> (we can flush layout in nsAccessibilityService::InitAccessible but we don't use
> frame at this point already), we don't change the DOM and we don't have crash
> reports pointing to nsAccessibilityService::GetAccessible(). I think this
> protection is unnecessary.
I've had similar thoughts, but since the crashes were actually happening inside nsAccessibilityService::GetAccessible. I think we should err on the side of caution for now.
Now that we can recreate the bug, we can fine tune it later.
Assignee | ||
Comment 108•15 years ago
|
||
(In reply to comment #104)
> I would prefer to see method GetFrame or GetStateFrame instead of
> mState.frame.GetFrame().
>
> > http://lxr.mozilla.org/seamonkey/source/accessible/accessible-docs.html
>
> that's a bit out of date. Either remove this comment or change it on something
> more valid.
Yes thanks, updated locally.
Assignee | ||
Comment 109•15 years ago
|
||
Pushed http://hg.mozilla.org/mozilla-central/rev/f360de4b8b08
I will roll the 192 version... might as well add to blassey's nsIPresShell_MOZILLA_1_9_2.
Comment 110•15 years ago
|
||
A quick note that I've been able toverify using a local build that this patch fixes the bug. The STR I used were:
1. Install Firebug 1.5xb2, which is one version older than the current b3 release.
2. Restart Minefield, and throubh about:config, set the option to enable notification of add-on updates to true.
3. Closed Minefield and restarted it.
In a regular nightly build from yesterday, on Windows 7 only, I crash when the add-on updates UI should appear. This is consistent with reports we receivedd. See this report which I submitted: bp-3d3dec95-2a49-4190-ac4c-846392091116
In the build I made after David landed the fix above, the crash no longer happens, and the add-ons UI appears.
Assignee | ||
Comment 111•15 years ago
|
||
Roc, is this what you had in mind? (BTW can you approve this 1.9.2?)
Attachment #412644 -
Flags: review?(roc)
Attachment #412644 -
Flags: approval1.9.2?
Assignee | ||
Updated•15 years ago
|
Attachment #412644 -
Attachment description: (192 version) → patch 2 (192 version)
Assignee | ||
Comment 112•15 years ago
|
||
Comment on attachment 412644 [details] [diff] [review]
patch 2 (192 version)
Unrequesting 1.9.2 until I have r+ so that people don't get confused.
Attachment #412644 -
Flags: approval1.9.2?
Attachment #412644 -
Flags: review?(roc) → review+
Assignee | ||
Updated•15 years ago
|
Attachment #412644 -
Flags: approval1.9.2?
Assignee | ||
Comment 113•15 years ago
|
||
Note to 1.9.2 drivers, this landed on trunk and is a confirmed fix. The 1.9.2 patch moves some virtual methods (in layout) into a 192 specific interface to make sure we are not breaking anybody.
Reporter | ||
Comment 114•15 years ago
|
||
Comment on attachment 412644 [details] [diff] [review]
patch 2 (192 version)
a1.9.2=dbaron
Attachment #412644 -
Flags: approval1.9.2? → approval1.9.2+
Assignee | ||
Comment 115•15 years ago
|
||
Landed on 1.9.2. http://hg.mozilla.org/releases/mozilla-1.9.2/rev/4eda96e67e9a
Thanks everyone!
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
status1.9.2:
--- → final-fixed
Resolution: --- → FIXED
Whiteboard: [needs landing][crashkill][#7 Firefox 3.6b1 topcrash] → [crashkill][#7 Firefox 3.6b1 topcrash]
Reporter | ||
Updated•15 years ago
|
Whiteboard: [crashkill][#7 Firefox 3.6b1 topcrash] → [crashkill][#7 Firefox 3.6b1 topcrash][crashkill-fix]
Updated•14 years ago
|
Crash Signature: [@ nsAccessibilityService::GetAccessible(nsIDOMNode*, nsIPresShell*, nsIWeakReference*, nsIFrame**, int*, nsIAccessible**)]
Updated•9 years ago
|
Keywords: testcase-wanted
You need to log in
before you can comment on or make changes to this bug.
Description
•