Closed Bug 139294 Opened 23 years ago Closed 22 years ago

Switching from a tabbed mozembed to another causes the first to be hidden and never shown again

Categories

(Core Graveyard :: Embedding: GTK Widget, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozillabugs.philipl, Assigned: blizzard)

References

()

Details

(Keywords: topembed)

Attachments

(1 file, 4 obsolete files)

This has occured for me in both galeon and TestGtkEmbedNotebook. Note: It is not related to 137685 in so far as the patch attached to that bug does not fix this problem. However, I first observed this bug at the same time as 137685. Steps: 1) Start TestGtkEmbedNotebook with a url argument, eg: http://www.mozilla.org. 2) Switch from the message box tab to the browser tab. 3) Observe the web page load up. 4) Switch back to the message box tab. 5) Switch back to the mozembed tab. 6) Observe the lack of web page. Either black if 137685 patch unapplied or theme background colour if applied (usually grey). Expected behaviour: Still see the web page that was there before. Note: If a tab is created 'behind' the current one and a url loaded into it, it will load fine so that if you switch to the tab you will see the web page. However, as soon as you switch away from the tab, it will die as described. In testing in galeon, the mozembed appears to be totally dead. You cannot reload it's contents or attempt to load a new page. It does not respond at all. This renders tabbed browsing impossible.
Blocks: 98995
After doing some timewarping, I've identifed the commit which caused this bug. It's a big commit, so I'm still trying to pick through it and see what needs to be done. Here is the bonsai query for the commit: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2002-04-16+21%3A16&maxdate=2002-04-16+21%3A30&cvsroot=%2Fcvsroot which was related to bug 52334 which is a rather involved one. But it is related to redraw and visibility so I can see how it would cause this bug.
Ok, after due poking around, I can avoid this bug by diabling |EmbedPrivate::Hide| but that simply calls |nsIBaseWindow::SetVisibility| with PR_FALSE. nsIBaseWindow is implemented by embedding/browser/webBrowser/nsWebBrowser which I haven't fully worked out yet. Right now it seems that the visibility setting mechanism is asymetric and once set to hidden, can't be set back to visible. So, this is not a bug with the gtk embedding widget, but I'm not sure who's in change of embedding/browswer/webBrowser/nsWebBrowser
Summary: Switching from a tabbed mozembed to another causes the first to die. → Switching from a tabbed mozembed to another causes the first to be hidden and never shown again
Well, after more poking around, I've isolated the bug as being caused by the changes to |DocumentViewerImpl::Show| and |DocumentViewerImpl::Hide| in content/base/src/nsDocumentViewer.cpp in the commit. The old versions of |Show| and |Hide| are pretty simple; the new ones do all sorts of initialisation and cleanup. From the cursory testing using TestGtkEmbedNotebook modified to dump a few messages, it appears that neither implementation of |Show| and |Hide| is working correctly. The old code seems to ignore attempts to get visibility to FALSE and the new code ignores attempts to set to TRUE, although the initial visibility is TRUE, the first attempt to change visibility to any value sets it to FALSE, and there is remains. Mozilla-the-browser is unaffected by this bug because it never calls |Hide| when a tab is sent to the back, rather it calls |Show| when a tab is brought to the front. Perhaps this bug is caused by a miscommunication as to the true meaning of |nsIBaseWindow::Show| and |nsIBaseWindow::Hide| The new nsDocumentViewerImpl code for |Show| and |Hide| looks more like initialisation and shutdown code than visibility control to me, and the fact that mozilla-the-browser doesn't call |Hide| when visibility is lost is consistent with this. That would suggest that the correct fix to stop calling |nsIBaseWindow::Hide| in EmbedPrivate. As such, we can summarise thus: |nsDocumentViewerImpl::Show| fails if called after |nsDocumentViewerImpl::Hide|. The bug is either a) |Show| should not fail. or b) |Hide| should not be called unless the window is being destroyed. If this bug is b), I can whipe a patch for EmbedPrivate up very quickly. a) is a different matter. :-) I hope you guys are reading this; This bug is topembed as far as galeon is concerned!
Sorry, mixed up some interfaces, this has been a long day. :-) References to |nsIBaseWindow::Show| and |nsIBaseWindow::Hide| are of course wrong. These methods don't exist. Rather I mean |nsIBaseWindow::SetVisibility| which in turn, eventually, lead to calls to |nsIContentViewer::Show| and |nsIContentViewer::Hide|.
Seems like there probably is a bug in DocumentViewerImpl::Hide() and ::Show(), it's unclear what those methods really mean now, and that caused bugs that showd up in mozilla as well (similar to this one). The problem in the other case was IIRC that the parent widget on the nsIBaseWindow (docshell) was set to null when the document viewer was hidden, and in order to show a document viewer, it needs the base window to have a parent widget (note that parent widget here doesn't really mean *parent* widget, it really means just widget). The reason for all the teardown/initialization code in ::Hide() and ::Show() is that that code is called when iframes on webpages are hidden/shown, in the case where an iframe is hidden we want to throw away the whole presentation and the widget for that iframe is then also destroyed since the layout frame and view (which holds the widget) is torn down as well. When an iframe is shown again, we create a new widget and a completely new presentation for the content of that iframe. There is a fix in place for this already though, document viewers can now be marked as "sticky", that way you can safely hide them w/o them completely tearing down the presentation, and it sounds like that's exactly what you want here. To do this, call nsIContentViewer::SetSticky(PR_TRUE) on the content viewer (document viewer) when it's created in a case where it might be used in a scenario like the one described here.
Johnny, Thanks! Depending on what Chris is inclined towards, we can either stop |Hide|ing windows, or use |Sticky|. I'll put together a patch using |Sticky| today.
Heh, Well, I don't think I can currently make a call to |nsIContentViewer::SetSticky| from gtkmozembed. nsWebBrowser owns the docshell but there is no way for gtkmozembed to get a reference to the docshell or an interface that can be |do_QueryInterface|ed to docshell. Either nsWebBrowser needs to provide a |GetDocShell| or a |SetSticky| that propagates down, both of which would require changes to interfaces. Alternately, nsWebBrowser can always set sticky to be true. I don't know whether this would be appropriate or not.
Blocks: 143200
No longer blocks: 143200
Ping for updates: At the very least, someone could confirm this bug according to the original test case. Also, we need to decide what needs fixing: Do we stop gtkmozembed calling |nsIBaseWindow::SetVisibility(PR_FALSE)| when a tab is de-focussed, which is what mozilla-the-browser seems to do, or do we change interfaces so that |nsIConentViewer::SetSticky| is successively exposed up to nsIBaseWindow so that it can call it, hopefully allowing Hide to work. I'm more in favour of the first fix. It's a minimally invasive one line job and there's an existing precedent for doing so. This bug will not go away if ignored long enough. 1.1 will branch in due course and then galeon is screwed.
*sigh* ping. 1.1alpha. Total b0rkage, etc etc. I really don't want to deal with all the complaints that will appear if 1.1 goes out in this state. 1.1alpha is going to be bad enough as it is.
Attached patch The one line fix. (obsolete) (deleted) — Splinter Review
More, it would seem, for galeon users who come here in search of help than anything else...
Attached patch The one line fix. (obsolete) (deleted) — Splinter Review
More, it would seem, for galeon users who come here in search of help than anything else...
Hmm. damn bugzilla.
Umm, this, in fact, didn't fix the problem at all. :-( Have we misdiagnosed the cause of the bug?
bz, here's the galeon related bug, please have a look today if you have time, if not we'll look at it tomorrow...
*ping* Time is an abstract concept.
Keywords: patch, topembed
Doh. This fell through the cracks... I just spent a while trying to set up an env in which I can usefully debug this.... I have a RedHat 7.3 system, and I'd rather not try installing all of RawHide on top of it. So given that, what's the simplest setup in which I could reproduce this with a debug Galeon build and debug Mozilla libs? Would building against a debug 1.0 build work (and using current release Galeon)? Or do I need to build against trunk? Also, does this require Galeon or is there some other way I can test this using only things normally present in a Mozilla tree?
You need a debug trunk build. I don't think this is present on the branch. If you pull the galeon-1-2 branch out of gnome's cvs I think it will work with the trunk. Well. It might, depending on how much of printing is different.
I pulled the "galeon" module from ":pserver:anonymous@anoncvs.gnome.org:/cvs/gnome". I then tried to build it. It needs a few neat things like a newer autoconf (installed that), then a new gnome-common (installed that), then a libgnomeui-2.0 (at which point I made that comment, since like I said, I'd rather avoid installing all of RawHide). Should I be pulling a branch of that module instead of the tip? If so, what's the tag?
You can test without using galeon at all using the repro case I included in the my original description. if you want to build galeon anyway, you need to check the galeon-1-2 branch out of cvs to work with mozilla trunk which is where this bug is present. mozilla 1.0.x branch does not have this bug and the 1.1a+ snapshot release is aflicted by that helper bug you helped sort out; though you can use it for testing this bug if really want. You will need a complete gnome1 library installation and up-to-date auto* tools to do this.
Doh. I should read more carefully. :( TestGtkEmbedNotebook reproduces the problem just fine. Thank you very much; I'll start poking at this...
Attached patch Patch that _does_ fix this (obsolete) (deleted) — Splinter Review
OK... so this is just like jst's patch except it also changes the _other_ constructor of DocumentViewerImpl -- the one used by NS_NewDocumentViewer (whose bright idea was it to not have the constructors next to each other in the C++?). My first instinct is that the _real_ fix here is to eliminate this constructor and make NS_NewDocumentViewer pass a null prescontext to the sole remaining constructor... as it is, the "no argument" constructor does not do any of the initialization the other one does... I've verified that this fixes the problem in TestGtkEmbedNotebook
Attachment #87361 - Attachment is obsolete: true
Attachment #87362 - Attachment is obsolete: true
Attachment #87377 - Attachment is obsolete: true
Comment on attachment 92075 [details] [diff] [review] Patch that _does_ fix this r=philipl Verified to work in galeon. Thanks Boris!
Attachment #92075 - Flags: review+
And I have to agree on the silliness of having two constructors; This is what defaults parameters are for, and as you say, even that's not actually needed.
Comment on attachment 92075 [details] [diff] [review] Patch that _does_ fix this DocumentViewerImpl::DocumentViewerImpl() + : mIsSticky(PR_TRUE) <mefeelsstupid>Duh!</mefeelsstupid> - In nsFrameFrame.cpp ... >+ // Mark the content viewer as non-sticky so that the presentation >+ // can safely go away when this frame is destroyed. >+ >+ // content_viewer->SetSticky(PR_FALSE); You didn't mean to leave this commented out did you? sr=jst
Attachment #92075 - Flags: superreview+
Let's get an aye-firmative on that "yes, I didn't mean to leave the then-part of that if statement commented out" and get this in for 1.1beta. /be
Comment on attachment 92075 [details] [diff] [review] Patch that _does_ fix this a=asa (on behalf of drivers) for checkin to 1.1
Attachment #92075 - Flags: approval+
"yes, I didn't mean to leave the then-part of that if statement commented out". ;) Patch checked in on the trunk for 1.1.
Did this make the cut for 1.1 beta? Can I mark fixed?
It didn't make beta...:( It did make 1.1 final, however (when that comes out). As for marking it fixed, that's your call.. I left it open in case we want to address the "too many constructors" issue, but that may be best spun off into a separate bug.
too many constructors spun off to bug 158804. marking FIXED.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Blocks: 152429
This fix may have caused bug 159155 / bug 152429. Recent nightlies are crashing with: Gdk-ERROR **: BadDrawable (invalid Pixmap or Window parameter) serial 5504 error_code 9 request_code 144 minor_code 3
Re-opening. I don't know if it's a corner case or related to the fix for bug 152429 but I'm seeing this bug again on occasion. I'm finding it very difficult to get a re-pro but I think it's related to application level visibility. Sometimes I'll switch back to galeon from another app and the current tab or a hidden one will die. The latest occurence saw the visible tab and another one both killed at once.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ok, the corner case is plugins. If a page uses a plugin such as java or flash and it loses visibility then it will be dead when it regains visibility. Interestingly if the page with the plugin is framed, this does not happen, it's only if the plugin is part of the main page. for a quick re-pro, go to http://www.macromedia.com (with flash obviously :-)
Do you mean the page is disabled? Or just the plugin?
The page is disabled exactly as according to the original bug report. The fix for bug 152429 is the most likely cause as it disables sticky for plugins; It would seem that it disables it for the frame that contains the plugin, unfortunately this frame can be the whole page, effectively leaving us back at square-one with respect to this bug.
This is going to miss 1.1 isn't it....
Blocks: 1.1
That patch was by serge, reviewed by jst and bzbarsky. Guys, this regression is on the 1.1 list. Please take the time to fix this regression. Thanks.
Well... what exactly is the deal with plugins? Do they need to have the viewer always non-sticky? Or what? Here's the situation as I understand it: 1) Now that bug 159268 is fixed, the patch from bug 152429 can be safely backed out without reintroducing the crash described in that bug (I just tested that on the testcase there and all the dups). 2) Backing that patch out fixes TestGtkEmbedNotebook on pages with flash in them for me. So far so good. 3) One possible problem is that showing and hiding a tab with a plugin in it in Galeon will crash in exactly the same way Mozilla was crashing. I tested with TestGtkEmbedNotebook and it seems to work dandy -- no crashes. 4) I tested plugins in top-level windows in moz after backing out the patch in question. Reloading, loading new documents, closing window. No crashes. So I think backing out bug 152429 is safe, but A) Confirmation from Philip that backing out the fix for bug 152429 fixes the problem he's seeing and does not make Galeon crash on hiding and reshowing a tab with a plugin in it would be nice. B) Confirmation from serge that this is a non-issue for plugins in top-level documents would be nice. would be nice....
I can confirm that backing out bug 152429 has fixed the regression for pages with plugins while not reintroducing the crash for plugin pages.
OK. Then I say we back it out if Serge is OK with it.
well, I did not dig down the cause of crash in bug 152429, but I can confirm backing out check in for bug 152429, while patch for bug 159268 is in, does not regress to crash, so I think backing out is safe.
Attached patch Here we go again (deleted) — Splinter Review
Could we have the usual sprinkling of letters on this, please?
Attachment #92075 - Attachment is obsolete: true
Comment on attachment 94057 [details] [diff] [review] Here we go again r = philipl What fun eh?
Attachment #94057 - Flags: review+
Comment on attachment 94057 [details] [diff] [review] Here we go again sr=blizzard
Attachment #94057 - Flags: superreview+
Comment on attachment 94057 [details] [diff] [review] Here we go again a=asa (on behalf of drivers) for checkin to the 1.1 branch
Attachment #94057 - Flags: approval+
Checked in, 1.1 branch and trunk.
Status: REOPENED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: