Closed
Bug 184204
Opened 22 years ago
Closed 22 years ago
initially generated XUL.mfasl is corrupt and breaks chrome/GUI
Categories
(Core :: XPCOM, defect, P2)
Tracking
()
RESOLVED
DUPLICATE
of bug 201574
mozilla1.4alpha
People
(Reporter: mach1, Assigned: alecf)
References
Details
Attachments
(9 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
application/octet-stream
|
Details | |
(deleted),
application/x-gzip
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
alecf
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/plain
|
Details |
User-Agent: Mozilla/5.0 (X11; U; Linux alpha; en-US; rv:1.3a) Gecko/20021207 Build Identifier: Mozilla/5.0 (X11; U; Linux alpha; en-US; rv:1.3a) Gecko/20021207 When started with no profile (i.e., no ~/.mozilla) the browser starts fine and runs properly. But when the browser is terminated and restarted, several javascript and other errors are reported and the main window displays non-responsive, bare-bones versions of all controls. Reproducible: Always Steps to Reproduce: 1. rm -R ~/.mozilla 2. ./mozilla (and exit once browser starts up correctly) 3. ./mozilla (this produces the errors and incorrect display) Actual Results: The incorrect display appears to contain all of the controls, but in a minimal form. For example, all of the menu items display only the first two letters of the labels and the main document area is very small and positioned in the upper left corner rather than filling the remaining space in the window. None of the controls is operable. Expected Results: The browser should have restarted and behaved exactly as it did when started with no user profile.
Comment 3•22 years ago
|
||
I don't see this problem with PC-Linux on the trunk, and didn't see it with 1.2.1 for Alpha-Linux. what build options did you use?
I built the browser yesterday (07-DEC-2002) in a clean directory using the typical procedure: cvs co -f mozilla/client.mk cd mozilla gmake -f client.mk My .mozconfig file contains the single line: ac_add_options --enable-crypto This is the same procedure I have used for many months with no problems. The reported behavior first appeared in a build I made on 23-NOV-2002.
Comment 5•22 years ago
|
||
Hrm... perhaps the automagically generated profile files are not good. None of the files is actually critical (they should all be regenerated if they're missing). So perhaps, once it's busted, go in your profile dir and delete individual files and see if it works again (even just once)
You are correct. When I delete ~/.mozilla/default/....slt/XUL.mfasl and restart the browser the main window displays properly. When I exit, the file is recreated and the browser fails to display the window properly when it is restarted. So now what? Would you like me to send a copy of XUL.mfasl? If so, what type of attachment should I use?
Comment 7•22 years ago
|
||
please just gzip/bzip2 it and create an attachment here. I don't have the expertise you need. ==> XP/Widget/XUL
Assignee: asa → hyatt
Component: Browser-General → XP Toolkit/Widgets: XUL
QA Contact: asa → shrir
Summary: Functions OK when first started, but incorrectly displays main window when restarted → initially generated XUL.mfasl is corrupt and breaks chrome/GUI
![]() |
||
Comment 8•22 years ago
|
||
ccing people who should really be in on this...
Comment 9•22 years ago
|
||
Yes, please attach the file. If it is too large to be attached, you can either 'split(1)' the file into two parts, or just mail it to me <jrgm@netscape.com> > ... When I exit, the file is recreated ... Just to be clear, is the file recreated when you _start_ the browser, or when you _exit_ the browser. (i.e., start with no XUL.mfasl file in place; when you have started, check the timestamp on that file; then wait one minute and then exit. Does the timestamp or filesize change on exit. [It should not change at exit]). > The reported behavior first appeared in a build I made on 23-NOV-2002. What was the date of the last build that you made that did not have this problem?
Assignee: hyatt → ben
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 10•22 years ago
|
||
Reporter | ||
Comment 11•22 years ago
|
||
I have attached a gzip version of XUL.mfasl. The file is created when the browser starts (i.e., its size and date do not change upon exit or even restart of the browser). I am not sure which build was the most recent one that did not exhibit the problem. I believe it was probably either 09-NOV-2002 or 16-NOV-2002 (I typically do a build every week or two, but only keep the most recent one or two).
Comment 12•22 years ago
|
||
Thanks for attaching the XUL.mfasl. Unfortunately, I cannot reproduce this bug with either trunk builds on win32 or Linux (builds which I've hacked to bypass some mtime/checksum/path sanity checks that would normally prevent one from running a fastload file from another build environment).
Reporter | ||
Comment 13•22 years ago
|
||
Is there anything else I can do to assist? I am not familiar with the browser code, but in my experience with my own code, problems related to reading and writing binary files on my Alpha often arise because of unconscious assumptions I make based on my experience with 32-bit machines that do not apply to the Alpha's 64-bit architecture with its preference for aligned data. Is the format of the XUL.mfasl file common across all architectures or is it peculiar to each platform? If common, has there been a recent change to the code that makes size, alignment, or byte order assumptions that are not valid for the Alpha or other 64-bit machines?
Updated•22 years ago
|
Blocks: FastLoadHang
Comment 14•22 years ago
|
||
Reply to comment 13: Can you try again with a recent build ? (see bug 169777 comment 137)
Reporter | ||
Comment 15•22 years ago
|
||
The build I attempted on 18-JAN-2003 failed as did the build I attempted this morning. The last successful build was on 11-JAN-2003. As a result, I cannot test the recent fix.
Comment 16•22 years ago
|
||
I just built 1.3b for Alpha to see if I could reproduce this, and I don't see exactly the same behavior as described in comment 0, but instead get a very badly constructed composer window on the second launch. Deleting the XUL.mfasl fixes it and the next startup is successful. Is there any useful debugging info I can retrieve by littering the code with printfs?
Comment 17•22 years ago
|
||
this is the XUL.mfasl I get after deleting the XUL.mfasl, and then doing: mozilla -P clean about:blank
Comment 18•22 years ago
|
||
Are you sure you are running the 1.3b build that you built?
The fastload file you attached gives the patch to the chrome directory as
'/home/browser/07dec2002/mozilla/dist/bin/chrome' and the timestamps of the
XUL.mfasl inside the zip file and of the jar files that were used to generate
the contents of the fastload file are all from Dec 7th, 2002.
To answer an earlier question...
> Is the format of the XUL.mfasl file common across all architectures or is it
> peculiar to each platform? If common, has there been a recent change to the
> code that makes size, alignment, or byte order assumptions that are not valid
> for the Alpha or other 64-bit machines?
The format is common across architectures (with the one OS dependent item being
the format of file paths, e.g., "C:\foobar" vs. "/home/foobar").
The binary data is stored in network order, so there is (or should not be) any
big- vs. little-endian issues. I'm not sure whether there might be implicit
alignment issues that affect Alpha.
But I guess I'd like to know first if a current trunk build does demonstrate
any problems (particularly after the most recent checkin by brendan this
morning). [Sorry, I have to be a little skeptical of Andrew's attached
XUL.mfasl since the timestamps indicate that this was not a 1.3b build that
was used to generate that file].
Comment 19•22 years ago
|
||
ya, what I just attached was actually mach1's that I had laying around. this one is for real.
Attachment #114613 -
Attachment is obsolete: true
Comment 20•22 years ago
|
||
> Is there any useful debugging info I can retrieve by littering the code with
> printfs?
Is this a debug build? Are there any assertions, or odd warnings in stdout?
Comment 21•22 years ago
|
||
> Is this a debug build? Are there any assertions, or odd warnings in stdout?
it was an optimized build. :( wishful thinking that I wouldn't see this bug
and I could actually use the build.
I'm building CVS-by-date to narrow the regression. 12-01-2002 has the bug.
I can build one with debug.
Comment 22•22 years ago
|
||
I'd suggest building debug to track this down, but first checking that it does not occur in a trunk build (there have been possibly relevant changes since 1.3b was finished). These lines from mach1's earlier logs are suspicious, particularly the file channel failure to load. Why are those strings truncated? > ... > WEBSHELL+ = 2 * > CSS Error (chrome://navigator/content/navigator.xul :1.1): Expected ':' but found 'm'. Declaration dropped. > Note: verifyreflow is disabled > ... > > charset = ISO, file nsFontMetricsGTK.cpp, line 2234 * > Error reading file jar:resource:///chrome/comm.jar!/content/navigator/rd * > Error reading file jar:resource:///chrome/comm.jar!/content/navigator/r * > Error reading file jar:resource:///chrome/comm.jar!/content/navigator/c > WEBSHELL+ = 3 > ... Andrew, Question: is the pattern that you are observing something like: 1) delete XUL.mfasl file; 2) start browser; all OK every time; quit 3) start again; open composer; not OK every time; quit 4) go back to 1, and repeat this pattern
Comment 23•22 years ago
|
||
1) delete XUL.mfasl file; 2) start browser; all OK every time; quit 3) start again; open composer; not OK every time; quit 4) go back to 1, and repeat this pattern yes. that's it absolutely. I'm pulling CVS head now...
Comment 24•22 years ago
|
||
Okay, so that pattern suggests (to me) that this is a different issue than the fastload bug of 169777 and variants. In fact, this may not be a problem with fastload code per se, but due to the fact that the existence of a valid fastload file can alter the code paths that are followed elsewhere. But it will be good to see what a tip debug build says first, and then backtrack. [Sorry that this got deferred. I had assumed that this bug would be taken care of with the other fastload fixes (although that may still be the case)].
Comment 25•22 years ago
|
||
I disabled debug on my CVS tip build in an attempt to workaround bug 190794, but got the build working eventually. It still has this bug. I'll try a respin with debug enabled tonite.
Comment 26•22 years ago
|
||
after playing CVS-by-date for a while, I've determined that this became busted with the second checkin for bug 177401 (Nov 14). (checkin comment: "the previous checkin had a typo which disabled fastload entirely!" duh.) applying shanmu's 64bit alignment patch (bug 177401 comment 16, adjusted to be not OSF-specific) actually fixes this bug. Is there a way I can verify that it isn't just ignoring the fastload file like it was before the second checkin? does this mean that NS_SWAP16 is totally busted on Alpha Linux? How can I test that? I still need to verify that the patch fixes the trunk...
Comment 27•22 years ago
|
||
attachment 109590 [details] [diff] [review] also fixes trunk
![]() |
||
Comment 28•22 years ago
|
||
Does that patch need to handle both places where we NS_SWAP16 in that method? Or it it only doing the swap inside the big buffer that breaks? (The other place is where we deal with a possible carryover byte.)
Comment 29•22 years ago
|
||
The other place, where we deal with a possible carryover byte, is safe. In fact, this patch is trying to be consistent in the way outString->Append is used.
Comment 30•22 years ago
|
||
this patch tries both ways (NS_SWAP16 and manually swapping), compares the results and prints them out. If the two results differ, it prints both and also the unswapped unicodeSegment[i/2]. sometimes it works fine: 0 61 0 1 70 0 2 70 0 3 6c 0 4 69 0 5 63 0 6 61 0 7 74 0 8 69 0 9 6f 0 10 6e 0 11 2f 0 but sometimes it doesn't. it seems to screw up every 4th element. the unswapped unicodeSegment[i/2] appears to be fine. 0 6e 0 1 61 0 0 0 61 0 2 76 0 3 69 0 4 67 0 5 61 0 0 0 61 0 6 74 0 7 6f 0 8 72 0 9 2d 0 0 0 2d 0 10 74 0 11 6f 0 12 6f 0 13 6c 0 0 0 6c 0 14 62 0 15 6f 0 16 78 0 0 74 0 0 0 74 0 1 6f 0 2 6f 0 3 6c 0 4 62 0 0 0 62 0 5 6f 0 6 78 0 7 2d 0 8 74 0 0 0 74 0 9 6f 0 10 70 0
Comment 31•22 years ago
|
||
it looks like Mozilla 1.3 is going to miss the Alpha/Linux boat, but it sounds like this bug would easy to fix and the fix would help out other 64bit platforms (by making the code 64bit-clean). nominating for 1.4a XUL no longer seems to be the proper component ==> xpcom
Assignee: ben → dougt
Component: XP Toolkit/Widgets: XUL → XPCOM
Flags: blocking1.4a?
QA Contact: shrir → scc
Comment 32•22 years ago
|
||
I'll take this. /be
Assignee: dougt → brendan
Priority: -- → P2
Target Milestone: --- → mozilla1.4alpha
Updated•22 years ago
|
Status: NEW → ASSIGNED
Updated•22 years ago
|
Flags: blocking1.4a? → blocking1.4a-
Comment 33•22 years ago
|
||
I'm not getting anywhere reading code. If there's an aliasing bug, and the Linux Alpha compiler is assuming pointers to different types can't alias the same piece of memory, I don't see it (although the carryover byte case does cast the bytes array to (PRUnichar*) -- could that be the problem? Stores scheduled for the bytes array initializer, even if not committed to memory, should be coherently found in a write buffer on the same CPU by the loads from the cast). Cc'ing bbaetz, bouncing to alecf. /be
Assignee: brendan → alecf
Status: ASSIGNED → NEW
Comment 34•22 years ago
|
||
I don't think that gcc does aliasing in c++ before 3.3, so thats not it. Is someone casting void*s to an int, or something? I don't suppose you can get a small test program of some sort?
Comment 35•22 years ago
|
||
I finally figured out what the difference was between the cases where it did work and didn't work (comment 30). It fails when &(unicodeSegment[i/2]) ends in 7 or f (0x1202d2e8f). this explains why (even when it does fail) it only fails every fourth time. The long stretches when it does work are a result of &(unicodeSegment[i/2]) being even. also, when it does fail, ((unicodeSegment[i/2]) >> 8) is 0. it seems to lose the byte when it shifts it across an 8-byte boundary. I'll attach a small C program that shows this.
Comment 36•22 years ago
|
||
output included at top
Comment 37•22 years ago
|
||
Sounds like a compiler bug, although I would've expected a bus error on all the misaligned PRUnichar loads, at least on some architectures. It would be best to reduce the testcase to one that didn't rely on misaligned loads, and on casting addresses to different pointer type. Can that be done, or does the bug go away? Should nsBinaryStream.cpp stop using a bytes array and just shift and bitwise-or to construct the two-byte sequence containing the carryover byte? If that works around the bug, I say alecf should check it in for 1.4a. Can someone search the gcc bug database? /be
Comment 38•22 years ago
|
||
Avoid misaligned loads and cast-to-pointer-to-wider-type aliasing. This is the patch to apply. I haven't tested it yet. Reviewers may prefer the cvs diff -wpu8 that's coming up next. /be
Comment 39•22 years ago
|
||
Comment 40•22 years ago
|
||
Andrew, can you show the assembly code, or disassemble the function from the object file using objdump, for your test program? /be
Comment 41•22 years ago
|
||
I can stepi through execution with gdb if that would be easier to read.
Comment 42•22 years ago
|
||
Andrew, can you try the patch (attachment 118472 [details] [diff] [review])? /be
Updated•22 years ago
|
Flags: blocking1.4a- → blocking1.4a+
Comment 43•22 years ago
|
||
the patch works great. does the misaligned if branch need to be #ifdef IS_LITTLE_ENDIAN ?
Comment 44•22 years ago
|
||
No, big-endian CPUs (most RISCs, last I checked -- a while ago) don't like misaligned loads and stores, either. Someone more up-to-date on CPU architectures and the memory models that compilers erect on them should comment. I'd like to get this patch in for 1.4a. Alecf, could you review? /be
Flags: blocking1.4a+ → blocking1.4a-
Comment 45•22 years ago
|
||
Comment on attachment 118472 [details] [diff] [review] proposed fix, plus useless trailing whitespace elimination But do read the diff -w patch for less eyestrain. /be
Attachment #118472 -
Flags: review?(alecf)
Comment 46•22 years ago
|
||
did you really mean to set this to 1.4a- or was that a miscue. (Comments suggest it was).
Flags: blocking1.4a- → blocking1.4a?
Assignee | ||
Comment 47•22 years ago
|
||
Comment on attachment 118472 [details] [diff] [review] proposed fix, plus useless trailing whitespace elimination seems reasonable.. a configure test would be nice so that we didn't have to do a count>1 test on all architectures
Attachment #118472 -
Flags: review?(alecf) → review+
Assignee | ||
Updated•22 years ago
|
Status: NEW → ASSIGNED
Comment 49•22 years ago
|
||
Darin's got the money patch in bug 201574. Please cc: yourselves there, I'm gonna dup this against that bug. /be *** This bug has been marked as a duplicate of 201574 ***
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•