Closed Bug 184204 Opened 22 years ago Closed 22 years ago

initially generated XUL.mfasl is corrupt and breaks chrome/GUI

Tracking

()

Status:

RESOLVED DUPLICATE of bug 201574

Milestone:

mozilla1.4alpha

People

(Reporter: mach1, Assigned: alecf)

References

Details

Attachments

(9 files, 1 obsolete file)

Logged info. from first execution (i.e., no ~/.mozilla) 22 years ago mach1 (deleted), text/plain		Details
Logged info. from second execution (i.e., with ~/.mozilla) 22 years ago mach1 (deleted), text/plain		Details
gzip version of XUL.mfasl 22 years ago mach1 (deleted), application/octet-stream		Details
gzipped XUL.mfasl 22 years ago Andrew Schultz (deleted), application/x-gzip		Details
the real XUL.mfasl 22 years ago Andrew Schultz (deleted), application/x-gzip		Details
debugging patch 22 years ago Andrew Schultz (deleted), patch		Details \| Diff \| Splinter Review
simple C program 22 years ago Andrew Schultz (deleted), text/plain		Details
proposed fix, plus useless trailing whitespace elimination 22 years ago Brendan Eich [:brendan] (deleted), patch	alecf : review+	Details \| Diff \| Splinter Review
diff -w version of last patch 22 years ago Brendan Eich [:brendan] (deleted), patch		Details \| Diff \| Splinter Review
objdump -d ns_swap16 22 years ago Andrew Schultz (deleted), text/plain		Details

mach1

Reporter

Description

•

22 years ago

User-Agent:       Mozilla/5.0 (X11; U; Linux alpha; en-US; rv:1.3a) Gecko/20021207
Build Identifier: Mozilla/5.0 (X11; U; Linux alpha; en-US; rv:1.3a) Gecko/20021207

When started with no profile (i.e., no ~/.mozilla) the browser starts fine and
runs properly. But when the browser is terminated and restarted, several
javascript and other errors are reported and the main window displays
non-responsive, bare-bones versions of all controls.

Reproducible: Always

Steps to Reproduce:
1. rm -R ~/.mozilla
2. ./mozilla (and exit once browser starts up correctly)
3. ./mozilla (this produces the errors and incorrect display)

Actual Results:  
The incorrect display appears to contain all of the controls, but in a minimal
form. For example, all of the menu items display only the first two letters of
the labels and the main document area is very small and positioned in the upper
left corner rather than filling the remaining space in the window. None of the
controls is operable.

Expected Results:  
The browser should have restarted and behaved exactly as it did when started
with no user profile.

mach1

Reporter

Comment 1

•

22 years ago

Attached file Logged info. from first execution (i.e., no ~/.mozilla) (deleted) — Details

mach1

Reporter

Comment 2

•

22 years ago

Attached file Logged info. from second execution (i.e., with ~/.mozilla) (deleted) — Details

Andrew Schultz

Comment 3

•

22 years ago

I don't see this problem with PC-Linux on the trunk, and didn't see it with
1.2.1 for Alpha-Linux.

what build options did you use?

mach1

Reporter

Comment 4

•

22 years ago

I built the browser yesterday (07-DEC-2002) in a clean directory using the
typical procedure:

cvs co -f mozilla/client.mk
cd mozilla
gmake -f client.mk

My .mozconfig file contains the single line:

ac_add_options --enable-crypto

This is the same procedure I have used for many months with no problems. The
reported behavior first appeared in a build I made on 23-NOV-2002.

Andrew Schultz

Comment 5

•

22 years ago

Hrm... perhaps the automagically generated profile files are not good.  None of
the files is actually critical (they should all be regenerated if they're
missing).  So perhaps, once it's busted, go in your profile dir and delete
individual files and see if it works again (even just once)

mach1

Reporter

Comment 6

•

22 years ago

You are correct. When I delete ~/.mozilla/default/....slt/XUL.mfasl and restart
the browser the main window displays properly. When I exit, the file is
recreated and the browser fails to display the window properly when it is restarted.

So now what? Would you like me to send a copy of XUL.mfasl? If so, what type of
attachment should I use?

Andrew Schultz

Comment 7

•

22 years ago

please just gzip/bzip2 it and create an attachment here.  I don't have the
expertise you need.

==> XP/Widget/XUL

Assignee: asa → hyatt

Component: Browser-General → XP Toolkit/Widgets: XUL

QA Contact: asa → shrir

Summary: Functions OK when first started, but incorrectly displays main window when restarted → initially generated XUL.mfasl is corrupt and breaks chrome/GUI

Boris Zbarsky [:bzbarsky]

Comment 8

•

22 years ago

ccing people who should really be in on this...

John Morrison

Comment 9

•

22 years ago

Yes, please attach the file. If it is too large to be attached, you can either
'split(1)' the file into two parts, or just mail it to me <jrgm@netscape.com>

> ... When I exit, the file is recreated ...

Just to be clear, is the file recreated when you _start_ the browser, or when
you _exit_ the browser. (i.e., start with no XUL.mfasl file in place; when 
you have started, check the timestamp on that file; then wait one minute and
then exit. Does the timestamp or filesize change on exit. [It should
not change at exit]).

> The reported behavior first appeared in a build I made on 23-NOV-2002.

What was the date of the last build that you made that did not have this 
problem?

Assignee: hyatt → ben

Status: UNCONFIRMED → NEW

Ever confirmed: true

mach1

Reporter

Comment 10

•

22 years ago

Attached file gzip version of XUL.mfasl (deleted) — Details

mach1

Reporter

Comment 11

•

22 years ago

I have attached a gzip version of XUL.mfasl. The file is created when the
browser starts (i.e., its size and date do not change upon exit or even restart
of the browser).

I am not sure which build was the most recent one that did not exhibit the
problem. I believe it was probably either 09-NOV-2002 or 16-NOV-2002 (I
typically do a build every week or two, but only keep the most recent one or two).

John Morrison

Comment 12

•

22 years ago

Thanks for attaching the XUL.mfasl. Unfortunately, I cannot reproduce this
bug with either trunk builds on win32 or Linux (builds which I've hacked
to bypass some mtime/checksum/path sanity checks that would normally 
prevent one from running a fastload file from another build environment).

mach1

Reporter

Comment 13

•

22 years ago

Is there anything else I can do to assist? I am not familiar with the browser
code, but in my experience with my own code, problems related to reading and
writing binary files on my Alpha often arise because of unconscious assumptions
I make based on my experience with 32-bit machines that do not apply to the
Alpha's 64-bit architecture with its preference for aligned data.

Is the format of the XUL.mfasl file common across all architectures or is it
peculiar to each platform? If common, has there been a recent change to the code
that makes size, alignment, or byte order assumptions that are not valid for the
Alpha or other 64-bit machines?

Olivier Cahagne

Updated

•

22 years ago

Blocks: FastLoadHang

Serge Gautherie (:sgautherie)

Comment 14

•

22 years ago

Reply to comment 13:

Can you try again with a recent build ? (see bug 169777 comment 137)

mach1

Reporter

Comment 15

•

22 years ago

The build I attempted on 18-JAN-2003 failed as did the build I attempted this
morning. The last successful build was on 11-JAN-2003. As a result, I cannot
test the recent fix.

Andrew Schultz

Comment 16

•

22 years ago

I just built 1.3b for Alpha to see if I could reproduce this, and I don't see
exactly the same behavior as described in comment 0, but instead get a very
badly constructed composer window on the second launch.  Deleting the XUL.mfasl
fixes it and the next startup is successful.

Is there any useful debugging info I can retrieve by littering the code with
printfs?

Andrew Schultz

Comment 17

•

22 years ago

Attached file gzipped XUL.mfasl (obsolete) (deleted) — Details

this is the XUL.mfasl I get after deleting the XUL.mfasl, and then doing:
mozilla -P clean about:blank

John Morrison

Comment 18

•

22 years ago

Are you sure you are running the 1.3b build that you built? 

The fastload file you attached gives the patch to the chrome directory as 
'/home/browser/07dec2002/mozilla/dist/bin/chrome' and the timestamps of the
XUL.mfasl inside the zip file and of the jar files that were used to generate 
the contents of the fastload file are all from Dec 7th, 2002.

To answer an earlier question...

> Is the format of the XUL.mfasl file common across all architectures or is it
> peculiar to each platform? If common, has there been a recent change to the 
> code that makes size, alignment, or byte order assumptions that are not valid 
> for the Alpha or other 64-bit machines?

The format is common across architectures (with the one OS dependent item being
the format of file paths, e.g., "C:\foobar" vs. "/home/foobar").

The binary data is stored in network order, so there is (or should not be) any
big- vs. little-endian issues. I'm not sure whether there might be implicit 
alignment issues that affect Alpha.

But I guess I'd like to know first if a current trunk build does demonstrate
any problems (particularly after the most recent checkin by brendan this 
morning). [Sorry, I have to be a little skeptical of Andrew's attached 
XUL.mfasl since the timestamps indicate that this was not a 1.3b build that
was used to generate that file].

Andrew Schultz

Comment 19

•

22 years ago

Attached file the real XUL.mfasl (deleted) — Details

ya, what I just attached was actually mach1's that I had laying around.  this
one is for real.

Attachment #114613 - Attachment is obsolete: true

John Morrison

Comment 20

•

22 years ago

> Is there any useful debugging info I can retrieve by littering the code with
> printfs?

Is this a debug build? Are there any assertions, or odd warnings in stdout?

Andrew Schultz

Comment 21

•

22 years ago

> Is this a debug build? Are there any assertions, or odd warnings in stdout?

it was an optimized build.  :(  wishful thinking that I wouldn't see this bug
and I could actually use the build.

I'm building CVS-by-date to narrow the regression.  12-01-2002 has the bug.
I can build one with debug.

John Morrison

Comment 22

•

22 years ago

I'd suggest building debug to track this down, but first checking that it 
does not occur in a trunk build (there have been possibly relevant changes
since 1.3b was finished).

These lines from mach1's earlier logs are suspicious, particularly the file
channel failure to load. Why are those strings truncated?

  > ...
  > WEBSHELL+ = 2
* > CSS Error (chrome://navigator/content/navigator.xul :1.1): Expected ':' but 
found 'm'.  Declaration dropped.
  > Note: verifyreflow is disabled
  > ...
  > 
  > charset = ISO, file nsFontMetricsGTK.cpp, line 2234
* > Error reading file jar:resource:///chrome/comm.jar!/content/navigator/rd
* > Error reading file jar:resource:///chrome/comm.jar!/content/navigator/r
* > Error reading file jar:resource:///chrome/comm.jar!/content/navigator/c
  > WEBSHELL+ = 3
  > ...

Andrew, Question: is the pattern that you are observing something like:
  1) delete XUL.mfasl file; 
  2) start browser; all OK every time; quit
  3) start again; open composer; not OK every time; quit
  4) go back to 1, and repeat this pattern

Andrew Schultz

Comment 23

•

22 years ago

1) delete XUL.mfasl file; 
2) start browser; all OK every time; quit
3) start again; open composer; not OK every time; quit
4) go back to 1, and repeat this pattern

yes.  that's it absolutely.
I'm pulling CVS head now...

John Morrison

Comment 24

•

22 years ago

Okay, so that pattern suggests (to me) that this is a different issue than
the fastload bug of 169777 and variants. In fact, this may not be a problem
with fastload code per se, but due to the fact that the existence of a valid 
fastload file can alter the code paths that are followed elsewhere. But it will 
be good to see what a tip debug build says first, and then backtrack. 

[Sorry that this got deferred. I had assumed that this bug would be taken care 
of with the other fastload fixes (although that may still be the case)].

Andrew Schultz

Comment 25

•

22 years ago

I disabled debug on my CVS tip build in an attempt to workaround bug 190794, but
got the build working eventually.  It still has this bug.  I'll try a respin
with debug enabled tonite.

Andrew Schultz

Comment 26

•

22 years ago

after playing CVS-by-date for a while, I've determined that this became busted
with the second checkin for bug 177401 (Nov 14).  (checkin comment: "the
previous checkin had a typo which disabled fastload entirely!"  duh.)

applying shanmu's 64bit alignment patch (bug 177401 comment 16, adjusted to be
not OSF-specific) actually fixes this bug.  Is there a way I can verify that it
isn't just ignoring the fastload file like it was before the second checkin?

does this mean that NS_SWAP16 is totally busted on Alpha Linux?  How can I test
that?

I still need to verify that the patch fixes the trunk...

Andrew Schultz

Comment 27

•

22 years ago

attachment 109590 [details] [diff] [review] also fixes trunk

Boris Zbarsky [:bzbarsky]

Comment 28

•

22 years ago

Does that patch need to handle both places where we NS_SWAP16 in that method? 
Or it it only doing the swap inside the big buffer that breaks?  (The other
place is where we deal with a possible carryover byte.)

Shanmugavelu Shanmuganathan (gone)

Comment 29

•

22 years ago

The other place, where we deal with a possible carryover byte, is safe.
In fact, this patch is trying to be consistent in the way outString->Append 
is used.

Andrew Schultz

Comment 30

•

22 years ago

Attached patch debugging patch (deleted) — Details — Splinter Review

this patch tries both ways (NS_SWAP16 and manually swapping), compares the
results and prints them out.  If the two results differ, it prints both and
also the unswapped unicodeSegment[i/2].

sometimes it works fine:
0 61 0
1 70 0
2 70 0
3 6c 0
4 69 0
5 63 0
6 61 0
7 74 0
8 69 0
9 6f 0
10 6e 0
11 2f 0

but sometimes it doesn't.  it seems to screw up every 4th element.  the
unswapped unicodeSegment[i/2] appears to be fine.
0 6e 0
1 61 0 0 0 61 0
2 76 0
3 69 0
4 67 0
5 61 0 0 0 61 0
6 74 0
7 6f 0
8 72 0
9 2d 0 0 0 2d 0
10 74 0
11 6f 0
12 6f 0
13 6c 0 0 0 6c 0
14 62 0
15 6f 0
16 78 0
0 74 0 0 0 74 0
1 6f 0
2 6f 0
3 6c 0
4 62 0 0 0 62 0
5 6f 0
6 78 0
7 2d 0
8 74 0 0 0 74 0
9 6f 0
10 70 0

Andrew Schultz

Comment 31

•

22 years ago

it looks like Mozilla 1.3 is going to miss the Alpha/Linux boat, but it sounds
like this bug would easy to fix and the fix would help out other 64bit platforms
(by making the code 64bit-clean).

nominating for 1.4a

XUL no longer seems to be the proper component
==> xpcom

Assignee: ben → dougt

Component: XP Toolkit/Widgets: XUL → XPCOM

Flags: blocking1.4a?

QA Contact: shrir → scc

Brendan Eich [:brendan]

Comment 32

•

22 years ago

I'll take this.

/be

Assignee: dougt → brendan

Priority: -- → P2

Target Milestone: --- → mozilla1.4alpha

Brendan Eich [:brendan]

Updated

•

22 years ago

Status: NEW → ASSIGNED

Asa Dotzler [:asa]

Updated

•

22 years ago

Flags: blocking1.4a? → blocking1.4a-

Brendan Eich [:brendan]

Comment 33

•

22 years ago

I'm not getting anywhere reading code.  If there's an aliasing bug, and the
Linux Alpha compiler is assuming pointers to different types can't alias the
same piece of memory, I don't see it (although the carryover byte case does cast
the bytes array to (PRUnichar*) -- could that be the problem?  Stores scheduled
for the bytes array initializer, even if not committed to memory, should be
coherently found in a write buffer on the same CPU by the loads from the cast).

Cc'ing bbaetz, bouncing to alecf.

/be

Assignee: brendan → alecf

Status: ASSIGNED → NEW

Bradley Baetz (:bbaetz)

Comment 34

•

22 years ago

I don't think that gcc does aliasing in c++ before 3.3, so thats not it. Is
someone casting void*s to an int, or something?

I don't suppose you can get a small test program of some sort?

Andrew Schultz

Comment 35

•

22 years ago

I finally figured out what the difference was between the cases where it did
work and didn't work (comment 30).

It fails when &(unicodeSegment[i/2]) ends in 7 or f (0x1202d2e8f).  this
explains why (even when it does fail) it only fails every fourth time.  The long
stretches when it does work are a result of &(unicodeSegment[i/2]) being even.

also, when it does fail, ((unicodeSegment[i/2]) >> 8) is 0.  it seems to lose
the byte when it shifts it across an 8-byte boundary.

I'll attach a small C program that shows this.

Andrew Schultz

Comment 36

•

22 years ago

Attached file simple C program (deleted) — Details

output included at top

Brendan Eich [:brendan]

Comment 37

•

22 years ago

Sounds like a compiler bug, although I would've expected a bus error on all the
misaligned PRUnichar loads, at least on some architectures.  It would be best to
reduce the testcase to one that didn't rely on misaligned loads, and on casting
addresses to different pointer type.  Can that be done, or does the bug go away?


Should nsBinaryStream.cpp stop using a bytes array and just shift and bitwise-or
to construct the two-byte sequence containing the carryover byte?  If that works
around the bug, I say alecf should check it in for 1.4a.

Can someone search the gcc bug database?

/be

Brendan Eich [:brendan]

Comment 38

•

22 years ago

Attached patch proposed fix, plus useless trailing whitespace elimination (deleted) — Details — Splinter Review

Avoid misaligned loads and cast-to-pointer-to-wider-type aliasing.


This is the patch to apply.  I haven't tested it yet.  Reviewers may prefer the
cvs diff -wpu8 that's coming up next.

/be

Brendan Eich [:brendan]

Comment 39

•

22 years ago

Attached patch diff -w version of last patch (deleted) — Details — Splinter Review

Brendan Eich [:brendan]

Comment 40

•

22 years ago

Andrew, can you show the assembly code, or disassemble the function from the
object file using objdump, for your test program?

/be

Andrew Schultz

Comment 41

•

22 years ago

Attached file objdump -d ns_swap16 (deleted) — Details

I can stepi through execution with gdb if that would be easier to read.

Brendan Eich [:brendan]

Comment 42

•

22 years ago

Andrew, can you try the patch (attachment 118472 [details] [diff] [review])?

/be

Asa Dotzler [:asa]

Updated

•

22 years ago

Flags: blocking1.4a- → blocking1.4a+

Andrew Schultz

Comment 43

•

22 years ago

the patch works great.
does the misaligned if branch need to be #ifdef IS_LITTLE_ENDIAN ?

Brendan Eich [:brendan]

Comment 44

•

22 years ago

No, big-endian CPUs (most RISCs, last I checked -- a while ago) don't like
misaligned loads and stores, either.  Someone more up-to-date on CPU
architectures and the memory models that compilers erect on them should comment.

I'd like to get this patch in for 1.4a.  Alecf, could you review?

/be

Flags: blocking1.4a+ → blocking1.4a-

Brendan Eich [:brendan]

Comment 45

•

22 years ago

Comment on attachment 118472 [details] [diff] [review]
proposed fix, plus useless trailing whitespace elimination

But do read the diff -w patch for less eyestrain.

/be

Attachment #118472 - Flags: review?(alecf)

John Morrison

Comment 46

•

22 years ago

did you really mean to set this to 1.4a- or was that a miscue. (Comments 
suggest it was).

Flags: blocking1.4a- → blocking1.4a?

Alec Flett

Assignee

Comment 47

•

22 years ago

Comment on attachment 118472 [details] [diff] [review]
proposed fix, plus useless trailing whitespace elimination

seems reasonable.. a configure test would be nice so that we didn't have to do
a count>1 test on all architectures

Attachment #118472 - Flags: review?(alecf) → review+

Asa Dotzler [:asa]

Comment 48

•

22 years ago

1.4a is out.

Flags: blocking1.4a? → blocking1.4a-

Alec Flett

Assignee

Updated

•

22 years ago

Status: NEW → ASSIGNED

Brendan Eich [:brendan]

Comment 49

•

22 years ago

Darin's got the money patch in bug 201574.  Please cc: yourselves there, I'm
gonna dup this against that bug.

/be

*** This bug has been marked as a duplicate of 201574 ***

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → DUPLICATE

You need to log in before you can comment on or make changes to this bug.