Closed Bug 137478 Opened 23 years ago Closed 21 years ago

win32 crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg]

Categories

(Core :: Graphics: ImageLib, defect)

x86
Windows 2000
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jrgmorrison, Assigned: pavlov)

References

Details

(Keywords: crash)

Attachments

(2 files, 2 obsolete files)

I've run into a crash in '-O2' optimized builds on msvc/win32. The strange thing is that it happens on one win2k machine, but not another. Here's the state of affairs: 1) build on machine A with '-O2' -- crashes as soon as it sees a JPEG 2) build again on machine A wiht '-O1' -- no crash 3) run build from step (1) on machine B -- no crash 4) build on machine B with '-O2' -- no crash when run on machine B 5) run build from (4) on machine A -- crash on any JPEG, when run on A I also have another win98 system building with '-O2' and it doesn't crash either. [Obviously, a debug, unoptimized build doesn't crash either]. All machines are running MSVC 6.0 with Service Pack 5 and the MSVC++ Processor Pack for that service pack. One thing I should note is that it is the newest (and fastest) machine which crashes, which I suppose points to some asm or processor problem. Let me know what details you need on the exact processor being used on the crashing system. The crash is inn jpeg\jdapimin.c: jpeg_consume_input() switch (cinfo->global_state) { case DSTATE_START: /* Start-of-datastream actions: reset appropriate modules */ (*cinfo->inputctl->reset_input_controller) (cinfo); /* Initialize application's data source module */ (*cinfo->src->init_source) (cinfo); //XXXX<-- crash here cinfo->global_state = DSTATE_INHEADER; /*FALLTHROUGH*/ and |*cinfo->src| says that it has value "0x02" in the debugger (but some inspection of the actually assembly and registers would confirm whether that value was correct, or whether that was the kind of bogus value that the msvc debugger gives sometimes with optimized builds). Stack trace: jpeg_consume_input(jpeg_decompress_struct * 0x01c0d240) line 314 + 4 bytes jpeg_read_header(jpeg_decompress_struct * 0x01c0d240, unsigned char 1) line 268 nsJPEGDecoder::WriteFrom(nsJPEGDecoder * const 0x01c0d220, nsIInputStream * 0x029bd3a8, unsigned int 4008, unsigned int * 0x0012fc6c) line 229 + 11 bytes imgRequest::OnDataAvailable(imgRequest * const 0x10068c62, nsIRequest * 0x029bd370, nsISupports * 0x1002a542, nsIInputStream * 0x029bd3a8, unsigned int 26428578, unsigned int 43775256) line 751 nsCOMPtr_base::assign_with_AddRef(nsCOMPtr_base * const 0x01c0d240, nsISupports * 0x029c7f20) line 73 nsStreamListenerTee::OnDataAvailable(nsStreamListenerTee * const 0x029bd3a8, nsIRequest * 0x029c7be0, nsISupports * 0x00000000, nsIInputStream * 0x00000000, unsigned int 0, unsigned int 4008) line 56 + 34 bytes nsHttpChannel::OnDataAvailable(nsHttpChannel * const 0x029c7be4, nsIRequest * 0x029bf3ac, nsISupports * 0x00000000, nsIInputStream * 0x029c47cc, unsigned int 0, unsigned int 4008) line 2876 nsOnDataAvailableEvent::HandleEvent(nsOnDataAvailableEvent * const 0x01c0d240) line 192 + 23 bytes PL_HandleEvent(PLEvent * 0x029caa3c) line 596 + 4 bytes PL_ProcessPendingEvents(PLEventQueue * 0x10042fe0) line 526 + 6 bytes _md_EventReceiverProc(HWND__ * 0x000201a6, unsigned int 49315, unsigned int 0, long 15811040) line 1077 + 13 bytes USER32! 77e13eb0() nsAppShellService::Run(nsAppShellService * const 0x00fe2dd0) line 309 main1(int 4202558, char * * 0x00000004, nsISupports * 0x00253c90) line 1415 + 10 bytes main(int 0, char * * 0x00252cb8) line 1763 + 23 bytes WinMain(HINSTANCE__ * 0x00400000, HINSTANCE__ * 0x00400000, char * 0x00133c42, HINSTANCE__ * 0x00400000) line 1781 + 23 bytes MOZILLA! WinMainCRTStartup + 308 bytes
blocks bug 136999 "Windows builds should build with -O2" pavlov: I have a system all set up with a build that has this crash. Can we take some time to have a look at this and figure out what is going wrong.
Blocks: 136999
I've seen this a while ago (same stack) I've tried to #undef MMX support (no more inline assembly that can break opt code)and it worked for me Later on I pulled another tree and the problem was simply not there anymore So now I run with -O2 and no crash, go figure ... So did you try to #undef MMX support ? (in jmorecfg.h: /* Defines for MMX support. */ #if defined(XP_WIN32) && defined(_M_IX86) #define HAVE_MMX_INTEL_MNEMONICS <-------- comment out that line #endif )
Bernard: thanks. Yes, removing that define and compiling with -O2 cures the crash that I was experiencing.
I think I got it: http://lxr.mozilla.org/mozilla1.0/source/modules/libimg/png/pngvcrd.c#33 see the comment: CPUID instruction will trash ebx ecx edx ! those registers might be used by optimized code (especially ecx) ! therefore the solution would be to sync jpeg mmxsupport() function with png_mmx_support()
Attached patch CPUID will trash ebx, ecx, and edx registers (obsolete) (deleted) — Splinter Review
for test and review
Bernard. Good catch, although unfortunately, that doesn't seem to cure this crash. It does seem like something that should be done though. One thing I noted, and I'm really over my head at this point, but I was stepping through this in the debugger, and it seemed like those push/pops had been optimized away (???). It doesn't appear that ebx,ecx,edx had been restored to previous state at the end of mmxsupported() in jdapimin.c. (I don't know; maybe I botched the patch somehow). cc: a couple of pixel heads :-)
assembler code: GLOBAL(void) jpeg_CreateDecompress (j_decompress_ptr cinfo, int version, size_t structsize) { 60F11000 push ebp 60F11001 mov ebp,esp 60F11003 push ecx int i; #ifdef HAVE_MMX_INTEL_MNEMONICS static int cpuidDetected = 0; if(!cpuidDetected) 60F11004 mov eax,dword ptr ds:[60F27334h] 60F11009 test eax,eax 60F1100B push ebx 60F1100C push esi 60F1100D push edi 60F1100E jne jpeg_CreateDecompress+63h (60F11063h) { MMXAvailable = mmxsupport(); 60F11010 mov dword ptr [ebp-4],0 60F11017 push ebx 60F11018 push ecx 60F11019 push edx 60F1101A pushfd 60F1101B pop eax 60F1101C mov ecx,eax 60F1101E xor eax,200000h 60F11023 push eax 60F11024 popfd 60F11025 pushfd 60F11026 pop eax 60F11027 xor eax,ecx 60F11029 je jpeg_CreateDecompress+4Bh (60F1104Bh) 60F1102B xor eax,eax 60F1102D cpuid 60F1102F cmp eax,1 60F11032 jl jpeg_CreateDecompress+4Bh (60F1104Bh) 60F11034 xor eax,eax 60F11036 inc eax 60F11037 cpuid 60F11039 and edx,800000h 60F1103F cmp edx,0 60F11042 je jpeg_CreateDecompress+4Bh (60F1104Bh) 60F11044 mov dword ptr [ebp-4],1 60F1104B mov eax,dword ptr [ebp-4] 60F1104E pop edx 60F1104F pop ecx 60F11050 pop ebx 60F11051 mov eax,dword ptr [ebp-4] 60F11054 mov dword ptr [_MMXAvailable (60F27348h)],eax cpuidDetected = 1; 60F11059 mov dword ptr ds:[60F27334h],1 } #endif /* For debugging purposes, zero the whole master structure. * But error manager pointer is already there, so save and restore it. */ /* Guard against version mismatches between library and caller. */ cinfo->mem = NULL; /* so jpeg_destroy knows mem mgr not called */ if (version != JPEG_LIB_VERSION) 60F11063 mov eax,dword ptr [version] 60F11066 cmp eax,3Eh 60F11069 mov esi,dword ptr [cinfo] .... I use VC70. I don't have VC60. as you can see, mmxsupport() is inlined but push and pops are there (to step through the assembly code, I did right click | Go to disassembly after I hit the breakpoint) I verified that CPUID changes the registers and that they are now preserved All that I think it's true: 1) From Comment #3 we know that this is somewhat mmx related. 2) From the stack trace the crash occurs before any data is actually decoded so the only mmx specific code involved would be mmxsupport() I suggest that you look at the assembly code to check if push and pops are there then try to replace the line: MMXAvailable = mmxsupport(); by: MMXAvailable = 1; (or 0, it must be the result of the true mmxsupport() call) maybe that way we would know if the problem is really with mmxsupport()/CPUID or not
Target Milestone: --- → mozilla1.1beta
The patch wfm. Can other windows people verify?
Attachment #79515 - Attachment is obsolete: true
Tested swalker's updated patch. Seems to do the trick, I can render JPEG images using an -O2 build without crashing.
Attachment #134032 - Flags: superreview?(tor)
Attachment #134032 - Flags: review?(tor)
Comment on attachment 134032 [details] [diff] [review] Updated version of Bernard's patch; sync the additions added when libpng 1.2.5 landed According to the MS documentation, asm blocks generate their own clobber list and you don't need to manually preserve e[abcd]x or e[sd]i. Just preserving the flags should be enough.
Attachment #134032 - Flags: superreview?(tor)
Attachment #134032 - Flags: superreview-
Attachment #134032 - Flags: review?(tor)
Attachment #134032 - Flags: review-
I assume __fastcall option /Gr isn't being used is it? It was my understanding that if you make a call to a function and need register values preserved it's the caller's responsibility to preserve them not the callee's, other than things like esp and ebp. The only other thing I thought might be happening is the local being stored in a register, in which case it might have gotten clobbered, but a simple test I ran indicates that shouldn't happen. For simple assignment it was referenced on the stack, for more complex operations a register was used and then it was stored back to a location on the stack and the assembler referenced that from then on. I'm not sure if I have the Processor Pack, though. That could very well make a difference. I'll check and report back.
"I assume __fastcall option /Gr isn't being used is it?" yes, __fastcall isn't being used here And I looked at MS doc again and I couldn't see any statement about __asm block preserving any register About the patch I have to say that it's just a matter of synchronizing jpeg mmxsupport() with libpng png_mmx_support() and that there's no problem with png_mmx_support() with -O2 ... And I've looked at assembly code again and it's clear that no push/pop are generated to preserve registers, you have to do this manually, hence the push/pop for ebx,ecx,edx I've noticed that with -O2 when push/pop for ebx/cx/dx are added the call to mmxsupport() is not inlined. When push/pops are removed mmxsupport() is inlined with -O2 and not inlined with -O1. Maybe the crash has nothing to do with preserving the registers but is just about inlining or not inlining mmxsupport() ?
I wasn't implying that the __asm would cause the registers to be preserved, what I was saying that in my test the register was stored back at the start of the __asm block. The inline issue may be the heart of it. With it inlined it's not actually a function call, thus the optimizer may have tripped up didn't think about preserving registers used. Forcing this to be non-inline would be an interesting test.
+__declspec(noinline) int mmxsupport() Does indeed not crash.
Summary: win32 '-O2' crash in jpeg\jdapimin.c: jpeg_consume_input() → win32 '-O2' crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg]
*** Bug 235087 has been marked as a duplicate of this bug. ***
Removing -O2 from the summary because this bug exhibits itself using other optimisation flags too. Also clearing target milestone of mozilla1.1beta and removing defunct QA contact. This bug exhibits itself regularly when compiling with MSVC++ .NET 2003 using the generic "ac_add_options --enable-optimize" mozconfig option.
Summary: win32 '-O2' crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg] → win32 crash in jpeg\jdapimin.c: jpeg_consume_input() [jpg]
Target Milestone: mozilla1.1beta → ---
Untested possible fix.
Attachment #142875 - Flags: review?(bugmail)
Just a comment that both of these patches with SSE2 IDCT and GL optimization crash SeaMonkey. They work fine on FireFox though. I don't think that the code in jdapimin.c is the cause as I replaced both routines with "return 1" and the thing still crashed. It could be that the SSE2 and/or MMX code has a problem with GL optimization. I'm going to play around with this stuff a little more next week.
http://msdn.microsoft.com/library/en-us/vclang/html/_core_Using_and_Preserving_Registers_in_Inline_Assembly.asp Indicates that the compiler should be preserving registers used in asm-blocks. However they also say that we should make sure to not use __fastcall, which I think that we do by default.
Attached patch add static (deleted) — Splinter Review
Attachment #142875 - Attachment is obsolete: true
Comment on attachment 143013 [details] [diff] [review] add static r=me if you've tested that this fixes the problem. However it'd be great if someone that has MSVC *without* the processorpack could try to compile with this change and make sure it still compiles. It's not a big deal if it doesn't compile since we already require the processorpack. But if it's required then it might be good to put an announcement in the NGs or some such.
Attachment #143013 - Flags: review+
Attachment #143013 - Flags: superreview?(bryner)
Attachment #142875 - Flags: review?(bugmail)
Attachment #143013 - Flags: superreview?(bryner) → superreview+
Checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This bug (or bug 125762, I'm not sure) may have caused major regression bug 247437 (see attached screenshot)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: