244978 - MMX IDCT JPEG optimization

Reporter

Description

•

20 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040523 Firefox/0.8.0+ (mmoy-2004-05-05-Exp-Pentium4C)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040523 Firefox/0.8.0+ (mmoy-2004-05-05-Exp-Pentium4C)

There is code to detect MMX capability on Windows platforms compiled with MSVC++, and there is code to do MMX processing in jidctfst.c, but this code isn't used because the plumbing isn't set up for it. The jidctfst.c code says that it uses a fast algorithm (IFAST) with lower quality than the slow algorithm (ISLOW), which is currently used in either the C form or the SSE2 form.

A few lines of code in jddctmgr.c are needed to enable the MMX code for processors that support MMX (some Pentium I models and all Pentium II and later). The question, though, is whether it is desirable to use code that results in lower quality than the current code.

I've had an unofficial build out since mid-April with this code in it. Several unofficial builders are also using this code, so there are daily (or close to it) builds with the MMX code active. I haven't heard of any rendering complaints to date.

I also have an SSE port of the ISLOW code, and it would be a fairly small amount of work to port that to MMX. The SSE code has been out since early May in my unofficial builds and those of others too.

This bug is to see if we can get some kind of MMX optimization into the code base. I will file a bug to get my SSE optimization in after I get my SSE2 code through.

Reproducible: Always
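Editorial sketch of the "plumbing" described above: a minimal C illustration of choosing an IDCT variant from CPU capability flags. It is not the code from jddctmgr.c. The flag names, the `idct_kind` enum, and the `allow_ifast` switch are all hypothetical, and preferring the SSE2 ISLOW path whenever SSE2 is present is an assumption based on the description.

```c
#include <assert.h>

/* Hypothetical capability flags; the real detection code is not shown in this bug. */
enum { CPU_HAS_MMX = 1 << 0, CPU_HAS_SSE2 = 1 << 1 };

/* Hypothetical labels for the three variants the description mentions. */
typedef enum { IDCT_ISLOW_C, IDCT_ISLOW_SSE2, IDCT_IFAST_MMX } idct_kind;

/* Pick an IDCT variant. allow_ifast models the quality question raised in the
 * description: when it is 0, the lower-quality IFAST path is never chosen. */
static idct_kind choose_idct(unsigned cpu_flags, int allow_ifast)
{
    if (cpu_flags & CPU_HAS_SSE2)
        return IDCT_ISLOW_SSE2;          /* assumed to stay the preferred path */
    if ((cpu_flags & CPU_HAS_MMX) && allow_ifast)
        return IDCT_IFAST_MMX;           /* faster, lower quality */
    return IDCT_ISLOW_C;                 /* portable fallback */
}

/* Human-readable name, e.g. for logging which path was selected. */
static const char *idct_name(idct_kind k)
{
    switch (k) {
    case IDCT_ISLOW_C:    return "ISLOW (C)";
    case IDCT_ISLOW_SSE2: return "ISLOW (SSE2)";
    case IDCT_IFAST_MMX:  return "IFAST (MMX)";
    }
    assert(!"unknown idct_kind");
    return "";
}
```

With this shape, the quality trade-off becomes a single policy switch: a caller would pass `allow_ifast = 0` to keep ISLOW output on MMX-only machines.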

dwitte@gmail.com

Comment 1

•

20 years ago

this is something we can probably enable on linux as well, depending on the quality of mmx/sse/sse2 support in the older gcc's we need to build with.

Michael Moy

Reporter

Comment 2

•

20 years ago

(In reply to comment #1)
> this is something we can probably enable on linux as well, depending on the
> quality of mmx/sse/sse2 support in the older gcc's we need to build with.

The code would have to be ported to run on Linux, as it uses MSVC++ inline assembler. I'm working on porting the SSE2 code to GCC on Linux [I just need someone to build and help test the code]. I can port the existing MMX code as well, but that would be down the road. I could also port my own MMX code should there be a need to write it.
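Editorial sketch related to the GCC port discussed above: one way capability detection might look under GCC or Clang, using the `__get_cpuid` helper from `<cpuid.h>` instead of MSVC++ inline assembler. The `CAP_*` names and both function names are hypothetical. The bit positions (CPUID leaf 1, EDX bits 23, 25 and 26 for MMX, SSE and SSE2) are the documented ones. On other compilers or architectures the sketch simply reports no capabilities.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>               /* GCC/Clang helper header providing __get_cpuid */
#define HAVE_GNU_CPUID 1
#endif

/* Hypothetical capability flags used only by this sketch. */
enum { CAP_MMX = 1 << 0, CAP_SSE = 1 << 1, CAP_SSE2 = 1 << 2 };

/* Decode the EDX feature word returned by CPUID leaf 1. */
static unsigned caps_from_edx(uint32_t edx)
{
    unsigned caps = 0;
    if (edx & (UINT32_C(1) << 23)) caps |= CAP_MMX;
    if (edx & (UINT32_C(1) << 25)) caps |= CAP_SSE;
    if (edx & (UINT32_C(1) << 26)) caps |= CAP_SSE2;
    return caps;
}

/* Query the running CPU; returns 0 when CPUID is unavailable to this build. */
static unsigned detect_simd_caps(void)
{
    unsigned caps = 0;
#ifdef HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        caps = caps_from_edx(edx);
#endif
    /* Only the three known bits are ever set. */
    assert(caps <= (unsigned)(CAP_MMX | CAP_SSE | CAP_SSE2));
    return caps;
}
```

Keeping the bit decoding in `caps_from_edx` separate from the CPUID call makes the decoding testable on any machine, including ones without the instruction sets in question.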

dwitte@gmail.com

Comment 3

•

20 years ago

wouldn't it be better to use the more standardized compiler intrinsics for mmx etc? those are cross-compiler i believe, and should result in the same generated assembly. see mmintrin.h and xmmintrin.h of your favorite compiler. i can help with testing mmx and sse on linux, but i'm only running a p3 so i can't help with sse2. (except for compile tests).
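Editorial sketch of what the intrinsics approach suggested above can look like: a saturating add of four signed 16-bit values (the kind of arithmetic an integer IDCT relies on) written with MMX intrinsics from `mmintrin.h`, with a scalar fallback. The function name `add4_sat16` and the `HAVE_MMX_INTRINSICS` macro are hypothetical, and this is not taken from jidctfst.c. Note that the `_mm_empty()` call, which issues `emms`, is written out explicitly in the source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>            /* MMX intrinsics: __m64, _mm_adds_pi16, _mm_empty */
#define HAVE_MMX_INTRINSICS 1
#endif

/* out[i] = clamp(a[i] + b[i]) to the int16_t range, for i in 0..3. */
static void add4_sat16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    assert(a && b && out);
#ifdef HAVE_MMX_INTRINSICS
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof va);           /* load four 16-bit lanes */
    memcpy(&vb, b, sizeof vb);
    vr = _mm_adds_pi16(va, vb);          /* PADDSW: signed saturating add */
    memcpy(out, &vr, sizeof vr);         /* store the four results */
    _mm_empty();                         /* EMMS: release the x87/MMX state */
#else
    /* Portable fallback with the same saturating behaviour. */
    for (int i = 0; i < 4; i++) {
        int sum = (int)a[i] + (int)b[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
#endif
}
```

The same source builds with GCC, Clang, and 32-bit MSVC, which is the portability argument for intrinsics. Whether the generated instructions match hand-written assembler would still need checking in a compiler listing.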

Michael Moy

Reporter

Comment 4

•

20 years ago

(In reply to comment #3)
> wouldn't it be better to use the more standardized compiler intrinsics for mmx
> etc? those are cross-compiler i believe, and should result in the same generated
> assembly. see mmintrin.h and xmmintrin.h of your favorite compiler.

The code is written in assembler, so converting it to intrinsics would be more work than simply converting one inline assembly syntax to another. I'm just getting up to speed on GCC inline assembler and was looking to get something working in a relatively short time frame. At some point in the future, I'll be learning how to do intrinsics, as inline assembly support isn't available for the Windows-64 development environment.

One other thing about intrinsics is that what you see is not necessarily what you get. There are a lot of conveniences with intrinsics where you can address parts of a register (I think that's the right terminology) directly. You can't actually do that in assembler, so your intrinsics actually map to multiple assembler instructions. If you're considering instruction latency in your coding to weigh the benefits of using various instructions or data representation approaches, your measurements may be off because you can't see what's going on under the covers.

An example is when using MMX instructions. I imagine that you don't take the 12 clock latency of emms into consideration if the compiler generates it for you without showing you. Of course you can always generate assembler listings, but when you want to show someone your code, are you going to show the generated code along with your intrinsics?

One advantage with intrinsics is that the compiler can do instruction reordering so that you don't have to worry as much about interleaving. You still have to worry about it to some degree, as considering parallel execution of operations can have an effect on your design that a compiler wouldn't necessarily see.

> i can help with testing mmx and sse on linux, but i'm only running a p3 so i
> can't help with sse2. (except for compile tests).

Thanks for the offer. What I need is someone who can build SSE and/or SSE2 Firefox distributions. I have a machine that I can test stuff on. I imagine that a build would take quite a bit of time on a P3.

dwitte@gmail.com

Comment 5

•

20 years ago

well... writing in intrinsics is much better for portability, i'd imagine; and it's also easier for folk who aren't familiar with assembler, to understand. obviously you have to have an awareness of the assembly your code is generating (this applies to any language where you're trying to optimize for perf), which often involves looking at the compiled assembly and going back-and-forth until you get the desired result. the only reason i can see why we wouldn't want to use intrinsics, is if the compiler is bad at optimizing the assembly generated from it (e.g. parallel execution, or register usage). we'll have to see how msvc and gcc differ here. so, even if it's more work to rewrite the code in intrinsic form, i think it's more desirable in the long run... cc'ing some other folk, to see if they have an opinion.

Michael Moy

Reporter

Comment 6

•

20 years ago

(In reply to comment #5)
> well... writing in intrinsics is much better for portability, i'd imagine; and
> it's also easier for folk who aren't familiar with assembler, to understand.

If you're not familiar with the assembly, then you shouldn't be writing SIMD intrinsics code for performance. There's another bug on improving the nsid::equals routine where someone wrote an MMX routine in intrinsics but apparently didn't understand the costs of the instructions that were being generated. Working in abstraction tends to do this.

I imagine that there's an improvement in portability, as the code will compile on more development platforms. But will you want the resulting code? There are a few unofficial builders who report that -arch:SSE optimizations don't help and can hurt builds that run on some AMD processors, and that AMD users are better off without that optimization, even if their processor supports it. Someone porting with intrinsics may not know that, but someone familiar with assembler and the latency times of instructions would more likely be able to create code to handle situations like this.

> obviously you have to have an awareness of the assembly your code is generating
> (this applies to any language where you're trying to optimize for perf), which
> often involves looking at the compiled assembly and going back-and-forth until
> you get the desired result.

Adding a third step can be a hindrance too. Ever try to get a compiler to generate the code that you want?

> the only reason i can see why we wouldn't want to use intrinsics, is if the
> compiler is bad at optimizing the assembly generated from it (e.g. parallel
> execution, or register usage). we'll have to see how msvc and gcc differ here.

With AMD coming on strong with A64 and Opteron, compilers generating bad code for a processor becomes more and more of a problem.

> so, even if it's more work to rewrite the code in intrinsic form, i think it's
> more desirable in the long run... cc'ing some other folk, to see if they have an
> opinion.

One thing that I've found is that finding volunteers to do stuff like this, where there's a fairly large amount of work involved, isn't easy. There will be lots of cheerleaders saying what should be done and how you should go about doing it, but very little in the way of offers of assistance or people volunteering to do a piece of the work for you. I see addresses of people at Sun, Intel and other tech companies with healthy balance sheets asking for ports where someone else does the work, but not volunteering any time or hardware toward what they want, even if someone else volunteers to do the work. It reminds me of the childhood story whose name eludes me at the moment.

Michael Moy

Reporter

Comment 7

•

20 years ago

Based on another bug where a broken piece of hardware was at fault, the SSE2 code was yanked. Given that, I don't think it is likely that the MMX code will be turned on. That said, you can get the MMX code, the SSE2 code or the SSE code in a wide variety of unofficial builds. See the unofficial builders forum at Mozillazine or the forums at www.pryan.org.

Wayne Mery (:wsmwk)

Comment 8

•

18 years ago

Michael in comment #7
> Based on another bug where a broken piece of hardware was at fault, the SSE2
> code was yanked.

Michael, bug# ?

David Baron :dbaron:

Updated

•

18 years ago

Assignee: jdunn → nobody

QA Contact: imagelib

Ryan VanderMeulen [:RyanVM]

Comment 9

•

17 years ago

Bug 247437

Justin Lebar (not reading bugmail)

Comment 10

•

14 years ago

This is subsumed by bug 573948: Use libjpeg-turbo. I'm going to dupe, but feel free to undupe if you think this is wrong.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → DUPLICATE

Categories

(Core :: Graphics: ImageLib, defect)

People

(Reporter: mmoy, Unassigned)

Security

(public)
