Closed
Bug 244978
Opened 20 years ago
Closed 14 years ago
MMX IDCT JPEG optimization
Categories
(Core :: Graphics: ImageLib, defect)
RESOLVED
DUPLICATE
of bug 573948
People
(Reporter: mmoy, Unassigned)
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040523 Firefox/0.8.0+ (mmoy-2004-05-05-Exp-Pentium4C)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040523 Firefox/0.8.0+ (mmoy-2004-05-05-Exp-Pentium4C)
There is code to detect MMX capability on Windows builds compiled with MSVC++,
and jidctfst.c contains MMX code, but that code isn't used because the plumbing
to call it isn't set up.
According to its comments, jidctfst.c uses a fast algorithm (IFAST) that gives
lower quality than the slow algorithm (ISLOW), which is what is currently used,
in either its C form or its SSE2 form.
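One concrete source of the quality gap is fixed-point precision. Assuming the stock IJG libjpeg values, jidctfst.c (IFAST) stores its multiplier constants with 8 fractional bits, while jidctint.c (ISLOW) uses 13 (`CONST_BITS`). The sketch below quantizes sqrt(2) both ways with the same round-to-nearest rule as libjpeg's `FIX()` macro; `fix_const` and `fix_error` are hypothetical helper names, not functions from the libjpeg sources.

```c
#include <stdint.h>

/* libjpeg-style FIX(): round x * 2^bits to the nearest integer.
   The bit widths passed in (8 for IFAST, 13 for ISLOW) are assumptions
   based on the IJG defaults. */
static int32_t fix_const(double x, int bits)
{
    return (int32_t)(x * (double)(1L << bits) + 0.5);
}

/* Absolute error introduced by storing x as a fixed-point constant. */
static double fix_error(double x, int bits)
{
    double d = (double)fix_const(x, bits) / (double)(1L << bits) - x;
    return d < 0 ? -d : d;
}
```

With those bit widths, sqrt(2) becomes 362/256 (about 1.41406) for IFAST and 11585/8192 (about 1.41418) for ISLOW, so the 8-bit constant is off by roughly 1.5e-4 versus about 2.9e-5. Errors like this accumulate across the multiplies of the 8x8 transform.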
Enabling the MMX code on processors that support MMX (some Pentium Is and all
Pentium IIs and later) requires only a few lines of code in jddctmgr.c.
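The plumbing amounts to choosing the IDCT routine at runtime from a CPU capability check. A minimal sketch, assuming GCC or Clang on x86 for the check (`__builtin_cpu_supports`); an MSVC build would use its own existing detection code instead. The names `idct_fn`, `idct_islow_c`, `idct_ifast_mmx`, `cpu_has_mmx` and `select_idct` are hypothetical stand-ins, not the routines or signatures actually used in jddctmgr.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDCT signature: one 8x8 block of coefficients in, pixels out. */
typedef void (*idct_fn)(const int16_t *coef, uint8_t *out);

/* Hypothetical stubs standing in for the ISLOW C code and the IFAST MMX code. */
static void idct_islow_c(const int16_t *coef, uint8_t *out) { (void)coef; (void)out; }
static void idct_ifast_mmx(const int16_t *coef, uint8_t *out) { (void)coef; (void)out; }

/* Runtime MMX check; reports "no MMX" on compilers/targets it doesn't know. */
static int cpu_has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("mmx") != 0;
#else
    return 0;
#endif
}

/* Pick the routine once, e.g. when the decompressor is set up. */
static idct_fn select_idct(int have_mmx)
{
    return have_mmx ? idct_ifast_mmx : idct_islow_c;
}
```

Usage would be a single `idct_fn idct = select_idct(cpu_has_mmx());` at setup time, after which every block goes through the function pointer, so the per-block cost of the check is zero.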
The question, though, is whether it is desirable to use code that produces
lower quality than the current code.
I've had an unofficial build out since mid-April with this code in it. Several
unofficial builders are also using this code so there are daily (or close to it)
builds with the MMX code active. I haven't heard of any rendering complaints to
date.
I also have an SSE port of the ISLOW code and it would be a fairly small amount
of work to port that to MMX. The SSE code has been out since early May in my
unofficial builds and those of others too.
This bug is to see whether we can get some kind of MMX optimization into the
code base. I will file a separate bug for my SSE optimization once my SSE2 code
has gone through.
Reproducible: Always
Comment 1•20 years ago
this is something we can probably enable on linux as well, depending on the
quality of mmx/sse/sse2 support in the older gcc's we need to build with.
Reporter
Comment 2•20 years ago
(In reply to comment #1)
> this is something we can probably enable on linux as well, depending on the
> quality of mmx/sse/sse2 support in the older gcc's we need to build with.
The code would have to be ported to run on Linux, as it uses MSVC++ inline
assembler. I'm working on porting the SSE2 code to GCC on Linux (I just need
someone to build and help test it).
I can port the existing MMX code as well, but that would be further down the
road. I could also port my own MMX code should there be a need to write it.
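Porting between the two inline assembler dialects is mostly mechanical but not trivial: MSVC uses Intel operand order inside `__asm { }` blocks and can name C variables directly, whereas GCC uses AT&T operand order (source first) and passes C values through operand constraints. A toy example, a 32-bit add, with a portable C fallback; `add_i32` is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* MSVC (32-bit x86 only) would write this as:
 *     __asm { mov eax, a
 *             add eax, b
 *             mov a, eax }
 * The GCC version below uses AT&T syntax: "addl src, dst". */
static int32_t add_i32(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r": a is both read and written in a register; "r": b is an input.
       "cc" tells GCC that the flags register is clobbered. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b; /* Portable fallback for other compilers/architectures. */
#endif
}
```

Real SIMD kernels add more to translate (memory operands, MMX/XMM register names, constraints for arrays), which is where most of the porting effort goes.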
Comment 3•20 years ago
wouldn't it be better to use the more standardized compiler intrinsics for mmx
etc? those are cross-compiler i believe, and should result in the same generated
assembly. see mmintrin.h and xmmintrin.h of your favorite compiler.
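For comparison, this is what the intrinsics route looks like for a single packed operation: the same `<emmintrin.h>` calls are accepted by both MSVC and GCC. A minimal sketch, assuming an SSE2 target (GCC defines `__SSE2__`; MSVC defines `_M_X64`, or `_M_IX86_FP` >= 2 under `/arch:SSE2`), with a plain C fallback so it builds elsewhere. `adds8_i16` is a hypothetical helper, and a saturating 16-bit add is only representative of the packed arithmetic an IDCT uses.

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_INTRIN 1
#endif

/* Saturating add of eight signed 16-bit lanes: out[i] = clamp(a[i] + b[i]). */
static void adds8_i16(const int16_t *a, const int16_t *b, int16_t *out)
{
#ifdef HAVE_SSE2_INTRIN
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned loads */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epi16(va, vb)); /* PADDSW */
#else
    for (int i = 0; i < 8; i++) {
        int s = a[i] + b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
#endif
}
```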
i can help with testing mmx and sse on linux, but i'm only running a p3 so i
can't help with sse2. (except for compile tests).
Reporter
Comment 4•20 years ago
(In reply to comment #3)
> wouldn't it be better to use the more standardized compiler intrinsics for mmx
> etc? those are cross-compiler i believe, and should result in the same generated
> assembly. see mmintrin.h and xmmintrin.h of your favorite compiler.
The code is written in assembler, so converting it to intrinsics would be more
work than simply converting from one inline assembly dialect to another. I'm
just getting up to speed on GCC inline assembler and was looking to get
something working in a relatively short time frame.
At some point in the future I'll have to learn intrinsics, as inline assembly
isn't supported in the 64-bit Windows development environment.
One other thing about intrinsics is that what you see is not necessarily what
you get. Intrinsics offer a lot of conveniences, such as addressing parts of a
register (I think that's the right terminology) directly. You can't actually do
that in assembler, so a single intrinsic can map to multiple assembler
instructions.
If you're considering instruction latency in your coding to weigh the benefits
of using various instructions or data representation approaches, your
measurements may be off because you can't see what's going on under the covers.
An example is MMX code. I imagine that you don't take the 12-clock latency of
EMMS into consideration if the compiler generates it for you without showing
you. Of course, you can always generate assembler listings, but when you want to
show someone your code, are you going to show the generated code along with your
intrinsics?
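In the `<mmintrin.h>` interface shipped with GCC and MSVC, EMMS is exposed as the intrinsic `_mm_empty()`, and where it goes is left to the programmer; it must run before any x87 floating-point code. A minimal sketch, assuming an MMX-capable x86 target (`__MMX__` on GCC/Clang, `_M_IX86` on 32-bit MSVC), with a C fallback elsewhere. `add4_i16` is a hypothetical name. `_mm_set_pi16` also illustrates the point above: there is no single instruction that builds a register from four scalars, so it usually expands to several.

```c
#include <stdint.h>
#if defined(__MMX__) || defined(_M_IX86)
#include <mmintrin.h>
#define HAVE_MMX_INTRIN 1
#endif

/* Wrapping add of four signed 16-bit lanes: out[i] = a[i] + b[i] (mod 2^16). */
static void add4_i16(const int16_t *a, const int16_t *b, int16_t *out)
{
#ifdef HAVE_MMX_INTRIN
    /* _mm_set_pi16 takes lanes high-to-low. */
    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
    __m64 vr = _mm_add_pi16(va, vb);                                   /* PADDW */
    uint32_t lo = (uint32_t)_mm_cvtsi64_si32(vr);                      /* lanes 0,1 */
    uint32_t hi = (uint32_t)_mm_cvtsi64_si32(_mm_srli_si64(vr, 32));   /* lanes 2,3 */
    _mm_empty(); /* EMMS: leave MMX state before any x87 floating-point code. */
    out[0] = (int16_t)(lo & 0xFFFFu);
    out[1] = (int16_t)(lo >> 16);
    out[2] = (int16_t)(hi & 0xFFFFu);
    out[3] = (int16_t)(hi >> 16);
#else
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
#endif
}
```

Inspecting the assembler listing for a function like this is the quickest way to see how many instructions the set/extract conveniences actually cost.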
One advantage of intrinsics is that the compiler can reorder instructions, so
you don't have to worry as much about interleaving. You still have to worry
about it to some degree, as the parallel execution of operations can affect
your design in ways a compiler wouldn't necessarily see.
> i can help with testing mmx and sse on linux, but i'm only running a p3 so i
> can't help with sse2. (except for compile tests).
Thanks for the offer. What I need is someone who can build SSE and/or SSE2
Firefox distributions. I have a machine that I can test on. I imagine that a
build would take quite a bit of time on a P3.
Comment 5•20 years ago
well... writing in intrinsics is much better for portability, i'd imagine; and
it's also easier for folk who aren't familiar with assembler to understand.
obviously you have to have an awareness of the assembly your code is generating
(this applies to any language where you're trying to optimize for perf), which
often involves looking at the compiled assembly and going back-and-forth until
you get the desired result.
the only reason i can see why we wouldn't want to use intrinsics, is if the
compiler is bad at optimizing the assembly generated from it (e.g. parallel
execution, or register usage). we'll have to see how msvc and gcc differ here.
so, even if it's more work to rewrite the code in intrinsic form, i think it's
more desirable in the long run... cc'ing some other folk, to see if they have an
opinion.
Reporter
Comment 6•20 years ago
(In reply to comment #5)
> well... writing in intrinsics is much better for portability, i'd imagine; and
> it's also easier for folk who aren't familiar with assembler, to understand.
If you're not familiar with the assembly, then you shouldn't be writing
SIMD intrinsics code for performance. There's another bug on improving the
nsid::equals routine where someone wrote an MMX routine in intrinsics but
apparently didn't understand the costs of the instructions that were being
generated. Working in abstraction tends to do this.
I imagine that there's an improvement in portability, as the code will compile
on more development platforms. But will you want the resulting code? A few
unofficial builders report that -arch:SSE optimizations don't help, and can
even hurt, builds running on some AMD processors, so AMD users are better off
without that optimization even if their processor supports it.
Someone porting with intrinsics may not know that, but someone familiar with
assembler and instruction latencies would be more likely to write code that
handles situations like this.
> obviously you have to have an awareness of the assembly your code is generating
> (this applies to any language where you're trying to optimize for perf), which
> often involves looking at the compiled assembly and going back-and-forth until
> you get the desired result.
Adding a third step can be a hindrance too. Have you ever tried to get a compiler to generate the code that you want?
> the only reason i can see why we wouldn't want to use intrinsics, is if the
> compiler is bad at optimizing the assembly generated from it (e.g. parallel
> execution, or register usage). we'll have to see how msvc and gcc differ here.
With AMD coming on strong with the A64 and Opteron, compilers generating bad
code for a particular processor becomes more and more of a problem.
> so, even if it's more work to rewrite the code in intrinsic form, i think it's
> more desirable in the long run... cc'ing some other folk, to see if they have an
> opinion.
One thing that I've found is that finding volunteers to do stuff like this where
there's a fairly large amount of work involved isn't easy. There will be lots of
cheerleaders saying what should be done and how you should go about doing it but
very little in the way of offers of assistance or people volunteering to do a
piece of the work for you.
I see the addresses of people at Sun, Intel and other tech companies with
healthy balance sheets asking for ports where someone else does the work, but
not volunteering any time or hardware toward what they want, even when someone
else volunteers to do the work.
It reminds me of the childhood story whose name eludes me at the moment.
Reporter
Comment 7•20 years ago
Based on another bug where a broken piece of hardware was at fault, the SSE2
code was yanked. Given that, I don't think the MMX code is likely to be turned
on. That said, you can get the MMX code, the SSE2 code or the SSE code in a wide
variety of unofficial builds. See the unofficial builders forum at Mozillazine
or the forums at www.pryan.org.
Comment 8•18 years ago
Michael in comment #7
> Based on another bug where a broken piece of hardware was at fault, the SSE2
> code was yanked.
Michael, bug# ?
Updated•18 years ago
Assignee: jdunn → nobody
QA Contact: imagelib
Comment 9•17 years ago
Comment 10•14 years ago
This is subsumed by bug 573948: Use libjpeg-turbo. I'm going to dupe, but feel free to undupe if you think this is wrong.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE