Closed Bug 578761 Opened 14 years ago Closed 14 years ago

JM: PIC for |o[e]| (JSOP_GETELEM)

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: cdleary)

References

Details

(Whiteboard: fixed-in-jaegermonkey)

Attachments

(5 files, 10 obsolete files)

WIP patch. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
iterator getelem patch 14 years ago Brian Hackett [Laid off!] (deleted), patch		Details \| Diff \| Splinter Review
WIP: Passes trace tests. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
rebased iterators patch 14 years ago Brian Hackett [Laid off!] (deleted), patch		Details \| Diff \| Splinter Review
WIP: Adds int32 inline path. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
Perf comparison. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), text/plain		Details
WIP: Fixes bug w/megamorphism on shape change. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
GETELEM PIC results on SunSpider. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), text/plain		Details
Perf comparison. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), text/plain		Details
WIP: Fixes bug w/megamorphism on shape change. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
PIC GETELEM 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
PIC GETELEM, max count bumped 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
regression test case 14 years ago David Anderson [:dvander] - inactive, e-mail if emergency (deleted), application/javascript		Details
PIC GETELEM, take 2. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch		Details \| Diff \| Splinter Review
PIC GETELEM, take 2, for real. 14 years ago Chris Leary [:cdleary] (not checking bugmail) (deleted), patch	dvander : review+	Details \| Diff \| Splinter Review

David Mandelin [:dmandelin]

Reporter

Description

•

14 years ago

We need a pic for getelem expressions of the form o[e]. It is the main cause of slowdown on fasta. Previous measurements suggest that we don't have too worry too much about megamorphism: a simple solution, like going to a slow call after 16 stubs, will probably do. I suspect that most getelems we see will either use integer indexes, thus skipping this entirely, or will go over a relatively small number of property names. JM wanted SS win: 20 ms

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 1

•

14 years ago

Ooh, I'll take this one.

Assignee: general → cdleary

Status: NEW → ASSIGNED

Brendan Eich [:brendan]

Comment 2

•

14 years ago

For string-fasta.js's two for-in loops, we could do even better, but maybe it is overkill: the def-use chain for the loops has the def in the loop head dominating all (few) uses, so we could try a bit more static analysis and predict the next value (property name). /be

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 3

•

14 years ago

What about the idea mentioned a few times of iterating over JSScopeProperties? It'd be pretty cool if we could just read the last sprop out of the iterator and avoid the PIC.

Brendan Eich [:brendan]

Comment 4

•

14 years ago

Other than the sprop linkage being up the property tree, so in reverse for-in order, it's great. This isn't a big deal given the fastiterators bug's work, but you'd need to tweak that code to build an array of sprops and know how to get the next one in for-in enumeration order from it. Or something like that. /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 5

•

14 years ago

Attached patch WIP patch. (obsolete) (deleted) — Details — Splinter Review

Very WIP: needs cleanup and the integer fast path added back in.

Brian Hackett [Laid off!]

Comment 6

•

14 years ago

Attached patch iterator getelem patch (obsolete) (deleted) — Details — Splinter Review

FWIW here's the patch from last week that keeps the last-seen value around which would avoid a MIC/PIC in the common case. Hi, patch attached which saves 8-9ms on string-fasta for TM and JM. 'for in' loops keep an array of (id,slot) instead of id, which is inspected before doing an arbitrary name lookup (not fast-pathed yet for JM, just non-as-slow-pathed). Doesn't handle property deletes (don't know how those work). Should this go on 560989 or a new bug? Brian

David Mandelin [:dmandelin]

Reporter

Comment 7

•

14 years ago

(In reply to comment #6) > Should this go on 560989 or a new bug? I think this belongs in bug 578756. I tried to apply it to check out the perf gain, but it needs rebasing. We are hoping to get 23 ms total from improving that particular loop of fasta, so if we aren't getting all of it, we need to know why to decide if the current patches are going to take care of it or if we need to do more.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Updated

•

14 years ago

Summary: JM: PIC for |o[e]| → JM: PIC for |o[e]| (JSOP_GETELEM)

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 8

•

14 years ago

Attached patch WIP: Passes trace tests. (obsolete) (deleted) — Details — Splinter Review

Shoehorning the int path back in...

Attachment #457736 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Comment 9

•

14 years ago

Attached patch rebased iterators patch (deleted) — Details — Splinter Review

Iterator patch rebased to current JM. I still get a 9ms speedup on string-fasta with this (35ms -> 26ms).

Attachment #457885 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Comment 10

•

14 years ago

Chris, what speedup are you seeing on string-fasta with your patch (can you test it yet)? Looking at this some more, unless I'm missing something I don't see why using a PIC makes sense for the workloads seen on SS. There are 330k slow paths taken for getelem on SS. 175k are from one site in string-fasta, 80k are from four sites in string-unpack-code, leaving 75k (iirc 30k are gets on string elements, 8k are getting holes, remainder are misc). If you PIC on (shape x id) (you need to do this to determine the slot, right?), then for the one site in string-fasta there are 19 cases that get hit (the total number of properties in the two objects that are repeatedly iterated over), and for the sites in string-unpack-code there are between 600-2000 cases that get hit (all 1-2 character strings). The iterator patch works on the 175k in string-fasta but not the ones in string-unpack-code; I think these require old fashioned hashtable lookups.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 11

•

14 years ago

I've only tested on string-fasta (because that has no dense array hot paths): 13.5% == 4ms speedup.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 12

•

14 years ago

Attached patch WIP: Adds int32 inline path. (obsolete) (deleted) — Details — Splinter Review

About to test on a machine whose benchmark results I trust...

Attachment #458032 - Attachment is obsolete: true

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 13

•

14 years ago

Attached file Perf comparison. (obsolete) (deleted) — Details

4ms on SunSpider, wash on V8. I'll see if I can do anything to cut down on extraneous spillage to make these numbers better.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 14

•

14 years ago

How many times does the PIC miss on SS, and how many times does it hit?

Brian Hackett [Laid off!]

Comment 15

•

14 years ago

General concern: how reliable are these benchmark comparisons? Comparison from comment 13 shows 'significant' differences in tests completely unrelated to these PICs (binary-trees, bitwise-and, plenty that only have integer array accesses). string-fasta is .9ms faster, and string-unpack-code is .5ms faster; if you repeat the comparison, how much do things change?

Brendan Eich [:brendan]

Comment 16

•

14 years ago

SunSpider overstates significance all the time; a while ago dmandelin thought he saw a flaw in its pstats but then I think thought it was ok, but it "feels" all wrongly overstated, way too often. Of course SS has ND in it too (Date, Math.random). Benchmarketing at its best! /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 17

•

14 years ago

(In reply to comment #15) > General concern: how reliable are these benchmark comparisons? It's reproducible. There are a lot of little effects from adding these PICs that can have emergent behavior, though most of the ones I can think of are bad: self-modifying code causes I$ flushes, PIC guards alias other branches in the BHTs, there's more I$ pressure because of the inline PIC code and out-of-line slow call code that gets tacked on to the inline buffer. I look a quick look at cachegrind diff results yesterday, but I'll look into it more today. @dvander: I'll investigate hit counts.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 18

•

14 years ago

I wrote a little script to compare stub call counts using our stub call profiling mode and I got the following: sunspider-0.9.1 aggregate JSOP_NOP: +999 JSOP_PROPINC: +1 JSOP_EXCEPTION: -1 JSOP_CONCATN: -1 JSOP_CALL: -3 JSOP_REGEXP: -999 JSOP_GETELEM: -24425 Total: -24429 I think that the stub profiling data (or my script) may be a little buggy because running the untouched shell against itself I got: crypto-aes.js JSOP_CONCATN: -12 JSOP_CALL: -36 Total: -48 sunspider-0.9.1 aggregate JSOP_CONCATN: -12 JSOP_CALL: -36 Total: -48 Investigating...

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 19

•

14 years ago

Yeah, there's definitely nondeterminism in the stub profiling, although it's always for the same opcodes. Still looking, but CCing dmandelin -- maybe this is a known issue.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 20

•

14 years ago

In any case, this is the bit that matters to us: string-tagcloud.js JSOP_GETELEM: -4998 date-format-xparb.js JSOP_GETELEM: -3999 string-fasta.js JSOP_GETELEM: -15428 sunspider-0.9.1 aggregate JSOP_GETELEM: -24425 Those numbers should represent the PIC hits.

Brian Hackett [Laid off!]

Comment 21

•

14 years ago

Is this nondeterminism in the stub profiling or nondeterminism in SS itself (see the nonce variable in AESEncryptCtr in crypto-aes.js)? As Brendan pointed out, SS doesn't do the same thing every time, including differing numbers of calls (bitten me before).

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 22

•

14 years ago

(In reply to comment #21) > (see the nonce variable in AESEncryptCtr in crypto-aes.js) That makes a lot of sense for crypto-aes, didn't see the call to Date until you mentioned it. :-) Still not sure how exception is becoming propinc or regexp is becoming jsop_nop -- we don't even have a slow call for anything noppish, so that'll be what I check out next.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 23

•

14 years ago

(In reply to comment #20) What about PIC misses? Trying to see what percentage of PIC'able callsites we're missing. And how many times do we hit max stubs?

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 24

•

14 years ago

Attached patch WIP: Fixes bug w/megamorphism on shape change. (obsolete) (deleted) — Details — Splinter Review

Excellent suggestion by dvander -- rooted out a bug. Re-analyzing with that fix.

Attachment #458158 - Attachment is obsolete: true

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 25

•

14 years ago

Attached file GETELEM PIC results on SunSpider. (deleted) — Details

1 disabled due to failed lookup at date-format-xparb.js:19. 4000 iterations where the first iteration is used in a "try to look it up, and if it turned out it wasn't in there, put it in there" fashion. 3 disabled due to abnormal object ops (2 in regexp-dna.js:1707, 1 in string-tagcloud.js:185) TODO: investigate what these objects actually are. 1 disabled in string-fasta.js:68 disabled due to megamorphism, could up the maximum a little bit to catch all of these. 4 disabled in string-unpack-code.js (:16, :30, :51, :66) due to megamorphism, but this one is definitely correct -- the strings that are being used for indexing are dynamically generated, so the atom comparison fails even when the characters are the same, i.e. "L" shows up three times with different string pointers. In summary, I'll try to treating failed lookups as nops for the benefit of xparb and bump the stub cap for the (small) benefit of fasta. Will rerun the benchmarks on my macbook tonight and give V8 a look-over tomorrow.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 26

•

14 years ago

Attached file Perf comparison. (deleted) — Details

16.8ms win on SunSpider without any of the modifications I mentioned at the end of comment 25, but looks like it regresses V8 by 80ms (1%) in its current form. Will investigate tomorrow.

Attachment #458162 - Attachment is obsolete: true

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 27

•

14 years ago

Attached patch WIP: Fixes bug w/megamorphism on shape change. (obsolete) (deleted) — Details — Splinter Review

#if out some debug stuff I accidentally left in there. (I removed it before profiling in the above comment.)

Attachment #458282 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 28

•

14 years ago

Re: comment 25: non-native object ops in SS must be dense arrays. ITM we want a different approach here from a PIC. We want prediction based on likelihood or (better) a bit of static analysis, not prediction based on what just missed the cache and forced us to fill it with a key that won't be matched again all too often. /be

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 29

•

14 years ago

On my machine, with the patch from comment #27, I don't see any regressions. string-fasta speeds up by 34% (6.5ms), and that's the only clearly-affected test.

Brian Hackett [Laid off!]

Comment 30

•

14 years ago

I get a 9ms speedup (35ms -> 26ms; how are you computing the 34%?) with the patch from comment 27, the same as for the iterators patch. That other patch could get a few ms more by fast-pathing the cache lookup (the calls to GetElem and associated spills/tests are still performed) and maybe by using an array of JSScopeProperty*. Both patches are attacking string-fasta; how to proceed?

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 31

•

14 years ago

Attached patch PIC GETELEM (obsolete) (deleted) — Details — Splinter Review

Discussed offline with dvander, dmandelin, and bhackett. We're going to go with the GETELEM PIC over the iterator patch. The GETELEM patch shows a little more speedup on microbenchmarks because the iterator patch is lacking a fast path in its current implementation.

Attachment #458289 - Attachment is obsolete: true

Attachment #458497 - Flags: review?(dvander)

Andreas Gal :gal

Comment 32

•

14 years ago

Could you post a summary of the discussion?

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 33

•

14 years ago

(In reply to comment #32) > Could you post a summary of the discussion? Surely! The patches do the same thing (make getelem faster) -- the GETELEM PICs are a bit more generic and have a larger speedup ATM, so it doesn't seem worthwhile to spend more engineering effort duplicating something we already have the win for. Since GETELEM PICs work and have a slightly larger speedup, into the tree they go.

Brendan Eich [:brendan]

Comment 34

•

14 years ago

Sorry if I missed this: What PIC hit rates do you get for the string-fasta loops? /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 35

•

14 years ago

(In reply to comment #33) > the GETELEM PICs > are a bit more generic To clarify, when I say "a bit more generic" I mean that they work on a larger set of cases than indexing into the innermost iterator.

Andreas Gal :gal

Comment 36

•

14 years ago

fasta does a property walk over an object. why does a PIC outperform an optimized iterator data structure?

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 37

•

14 years ago

(In reply to comment #36) > fasta does a property walk over an object. why does a PIC outperform an > optimized iterator data structure? As I mentioned in comment 33, because we have a fast path implemented for it. The current iterator patch performs a slow call on each iteration.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 38

•

14 years ago

Attached patch PIC GETELEM, max count bumped (obsolete) (deleted) — Details — Splinter Review

Bumps the megamorphic limit to 20, so we don't disable at all on fasta.

Attachment #458497 - Attachment is obsolete: true

Attachment #458505 - Flags: review?(dvander)

Attachment #458497 - Flags: review?(dvander)

Andreas Gal :gal

Comment 39

•

14 years ago

Your summary above is completely misleading. So we are choosing a less optimal algorithm that we implemented a fast path for over a better algorithm that we didn't finish implementing? This doesn't seem like a good approach to me.

Brendan Eich [:brendan]

Comment 40

•

14 years ago

Tuning for SunSpider is one thing (20 may be overkill -- isn't the magic number for IUB 15?). Giving the win to the patch that has had some fast-path love, even though it uses caching which is in general not effective on real for-in loops, instead of giving the same advantage of fast-path love to the iterator path, and then judging, is a path-dependent mistake. Apples to apples, if the fast-path'ed iterator wins, then it should go in. /be

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 41

•

14 years ago

We should do both, they're not mutually exclusive, and this is ready now. We fast-path FORLOCAL and MOREITER, we can have a separate bug for noticing where FORLOCAL results flow into.

Brendan Eich [:brendan]

Comment 42

•

14 years ago

Ok, so let's do both -- not make this an either-or on bogus grounds :-/. Also, be clear this is tuning for SunSpider. We can take it, but it should be described as "more general". Is the other bug on file? /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 43

•

14 years ago

(In reply to comment #34) > Sorry if I missed this: What PIC hit rates do you get for the string-fasta > loops? Not sure that I understand the metric you're looking for, but I'll take a shot. The patch elides 147331 slow calls to GETELEM versus TIP, so with 52 slow calls for PIC updates, that's a hit rate of 99.9647%.

Brendan Eich [:brendan]

Comment 44

•

14 years ago

er, "should *not* be described as 'more general'." /be

Brendan Eich [:brendan]

Comment 45

•

14 years ago

(In reply to comment #43) > (In reply to comment #34) > > Sorry if I missed this: What PIC hit rates do you get for the string-fasta > > loops? > > Not sure that I understand the metric you're looking for, A PIC is a cache. Caches are tested, which either hits or misses (if miss, then fill according to a replacement policy). The hit rate is the ratio of hits to tests. What's the hit rate? > but I'll take a shot. > The patch elides 147331 slow calls to GETELEM versus TIP, so with 52 slow calls > for PIC updates, that's a hit rate of 99.9647%. That's pretty awesome. Thanks! /be

Brendan Eich [:brendan]

Comment 46

•

14 years ago

Chris, what happens if you shrink the PIC size to 16? Does the hit rate drop a whole lot? /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 47

•

14 years ago

(In reply to comment #42) > Is the other bug on file? dmandelin mentioned in comment 7 that the iterator work is living in bug 578756. This bug happened to become a de-facto string-fasta optimization bug, which was the cause for the either-or confusion.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 48

•

14 years ago

(In reply to comment #45) > > Not sure that I understand the metric you're looking for, > > A PIC is a cache. I may have misrepresented my level of cluelessness... You were right, 16 is okay. PIC Size Hit Rate -------- -------- 2: 49322 / 147384 = 33.471% 4: 64493 / 147384 = 43.758% 8: 79014 / 147384 = 53.611% 15: 122866 / 147384 = 83.365% 16: 147331 / 147384 = 99.964% 32: 147331 / 147384 = 99.964% Just so happens only 17 of the 19 shape/atom combos flow through that GETELEM. In light of this pathetic analysis, I would like to apologize for the impression I gave that PICing is more general in any serious way.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 49

•

14 years ago

Attached file regression test case (deleted) — Details

On the mac-mini: fasta: 1.40x as fast 41.9ms +/- 0.6% 29.8ms +/- 0.9% significant The only consistent regression I see across multiple computers is this: fannkuch: *1.052x as slow* 20.8ms +/- 1.0% 21.8ms +/- 1.0% significant When there's only a single GETELEM site, things are fine. As more get added the performance degrades. To demonstrate see the attached script. It runs in 2500ms before this patch, and 3000ms after. It's only 1ms on the benchmark, but my gut feeling is we should investigate this.

Brendan Eich [:brendan]

Comment 50

•

14 years ago

Is it still the case that JM does not handle megamorphism explicitly? I.e. too much PIC thrashing -> recompile a generic stub call? /be

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 51

•

14 years ago

(In reply to comment #50) Yeah, though this problem case is the reverse - lots of call sites, each monomorphic.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 52

•

14 years ago

Comment on attachment 458505 [details] [diff] [review] PIC GETELEM, max count bumped >+ /* FIXME: we should factor this out. Easy to miss new members. */ Forgetting to set a member will light up pretty quickly. Omit this or explain it better (maybe even turn it into documentation). >+ script->pics[i].idReg = pics[i].idReg; This should be in the |u.get| union struct of ic::PICInfo. >+ pic.hotPathBegin = masm.label(); >+ /* Guard on shape. */ Newline before comment. >+ pic.shapeGuard = masm.label(); >+ Jump shapeGuard = masm.branch32(Assembler::NotEqual, shapeReg, >+ Imm32(int32(JSObjectMap::INVALID_SHAPE))); >+ /* Guard on id identity. */ Ditto. >+ static const int32 BOGUS_ATOM = 0xdeadbeef; >+ Jump idGuard = masm.branch32(Assembler::NotEqual, idReg, Imm32(BOGUS_ATOM)); branchPtr, ImmPtr - or better yet, put a // :FIXME: X64 in there somewhere. >+ stubcc.linkExit(idGuard, Uses(2)); >+ stubcc.linkExitDirect(shapeGuard, pic.slowPathStart); This must be linkExit(), not linkExitDirect(), because if the branch is taken registers won't be synced. >+ void jsop_getelem_dense(FrameEntry *obj, FrameEntry *id, RegisterID objReg, >+ MaybeRegisterID &idReg, RegisterID shapeReg); > inline JSC::MacroAssembler::RegisterID >+FrameState::tryRegForType(FrameEntry *fe, RegisterID fallback) More on this later. >+ if (obj->isDenseArray()) { >+ start = masm.label(); >+ atomGuard = masm.branchPtr(Assembler::NotEqual, pic.idReg, ImmPtr(id)); >+ shapeGuard = masm.branchPtr(Assembler::NotEqual, >+ Address(pic.objReg, offsetof(JSObject, clasp)), >+ ImmPtr(obj->getClass())); Dense arrays are non-native, so an earlier check made this all dead. You can just remove it. >+ masm.loadData32(proto, pic.objReg); loadPtr() - looks like an existing bug that got copy pasta'd. proto used to be a jsval. >+ if (obj->isDenseArray()) >+ disable("dense array"); Can remove this too. > // This struct comes out to 61 bits. 65 now (70 when idReg moves in). >+ } >+ /* Postcondition: type should be in tmpReg, data should be in objReg. */ >+ /* Note: linkExits will be hooked up to a leave() after this method completes. */ Newline before comment, merge comments or use // >+ if (id->isTypeKnown() && id->getKnownType() == JSVAL_TYPE_INT32 && id->isConstant() && >+ id->getValue().toInt32() < 0) { >+ jsop_getelem_slow(); >+ return; >+ } Can this check be moved into the jsop_getelem_dense path? >+ >+ if (id->isTypeKnown() && id->getKnownType() == JSVAL_TYPE_STRING && id->isConstant()) { >+ /* Never happens, or I'd optimize it. */ >+ jsop_getelem_slow(); >+ return; >+ } Similarly, into the jsop_getelem_pic path. >+ RegisterID tmpReg = frame.allocReg(); >+ if (!obj->isTypeKnown()) { >+ RegisterID typeReg = frame.tryRegForType(obj, tmpReg); >+ Jump objGuard = masm.testObject(Assembler::NotEqual, typeReg); >+ stubcc.linkExit(objGuard, Uses(2)); >+ } I'd prefer playing along with the register allocator. Use copyTypeIntoReg, and allocReg on the other side of the |if|. Right now, you might be spilling and then re-loading the type. >+ if (!id->isConstant()) >+ idReg.setReg(frame.copyDataIntoReg(id)); >+ /* Meat. */ Delicious, but more savory with a preceding newline. >+ jsop_getelem_dense(obj, id, objReg, idReg, tmpReg); >+ stubcc.leave(); >+ stubcc.call(stubs::GetElem); >+ /* Epilogue. */ Ditto. >+ RegisterID objReg = frame.copyDataIntoReg(obj); >+ RegisterID idReg = frame.copyDataIntoReg(id); >+ /* Meat. */ >+ jsop_getelem_pic(obj, id, objReg, idReg, tmpReg); >+ /* Epilogue. */ Ditto * 2. >+#ifdef JS_POLYIC >+ JS_ASSERT(!id->isTypeKnown()); Taking another look, this function is *really* long. Would be nice to separate into smaller pieces (getelem_known_id_type, getelem_with_pic, getelem_without_pic), but up to you. >+ RegisterID typeReg = frame.tryRegForType(id, tmpReg); >+ Jump intGuard = masm.testInt32(Assembler::NotEqual, typeReg); Nice, this is a good use case for tryRegForType, because anything else will be less optimal. Let's overload it to "tempRegForType" and give it a "hint" parameter. Assert that the hint reg is owned. >+ Jump performedDense = masm.jump(); Hrm... I think we could do better by keeping one case OOL, or PIC'ing everything. For example, if the type isn't known, we'd always go to a PIC. If it saw a string id, it'd guard on that and build string PICs. If it saw an int32 id, it'd guard on that, and build a dense elem path. We most likely don't need or want both in one path, for locality, branch prediction, or code size. This should be a follow-up bug though. Nice work so far, looking forward to new patch.

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 53

•

14 years ago

(In reply to comment #52) > >+ stubcc.linkExit(idGuard, Uses(2)); > >+ stubcc.linkExitDirect(shapeGuard, pic.slowPathStart); > > This must be linkExit(), not linkExitDirect(), because if the branch is taken > registers won't be synced. I think the same set of registers is dirty on the idGuard taken edge and the shapeGuard taken edge. > >+ if (obj->isDenseArray()) { > >+ start = masm.label(); > >+ atomGuard = masm.branchPtr(Assembler::NotEqual, pic.idReg, ImmPtr(id)); > >+ shapeGuard = masm.branchPtr(Assembler::NotEqual, > >+ Address(pic.objReg, offsetof(JSObject, clasp)), > >+ ImmPtr(obj->getClass())); > > Dense arrays are non-native, so an earlier check made this all dead. You can > just remove it. We checked in the caller that the *holder* had a native JSObjectOps, not obj. Is it possible for a dense array to have a prototype with native object ops? I'm somewhat confused by the terminology here, may have to discuss offline. > Would be nice to separate into smaller pieces I'd like that as well, will break it up some. > >+ if (id->isTypeKnown() && id->getKnownType() == JSVAL_TYPE_INT32 && id->isConstant() && > >+ id->getValue().toInt32() < 0) { > >+ jsop_getelem_slow(); > >+ return; > >+ } > > Can this check be moved into the jsop_getelem_dense path? It's easier to emit all the shortcut-to-slow stuff before the shared isObject guard. I can duplicate that guard a few times if you hate the way it is now. > Let's overload it to "tempRegForType" and give it a "hint" > parameter. Assert that the hint reg is owned. So this just means rename tryRegForType to tempRegForType? tryRegForType already asserted that the fallback register was owned.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 54

•

14 years ago

(In reply to comment #53) > I think the same set of registers is dirty on the idGuard taken edge and the shapeGuard taken edge. Whoops, yup, you're right. Can you leave a comment in there to avoid accidental breakage? > We checked in the caller that the *holder* had a native JSObjectOps, not obj. Okay, yeah, we need to check |obj|, as well as everything in that protochain walk. Another existing bug, we can fix the other cases separately. > I can duplicate that guard a few times if you hate the way it is now. If it's a big deal then nevermind. > So this just means rename tryRegForType to tempRegForType? Yes.

Brendan Eich [:brendan]

Comment 55

•

14 years ago

(In reply to comment #53) > > Dense arrays are non-native, so an earlier check made this all dead. You can > > just remove it. > > We checked in the caller that the *holder* had a native JSObjectOps, not obj. > Is it possible for a dense array to have a prototype with native object ops? > I'm somewhat confused by the terminology here, may have to discuss offline. David pointed out js::PropertyCache::fill over IRC, filling a PIC needs to have the same logic about non-native followed by native on proto chain. Fortunately this is rare apart from dense arrays, and for dense arrays we special-case all proto chain identifier-based (atom literal) tests/lookups to skip the direct (dense array) object, since it cannot have named properties. GETELEM could be accessing any computed property name, though, so if the id at runtime is not an index, but the direct object is a dense array, then you should be able to skip up one level on the proto chain. /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 56

•

14 years ago

Attached patch PIC GETELEM, take 2. (obsolete) (deleted) — Details — Splinter Review

Attachment #458505 - Attachment is obsolete: true

Attachment #458799 - Flags: review?(dvander)

Attachment #458505 - Flags: review?(dvander)

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 57

•

14 years ago

Shouldn't have marked for re-review, fixing native logic now...

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 58

•

14 years ago

Attached patch PIC GETELEM, take 2, for real. (deleted) — Details — Splinter Review

Attachment #458799 - Attachment is obsolete: true

Attachment #458846 - Flags: review?(dvander)

Attachment #458799 - Flags: review?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 59

•

14 years ago

Comment on attachment 458846 [details] [diff] [review] PIC GETELEM, take 2, for real. >+ /* >+ * Check all nodes along proto chain are JS-native; otherwise, >+ * obj->shape() is undefined. >+ */ >+ for (JSObject *protoNode = obj; protoNode; protoNode = protoNode->getProto()) { As discussed IRL the isNative check should be in the proto chain walk in generateStub(), don't need need two loops. r=me with that. The regression to fannkuch is still there, but the attached test case is somewhat better (2800ms from 2500ms). I'll file a follow-up bug to investigate this.

Attachment #458846 - Flags: review?(dvander) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

14 years ago

Depends on: 580492

Brendan Eich [:brendan]

Comment 60

•

14 years ago

Canonical variable names are important in house style, so s/protoNode/pobj/ (it's not a "node" :_P). Drive-by nit, hope you can get it in before it's forgotten. /be

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Comment 61

•

14 years ago

http://hg.mozilla.org/users/danderson_mozilla.com/moo/rev/5e8c4d7985ca

Whiteboard: fixed-in-jaegermonkey

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

14 years ago

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

David Mandelin [:dmandelin]

Reporter

Updated

•

14 years ago

Depends on: 583873

David Mandelin [:dmandelin]

Reporter

Updated

•

14 years ago

Depends on: 583942

Chris Leary [:cdleary] (not checking bugmail)

Assignee

Updated

•

14 years ago

Depends on: 584466

You need to log in before you can comment on or make changes to this bug.