Open Bug 681542 Opened 13 years ago Updated 2 years ago

JavaScript streamed parsing

Categories

(Core :: JavaScript Engine, defect)

People

(Reporter: azakai, Unassigned)


Large JS files are common these days, and their parse/initial-run times can be significant. For example, ammo.js is 1.1MB and takes 2-3 seconds to be prepared before it is actually usable. This is very noticeable in demos like http://syntensity.com/static/ammo.html. Perhaps we can start to work on the JavaScript as it is being downloaded, in a streaming manner? Parsing seems more feasible, since running would be in danger of things like f(x); [..much later..] function f() { .. } but perhaps something can be done even there?
Parsing is the only issue. We cannot execute at all for the reason you show, and more: var as well as function is hoisted in JS. But lexing and parsing is enough. /be
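The hoisting hazard can be made concrete (a toy illustration, not from the bug): both `function` and `var` declarations are hoisted to the top of their scope, so code early in a file may depend on declarations that arrive in a much later network segment.

```javascript
// Why execution cannot be streamed: if we ran the first two lines before
// the rest of the file arrived, `square` would not exist yet. Hoisting
// makes the complete file behave differently from its streamed prefix.

console.log(square(4)); // 16 -- callable before its textual definition
console.log(n);         // undefined -- `var n` is hoisted, its value is not

function square(x) {    // function declarations hoist with their body
  return x * x;
}
var n = 7;              // only the binding hoists; assignment runs in place
```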
Summary: JavaScript Streaming (process JavaScript as it is downloaded) → JavaScript streamed parsing
The JS parser is a pretty well-tuned (thanks in modern days to cdleary, njn, and others) recursive descent parser. The problem is that if we receive TCP segments full of JS, the top-level functions and statements will span segment boundaries. So we will need either an explicit-state parser, which is ugly and (this may have been tried and measured in SpiderMonkey) often slower than implicit(-stack)-state, or else a thread in which to parse, so the OS can keep the implicit state on the thread stack in between parsing turns.

Putting JS parsing in a thread should remind us of speculative <script src=...> prefetching, which uses a thread to parse ahead and try to fire early requests to load the ... URL. We're already using threads, so a JS parsing thread seems both easiest and most efficient (probably we won't measure, since we'd have to do the work of mangling the current parser to use explicit state).

I say full speed ahead with a thread! Who will take this? /be
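The implicit-state idea can be sketched with a JS generator, whose suspended stack frame plays the same role as the proposed parsing thread's stack (a hedged illustration, not SpiderMonkey code; the whitespace tokenizer is a toy stand-in for the real lexer):

```javascript
// The generator keeps its local state (`buf`) alive across suspensions,
// so a token split across two TCP segments needs no hand-written state
// machine -- the same benefit the thread-stack approach would give.

function* tokenStream() {
  let buf = '';
  let out = [];
  while (true) {
    const chunk = yield out;          // suspend until the next segment arrives
    if (chunk === null) break;        // null signals end of input
    buf += chunk;
    // Emit complete whitespace-delimited tokens; keep any partial tail.
    const parts = buf.split(/\s+/);
    buf = parts.pop();                // may be an incomplete token
    out = parts.filter(t => t.length > 0);
  }
  if (buf.length > 0) yield [buf];    // flush the final token
}

// Usage: feed segments that split "function" mid-token.
const ts = tokenStream();
ts.next();                                   // prime the generator
console.log(ts.next('func').value);          // [] -- incomplete, buffered
console.log(ts.next('tion f() {}\n').value); // [ 'function', 'f()', '{}' ]
```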
So, from the browser end of this: right now we very explicitly wait until we have all the data before calling into JS, obviously. We could sort of change that, but I thought that removing the streaming parse mode from the JS parser was a significant perf win. And the TCP segment issue is somewhat worse than Brendan says, because individual characters can span segment boundaries.

In any case, what sort of API would jseng expose here? A series of calls, each of which takes a jschar* + length as the data becomes available? Trying to think about how to implement this on the browser side.
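The characters-spanning-segments point is easy to demonstrate with the standard TextDecoder API (an illustration only; the byte split is invented, not from the bug):

```javascript
// The UTF-8 encoding of '€' is three bytes: 0xE2 0x82 0xAC. If a segment
// boundary falls between them, a naive per-segment decode emits replacement
// characters, while a streaming decoder buffers the partial sequence.

const seg1 = new Uint8Array([0xE2, 0x82]); // first two bytes of '€'
const seg2 = new Uint8Array([0xAC]);       // final byte

const naive = new TextDecoder();
console.log(naive.decode(seg1) + naive.decode(seg2)); // replacement garbage

const streaming = new TextDecoder();
const part1 = streaming.decode(seg1, { stream: true }); // '' -- buffered
const part2 = streaming.decode(seg2, { stream: true }); // '€'
console.log(part1 + part2); // '€'
```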
Compiling functions lazily (as proposed in bug 678037) should allow the initial parsing/compilation to complete much faster.
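As a rough illustration of the lazy-compilation idea (a hypothetical sketch, not the design from bug 678037; `lazyFn` is an invented helper, and the Function constructor stands in for real bytecode compilation):

```javascript
// At initial parse time only the source extent of a function body would be
// recorded; full compilation is deferred to the first call. Functions that
// are never called are never compiled, shrinking startup cost.

function lazyFn(paramList, bodySource) {
  let compiled = null; // filled in on first call
  return function (...args) {
    if (compiled === null) {
      compiled = new Function(...paramList, bodySource); // "compile" lazily
    }
    return compiled.apply(this, args);
  };
}

const add = lazyFn(['x', 'y'], 'return x + y;'); // nothing compiled yet
console.log(add(2, 3)); // 5 -- compiled on this first call
```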
Boris: we never had a streaming parse mode in JS. Any larger-than-byte lexical or grammatical unit can span a segment boundary, of course. That's not the issue so much as how to save state when suspending while waiting for more data. The thread idea would want our script loader code to pass segment payloads of <script src=> content to the parsing thread, and be prepared to execute the script when a message comes back from the thread indicating successful compilation. So, a message-passing API.

Luke: good point. Oliver Hunt has implemented lazy function compilation in JSC (I think in the straightforward way, which has exponential complexity in the worst case of nesting). But whatever our initial lex/parse costs loading JS (note well: you have to parse to lex JS -- consider /), the idea of moving that parser to a thread to which one can hand off buffers as they come in from the net, and get back a script to execute later, seems like it could win. We could prefetch and interleave lexing and parsing with other work. Of course we could not run any later scripts or lay out the content beyond the script (in case of document.write, assuming the script lacks the async attribute). /be
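The "you have to parse to lex JS -- consider /" aside can be made concrete (a toy example, not from the bug): the same `/` character is either a division operator or the start of a regexp literal depending on grammatical context, so a lexer cannot tokenize in isolation.

```javascript
// Whether `/` begins a division or a RegExp literal depends on whether the
// preceding token can end an expression -- grammatical context the lexer
// only has if it cooperates with the parser.

const g = 2, i = 4;
let a = g / i / 2;       // both slashes are division: (2 / 4) / 2 = 0.25
let b = /i/.test('hi');  // nearly identical characters start a RegExp: true

console.log(a); // 0.25
console.log(b); // true
```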
(In reply to Brendan Eich [:brendan] from comment #5)
> Any larger-than-byte lexical or grammatical unit can span a segment
> boundary, of course.

And that's a *huge* pain to handle. Pretty much the first step in my epic scanner reworking was to remove the need for the parsing mode where the input is broken into segments, by ensuring the full input was in a single block in memory before starting. (We already always had the input in a single block in the browser; the segmented case was only used in the JS shell.) This permitted just about every follow-on simplification and speed-up. I *really* don't want segments to be re-introduced :(
Assignee: general → nobody
Severity: normal → S3