Closed Bug 1600053 Opened 5 years ago Closed 5 years ago

Implement IO.read

Categories

(Remote Protocol :: CDP, task, P1)

task

Tracking

(firefox73 fixed)

RESOLVED FIXED
Tracking Status
firefox73 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 3 open bugs, )

Details

(Whiteboard: [puppeteer-beta-mvp])

Attachments

(2 files)

IO.read allows to read data from a stream as specified by a handle (uuid). It is required by Puppeteer's Page.printToPDF method.

I would suggest that we maintain a singleton for the current session, which keeps track of in-memory (blob) data, and data as stored on disk (we don't want to keep everything in memory - like PDFs). Only data as referenced by the handle can be accessed to prevent unauthorized access to random files.

Quick update here. I got it all implemented and it works fine for file streams. My first solution was to make use of nsIFileInputstream and NetUtils.readInputStream() but that is actually blocking the main process of Firefox and as such had to change it yesterday to make use OS.File which executes any IO via a worker on a different thread.

Also to prevent base64 encoding issues for binary content that may contain characters outside of the range of 0x0 to 0xFF, I want to directly encode a Uint8Array into a base64 string. As reading the base64 MDN article it looks like that I have to use an external library like https://github.com/beatgammit/base64-js to make use of that feature. But maybe I'm missing something there and there is an already existent XPCOM interface that could be utilized for that required purpose? Olli, are you aware of one or do you know whom to task about that? Thanks.

Flags: needinfo?(bugs)

Olli, thank you for the reply. Both proposed methods work on nsIInputStream only, and I had to move our code to OS.File as mentioned above. As such I have a Uint8Array what OS.File.read() returns. Based on your reply I assume there is nothing available in Gecko right now to do that direct encoding?

Flags: needinfo?(bugs)

NS_NewByteInputStream gives you an nsIInputStream if you have some data to pass in as a param.
And there are APIs to get the raw data out from Uint8Array

Flags: needinfo?(bugs)

This patch implements the IO.read() method to allow
reading streams for files and blobs.

Therefor all the methods in the IO domain need a registry
for streams. Those have to be stored globally because
they need to be kept existent across different client
connections.

(In reply to Olli Pettay [:smaug] from comment #4)

NS_NewByteInputStream gives you an nsIInputStream if you have some data to pass in as a param.
And there are APIs to get the raw data out from Uint8Array

It doesn't have a scriptable interface definition, so there is no way for me to make use of it from the Remote Agent which is implemented in JS. Or would the DOM team be in favor to expose it to JS?

Flags: needinfo?(bugs)

oh, right, this all needs to be JS.
I can't recall if we have anything for Uint8Array -> base64 exposed to JS.
Perhaps baku recalls.

Flags: needinfo?(bugs) → needinfo?(amarchesini)

Can you please give more context of this IO.read()? Do you have any spec?
Asking this because Blob/File have a stream() method. Is it enough?

Flags: needinfo?(amarchesini) → needinfo?(hskupin)

Sadly there is no spec. So this documentation is everything what we have. The rest is reverse engineering. Maybe have a look at the attached patch to get a better sense. Btw if you get a 404 when opening the documentation try to reload or go to a different domain and back. The viewer is somewhat broken.

Which stream() method are you referring to? I'm using OS.File.read() to read the bytes as Uint8Array. Does that one directly return a base64 encoded string?

Flags: needinfo?(hskupin) → needinfo?(amarchesini)

Thanks for the link. I'll write a few comments in your phab push.

Flags: needinfo?(amarchesini)

(In reply to Andrea Marchesini [:baku] from comment #10)

Thanks for the link. I'll write a few comments in your phab push.

Thank you for the reply. I'm not sure if you have seen my own comments yet, so I will just set needinfo for you again.

Flags: needinfo?(amarchesini)

I tested the IO.read method again with Chrome and the chrome-remote-interface library. And as noticed the streams are not shared between connections:

➜ bin/client.js inspect
>>> var s = Page.printToPDF({ transferMode: "ReturnAsStream" });
>>> IO.read({ handle: "1", offset: -400 });
Promise { <pending>, inspect: [AsyncFunction] }
(To exit, press ^C again or ^D or type .exit)
➜ bin/client.js inspect
>>> var d = IO.read({ handle: "1", offset: -400 });
>>> (node:56220) UnhandledPromiseRejectionWarning: Error: Invalid stream handle

As such I will also limit the lifetime of streams to the current connection.

Blocks: 1602731
Blocks: 1602747

As it turned out this is too complicated for now. As such we will do it as a follow-up if necessary. For now by using uuids for the handle it will make it way harder to retrieve a stream handle of a different connection.

For historical record:

As the attached patches are using UUIDs for stream handles, the question of making the lifetime of the stream registry cache equivalent to that of a connection or to make it global becomes secondary, since a client would necessarily need to know the handle of the other connection’s stream in order to access it.

Whiteboard: [puppeteer-alpha-reserve] → [puppeteer-beta-mvp]
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/10a1344257fd [remote] Add registry for managing references to streams. r=remote-protocol-reviewers,maja_zf https://hg.mozilla.org/integration/autoland/rev/8140a12324b1 [remote] Implement IO.read. r=remote-protocol-reviewers,baku,ato,maja_zf
Flags: needinfo?(amarchesini)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: CDP: IO → CDP
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: