Closed
Bug 771094
Opened 12 years ago
Closed 12 years ago
[OS.File] read/write strings with encodings
Categories
(Toolkit Graveyard :: OS.File, enhancement)
Toolkit Graveyard
OS.File
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Yoric, Assigned: Yoric)
References
Details
Attachments
(4 files, 1 obsolete file)
(deleted),
patch
|
froydnj
:
feedback+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
froydnj
:
feedback+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Updated•12 years ago
|
Assignee | ||
Comment 1•12 years ago
|
||
Ok, here is a possible implementation of readString/writeString. For the moment, Unix only, but the code could actually be shared with Windows.
Taras, could you tell me what you think of it?
Assignee: nobody → dteller
Attachment #639352 -
Flags: feedback?(taras.mozilla)
Comment 2•12 years ago
|
||
Comment on attachment 639352 [details] [diff] [review]
First prototype
use string concatenation instead of .join.
instead of passing buffer size, pass a buffer to use directly in options. This abstraction feels like it should live outside of the file api. I'm not a fan of adding readString, readInt.
Conversion functions should live independent of IO.
Attachment #639352 -
Flags: feedback?(taras.mozilla) → feedback-
Assignee | ||
Comment 3•12 years ago
|
||
(In reply to Taras Glek (:taras) from comment #2)
> Comment on attachment 639352 [details] [diff] [review]
> First prototype
>
> use string concatenation instead of .join.
Really? Algorithmically, this sounds scary: concatenation is generally quadratic, while .join can be implemented linearly (I hope it is, I haven't checked), or at least O(n.log(n)).
>
> instead of passing buffer size, pass a buffer to use directly in options.
> This abstraction feels like it should live outside of the file api. I'm not
> a fan of adding readString, readInt.
> Conversion functions should live independent of IO.
Ok. Perhaps we should open a bug for |OS.Stream| or something such.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Updated•12 years ago
|
Component: Networking: File → OS.File
Product: Core → Toolkit
Assignee | ||
Comment 4•12 years ago
|
||
So, as expected, I now need Unicode conversion for bug 794091 et al. In all these bugs, we pretty much have the same scenario: take a string, convert it to Unicode, write it asynchronously to the disk. Since, with the current API, the process is clumsy and segfault-prone, I prefer defining a streamlined version.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: [OS.File] readString, writeString → [OS.File] read/write strings with encodings
Assignee | ||
Comment 5•12 years ago
|
||
Attachment #639352 -
Attachment is obsolete: true
Attachment #665443 -
Flags: review?(nfroyd)
Assignee | ||
Comment 6•12 years ago
|
||
Attachment #665444 -
Flags: review?(nfroyd)
Assignee | ||
Comment 7•12 years ago
|
||
Attachment #665445 -
Flags: review?(nfroyd)
Comment 8•12 years ago
|
||
Comment on attachment 665443 [details] [diff] [review]
1. Read/write strings (sync version)
Review of attachment 665443 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/osfile/osfile_shared_front.jsm
@@ +157,5 @@
> + let root;
> + if (typeof buffer == "string") {
> + let encoding = options.encoding || "utf-8";
> + let bytesC = new Type.uint32_t.implementation(0);
> + root = OS.Shared.Utils.Strings.encodeAll(encoding, buffer, bytesC.address());
Yuck. :)
Let's make |bytesC| more explicit, like |nBytes| or |byteCount|.
@@ +170,4 @@
>
> let pos = 0;
> while (pos < bytes) {
> + LOG("write", "writing at position " + pos);
This is controlled by a flag somewhere, correct?
@@ +374,5 @@
> + * - {number} bytes If unspecified, read all the remaining bytes from
> + * this file. If specified, read |bytes| bytes, or less if the file does not
> + * contain that many bytes.
> + * - {string} encoding The name of an encoding to use to decode the string.
> + * If unspecified, use "utf-8".
I'm not sure this is the right API. Saying that we return a string and count in bytes doesn't fit together very well.
I think you should drop the |bytes| option and just declare that we read the whole file for now. That's the biggest use case you need to solve here, yes? We can revisit specifying an amount later.
@@ +399,5 @@
> * Important note: In the current implementation, option |tmpPath|
> * is required. This requirement should disappear as part of bug 793660.
> *
> * @param {string} path The path of the file to modify.
> + * @param {ArrayBuffer|String} buffer A buffer containing the bytes to write.
Nit: you use |string| for the type everywhere else. Please do so here.
Nit2: please be consistent in adding spaces around | (you have spaces in the docs above).
Attachment #665443 -
Flags: review?(nfroyd) → feedback+
Comment 9•12 years ago
|
||
Comment on attachment 665443 [details] [diff] [review]
1. Read/write strings (sync version)
Review of attachment 665443 [details] [diff] [review]:
-----------------------------------------------------------------
Also, having read through previous comments, what do you think about the idea of moving this out of .File, since it has nothing to do with OS-level I/O?
::: toolkit/components/osfile/osfile_shared_front.jsm
@@ +15,5 @@
> (function(exports) {
>
> let LOG = exports.OS.Shared.LOG.bind(OS.Shared, "Shared front-end");
> +let Type = OS.Shared.Type;
> +let Strings = OS.Shared.Utils.Strings;
If you're going to define this, you should use it in the code that you add. :)
Assignee | ||
Comment 10•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #9)
> Comment on attachment 665443 [details] [diff] [review]
> 1. Read/write strings (sync version)
>
> Review of attachment 665443 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Also, having read through previous comments, what do you think about the
> idea of moving this out of .File, since it has nothing to do with OS-level
> I/O?
Well, my current problem is that |encodeAll| leaks abstractions, insofar as it requires a C pointer (that can be wrapped in a nicer API) but also returns a C pointer (that's more annoying), which needs to be deallocated, which in turns requires the user to understand the finer points of CDataFinalizer, etc.
Now, we could try and do something smarter with |encodeAll| to ensure that it returns an ArrayBuffer. I believe that I now see how we could do that without additional copies.
>
> ::: toolkit/components/osfile/osfile_shared_front.jsm
> @@ +15,5 @@
> > (function(exports) {
> >
> > let LOG = exports.OS.Shared.LOG.bind(OS.Shared, "Shared front-end");
> > +let Type = OS.Shared.Type;
> > +let Strings = OS.Shared.Utils.Strings;
>
> If you're going to define this, you should use it in the code that you add.
> :)
:)
Assignee | ||
Comment 11•12 years ago
|
||
If we wish to to keep it out of OS.File, let's try another style, then.
Attachment #665913 -
Flags: feedback?(nfroyd)
Comment 12•12 years ago
|
||
Comment on attachment 665444 [details] [diff] [review]
2. Read/write strings (async version)
Review of attachment 665444 [details] [diff] [review]:
-----------------------------------------------------------------
Un-r?-ing until we get interfaces straightened out.
Attachment #665444 -
Flags: review?(nfroyd)
Comment 13•12 years ago
|
||
Comment on attachment 665445 [details] [diff] [review]
3. Read/write strings (tests for async version)
Review of attachment 665445 [details] [diff] [review]:
-----------------------------------------------------------------
Un-r?-ing until we get interfaces straightened out.
Attachment #665445 -
Flags: review?(nfroyd)
Comment 14•12 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] from comment #10)
> (In reply to Nathan Froyd (:froydnj) from comment #9)
> > Comment on attachment 665443 [details] [diff] [review]
> > 1. Read/write strings (sync version)
> >
> > Review of attachment 665443 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > Also, having read through previous comments, what do you think about the
> > idea of moving this out of .File, since it has nothing to do with OS-level
> > I/O?
>
> Well, my current problem is that |encodeAll| leaks abstractions, insofar as
> it requires a C pointer (that can be wrapped in a nicer API) but also
> returns a C pointer (that's more annoying), which needs to be deallocated,
> which in turns requires the user to understand the finer points of
> CDataFinalizer, etc.
>
> Now, we could try and do something smarter with |encodeAll| to ensure that
> it returns an ArrayBuffer. I believe that I now see how we could do that
> without additional copies.
The usual solution to avoiding copying in something like this is to have an EncodedLength function or similar that tells you how long a given run of characters would be in bytes (resp. DecodedLength, bytes/characters). I don't know if we have those facilities available. If you have something more clever than that, that'd be good too.
Comment 15•12 years ago
|
||
Comment on attachment 665913 [details] [diff] [review]
Possible API for Unicode conversion
>/**
> * @param {string} encoding The name of the encoding (e.g, âutf-8â)
> *
> * @constructor
> */
>OS.Shared.Encoding = function Encoding(encoding) {
>};
>
>OS.Shared.Encoding.prototype = {
> /**
> * Encode a string to an ArrayBuffer in the current encoding
> *
> * @param {String | ArrayBuffer | C pointer} text
> * @param {object=} options
> * - {number} chars
> * - {number} offset //in chars
> *
> * @return {{buffer: ArrayBuffer, bytes: number}}
> */
> encode: function(text) {
> },
> /**
> * Decode a buffer to a string using the current encoding
> *
> * @param {ArrayBuffer | C pointer} buffer
> * @param {object=} options
> * - {number} bytes
> * - {number} offset //in bytes
> *
> * @return {string}
> */
> decode: function(buffer) {
> },
> reset: function() {
> }
>};
Attachment #665913 -
Flags: feedback?(nfroyd) → feedback+
Comment 16•12 years ago
|
||
Gee, thanks, Bugzilla, for throwing away my comments there.
What is the reset() function for?
I think you are asking for trouble to pass ArrayBuffers and C pointers to encode(). Even though the documentation says the options are in chars, people are going to pass bytes, which is right, except for all those time that it isn't.
Assignee | ||
Comment 17•12 years ago
|
||
Ok, let's step back for a second.
Bug 764234 introduces the string encoding API. The only missing feature is that it returns a TypedArray rather than an ArrayBuffer, so we can't feed it to js-ctypes yet.
Now:
- returning a TypedArray rather than an ArrayBuffer makes lots of sense, and this is probably what we should do instead of returning {{buffer: ArrayBuffer, bytes: number}};
- we have discussed feeding TypedArray to js-ctypes, and I have just opened bug 795505 along these lines.
So let me suggest the following:
- I will work on bug 795505;
- once I have landed this, replace {{buffer: ArrayBuffer, bytes: number}} with |Uint8Array| and accept TypedArray wherever we currently accept ArrayBuffer;
- once both are done, if the patch for bug 764234 has stuck, our work in this bug is done.
Assignee | ||
Updated•12 years ago
|
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•2 years ago
|
Product: Toolkit → Toolkit Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•