Closed Bug 1027528 Opened 10 years ago Closed 10 years ago

Latin1 strings: Make even more code work with Latin1 strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(20 files, 1 obsolete file)

(deleted), patch
n.nethercote
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
n.nethercote
: review+
Details | Diff | Splinter Review
(deleted), patch
terrence
: review+
Details | Diff | Splinter Review
(deleted), patch
n.nethercote
: review+
Details | Diff | Splinter Review
(deleted), patch
terrence
: review+
Details | Diff | Splinter Review
(deleted), patch
terrence
: review+
Details | Diff | Splinter Review
(deleted), patch
n.nethercote
: review+
Details | Diff | Splinter Review
(deleted), patch
till
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
(deleted), patch
bhackett1024
: review+
Details | Diff | Splinter Review
(deleted), patch
terrence
: review+
Details | Diff | Splinter Review
(deleted), patch
jorendorff
: review+
bent.mozilla
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
(deleted), patch
luke
: review+
Details | Diff | Splinter Review
With my local patch queue we're down to < 20 jit-test failures with latin1 strings enabled. The remaining failures are in some pretty obscure corners of the engine exercised by 1 or 2 tests (the long tail).
Attached patch Part 1 - Function constructor (deleted) — Splinter Review
Attachment #8442758 - Flags: review?(n.nethercote)
Attachment #8442760 - Flags: review?(luke)
Attachment #8442778 - Flags: review?(jwalden+bmo)
Attached patch Part 4 - Quote (deleted) — Splinter Review
The Quote function used for JSON.stringify.
Attachment #8442786 - Flags: review?(jwalden+bmo)
Attached patch Part 5 - FunctionToString (deleted) — Splinter Review
Attachment #8442802 - Flags: review?(luke)
Attachment #8442845 - Flags: review?(luke)
Comment on attachment 8442778 [details] [diff] [review]
Part 3 - JS_EncodeString and friends

Review of attachment 8442778 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi-tests/testUTF8.cpp
@@ +39,5 @@
>  
>  BEGIN_TEST(testUTF8_badSurrogate)
>  {
>      static const jschar badSurrogate[] = { 'A', 'B', 'C', 0xDEEE, 'D', 'E', 0 };
> +    const mozilla::Range<const jschar> tbchars(badSurrogate, js_strlen(badSurrogate));

It occurs to me you can't actually modify the pointer/length (pointer/end pointer), so tagging every Range as const isn't necessary, we could just make mStart/mEnd both const instead.  (And Range::operator[] should be const but return T&, so constness of elements is determined by element type.)  Don't care if you decide to not remove all these extra consts, but they wouldn't be necessary with this tweak.

::: js/src/jsapi.cpp
@@ +5513,5 @@
> +    if (linear->hasTwoByteChars())
> +        return JS::LossyTwoByteCharsToNewLatin1CharsZ(cx, linear->twoByteRange(nogc)).c_str();
> +
> +    size_t len = str->length();
> +    char *buf = cx->pod_malloc<char>(len + 1);

Latin1Char rather than char, seems.

@@ +5517,5 @@
> +    char *buf = cx->pod_malloc<char>(len + 1);
> +    if (!buf)
> +        return nullptr;
> +
> +    mozilla::PodCopy(reinterpret_cast<Latin1Char*>(buf), linear->latin1Chars(nogc), len);

Which kills off this cast.

@@ +5518,5 @@
> +    if (!buf)
> +        return nullptr;
> +
> +    mozilla::PodCopy(reinterpret_cast<Latin1Char*>(buf), linear->latin1Chars(nogc), len);
> +    buf[len] = 0;

Mild preference for '\0' for null terminators in strings.
Attachment #8442778 - Flags: review?(jwalden+bmo) → review+
Attached patch Part 7 - minor Range changes (deleted) — Splinter Review
As discussed on IRC, this makes mStart and mEnd const and changes operator[] to be const and return non-const T&.

I can build the shell with this patch; will send to Try before landing to make sure it doesn't break something else.
Attachment #8442863 - Flags: review?(jwalden+bmo)
Comment on attachment 8442786 [details] [diff] [review]
Part 4 - Quote

Review of attachment 8442786 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/json.cpp
@@ +121,5 @@
>  
> +static bool
> +Quote(JSContext *cx, StringBuffer &sb, JSString *str)
> +{
> +    JS::Anchor<JSString *> anchor(str);

This anchoring thing rather than handles is nonsense, but pre-existing.  Maybe I'll fix after you land your patches.
Attachment #8442786 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8442863 [details] [diff] [review]
Part 7 - minor Range changes

Review of attachment 8442863 [details] [diff] [review]:
-----------------------------------------------------------------

Tryserver this before trusting it to land -- I would be entirely unsurprised to find someone with

Range x;
if (...)
  x = ...;
else
  x = ...;

in the whole browser.
Attachment #8442863 - Flags: review?(jwalden+bmo) → review+
Attached patch Part 8 - Reflect.parse (deleted) — Splinter Review
Attachment #8442869 - Flags: review?(n.nethercote)
Attachment #8442760 - Flags: review?(luke) → review+
Comment on attachment 8442802 [details] [diff] [review]
Part 5 - FunctionToString

Review of attachment 8442802 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsfun.cpp
@@ +808,5 @@
>          JS_ASSERT(!braced);
>          for (; unicode::IsSpaceOrBOM2(end[-1]); end--)
>              ;
>      }
> +    *bodyEnd = end.get() - srcChars.start().get();

I think you can subtract them directly w/o .get() (via operator- overload in RangedPtr).
Attachment #8442802 - Flags: review?(luke) → review+
Comment on attachment 8442845 [details] [diff] [review]
Part 6 - asm.js PropertyName serialization

Review of attachment 8442845 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this!

::: js/src/jit-test/tests/latin1/asmjs.js
@@ +1,1 @@
> +load(libdir + "asm.js");

Can you name this file asm.js?  The reason is that I often run 'jit_tests.py $(SHELL) asm.js' to run all the asm.js test.

::: js/src/jit/AsmJSModule.cpp
@@ +761,4 @@
>  {
> +    js::Vector<CharT> tmp(cx);
> +    CharT *src;
> +    if (sizeof(CharT) > 1 && (size_t(cursor) & (sizeof(CharT) - 1)) != 0) {

I think you don't need the "sizeof(CharT) > 1" conjunct: if sizeof(CharT) == 1, then the mask will be 0 and so the branch will not be taken.
Attachment #8442845 - Flags: review?(luke) → review+
Attached patch Part 9 - Fix more places (deleted) — Splinter Review
SavedFrame::HashPolicy::hash, SPSProfiler::allocProfileString, PrintHelpString and ShellSourceHook are all pretty straight-forward.
Attachment #8442917 - Flags: review?(terrence)
Attachment #8442917 - Flags: review?(terrence) → review+
Attached patch Part 10 - ObjectToSource (deleted) — Splinter Review
Attachment #8442991 - Flags: review?(n.nethercote)
Attached patch Part 11 - More shell functions (deleted) — Splinter Review
This patch also changes AutoStableStringChars so that the string is passed to init() instead of to the constructor. This is nicer because init() can now call the (fallible) ensureLinear(), so that the callers don't have to flatten.
Attachment #8443022 - Flags: review?(terrence)
Attachment #8443022 - Flags: review?(terrence) → review+
Attachment #8442758 - Flags: review?(n.nethercote) → review+
Attachment #8442869 - Flags: review?(n.nethercote) → review+
Comment on attachment 8442991 [details] [diff] [review]
Part 10 - ObjectToSource

Review of attachment 8442991 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/latin1/assorted.js
@@ +15,5 @@
> +
> +// obj.toSource TwoByte
> +Object.defineProperty(o, "prop", {get: function() { return "\u1200"; },
> +				 set: function() { return "\u1200"; },
> +				 enumerable: true});

Nit: indent this like the one above.
Attachment #8442991 - Flags: review?(n.nethercote) → review+
Attached patch Part 12 - More fixes (deleted) — Splinter Review
Note that I'm using AutoStableStringChars more often in shell-only stuff and on error paths, as any work to avoid inflating isn't really worth it there.
Attachment #8443419 - Flags: review?(terrence)
Attached patch Part 13 - Pure chars (deleted) — Splinter Review
Some strings have "pure chars", which means the chars can be accessed without copying. We use this in JSCompartment::wrap, ScopedThreadSafeStringInspector and MemoryMetrics.

The pureCharsZ functions were not used at all, so I removed them.

After that, it turned out to be simpler/cleaner to use isLinear directly instead of hasPureChars, with a method on JSRope to copy the chars.
Attachment #8443468 - Flags: review?(n.nethercote)
Attachment #8443475 - Flags: review?(terrence)
Comment on attachment 8443475 [details] [diff] [review]
Part 14 - Self-hosting CloneString

Review of attachment 8443475 [details] [diff] [review]:
-----------------------------------------------------------------

I wonder if we even need support for twoByteChars here. I don't see why any self-hosting code should ever use non-latin1 chars. Maybe assert that and only handle latin1?
Attached patch Part 15 - StructuredClone (obsolete) (deleted) — Splinter Review
Attachment #8443506 - Flags: review?(jorendorff)
(In reply to Till Schneidereit [:till] from comment #24)
> I wonder if we even need support for twoByteChars here. I don't see why any
> self-hosting code should ever use non-latin1 chars. Maybe assert that and
> only handle latin1?

Good point, but we need it for now because Latin1 strings are not yet enabled by default (I want to get these patches in before that happens). I'd be happy to file a follow-up bug to change it later on?
Attachment #8443475 - Flags: review?(terrence) → review?(till)
And part 7:

https://hg.mozilla.org/integration/mozilla-inbound/rev/17cf9036f67f

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #10)
> Tryserver this before trusting it to land -- 

https://tbpl.mozilla.org/?tree=Try&rev=e98ed5c761e6
Comment on attachment 8443475 [details] [diff] [review]
Part 14 - Self-hosting CloneString

Review of attachment 8443475 [details] [diff] [review]:
-----------------------------------------------------------------

Right, I didn't consider the piecemeal-landing. Makes total sense to do it this way for now and remove the twoByteChar stuff in a followup. If it's even useful to do it then, that is: it's not like this introduces terrible amounts of overhead or complexity.
Attachment #8443475 - Flags: review?(till) → review+
Attachment #8443419 - Flags: review?(terrence) → review+
Attachment #8443506 - Flags: review?(jorendorff) → review+
Attachment #8443719 - Flags: review?(luke)
The shell's OffThreadCompileScript is a bit annoying because we have to make sure the chars are kept live until the off-thread compilation is done, so the AutoStableStringChars destructor should not free them.
Attachment #8443725 - Flags: review?(bhackett1024)
Not perf-sensitive so AutoStableStringChars will do.
Attachment #8443728 - Flags: review?(terrence)
Attachment #8443719 - Flags: review?(luke) → review+
Attachment #8443725 - Flags: review?(bhackett1024) → review+
Attached patch Part 15 - StructuredClone v2 (deleted) — Splinter Review
The previous patch stored the string length shifted by 1 and used the low bit for the isLatin1 bit. This is not backwards-compatible and that's complicated for IndexedDB, because we'd have to write a converter from v2 to v3 (or support both formats in the JS engine).

This patch just stores the isLatin1 bit in the high bit, without shifting the length, so that it's backwards compatible (v2 structured clone data is never Latin1 and will be read just fine), so that we don't have to worry about this.

bent, can you review the IndexedDB changes?
Attachment #8443506 - Attachment is obsolete: true
Attachment #8443897 - Flags: review?(jorendorff)
Attachment #8443897 - Flags: review?(bent.mozilla)
Attached patch Part 19 - XDR (deleted) — Splinter Review
Attachment #8443979 - Flags: review?(luke)
Attachment #8443728 - Flags: review?(terrence) → review+
Attachment #8443468 - Flags: review?(n.nethercote) → review+
Comment on attachment 8443897 [details] [diff] [review]
Part 15 - StructuredClone v2

Review of attachment 8443897 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on the IDB changes
Attachment #8443897 - Flags: review?(bent.mozilla) → review+
With the patch in bug 1026438 and the patches in this bug jit-tests and jstests pass with Latin1 strings enabled.
Attachment #8444317 - Flags: review?(luke)
Attachment #8443979 - Flags: review?(luke) → review+
Comment on attachment 8444317 [details] [diff] [review]
Part 20 - Almost everything else

Review of attachment 8444317 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +958,5 @@
>      }
>      if (U_FAILURE(status))
>          return false;
> +
> +    JSString *ns = js_NewStringCopyN<CanGC>(cx, chars.begin(), size);

IIUC, won't this introduce an extra malloc+copy in the case of a string bigger than INLINE_CAPACITY?
Attachment #8444317 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #40)
> IIUC, won't this introduce an extra malloc+copy in the case of a string
> bigger than INLINE_CAPACITY?

Yeah, the alternative is to allow deflating StringBuffer (TwoByte => Latin1). We could add a StringBuffer::maybeDeflate method that we could call if the input is Latin1; I can post an updated patch to do this if you think that's fine.
Comment on attachment 8443897 [details] [diff] [review]
Part 15 - StructuredClone v2

Review of attachment 8443897 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/OpenDatabaseHelper.cpp
@@ +2116,5 @@
>            rv = UpgradeSchemaFrom13_0To14_0(connection);
>          }
> +        else if (schemaVersion == MakeSchemaVersion(14, 0)) {
> +          rv = UpgradeSchemaFrom14_0To15_0(connection);
> +        }

Huh. I didn't know how this stuff worked.

::: js/src/vm/StructuredClone.cpp
@@ +1205,5 @@
>      return str;
>  }
>  
> +JSString *
> +JSStructuredCloneReader::readString(uint32_t data)

I would've named this method something else, to distinguish it from the template, but I defer to your judgment.
Attachment #8443897 - Flags: review?(jorendorff) → review+
Part 15:

https://hg.mozilla.org/integration/mozilla-inbound/rev/b0cbd074dcda

(In reply to Jason Orendorff [:jorendorff] from comment #44)
> I would've named this method something else, to distinguish it from the
> template, but I defer to your judgment.

I renamed the template one to readStringImpl.
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/b0cbd074dcda
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: