Closed Bug 987069 Opened 11 years ago Closed 7 years ago

[jsdbg2] Unicode characters in script URLs are not correctly encoded in Debugger.Script.url

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: sebo, Assigned: tromey)

References

(Blocks 2 open bugs, )

Details

Attachments

(34 files, 20 obsolete files)

(deleted), application/pdf
Details
(deleted), text/x-c++src
Details
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review-
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), patch
fitzgen
: review+
Details | Diff | Splinter Review
(deleted), patch
fitzgen
: review+
Details | Diff | Splinter Review
(deleted), patch
Waldo
: review+
Details | Diff | Splinter Review
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review-
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
(deleted), text/x-review-board-request
Waldo
: review+
Details
If URLs contain non-ASCII characters they are not correctly Unicode encoded within Debugger.Script.url. In case of accessing http://öl.de this is returned within the Debugger.Script.url property: "http://öl.de/" Sebastian
DebuggerScript_getUrl is just buggy. It does: 2962 str = js_NewStringCopyZ<CanGC>(cx, script->filename()); where script->filename() is a char*. But this assumes ASCII (or at best ISO-8859-1) encoding for the char*, whereas it's actually more likely to be UTF-8 in the case of Firefox (and in general is in no particular encoding!).
Status: UNCONFIRMED → NEW
Ever confirmed: true
terrence points out that js/public/CharacterEncoding.h has utilities that could be used here.
I will take the lead on this and figure out who can help fix this. I definitely don't know the C++ functions to use here but I'll get help.
Assignee: nobody → jlong
Jim, you filed the bug that was marked as a duplicate of this one. Any idea of which string encoding function we should be using instead? I noticed that `DebuggerSource_getDisplayURL` uses `JS_NewUCStringCopyZ` instead, but that function internally just does `NewStringCopyZ<CanGC>(cx, s);` as well. Looks like there is some good stuff in CharacterEncoding.h.
Flags: needinfo?(jimb)
There are two overloads of NewStringCopyZ: one takes char* and the other takes char16_t*. JS_NewUCStringCopyZ takes char16_t* and calls the second overload. But that already starts of with UTF-16 data, since it has char16_t*.
So the point is you want to be using one of the JS_NewUCString* functions, but that's the easy part; the "hard" part is getting your data in UTF-16.
This is the core problem: http://dxr.mozilla.org/mozilla-central/source/js/src/jsscript.h#473 The url/filename of the script isn't stored as a wide char. That's wrong.
The point is, it's stored as UTF-8.
I believe that at the time that code was written, JSScript filenames were indeed Latin-1. When the encoding was changed to UTF-8, Debugger wasn't updated. In light of this code: https://hg.mozilla.org/mozilla-central/file/57e4e9c33bef/js/src/jsapi.cpp#l3993 I would guess that UTF8CharsToNewTwoByteCharsZ is the conversion function to use, much as it's used there.
Flags: needinfo?(jimb)
JS script filenames in Gecko have been UTF-8 basically since time immemorial, as far as I can tell. It just passed in necko URI stringifications, which are UTF-8.... James, are you still planning to work on this?
Flags: needinfo?(jlong)
(In reply to Boris Zbarsky [:bz] from comment #11) > JS script filenames in Gecko have been UTF-8 basically since time > immemorial, as far as I can tell. It just passed in necko URI > stringifications, which are UTF-8.... Huh. I wonder what I'm thinking of. Anyway, this is all very fine.
(In reply to Boris Zbarsky [:bz] from comment #11) > JS script filenames in Gecko have been UTF-8 basically since time > immemorial, as far as I can tell. It just passed in necko URI > stringifications, which are UTF-8.... > > James, are you still planning to work on this? I was hoping to push this through with a mentor's help, but at this point I'm not sure I'll be able to do that. I'm only vaguely familiar with our C++ string APIs and I still don't understand where the problem is, even after to talking to a few people about it. Jim, do you think you could take a whack at it? I know you're busy, so I can try to find someone else if not.
Flags: needinfo?(jlong)
Assignee: jlong → ttromey
It turns out that the ScriptSource filename's encoding is handled inconsistently. UTF-8 is intended, I believe, and used in most places. However an audit shows some uses of Latin1. E.g.: http://dxr.mozilla.org/mozilla-central/source/ipc/testshell/XPCShellEnvironment.cpp#167 I think this uses Latin-1 due to http://dxr.mozilla.org/mozilla-central/source/ipc/testshell/XPCShellEnvironment.cpp#156 This case and some others like it are even odder because I think the string in question comes from the command line, so it is presumably in the locale's encoding. Another spot using Latin-1: http://dxr.mozilla.org/mozilla-central/source/ipc/testshell/XPCShellEnvironment.cpp#156 See: http://dxr.mozilla.org/mozilla-central/source/js/src/jsreflect.cpp#3467 I think there are also a few more errors on the consumer side. http://dxr.mozilla.org/mozilla-central/source/js/src/vm/SavedStacks.cpp#683 (I believe Atomize uses Latin-1) http://dxr.mozilla.org/mozilla-central/source/dom/promise/PromiseCallback.cpp#258 It's hard to be sure I found all the problems (FYI - I didn't list them all here). If we want to fix this permanently then I think we'd want to use a type other than "char *" for the filename, to make the encoding explicit. Perhaps JS::UTF8CharsZ could be used.
> http://dxr.mozilla.org/mozilla-central/source/ipc/testshell/XPCShellEnvironment.cpp#167 This is just buggy. More precisely, the JSAutoByteString is buggy, since it's actually lossy in the non-ASCII filename case, unless the filename happens to be latin-1. > because I think the string in question comes from the command line, so it is presumably > in the locale's encoding By the time we're in Load we have a JSString. So it's explicitly UTF-16 at that point, unless whoever created the JSString screwed up by taking a byte sequence that wasn't ISO-8859-1 and just byte-inflated it to produce a bogus JSString... > Another spot using Latin-1: That's still the first spot, right? > http://dxr.mozilla.org/mozilla-central/source/js/src/jsreflect.cpp#3467 Yep, also looks buggy. > http://dxr.mozilla.org/mozilla-central/source/js/src/vm/SavedStacks.cpp#683 Yep. Though this may just be using the value as a hashtable key, so it might not matter that much if we byte-inflate instead of doing the UTF-8 conversion; it's not going to cause key collisions, and if we're consistent about it should not cause any lookup failures either. If that's the situation, this code needs some comments. > http://dxr.mozilla.org/mozilla-central/source/dom/promise/PromiseCallback.cpp#258 Yes, agreed. You can easily see the bug here by doing this: 1) Load http://www.אביזרים.net/ (thanks to smontagu for that link) 2) Open the web console. 3) Evaluate this code: var s = document.createElement("script"); s.textContent = "var p = Promise.resolve(5); var r = p.then(function() { return r; }); r.catch(function (e) { console.log(e.fileName) });"; document.body.appendChild(s); and observe the broken string that gets logged... which is broken in the same exact way as the web console's "url" display on this page. :( > I think we'd want to use a type other than "char *" for the filename Would be nice, yeah...
(In reply to Boris Zbarsky [:bz] from comment #15) > By the time we're in Load we have a JSString. So it's explicitly UTF-16 at > that point, unless whoever created the JSString screwed up by taking a byte > sequence that wasn't ISO-8859-1 and just byte-inflated it to produce a bogus > JSString... Yeah, whoops. Here's a case where I think that does happen though: http://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCShellImpl.cpp#840 > > Another spot using Latin-1: > > That's still the first spot, right? Bad paste, I meant to paste a pointer to another line in jsreflect.cpp.
We discussed it on irc. One idea was to use JSString here but after some investigation this seemed to be difficult, as ScriptSource isn't really known to the GC yet and that this route would be difficult. In the end we agreed to use UTF8Chars as the type.
It seems that neither UTF8Chars nor UTF8CharsZ are really suitable for this use. UTF8Chars is not assignable, so it can't be easily used, as a ScriptSource's filename is set after construction. The UTF8CharsZ constructor takes non-const arguments, so using it would involve casting away const in a number of places. So maybe it is better to keep it as-is and document the members in question as being utf-8 encoded.
(In reply to Boris Zbarsky [:bz] from comment #15) > > http://dxr.mozilla.org/mozilla-central/source/js/src/vm/SavedStacks.cpp#683 > > Yep. Though this may just be using the value as a hashtable key, so it > might not matter that much if we byte-inflate instead of doing the UTF-8 > conversion; it's not going to cause key collisions, and if we're consistent > about it should not cause any lookup failures either. If that's the > situation, this code needs some comments. I looked and I believe this atom is exported to js via SavedFrame::sourceProperty. https://dxr.mozilla.org/mozilla-central/source/js/src/vm/SavedStacks.cpp#309
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
Here is an initial patch. It works for at least one test on my Linux box but is otherwise incomplete: * No new test cases; * I know of at least one spot that won't build on Windows (see the FIXME in XPCShellImpl.cpp), probably others too. I will fix this up after sending it through "try" In this patch I took a somewhat maximal approach of introducing a new utf8-string-ish type and then updating various members, argument types, and return types. This revealed a number of bugs. Commentary on the approach would be welcome.
Boy am I glad you took this bug.
Good work :)
This patch was even more incomplete than I thought, since I forgot to mark the ConstUTF8CharsZ constructor as "explicit". Doing that revealed several more latent bugs.
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
This is an updated patch, but still provisional. I made the ConstUTF8CharsZ constructor explicit and fixed up the fallout from this. This found some more bugs. There are a few odd things. AsmJSModule::setProfilingEnabled uses JS_smprintf. With this change the encoding of the result isn't consistent. Note though that JS_smprintf already truncates char16_t values -- so perhaps the encoding isn't crucial. Otherwise I think the fix is to convert all strings to UTF-8 at this point. This occurs in SPSProfiler::allocProfileString as well, via its call to JS_snprintf. JSRuntime::initSelfHosting uses the result of getenv as a file name: char *filename = getenv("MOZ_SELFHOSTEDJS"); if (filename) { RootedScript script(cx); if (Compile(cx, shg, options, JS::ConstUTF8CharsZ(filename), &script)) ok = Execute(cx, script, *shg.get(), rv.address()); This still just uses whatever bytes were returned; but now we are assuming they are a UTF-8 encoded string. Some other spots now pass a UTF-8-encoded string to fopen, where previously they passed in something else: either Latin-1; or when using __FILE__, the compiler-determined runtime encoding; or in one case the string ultimately came from command-line arguments. It seems to me that using UTF-8 here isn't really more wrong than the status quo ante.
Attachment #8545459 - Attachment is obsolete: true
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
Yet another update. This one tries to fix the XP_WIN thing and adds some tests. I re-read the patch and came up with this list of things to write tests for: EventSource PromiseCallback XPCShellEnvironment jsexn.cpp:Error js_ErrorToException ErrorReport::init js::GetPCCountScriptSummary jsreflect.cpp js.cpp:LoadScript js.cpp:ParseCompileOptions js.cpp:Run js.cpp:DisassFile js.cpp:ThisFilename Debugger.cpp prepareQuery DebuggerScript_getUrl DebuggerSource_getUrl DebuggerGenericEval js::ErrorObject::getOrCreateErrorReport SavedStacks::getLocation XPCConvert::JSErrorToXPCException XPCShellImpl on XP_WIN xpc::ErrorReport::Init XrayWrapper.cpp:ReportWrapperDenial I'm not totally sure how to test all of them.
Attachment #8546208 - Attachment is obsolete: true
Today I found out that TestingFunctions.cpp:GetBacktrace assumes that FormatStackDump uses Latin-1, but that is incorrect. Unfortunately this patch didn't directly expose this, I found out by reading and trying to figure out how to write a test case :(
Note that the js.cpp hits, being testing-only code, probably don't need any tests.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #27) > Note that the js.cpp hits, being testing-only code, probably don't need any > tests. Yeah, thanks. I could not think of a reasonable way to test many of these anyway. I do have a test for thisFilename, though, as it was easy and useful elsewhere. I couldn't figure out good tests for a subset of these functions. js::ErrorObject::getOrCreateErrorReport xpc::ErrorReport::Init ErrorReport::init These aren't used much. I thought perhaps a rejected Promise without a handler on a worker thread (the only way, I think, to see the console notice in a timely way), but then I couldn't figure out how to get the non-ASCII character into the worker thread's script source. At least based on the assumption that putting some utf-8-encoded file name into the source tree was not workable. XrayWrapper.cpp:ReportWrapperDenial This one sends a console message just once, so it would be dependent on the test environment. That is, if any earlier test in the same process happened to encounter an xray wrapper denial, then our test would fail. XPCConvert::JSErrorToXPCException Still thinking about this one but I did find some mildly odd code in there. The message argument is constructed as latin-1: https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCConvert.cpp?from=xpcconvert.cpp#1079 At first I thought this was odd since it is later converted to char16_t using CopyASCIIToUTF16, but tracking through all the functions I see this is just a misnomer and the converter accepts Latin-1. However, it still seems to me that there is no need to be lossy here. Also from the original list, the EventSource test has been obviated by an intervening change to nsJSUtils::GetCallingLocation.
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
Newest version. I'm going to send this one through try.
Attachment #8546863 - Attachment is obsolete: true
> xpc::ErrorReport::Init This should be fairly simple to trigger: data:text/html,<script>throw 5</script> Or any other uncaught exception in script in a web page for that matter. In particular, you should be able to do a mochitest where you override window.onerror and then see whether it's getting the right data, right? > At least based on the assumption that putting some utf-8-encoded file name into > the source tree was not workable It's not, but for mochitest you can use a url like http://whatever?non-ascii-here, loaded in a subframe, right? > That is, if any earlier test in the same process happened to encounter an xray wrapper > denial, then our test would fail. Test in the same _global_, not same process, right? So this should be testable if we care.
(In reply to Boris Zbarsky [:bz] from comment #30) > > xpc::ErrorReport::Init > In particular, you should be able to do a mochitest where you override > window.onerror and then see whether it's getting the right data, right? [...] > It's not, but for mochitest you can use a url like > http://whatever?non-ascii-here, loaded in a subframe, right? Thanks for the hint. This took me an absurdly long time :) but in the end I realized that the key is that this code: https://dxr.mozilla.org/mozilla-central/source/js/src/jsexn.cpp?from=jsexn.cpp#802 is gated on a check that the exception is exception-like; so "throw 23" does not work but "throw new Error()" does. Whew. Also while there I noticed this: https://dxr.mozilla.org/mozilla-central/source/js/src/jsexn.cpp?from=jsexn.cpp#845 which seems like another unnecessary lossy conversion.
(In reply to Tom Tromey :tromey from comment #31) > > It's not, but for mochitest you can use a url like > > http://whatever?non-ascii-here, loaded in a subframe, right? > > Thanks for the hint. Actually this turns out not to work, because the query string is url-encoded somewhere along the way. However, I found server-locations.txt, so I will try using a non-ascii domain name.
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
More tests.
Attachment #8548958 - Attachment is obsolete: true
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
Fix various minor problems pointed out by the try build: * Missed some updates for Windows, Mac; * Rooting hazard (this one was a big surprise) in SavedStacks; fixed with a mild hack of using a "lossy" utf-8 conversion; * Thinko in test_xrayLogEncoding.xul; * Require xulRuntime in error-encoding.js test. The rooting hazard bit gives me some pause. Most spots that create a new file name are safe; but there are a number, in tests and the various test shells; that simply pass in data unexamined. A test case could certainly provoke the lossy conversion behavior here. One fix would be to introduce a check at the spot where a JS::ConstUTF8CharsZ is created. Then we're getting out of "mostly transparent utility class" and into something heavier. Another fix would be to simply convert everything to char16_t* as is done for some other members of ScriptSource. (However when we discussed this on irc, IIRC, there was an objection to char16_t* based on size; hence the present patch.) While I'm on the subject, the patch could also maybe be cleaned up by adding knowledge of JS::ConstUTF8CharsZ to more spots. Many places (but not all) using .c_str() could instead be changed to directly use the ConstUTF8CharsZ via an appropriate overload.
Attachment #8549879 - Attachment is obsolete: true
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
The previous attempt at rooting hazard fix did not work. Apparently, and I suppose rightly so, it does not like indirect function calls; so change the one case to a template function instead.
Attachment #8550401 - Attachment is obsolete: true
Comment on attachment 8550523 [details] [diff] [review] make ScriptSource filename encoding consistent I think this is ready for review. It is rather ugly; but if you want it less ugly, then I could do with some guidance as to the direction you'd like. Some comments point out various ugly bits; see comment #24 and comment #35. Also I was unsure whether I needed a review from someone else for the bits outside of js.
Attachment #8550523 - Flags: review?(jwalden+bmo)
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
I've rebased the patch. This new version is pretty much the same as the original. A couple of hunks were removed due to intervening changes.
Attached patch fixes after rebasing (obsolete) (deleted) — Splinter Review
This patch fixes a couple build issues caused by rebasing. These are new spots using the changed API.
Waldo pointed out that the patch neglected spots passing ConstUTF8CharsZ objects through varargs. Generally this occurs in printf-like contexts. This situation is somewhat disturbing to me. The patch built everywhere -- but was still wrong. So it is hard to be confident that future changes will also be correct. clang warns about this situation, at least for Waldo -- I built clang ToT but it did not give any warnings. GCC 5 supposedly can warn about this with -Wconditionally-supported, but that didn't work for me (bug reported) and anyway that enables other warnings, otherwise breaking the build. Plus we can't require GCC 5. In order to find the failing spots I wrote a GCC plugin. It finds other bugs not to be fixed in this patch. I could make this plugin available or check it in somewhere, I suppose. It just rejects passing aggregate objects through varargs, which seems like what one generally wants anyway. There's also bug 553032, but unfortunately due to "%hs" we cannot mark all the printf-like functions in js with the printf attribute.
Comment on attachment 8550523 [details] [diff] [review] make ScriptSource filename encoding consistent Review of attachment 8550523 [details] [diff] [review]: ----------------------------------------------------------------- Largely fine, but large enough that the remainder is also kinda large. PLEASE, for the love of all that is holy, post subsequent patches as separate patches that apply atop the base patch. It was close to a week to review this; I'd rather repeat as little of that experience as possible. :-) I see a couple relatively pervasive assumptions here: * all __FILE__ paths are pure ASCII, or at least UTF-8-encoded (not specified by C++) On Linux I think this is true these days. Research suggests it's not entirely the case on OS X, tho. And it's just not at all on Windows. <http://stackoverflow.com/questions/23285759/fopen-file-name-with-utf8-string-in-windows> However, it seems like most of these uses are going to refer to paths to source files, in a source tree -- and we already require various things about where one puts a source tree when building. And probably the only way to deal with this is using a full-fledged file abstraction that understands native path encodings, along the lines of nsILocalFile. So this is probably good enough for now, I guess. But I won't be surprised to find out that requiring UTF-8 here breaks something. * fprintf(std{err,out}) will interpret output as UTF-8 Most Linux distros will handle UTF-8, and probably OS X does now, and these are probably mostly debugging uses, so what the heck. We probably assume this a whole bunch of other places, what's one more. ::: dom/base/nsJSUtils.cpp @@ +37,2 @@ > nsJSUtils::GetCallingLocation(JSContext* aContext, nsACString& aFilename, > uint32_t* aLineno) Please add a comment by the declaration of this function indicating that |aFilename| will be filled with a UTF-8 encoding of the string. ::: dom/promise/PromiseCallback.cpp @@ +248,5 @@ > Promise* returnedPromise; > nsresult r = UNWRAP_OBJECT(Promise, valueObj, returnedPromise); > > if (NS_SUCCEEDED(r) && returnedPromise == mNextPromise) { > + JS::ConstUTF8CharsZ fileName; = JS::ConstUTF8CharsZ("") so as to not possibly pass nullptr to JS_NewStringCopyUTF8Z. ::: dom/promise/tests/test_encoding.html @@ +19,5 @@ > +var Cu = Components.utils; > + > +function runTest() { > + // Anything outside of ASCII will do here. > + var URL = "file:///whee\u2708.js"; Please don't use the same URL for the sandbox principal, and for the eval code. They should be different so we know we're testing the right datum -- namely, the argument to evalInSandbox. I suggest var principal = "file:///whee\u2708/"; var URL = principal + "\uD83D\uDEE7.js"; or something that puts distinct Unicode stuffs in both the principal *and the URL. Well, actually: if it were me, I wouldn't use file: URLs for this. The problem with file: URLs is that the way they get mapped to security origins is unspecified across browsers, and our policy, while a relatively sensible one, isn't the only one on the block. (Specifically, our policy makes file: a multitude of origins, one per directory. Hence the necessity for URL here to be within the folder delineated by principal.) It would be better to use an HTTP URL with IDN in it -- say, http://天/ ("http://\u5929/") or http://
Attachment #8550523 - Flags: review?(jwalden+bmo) → review-
DANGER WILL ROBINSON Somehow Bugzilla has "eaten" screenfuls and screenfuls and screenfuls of other comments I had on that patch. YOU DO NOT ACTUALLY HAVE TWO SCREENS OF REVIEW COMMENTS ON THIS PATCH. :-) I have no idea why this happened. I was able to successfully hit Back a couple times to get back a full accounting of all the comments I had, so at least I can recover that several days' worth of work and get it posted here for your use soon. Hopefully. STAND BY FOR FURTHER INSTRUCTIONS.
So this at least is not a very *nice* or readable version of my comments (printed out on 8.5"x11" paper it'd probably be almost too small to read). But it at least *is* a copy of them, and it doesn't appear like it's dropped anything due to printing bugs. I'll try to get a PDF with larger fonts in it, but this at least appears to serve to capture all my comments, in the event I can't extract them in a better format.
Comment on attachment 8550523 [details] [diff] [review] make ScriptSource filename encoding consistent Review of attachment 8550523 [details] [diff] [review]: ----------------------------------------------------------------- Trying to resubmit r- here to pick up additional comments. Hopefully it'll work, but I'm not hugely optimistic. If it doesn't give any more than before, ignore. ::: dom/base/nsJSUtils.cpp @@ +37,2 @@ > nsJSUtils::GetCallingLocation(JSContext* aContext, nsACString& aFilename, > uint32_t* aLineno) Please add a comment by the declaration of this function indicating that |aFilename| will be filled with a UTF-8 encoding of the string. ::: dom/promise/PromiseCallback.cpp @@ +248,5 @@ > Promise* returnedPromise; > nsresult r = UNWRAP_OBJECT(Promise, valueObj, returnedPromise); > > if (NS_SUCCEEDED(r) && returnedPromise == mNextPromise) { > + JS::ConstUTF8CharsZ fileName; = JS::ConstUTF8CharsZ("") so as to not possibly pass nullptr to JS_NewStringCopyUTF8Z. ::: dom/promise/tests/test_encoding.html @@ +19,5 @@ > +var Cu = Components.utils; > + > +function runTest() { > + // Anything outside of ASCII will do here. > + var URL = "file:///whee\u2708.js"; Please don't use the same URL for the sandbox principal, and for the eval code. They should be different so we know we're testing the right datum -- namely, the argument to evalInSandbox. I suggest var principal = "file:///whee\u2708/"; var URL = principal + "\uD83D\uDEE7.js"; or something that puts distinct Unicode stuffs in both the principal *and the URL. Well, actually: if it were me, I wouldn't use file: URLs for this. The problem with file: URLs is that the way they get mapped to security origins is unspecified across browsers, and our policy, while a relatively sensible one, isn't the only one on the block. (Specifically, our policy makes file: a multitude of origins, one per directory. Hence the necessity for URL here to be within the folder delineated by principal.) It would be better to use an HTTP URL with IDN in it -- say, http://天/ ("http://\u5929/") or http://
Attachment #8550523 - Flags: review-
Comment on attachment 8550523 [details] [diff] [review] make ScriptSource filename encoding consistent Review of attachment 8550523 [details] [diff] [review]: ----------------------------------------------------------------- One last try -- with every comment edited minimally (but never substantively), to see if changing all of them will force them all to be submitted. :-( ::: dom/base/nsJSUtils.cpp @@ +37,2 @@ > nsJSUtils::GetCallingLocation(JSContext* aContext, nsACString& aFilename, > uint32_t* aLineno) Please add a comment by the declaration of this function indicating |aFilename| will be filled with a UTF-8 encoding of the string. ::: dom/promise/PromiseCallback.cpp @@ +248,5 @@ > Promise* returnedPromise; > nsresult r = UNWRAP_OBJECT(Promise, valueObj, returnedPromise); > > if (NS_SUCCEEDED(r) && returnedPromise == mNextPromise) { > + JS::ConstUTF8CharsZ fileName; = JS::ConstUTF8CharsZ("") to not possibly pass nullptr to JS_NewStringCopyUTF8Z. ::: dom/promise/tests/test_encoding.html @@ +19,5 @@ > +var Cu = Components.utils; > + > +function runTest() { > + // Anything outside of ASCII will do here. > + var URL = "file:///whee\u2708.js"; Don't use the same URL for the sandbox principal, and for the eval code. They should be different so we know we're testing the right datum -- namely, the argument to evalInSandbox. I suggest var principal = "file:///whee\u2708/"; var URL = principal + "\uD83D\uDEE7.js"; or something that puts distinct Unicode stuffs in both the principal *and the URL. Well, actually: if it were me, I wouldn't use file: URLs for this. The problem with file: URLs is that the way they get mapped to security origins is unspecified across browsers, and our policy, while a relatively sensible one, isn't the only one on the block. (Specifically, our policy makes file: a multitude of origins, one per directory. Hence the necessity for URL here to be within the folder delineated by principal.) It would be better to use an HTTP URL with IDN in it -- say, http://天/ ("http://\u5929/") or http://
Okay, that didn't work either. I guess you get to read review comments from an obscenely-small-text PDF file. Sorry. :-(
Oops, forgot. In the PDF comments at one point I mentioned that we should do some sort of minimal validation of ostensibly UTF-8 const char* to see they *actually* contain that. This attachment provides C++ that should do this -- untested, but throwing it in a browser build and using it in JS::ConstUTF8CharsZ() should quickly point out any places where we pass in non-UTF-8 as if it were so (at least, assuming we pass in *actual* non-UTF-8 content -- which may be somewhat trickier make happen, in practice).
A few preliminary notes on the review -- > SpiderMonkey doesn't m/a/s-prefix names, so this should just be data or something. I'll change it, but in this case I was copying the style of other code in that same file. > While you're touching this, mind switching the linenos to PRIuSIZE in mozilla/SizePrintfMacros.h and > removing casts? Amazingly, I am not sure this is possible. It's definitely fine for all the spots that eventually call one of the stdio *printf functions. However, spots that call the nspr printf-like, or the js equivalent (apparently cut-and-pasted from nspr and then modified) can't perhaps use these macros. The reason is that nspr interprets printf formats differently from the standard forms; e.g., for nspr %l always means a 64-bit type, whereas for the system printf it means "long". In this particular instance, I think the main problem is that neither nspr nor JS handle the 'z' or 'I' modifiers, which are used by PRIuSIZE, depending on host. Bug 1060419 posits that it is probably not possible to change nspr to be more standard-like here. This seems actively bad to me... > Oh blah, this is why you changed that one use of this in TestingFunctions.cpp. Bleargh. Congratulations on > getting to audit all the code used by FormatStackDump to make it all consistently UTF-8! :-\ We can't have > it be half-Latin1, half-UTF-8, because then it's neither. I didn't re-examine this spot particularly, but note that the code base already has encoding issues. For example, if you want to be grossed out, take a look at how JS_smprintf and friends handle char16_t*, aka "%hs": https://dxr.mozilla.org/mozilla-central/source/js/src/jsprf.cpp?from=jsprf.cpp#100 > I don't particularly understand this. Why is it okay to be lossy here, versus sucking it up to always be > correct, even if extra work is required to keep |key| in acceptable shape about GC hazards? I don't > understand this well enough to propose the solution, myself -- but we should have one. Pretty sure it's not > okay to randomly lose data here. Yes, it's an oddity. If one assumes that the string in question cannot be incorrectly constructed, then the conversion is never lossy in practice. This is just a hack to deal with the rooting hazard issue. I'll try harder to make this code GC-safe.
(In reply to Tom Tromey :tromey from comment #50) > In this particular instance, I think the main problem is that neither nspr > nor JS handle the > 'z' or 'I' modifiers, which are used by PRIuSIZE, depending on host. TIL bug 1088790 is fixing this for nspr. So, I can fix this for JS (though why doesn't JS use the nspr code...?) and then fix the problems.
Depends on: 1088790
Depends on: 1130114
Blocks: 1130117
(In reply to Tom Tromey :tromey from comment #50) > > While you're touching this, mind switching the linenos to PRIuSIZE in > > mozilla/SizePrintfMacros.h and removing casts? > > Amazingly, I am not sure this is possible. > > It's definitely fine for all the spots that eventually call one of the stdio > *printf functions. So I might have been wrong, but I thought I was careful with these to only make the suggestion for the places that *did* flow into stdio functions. The places that don't flow into such functions, I did not intend to comment upon. I have no need/desire for you to do anything to fix such cases, in any way preceding or following up on this bug. We should deal eventually, but put it out of sight, out of mind for now.
Depends on: 1130166
> So I might have been wrong, but I thought I was careful with these to only > make the suggestion for the places that *did* flow into stdio functions. You probably did and I didn't notice. But no big deal. > The places that don't flow into such functions, I did not intend to comment > upon. I have no need/desire for you to do anything to fix such cases, in > any way preceding or following up on this bug. We should deal eventually, > but put it out of sight, out of mind for now. It was easy to just fix it throughout js/, see bug 1130166.
Depends on: 1135731
I've been thinking that it would be cleaner and safer to simply use char16_t* here. Aside from printf, which is already a pain, it would enable the compiler to point out thinkos in the translation. My experience just tracking through the XBL bug here was what pulled me over to this way of thinking. And since the code already is using char16_t* strings in the same object, it seems like the space issue is maybe not so bad. NIing you before I embark on a total rewrite to see what you think.
Flags: needinfo?(jwalden+bmo)
Guh. If we were starting afresh, maybe char16_t* everywhere wouldn't be bad. At this point, frankly, I'd rather pocket the gain we have mostly done already. And if we're still worried, and it's fair for there to be reason for such, I'd say that's when we should do the work to be able to use JSString*, then actually make a switch to that. Okay okay, GC changes aren't fun. But I don't think that's excuse enough to punt on that, not when this isn't the only place that would use the functionality if it could.
Flags: needinfo?(jwalden+bmo)
Blocks: 1192763
tromey said this was the reason that if http://шомбург.рф/ is loaded in devtools the network tab shows garbage everywhere the domain is visible (domain column, request url). Copy as url, copy as cURL and Edit and Resend all seem to work fine though. As well as the Headers which are ofc punycode.
(In reply to nemo from comment #56) > tromey said this was the reason that if http://шомбург.рф/ is loaded in > devtools the network tab shows garbage everywhere the domain is visible Then I wonder why it is shown correctly in the Network panel for the example page http://öl.de/ I posted (but in punycode within the Debugger panel). @Tom: Do those two issues really have the same root cause or should a new bug report be created to tackle the issue in the Network panel? Sebastian
Flags: needinfo?(ttromey)
(In reply to Sebastian Zartner [:sebo] from comment #57) > Then I wonder why it is shown correctly in the Network panel for the example > page http://öl.de/ I posted (but in punycode within the Debugger panel). > > @Tom: Do those two issues really have the same root cause or should a new > bug report be created to tackle the issue in the Network panel? No, sorry, I think I was confused on irc. This bug affects ScriptSource, so it should be just js-related things, I think. I don't think it ought to affect the network monitor generally. So, I'd suggest a new bug.
Flags: needinfo?(ttromey)
(In reply to Tom Tromey :tromey from comment #58) > I don't think it ought to affect the network monitor generally. > So, I'd suggest a new bug. Ok, I've created bug 1250397. Sebastian
From the review: > js/src/vm/SPSProfiler.cpp > [...] > As for whether all that code is copacetic with this. This is unclear to me Looking at SPSProfiler::allocProfileString, I see there is already code there using UTF-8. So I think the patch is an improvement here. See: https://dxr.mozilla.org/mozilla-central/source/js/src/vm/SPSProfiler.cpp#352
Attached patch make ScriptSource filename encoding consistent (obsolete) (deleted) — Splinter Review
Attachment #8550523 - Attachment is obsolete: true
Attachment #8557352 - Attachment is obsolete: true
Attachment #8557353 - Attachment is obsolete: true
Attachment #8557356 - Attachment is obsolete: true
Attached patch trivial fixes from the review (obsolete) (deleted) — Splinter Review
MozReview-Commit-ID: 85dtQ7z2DK7
Attached patch fixes needed due to rebasing (obsolete) (deleted) — Splinter Review
MozReview-Commit-ID: DhDrw7os8KQ
Attached patch add ConstUTF8CharsZ validation (obsolete) (deleted) — Splinter Review
MozReview-Commit-ID: aNfv4XMRTx
Attached patch fix up FormatStackDump (obsolete) (deleted) — Splinter Review
MozReview-Commit-ID: K9tDXgssLnD
Attached patch fix up NotableScriptSourceInfo (obsolete) (deleted) — Splinter Review
not quite what was asked for but close MozReview-Commit-ID: GauY2trhSgA
Attached patch fix up ubinodes (obsolete) (deleted) — Splinter Review
needs a test MozReview-Commit-ID: ANWJRbNSYg
Attached patch fix up debugger (obsolete) (deleted) — Splinter Review
MozReview-Commit-ID: JGYXn4Ksrt4
Attached patch update tests for more kinds of characters (obsolete) (deleted) — Splinter Review
MozReview-Commit-ID: 3vEcm264Vch
Attachment #8732194 - Attachment is obsolete: true
Attachment #8732195 - Attachment is obsolete: true
Attachment #8732197 - Attachment is obsolete: true
Attachment #8732198 - Attachment is obsolete: true
Attachment #8732199 - Attachment is obsolete: true
Attachment #8732200 - Attachment is obsolete: true
Attachment #8732201 - Attachment is obsolete: true
Attachment #8732202 - Attachment is obsolete: true
Attachment #8732203 - Attachment is obsolete: true
Attachment #8732204 - Attachment is obsolete: true
Attached patch trivial fixes from the review (deleted) — Splinter Review
MozReview-Commit-ID: 85dtQ7z2DK7
Attached patch fixes needed due to rebasing (deleted) — Splinter Review
MozReview-Commit-ID: DhDrw7os8KQ
Attached patch add ConstUTF8CharsZ validation (deleted) — Splinter Review
MozReview-Commit-ID: aNfv4XMRTx
Attached patch fix up FormatStackDump (deleted) — Splinter Review
MozReview-Commit-ID: K9tDXgssLnD
Attached patch fix up NotableScriptSourceInfo (deleted) — Splinter Review
not quite what was asked for but close MozReview-Commit-ID: GauY2trhSgA
Attached patch fix up ubinodes (deleted) — Splinter Review
needs a test MozReview-Commit-ID: ANWJRbNSYg
Attached patch fix up debugger (deleted) — Splinter Review
MozReview-Commit-ID: JGYXn4Ksrt4
MozReview-Commit-ID: 3vEcm264Vch
Comment on attachment 8732211 [details] [diff] [review] make ScriptSource filename encoding consistent This is the first patch again. I tried to touch it as little as possible during the various rebases. However, I'm sure some minor changes slipped in. There was at least one case where a file (the reflect-parse.js test) was split into many files, and so rather than resurrect the file, I made a new file, included in this patch. The patches in this series aren't independent. If you want to try anything you will have to apply them all, in the order attached. I plan to squash all the patches before landing. They are separate only for review purposes. This whole series is a giant mess. If you want I can try to split this patch into more bite-size pieces. I didn't do this because of comment #43.
Attachment #8732211 - Flags: review?(jwalden+bmo)
Comment on attachment 8732212 [details] [diff] [review] fix spots passing ConstUTF8CharsZ through varargs The review pointed out spots where a ConstUT8CharsZ was being passed to a varargs function. This patch fixes all the spots I found. It occurs to me that I didn't re-run my varargs checker. I will do so before landing. If it finds anything, I'm sure it will just be trivia like the other things appearing here.
Attachment #8732212 - Flags: review?(jwalden+bmo)
Comment on attachment 8732213 [details] [diff] [review] trivial fixes from the review The review had a large number of "trivial" requests - renamings, minor refactorings, comment additions, etc. This patch collects all the small ones.
Attachment #8732213 - Flags: review?(jwalden+bmo)
Comment on attachment 8732214 [details] [diff] [review] fixes needed due to rebasing After rebasing, of course, the result failed to compile. There were a number of new spots needing the same treatment, primarily adding JS::ConstUTF8CharsZ uses. This patch collects most of the rebasing fixups, though I can't promise that none leaked into other patches. I think this patch is largely "more of the same".
Attachment #8732214 - Flags: review?(jwalden+bmo)
Comment on attachment 8732215 [details] [diff] [review] add ConstUTF8CharsZ validation The review requested some UTF-8 validation in debug mode. You had mentioned just checking some characters, but it seemed simpler to just reuse our existing UTF-8 decoder, and furthermore checking all the characters did not seem so bad, especially because the checking is DEBUG-only.
Attachment #8732215 - Flags: review?(jwalden+bmo)
Comment on attachment 8732216 [details] [diff] [review] fix up FormatStackDump The review called out FormatStackDump as needing more work. I believe this patch fixes up the fallout.
Attachment #8732216 - Flags: review?(jwalden+bmo)
Comment on attachment 8732217 [details] [diff] [review] fix up NotableScriptSourceInfo The review asked for: > Can you propagate ConstUTF8CharsZ down into the type of the allScriptSources > map here, and NotableStringInfo implicated by it? Everything looks fine in > the users as far as encoding goes (nsIMemoryReporterCallback::callback > accepts an AUTF8String path), but better to be more explicit about it. I looked at this, but changing the map itself meant making ConstUTF8CharsZ copyable and moveable, which I didn't want to do. Also I think the "NotableStringInfo" there is a typo for "NotableScriptSourceInfo". I did propagate the type into the latter, since this makes the result a bit more obviously correct.
Attachment #8732217 - Flags: review?(jwalden+bmo)
Comment on attachment 8732218 [details] [diff] [review] fix up ubinodes Oops, I see the commit comment here says "needs a test"; but I did add one and just forgot to remove that note. Just FYI - it doesn't matter in the end since this series needs to be squashed before landing. I think the ubi node code assumes that filenames are Latin-1 encoded, in particular via its use of Atomize. This patch fixes these to use UTF-8 instead, following the basic approach taken by the rest of this series.
Attachment #8732218 - Flags: review?(nfitzgerald)
Comment on attachment 8732219 [details] [diff] [review] fix up debugger There are more spots in Debugger.cpp that assume Latin-1 where UTF-8 is needed. In particular the DefineProperty overload in Debugger.cpp uses JS_NewStringCopyN. I checked all callers to assure myself that this was safe, but I think it would be good for someone else to take a close look at that.
Attachment #8732219 - Flags: review?(nfitzgerald)
Comment on attachment 8732220 [details] [diff] [review] update tests for more kinds of characters In the review you asked for combining characters and non-base-plane characters to be included in the tests. This patch updates most such tests to do so -- a few changes along these lines snuck into earlier patches during the rebase frenzy. I didn't change the file: URLs in test_encoding. It didn't seem crucial to me; however if you still want this, just let me know.
Attachment #8732220 - Flags: review?(jwalden+bmo)
Comment on attachment 8732218 [details] [diff] [review] fix up ubinodes Review of attachment 8732218 [details] [diff] [review]: ----------------------------------------------------------------- Thanks!
Attachment #8732218 - Flags: review?(nfitzgerald) → review+
Comment on attachment 8732219 [details] [diff] [review] fix up debugger Review of attachment 8732219 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsapi.h @@ +4375,5 @@ > extern JS_PUBLIC_API(JSString*) > JS_NewStringCopyUTF8Z(JSContext* cx, const JS::ConstUTF8CharsZ& s); > > +extern JS_PUBLIC_API(JSString *) > +JS_NewStringCopyUTF8N(JSContext *cx, const JS::UTF8Chars& s); Ugh "JS_"-prefix, but I suppose consistency is good...
Attachment #8732219 - Flags: review?(nfitzgerald) → review+
That's a terrible try run, but so far the fixes have just been trivia.
Attachment #8732212 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8732215 [details] [diff] [review] add ConstUTF8CharsZ validation Review of attachment 8732215 [details] [diff] [review]: ----------------------------------------------------------------- This is totes going to be fun to have in the tree, amirite? :-) But such a good thing...
Attachment #8732215 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8732217 [details] [diff] [review] fix up NotableScriptSourceInfo Review of attachment 8732217 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/MemoryMetrics.cpp @@ +689,5 @@ > if (!runtime.notableScriptSources.growBy(1)) > return false; > > + runtime.notableScriptSources.back() = > + NotableScriptSourceInfo(JS::ConstUTF8CharsZ(filename), info); In passing I wonder why |filename| can't be ConstUTF8CharsZ itself, but I lack time immediately to figure this out. We should try to inject the type harder into this, of course -- if not before landing, then after in somewhat short order.
Attachment #8732217 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8732213 [details] [diff] [review] trivial fixes from the review Review of attachment 8732213 [details] [diff] [review]: ----------------------------------------------------------------- ::: ipc/testshell/XPCShellEnvironment.cpp @@ +163,4 @@ > return false; > FILE *file = fopen(filename.ptr(), "r"); > if (!file) { > JS_ReportError(cx, "cannot open file '%s' for reading", filename.ptr()); Erm, did I mention this in reviewing before or anything? This is going to shove UTF-8 data into JS_ReportError's generated string -- which delegates to ReportErrorVA, which delegates to InflateString, which won't correctly handle this UTF-8. Boiling the error-reporting ocean to handle UTF-8/UTF-16 fully, everywhere, definitely has to be a separate bug. (Probably on par in size and pain with this one. :-( ) How about you (ugh) triplicate all the JS_Report* APIs? Have the original name, then have separate JS_Report*Latin1 and JS_Report*UTF8 versions that call the original function. Inside each of these functions, put an XXX comment or something indicating this is lies. Then file a bug on properly implementing the *Latin1/*UTF8 functions. Then, conditioned on that, we can file bug(s) on eliminating all uses of the JS_Report* APIs, to be replaced with uses of the properly-charsetted versions. Sound like a plan that lets us quasi-incrementally progress? ::: js/public/CharacterEncoding.h @@ +118,3 @@ > class ConstUTF8CharsZ > { > + const char *data; * by type, and in the other places below here. ::: js/public/ProfilingStack.h @@ +196,2 @@ > JS_FRIEND_API(void) > RegisterRuntimeProfilingEventMarker(JSRuntime* rt, void (*fn)(const char*)); I can't remember, is there a reason the callback couldn't take |const JS::ConstUTF8CharsZ&|? Unless there's a good reason, let's do this right from the start and make that so. ::: js/src/jsscript.h @@ +653,1 @@ > UniqueChars filename_; Adding UTF8 to the end (before _) of this name would be good, same reason as for the other thing I mentioned in here as benefiting from a rename. And as there, I probably should review it. :-\ @@ +676,1 @@ > UniqueChars introducerFilename_; Another UTF8 naming case. Bah. ::: js/src/vm/Debugger.cpp @@ +4005,1 @@ > JSAutoByteString urlCString; Can we put UTF8 into this field name at the end or so? (The rename probably should be reviewed, sadly, to be sure all consumers are properly handling things. Since this is in "requested from review" bits, I probably should review the patch-that-only-renames-this, even if the individual bits of it seem/are trivial. :-\ ) ::: js/src/vm/SPSProfiler.h @@ +194,1 @@ > void setEventMarker(void (*fn)(const char*)); And I guess if the friendapi changed to the UTF-8-typed thing, this type would change too, right? ::: js/src/vm/SavedStacks.cpp @@ +1327,5 @@ > if (const char16_t* displayURL = iter.displayURL()) { > locationp.setSource(AtomizeChars(cx, displayURL, js_strlen(displayURL))); > + } else if (JS::ConstUTF8CharsZ filename = iter.filename()) { > + locationp.setSource(AtomizeUTF8Chars(cx, filename.c_str(), > + strlen(filename.c_str()))); I'm assuming for now that this takes a byte-count and not a codepoint-length (seems likely). @@ +1354,5 @@ > RootedAtom source(cx); > if (const char16_t* displayURL = iter.displayURL()) { > source = AtomizeChars(cx, displayURL, js_strlen(displayURL)); > + } else if (JS::ConstUTF8CharsZ filename = script->filename()) { > + source = AtomizeUTF8Chars(cx, filename.c_str(), strlen(filename.c_str())); And again. ::: js/src/vm/String.h @@ +1202,5 @@ > } > > template <js::AllowGC allowGC> > inline JSFlatString * > +NewStringCopyUTF8Z(JSContext *cx, const JS::ConstUTF8CharsZ &s) Dunno the state of things, but all the */& have moved to the left these days. Maybe this is an old tree, or these were places we missed. ::: js/xpconnect/src/XPCShellImpl.cpp @@ +343,5 @@ > return false; > FILE* file = fopen(filename.ptr(), "r"); > if (!file) { > JS_ReportError(cx, "cannot open file '%s' for reading", > filename.ptr()); Another place wanting JS_ReportErrorUTF8. ::: toolkit/components/telemetry/Telemetry.cpp @@ +2975,5 @@ > if (!ret) { > return nullptr; > } > for (size_t i = 0; i < stack.length(); i++) { > + JS::RootedString string(cx, JS_NewStringCopyUTF8Z(cx, JS::ConstUTF8CharsZ(stack[i]))); Move the Rooted out of the loop. I...vaguely remember it being right to assume HangStack entries were UTF-8 after this patch. I hope that was right. We'll need to double-check this at some point, but I'm sadly out of time right now to do it just now. :-(
Attachment #8732213 - Flags: review?(jwalden+bmo) → review+
Attachment #8732216 - Flags: review?(jwalden+bmo) → review+
Attachment #8732220 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8732211 [details] [diff] [review] make ScriptSource filename encoding consistent Review of attachment 8732211 [details] [diff] [review]: ----------------------------------------------------------------- In hindsight, while separating extra changes into extra patches was great, I probably should have asked for a rollup patch as well, to consult when determining whether latent bugs in this patch had been fixed in subsequent patching in this bug. Oh well. (I was going to read this line-by-line again anyway to make sure I hadn't missed anything, and to verify things that appeared to be bugs were subsequently fixed, even if I hadn't mentioned them in the original review.) That seems to be all I have from another pass on this patch. BUT, I still need to go through my original review comments and verify things were addressed as needed. It doesn't immediately appear everything was. For example, nsJSUtils::GetCallingLocation doesn't seem to have a comment documenting UTF-8-ness of the filled-in file name, or at least I don't remember it in any of these patches. I pointed out issues in nsXBLProtoImplMethod.cpp and nsXBLProtoImplProperty.cpp that I didn't see this time, which either means someone fixed the code since the initial patch, or I was blind somehow this time. (Probably the latter.) I hope to finish reading through my initial review comments tomo^H^Hday to give a final (probably-)r+. In the meantime, I'll leave you to start responding to this however you feel like -- just don't post an updated patch against this before I post final comments here. Sorry for the delay, and thanks for being moderately patient in waiting on me to review this! ::: dom/base/nsGlobalWindow.cpp @@ +10680,1 @@ > lineno); I believe that readonly attribute ACString scriptFileName; in nsIHangReport.idl should become AUTF8String to properly reflect that this can be UTF-8 now. (But it would be good for someone to double-check me, as I don't 100% remember XPIDL string types these days, and how ACString differs from AUTF8String.) ::: dom/base/nsJSUtils.cpp @@ +41,5 @@ > if (!JS::DescribeScriptedCaller(aContext, &filename, aLineno, aColumn)) { > return false; > } > > + aFilename.Assign(filename.get().c_str()); Sooooooooooooooooooooooooo. If I trace through far enough, it looks like this string is a result of |timeout->mScriptHandler->GetLocation(&filename, &lineNum, &column);| in nsGlobalWindow.cpp. That string, flowing through super-gnarly code, eventually flows into MOZ_LOG stuffs. That either prints to stderr, which we're supposing is okay. *Or*, it flows off somewhere customizable. And I *think* one target of that customization, in our tree, is profiler code. Please find someone with profiler-knowingness to verify that the profiler is going to be okay taking in UTF-8 strings, and that it won't assume Latin-1 or anything. ::: dom/base/nsScriptLoader.cpp @@ +1047,5 @@ > // aRequest ended up getting script data from, as the script filename. > nsContentUtils::GetWrapperSafeScriptFilename(mDocument, aRequest->mURI, aRequest->mURL); > > aOptions->setIntroductionType("scriptElement"); > + aOptions->setFileAndLine(JS::ConstUTF8CharsZ(aRequest->mURL.get()), aRequest->mLineNo); nsScriptLoadRequest::mURL is used in exactly these two lines, and setFileAndLine copies the provided filename. Eliminate the field and use a local variable to hold this, so it's clearer that the UTF-8-ness matters *only* for this line and doesn't have potential ramifications later if passed to some API that doesn't handle UTF-8 data. ::: dom/plugins/base/nsNPAPIPlugin.cpp @@ +1417,5 @@ > ("NPN_Evaluate(npp %p, npobj %p, script <<<%s>>>) called\n", > npp, npobj, script->UTF8Characters)); > > JS::CompileOptions options(cx); > + options.setFileAndLine(JS::ConstUTF8CharsZ(spec), 0) I can't remember, was it okay to pass in a null filename here? Because there's one case above where that can happen. Does this need to pass in something like "[unknown filename]" in this case? ::: ipc/testshell/XPCShellEnvironment.cpp @@ +164,5 @@ > return false; > FILE *file = fopen(filename.ptr(), "r"); > if (!file) { > JS_ReportError(cx, "cannot open file '%s' for reading", filename.ptr()); > return false; Hoo boy. This supposes that JS_ReportError accepts and understands UTF-8 input. Which it does not: JS_ReportError calls js::ReportErrorVA which expands the format string plus arguments into a string which it passes to js::InflateString interprets the string as Latin-1. (I *think* all the other JS_ReportError* functions also interpret everything as Latin-1, but we'll need to verify that.) So we have a problem. For the purposes of *this* patch, I think you should do the following. First, rename JS_ReportError to JS_ReportErrorLatin1. Second, add JS_ReportError that calls JS_ReportErrorLatin1 (or to its similar guts, because you can't just trivially delegate behavior of one varargs function to another). Third, add JS_ReportErrorUTF8. For the purposes of this bug, make it delegate *exactly* as JS_ReportErrorLatin1 does, with an XXX comment indicating that its requirement of UTF-8 in its arguments is a lie. That's enough to land this bug. This obviously leaves error messages in a bad way, and JS_ReportErrorUTF8 will be an actual lie. So as a very fast followup, separate bug, that function will need to be *actually* implemented. I *think* (but am prepared to be proven wrong) your template trick as used with (?) InflateString or whatever, applied to js::ReportErrorVA, is probably the most sensible approach, to have the reporting code written once while dealing with multiple possible encodings. That patch can also include tests for proper behavior in places that use JS_ReportErrorUTF8, like here. (It should be a trivial xpcshell test to load a non-existent file with a bunch of Unicode goop in its name, then verify the error message contains the proper file name.) @@ +327,5 @@ > ungetc(ch, file); > > JS::CompileOptions options(cx); > options.setUTF8(true) > + .setFileAndLine(JS::ConstUTF8CharsZ(filename), 1); filename is a function argument, and it looks like there's only one caller, and it passes "xpcshell.js". So I think this function parameter should change to const JS::ConstUTF8CharsZ& itself, and then this won't need any changes. ::: js/public/CharacterEncoding.h @@ +216,5 @@ > /* > + * Like UTF8CharsToNewTwoByteCharsZ, but for ConstUTF8CharsZ. > + */ > +extern TwoByteCharsZ > +UTF8CharsToNewTwoByteCharsZ(JSContext *cx, const ConstUTF8CharsZ &utf8, size_t *outlen); * and & by the type name, not the argument name. @@ +227,5 @@ > extern TwoByteCharsZ > LossyUTF8CharsToNewTwoByteCharsZ(JSContext* cx, const UTF8Chars utf8, size_t* outlen); > > +extern TwoByteCharsZ > +LossyUTF8CharsToNewTwoByteCharsZ(JSContext *cx, const ConstUTF8CharsZ &utf8, size_t *outlen); */& positioning. ::: js/src/builtin/ReflectParse.cpp @@ +1829,2 @@ > : cx(c) > , builder(c, l, src) Could you rename |l| to |saveLocation| or something? This is unreadable and un-understandable right now. @@ +3613,5 @@ > RootedString src(cx, ToString<CanGC>(cx, args[0])); > if (!src) > return false; > > + ScopedJSFreePtr<char> filenameChars; JSAutoByteString filenameChars; @@ +3615,5 @@ > return false; > > + ScopedJSFreePtr<char> filenameChars; > + RootedValue filename(cx); > + filename.setNull(); RootedValue filename(cx, NullValue()); @@ +3657,5 @@ > return false; > > + filename.setString(str); > + filenameChars = JS_EncodeStringToUTF8(cx, str); > + if (!filenameChars) if (!filenameChars.encodeUtf8(cx, str)) @@ +3731,5 @@ > if (!linearChars.initTwoByte(cx, linear)) > return false; > > CompileOptions options(cx); > + options.setFileAndLine(JS::ConstUTF8CharsZ(filenameChars.get()), lineno); filenameChars.ptr() (We seem to use JSABS pretty widely, so let's use that rather than a one-off ScopedDelete thing that's almost uniquely different. Plus Scoped* is sort of deprecated these days, just that nobody's taken the time to kill all uses off yet.) ::: js/src/builtin/TestingFunctions.cpp @@ +2380,5 @@ > if (!buf) > return false; > > RootedString str(cx); > + if (!(str = JS_NewStringCopyZUTF8(cx, JS::ConstUTF8CharsZ(buf)))) RootedString str(cx, JS_NewStringCopyZUTF8(cx, JS::ConstUTF8CharsZ(buf))); if (!str) return false; ::: js/src/jit/BaselineBailouts.cpp @@ +1145,5 @@ > } > > if (cx->runtime()->spsProfiler.enabled()) { > // Register bailout with profiler. > + const char* filename = script->filename().c_str(); I'm pretty sure I misread the consumers of this last time. Further down this string gets interpolated into a buffer (which then becomes UTF-8 itself). That buffer is passed to |cx->runtime()->spsProfiler.markEvent(buf);|. There are two different possibilities for what will handle/interpret that buffer: the two callers of js::RegisterRuntimeProfilingEventMarker. The first, PrintProfilerEvents_Callback, just fprintfs to stderr -- which we presume correct, as concluded last time. The second is ProfilerJSEventMarker, which calls to mozilla_sampler_add_marker passing the buffer as its first argument, which calls ProfilerMarker::addMarker, which creates a ProfilerMarker containing the string data. That ProfilerMarker's GetMarkerName function is called by ProfilerMarker::StreamJSON, which passes our (copied) buffer to UniqueJSONStrings::WriteElement, which passes it to UniqueJSONStrings::GetOrAddIndex, which passes it to |mStringTableWriter.StringElement|, which is JSONWriter::StringElement, which calls JSONWriter::StringProperty, which then interprets the string as Latin-1. Last time around, I diagnosed this as JSStreamWriter-handled, and I claimed that handled UTF-8. I don't see JSStreamWriter in the tree any more, and I have no idea what replaced it, or whether this was safe before but someone "broke" since then. But I *am* pretty confident it's broken now. Passing UTF-8 to Latin-1 will produce gobbledygook, but it's at least safe gobbledygook. So I'm okay landing this as-is, but I want a fast followup to make this marking goo accept and properly handle UTF-8. ::: js/src/jit/C1Spewer.cpp @@ +28,5 @@ > > out_.printf("begin_compilation\n"); > if (script) { > + out_.printf(" name \"%s:%" PRIuSIZE "\"\n", script->filename().c_str(), script->lineno()); > + out_.printf(" method \"%s:%" PRIuSIZE "\"\n", script->filename().c_str(), script->lineno()); It's not immediately, absolutely, 100% evident that all printers are fine with UTF-8. But I think it's the case. Basically we have GenericPrinter, Fprinter, and LSprinter to worry about here. LSprinter and GenericPrinter just spew to stdout, which we've supposed handles UTF-8. Fprinter spews to FILE* which is set by an init() method, or by the constructor. The files passed to it are primarily stderr -- the cases where other files can be passed are Shape::dump (DEBUG-only code for debugging) and JS_FileEscapedString, which has no callers in our tree and so we approximately don't care about, or at least don't have to care about *now*. There's also TraceLogger code, but that spews to files named .json, so I have to imagine UTF-8 is okay. (And I rather doubt TraceLogger can invoke this code, but I wouldn't bet the farm on it.) ::: js/src/jit/Ion.cpp @@ +3149,5 @@ > // Format of event payload string: > // "<filename>:<lineno>" > > // Get the script filename, if any, and its length. > + const char* filename = script->filename().c_str(); This too forwards off to profiler goo that treats it as Latin-1 -- see the comment elsewhere that mentions JSONWriter. I can't be 100% certain without a rollup, but it appears to me that additional uses of filename() and maybeForwardedFilename() have appeared in varargs calls, within this file, since this patch and the varargs-handling patch were posted. Again, please make sure you've gotten all of them before you declare victory here. ::: js/src/jit/JitcodeMap.cpp @@ +321,5 @@ > hasName = true; > } > > // Calculate filename length > + const char* filenameStr = script->filename() ? script->filename().c_str() : "(null)"; If I read the propagation correctly, this gets spewed into a string put into JitcodeGlobalEntry::BaselineEntry. The strings in such entries are observed by JitcodeGlobalEntry::BaselineEntry::callStackAtAddr, then another overload of same, which is called by JS::ProfilingFrameIterator::extractStack, which is called by ReadSPSProfilingStack which does |NewStringCopyZ<CanGC>(cx, frames[inlineFrameNo].label)| and therefore misconstrues the string created here as Latin-1. So, another place where UTF-8 will get mangled into Latin-1 -- but at least safely. This needs to be dealt with in another fast followup after landing. ::: js/src/jsapi.cpp @@ +4539,5 @@ > { > FileContents buffer(cx); > { > AutoFile file; > + if (!file.open(cx, filename.c_str()) || !file.readAll(cx, buffer)) We might consider making AutoFile::open take const JS::ConstUTF8CharsZ& in a separate patch. ::: js/src/jsapi.h @@ +4386,5 @@ > extern JS_PUBLIC_API(JSString*) > JS_AtomizeString(JSContext* cx, const char* s); > > extern JS_PUBLIC_API(JSString*) > +JS_InternJSString(JSContext* cx, JS::HandleString str); Erm, did you really mean to add this? ::: js/src/jsopcode.cpp @@ +1687,5 @@ > > buf.append('{'); > > AppendJSONProperty(buf, "file", NO_COMMA); > + JSString* str = NewStringCopyZUTF8<CanGC>(cx, script->filename()); Can this be a Rooted? ::: js/src/shell/js.cpp @@ +505,5 @@ > { > CompileOptions options(cx); > options.setIntroductionType("js shell file") > .setUTF8(true) > + .setFileAndLine(JS::ConstUTF8CharsZ(filename), 1) This isn't really legal -- file paths on Windows (at least) are in the system's native coding, and the only way to fopen paths containing various Unicode, is to 8.1 it. (Tho Microsoft would prefer you use wfopen or a MS-proprietary thing.) But I suppose let's adopt the pretense to simplify -- just carve out space to fix this in the future: #include "mozilla/UniquePtrExtensions.h" static mozilla::UniqueFreePtr<const char*> NativePathToUTF8(const char* path) { // POSIX APIs on OS X accept UTF-8; any vaguely modern Linux system will as // well. On Windows, however, this is technically lies if the system code // page isn't UTF-8 -- only 8.1-style names can be used with the non-wide // POSIX APIs. Given this is a developer thing, just ignore the concern // until someone complains, but use this method to leave space for someone to // fix this if ever needed. return mozilla::UniqueFreePtr<const char>(strdup(path)); } and use JS::ConstUTF8CharsZ(NativePathToUTF8(filename).get()) instead of what you've done. @@ +951,1 @@ > if (!filename) if (!filename.encodeUtf8(cx, str)) to idiomatically combine the two lines. @@ +1003,5 @@ > > if (!JS_GetProperty(cx, opts, "fileName", &v)) > return false; > if (v.isNull()) { > + options.setFile(JS::ConstUTF8CharsZ()); Hm, I guess nulls *were* okay? (I think I mentioned this in some review here previously.) @@ +1013,3 @@ > if (!fileName) > return false; > + options.setFile(JS::ConstUTF8CharsZ(fileName)); Preference for if (!fileNameBytes.encodeUtf8(cx, s)) return false; options.setFile(JS::ConstUTF8CharsZ(fileNameBytes.ptr())); Tho, the current setup of having this be a CompileOptions, so you can't have the filename be copied when setFile occurs, is really sadmaking. :-( Making a copy (or better, moving) for this stuff is so much preferable to having this lifetime-dependency tangling. Sigh. (Not on you to fix this now, to be clear -- or even in a followup, although it'd be nice if *someone* fixed it sooner rather than later.) @@ +1534,1 @@ > if (!filename) if (!filename.encodeUtf8(...)) @@ +2378,1 @@ > if (!filename) if (!filename.encodeUtf8(...)) @@ +2437,3 @@ > if (!file) { > JS_ReportErrorNumber(cx, my_GetErrorMessage, nullptr, > JSSMSG_CANT_OPEN, script->filename(), Uh... This is passing a JS::ConstUTF8CharsZ to a varargs function, right? Are you *sure* your checker caught everything? 'cause it's not looking like it. And I don't see this changed in the varargs-handling separate patch (maybe it's in one of the other patches -- if so I guess that un-requested rollup would have helped here :-( ). This is another case where JS_ReportError isn't the right thing to use -- we need JS_ReportErrorUTF8 as mentioned elsewhere. Please use that. I see one other further down in this function, and I see a third case that uses JS_ReportErrorNumber which will need similar UTF-8-ifying treatment. ::: js/src/vm/Debugger.cpp @@ +4095,5 @@ > /* Compute urlCString and displayURLChars, if a url or displayURL was > * given respectively. */ > if (url.isString()) { > + RootedString str(cx, url.toString()); > + if (!urlCString.encodeUtf8(cx, str)) Add a comment by urlCString's declaration indicating it contains UTF-8 data. @@ +5123,3 @@ > else > + filename = script->filename(); > + str = NewStringCopyZUTF8<CanGC>(cx, filename); Declare JSString* str here, not earlier, please. @@ +6403,1 @@ > return Some(str); Uh. Is it really kosher to ignore a failure here? No need to investigate/fix now, but once this lands we should do so. @@ +7391,5 @@ > if (!v.isUndefined()) { > RootedString url_str(cx, ToString<CanGC>(cx, v)); > if (!url_str) > return false; > + url = url_bytes.encodeUtf8(cx, url_str); If we just have JSAutoByteString url_bytes; and we convert this location to if (!url_bytes.encodeUtf8(cx, url_str)) return false; then in the sole use of |url| below, a ternary |url ? url : "..."|, we can instead use |url.ptr() ? url.ptr() : "..."| (or better yet, add an explicit operator bool to JSAutoByteString) and eliminate the |url| variable. Please do this in a mini-patch followup -- no reason to confusingly have two separate variables here. ::: js/src/vm/MemoryMetrics.cpp @@ +457,5 @@ > > rtStats->runtime.scriptSourceInfo.add(info); > > if (granularity == FineGrained) { > + const char* filename = ss->filename().c_str(); So actually, looking more at this, I think my "in passing" note on the NotableScriptSources review can be answered by saying -- yes! This could be JS::ConstUTF8CharsZ. And allScriptSources *could* use that same type as its key. Mind doing a fast followup to make that so? Lower priority than dealing with all the passing-UTF-8-to-Latin-1 APIs problem, for sure, tho. ::: js/src/vm/SPSProfiler.cpp @@ +324,5 @@ > // Get the function name, if any. > JSAtom* atom = maybeFun ? maybeFun->displayAtom() : nullptr; > > // Get the script filename, if any, and its length. > + const char* filename = script->filename().c_str(); If I read correctly, this string is one of the things exposed by SPSProfiler::push. The string is used as a label, exposed via ProfileEntry::label(). That function is called by (among other things) mozilla_sampler_get_backtrace_noalloc and profiler_get_backtrace_noalloc. This latter is called by ProfilerImpl::GetStacktrace, called by GCHeapProfilerImpl::SampleInternal. *That* inserts the produced stack trace file names into a mNames hash. And the contents of that, are consumed by MemoryProfiler::GetResults, which does this with them: for (size_t i = 0; i < names.Length(); i++) { JS::RootedString name(cx, JS_NewStringCopyZ(cx, names[i].get())); JS_SetElement(cx, jsnames, i, name); } which misinterprets UTF-8 as Latin-1. Safely, as mentioned numerous times in this review. But wrongly. Another fast followup, please. ::: js/src/vm/SavedStacks.cpp @@ +1327,5 @@ > if (const char16_t* displayURL = iter.displayURL()) { > locationp.setSource(AtomizeChars(cx, displayURL, js_strlen(displayURL))); > } else { > + JS::ConstUTF8CharsZ filename = iter.filename(); > + if (filename) { if (JS::ConstUTF8CharsZ filename = iter.filename()) { @@ +1335,5 @@ > + return false; > + locationp->source = AtomizeChars(cx, chars.get(), len); > + js_free(chars.get()); > + } else { > + locationp->source = Atomize(cx, "", 0); locationp->source = cx->emptyString(); or cx->names().empty @@ +1362,5 @@ > if (const char16_t* displayURL = iter.displayURL()) { > source = AtomizeChars(cx, displayURL, js_strlen(displayURL)); > } else { > + JS::ConstUTF8CharsZ filename = script->filename(); > + if (filename) { if (JS::... @@ +1368,5 @@ > + // We use the lossy variant here because it cannot GC. > + // This avoids a rooting hazard for 'key'. > + JS::TwoByteCharsZ chars = LossyUTF8CharsToNewTwoByteCharsZ(cx, filename, &len); > + if (!chars) > + return false; At least let's do a followup patch for GC-safety and losslessness here? It can land after the rest of this has landed. @@ +1372,5 @@ > + return false; > + source = AtomizeChars(cx, chars.get(), len); > + js_free(chars.get()); > + } else { > + source = Atomize(cx, "", 0); cx->emptyString() ::: js/src/vm/String.h @@ +1202,5 @@ > } > > +template <js::AllowGC allowGC> > +inline JSFlatString * > +NewStringCopyZUTF8(JSContext *cx, const JS::ConstUTF8CharsZ &s) Move */& next to type names here. ::: js/src/vm/TraceLogging.cpp @@ +458,1 @@ > script); TraceLogging stuffs like this, are observed by Debugger::drainTraceLog. That method does this: if (!DefineProperty(cx, item, dataId, eventText, strlen(eventText))) where eventText is, I believe, influenced by the filename handled here. So I believe we're going to spew gobbledygook into the tracelog, when a Debugger is used wrt a script that has a non-ASCII filename. It'll be interpreted as Latin-1, so it'll be safe -- and so we can defer fixing this to a fast followup. But it does very much need fixing. Better yet, I suspect it's probably even easily testable, because Debugger.drainTraceLoggerScriptCalls is going to be almost trivially easy to use. So make sure the followup patch for this includes a test. ::: js/xpconnect/src/Sandbox.cpp @@ +1708,5 @@ > JS::ContextOptionsRef(sandcx).setDontReportUncaught(true); > JSAutoCompartment ac(sandcx, sandbox); > > JS::CompileOptions options(sandcx); > + options.setFileAndLine(JS::ConstUTF8CharsZ(filenameBuf.get()), lineNo) This filename *appears* as if it can be non-UTF-8, because there's an evalInSandbox method in IDL that takes |in string filename| that flows into this. But that method is [noscript], and the one unconstrained caller passes it what's ultimately the value of the "general.config.filename" preference. And if you look at prefread.cpp, it appears that what that parsing process generates, is UTF-8. So I *think* this is okay, but it's very obscure. ::: js/xpconnect/src/XPCShellImpl.cpp @@ +870,5 @@ > JS::RootedScript script(cx); > JS::RootedValue unused(cx); > JS::CompileOptions options(cx); > options.setUTF8(true) > + .setFileAndLine(JS::ConstUTF8CharsZ(filename), 1) This could be a commandline argument, and I don't believe those are restricted to UTF-8 only. But I guess serves people right if they do wrong here. ::: netwerk/base/ProxyAutoConfig.cpp @@ +723,5 @@ > > SetRunning(this); > JS::Rooted<JSObject*> global(cx, mJSRuntime->Global()); > JS::CompileOptions options(cx); > + options.setFileAndLine(JS::ConstUTF8CharsZ(mPACURI.get()), 1); I believe this string ultimately was produced by nsPACMan::OnStreamComplete, derived from an nsIURI::GetAsciiSpec call. As a followup, I think you could/should change this to use GetSpec instead, now that every consumer properly understands UTF-8 here, which now seems to be the case if my searching didn't miss anything. ::: xpcom/threads/ThreadStackHelper.cpp @@ +430,5 @@ > MOZ_ASSERT(aEntry->script()); > > const char* label; > if (IsChromeJSScript(aEntry->script())) { > + const char* filename = JS_GetScriptFilename(aEntry->script()).c_str(); Okay, this flow of information is truly horrific, but *if* I'm reading correctly, this flows into something in mStackToFill, which gets appended to a HangStack::mBuffer, which is indexed into by pointers in HangStack::mImpl (a Vector of const char*), and those indexes are consumed by Telemetry.cpp's CreateJSHangStack, which does JS::RootedString string(cx, JS_NewStringCopyZ(cx, stack[i])); and therefore presumes that what's being appended here, is Latin-1. So, more gobbledygook badness, but not security badness because UTF-8 as Latin-1 just makes for a garbage-containing string for anything non-ASCII. Another thing to fix in a fast followup. (All these fast followups should be dealt with in separate patches, just to be clear.)
Comment on attachment 8732211 [details] [diff] [review] make ScriptSource filename encoding consistent Review of attachment 8732211 [details] [diff] [review]: ----------------------------------------------------------------- Remaining comments: * nsJSUtils::GetCallingLocation needs a comment indicating that the returned filename/path is UTF-8. * ah, XBL got fixed by you in bug 1135731, woo \o/ *hides from mrbkap for siccing that review on him* And then skimming the rest, I either duplicated them in this most recent pass, found them anew, or they were dealt with, or maybe we can punt on them some. So I guess (gulp) r+ with all these comments addressed -- but have me review an interdiff of the additional changes, and post a full rollup patch for easy consultation when looking it over.
Attachment #8732211 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8732214 [details] [diff] [review] fixes needed due to rebasing Review of attachment 8732214 [details] [diff] [review]: ----------------------------------------------------------------- Whatever of these changes I didn't describe as fast followup, feel free to fold into the third-round patch I mentioned probably needing to look at for the big-chunk patch review a couple comments ago. There's so much going on in all of this, and so many changes, that we're rather straining the limits of what's really reviewable without having something like a feature branch to look at, and individual commits on it to tag. Or something. :-( ::: js/src/asmjs/AsmJS.cpp @@ +1765,5 @@ > } > > CacheableChars filename; > if (parser_.ss->filename()) { > + filename = DuplicateString(parser_.ss->filename().c_str()); This string passes into ModuleData::filename, and it's consumed by Module::setProfilingEnabled (WasmModule.cpp), which JS_smprintf's it into a label that it inserts into Module::funcLabels_. The various contents of funcLabels_ are exposed by Module::profilingLabel. ProfilingFrameIterator::label() returns that function's result. JS::ProfilingFrameIterator::extractStack further calls that last function. Finally, extractStack is called to fill in a set of frames in TestingFunctions.cpp:ReadSPSProfilingStack, which then copies a label using NewStringCopyZ which will interpret the string as Latin-1. That user needs to be NewStringCopyUTF8Z, and probably all the various propagations of labels should be typed as UTF-8 as well. Also, a test should be added for this, since there's a testing function for it. Additionally, Module::filename_ is exposed by Module::filename(). One caller of that is Debugger.cpp's DebuggerSourceGetURLMatcher::match, which JS_smprintf-formats the filename into a buffer, and that buffer is passed to NewStringCopyZ, again misinterpreting as Latin-1. This too should be changed, then tested -- you'll have to ask some wasm people exactly how to do this, as wasm's new enough I don't know how to invoke it. On the latter issue, that would have been pointed out by Module::filename() returning ConstUTF8CharsZ, and by Module::filename_ being the same. Those changes should be made, and make sure to run the varargs checker again, because the JS_smprintf formatting just mentioned would be another case of that problem. ::: js/src/asmjs/Wasm.cpp @@ +1605,2 @@ > if (!file) > return false; This too flows into ModuleGenerator::init, just like the AsmJS.cpp-modified string does. Addressing those comments, should address this particular place as well. ::: js/src/jit/OptimizationTracking.cpp @@ +872,3 @@ > lineno = constructor->nonLazyScript()->lineno(); > } else { > + filename = constructor->lazyScript()->filename().c_str(); I'd slightly prefer if |filename| were JS::ConstUTF8CharsZ -- this would stay with the better type as long as possible, and it'd leave the .c_str() to happen only once at the place that requires it. @@ +1154,5 @@ > trackOptimizationOutcome(TrackedOutcome::Inlined); > } > > static void > InterpretedFunctionFilenameAndLineNumber(JSFunction* fun, const char** filename, Please make |filename| a |const JS::ConstUTF8CharsZ* filename| so that the UTF-8 goes with it as far as possible. That'll end up propagating into the readType function definitions the filename could flow into. And at that point, we start running into more issues. SprintOptimizationTypeInfoOp::readType passes the location to PutEscapedString, which interprets the string as Latin-1: safe, at least, but it'll produce gobbledygook. (Once that output's de-gobbledygook'd, it looks like SOTIP::sp is only ever a Sprinter, and Sprinter::printf just prints to stdout, which we've assumed safe, so we're safe except for bad output.) Alternatively, the other readType implementation is StreamOptimizationTypeInfoOp::readMatch. That function passes the location to UniqueJSONStrings::WriteProperty, which (as discussed in the previous review) interprets the string as Latin-1, so requires whatever the fix is that's used for that prior issue. @@ +1250,5 @@ > > if (tracked.hasAllocationSite()) { > JSScript* script = tracked.script; > op_.readType("alloc site", buf, > + script->maybeForwardedScriptSource()->filename().c_str(), Once readType's signature changes, this change will no longer be necessary. ::: js/src/jsapi-tests/testStructuredClone.cpp @@ +170,5 @@ > " return saveStack(); \n" // 4 > " }()); \n" // 5 > " }()); \n" // 6 > "}()); \n", // 7 > + JS::ConstUTF8CharsZ(FILENAME), Minor preference for removing FILENAME entirely and just using JS::ConstUTF8CharsZ("filename.js") directly. The extra variable only obscures, here. ::: js/src/jsexn.cpp @@ +757,5 @@ > if (!filename) { > filename = "FILE_NOT_FOUND"; > } > char histogramKey[64]; > JS_snprintf(histogramKey, sizeof(histogramKey), It looks like histogram keys are considered UTF-8 -- or at least KeyedHistogram::ReflectKeyedHistogram seems to consider them so -- so this hunk of change looks okay. ::: js/src/jsscript.cpp @@ +4598,5 @@ > > const char* > JS::ubi::Concrete<JSScript>::scriptFilename() const > { > + return get().filename().c_str(); Make this function return JS::ConstUTF8CharsZ, since we know it is. @@ +4620,5 @@ > auto source = sourceObject->source(); > if (!source) > return nullptr; > > + return source->filename().c_str(); This function too should return JS::ConstUTF8CharsZ. With these two function return types changed, propagating onward, it appears to me that UbiNodeCensus.cpp's countMapToObject will observe this string as a key, and it'll pass it to Atomize, which will fail to understand this string as UTF-8. So we have a UTF-8-as-Latin-1 gobbledygook problem again, for another very fast followup. Please ensure this is tested, too. ::: js/src/jsstr.cpp @@ +2194,1 @@ > cx->compartment()->addTelemetry(filename, JSCompartment::DeprecatedFlagsArgument); I think arai got rid of this code in the last month-ish, so happily we don't need to worry about this. If it just migrated elsewhere and I can't find where, please say so in whatever remaining reviews are needed in this bug. ::: js/src/shell/OSObject.cpp @@ +129,5 @@ > #ifdef XP_WIN > // The docs say it can return EINVAL, but the compiler says it's void > _splitpath(scriptFilename.get(), nullptr, buffer, nullptr, nullptr); > #else > + strncpy(buffer, scriptFilename.get().c_str(), PATH_MAX+1); The contents of buffer are at the end of this function passed to JS_NewStringCopyZ, which will interpret this as Latin-1 -- needs to be the UTF-8 variant instead. ::: js/src/shell/js.cpp @@ +3465,5 @@ > return false; > } > > RootedString str(cx, args[1].toString()); > + filename.reset(JS_EncodeStringToUTF8(cx, str)); Mild preference for filename being JSAutoByteString, and this being |if (!filename.encodeUtf8(cx, str))|, following the most common idiom. ::: js/src/vm/Debugger.cpp @@ +354,5 @@ > JS_snprintf(linenoStr, sizeof(linenoStr), "%" PRIuSIZE, script->lineno()); > unsigned flags = warning ? JSREPORT_WARNING : JSREPORT_ERROR; > return JS_ReportErrorFlagsAndNumber(cx, flags, GetErrorMessage, nullptr, > JSMSG_DEBUGGEE_WOULD_RUN, > filename, linenoStr); This requires a JS_ReportErrorFlagsAndNumberUTF8 variant. ::: js/src/vm/Stack.cpp @@ +904,5 @@ > case INTERP: > case JIT: > return script()->filename(); > case WASM: > + return JS::ConstUTF8CharsZ(data_.activations_->asWasm()->module().filename()); Once the filename() function is made to return JS::ConstUTF8CharsZ, this change will be unnecessary.
Attachment #8732214 - Flags: review?(jwalden+bmo) → review-
> I can't remember, was it okay to pass in a null filename here? Because there's one case above where that can happen. I checked and it is ok.
> Could you rename |l| to |saveLocation| or something? This is unreadable and un-understandable right now. If it's ok with you, I would rather not ... I think this patch is unwieldy enough without also trying to fix up unrelated pre-existing problems.
> I can't be 100% certain without a rollup, but it appears to me that additional uses of filename() and maybeForwardedFilename() have appeared in varargs calls, within this file, since this patch and the varargs-handling patch were posted. Again, please make sure you've gotten all of them before you declare victory here. I've been building with clang, which warns when passing one of these objects through varargs.
> Alternatively, the other readType implementation is StreamOptimizationTypeInfoOp::readMatch. That function passes the location to UniqueJSONStrings::WriteProperty, which (as discussed in the previous review) interprets the string as Latin-1, so requires whatever the fix is that's used for that prior issue. By my reading, WriteProperty uses UTF-8. It hashes the string as bytes, which is fine; and sends it off to "mStringTableWriter.StringElement(aStr)" - which is a mozilla::JSONWriter, which explicitly assumes UTF-8.
I've rebased this and made a number of the requested changes -- though not all of them, I still have a to-do list. Tomorrow (probably) I'll upload a squashed patch holding all the previous patches, plus a series of smaller patches addressing the various comments. Then chances are that I'll drop this again for a while, as it's taking too much time.
> Sooooooooooooooooooooooooo. If I trace through far enough, it looks like this string is a result of |timeout->mScriptHandler->GetLocation(&filename, &lineNum, &column);| in nsGlobalWindow.cpp. That string, flowing through super-gnarly code, eventually flows into MOZ_LOG stuffs. That either prints to stderr, which we're supposing is okay. *Or*, it flows off somewhere customizable. And I *think* one target of that customization, in our tree, is profiler code. Please find someone with profiler-knowingness to verify that the profiler is going to be okay taking in UTF-8 strings, and that it won't assume Latin-1 or anything. I looked at this and I don't see the customizable code path from Logging.cpp, specifically from LogModuleManager::Print. So I think this is probably ok.
Ok, I changed my mind and, I think, have now addressed all the review comments. You should probably double-check this. I'm going to attach a roll-up patch that is basically everything that came before plus the various minor changes needed when rebasing. This may bear some re-review. I've tried to keep the individual follow-on patches small and independent. This should make review a bit simpler. When the time comes to land (crosses fingers) I will have to squash everything. I'm going to switch everything over to ReviewBoard, primarily because the "issue" system there provides a nice way to manage the to-do checklist. I think a lot of the "fast followup" patches are in this series; or were rendered obsolete by other changes that have happened since the most recent round of review.
(In reply to Tom Tromey :tromey from comment #107) > Ok, I changed my mind I am sooooooooo glad you changed your mind, 'cause the only way this is ever going to get done is if we push through all the short-term pain and get'er'done, as fast as possible. Will review everything as fast as I can.
Attachment #8798636 - Flags: review?(jwalden+bmo) → review+
Attachment #8798632 - Flags: review?(jwalden+bmo) → review+
Attachment #8798631 - Flags: review?(jwalden+bmo) → review+
Attachment #8798625 - Flags: review?(jwalden+bmo) → review+
Attachment #8798640 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798642 [details] Bug 987069 - introduce NativePathToUTF8; https://reviewboard.mozilla.org/r/84090/#review83630 ::: js/src/shell/js.cpp:640 (Diff revision 1) > > int64_t t1 = PRMJ_Now(); > RootedScript script(cx); > > { > + auto utf8Filename(NativePathToUTF8(filename)); if (!utf8Filename) { JS_ReportOutOfMemory(cx); return false; }
Attachment #8798642 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798641 [details] Bug 987069 - change nsPACMan to use GetSpec, not GetAsciiSpec; https://reviewboard.mozilla.org/r/84088/#review83634
Attachment #8798641 - Flags: review?(jwalden+bmo) → review+
Attachment #8798634 - Flags: review?(jwalden+bmo) → review+
Attachment #8798623 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798628 [details] Bug 987069 - nsScriptLoader fix; https://reviewboard.mozilla.org/r/84062/#review83642 ::: dom/base/nsScriptLoader.cpp:2091 (Diff revision 1) > > bool isScriptElement = !aRequest->IsModuleRequest() || > aRequest->AsModuleRequest()->IsTopLevel(); > aOptions->setIntroductionType(isScriptElement ? "scriptElement" > : "importedModule"); > - aOptions->setFileAndLine(JS::ConstUTF8CharsZ(aRequest->mURL.get()), aRequest->mLineNo); > + aOptions->setFileAndLine(JS::ConstUTF8CharsZ(mURL.get()), aRequest->mLineNo); Oh, wait. I suggested this, but it isn't right, because this function *fills in* options but doesn't also consume them. We're grabbing a pointer to data in |nsAutoCString mURL;|, whose lifetime is scoped to this function, into |aOptions| whose lifetime is strictly longer than that of |mURL|. Dangling pointer in options, then, with this tactic. I think we should bump nsAutoCString filename; up into the callers, immediately before the |options| variable into which the filename is placed. Then pass |filename| as an argument to this function. Without rust-style borrow-checking, we can't actually *enforce* that the string's lifetime is as long as the options. But, given all the hits are in this file -- I guess this function could be made file-static, even, if it accepted |nsScriptLoader::mDocument| as an argument, although I'm not going to require you do it -- I think that's fine enough. Better than the much-overlong lifetime currently used.
Attachment #8798628 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 8798633 [details] Bug 987069 - use encodeUtf8; https://reviewboard.mozilla.org/r/84072/#review83644 ::: js/src/shell/js.cpp:5556 (Diff revision 1) > - JSAutoByteString filename(cx, str); > - if (!filename) > + JSAutoByteString filename; > + if (!filename.encodeUtf8(cx, str)) > return false; > > while (__AFL_LOOP(1000)) { > Rooted<JSObject*> ret(cx, FileAsTypedArray(cx, filename.ptr())); I think this was since rewritten so that FileAsTypedArray just takes Handle<JSString*>, so I believe |filename| isn't even necessary anymore here. Woo!
Attachment #8798633 - Flags: review?(jwalden+bmo) → review+
Attachment #8798637 - Flags: review?(jwalden+bmo) → review+
Attachment #8798627 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798624 [details] Bug 987069 - profiler event marker request; https://reviewboard.mozilla.org/r/84054/#review84268 ::: js/src/vm/SPSProfiler.cpp:165 (Diff revision 1) > if (ProfileStringMap::Ptr entry = locked->lookup(script)) > locked->remove(entry); > } > > void > SPSProfiler::markEvent(const char* event) There are only two callers of SPSProfiler::markEvent: BaselineBailouts.cpp's InitFromBailout, and Ion.cpp's Invalidate. Both pass in buffers allocated and filled in by a format-specifier API super-nearby. If we make SPSProfiler::markEvent take |const JS::ConstUTF8CharsZ&|, then we don't need the cast inside this function, and it'll be crystal-clear in the callers (as long as the reader's aware filenames are UTF-8) that everything's kosher. Please make that change.
Attachment #8798624 - Flags: review?(jwalden+bmo) → review+
Attachment #8798626 - Flags: review?(jwalden+bmo) → review+
Attachment #8798629 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798638 [details] Bug 987069 - MemoryProfiler.cpp fix; https://reviewboard.mozilla.org/r/84082/#review84360 ::: tools/memory-profiler/MemoryProfiler.cpp:282 (Diff revision 1) > JS::RootedObject jsnames(cx, JS_NewArrayObject(cx, names.Length())); > JS::RootedObject jstraces(cx, JS_NewArrayObject(cx, traces.Length())); > JS::RootedObject jsevents(cx, JS_NewArrayObject(cx, events.Length())); > > for (size_t i = 0; i < names.Length(); i++) { > - JS::RootedString name(cx, JS_NewStringCopyZ(cx, names[i].get())); > + JS::RootedString name(cx, JS_NewStringCopyUTF8Z(cx, JS::ConstUTF8CharsZ(names[i].get()))); Mm. Reverifying this was all okay was fun fun fun. One extra little tweak. It *appeared* that void PseudoStack::push(const char *aName, js::ProfileEntry::Category aCategory, uint32_t line) which lives at http://searchfox.org/mozilla-central/rev/2142de26c16c05f23e543be4fa1a651c4d29604e/tools/profiler/public/PseudoStack.h#255 potentially fed stuff into the constituent arrays that went into creating this string. But when I tracked stuff down far enough, at least in Searchfox, it appears that that overload is entirely unused. Please remove it so future readers don't have to deal with it when reading code.
Attachment #8798638 - Flags: review?(jwalden+bmo) → review+
Attachment #8798630 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798635 [details] Bug 987069 - OptimizationTracking fix; https://reviewboard.mozilla.org/r/84076/#review84374 ::: js/src/jit/OptimizationTracking.cpp:1244 (Diff revision 1) > > - const char* filename; > + JS::ConstUTF8CharsZ filename; > Maybe<unsigned> lineno; > InterpretedFunctionFilenameAndLineNumber(fun, &filename, &lineno); > op_.readType(tracked.constructor ? "constructor" : "function", > - name, filename, lineno); > + name, filename.c_str(), lineno); I don't actually know whether the function can be called by this. But. SprintOptimizationTypeInfoOp::readType has code that is *potentially* callable in this spot. At the least, it seems to be an override of the basic virtual function declared in some base class. And inside that function, the provided |location| is handled as follows: if (location) { char buf[512]; PutEscapedString(buf, mozilla::ArrayLength(buf), location, strlen(location), '"'); if (!sp->jsprintf(",\"location\":%s", buf)) break; } PutEscapedString interprets the provided string as Latin-1. This needs to land, and it needs to not be delayed for anything that can be made to not delay it. This doesn't block landing the patch. But please file a followup bug on this, and let's fix it quickly afterward. At a quick search, I can't actually *find* the function we should be using for this -- and if that's true, we *can't* fix this before landing, not without another review of some new function to do that. Much better to review that function's code, if it's needed, in a smaller, separate patch, for sure.
Attachment #8798635 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798639 [details] Bug 987069 - ReadSPSProfilingStack fix; https://reviewboard.mozilla.org/r/84084/#review84380 ::: js/src/builtin/TestingFunctions.cpp:1669 (Diff revision 1) > > if (!JS_DefineProperty(cx, inlineFrameInfo, "kind", frameKind, propAttrs)) > return false; > > auto chars = frames[inlineFrameNo].label.release(); > - frameLabel = NewString<CanGC>(cx, reinterpret_cast<Latin1Char*>(chars), strlen(chars)); > + frameLabel = NewStringCopyUTF8Z<CanGC>(cx, JS::ConstUTF8CharsZ(chars)); Bah, this isn't right. Labels \*can\* have UTF-8 filenames in them. But, they can also WASM funcDefNames in them -- see js::wasm::Code::ensureProfilingState. WASM funcDefNames, if you look at what getFuncDefName does, are originally UTF-8 (woo\!). But the API we use there, inflates to UTF-16. That UTF-16 is then passed to JS_smprintf with the "%hs" specifier. And if we trawl through the layers far enough, we eventually hit jsprf.cpp's generic_write of |const char16_t\*|. And as the comment in generic_write says, "// FIXME: truncates characters to 8 bits". This isn't trivially fixable by just shunting in a new conversion function or anything. In theory we could add wasm code that would expose the funcDefName as UTF-8, and then we'd be home free. But that's getting somewhat afield of this bug, and we can/should punt this to a fast followup. Switch this to treat the chars as a Latin-1 string, so we're safe even if buggy. Additionally, file a bug, land the patch with an XXX comment or something by this that mentions the bug, and let's circle back to it after these patches land.
Attachment #8798639 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8798639 [details] Bug 987069 - ReadSPSProfilingStack fix; https://reviewboard.mozilla.org/r/84084/#review84400 ::: js/src/builtin/TestingFunctions.cpp:1668 (Diff revision 1) > return false; > > if (!JS_DefineProperty(cx, inlineFrameInfo, "kind", frameKind, propAttrs)) > return false; > > auto chars = frames[inlineFrameNo].label.release(); As an additional note: the release() here returns a |char*| (maybe constified). If we successfully create |frameLabel|, all is well -- the chars will be associated with that string and will die when it gets GC'd. But what if |frameLabel| isn't allocated? Then, nothing actually deallocates |chars|. So we have a leak here in case of OOM. I suggest not using |auto| here -- just use |const char*| directly, to make clear there's a raw string here to worry about. Then add, inside the |if (!frameLabel)| conditional, a |js_free(chars)| with whatever casting is needed.
Comment on attachment 8798622 [details] Bug 987069 - restore an optimization requested in an earlier review; https://reviewboard.mozilla.org/r/84050/#review84402 And, I guess next time to look at the original patch that's now +661,-442. Will try to skim that as quickly as possible, 'cause as I've said a zillion times every bit of delay of this is horrendously expensive. :-\
Attachment #8798622 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #150) > Bah, this isn't right. Labels \*can\* have UTF-8 filenames in them. But, > they can also WASM funcDefNames in them -- see > js::wasm::Code::ensureProfilingState. WASM funcDefNames, if you look at > what getFuncDefName does, are originally UTF-8 (woo\!). But the API we use > there, inflates to UTF-16. That UTF-16 is then passed to JS_smprintf with > the "%hs" specifier. And if we trawl through the layers far enough, we > eventually hit jsprf.cpp's generic_write of |const char16_t\*|. And as the > comment in generic_write says, "// FIXME: truncates characters to 8 bits". > > This isn't trivially fixable by just shunting in a new conversion function > or anything. This is addressed in bug 553032.
Comment on attachment 8798621 [details] Bug 987069 - make ScriptSource filename encoding consistent https://reviewboard.mozilla.org/r/84048/#review84408 Some of the shells seem to pass argv[i] directly into things, and then some of this patch treats that as UTF-8 when it happens to be a filename. This isn't really right, but I'm going to pretend that at least inside shells, we can just assume this stuff is UTF-8, the same way we assume stdout/stderr are fine to have UTF-8 thrown at them, and the same way we assume __FILE__ can be considered UTF-8. ::: js/src/frontend/BytecodeEmitter.cpp:2853 (Diff revision 1) > */ > if (parser->options().sourceMapURL()) { > // Warn about the replacement, but use the new one. > if (parser->ss->hasSourceMapURL()) { > if(!parser->report(ParseWarning, false, nullptr, JSMSG_ALREADY_HAS_PRAGMA, > - parser->ss->filename(), "//# sourceMappingURL")) > + parser->ss->filename().c_str(), "//# sourceMappingURL")) This is safe, but it isn't correct. If I dig through the levels, the varargs here go through the ExpandErrorArgumentsVA call in TokenStream::reportCompileErrorNumberVA, passing ArgumentsAreLatin1 to indicate a non-fact. Because this is safe, it's landable as-is -- but file the followup to fix this. Off the top of my head, I don't believe there *is* a UTF-8-aware way to warn, right now, so fixing this would require hacking together some more stuff -- stuff definitely not worth delaying this for. ::: js/src/jsapi-tests/testUbiNode.cpp:631 (Diff revision 1) > } > END_TEST(test_JS_ubi_DominatorTree) > > BEGIN_TEST(test_JS_ubi_Node_scriptFilename) > { > + // whee\u308\u2708\u1009d If it's necessary to spell this out manually (see other comment), use "whee\u0308\u2708\u{1009d}". What you have now isn't valid JS, and it's not copy-pastable into a shell or anything like that. ::: js/src/jsapi-tests/testUbiNode.cpp:633 (Diff revision 1) > > BEGIN_TEST(test_JS_ubi_Node_scriptFilename) > { > + // whee\u308\u2708\u1009d > + const unsigned char uname[] = {0x77, 0x68, 0x65, 0x65, 0xcc, 0x88, 0xe2, > + 0x9c, 0x88, 0xf0, 0x90, 0x82, 0x9d, 0x0}; You may be able to use u8"..." for this now. Give it a try, and if it works, we definitely should use that. ::: js/src/jsobj.cpp:3659 (Diff revision 1) > } > } > fprintf(fp, "%s", sprinter.string()); > #ifdef XP_WIN32 > if (IsDebuggerPresent()) > + // JS_GetScriptFilename returns UTF-8. But, even This |if| needs braces, now that its body (including comment) spans multiple lines. ::: js/src/jsopcode.cpp:681 (Diff revision 1) > > if (showAll) { > if (!parser.parse()) > return false; > > - if (!sp->jsprintf("%s:%u\n", script->filename(), unsigned(script->lineno()))) > + if (!sp->jsprintf("%s:%u\n", script->filename().c_str(), unsigned(script->lineno()))) DisassembleAtPC is called by js::Disassemble, which is called by shell/js.cpp's DisassembleScript, which is called by shell/js.cpp's DisassembleToSprinter, which is called by shell/js.cpp's DisassembleToString, which passes the sprinter's output to JS_NewStringCopyZ. Please fix that to expect UTF-8. ::: js/src/jsscript.cpp:4267 (Diff revision 1) > > MOZ_ASSERT(size > 0); > return size; > } > > -const char* > +const JS::ConstUTF8CharsZ The |const| on the return type isn't really necessary, but it doesn't hurt, either. ::: js/src/shell/OSObject.cpp:125 (Diff revision 1) > if (!scriptFilename.get()) > return nullptr; > > - if (strcmp(scriptFilename.get(), "-e") == 0 || strcmp(scriptFilename.get(), "typein") == 0) > + if (strcmp(scriptFilename.get().c_str(), "-e") == 0 || > + strcmp(scriptFilename.get().c_str(), "typein") == 0) > resolveMode = RootRelative; Should put braces around this, now that the condition spans multiple lines: if (... || ...) { resolveMode = RootRelative; } ::: js/src/shell/js.cpp:2862 (Diff revision 1) > return false; > } > > - FILE* file = fopen(script->filename(), "r"); > + FILE* file = fopen(script->filename().c_str(), "r"); > if (!file) { > /* FIXME: script->filename() should become UTF-8 (bug 987069). */ This comment should be removed. (The function called just beneath here doesn't treat the argument as UTF-8, yet, but we already have a warning comment about that, inside that function. And it's odd having a comment here, but not having one 18 lines further down where we *do* assume this is UTF-8.) ::: js/src/shell/js.cpp:5238 (Diff revision 1) > return false; > > if (sp.hadOutOfMemory()) > return false; > > RootedString str(cx, JS_NewStringCopyZ(cx, sp.string())); The string here is UTF-8, both because of the UTF-8 filename splatted in within this patch at line 5188, but also because the InterpretedFunctionFilenameAndLineNumber function stores UTF-8 in its outparams, and if I track that far enough back, IonTrackedOptimizationsTypeInfo::ForEachOpAdapter::readType suggests that the UTF-8 it puts out can make its way back to this location. ::: js/src/vm/CodeCoverage.cpp:111 (Diff revision 1) > } > > bool > LCovSource::writeSourceFilename(ScriptSourceObject* sso) > { > - outSF_.printf("SF:%s\n", sso->source()->filename()); > + outSF_.printf("SF:%s\n", sso->source()->filename().c_str()); |outSF_| is an LSprinter whose contents are exposed through LCovSource::exportInto. That function is called by LCovCompartment::exportInto, which exports into an argument provided by js::GetCodeCoverageSummary. That function is called by TestingFunctions.cpp's GetLcovInfo, and the string it returns is processed by JS_NewStringCopyZ. So I think the final string conversion in GetLcovInfo needs to be the UTF-8 variant. ::: js/src/vm/Debugger.cpp:356 (Diff revision 1) > - const char* filename = script->filename() ? script->filename() : "(none)"; > + const char* filename = script->filename() ? script->filename().c_str() : "(none)"; > char linenoStr[15]; > SprintfLiteral(linenoStr, "%" PRIuSIZE, script->lineno()); > unsigned flags = warning ? JSREPORT_WARNING : JSREPORT_ERROR; > // FIXME: filename should be UTF-8 (bug 987069). > return JS_ReportErrorFlagsAndNumberLatin1(cx, flags, GetErrorMessage, nullptr, We have a UTF-8 error-reporting function now. Now that we're definitely passing UTF-8, we just need to call it. So, remove that comment, and s/Latin1/UTF8/. ::: js/src/vm/SPSProfiler.h:198 (Diff revision 1) > > jsbytecode* ipToPC(JSScript* script, size_t ip) { return nullptr; } > > void setProfilingStack(ProfileEntry* stack, uint32_t* size, uint32_t max); > + // Note that an event marker is UTF-8 encoded. > void setEventMarker(void (*fn)(const char*)); Maybe "...that the string passed to an event marker is UTF-8."? ::: js/src/vm/Stack.cpp:860 (Diff revision 1) > break; > case INTERP: > case JIT: > return script()->filename(); > case WASM: > - return data_.wasmFrames_.filename(); > + return JS::ConstUTF8CharsZ(data_.wasmFrames_.filename()); File a followup bug to use a type that encompasses *both* single ownership *and* UTF-8 encoding, please? This is obviously not fixable in this bug, even more than the other followup-y nits I've mentioned. But we should get it on record somewhere, and perhaps to start the ball rolling on reasons to add such a type. ::: js/src/vm/TraceLogging.cpp:467 (Diff revision 1) > > TraceLoggerEventPayload* > TraceLoggerThread::getOrCreateEventPayload(TraceLoggerTextId type, JSScript* script) > { > - return getOrCreateEventPayload(type, script->filename(), script->lineno(), script->column(), > - nullptr); > + return getOrCreateEventPayload(type, script->filename().c_str(), script->lineno(), > + script->column(), nullptr); I feel like I looked at this once and concluded it was okay, but looking now I'm not sure it won't mangle UTF-8 into mojibake. Specifically, the filename passed here will get interpolated into a string, that'll be passed to TraceLoggerGraph::addTextId. The string there will be passed to js::FileEscapedString, and that function looks like it implicitly interprets incoming text that's byte-sized (if it gets char16_t it'll do other stuff) as Latin-1. This is safe, so it's good enough for now. But please file a followup to make the tracelogger pipeline properly UTF-8-safe. ::: js/xpconnect/src/Sandbox.cpp:1868 (Diff revision 1) > mozilla::dom::AutoEntryScript aes(priv, "XPConnect sandbox evaluation"); > JSContext* sandcx = aes.cx(); > JSAutoCompartment ac(sandcx, sandbox); > > JS::CompileOptions options(sandcx); > - options.setFileAndLine(filenameBuf.get(), lineNo) > + options.setFileAndLine(JS::ConstUTF8CharsZ(filenameBuf.get()), lineNo) On second read, I'm no longer quite sure my analysis the first time of this was correct. On the other hand, for lockfiles and weird goopy config stuff, I think we can tolerate a mistake if there were one -- certainly in exchange for the rest of this landing.
Attachment #8798621 - Flags: review+
If you need anything -- anything at all -- to land any of this, from me, tell me ASAP/immediately/P0/etc. Fast reviews of changes within the last two weeks, tree-watching, anything at all. As far as I'm concerned, there's basically nothing that's more important than getting this stuff landed and locked in the tree and no longer residing in a monster patch stack that, so long as it hasn't landed, represented multiple weeks or months of engineering time gone to waste. Any inconvenient sacrifices now are still vastly cheaper than delaying longer and having to do another massive round of reviewing all over again.
Blocks: 1316679
I haven't worked on this in a while. I wonder if this approach is still needed, now that bug 1380617 has landed. When I get back to this, I'll check that.
Bug 1380617 obviously made it so that punycode is displayed everywhere. I.e. you now see "http://xn--l-0ga.de/" instead of "http://öl.de/", which is a bit better, I guess, but still suboptimal, because web developers would expect to rather see http://öl.de/. But, the great thing now is that it's now fairly easy to display the http://öl.de/. It just requires to convert the punycode into Unicode. Sebastian
(In reply to Sebastian Zartner [:sebo] from comment #157) > Bug 1380617 obviously made it so that punycode is displayed everywhere. I.e. > you now see "http://xn--l-0ga.de/" instead of "http://öl.de/", which is a > bit better, I guess, but still suboptimal, because web developers would > expect to rather see http://öl.de/. > > But, the great thing now is that it's now fairly easy to display the > http://öl.de/. It just requires to convert the punycode into Unicode. > > Sebastian I filed two bugs(bug 1448553 , Bug 1450292) for decoding Punycode and URI-encoded pathname in devtool UI. I think this bug is actually defunct and we can safely close it now. The remaining work of decoding Punycode to Unicode is mainly in the front-end side.
Flags: needinfo?(sebastianzartner)
(In reply to Zhang Junzhi from comment #158) > (In reply to Sebastian Zartner [:sebo] from comment #157) > > Bug 1380617 obviously made it so that punycode is displayed everywhere. I.e. > > you now see "http://xn--l-0ga.de/" instead of "http://öl.de/", which is a > > bit better, I guess, but still suboptimal, because web developers would > > expect to rather see http://öl.de/. > > > > But, the great thing now is that it's now fairly easy to display the > > http://öl.de/. It just requires to convert the punycode into Unicode. > > > > Sebastian > > I filed two bugs(bug 1448553 , Bug 1450292) for decoding Punycode and > URI-encoded pathname in devtool UI. > > I think this bug is actually defunct and we can safely close it now. The > remaining work of decoding Punycode to Unicode is mainly in the front-end > side. Thank you for filing the other two bugs. It's ok for me to close this one and do the decoding in the DevTools code. Sebastian
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(sebastianzartner)
Resolution: --- → WONTFIX
(In reply to Sebastian Zartner [:sebo] from comment #159) > Thank you for filing the other two bugs. It's ok for me to close this one > and do the decoding in the DevTools code. Having said that, I wonder what to do with the blocked bugs. Tom, can you have a look at them and check what to do with them? Sebastian
Flags: needinfo?(ttromey)

(In reply to Sebastian Zartner [:sebo] from comment #160)

(In reply to Sebastian Zartner [:sebo] from comment #159)

Thank you for filing the other two bugs. It's ok for me to close this one
and do the decoding in the DevTools code.

Having said that, I wonder what to do with the blocked bugs. Tom, can you
have a look at them and check what to do with them?

Long delay on this one but I finally went through them.

Flags: needinfo?(ttromey)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: