Closed Bug 97173 Opened 23 years ago Closed 23 years ago

startup perf- remove the need of loading of unixcharset.properties files at startup time to speed up

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla0.9.8

People

(Reporter: ftang, Assigned: shanjian)

References

Details

(Keywords: perf)

Attachments

(1 file, 3 obsolete files)

we currently load unixcharset.properties at startup time, we should find a way 
to 
remove the need of loading it at startup time to speed up startup performance
we could 
1. build in common used pair in the cpp code, and/or
2. cache the resolved result in registry.
Blocks: 7251
Keywords: perf
Blocks: 71781
No longer blocks: 7251
mark m0.9.6 and give to jbetak
Assignee: ftang → jbetak
Target Milestone: --- → mozilla0.9.6
Blocks: 95823
Blocks: 103179
Juraj: would you kindly tell us how much time will be save by doing this?
accepting.

bstell: my measurements using the built-in NS_TIMELINE timer dp recommended to 
us show that this would save ~16 ms, which ~0.3% of my local startup time.
Status: NEW → ASSIGNED
Attached patch proposed patch v1 (obsolete) (deleted) β€” β€” Splinter Review
bstell:

could you kindly look over the proposed changes? It's a rather lame 
optimization, but I couldn't think of anything better. It would save us an 
unnecessary disk access on a platform Netscape cares most about. I've noticed 
that you worked in this area and obsoleted the unixcharset.properties string 
bundle, which covers most of the work needed for this bug. Since you have 
domain knowledge in this particular area, I'm basically looking for a statement 
indicating whether you find the proposed change acceptable? 
Frank suggested we have a hard coded mapping list for standare charset names 
(eg: iso8859-1, SJIS, ...) and only load the properties file we have a charset 
that is not in that list.
bstell - thanks for looking at this. I agree and understand. However, I believe 
you (or erik?) have obsoleted the .properties file on most platforms - 
certainly on Linux - and it is in fact not being loaded. 

There is some new logic in place where the default charset value is derived 
from the results received from couple of system API calls. My assumption was 
that this is performant enough (it's below the timer resolution) and does not 
need to be circumvented by a hard-coded list. The only measurable performance 
impact is caused by an attempt to load an non-existent .properties file. For 
what I've understood the string bundle is not required and not packaged on most 
*nix platforms. I could extend the existing code by a hard-coded list, but I'd 
not like to. It's less than transparent as it is and I'm not sure whether that 
would be the most intelligent way to address this.
it is being used
I did not see it being used in my debug Linux build. Instead Mozilla tried to 
load unixcharset.Linux.properties. Also when 
is "ConvertLocaleToCharsetUsingDeprecatedConfig" being used? Do we need to 
worry about it at all?
>+    if (OSARCH!="Linux")
no, we should not have this kind of code.
I'll profile this on my home machine again - on my slow Suse 7.1 machine at 
work the only file access was to a non-existent unixcharset.Linux.properties 
which consumed 15 ms to return a failure and no further file access was 
necessary. 

Bstell pointed out that this might not be the case for Linux versions, 
certainly not some older ones. However, if requesting a non-existed file takes 
less than 0.1% of startup time on Netscape supported Redhat versions, we should 
probably push this out and come up with a more comprehensive solution as 
discussed offline with bstell.
moving to M097 - new profiling data on another machine has shown that an 
attempt to load an non-existent string bundle carries an I/O penalty of ~15 ms 
which is ~0.3% if our startup time. Since it's below the 0.5% threshold it's 
not as critical, but we should still fix it.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
jbetak's contract is up. Bulk move bugs to ftang
Assignee: jbetak → ftang
Status: ASSIGNED → NEW
Linux code. Give to shanjian. move to m0.9.8
Assignee: ftang → shanjian
Target Milestone: mozilla0.9.7 → mozilla0.9.8
brian,
On Linux (and hopefully most other unix system), could we try to use 
the result returned by nl_langinfo(CODESET) before loading platform 
specific property file? I believe file "unixcharset.OSARCH.properties"
is suppose to provide charset for unusual vendor specific charset 
names. We don't really need to overload some names. So it should be 
safe to change the trying order. 
Status: NEW → ASSIGNED
The file is there to provide a point to correct wrong values *from*
nl_langinfo(CODESET).

Reversing the order of loading would defeat this.

However, I do not know of any case where this is needed 
so the result of the change you propose is presently unknown.

No one can say either "safe" or "unsafe".

If I wanted to be safe I would add a pref to unix.js that disables the
loading of the file and wait to see if anyone complains.

If I was feeling brave I would just remove the code and wait to see if '
anyone complains.
Attached patch proposed patch (obsolete) (deleted) β€” β€” Splinter Review
Attachment #54738 - Attachment is obsolete: true
I rearrange the calling sequence. Now we try to use charset directly before 
anything else. I kept platform specific property file there to help resolve 
unknow charset name. 

Brian,
I was feeling very brave and made this patch. If there is any complain after 
this change, we can always reconsider it in future. Could you review?
>The file is there to provide a point to correct wrong values *from*
>nl_langinfo(CODESET).
>
>Reversing the order of loading would defeat this.
>
>However, I do not know of any case where this is needed 

ok. so... here is what I understand:
1. we add the loading of that file because we afraid that we may need to correct
the return value of nl_langinfo(CODESET)
2. we don't know any one do such wrong thing yet.

So... my suggestion is we remove such procedure for now untill we find out we
need it. We should not take performance hit for something we are not sure we
need. We should only consider take such hit when we are sure that we NEED it. 
frank, 
I believe that there is 2 purpose for the existence of that property file:
1) if the return value of nl_langinfo(CODESET) is incorrect, we can override it.
2) if the return value of nl_langinfo(CODESET) is not found in charset alias 
   file, we can put it here.

To achieve purpose 1, we have to load and check this property file first. For 
above mentioned reason, we can call it after we failed to use it directly. That 
way we still can achieve 2nd goal. 
2) probably won't cause startup performance 
1) will and we should consider remove 1)
frank, could you put r= there if you don't have objection to my approach?
r=bstell@ix.netcom.com
Brendan, could you sr?
Comment on attachment 61784 [details] [diff] [review]
proposed patch

>@@ -60,6 +60,7 @@
> #include <langinfo.h>
> #endif
> #include "nsPlatformCharset.h"
>+#include "nsAutoLock.h"
> 
> NS_IMPL_ISUPPORTS1(nsPlatformCharset, nsIPlatformCharset);

If this XPCOM object is referenced by more than one thread, as the following
suggests, you need NS_IMPL_THREADSAFE_... here.

>@@ -67,22 +68,30 @@
> static nsURLProperties *gInfo_deprecated = nsnull;
> static PRInt32 gCnt=0;
> 
>+//this lock is for protecting above static variable operation
>+static PRLock  *gLock = nsnull;
>+
> nsPlatformCharset::nsPlatformCharset()
> {
>   NS_INIT_REFCNT();
>   PR_AtomicIncrement(&gCnt);
>+  if (!gLock)
>+    gLock = PR_NewLock();

This is not threadsafe, and could result in two locks being created (a leak,
and possibly a further thread-safety hazard, if one thread runs for a while
with the first lock cached in an nsAutoLock, while the second thread creates,
stores, and uses the final or "race-winning" lock).

Consider using PR_CallOnce to initialize this lock, if you can't initialize the
lock in a function or method that you are certain runs on the main thread
before any other thread could construct an nsPlatformCharset.

I'll review a new patch that fixes these bugs.

/be
Attachment #61784 - Flags: needs-work+
Will this object reference from different thread ?
>Will this object reference from different thread ?
if not, we can remove those locking code. How can we tell ?
Follow the uses of nsIPlatformCharset via lxr.  The interface is not scriptable,
so the only code that could call into nsPlatformCharset's implementation is C++
code.  A quick glance at the lxr made me worry that some Necko thread uses uconv
or something else that uses nsIPlatformCharset.  Cc'ing darin.

A little thread-safety violation can be a big problem, so unless you're willing
to assert (NS_ASSERTION) that the implementation runs only on the main thread,
and you can show good arguments for keeping it that way, and doc-comments that
warn potential users about the limitation, I think the part of this bug's
patches that tries to make nsPlatformCharset thread-safe is moving in the right
direction -- but we're not safe yet.

/be
Attached patch updated patch for thread safety issue (obsolete) (deleted) β€” β€” Splinter Review
Attachment #61784 - Attachment is obsolete: true
brendan, could you sr?
fyi, bug 99382 is for all thread safety issue in charset conversion. I just
intend to make this file thread safe while I was touching it. A complete
solution to the whole module will happen in 99382. 
Comment on attachment 63425 [details] [diff] [review]
updated patch for thread safety issue

sr=brendan@mozilla.org

Nit: the if(nsnull == gInfo_deprecated) silly style of comparing a pointer to
nsnull, with nsnull on the left, is not observed in much of the code visible in
the patch, so why preserve it in your changes?	See  'if (gNLInfo)' and
'if(gInfo_deprecated && locale.Length())' for alternatives.

Those remind me: how about consistent 'if(' or 'if (' style?  And don't use
.Length(), use .IsEmpty().

/be
Attachment #63425 - Flags: superreview+
loading file://some/directory can lead to charset conversion happening on one of
the background file transport threads.  cc'ing bbaetz as he would know best what
requirements the directory index stream has on charset conversion.
Attached patch the final patch used for checkin (deleted) β€” β€” Splinter Review
fix some nits pointed out by brendan.
Attachment #63425 - Attachment is obsolete: true
Attachment #64065 - Flags: superreview+
Attachment #64065 - Flags: review+
fix checked in. 
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
darin: It could, on some platforms (unix, but not windows - not sure about mac).
That code isn't enabled, but it may be shortly - I need to rework that stuff
soonish.
Changing QA contact to ftang for now.  Frank, can this be verified within
development or can you provide a test case for QA? 

CCing teruko on this bug.
QA Contact: andreasb → ftang
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: