Closed Bug 1348591 Opened 8 years ago Closed 7 years ago

BufferList overhead is too high

Categories

(Core :: IPC, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Performance Impact high
Tracking Status
firefox56 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: hchang)

References

Details

Attachments

(3 files)

I was looking at an Instruments profile for imgLoader::Load where there Pickle::WriteBytes is taking more than 30% of the time. It seems that in many cases, if you consider the memmove cost to be the "meat" of the WriteBytes, there is a lot of overhead in the rest of it. It seems like in bug 1262671 the trade-off was between using buffers of unequal size to save memory vs buffers of equal size with something like nsSegmentedBuffer. I'm not really sure I understand all of the details here. Bill, what do you think? One thing that *really* hurts us here is the malloc that is in the way here. We should do whatever we can to not need to malloc in most cases. On desktop, it seems like we should be able to afford doing that. Any other improvements we can make to the code you can think of? BTW, now we have telemetry on this, so if we for example eliminate the malloc, we can see the impact on telemetry.
Flags: needinfo?(wmccloskey)
Attached image What's hot in the assembly view (deleted) —
Attached image Instruments Profile (deleted) —
You might want to turn off sentinel checking when you measure this: http://searchfox.org/mozilla-central/rev/c9f5cc6b4593d461e87a6221dc0286b3859fd515/ipc/chromium/src/base/pickle.h#27 It increases the amount of work we do in this code. Looking at the code for BufferList::WriteBytes, I don't see a lot of fat to trim. We could update mSize for the BufferList all at once instead of inside the loop. That would save a couple additions, which isn't much. Maybe the updates to both |copy| and |remaining| are redundant, but I kinda figured the compiler could deal with those. Maybe it can't. I haven't looked closely at the assembly. We can't really eliminate the malloc. That's pretty much what this function does: allocate space and copy data into it. We could allocate in bigger chunks, but that means we'll waste more space. I never did much profiling to find a good chunk size. I think I used whatever we had used before. I doubt SegmentedVector would be more efficient here. It doesn't actually have a method to do what WriteBytes does. If we added one, it would probably do what WriteBytes does.
Flags: needinfo?(wmccloskey)
If we know the approximate size of specific messages, we could use a size that's more appropriate for them. In some cases that could be based on the types being sent (since some are fixed size). I'm not sure about the case in your profile though.
I don't know how much it will help, but maybe you could fix mStandardCapacity as a template argument (it looks like we use 4096 everywhere, except for a borrow method where it is 0 but presumably unused), then split out the last iteration of the loop that has to copy a partial segment, and maybe the compiler can inline the memcpy, which might help. It doesn't seem to have inlined the memcpy in the screenshot Ehsan posted. Maybe just fixing the segment size would be enough for it to decide to do something fancy with the memcpy.
But I guess having a larger chunk size for larger messages is probably a bigger win. The spread of message size is really large...
(In reply to Bill McCloskey (:billm) from comment #3) > You might want to turn off sentinel checking when you measure this: > http://searchfox.org/mozilla-central/rev/ > c9f5cc6b4593d461e87a6221dc0286b3859fd515/ipc/chromium/src/base/pickle.h#27 > It increases the amount of work we do in this code. Ugh, can we turn this off in non-debug builds? We are going to get telemetry on these serialization times, having this turned on will make our numbers misleading. > Looking at the code for BufferList::WriteBytes, I don't see a lot of fat to > trim. We could update mSize for the BufferList all at once instead of inside > the loop. That would save a couple additions, which isn't much. Maybe the > updates to both |copy| and |remaining| are redundant, but I kinda figured > the compiler could deal with those. Maybe it can't. I haven't looked closely > at the assembly. > > We can't really eliminate the malloc. That's pretty much what this function > does: allocate space and copy data into it. We could allocate in bigger > chunks, but that means we'll waste more space. I never did much profiling to > find a good chunk size. I think I used whatever we had used before. > > I doubt SegmentedVector would be more efficient here. It doesn't actually > have a method to do what WriteBytes does. If we added one, it would probably > do what WriteBytes does. Well, my suggestion was having a simple segmented vector with a larger buffer size. We have telemetry on our IPC message sizes that we can use to pick a smart size. Does that sound doable?
I think the IPC message size telemetry expired, or is about to expire, so you may need to bump the date on that.
(In reply to :Ehsan Akhgari (busy) from comment #7) > (In reply to Bill McCloskey (:billm) from comment #3) > > You might want to turn off sentinel checking when you measure this: > > http://searchfox.org/mozilla-central/rev/ > > c9f5cc6b4593d461e87a6221dc0286b3859fd515/ipc/chromium/src/base/pickle.h#27 > > It increases the amount of work we do in this code. > > Ugh, can we turn this off in non-debug builds? We are going to get > telemetry on these serialization times, having this turned on will make our > numbers misleading. I think it does actually catch bugs. We can turn it off for a little while, but it's a valuable assertion. Having it on DEBUG-only isn't very useful. Its purpose is to make weird IPC-related bugs easier to understand. > Well, my suggestion was having a simple segmented vector with a larger > buffer size. We have telemetry on our IPC message sizes that we can use to > pick a smart size. Does that sound doable? We can consider increasing the default segment size. The IPC_MESSAGE_SIZE histogram did expire in 55, but the data from 54 is still around and probably still valid. A bigger problem is that we only count a message in this histogram if the payload is >= 8K. It might be better to add a new histogram (probably one that wouldn't be keyed). I filed bug 1353159 for this.
Whiteboard: [qf]
Whiteboard: [qf] → [qf:investigate:p1]
Depends on: 1353159
Assignee: nobody → hchang
I cannot see any data from https://telemetry.mozilla.org/ so I did a very simple local short-term profiling and the result is at [1]. PContent::Msg_AccumulateChildKeyedHistograms PHttpChannel::Msg_OnTransportAndData PLayerTransaction::Msg_Update PBrowser::Msg_AsyncMessage PHttpChannel::Msg_OnStartRequest seem to be the hotspots. Will consult with people who are more familiar with telemetry stuff. [1] https://docs.google.com/spreadsheets/d/1JiJLPxSpChC9GbyivFDLqHbKNB4QDTzdAmXUflx0YdU/edit?usp=sharing
Henry, please recheck telemetry data after bug 1364556 has been fixed.
I further analyzed the local profiling data and put it in [1]. In short, 95% of the IPC messages have size less than 4096 and 98% less than 8192. By message: 4096 8192 12288 16384 20480 24576 28672 ----------------------------------------------------------------------------------------------------------- PContent::Msg_StoreAndBroadcastBlobURLRegistration 0 100 PContent::Msg_SetXPCOMProcessAttributes 0 0 0 0 0 0 0 PHttpChannel::Msg_OnStartRequest 36 100 PContent::Msg_AccumulateChildKeyedHistograms 56 67 74 80 89 93 95 PContent::Msg_AccumulateChildHistograms 65 67 68 68 69 70 70 PHttpChannel::Msg_OnTransportAndData 66 90 94 95 98 99 99 PHttpChannel::Msg_Redirect1Begin 68 100 PLayerTransaction::Msg_Update 76 94 97 99 99 99 99 PNecko::Msg_PHttpChannelConstructor 79 96 98 99 99 99 99 PWyciwygChannel::Msg_WriteToCacheEntry 81 92 96 100 PWyciwygChannel::Msg_SetSecurityInfo 82 100 PMessagePort::Msg_PostMessages 83 83 100 PMessagePort::Msg_ReceiveData 83 83 100 PContent::Msg_BroadcastLocalStorageChange 93 100 PBackgroundIDBCursor::Msg_Response 94 94 94 94 94 94 94 PContent::Msg_DispatchLocalStorageChange 94 100 PBrowser::Msg_AsyncMessage 96 97 99 99 99 99 99 If we set the pre-allocated buffer size to 8192: 33% of PContent::Msg_AccumulateChildKeyedHistograms 33% PContent::Msg_AccumulateChildHistograms 10% PHttpChannel::Msg_OnTransportAndData 6% PLayerTransaction::Msg_Update 5% PBackgroundIDBCursor::Msg_Response will require memory allocation in BufferList::Write(). So, we may either 1) Set 8192 to the pre-allocated buffer size 2) Set 12288 to the pre-allocated buffer size 3) Set 8192 as the pre-allocated size and specialize a size for Msg_OnTransportAndData and PLayerTransaction::Msg_Update since they seem to popular enough to have a custom size. [1] https://docs.google.com/spreadsheets/d/1qYpO4vBL2cPHWvHLZhjtAJR_HRGKeQNYSVmgzRUO9PE/edit#gid=1353915496
Hi Eshan, According to comment 12, would you like me to change the default preallocated size to 8192 first or we should still wait for the actual telemetry data? Thanks!
Flags: needinfo?(ehsan)
Hmm, I think I would probably get the telemetry data since I think getting a good breadth over all messages that we should care about may be difficult with some local testing, but I'm curious to know what Bill thinks here (and since he knows a lot more about IPC than I do, his opinion should probably override mine here!). If we decide to just use your local testing, option 3 in comment 12 looks really good to me based on the data at hand! Thanks a lot for digging through this so far. :-)
Flags: needinfo?(ehsan)
The telemetry I added in bug 1353159 in comparison with my local analysis lacks the global distribution and the per-message distribution. I am not so sure if they would become important in the end. I might firstly work on being able to have custom preallocated buffer size given that some messages do have outstanding message size.
There seem to be the following approaches to have custom preallocation size based on message type: 1) Templatize IPC::Message by "message id". (IPC::Message inherits Pickle, which owns the BufferList.) This requires the change of IPDL codegen to generate code like return new IPC::Message<PAPZCTreeManager:Msg_UpdateZoomConstraints__ID>(routingId, Msg_UpdateZoomConstraints__ID, ... If we don't want to increase the code size too much, we can have default template parameter for those message types which don't need to be customized. 2) Just dynamically determine the default preallocation size in like http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/ipc/chromium/src/base/pickle.cc#140 One thing I wanted to point out is in which layer we should specify the custom message size. If we do not do it in the "*.ipdl", we have to hardcode the generated message id. I don't know if this would be an issue. Hi Bill, may I know your feedback about comment 12 and the approaches to have custom preallocation size? Thanks!
Flags: needinfo?(wmccloskey)
I'm now wishing that we had counted every message in bug 1353159, not just the ones over 4K. I should have realized that it makes the histogram useless for deciding whether 4K is a good size. The main thing I noticed from comment 12 is: > In short, 95% of the IPC messages have size less than 4096 and > 98% less than 8192. That suggests to me that we should not increase the default size at all. If we did, then the space would be wasted in 95% of cases and we would avoid an allocation in 3% of cases. That doesn't seem like a good trade-off at all! However, using custom message sizes for certain important messages seems fine to me. I think option (2) from comment 16 is the right way to do it. I would recommend having a .ini file to store the sizes in, sort of like how we have sync-messages.ini.
Flags: needinfo?(wmccloskey)
I think we have enough data so I'm changing this to qf:p1
Whiteboard: [qf:investigate:p1] → [qf:p1]
Telemetry data is out: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=1&end_date=2017-05-21&keys=__none__!__none__!__none__&max_channel_version=nightly%252F55&measure=IPC_MESSAGE_SIZE2&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-05-10&table=0&trim=1&use_submission_date=0 but it only has message sizes >= 4096 bytes, Assume the Prob(size < 4096) is 95% (according to comment 12%), then the normalized cumulative distribution should be P'(size < X) = 95% + P(size < X) * 5%, where P(x) is the probability from the telemetry and P'(x) is the normalized one. 95% < 4k 97.3% < 9.43k 97.7% < 12.99k 98.9% < 17.92k
Ping? Has there been any activity here?
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #20) > Ping? Has there been any activity here? Sorry I am working on bug 1355746 and bug 1349255 lately. This is the next one in my queue. Should I escalate its priority over others? BTW, I am going to take the *.ini* approach which mentioned in comment 17, which means I will have to modify the ipdl parser but it shouldn't be a big deal. Thanks :)
(In reply to Henry Chang [:hchang] from comment #21) > (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from > comment #20) > > Ping? Has there been any activity here? > > Sorry I am working on bug 1355746 and bug 1349255 lately. This is > the next one in my queue. Should I escalate its priority over others? Nope, just checking. Thanks a lot! :-)
I have almost done Bug 1355746 and hope to discuss bug 1349255 with Masayuki san in person. So, I will be working on this one lately and probably be able to get reviewed and landed in the work week.
Hi Bill, I have followed your suggestion and had a WIP patch designed as the following: 1) message-sizes.ini to descibe messages and their sizes which we'd like to have custom initial BufferList capacity for serialization. The .ini looks like [PContent::AccumulateChildKeyedHistograms] size = 16384 [PContent::AccumulateChildHistograms] size = 16384 [PLayerTransaction::Update] size = 8192 2) Modify ipdl.py, lower.py and etc to generate a .inc file like the follwing based on message-sizes.ini { {mozilla::dom::PContent::AccumulateChildKeyedHistograms, 16384} }, { {mozilla::dom::PContent::AccumulateChildKeyedHistograms, 16384} }, { {mozilla::layers::PLayerTransaction, 8192} }, 3) Modify ipc_message.h/cc and pickle.h/cc to use the custom default chunkc size based on the message id (according to the generated code) Do you think if my above approach on the right track? For example, would you prefer generating a header file instead of just a section of code? Would you like me to figure out how to generate code to list all header files which are required to include rather than including them manually? Thanks :)
Flags: needinfo?(wmccloskey)
Hi Henry, I would rather do this differently. Right now we generate a constructor function for each IPC message type. You can see some examples here: https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/PContent.cpp#49 I think the right way to do this is to change the code in lower.py that generates these constructors so that it passes in the standard segment capacity to the Message constructor. The Python code to generate the constructors is here: http://searchfox.org/mozilla-central/rev/1b7b1ec949497e9fc2a9b9dfaccbe894ee2ad5ef/ipc/ipdl/ipdl/lower.py#1695-1736 This code gets called from here: http://searchfox.org/mozilla-central/rev/1b7b1ec949497e9fc2a9b9dfaccbe894ee2ad5ef/ipc/ipdl/ipdl/lower.py#1580-1596 I think this is the place you'll need to modify. You should be able to look up the message in a dictionary that's from from the .ini file to get the default size. Then you can pass it in as an argument.
Flags: needinfo?(wmccloskey)
Hi Bill, I believe I have followed your instruction in comment 27 to have a new patch pushed to reviewboard. What I am only uncertain is where/how to use the default segment size (which is 4096.) My current setup is to use '0' as an indicator to use the default segment size so we don't have to specify the actual default segment size in the codegen level. Instead, we define the default segment size in ipc_message.cc. Besides, regarding the ini file, I name it to message-metadata.ini so that we can put any kind of metadata in it in the future when needed. Do you think if it's a good idea? Thanks :)
Flags: needinfo?(wmccloskey)
I submitted version 4 for: 1) More custom segment capacities based on https://docs.google.com/spreadsheets/d/1qYpO4vBL2cPHWvHLZhjtAJR_HRGKeQNYSVmgzRUO9PE/edit#gid=1353915496 2) Added typo-proof check in ipdl.py. It works by enumerating all [protocol_name:message_name] from all ipdl files and compare if message-metadata.ini contains any message not in the enumerated ones. 3) Use default segment capacity for all "reply" messages. The ini file doesn't specify the capacity is for send or reply. In version 3, both way would use the same capacity. However, considering the reply message is usually very trivial and we are removing sync messages, using default capacity for serializing reply messages should be fine :)
Attachment #8879882 - Flags: review?(wmccloskey)
Comment on attachment 8879882 [details] Bug 1348591 - Support custom default segment buffer list size. https://reviewboard.mozilla.org/r/151288/#review157556 ::: ipc/chromium/src/base/pickle.cc:130 (Diff revision 2) > } > > // Payload is sizeof(Pickle::memberAlignmentType) aligned. > > -Pickle::Pickle(uint32_t header_size) > - : buffers_(AlignInt(header_size), kHeaderSegmentCapacity, kDefaultSegmentCapacity), > +Pickle::Pickle(uint32_t header_size, size_t default_segment_capacity) > + : buffers_(AlignInt(header_size), kHeaderSegmentCapacity, default_segment_capacity), I think it would be better to use the default_segment_capacity for the second argument here as well (initial capacity). I don't see any reason to make the first segment be small if we expect these messages to be large. We only want to do this if we have a default_segment_capacity, though. ::: ipc/chromium/src/chrome/common/ipc_message.cc:23 (Diff revision 2) > +#include "mozilla/dom/PContent.h" > +#include "mozilla/layers/PLayerTransaction.h" ???? ::: ipc/chromium/src/chrome/common/ipc_message.cc:49 (Diff revision 2) > + size_t mSize; > + } kMessageSizeTable[] = { > +#include "IPCMessageSize.inc" > + }; > + > + for (size_t i = 0; i < mozilla::ArrayLength(kMessageSizeTable); i++) { The .inc file should be sorted so that we can use binary search here. ::: ipc/chromium/src/base/pickle.cc:130 (Diff revision 4) > } > > // Payload is sizeof(Pickle::memberAlignmentType) aligned. > > -Pickle::Pickle(uint32_t header_size) > - : buffers_(AlignInt(header_size), kHeaderSegmentCapacity, kDefaultSegmentCapacity), > +Pickle::Pickle(uint32_t header_size, size_t segment_capacity) > + : buffers_(AlignInt(header_size), kHeaderSegmentCapacity, segment_capacity), If segment_capacity is not the default segment capacity, I think we should use it for the first segment as well (instead of kHeaderSegmentCapacity). It probably makes sense to pass 0 for segment_capacity if we want the default. We would continue to define kDefaultSegmentCapacity in Pickle. Then the constructor would look like: buffers_(AlignInt(header_size), segment_capacity ? segment_capacity : kHeaderSegmentCapacity, segment_capacity ? segment_capacity : kDefaultSegmentCapacity) Also, you could make the default value of segment_capacity be 0. ::: ipc/chromium/src/chrome/common/ipc_message.cc:32 (Diff revision 4) > #define MSG_HEADER_SZ sizeof(Header) > #endif > > namespace IPC { > > +static const uint32_t kDefaultSegmentCapacity = 4096; This could then be removed. ::: ipc/chromium/src/chrome/common/ipc_message.cc:41 (Diff revision 4) > Message::~Message() { > MOZ_COUNT_DTOR(IPC::Message); > } > > Message::Message() > - : Pickle(MSG_HEADER_SZ) { > + : Pickle(MSG_HEADER_SZ, kDefaultSegmentCapacity) { This wouldn't need to change. ::: ipc/chromium/src/chrome/common/ipc_message.cc:60 (Diff revision 4) > #endif > InitLoggingVariables(); > } > > -Message::Message(int32_t routing_id, msgid_t type, NestedLevel nestedLevel, PriorityValue priority, > +Message::Message(int32_t routing_id, msgid_t type, > + uint32_t segmentCapacity, You have an extra space at the end of the line. Also, please call this segment_capacity. ::: ipc/chromium/src/chrome/common/ipc_message.cc:63 (Diff revision 4) > > -Message::Message(int32_t routing_id, msgid_t type, NestedLevel nestedLevel, PriorityValue priority, > +Message::Message(int32_t routing_id, msgid_t type, > + uint32_t segmentCapacity, > + NestedLevel nestedLevel, PriorityValue priority, > MessageCompression compression, const char* const aName, bool recordWriteLatency) > - : Pickle(MSG_HEADER_SZ) { > + : Pickle(MSG_HEADER_SZ, segmentCapacity ? segmentCapacity : kDefaultSegmentCapacity) { You won't need the conditional here. Just pass segment_capacity. ::: ipc/ipdl/ipdl.py:184 (Diff revision 4) > - ipdl.gencxx(filename, ast, headersdir, cppdir) > + ipdl.gencxx(filename, ast, headersdir, cppdir, segmentCapacityDict) > > if ast.protocol: > allmessages[ast.protocol.name] = ipdl.genmsgenum(ast) > allprotocols.append('%sMsgStart' % ast.protocol.name) > + # e.g. PContent::RequestMemoryReport (not prefixed or sufficed.) suffixed
Attachment #8879882 - Flags: review?(wmccloskey)
Sorry, MozReview mixed up my comments from the old version of the patch and the new one. The new patch looks good. I've just requested a few changes to the argument handling.
Flags: needinfo?(wmccloskey)
Hi Bill, I have a new patch submitted (which addressed all your comments) so could you help review again? Thanks :)
Comment on attachment 8879882 [details] Bug 1348591 - Support custom default segment buffer list size. https://reviewboard.mozilla.org/r/151288/#review159664 ::: ipc/chromium/src/chrome/common/ipc_message.h:70 (Diff revision 6) > // > // NOTE: `recordWriteLatency` is only passed by IPDL generated message code, > // and is used to trigger the IPC_WRITE_LATENCY_MS telemetry. > Message(int32_t routing_id, > msgid_t type, > + uint32_t segmentCapacity = 0, // 0 for the default capacity. You'll need to fix this constructor here: http://searchfox.org/mozilla-central/rev/e8f4f51cd543f203e9cb861cecb7545ac43c836c/ipc/glue/ProtocolUtils.cpp#77 Or you can delete it since it's unused (as well as the Open and Bridge functions). ::: ipc/chromium/src/chrome/common/ipc_message.cc:57 (Diff revision 6) > -Message::Message(int32_t routing_id, msgid_t type, NestedLevel nestedLevel, PriorityValue priority, > +Message::Message(int32_t routing_id, msgid_t type, > + uint32_t segment_capacity, > + NestedLevel nestedLevel, PriorityValue priority, > MessageCompression compression, const char* const aName, bool recordWriteLatency) Please put each argument on its own line.
Attachment #8879882 - Flags: review?(wmccloskey) → review+
Thanks Bill! (In reply to Bill McCloskey (:billm) from comment #37) > Comment on attachment 8879882 [details] > Bug 1348591 - Support custom default segment buffer list size. > > https://reviewboard.mozilla.org/r/151288/#review159664 > > ::: ipc/chromium/src/chrome/common/ipc_message.h:70 > (Diff revision 6) > > // > > // NOTE: `recordWriteLatency` is only passed by IPDL generated message code, > > // and is used to trigger the IPC_WRITE_LATENCY_MS telemetry. > > Message(int32_t routing_id, > > msgid_t type, > > + uint32_t segmentCapacity = 0, // 0 for the default capacity. > > You'll need to fix this constructor here: > http://searchfox.org/mozilla-central/rev/ > e8f4f51cd543f203e9cb861cecb7545ac43c836c/ipc/glue/ProtocolUtils.cpp#77 > > Or you can delete it since it's unused (as well as the Open and Bridge > functions). > This is so scary! I fixed the similar issue in Shmem.cpp but still missed this one. I'd like to move the newly added argument (segment_capacity) all the way to be the last one to avoid this kind of mismatch. Do you have any concern on this change? Thank!
Flags: needinfo?(wmccloskey)
I don't really care about the order. However, you can find all the calls to that particular Message constructor with this search: http://searchfox.org/mozilla-central/search?q=symbol:_ZN3IPC7MessageC1EijNS0_11NestedLevelENS0_13PriorityValueENS0_18MessageCompressionEPKcb&redirect=false Aside from generated code, there aren't very many, and it's easy to see that you'll have covered them all once the ChannelOpened thing is fixed.
Flags: needinfo?(wmccloskey)
Pushed by hchang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e3a6a933eee5 Support custom default segment buffer list size. r=billm
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: