Closed Bug 1773733 Opened 2 years ago Closed 2 years ago

Crash in [@ core::option::expect_failed | futures_util::stream::stream::next::impl$3::poll<T>]

Categories

(Core :: Internationalization: Localization, defect)

Unspecified
All
defect

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox106 --- wontfix
firefox107 --- wontfix
firefox108 --- fixed

People

(Reporter: gsvelto, Assigned: gregtatum)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(13 files)

(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details

Crash report: https://crash-stats.mozilla.org/report/index/db3e1a71-7ae9-4238-b436-e8cf80220610

MOZ_CRASH Reason: Index out-of-range

Top 10 frames of crashing thread:

0 xul.dll RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 xul.dll mozglue_static::panic_hook mozglue/static/rust/lib.rs:91
2 xul.dll core::ops::function::Fn::call<void  ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:227
3 xul.dll std::panicking::rust_panic_with_hook ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/std/src/panicking.rs:702
4 xul.dll std::panicking::begin_panic_handler::closure$0 ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/std/src/panicking.rs:588
5 xul.dll std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::closure_env$0, never$> ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/std/src/sys_common/backtrace.rs:138
6 xul.dll std::panicking::begin_panic_handler ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/std/src/panicking.rs:584
7 xul.dll core::panicking::panic_fmt ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/core/src/panicking.rs:143
8 xul.dll core::panicking::panic_display<str> ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/core/src/panicking.rs:73
9 xul.dll core::panicking::panic_str ../7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c//library/core/src/panicking.rs:56

This appears to be a new crash and it's spiking at least according to clouseau. It's an index-out-of-bounds access in fluent code of which I'm not very familiar with. It appears harmless as this is in Rust code and the crash is happening in a controlled way.

Happening in:

futures_util::stream::stream::next::impl$3::poll<
  fluent_fallback::cache::AsyncCacheStream<
    l10nregistry::registry::asynchronous::GenerateBundles<
      l10nregistry_ffi::env::GeckoEnvironment,
      l10nregistry_ffi::registry::GeckoBundleAdapter
    >,
    alloc::rc::Rc<fluent_bundle::resource::FluentResource>
  >
>(
  core::pin::Pin<
    ref_mut$<
      futures_util::stream::stream::next::Next<
        fluent_fallback::cache::AsyncCacheStream<
          l10nregistry::registry::asynchronous::GenerateBundles<
            l10nregistry_ffi::env::GeckoEnvironment,
            l10nregistry_ffi::registry::GeckoBundleAdapter
          >,
          alloc::rc::Rc<fluent_bundle::resource::FluentResource>
        >
      >
    >
  >,
  core::task::wake::Context*
)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 content process crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

I think this is the same crash or a similar one, but with inline stacks: https://crash-stats.mozilla.org/report/index/95f42952-fed4-4564-bb0d-f1ac00220930#tab-details

https://github.com/mozilla/l10nregistry-rs/commit/1cc7df0203f7451b0215e4202c464aca7e0217f7

That one is on Linux rather than Windows, but I'm unable to find an example with inline stacks on Windows.

I suspect this PR had some logic errors in it, or assumption issues.

https://github.com/mozilla/l10nregistry-rs/pull/14/files

Here is a searchfox for these checks which are all looking up sources. https://searchfox.org/mozilla-central/search?q=Index%20out-of-range&path=&case=false&regexp=false

Seems like the crashes all have multiple langpacks installed.

Here are crash stats for all of the "Index out-of-range" errors:

https://crash-stats.mozilla.org/search/?moz_crash_reason=~Index%20out-of-range&date=%3E%3D2022-09-30T20%3A47%3A00.000Z&date=%3C2022-10-07T20%3A47%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

There appears to be logic errors around the metasources. I haven't reproduced any yet though. I'm not sure if it's something async that's happening that's invalidating the logic. The flow is a bit hard to follow.

Assignee: nobody → gtatum

Depends on D159437

Depends on D159438

Depends on D159440

This is a bit of chore work, as the upcoming MetaSources struct will not
need access to this member, and it's simpler without it.

Depends on D159441

This is probably the riskiest patch in the group, as it's touching the
core async iterator algorithm. The previous implementation relied on
pure match branches for handling things, but it made it awkward to share
some of the logic with a macro. I kept on running into issues with
refactoring tools not being able to work on the macro.

The changes here makes the code a bit more imperative, but I added
documentation of the current state that thing are in, so that it's
clearer where we were in the process of the algorithm.

This is a pure refactor and should not have any behavior changes.

Depends on D159442

I believe the crashes are due to the metasources list being mutated
while the async iterators are running. This changes should fix this
issue if it is in fact the root cause. The scope of this patch is a bit
larger since it changes the implementation to use a proper MetaSources
struct.

This struct centralizes access and mutation to the MetaSources,
providing an ergonomic API that should be fairly idiomatic and similar
to other Rust APIs.

The Mutex is not needed as this code is not multi-threaded. The RefCell
should provide proper access controls for handling the mutation of the
MetaSources.

The MetaSources struct also favors a Vec<Vec<Rc<T>>> rather than
Vec<Vec<T>> so that it can be copied very cheaply for the async
iterators.

Depends on D159445

Status: NEW → ASSIGNED

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Depends on D159446

Depends on D159714

Pushed by gtatum@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c215a7e88153 Rename set_adapt_bundle to set_bundle_adapter; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/b42b728d2235 Rename get_source to file_source_by_name; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/3f69b450af5e Mark test-only methods as cfg(test); r=nordzilla https://hg.mozilla.org/integration/autoland/rev/2eab1da94ac4 Rename generate_sources_for_file to get_sources_for_resource; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/0558121f56e9 Fill in reasons for unreachable panics; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/b15b6e32afc8 Remove the bundle_adapter from the L10nRegistryLocked; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/3a95bda2779f Re-work and document the L10nRegistry bundle algorithm; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/8fbfbfeb5d9d Rename L10nRegistry methods to be consistent and idiomatic; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/7403fe1dff86 Add a bit more documentation to various pieces of the L10nRegistry; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/e73a52476a21 Remove locking and add a MetaSources struct; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/fcd90ecf0100 Add a fuzzing test for the l10nregistry; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/da8ee953ff70 Add a README.md for l10nregistry-rs; r=nordzilla https://hg.mozilla.org/integration/autoland/rev/eda125da6dae Remove some commented out code in l10n-registry; r=nordzilla

Comment on attachment 9299263 [details]
Bug 1773733 - Remove some commented out code in l10n-registry; r?nordzilla

Beta/Release Uplift Approval Request

  • User impact if declined: This resolves a top content crasher in Beta:

https://crash-stats.mozilla.org/search/?moz_crash_reason=%3DIndex%20out-of-range&product=Firefox&date=%3E%3D2022-10-26T14%3A42%3A00.000Z&date=%3C2022-11-02T14%3A42%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: * Go to about:preferences
  • Download and install another language pack
  • Switch between the Firefox language and the langpack language.
  • Ensure nothing crashes and the page is translated as expected.
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This patch is kind of weird, because the issue only affects browsers with langpacks. Langpacks are not used in Nightly, so the broader Nightly population can't trigger the code paths very easily. I did add a fuzzing test which can pretty reliably trigger the old crashes, so we should have coverage there.

I have manually installed langpacks in Nightly and verified the fix myself. I marked the risk as medium, since it's not a trivial change, and affects the core part of string translations.

  • String changes made/needed:
  • Is Android affected?: Unknown
Attachment #9299263 - Flags: approval-mozilla-beta?
Attachment #9298702 - Flags: approval-mozilla-beta?
Attachment #9298703 - Flags: approval-mozilla-beta?
Attachment #9298704 - Flags: approval-mozilla-beta?
Attachment #9298705 - Flags: approval-mozilla-beta?
Attachment #9298706 - Flags: approval-mozilla-beta?
Attachment #9298707 - Flags: approval-mozilla-beta?
Attachment #9298708 - Flags: approval-mozilla-beta?
Attachment #9298709 - Flags: approval-mozilla-beta?
Attachment #9298710 - Flags: approval-mozilla-beta?
Attachment #9298711 - Flags: approval-mozilla-beta?
Attachment #9299261 - Flags: approval-mozilla-beta?
Attachment #9299262 - Flags: approval-mozilla-beta?
Attachment #9298702 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Comment on attachment 9298703 [details]
Bug 1773733 - Rename get_source to file_source_by_name; r?nordzilla

Rejecting beta uplift request for Bug 1773733.
107 is near the end of the beta cycle. Since this patch is not testable in Nightly, we should give it some time in Beta with 108, when 108 rides the train to Beta.
When 108 get's to Beta, we should monitor Bug 1798506. If there is a reduction in the crash rate.
We could later consider including Bug 1773733 in a 107 dot-release, depending on stability in beta.

Attachment #9298703 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298704 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298705 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298706 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298707 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298708 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298709 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298710 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9298711 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9299261 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9299262 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9299263 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Seems reasonable to me, thanks!

Flags: qe-verify+

I'm not seeing any crashes in this area in 108b so far, nor any new bugs filed on breakage. I'm still monitoring.

:dmeehan, when should we consider making a decision on uplifting to release? I can continue to monitor.

Flags: needinfo?(dmeehan)

(In reply to Greg Tatum [:gregtatum] from comment #26)

I'm not seeing any crashes in this area in 108b so far, nor any new bugs filed on breakage. I'm still monitoring.

:dmeehan, when should we consider making a decision on uplifting to release? I can continue to monitor.

:gregtatum we have a planned 107 dot release scheduled for 2022-11-29.
We could consider including this in that dot release, pending continued stability in 108. (108.0b2 is currently only at 50% rollout)
When you're confident, you submit a release uplift request and it will be on the list for uplift consideration.

Flags: needinfo?(dmeehan)
QA Whiteboard: [qa-triaged]

Comment on attachment 9299263 [details]
Bug 1773733 - Remove some commented out code in l10n-registry; r?nordzilla

Beta/Release Uplift Approval Request

  • User impact if declined: This resolves a top content crasher in Release.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Go to about:preferences
    Download and install another language pack
    Switch between the Firefox language and the langpack language.
    Ensure nothing crashes and the page is translated as expected.
    (I have not been able to reproduce the crash manually, but the following automated test can trigger it: intl/l10n/test/test_l10nregistry_fuzzed.js)
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): I changed this from Medium to Low, since it has been in Beta for awhile now, and I can't find any crashes related to Fluent in 108. I haven't seen any new reports of breakage around langpacks or translations. The surrounding code has been fairly stable as far as code changes, so I don't think there is much risk at this time. Plus there is an automated test covering this change.

The riskiest part of this patch is that it's touching how langpack resources are used for translations, so any breakage will affect the user visible language of the browser. However, with the automated test, and a follow-up to have QA manually verifying that the browser is still translating, I think there is only a low risk in taking it.

  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9299263 - Flags: approval-mozilla-release?
Attachment #9298702 - Flags: approval-mozilla-release?
Attachment #9298703 - Flags: approval-mozilla-release?
Attachment #9298704 - Flags: approval-mozilla-release?
Attachment #9298705 - Flags: approval-mozilla-release?
Attachment #9298706 - Flags: approval-mozilla-release?
Attachment #9298707 - Flags: approval-mozilla-release?
Attachment #9298708 - Flags: approval-mozilla-release?
Attachment #9298709 - Flags: approval-mozilla-release?
Attachment #9298710 - Flags: approval-mozilla-release?
Attachment #9298711 - Flags: approval-mozilla-release?
Attachment #9299261 - Flags: approval-mozilla-release?
Attachment #9299262 - Flags: approval-mozilla-release?

Copying crash signatures from duplicate bugs.

Crash Signature: [@ core::option::expect_failed | futures_util::stream::stream::next::impl$3::poll<T>] [@ core::option::expect_failed | <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll] → [@ core::option::expect_failed | futures_util::stream::stream::next::impl$3::poll<T>] [@ core::option::expect_failed | <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll] [@ core::option::expect_failed | l10nregistry::re…

Comment on attachment 9298702 [details]
Bug 1773733 - Rename set_adapt_bundle to set_bundle_adapter; r?nordzilla

108 goes to RC on Monday 2022-11-05. There's no 107 release vehicle for this before 108.

Attachment #9298702 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298703 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298704 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298705 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298706 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298707 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298708 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298709 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298710 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9298711 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9299261 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9299262 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9299263 - Flags: approval-mozilla-release? → approval-mozilla-release-

Tested this issue on RC 108.0 against Windows 10, Ubuntu 18 and macOS 11.6. Played with the browser, with multiple language packs and no crash occurred.

Please note that I could not reproduce the crash on an older build, without the fix, tried with 102.5.0esr build and Beta 107 builds.

Crash Signature: [@ core::option::expect_failed | futures_util::stream::stream::next::impl$3::poll<T>] [@ core::option::expect_failed | <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll] [@ core::option::expect_failed | → [@ core::option::expect_failed | futures_util::stream::stream::next::impl$3::poll<T>] [@ core::option::expect_failed | <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll] [@ core::option::expect_failed |
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: