Closed
Bug 1246408
Opened 9 years ago
Closed 9 years ago
Update EMR release to 4.3.0
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rvitillo, Assigned: whd)
No description provided.
Comment 1 (Reporter) • 9 years ago
Once bug 1248336 lands, Parquet datasets will be accessible from both Spark and Presto. Packing more profiles per row group seems to trigger a Spark bug that makes the "take(N)" operation require a full scan of the dataset. The bug can be avoided by converting the dataset to an RDD, but that hurts performance. Spark 1.6 doesn't suffer from this issue, so we should upgrade ASAP.
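To make the cost of the full scan concrete, here is a toy model in plain Python. It is only a sketch: it is not Spark code, it does not reproduce the Spark bug, and the names `RowGroupDataset`, `take_lazy`, and `take_full_scan`, as well as the row-group sizes, are hypothetical and chosen purely for illustration. It contrasts a take(N) that stops after the first rows with one that reads every row group before truncating.

```python
from itertools import islice


class RowGroupDataset:
    """Toy stand-in for a Parquet dataset: a list of row groups, each a list of rows."""

    def __init__(self, row_groups):
        self.row_groups = row_groups
        self.groups_read = 0  # how many row groups were touched while scanning

    def scan(self):
        # Lazily yield rows, one row group at a time.
        for group in self.row_groups:
            self.groups_read += 1
            yield from group


def take_lazy(dataset, n):
    # Expected behaviour: stop as soon as n rows have been produced.
    return list(islice(dataset.scan(), n))


def take_full_scan(dataset, n):
    # Behaviour described in the comment: read everything, then truncate.
    return list(dataset.scan())[:n]


def make_dataset():
    # 4 row groups of 1000 rows each (arbitrary numbers, assumption for the demo).
    return RowGroupDataset([[(g, i) for i in range(1000)] for g in range(4)])


lazy_ds = make_dataset()
lazy_rows = take_lazy(lazy_ds, 5)

full_ds = make_dataset()
full_rows = take_full_scan(full_ds, 5)

# Both return the same rows, but the full scan touches every row group.
print(lazy_ds.groups_read, full_ds.groups_read)  # prints "1 4"
```

In this model the result is identical either way; only the amount of data read differs, which is why the issue shows up as slowness rather than wrong output.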
Updated (Reporter) • 9 years ago
Flags: needinfo?(whd)
Comment 2 (Reporter) • 9 years ago
Note that Hive also has to be deployed for Spark 1.6 to be able to read Parquet datasets.
Updated • 9 years ago
Points: --- → 1
Comment 3 (Assignee) • 9 years ago
https://github.com/mozilla/emr-bootstrap-spark/pull/16
https://github.com/mozilla/telemetry-server/pull/146
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(whd)
Resolution: --- → FIXED
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard