Closed Bug 1344020 Opened 8 years ago Closed 8 years ago

Drop support for EMR 4 series in analysis tools

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whd, Assigned: rvitillo)

Attachments

(3 files)

(deleted), text/x-github-pull-request, mreid: review+
(deleted), text/x-github-pull-request, jezdez: review+
(deleted), text/x-github-pull-request, mreid: review+
The motivation for this change is the desire to use a centralized metastore, which is not supported on the EMR 4.X series. We could set up multiple configurations to continue supporting the old version, but we should be moving to Spark 2 anyway.

Some things we need to do (not necessarily in order):

1. Announce that we're doing this to the appropriate lists.

2. Check and migrate airflow jobs using 4.X. Here's a list of DAG nodes that use the default release label, and thus need to be checked:

   android_addons.py:t0 = EMRSparkOperator(task_id="android_addons", job_name="Update android addons"...)
   android_clients.py:t0 = EMRSparkOperator(task_id="android_clients", job_name="Update android clients"...)
   android_events.py:t0 = EMRSparkOperator(task_id="android_events", job_name="Update android events"...)
   bugzilla_dataset.py:t0 = EMRSparkOperator(task_id="update_bugs", job_name="Bugzilla Dataset Update"...)
   example.py:t0 = EMRSparkOperator(task_id = "spark", job_name = "Spark Example Job"...)
   example.py:t1 = EMRSparkOperator(task_id = "bash", job_name = "Bash Example Job"...)
   longitudinal.py:t1 = EMRSparkOperator(task_id="update_orphaning", job_name="Update Orphaning View"...)
   longitudinal.py:t3 = EMRSparkOperator(task_id="game_hw_survey", job_name="Game Hardware Survey"...)
   main_summary.py:t2 = EMRSparkOperator(task_id="engagement_ratio", job_name="Update Engagement Ratio"...)
   main_summary.py:t5 = EMRSparkOperator(task_id="daily_search_rollup", job_name="Daily Search Rollup"...)
   mobile_clients.py:t0 = EMRSparkOperator(task_id="mobile_clients", job_name="Update mobile clients"...)
   telemetry_aggregates_fennec_backfill.py:t0 = EMRSparkOperator(task_id = "telemetry_aggregate_fennec_backfill", job_name = "Telemetry Aggregate Fennec Backfill"...)
   telemetry_aggregates.py:t0 = EMRSparkOperator(task_id = "telemetry_aggregate_view", job_name = "Telemetry Aggregate View"...)

3. Change the default release_label in telemetry-airflow (most jobs override the default to use the 5.X series anyway; none set it to 4.X explicitly).

4. Remove the 4.X series from the selectable EMR releases on ATMO.

5. Migrate the Churn scheduled ATMO job. This might be a duplicate of a different churn job in airflow, so maybe we can just remove it. It's owned by :Dexter but references an :mreid s3 path in the code, and lives at s3://telemetry-analysis-code-2/jobs/telemetry-churn-atmov2/Churn.ipynb.

It might also make sense to make release_label a required argument with no default value for EMRSparkOperator, so that we're always forced to migrate jobs piecemeal when we deprecate old EMR versions, as opposed to bumping the default and accidentally breaking something.

It might be better to have separate bugs for each of these things, making this a meta bug, but I filed it as-is and people can split it out if needed.
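The idea of making release_label a required argument could be sketched roughly as follows. This is only an illustration under assumptions: EMRSparkOperatorSketch is a hypothetical stand-in class, not the actual EMRSparkOperator from telemetry-airflow, and the release label value is just an example.

```python
# Hypothetical stand-in for illustration only; this is NOT the real
# telemetry-airflow EMRSparkOperator and omits everything Airflow-specific.
class EMRSparkOperatorSketch:
    def __init__(self, task_id, job_name, *, release_label):
        # release_label is keyword-only and has no default, so every DAG
        # node has to state which EMR release it runs on.
        self.task_id = task_id
        self.job_name = job_name
        self.release_label = release_label


# Omitting release_label fails as soon as the DAG file is loaded,
# instead of silently picking up whatever the current default is.
missing_label_error = None
try:
    EMRSparkOperatorSketch(task_id="android_addons",
                           job_name="Update android addons")
except TypeError as exc:
    missing_label_error = str(exc)

# With the label spelled out, the job is pinned to a specific release.
t0 = EMRSparkOperatorSketch(
    task_id="android_addons",
    job_name="Update android addons",
    release_label="emr-5.2.1",  # example value, not taken from this bug
)
```

With this shape, deprecating an EMR series means touching each job that names it, which is the piecemeal migration described above; the trade-off is a bit more boilerplate in every DAG file.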
Attached file PR (deleted)
Attachment #8844435 - Flags: review?(mreid)
Attachment #8844435 - Flags: review?(mreid) → review+
Attached file PR (deleted)
Attachment #8844438 - Flags: review?(jezdez)
Attached file PR (deleted)
Attachment #8844445 - Flags: review?(mreid)
Assignee: nobody → rvitillo
Points: --- → 2
Priority: -- → P1
Attachment #8844445 - Flags: review?(mreid) → review+
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Attachment #8844438 - Flags: review?(jezdez) → review+
Product: Cloud Services → Cloud Services Graveyard