Bug 1080518 (Closed) · Opened 10 years ago · Closed 10 years ago

Treeherder needs a robots.txt to prevent load from crawlers

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P3)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: fubar)

References

Details

We should probably add a robots.txt to stop additional load from search engines crawling Treeherder URLs, since they'll be referenced all over the place. We could add one to the UI repo root, but I guess that will only cover treeherder.m.o/ui/* unless we fiddle with the Apache config and add a redirect from the root?
I meant to add: search engines execute JS, so hitting the UI means they do cause API load, plus they'll likely also stumble upon API URLs in bug comments, or, worse, things like the dynamically generated Swagger docs (treeherder-dev.a.org/docs/; disabled on prod).
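(For reference, a minimal sketch of the Apache-side option mentioned above, assuming a proxied vhost; the file paths and directive placement here are illustrative guesses, not the actual Treeherder/puppet config:

    # Hypothetical vhost snippet; real paths are managed elsewhere (puppet).
    # The exclusion must come before the general ProxyPass rule so that
    # /robots.txt is served as a static file instead of being proxied.
    ProxyPass /robots.txt !
    Alias /robots.txt /data/www/treeherder/ui/robots.txt

This would let a robots.txt checked into the UI repo be served at the site root without a redirect.)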
Blocks: 1080757
Summary: Treeherder needs a robots.txt → Treeherder needs a robots.txt to prevent load from crawlers
Component: Treeherder → Treeherder: Infrastructure
QA Contact: laura
No longer blocks: 1080757
Going with the default no-bots-here robots.txt unless there's something you'd like excepted:

    User-agent: *
    Disallow: /

Deployed on the stage and prod webheads.
Assignee: nobody → klibby
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
sgtm, thank you :-) Is this currently managed via puppet? Is this something we could get checked into the repo instead? I'm a fan of having as few hidden/magical things as possible, and for most people, if it isn't in the repo, it's invisible to them :-)
It's in puppet because of the Apache config, which is managed by the webapp module. With the proxying in place, you'd have to have gunicorn handle it if you wanted it in the repo, I think.
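(A rough sketch of what "gunicorn handles it" could look like, assuming Treeherder's Django app and a modern URLconf; the view name and placement are hypothetical, not the actual Treeherder code:

    # Hypothetical addition to the project urls.py.
    from django.http import HttpResponse
    from django.urls import path

    # Same deny-all policy that puppet currently deploys.
    ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

    def robots_txt(request):
        # Answer /robots.txt from the app itself, so the proxy in front
        # of gunicorn needs no special-casing.
        return HttpResponse(ROBOTS_TXT, content_type="text/plain")

    urlpatterns = [
        path("robots.txt", robots_txt),
        # ... existing routes ...
    ]

That would keep the policy visible in the repo at the cost of one extra route.)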
Depends on: 1118387