Closed Bug 774766 Opened 12 years ago Closed 10 years ago

hg.mozilla.org should give public access-control headers

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Platform: All
OS: Other
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Unassigned)

References


Details

hg.mozilla.org doesn't have any login, and it would be very helpful for developing some cool resources if it sent public access-control headers so that other sites can get log/blame information directly. I believe it is sufficient to just add the "Access-Control-Allow-Origin: *" header to all responses. Doing that from the app itself may involve forking code changes, but I'm hoping we could do this with htaccess or in the load balancer. Ted says to cc bkero; he would know!
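For illustration (not part of the original request): a minimal sketch of checking whether a response already carries the header a browser needs before it will let a page on another origin read hg data. The repository path and query are just example values.

# Minimal sketch: verify that a response carries the CORS header a browser
# would need before it lets a page on another origin read the body.
import urllib.request

req = urllib.request.Request(
    "https://hg.mozilla.org/mozilla-central/json-pushes?startID=0&endID=1")
with urllib.request.urlopen(req) as resp:
    acao = resp.headers.get("Access-Control-Allow-Origin")
    print("Access-Control-Allow-Origin:", acao)
    # A value of "*" means any site may read log/blame data from script.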
CC'ing infrasec for possible security issues by doing this.
Assignee: server-ops → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
QA Contact: phong → shyam
Whiteboard: Waiting on Opsec
AFAIK hg.mo is read-only and there's no login component.
Flags: sec-review?(yboily)
Whiteboard: Waiting on Opsec
FWIW, I don't think there's actually any way to make this work right now. The webcommands that we implement in our extensions (pushlog) all send this header, but hgweb doesn't seem to have any global facility for sending a custom header, and I couldn't find anything about making Apache insert additional headers into the output of wsgi apps. I'm sure there's some fancy way to do this, but I don't know what it is.
It should be fairly easy to inject headers in the top-level WSGI script.
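A minimal sketch of what that suggestion could look like, assuming a typical hgweb WSGI script; 'hgweb.conf' is a placeholder path, and a production version would also want to avoid adding a second copy if the application already set the header.

# Sketch: inject the header in the top-level WSGI script by wrapping the
# hgweb application in a small middleware. 'hgweb.conf' is a placeholder.
from mercurial.hgweb import hgwebdir

def add_cors(app):
    def wrapped(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            # Append the public CORS header to every response.
            headers = list(headers) + [("Access-Control-Allow-Origin", "*")]
            return start_response(status, headers, exc_info)
        return app(environ, cors_start_response)
    return wrapped

application = add_cors(hgwebdir("hgweb.conf"))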
There's no security issue adding "Access-Control-Allow-Origin: *" to all responses.
Flags: sec-review?(yboily) → sec-review+
This would really help with m-cMerge as well as a future planned bug filer for intermittent failures (that would use hg log info to suggest which devs should be on the CC list). Dirkjan, is the comment 6 suggestion something that you would be able to do (now that we have sec-review+)? (Not sure which team I need to be asking). Thank you! :-)
Flags: needinfo?(dirkjan)
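For context on the kind of consumer described above: a minimal sketch, assuming the usual json-pushes output (a JSON object keyed by push ID, each entry carrying a "user" and a list of "changesets"), that collects recent pushers whose addresses a bug filer could suggest for a CC list. With the CORS header in place, the same lookup could be done from a page's own script instead of a server-side proxy.

# Sketch: pull recent pushes for a repo and collect the pushing users,
# the sort of data a CC-suggesting bug filer would need.
# Assumes the json-pushes format: {push_id: {"user": ..., "changesets": [...]}}.
import json
import urllib.request

url = "https://hg.mozilla.org/mozilla-central/json-pushes"
with urllib.request.urlopen(url) as resp:
    pushes = json.load(resp)

users = sorted({push["user"] for push in pushes.values()})
print("\n".join(users))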
I guess. Are the WSGI scripts even in version control somewhere?
Flags: needinfo?(dirkjan)
I have no visibility into the hg.m.o setup, sorry.
CC'ing some folks who might.
probably a question for fox2mike to answer, I think
Flags: needinfo?(shyam)
More of a jakem thing now
Flags: needinfo?(shyam) → needinfo?(nmaul)
I think we can easily do this at the Apache level, rather than within the WSGI app.

The file to modify in puppet is: modules/hg_new/templates/httpd-hg.mozilla.org

The line to add is:

Header set Access-Control-Allow-Origin "*"

I believe anything returned by mod_wsgi to Apache will have this processed afterwards, meaning any such header set by mod_wsgi would be replaced by this one in Apache. If no such header exists, Apache will add it. http://httpd.apache.org/docs/current/mod/mod_headers.html#early supports this belief. This should be a complete solution.

For the record, we have other sites that use this exact directive... just not with content returned from mod_wsgi (usually we do it with stuff like fonts, which Apache handles directly).

When is a good time to try this out?
Flags: needinfo?(nmaul) → needinfo?
Do you think this needs a scheduled downtime? It seems straightforward enough that we should be able to just make the change and quickly test it out, but I don't know what your process for changes like this typically is.
Assignee: server-ops-devservices → server-ops-webops
Component: Server Operations: Developer Services → Server Operations: Web Operations
Flags: needinfo?
QA Contact: shyam → nmaul
bkero and I talked about this. We have no concerns, and I don't feel a need to schedule a downtime. I've committed this change to puppet, and it should be visible in a few minutes....
Done! Looks good from here. :)
Assignee: server-ops-webops → nmaul
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This caused tbpl to break. Instead of reverting the apache configuration addition I've modified tbpl to remove adding these (since we're adding identical flags to the whole site now). Diff follows.

diff -r d445a97d837d pushlog-feed.py
--- a/pushlog-feed.py	Fri May 17 09:02:03 2013 -0400
+++ b/pushlog-feed.py	Thu May 30 00:03:05 2013 -0700
@@ -35,7 +35,6 @@
 hgwebcommands.__all__.append(name)
 
 ATOM_MIMETYPE = 'application/atom+xml'
-ACL_HEADER = 'Access-Control-Allow-Origin', '*'
 
 # just an enum
 class QueryType:
@@ -322,7 +321,6 @@
             'files': [{'name': fn} for fn in ctx.files()],
         })
 
-    req.headers.append(ACL_HEADER)
     req.respond(HTTP_OK, ATOM_MIMETYPE)
     return tmpl('pushlog', **data)
 
@@ -426,7 +424,6 @@
 
     parity = paritygen(web.stripecount)
 
-    req.headers.append(ACL_HEADER)
     return tmpl('pushlog',
                 changenav=changenav(),
                 rev=0,
@@ -465,7 +462,6 @@
 def pushes(web, req, tmpl):
     """WebCommand to return a data structure containing pushes."""
     query = pushlogSetup(web.repo, req)
-    req.headers.append(ACL_HEADER)
     return tmpl('pushes', data=pushes_worker(query, 'full' in req.form and web))
 
 def printpushlog(ui, repo, *args):
WFM. Thanks for fixing this!
Just some explanation here: this broke because apparently the Apache directive "Header set" is not properly overriding headers set by mod_wsgi. I don't know if this is a bug or intended behavior (the Apache docs lead me to believe it's a bug), but the net result was that on certain pages where this header was already set, Apache was setting *another* one.

With most headers this is okay (albeit weird), but CORS allows only a single Access-Control-Allow-Origin value, so duplicating it is illegal and browsers are expected to fail the request. That's what was happening here.

http://httpd.apache.org/docs/current/mod/mod_headers.html#header ("set" is acting like "add", don't know why yet)

Very sorry for the service interruption.
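For illustration only: a minimal sketch of a check that would catch the doubled header described above. It uses http.client so repeated headers stay visible as separate values (higher-level libraries may fold them together); the URL is just an example pushlog endpoint.

# Sketch: detect a doubled Access-Control-Allow-Origin header, which is
# what broke tbpl here.
import http.client

conn = http.client.HTTPSConnection("hg.mozilla.org")
conn.request("GET", "/mozilla-central/json-pushes")
resp = conn.getresponse()
values = resp.headers.get_all("Access-Control-Allow-Origin") or []
print("Access-Control-Allow-Origin values:", values)
if len(values) > 1:
    print("Duplicated CORS header; browsers will refuse cross-origin reads")
conn.close()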
I asked around in IRC (freenode, #httpd), and haven't heard any definitive information indicating this is a known or expected issue. All signs point to "it should have worked".

I've tested a very bare-bones Python/wsgi app, and it works properly:

------
def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output))),
                        ('Foo', 'bar')]
    start_response(status, response_headers)

    return [output]
------
<VirtualHost *:80>
    ServerName jakem-test
    WSGIScriptAlias / /data/www/jakem-test/test-wsgi.wsgi
    Header set Foo bar2
</VirtualHost>
------

I can set the header "Foo: bar" in the app, and Apache will override it as expected. The end result has only one header, no duplication. The value of the header doesn't matter; it works whether they're the same or different. Just to be explicit, it worked with "Access-Control-Allow-Origin: *" as well.

I haven't tested hg/pushlog directly, since that's more of a pain to get up and running. That's probably the next logical step. At this point all I can figure is that Apache was not detecting this header as being a duplicate. I still don't know why. It seems to be specific to the hg web app, or maybe the pushlog extension.

I'm curious to know why this broke, but at this point it's hard for me to justify spending any more time on it. Especially with the major hg upgrade looming, there's a decent chance that whatever the problem was, it will have been tweaked enough to be unreproducible (if not outright fixed). If someone else has the time, I would love to hear a more authoritative answer.
http://hg.mozilla.org/hgcustom/pushlog/file/d445a97d837d/pushlog-feed.py#l38 -- manually adds the access-control-allow header

Which does a req.headers.append()
http://hg.mozilla.org/hgcustom/pushlog/file/d445a97d837d/pushlog-feed.py#l325
in the wsgi (and elsewhere in that file too).

Which requires that /pushlog and /json-pushes *both* have that header, and is even in tests at http://hg.mozilla.org/hgcustom/pushlog/file/d445a97d837d/runtests.py#l201

(In reply to Ben Kero [:bkero] from comment #18)
> This caused tbpl to break. Instead of reverting the apache configuration
> addition I've modified tbpl to remove adding these (since we're adding
> identical flags to the whole site now). Diff follows.

Because of those tests you just broke pushlog automated testing btw.

--

On another hand, I'm a bit perplexed here, though: c#14 was at ~15:00 PT with a proposal on how to fix this, and then in c#16 less than 2 hours later we were deploying it. For a change like this, especially a change to a production system we are upgrading in < a week, with a change that might invalidate the testing releng did, I would have expected at least me or hal to be approached, and/or CAB.

My recommendation would have been to delay this until after the upgrade by ~ a week, though; piling on too many changes at once makes tracking down issues all that much harder. In fact I would still advocate that we either test this in staging sometime after Monday, or do it in production after June 10 and revert the other changes in this bug until then.

The mere fact that it broke production tbpl makes me worried that some of my http: testing on the staging server is now invalidated, even if I ignore the fact that pushlog tests are now broken by the pushlog change.
(In reply to Justin Wood (:Callek) from comment #22)
> http://hg.mozilla.org/hgcustom/pushlog/file/d445a97d837d/pushlog-feed.py#l38
> -- manually adds the access-control-allow header
>
> Which does a req.headers.append()
> http://hg.mozilla.org/hgcustom/pushlog/file/d445a97d837d/pushlog-feed.
> py#l325 in the wsgi (and elsewhere in that file too)

Yep, that's what comment 18 is referring to. I guess this hasn't been pushed upstream yet... bkero would know more.

> Which requires that /pushlog and /json-pushes *both* have that header, and
> is even in tests at
> http://hg.mozilla.org/hgcustom/pushlog/file/d445a97d837d/runtests.py#l201
>
> (In reply to Ben Kero [:bkero] from comment #18)
> > This caused tbpl to break. Instead of reverting the apache configuration
> > addition I've modified tbpl to remove adding these (since we're adding
> > identical flags to the whole site now). Diff follows.
>
> Because of those tests you just broke pushlog automated testing btw.

I don't quite understand this part. Could you fill me in on how and where this testing is taking place?

> c#14 was at ~15:00 PT with a proposal on how to fix this, and then in c#16
> less than 2 hours later we were deploying it. For a change like this,
> especially a change to a production system we are upgrading in < a week,
> with a change that might invalidate the testing releng did, I would have
> expected at least me or hal to be approached, and/or CAB.

There is not currently an hg.mozilla.org staging environment in which we can test things like this (as far as I know)... let alone a tbpl.mozilla.org staging environment which uses it. This is something we definitely need to fix.

I considered waiting for CAB, but decided not to because I felt this was very low risk and very straightforward. We've added this header many times for many apps without issue, and you can see from comment 21 that this wasn't normal WSGI behavior. For the fix (changing pushlog not to send the headers), this felt like a better solution than reverting, simply because it enabled us to keep this bug resolved. Had we gone through CAB, there's no reason to expect that we wouldn't have had the service interruption anyway.

It's a tough call (whether or not to send a particular change through CAB), because there are *lots* of changes that potentially break all sorts of things, all the time. If we sent every single Apache change through the CAB process, both CAB and general productivity would grind to a halt. In my estimation this one was below the threshold: easy to do, easy to revert, zero expected downtime, no expected side effects. (Having said all that, I regret not doing so in this case, because at least then I could have said "well, we went through CAB!" and I wouldn't have to explain myself.)

> My recommendation would have been to delay this until after the upgrade by ~
> a week, though; piling on too many changes at once makes tracking down issues
> all that much harder. In fact I would still advocate that we either test
> this in staging sometime after Monday, or do it in production after June 10
> and revert the other changes in this bug until then.

There is no "hg.allizom.org" staging site to test this on. That's why it was straight-to-prod. I'm not sure what "staging" you're referring to; is there something I'm missing?

We can revert both changes tomorrow, if there's some consensus that it's better to have the tests passing than to have this bug resolved. Or maybe the tests could be altered more easily?
The staging I meant was hgssh.stage.dmz.mozilla.com, which is being used for releng testing of the June 1 upgrade. I do admit I would likely have missed this issue too, but at least releng and the sheriffs could have known about the change, been around for it, and known who to reach in case of issues.

Also, the automated testing is not continuous integration, meaning the tests are manually run by humans at the moment. We also need changes to be in the repo and not applied ad hoc in place.

I still advocate for a backout, but will have you speak with hwine once he is up for further thoughts.
(In reply to Jake Maul [:jakem] from comment #23)
> There is not currently an hg.mozilla.org staging environment in which we can
> test things like this (as far as I know)... let alone a tbpl.mozilla.org
> staging environment which uses it. This is something we definitely need to
> fix.

It's mainly only the TBPL UI that uses the pushlog, and it can be tested from the local filesystem with a patch to tweak the pushlog source URL (i.e. no staging TBPL environment is required; we do have tbpl-dev.allizom.org, but that normally needs to reference the production pushlog).
Thanks for the name... found it in DNS (it's actually hgssh.stage.dmz.scl3.mozilla.com, for anyone reading along).

We ultimately were made aware of the problem by philor. You're right though, we should have at least notified them before doing it so we could have had a faster rollback. That was an oversight on my part.

I just did a quick curl test against that staging server. It has mercurial-2.5.4 (in prep for the upgrade) so it's not a great test environment for this, but other than that it's close. It has the new Apache setting, along with the original code. It does exhibit the doubled-CORS header problem.

I'll wait for Hal to chime in regarding whether we should back out these 2 changes or leave them (or something else).
(In reply to Jake Maul [:jakem] from comment #26)
> Thanks for the name... found it in DNS (it's actually
> hgssh.stage.dmz.scl3.mozilla.com, for anyone reading along).
>
> We ultimately were made aware of the problem by philor. You're right though,
> we should have at least notified them before doing it so we could have had a
> faster rollback. That was an oversight on my part.
>
> I just did a quick curl test against that staging server. It has
> mercurial-2.5.4 (in prep for the upgrade) so it's not a great test
> environment for this, but other than that it's close. It has the new Apache
> setting, along with the original code. It does exhibit the doubled-CORS
> header problem.

Great - so we have a non-production setup where the issues can be worked out. And, as of next week, it will likely match the production version of hg.

> I'll wait for Hal to chime in regarding whether we should back out these 2
> changes or leave them (or something else).

Let's back these out, so we have a known base with lots of flight time on it going into the hg upgrade. My spidey sense starts tingling with "[apache] should have worked". For an enhancement request with no immediate use case, this just seems too risky.

I see the following steps for the rollback:
- coordinate with sheriffs in #releng on timing
- undo yesterday's changes

While we're there, it seems prudent to check if there are any other uncommitted changes on the server. If so, those should be listed in the hg upgrade bug 741353 (or new bugs blocking that). We can sort those out prior to this weekend.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I've reverted all changes so pushlog-feed.py is adding HTTP headers and apache is no longer setting any headers.
Now that bug 741353 is resolved, could we take another look at this? :-)
Do we still have the hgssh.stage.dmz.scl3 node we can test this on?
Assignee: nmaul → server-ops-webops
Severity: trivial → normal
Component: Server Operations: Web Operations → WebOps: Source Control
Product: mozilla.org → Infrastructure & Operations
I believe this was set up on the stage host to begin with, and is still active. The problem (if I recall correctly) is that the webapp itself (tbpl) is setting this ACL header, and multiple duplicate headers break things. Likewise, if I disable the inclusion of it, releng's tests for the header fail. So really what needs to be fixed here is that test. Past that it's a two-line fix.
(In reply to Ben Kero [:bkero] from comment #31)
> I believe this was set up on the stage host to begin with, and is still
> active. The problem (if I recall correctly) is that the webapp itself (tbpl)
> is setting this ACL header, and multiple duplicate headers break things.

The webapp adding the additional header isn't tbpl, it's the hg pushlog extension.
(In reply to Justin Wood (:Callek) from comment #22)
> (In reply to Ben Kero [:bkero] from comment #18)
> > This caused tbpl to break. Instead of reverting the apache configuration
> > addition I've modified tbpl to remove adding these (since we're adding
> > identical flags to the whole site now). Diff follows.
>
> Because of those tests you just broke pushlog automated testing btw.

Let's just remove those tests. If we're setting the headers in the web server then the tests for the web app are pointless.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #33)
> (In reply to Justin Wood (:Callek) from comment #22)
> > (In reply to Ben Kero [:bkero] from comment #18)
> > > This caused tbpl to break. Instead of reverting the apache configuration
> > > addition I've modified tbpl to remove adding these (since we're adding
> > > identical flags to the whole site now). Diff follows.
> >
> > Because of those tests you just broke pushlog automated testing btw.
>
> Let's just remove those tests. If we're setting the headers in the web
> server then the tests for the web app are pointless.

Because our goal is for "us" to use them, I won't argue too much. I would say it makes it harder for others to reuse this hook easily, since we'll then be removing a feature from the hook in favor of frontloading it onto the web nodes. (I think there are a few consumers of our pushlog code out there.)
Anyone else using this code can implement the same fix. We don't have a thriving community of people using this, so I don't think it's a viable concern.
Here, I made those changes: http://hg.mozilla.org/hgcustom/pushlog/rev/0ffe19e2e343 The webapp should no longer send the headers, and I removed the tests that were checking for them. The remaining tests pass. bkero: can you update the staging server with this version so we can sanity check it?
I've copied the pushlog-feed.py file to the staging server. Running runtests.py on the staging server won't work because http://hg.mozilla.org/ is hardcoded in it (and we're not mirroring the hgcustom repo on the staging server). As I read it, the runtests.py script can run anywhere, though.

Could you sanity check against the server? It should be available at http://hgssh.stage.dmz.scl3.mozilla.com with an HTTP header of 'Host: hg.mozilla.org'.
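For illustration only, and assuming the staging webhead answers plain HTTP from wherever this runs: a minimal sketch of the sanity check being asked for, sending the production Host header to the staging box and printing any CORS header(s) on a pushlog response.

# Sketch: hit the staging webhead directly but send the production Host
# header, then inspect the CORS header(s) on a pushlog response.
import http.client

conn = http.client.HTTPConnection("hgssh.stage.dmz.scl3.mozilla.com")
conn.request("GET", "/mozilla-central/json-pushes",
             headers={"Host": "hg.mozilla.org"})
resp = conn.getresponse()
print(resp.status, resp.reason)
print(resp.headers.get_all("Access-Control-Allow-Origin"))
conn.close()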
runtests.py doesn't actually run tests against hg.mo. It does pull down a copy of the hg_templates repo (necessary for testing some of the functionality), but it can use a local clone as well. I ran the tests locally before checking this in and they passed, so that part is fine.

I can't seem to hit that testing server, after putting the following in my /etc/hosts:

199.101.28.20 hg.mozilla.org

$ curl http://hg.mozilla.org/mozilla-central/file/tip/README.txt
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /mozilla-central/file/tip/README.txt
on this server.</p>
<hr>
<address>Apache Server at hg.mozilla.org Port 80</address>
</body></html>
I realized we never actually fixed this. bkero: can you see if that staging server is configured properly?
Flags: needinfo?(bkero)
Assignee: server-ops-webops → nmaul
Plan of attack:

1) We'll add a Zeus TrafficScript rule that will check if the CORS header exists, and if not, set one (the check-then-set logic is sketched below for illustration). WHO: IT
2) The QA test can stop checking for it, because it will always be set on all requests. WHO: QA/RelEng/Eng (not entirely sure who)
3) The hg-pushlog extension can stop sending it, because the infra will be sending it. WHO: QA/RelEng/Eng (not entirely sure who)
4) We'll have Apache start sending this header, where it belongs. The TS rule from #1 will simply start being a no-op. WHO: IT
5) We'll disable the TS rule from #1 now that Apache is always sending it. WHO: IT
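Purely as an illustration of the check-then-set logic in step 1 (the real rule is Zeus TrafficScript, not Python): a WSGI-style sketch that adds the header only when the backend did not already send one, so /pushlog and /json-pushes responses are left alone.

# Illustration only: "set the CORS header if it is not already present",
# expressed as WSGI middleware rather than TrafficScript.
def cors_if_missing(app):
    def wrapped(environ, start_response):
        def checked_start_response(status, headers, exc_info=None):
            names = {name.lower() for name, _ in headers}
            if "access-control-allow-origin" not in names:
                headers = list(headers) + [("Access-Control-Allow-Origin", "*")]
            return start_response(status, headers, exc_info)
        return app(environ, checked_start_response)
    return wrapped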
(In reply to Jake Maul [:jakem] from comment #40)
> 2) The QA test can stop checking for it, because it will always be set on
> all requests. WHO: QA/RelEng/Eng (not entirely sure who)
>
> 3) The hg-pushlog extension can stop sending it, because the infra will be
> sending it. WHO: QA/RelEng/Eng (not entirely sure who)

I made these changes last year, in comment 36: http://hg.mozilla.org/hgcustom/pushlog/rev/0ffe19e2e343

That version is not deployed in production.
I have deployed the TrafficScript in comment 40, and verified that it works as expected. That is, it has no significant effect on hits to /repo/pushlog or /repo/json-pushes, and adds the proper header to all other requests.

As to comment 41, that rev looks great to me... very simple, r+ from me.

Next question: who can deploy this? Follow-up: what might we have to do to deploy it? Specifically, do we have to wait for a tree-closing window after the next release, or can we just do it anytime after business hours? Does anyone need to be notified? My guess is this amounts to nothing more than an Apache restart.
Clearing my needinfo on this, needinfoing :hwine as he usually dictates whether or not something needs a tree closing window.
Flags: needinfo?(bkero) → needinfo?(hwine)
Jake - I'm assuming rollback-if-surprised is also a quick config revert and restart. I'm also assuming you meant a graceful restart. :)

Given that, we just really need to coordinate with the sheriffs and buildduty in #releng, preferably at a time when a fair number of folks are around to detect and resolve any issue. Propose a time in email to sheriffs@ & release@ at least 24 hours in advance, then make it happen. We're in a "good part" of the release window now (a week or so before a TCW).
Flags: needinfo?(hwine) → needinfo?(nmaul)
Blocks: 1072923
Please can we give this a go soon? :-)

In the not so distant future we'll be switching off TBPL (bug 1054977); however, to do this we need to move mcMerge off the TBPL machine to somewhere else (bug 1050477). It would greatly simplify this move if we could remove the need for mcMerge's only server-side component (https://hg.mozilla.org/webtools/tbpl/file/default/mcmerge/php/getFlags.php), which is a workaround for the lack of public access-control headers.
Ownership of source control systems has changed hands a bit, and no longer falls under me. :bkero or :fubar are now better candidates for this. :)
Assignee: nmaul → server-ops-webops
Flags: needinfo?(nmaul) → needinfo?(bkero)
Component: WebOps: Source Control → Mercurial: hg.mozilla.org
Product: Infrastructure & Operations → Developer Services
This can be performed by pulling individual webheads (or sets of them) from the load balancer, then copying the new pushlog-feed script over and applying this diff (with a puppet run). If neither pushlog nor any other page is manually setting these headers, we shouldn't have a problem with duplicate headers anymore.

Index: httpd-hg.mozilla.org.conf
===================================================================
--- httpd-hg.mozilla.org.conf	(revision 94600)
+++ httpd-hg.mozilla.org.conf	(working copy)
@@ -14,6 +14,8 @@
     SetEnv HGENCODING UTF-8
     SetEnv LC_TYPE UTF-8
 
+    Header set Access-Control-Allow-Origin "*"
+
     WSGIDaemonProcess hg_mozilla_org processes=24 threads=1 maximum-requests=100 deadlock-timeout=30 user=hg group=hg display-name=hg_mozilla_org
     WSGIProcessGroup hg_mozilla_org
 
Flags: needinfo?(bkero)
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/270]
This appears to have already been deployed:

$ curl -i -X OPTIONS https://hg.mozilla.org/mozilla-central/raw-file/tip/config/milestone.txt
HTTP/1.1 200 Script output follows
Server: Apache/2.2.15 (Red Hat)
Vary: Accept-Encoding
Content-Type: text/plain; charset="UTF-8"
Date: Tue, 21 Oct 2014 17:14:37 GMT
Access-Control-Allow-Origin: *
...
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/270]