Closed
Bug 906882
Opened 11 years ago
Closed 6 years ago
XML sitemap missing from www.mozilla.org
Categories
(www.mozilla.org :: Pages & Content, enhancement, P2)
Tracking
(Not tracked)
RESOLVED DUPLICATE of bug 1369738
People
(Reporter: cmore, Unassigned)
Details
(Whiteboard: [kb=1128714] )
Attachments
(5 files, 2 obsolete files)
Based on the recommendations of our SEO audit and other research, we should create an XML sitemap that is easily crawlable by search engines and includes references to all languages.
Updated•11 years ago
Priority: -- → P2
Updated•11 years ago
Whiteboard: [kb=1086766]
Reporter
Comment 1•11 years ago
:kohei: What are your thoughts on making a dynamic sitemap.xml file for only the URLs/locales that are in bedrock, and having it expand over time as more pages and locales move into bedrock?
Comment 2•11 years ago
I think we can have a dynamic sitemap in some way... As more pages are migrated to Bedrock, the sitemap will become more complete. I'll find out how :) I found a useful script at http://djangosnippets.org/snippets/1434/
Assignee: nobody → kohei.yoshino
Status: NEW → ASSIGNED
Updated•11 years ago
Comment 3•11 years ago
The implementation was not difficult. My rough code is here, and an output is attached.
https://github.com/kyoshino/bedrock/commit/d0b485075d343c1e650842395dc637b4ee662a13

Issues:
* It takes about 30 seconds to respond. The translation list of each page is based on the template name, but there's no easy way to get the template name of each URL, so I had to send an HTTP request to each page.
* As you can see, the output is redundant. The file size will exceed the 50 MB limit in the future. https://support.google.com/webmasters/answer/183668?hl=en#1

Possible solution:
* Include only /en-US/ URLs in the sitemap. Search engines can still recognize each page's alternate URLs, which we already implemented in Bug 481550.
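To illustrate the redundancy described above, here is a minimal sketch (not the code from the linked commit) that emits one url entry per page per locale, each listing every locale as an xhtml:link alternate. It assumes every page exists in every locale; the function name build_sitemap and the sample paths and locales are hypothetical. With P pages and L locales it produces P × L entries and P × L² alternate links, which is why the file grows so quickly as locales are added.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML as "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(base_url, paths, locales):
    """Build a <urlset> in which every localized URL lists all its alternates.

    Hypothetical helper: assumes each path exists in every locale.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for path in paths:
        for locale in locales:
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = f"{base_url}/{locale}{path}"
            # Each entry repeats the full alternate list (including itself),
            # so link count grows with the square of the locale count.
            for alt in locales:
                ET.SubElement(url, f"{{{XHTML_NS}}}link", {
                    "rel": "alternate",
                    "hreflang": alt,
                    "href": f"{base_url}/{alt}{path}",
                })
    return urlset


if __name__ == "__main__":
    tree = build_sitemap("https://www.mozilla.org", ["/firefox/"], ["en-US", "de", "fr"])
    print(ET.tostring(tree, encoding="unicode"))
```

Listing only /en-US/ entries, as proposed above, removes the L factor from the entry count, but the alternates are then left for crawlers to discover from the pages themselves.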
Comment 4•11 years ago
Sent PR: https://github.com/mozilla/bedrock/pull/1217 To avoid the issues noted above, I only included English URLs. An output sitemap.xml file is attached.
Reporter
Comment 5•11 years ago
(In reply to Kohei Yoshino [:kohei] from comment #3)
> Created attachment 803081 [details]
> Full sitemap.xml including l10n
>
> The implementation was not difficult. My rough code is here, and an output
> is attached.
> https://github.com/kyoshino/bedrock/commit/d0b485075d343c1e650842395dc637b4ee662a13
>
> Issues:
>
> * It takes about 30 seconds to respond. The translation list of each page is
> based on the template name, but there's no easy way to get the template name
> of each URL. I had to send an HTTP request to each page.
>
> * As you can see, the output is redundant. The file size will exceed a 50 MB
> limit in the future.
> https://support.google.com/webmasters/answer/183668?hl=en#1
>
> Possible solution:
>
> * Including only /en-US/ URLs in the sitemap. Search engines can still
> recognize each page's alternate URLs that we already have implemented in Bug
> 481550.

Do you have a link, or can you attach an example of a complete sitemap.xml file that would include all locales? Over 50 MB? Wow.

An en-US-only sitemap.xml doesn't help SEO much at all.

jgmize had an idea: use sitemap pagination via Django's pagination feature.

jgmize and kohei, can you two sync up?
Reporter
Updated•11 years ago
Flags: needinfo?(kohei.yoshino)
Reporter
Updated•11 years ago
Flags: needinfo?(jmize)
Comment 6•11 years ago
(In reply to Chris More [:cmore] from comment #5)
> Do you have a link or can you attach an example complete sitemap.xml file
> that would include all locales? over 50MB? wow.

Attachment 803081 [details] in my comment 3 is a complete sitemap. It's still only 1.63 MB, but more and more pages are being migrated to and translated on Bedrock...

> en-US only sitemap.xml doesn't help SEO much at all.

Canonical URLs on each page might help, but of course a complete sitemap would be more helpful.

> jgmize had an idea: Use sitemap pagination and use the Django pagination
> feature.

I'll check it out this afternoon!
Flags: needinfo?(kohei.yoshino)
Comment 7•11 years ago
I just regenerated a complete sitemap. It's now 3.1 MB with 859 URLs. Next, I'll try to:
* Use a cron job to retrieve URLs, including localized pages
* Split the complete URL list by locale or into chunks of a fixed number of URLs, using a Sitemap index file: https://support.google.com/webmasters/answer/71453
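A minimal sketch of the index-file approach, assuming the complete URL list is already available (e.g. from the cron job mentioned above): the list is split into chunks no larger than the protocol's 50,000-URL-per-file limit, each chunk becomes its own urlset, and a sitemapindex points at all of them. The file names sitemap-N.xml and all helper names are hypothetical; splitting by locale instead would only change how the chunks are formed. Django's sitemap framework, mentioned later in this bug, provides its own pagination, so this is only a framework-free illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)

# Per-file URL limit from the sitemap protocol (files are also capped at 50 MB).
MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls, page_size=MAX_URLS_PER_SITEMAP):
    """Split a flat URL list into consecutive chunks of at most page_size."""
    return [urls[i:i + page_size] for i in range(0, len(urls), page_size)]


def build_urlset(urls):
    """Build one <urlset> containing a <url><loc> entry per URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    return urlset


def build_index(sitemap_urls):
    """Build a <sitemapindex> that points at each child sitemap file."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return index


def build_paginated_sitemaps(base_url, urls, page_size=MAX_URLS_PER_SITEMAP):
    """Return (index_element, {filename: urlset_element}) for all chunks."""
    files = {}
    for n, chunk in enumerate(split_urls(urls, page_size), start=1):
        files[f"sitemap-{n}.xml"] = build_urlset(chunk)
    # Index entries must be absolute URLs, in the same order as the files.
    index = build_index(f"{base_url}/{name}" for name in files)
    return index, files
```

Search engines would then be pointed at the index file only; each child file stays under the per-file limits no matter how many locales are added.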
Comment 8•11 years ago
Sent PR: https://github.com/mozilla/bedrock/pull/1333
Comment 9•11 years ago
Updated•11 years ago
Attachment #821100 -
Attachment description: pull reques → Pull Request on GitHub
Updated•11 years ago
Attachment #803297 -
Attachment is obsolete: true
Comment 10•11 years ago
Attachment #803081 -
Attachment is obsolete: true
Comment 11•11 years ago
Updated•11 years ago
Flags: needinfo?(jmize)
Whiteboard: [kb=1086766] → [kb=1128714]
Updated•11 years ago
Severity: normal → enhancement
Comment 12•11 years ago
Updated•11 years ago
Summary: Create a dynamic XML sitemap of top-level URLs in [Bedrock] → Create a dynamic XML sitemap of all indexable URLs in [Bedrock]
Reporter
Comment 13•11 years ago
Any update on the sitemap bug?
Comment 14•11 years ago
:cmore I replied on the github PR here: https://github.com/mozilla/bedrock/pull/1333#issuecomment-33157596
Reporter
Comment 15•10 years ago
All: Given everything else we are working on now, let's put this on hold until later in Q2. I still think it will help, but we have bigger priorities now.
Updated•7 years ago
Status: ASSIGNED → NEW
Reporter
Comment 16•7 years ago
Here's a good example of an XML sitemap for a website that has a lot of sub-sites with their own sub-navigation: https://www.apple.com/sitemap.xml
Comment 17•7 years ago
Now is a great time for us to resurrect this effort. It's high on the list of marketing priorities[0]. An optimal approach would:
* generate this sitemap from a more authoritative source than HTTP crawls (e.g. from bedrock itself)
* give us an opportunity to choose the priority of certain elements in the sitemap (e.g. Firefox marketing pages) in an effort to shape search results.

[0] https://docs.google.com/spreadsheets/d/1fizrZ92kNr6sJSMizxl343OF7F-BCJHEUG5TfWj1Gs8/edit#gid=466760365
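A minimal sketch of the second point, assuming priorities are assigned by URL path prefix: the longest matching prefix wins, and unmatched paths fall back to 0.5, the protocol's default. The prefix table (including /firefox/ at 0.9) and the function names are hypothetical, not an agreed marketing list. The sitemap protocol describes priority as a hint relative to other URLs on the same site and notes it is not likely to influence a page's position in search results, so it nudges crawlers rather than guaranteeing any ranking change.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)

DEFAULT_PRIORITY = 0.5  # default value defined by the sitemap protocol

# Hypothetical rules: map a path prefix to a priority between 0.0 and 1.0.
PRIORITY_RULES = {
    "/firefox/": 0.9,
    "/": DEFAULT_PRIORITY,
}


def priority_for(path, rules=PRIORITY_RULES):
    """Return the priority of the longest prefix in rules that matches path."""
    matches = [prefix for prefix in rules if path.startswith(prefix)]
    if not matches:
        return DEFAULT_PRIORITY
    return rules[max(matches, key=len)]


def build_prioritized_urlset(base_url, paths, rules=PRIORITY_RULES):
    """Build a <urlset> whose entries carry a <priority> from the rules."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for path in paths:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = base_url + path
        # Format with one decimal place, e.g. "0.9".
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority_for(path, rules):.1f}"
    return urlset
```

Because the rules live in one table, the priorities could be reviewed by marketing without touching the generation code.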
Reporter
Comment 18•7 years ago
Here's some related Django XML sitemap documentation that could be helpful here:
https://docs.djangoproject.com/en/dev/ref/contrib/sitemaps/
http://bookofstranger.com/implementing-sitemaps-in-django-for-dynamic-and-static-urls/
Reporter
Comment 19•7 years ago
One more thing: we also need to make sure the sitemap.xml is referenced from robots.txt (https://www.mozilla.org/robots.txt), i.e.:

Sitemap: https://www.mozilla.org/sitemap.xml

Please note that the sitemap URL in robots.txt should be a full absolute URL, not a relative one like the rest of the URLs in the file. See the examples at the bottom of https://www.apple.com/robots.txt and https://www.google.com/robots.txt.
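A minimal sketch, using only Python's standard library, of generating the robots.txt Sitemap line and checking it. It rejects relative sitemap URLs, per the note above, and reads the directive back with urllib.robotparser (RobotFileParser.site_maps() requires Python 3.8+). The robots_txt and sitemaps_declared helpers are hypothetical and do not reproduce the actual contents of www.mozilla.org/robots.txt.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_txt(sitemap_url, disallow=()):
    """Build a minimal robots.txt that declares an absolute sitemap URL."""
    parts = urlsplit(sitemap_url)
    if not (parts.scheme and parts.netloc):
        # Relative paths such as "/sitemap.xml" are not valid for Sitemap.
        raise ValueError(f"sitemap URL must be absolute: {sitemap_url!r}")
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallow]
    # The Sitemap directive is independent of User-agent groups.
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def sitemaps_declared(robots_text):
    """Return the sitemap URLs a standard robots.txt parser finds."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```

For example, robots_txt("https://www.mozilla.org/sitemap.xml") ends with the line "Sitemap: https://www.mozilla.org/sitemap.xml", while robots_txt("/sitemap.xml") raises ValueError.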
Comment 20•7 years ago
Doh, I totally missed the Django sitemap framework the last time I baked my pull request ;) I'm happy to work on this again, but my question now is: will the sitemap include all pages on Bedrock or only major pages? The purpose of Bug 1369738 is the latter, I guess...
Reporter
Comment 21•7 years ago
(In reply to Kohei Yoshino [:kohei] from comment #20)
> Doh, I totally missed the Django sitemap framework the last time I baked my
> pull request ;) I'm happy to work on this again but my question now is: will
> the sitemap include all pages on Bedrock or only major pages? The purpose of
> Bug 1369738 is the latter, I guess...

Peter German has worked on a spreadsheet to capture all of the URLs to be included in v1.0 of this sitemap: https://docs.google.com/spreadsheets/d/1Sq-o-R9XjO9VPKaOL-aprOIiWNgvFttZ8XHeSquAWH4/edit#gid=1400086798

Peter: what is the difference between this bug and bug 1369738? If there is no difference, we should keep this bug for historical context. If the bugs are different but related, they should be linked together and given distinct titles.
Flags: needinfo?(pgerman)
Comment 22•7 years ago
I was asked to create a new bug for this. I'll reference this for context.
Flags: needinfo?(pgerman)
Comment 23•7 years ago
new bug 1369738
Reporter
Updated•7 years ago
Summary: Create a dynamic XML sitemap of all indexable URLs in [Bedrock] → XML sitemap missing from www.mozilla.org
Comment 24•6 years ago
I think Bug 1369738 has covered this.