Add a clearly unofficial, unsupported webkit repo to searchfox dubbed "wubkat"
Categories: (Webtools :: Searchfox, enhancement)
Tracking: (Not tracked)
People: (Reporter: asuth, Assigned: asuth)
Attachments: (3 files)
My team has recently been running into the fact that webkit lacks something comparable to searchfox or chromium's https://source.chromium.org/chromium. Igalia has generously run https://webkit-search.igalia.com/ for some time, but it unfortunately seems to be under-resourced and is frequently not operational during Eastern Time work hours. (edit: Based on previous investigations, I erroneously thought that only limited semantic analysis was run and that there was not full blame because of the config scripts. Now that I've checked the server while it was operational, I see the blame bar is fully operational and semantic analysis for the branch in use seems to be running on everything, which is very exciting!)
To this end I'm planning to add a searchfox indexing job for webkit. I expect there could be a lot of interest in something like this, as I understand webkit to be a frequently-embedded web runtime. And I think it would be fantastic if our standing up an index of webkit provides positive externalities that can benefit the open source community at large.
That said, I'm very concerned about the potential for confusion about how supported such a webkit tree would be and the resulting burden on searchfox contributors. Searchfox is not under-resourced in terms of AWS machine-time (though we try very hard to be responsible with the AWS resources we use, including a major series of indexing optimizations I just landed that cut many indexing jobs' time in half), but we operate largely on a volunteer basis and the justification for spending any work-time on searchfox is largely about mozilla-central.
Currently, the only trees we index semantically for C++ are mozilla-central variants and nss (which also gets built as part of mozilla-central, and is extremely stable). The mozilla-central jobs all run their C++ language-specific analyses as part of the mozilla-central CI. Although the jobs are tier 2, sheriffs and developers actively help keep these jobs green (ex: bug 1768996 where :glandium provided a fix to searchfox's indexer), as the mozsearch indexer is part of the tier-1 in-tree builds even if its execution is tier 2. For webkit, we will be running the C++ analysis on one of our indexers, which has a significantly greater chance of breakage for many reasons; for example, the build script will need to install a bunch of dependencies at runtime, which will increase the chance of random failures.
My plan for making it clear that the webkit repo indexing is not a tier-1 or supported repo is to dub it "wubkat". My hope is that this will help set expectations appropriately while also maybe giving people a little chuckle.
There are obviously other options, like adding banners to the generated pages, but:
- This would be new development work.
- The primary goal here is to provide an identical searchfox experience for gecko developers trying to see what other browsers are doing, and adding a bunch of jarring annoyances is not helpful for that. Also, it's more likely one-off searchfox users would quickly pre-attentively filter out any such nag UI, whereas regular searchfox users would find the added nag UI very jarring as it would deviate from their visual muscle memory for searchfox.
If we find that adding the webkit repo to searchfox gains additional contributors who help pick up some of the maintenance load and/or can help us decrease the maintenance burden by having webkit's CI generate searchfox analysis upstream, then we can definitely consider renaming the repo "webkit". Alternatively, if there's upstream interest, maybe "webkit" could run its own officially supported mozsearch instance (like a better-resourced version of https://webkit-search.igalia.com/) and we could just redirect wubkat/webkit to that instance and stop running one on searchfox.
For prior art, note that emilio did an initial attempt at webkit support some years ago at https://github.com/emilio/webkit-index-config and I believe the Igalia maintainer built on this with https://github.com/dpino/webkit-index-config which I believe is what powers https://webkit-search.igalia.com/. My thanks to both for providing this groundwork, as the webkit docs definitely don't seem to treat linux as a particularly supported platform and so it was nice to have the extra context that makes it clear that the GTK port is what should be built.
Comment 1 • 2 years ago (Assignee)
The indexing job has successfully reached the point where it's indexing; hopefully it completes! Of course, https://webkit-search.igalia.com/ is now working for me quite nicely; I presume its re-indexing phase happens during my daytime work hours, when I'd checked a few times last week, and is done by my current-moment late-night hacking hours. Or maybe it was just having an off couple of days when I checked and it no longer goes offline when re-indexing? I certainly know how easy it is for these things to fall over and for it to take some time to get things fixed! (Thankfully the searchfox.org separate indexer and web-server life-cycles can help paper over this by just having a stale web-server continue to run until superseded by a successful run.)
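As a rough illustration of that last point, here is a minimal Python sketch of the "stale web-server keeps running until superseded by a successful run" idea. This is not searchfox's actual code: the names IndexRun, ServingState, on_run_finished, and simulate are all hypothetical, and the real life-cycle involves separate AWS machines rather than an in-process object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexRun:
    """Hypothetical record of one indexing attempt."""
    run_id: int
    succeeded: bool


class ServingState:
    """Tracks which index run the web-server is currently serving."""

    def __init__(self) -> None:
        self.active: Optional[IndexRun] = None

    def on_run_finished(self, run: IndexRun) -> bool:
        """Switch to the new run only if it succeeded and is newer.

        Returns True when the served index was replaced. A failed run
        leaves the previous (stale) index in place.
        """
        if not run.succeeded:
            return False
        if self.active is not None and run.run_id <= self.active.run_id:
            return False
        self.active = run
        return True


def simulate(outcomes: List[bool]) -> List[Optional[int]]:
    """Return the run id being served after each indexing attempt."""
    state = ServingState()
    served: List[Optional[int]] = []
    for run_id, ok in enumerate(outcomes, start=1):
        state.on_run_finished(IndexRun(run_id, ok))
        served.append(state.active.run_id if state.active else None)
    return served
```

In this sketch, a sequence of runs where the second and third fail keeps serving run 1 until run 4 succeeds, which is the behavior that hides indexer breakage from users at the cost of staleness.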
The webkit-search.igalia.com server does seem to be doing full semantic indexing now (versus the last time I'd checked some time ago, when it was only semantically indexing JSC?), but it is missing "structured" analysis records, which means the mozsearch branch in use is older than the landing of bug 1641372 in late August 2021. So it probably makes sense to move forward with the wubkat job for now in the interest of having a fully up-to-date mozsearch instance for the ongoing searchfox-tool/pipeline-server development, but maybe once that is stabilized, and if the Igalia server has continuous uptime, we can retire the wubkat indexing job again.
Comment 2 • 2 years ago (Assignee)
Comment 3 • 2 years ago (Assignee)
Comment 4 • 2 years ago (Assignee)
Comment 5 • 2 years ago (Assignee)
This is now landed and exists as config5 on release5. The following additional steps can happen as follow-ups as/if they make sense:
- List the repo on the root searchfox.org HTML listing (in https://github.com/mozsearch/mozsearch-mozilla/blob/master/help.html).
- Add the cron hookup via lambda / cloudwatch. For now this is just going to be manually re-triggered, like config3.
I'd like to better understand the webkit-search availability before cranking the indexing interval up or making this more visible on the listing page.