Open Bug 1733217 Opened 3 years ago Updated 2 years ago

Provide a way to locate member fields by type, including those used inside smart pointers like nsICOMPtr, RefPtr, SafeRefPtr

Categories

(Webtools :: Searchfox, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)

References

Details

:smaug has a use case that currently involves doing (what ends up being) a text search on nsCOMPtr<nsILoadInfo> m. This hinges on the naming convention of prefixing member fields with m.

We briefly discussed this in channel and tentatively it seems like the UX flow people would expect would be to first start with locating the type nsILoadInfo and then refining from there.

Data

Data we currently have

Currently, for this matching definition line:

nsCOMPtr<nsILoadInfo> mLoadInfo;

The analysis data we have is:

Target records:

{"loc":"00046:11-22","target":1,"kind":"use","pretty":"nsILoadInfo","sym":"T_nsILoadInfo","context":"nsIconChannel","contextsym":"T_nsIconChannel"}
{"loc":"00046:2-10","target":1,"kind":"use","pretty":"nsCOMPtr","sym":"T_nsCOMPtr","context":"nsIconChannel","contextsym":"T_nsIconChannel"}
{"loc":"00046:24-33","target":1,"kind":"def","pretty":"nsIconChannel::mLoadInfo","sym":"F_<T_nsIconChannel>_mLoadInfo","context":"nsIconChannel","contextsym":"T_nsIconChannel","peekRange":"46-46"}

Source records:

{"loc":"00046:2-10","source":1,"syntax":"type,use","pretty":"type nsCOMPtr","sym":"T_nsCOMPtr"}
{"loc":"00046:11-22","source":1,"syntax":"type,use","pretty":"type nsILoadInfo","sym":"T_nsILoadInfo","type":"class nsILoadInfo","typesym":"T_nsILoadInfo"}
{"loc":"00046:24-33","source":1,"syntax":"def,field","pretty":"field nsIconChannel::mLoadInfo","sym":"F_<T_nsIconChannel>_mLoadInfo","type":"nsCOMPtr<class nsILoadInfo>","typesym":"T_nsCOMPtr"}

Structured record for the type directly:

{"loc":"00046:24-33","structured":1,"pretty":"nsIconChannel::mLoadInfo","sym":"F_<T_nsIconChannel>_mLoadInfo","kind":"field","parentsym":"T_nsIconChannel"}

Structured sub-record for the containing type in the "fields" array:

{"offsetBytes":72,"pretty":"nsIconChannel::mLoadInfo","sizeBytes":8,"sym":"F_<T_nsIconChannel>_mLoadInfo","type":"nsCOMPtr<class nsILoadInfo>","typesym":"T_nsCOMPtr"}

Implementation Options

Option: Introduce use sub-kinds for target records and support faceting on them?

The sole target record we have for the line of interest in the example is:

{"loc":"00046:11-22","target":1,"kind":"use","pretty":"nsILoadInfo","sym":"T_nsILoadInfo","context":"nsIconChannel","contextsym":"T_nsIconChannel"}

I'm reasonably confident we could annotate that use as happening inside a field definition. We could also indicate when it's used in a local definition. Maybe we could even do a first attempt at bug 1419018's wish of attempting to indicate LHS/RHS usages.

Option: Structured analysis type refinement via use in templates

This is somewhat orthogonal to the above use-classification (which is probably what we should do), but it could make sense for the structured analysis data to have relationships that indicate:

  • The type nsILoadInfo is used in an nsCOMPtr template expansion which has type T_whatever and you can click to see the uses of this specific type.
    • This specific type would then again be filterable by the previous options ability to distinguish field uses from local uses and such.
    • It would also depend on us actually exposing the template's use on that line.
      • A meta-question here is whether the existing use should be transformed to only be the nsCOMPtr<T=nsILoadInfo> use or if we would emit both. My current thoughts are:
        • only emitting the template usage would be over-normalization and it's better for us to probably do what we're currently doing which could be thought of as de-normalized, and just make sure we can easily filter out classes of results that people don't want
        • any rationale around this should probably be aware of our smart pointer classes being implementation details that are usually not interesting on their own.
        • performance data at the granularity of text search/identifier lookup/symbol lookup/json-eval should be provided in the search results (or in a parallel resulting data structure accessible from the web) and drive these decisions, as I don't think these costs are at all intuitive. Also, in my attempts to address the changes made to how we handle overrides it's clear we potentially need to do a bunch of related look-ups and those costs definitely need to be known as we move forward with addressing those.

WIP Progress

During lunch I tried to hack up a quick prototype. This branch resulting in this m-c try run (I haven't stood up a searchfox indexing of the run) nets us "usage":"field" inside:

{"loc":"00046:11-22","target":1,"kind":"use","pretty":"nsILoadInfo","sym":"T_nsILoadInfo","usage":"field","context":"nsIconChannel","contextsym":"T_nsIconChannel"}

The changes also provide "lhs"/"rhs" annotations too for expressions but I don't have a good scenario to look into for those.

:kats, with no rush, any thoughts on the whole use-case here and in particular maybe better ways to propagate the info in the indexer than my abuse of the new flags mechanism? (edit: also, maybe better names than "usage"? "syntax" theoretically has some of the soup-like qualities I was thinking of for this case... I don't know if there should be a unification with that possibly.)

Flags: needinfo?(kats)

Honestly I don't have much in the way of useful opinions here. Given the part of the problem statement that's "provide a way to filter for member fields" the solution you have seems to be quite reasonable. In general providing extra information in the analysis record seems good.

One thought was that instead of having a separate "usage":"field" we could consider extending the "kind":"use" to "kind":"use,field" or more generally, a list of extra information. I don't have any good arguments for this other than saying a token is a "field" seems like it's the "kind" of token it is, and we might want to allow an arbitrary number of "kind"s for a token. But I don't recall what the consumer side of the analysis data looks like and if that would make things more messy.

Flags: needinfo?(kats)

Thanks for the feedback!

Re: kind as list of tokens: Yeah, that was also an option I was considering, but for the prototype, changing the singular field to a list would have been more work in terms of plumbing without regressing things if I had made it to the next steps of plumbing. Source record "syntax" fields are basically like that, but they're just a grab-bag that's propagated to the DOM for styling purposes AIUI. We also (alphabetically) sort and dedup them in read_source and when merging.

The target record kind currently defines where the record will be placed on the resulting crossref object for the given symbol, and that in turn determines which group of def/decl/use/etc, the result will show up in as search results. So for that purpose it needs to be mutually exclusive.

Ordered Lists

We could use a list where the order is semantically important; for example, I seem to have tried to do that in the merge step for source records, but that feels awkward.

Hierarchy combined with Label/Tag Grab Bag

Another option is to model the kinds hierarchically in a self-descriptive way so we'd have "use::field-def"/"use::type::field", "use::local-def"/"use::type::local", "use::func-signature", "use::expr-lval"/"use::expr::lval", and so on. The advantage of the self-descriptive hierarchy is that this would let us partition the data without having to have a complete mapping that tells us that "expr-lval" is a label on "use". The mutually exclusive aspect of the current "kind" mechanism could even be handled by throwing "kind::" on the front either on its own so we have ["kind::use", "use::expr::lval"] or in the big hierarchy with ["kind::use::expr::lval"].

Meta Q: Related Use-Cases

I guess the real problem is I don't know the end set of use-cases we might have for this functionality beyond the current limited problem statement. Right now, all of the additional labels I've added to the clang indexer are mutually exclusive.

Summary / Next Steps

I think this probably needs more thought, and from an implementation perspective it's likely that efforts like semantic linkage in bug 1727789 and continued refining of partially-regressed use cases relating to searching with overrides in bug 1729164 may provide more clarity on the appropriate implementation decisions. Since this bug's use-case is primarily just a question of filtering existing results, it's much simpler than the modeling/performance problems of those bugs.

It's probably also useful to think of the variations for other languages, specifically JS and rust.

If anyone has any additional use-cases in this space of sub-categorization of uses and related filtering, I'd definitely be interested in hearing them.

You need to log in before you can comment on or make changes to this bug.