Closed
Bug 1276190
Opened 9 years ago
Closed 8 years ago
Add script to generate headers with scalar data from Scalars.yaml
Categories
(Toolkit :: Telemetry, defect, P1)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla50
People
(Reporter: gfritzsche, Assigned: Dexter)
References
(Blocks 1 open bug)
Details
(Whiteboard: [measurement:client])
Attachments
(4 files, 18 obsolete files)
We want to add a python script, similar to histogram_tools et al, that generates the headers with scalar probe information.
We need:
(1) a header with enums
(2) a header with an array of scalar data (type, data-set, expiry, ...)
(3) a header with a string table for scalar names (for enum <-> name lookups, could be in (2))
Lets start off Scalars.yaml for this with test probes for the initially supported types etc.
Reporter | ||
Updated•9 years ago
|
Priority: -- → P3
Whiteboard: [measurement:client]
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → alessio.placitelli
Priority: P3 → P1
Assignee | ||
Comment 1•9 years ago
|
||
This patch:
- Adds the Scalars.yaml that defines the test scalars
- Adds a script that generates the Scalar C++ enums
- Adds a script that generates the Scalar C++ structs (incomplete)
Assignee | ||
Comment 2•8 years ago
|
||
Attachment #8758347 -
Attachment is obsolete: true
Attachment #8758766 -
Flags: feedback?(gfritzsche)
Assignee | ||
Comment 3•8 years ago
|
||
Assignee | ||
Comment 4•8 years ago
|
||
Reporter | ||
Comment 5•8 years ago
|
||
Comment on attachment 8758766 [details] [diff] [review]
bug1276190.patch
Review of attachment 8758766 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/Scalars.yaml
@@ +1,2 @@
> +# Scalar Prototype
> +test_scalar_prototype:
We want to use a fixed two-level structure here:
> section.name:
> probe_name:
> ...
There should be no top-level probes.
@@ +120,5 @@
> + kind: int
> + notification_emails:
> + - fake@email.address
> +
> +test.flat:
Let's lead with a good example and group our test probes into the "telemetry.test" section.
::: toolkit/components/telemetry/gen-scalar-data.py
@@ +47,5 @@
> + print("const ScalarInfo gScalars[] = {", file=output)
> + for s in scalars:
> + # We add both the scalar name and the expiration string to the strings
> + # table.
> + name_index = string_table.stringIndex(s.name)
If we would want to match the enum-name here for searchability, we'd need to use upper-case + '_' for both.
On the other hand, this would loose us the '.'-separated names in the serializations, which means we couldn't easily display the data hierarchically without looking at Scalars.yaml.
Hm.
::: toolkit/components/telemetry/gen_utils.py
@@ +1,1 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
Can we name this better?
@@ +6,5 @@
> +# scripts.
> +
> +from __future__ import print_function
> +
> +class StringTable:
This needs commenting.
::: toolkit/components/telemetry/moz.build
@@ +59,5 @@
>
> GENERATED_FILES = [
> 'TelemetryHistogramData.inc',
> 'TelemetryHistogramEnums.h',
> + 'TelemetryScalarData.inc',
This is a confusing approach, let's do this in a proper C++ style now:
* define the scalar info structure in a separate header
* let the script generate TelemetryScalarData.h which includes that header
* let TelemetryScalar.cpp properly include TelemetryScalarData.h at the top
::: toolkit/components/telemetry/scalar_tools.py
@@ +1,1 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
Lets name this a bit more descriptively: parse_scalars.py, load_scalars.py, ...?
@@ +5,5 @@
> +import re
> +import yaml
> +
> +# The required and optional fields in a scalar type definition.
> +REQUIRED_FIELDS = {
With the fixed two-level structure can move this and the following definitions into the validation function.
@@ +41,5 @@
> + def __init__(self, name, definition):
> + # Set the name, so we don't need to pass it to the validation functions.
> + self._name = name
> +
> + # Validating the scalar definition.
Let's also validate the group and probe name:
* group name should be alpha-numeric + '.', no leading digit
* probe name should be alpha-numeric + '_', no leading digit
@@ +87,5 @@
> + if not isinstance(definition[f], REQUIRED_FIELDS[f])]
> + if len(wrong_field_types) > 0:
> + raise TypeError(self._name + ' - wrong fields types: ' + ', '.join(wrong_field_types))
> +
> + # If we have any optional field, make sure it has the correct type.
We don't need to type-check optional fields separately, lets do one common type check after checking for required + optional properties.
@@ +236,5 @@
> + # we assume this is a group and the scalar definitions are one level below.
> + scalar_group = scalars[scalar_name]
> + for sub_name in scalar_group:
> + grouped_scalar_name = scalar_name + '.' + sub_name
> + yield ScalarType(grouped_scalar_name, scalar_group[sub_name])
With the fixed two-level structure, we can keep the group and probe name seperately.
We can join them when needed.
Attachment #8758766 -
Flags: feedback?(gfritzsche)
Assignee | ||
Comment 6•8 years ago
|
||
Attachment #8759082 -
Attachment is obsolete: true
Assignee | ||
Comment 7•8 years ago
|
||
Attachment #8759083 -
Attachment is obsolete: true
Assignee | ||
Comment 8•8 years ago
|
||
Attachment #8758766 -
Attachment is obsolete: true
Attachment #8759215 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 9•8 years ago
|
||
Given PEP0008 [1], we should probably rename the "gen-scalar*.py" to "gen_scalar_*.py".
> Modules should have short, all-lowercase names. Underscores can be used in the module name
> if it improves readability. Python packages should also have short, all-lowercase names,
> although the use of underscores is discouraged.
[1] - https://www.python.org/dev/peps/pep-0008/#package-and-module-names
Reporter | ||
Updated•8 years ago
|
Attachment #8759213 -
Attachment mime type: text/x-chdr → text/plain
Reporter | ||
Updated•8 years ago
|
Attachment #8759212 -
Attachment mime type: text/x-chdr → text/plain
Reporter | ||
Comment 10•8 years ago
|
||
Comment on attachment 8759215 [details] [diff] [review]
bug1276190.patch
Review of attachment 8759215 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/Scalars.yaml
@@ +1,1 @@
> +telemetry.test:
We can add comments, so let's add one before this section to explain that it's about test probes (and that the probes in this section will not be submitted in pings outside of tests).
@@ +1,3 @@
> +telemetry.test:
> + # Scalar Prototype
> + test_scalar_prototype:
That becomes: TELEMETRY_TEST_TEST_...
I think one TEST is enough.
Let's not add a sample just for copy-n-paste here, there will be documentation soon and other entries to copy from.
@@ +14,5 @@
> + - fake@email.address
> + release_channel_collection: opt-in # Optional, "opt-in" (default) or "opt-out"
> +
> + # Test Scalar Types
> + test_int_kind:
We decided to start out with uint.
@@ +21,5 @@
> + description: >
> + This is a test int type with a really long description, maybe spanning even multiple
> + lines, to just prove a point: everything works just fine.
> + expires: never
> + keyed: false
This should be optional and default to false.
@@ +24,5 @@
> + expires: never
> + keyed: false
> + kind: int
> + notification_emails:
> + - fake@email.address
You can use our telemetry-client-dev address.
@@ +100,5 @@
> + notification_emails:
> + - fake@email.address
> + release_channel_collection: opt-out
> +
> + test_flat:
This seems redundant - left-over?
::: toolkit/components/telemetry/gen-scalar-data.py
@@ +10,5 @@
> +
> +import parse_scalars
> +import sys
> +
> +# The banner/text at the top of the
the... ?
@@ +11,5 @@
> +import parse_scalars
> +import sys
> +
> +# The banner/text at the top of the
> +banner = """/* This file is auto-generated, see gen-scalar-data.py. */
This should have a disclaimer that it's only for internal use in TelemetryScalar.h.
@@ +47,5 @@
> + print("const ScalarInfo gScalars[] = {", file=output)
> + for s in scalars:
> + # We add both the scalar name and the expiration string to the strings
> + # table.
> + name_index = string_table.stringIndex(s.name)
This seems to use only probe_name, not group.name.probe_name
@@ +53,5 @@
> + # Write the scalar info entry.
> + write_scalar_info(s, output, name_index, exp_index)
> + print("};", file=output)
> +
> + string_table_name = "gScalarsStringTable"
Ditto on naming.
::: toolkit/components/telemetry/gen-scalar-enum.py
@@ +24,5 @@
> + print("#ifndef mozilla_TelemetryScalarEnums_h", file=output);
> + print("#define mozilla_TelemetryScalarEnums_h", file=output);
> + print("namespace mozilla {", file=output)
> + print("namespace Telemetry {", file=output)
> + print("enum SCALAR_ID : uint32_t {", file=output)
All upper case is for constants; let's use `ScalarID`.
::: toolkit/components/telemetry/parse_scalars.py
@@ +10,5 @@
> +SCALAR_TYPES_MAP = {
> + 'bool': 'nsITelemetry::SCALAR_BOOLEAN',
> + 'flag': 'nsITelemetry::SCALAR_FLAG',
> + 'int': 'nsITelemetry::SCALAR_COUNT',
> + 'string': 'nsITelemetry::SCALAR_STRING'
We decided to start out with `uint` and `string`. Lets not allow anything else.
@@ +27,5 @@
> + # Validating the scalar definition.
> + self.validate_types(definition)
> + self.validate_values(definition)
> +
> + # Everything is ok, set the rest of the data.
The following are all just copying things from `definition` plus some default values.
Can't we just store `definition` and access it in the getters?
@@ +56,5 @@
> + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$'
> + if not re.match(chars_regxp, name):
> + raise TypeError(name + ' should be alpha-numeric + \'{}\''.format(allowed_char_regexp))
> +
> + # Don't allow leading digits, '.' or '_'.
Also not trailing.
@@ +96,5 @@
> + }
> +
> + # Concatenate the required and optional field definitions.
> + ALL_FIELDS = \
> + {k: v for d in (REQUIRED_FIELDS, OPTIONAL_FIELDS) for k, v in d.iteritems()}
Given two dicts a and b, the clearer pattern here is:
c = a.copy()
c.update(b)
@@ +109,5 @@
> + if len(unknown_fields) > 0:
> + raise KeyError(self._name + ' - unknown fields: ' + ', '.join(unknown_fields))
> +
> + # Checks the type for all the required fields.
> + wrong_field_types = [f for f in definition.keys() if not isinstance(definition[f], ALL_FIELDS[f])]
Nit: this is confusingly named, it sounds like it contains the type names.
@@ +111,5 @@
> +
> + # Checks the type for all the required fields.
> + wrong_field_types = [f for f in definition.keys() if not isinstance(definition[f], ALL_FIELDS[f])]
> + if len(wrong_field_types) > 0:
> + raise TypeError(self._name + ' - wrong fields types: ' + ', '.join(wrong_field_types))
histogram_tools.py tells us what the expected type is - let's do that here too.
@@ +118,5 @@
> + list_fields = [f for f in definition if isinstance(definition[f], list)]
> + for field in list_fields:
> + broken_types = [v for v in definition[field] if not isinstance(v, LIST_FIELDS_CONTENT[field])]
> + if len(broken_types) > 0:
> + raise TypeError(self._name + ' - wrong fields types: ' + ', '.join(broken_types))
Can we drop a hint on the types?
@@ +213,5 @@
> + """ Validate the expiration version and account for 'a1'. This function may change
> + the 'expires' field in the definition.
> +
> + :param definition: the dictionary containing the scalar properties.
> + :todo: this was taken from histogram_tools.py. Maybe we should share the code?
Yes - make things pass the actual expiration value and make it shared - lets just expand shared_header_utils to shared_telemetry_utils?
We will have similar sharing needs with the histogram code for the C++ parts.
@@ +244,5 @@
> + raise Exception('Error parsing scalars in ' + filename + ': ' + e.message)
> +
> + # TODO: Should we make sure scalars_yaml is an OrderedDict? The would probably
> + # violate the YAML spec (see http://pyyaml.org/ticket/29#comment:4). That said,
> + # it seems to be important for us given the comments in histogram_tools.py.
Let's visit that wen we actually run into it - right now we only parse from one file anyway.
@@ +251,5 @@
> + # The first level contains the group name, while the second level contains the
> + # probe name (e.g. "group.name: probe: ...").
> + for group_name in scalars:
> + group = scalars[group_name]
> + for probe_name in group:
We should assert that `group` has at least one child.
@@ +256,5 @@
> + # We found a scalar type. Go ahead and parse it.
> + scalar_info = group[probe_name]
> + yield ScalarType(group_name, probe_name, scalar_info)
> +
> +def load_scalars(filenames):
We don't need to support loading from more than one file right.
Let's just restrict this to one file and come back to it when actually needed.
Attachment #8759215 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 11•8 years ago
|
||
Attachment #8759213 -
Attachment is obsolete: true
Assignee | ||
Comment 12•8 years ago
|
||
Attachment #8759212 -
Attachment is obsolete: true
Assignee | ||
Comment 13•8 years ago
|
||
Attachment #8759215 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8760187 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 14•8 years ago
|
||
Comment on attachment 8760187 [details] [diff] [review]
bug1276190.patch
Review of attachment 8760187 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/gen-scalar-enum.py
@@ +24,5 @@
> + print("#ifndef mozilla_TelemetryScalarEnums_h", file=output);
> + print("#define mozilla_TelemetryScalarEnums_h", file=output);
> + print("namespace mozilla {", file=output)
> + print("namespace Telemetry {", file=output)
> + print("enum Scalar_ID : uint32_t {", file=output)
This should be an |enum class Scalar_ID|...
Reporter | ||
Updated•8 years ago
|
Attachment #8760185 -
Attachment mime type: text/x-chdr → text/plain
Reporter | ||
Updated•8 years ago
|
Attachment #8760186 -
Attachment mime type: text/x-chdr → text/plain
Reporter | ||
Comment 15•8 years ago
|
||
Comment on attachment 8760187 [details] [diff] [review]
bug1276190.patch
Review of attachment 8760187 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/Scalars.yaml
@@ +1,1 @@
> +telemetry.test:
Let's add an explanatory comment on what the file is for at the top (we will update it with documentation links later).
As mentioned before, add an explanatory comment before this section, that this is for probes testing the Telemetry system.
For probe coverage, i'm missing an expired and unexpired test probe below.
@@ +6,5 @@
> + description: >
> + This is a test uint type with a really long description, maybe spanning even multiple
> + lines, to just prove a point: everything works just fine.
> + expires: never
> + kind: uint
I think we don't need to keep this short here, so readability - e.g. `unsigned_int` - is probably preferred?
@@ +19,5 @@
> + kind: string
> + notification_emails:
> + - telemetry-client-dev@mozilla.com
> +
> + keyed_uint:
We don't want to support keyed histograms in the first stage, so lets not allow them for now.
::: toolkit/components/telemetry/gen-histogram-data.py
@@ +13,5 @@
> import itertools
>
> banner = """/* This file is auto-generated, see gen-histogram-data.py. */
> """
>
Nit: Let's clean out this character while we're here.
@@ +42,3 @@
> static_assert(output, "sizeof(%s) <= UINT32_MAX" % strtab_name,
> "index overflow")
>
Ditto.
::: toolkit/components/telemetry/gen-scalar-data.py
@@ +30,5 @@
> +
> + print(" {{ {}, {}, {}, {}, {} }},"\
> + .format(scalar.nsITelemetry_kind, name_index, expiration_index,
> + scalar.dataset,
> + "true" if scalar.keyed else "false"),
Nit: Let's have each argument on a new line here, that will be more readable than the mixed style.
@@ +68,5 @@
> + print(banner, file=output)
> + print("#ifndef mozilla_TelemetryScalarData_h", file=output);
> + print("#define mozilla_TelemetryScalarData_h", file=output);
> + print("#include \"ScalarInfo.h\"", file=output);
> + print("namespace {", file=output)
Nit: Why not use two multi-line strings, prefix & suffix?
::: toolkit/components/telemetry/gen-scalar-enum.py
@@ +24,5 @@
> + print("#ifndef mozilla_TelemetryScalarEnums_h", file=output);
> + print("#define mozilla_TelemetryScalarEnums_h", file=output);
> + print("namespace mozilla {", file=output)
> + print("namespace Telemetry {", file=output)
> + print("enum Scalar_ID : uint32_t {", file=output)
Remember the C++ naming conventions:
`enum class ScalarID ...`
::: toolkit/components/telemetry/parse_scalars.py
@@ +41,5 @@
> + def check_name(name, allowed_char_regexp):
> + # Check if we only have the allowed characters.
> + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$'
> + if not re.search(chars_regxp, name):
> + raise TypeError(name + ' should be alpha-numeric + \'{}\''.format(allowed_char_regexp))
Nit: "must be"
Also, let's change the format a bit to mention what kind of property failed the check, e.g.:
"group name must be ..., got: X"
@@ +45,5 @@
> + raise TypeError(name + ' should be alpha-numeric + \'{}\''.format(allowed_char_regexp))
> +
> + # Don't allow leading digits, '.' or '_'.
> + if re.search(r'(^[\d\._])|([\d\._])$', name):
> + raise TypeError(name + ' should have a leading/trailing digit, a dot or underscore.')
Nit: "must not have"
Ditto on the format, e.g.: "group name must ..., got: X"
@@ +70,5 @@
> + 'notification_emails': list # This contains strings. See LIST_FIELDS_CONTENT.
> + }
> +
> + OPTIONAL_FIELDS = {
> + 'keyed': bool,
We don't support keyed scalars yet.
@@ +102,5 @@
> + if len(wrong_type_names) > 0:
> + raise TypeError(self._name + ' - ' + ', '.join(wrong_type_names))
> +
> + # Check that data in the lists have the correct types.
> + list_fields = [f for f in definition if isinstance(definition[f], list)]
We need to enforce non-empty lists.
@@ +116,5 @@
> + :param definition: the dictionary containing the scalar properties.
> + :raises ValueError: if a scalar definition field contains an unexpected value.
> + """
> +
> + definition['expires'] = validate_expiration(definition['expires'])
Overwriting this here is unexpected.
This function is also not named clearly, it doesn't just validate.
@@ +125,5 @@
> + raise ValueError(self._name + ' - unknown scalar kind: ' + scalar_kind)
> +
> + # Validate the collection policy.
> + collection_policy = definition.get('release_channel_collection', 'opt-in')
> + if collection_policy not in ['opt-in', 'opt-out']:
Nit: Why do the check even if it's not present?
@@ +171,5 @@
> +
> + @property
> + def keyed(self):
> + """Return true if the scalar is keyed"""
> + return self._definition.get('keyed', False)
Not yet.
@@ +246,5 @@
> + :param filenames: the YAML files containing the scalars definition. It will only
> + load the first file.
> + """
> + if len(filenames) > 1:
> + raise Exception('We don\'t support loading from more than one file.')
Why do we accept a list then?
Let's just accept a single file name.
Attachment #8760187 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 16•8 years ago
|
||
Thanks for the review, Georg.
Attachment #8760187 -
Attachment is obsolete: true
Attachment #8761096 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 17•8 years ago
|
||
Attachment #8760185 -
Attachment is obsolete: true
Assignee | ||
Comment 18•8 years ago
|
||
Attachment #8760186 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8761096 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 19•8 years ago
|
||
Attachment #8761097 -
Attachment is obsolete: true
Assignee | ||
Comment 20•8 years ago
|
||
Attachment #8761098 -
Attachment is obsolete: true
Assignee | ||
Comment 21•8 years ago
|
||
Changed "is" (wrong) to "==" when checking the policy.
Attachment #8761096 -
Attachment is obsolete: true
Attachment #8761150 -
Flags: review?(gfritzsche)
Reporter | ||
Comment 22•8 years ago
|
||
Comment on attachment 8761150 [details] [diff] [review]
bug1276190.patch
Review of attachment 8761150 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/Scalars.yaml
@@ +1,2 @@
> +# This file contains a definition of the scalar probes that are available for use
> +# within the client.
"This file contains a definition of the scalar probes that are recorded in Telemetry.
They are submitted with the "main" pings and can be inspected in about:telemetry."
@@ +1,4 @@
> +# This file contains a definition of the scalar probes that are available for use
> +# within the client.
> +
> +# The following section is for probes testing the Telemetry system.
+ "They will not be submitted in pings and are only used for testing."
@@ +26,5 @@
> + expired:
> + bug_numbers:
> + - 1276190
> + description: This is an expired testing scalar; not meant to be touched.
> + expires: 4.0a1
As mentioned, you also need to add an unexpired probe with a specific "expires" version (just put it far into the future).
::: toolkit/components/telemetry/gen-scalar-data.py
@@ +67,5 @@
> + if len(filenames) > 1:
> + raise Exception('We don\'t support loading from more than one file.')
> + scalars = parse_scalars.load_scalars(filenames[0])
> +
> + file_header = """\
For consistence, shouldn't we have `banner` next to these?
Either move `banner` here or these up.
@@ +81,5 @@
> + """
> +
> + # Write the scalar data file.
> + print(banner, file=output)
> + print(dedent(file_header), file=output)
Why do we need this and import `dedent` etc. when you just can directly make `file_header` contain what we want to print?
::: toolkit/components/telemetry/gen-scalar-enum.py
@@ +21,5 @@
> + if len(filenames) > 1:
> + raise Exception('We don\'t support loading from more than one file.')
> + scalars = parse_scalars.load_scalars(filenames[0])
> +
> + file_header = """\
Ditto on `dedent` and moving `banner`.
::: toolkit/components/telemetry/parse_scalars.py
@@ +42,5 @@
> + def check_name(name, error_msg_prefix, allowed_char_regexp):
> + # Check if we only have the allowed characters.
> + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$'
> + if not re.search(chars_regxp, name):
> + raise TypeError(error_msg_prefix + ' name must be alpha-numeric. Got: \'{}\''.format(name))
Why not use "" and avoid escaping?
Ditto below.
@@ +51,5 @@
> + ' name must not have a leading/trailing digit, a dot or underscore. Got: \'{}\''\
> + .format(name))
> +
> + check_name(group_name, "Group", r'\.')
> + check_name(probe_name, "Probe", r'_')
Nit: Randomly mixed "" & ''.
@@ +114,5 @@
> + # Check the type of the list content.
> + broken_types = ['{} must be {}'.format(v, LIST_FIELDS_CONTENT[field].__name__) \
> + for v in definition[field] if not isinstance(v, LIST_FIELDS_CONTENT[field])]
> + if len(broken_types) > 0:
> + raise TypeError(self._name + ' - ' + field + ': ' + ', '.join(broken_types))
This needs a more descriptive error message.
We want to message that field F for probe P must only contain values of type T.
Attachment #8761150 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 23•8 years ago
|
||
Attachment #8761150 -
Attachment is obsolete: true
Attachment #8761348 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #22)
> Comment on attachment 8761150 [details] [diff] [review]
> bug1276190.patch
> @@ +26,5 @@
> > + expired:
> > + bug_numbers:
> > + - 1276190
> > + description: This is an expired testing scalar; not meant to be touched.
> > + expires: 4.0a1
>
> As mentioned, you also need to add an unexpired probe with a specific
> "expires" version (just put it far into the future).
Yeah, sorry, I thought you meant something else.
> @@ +81,5 @@
> > + """
> > +
> > + # Write the scalar data file.
> > + print(banner, file=output)
> > + print(dedent(file_header), file=output)
>
> Why do we need this and import `dedent` etc. when you just can directly make
> `file_header` contain what we want to print?
I did that to deal with indentation in the multi line strings. Now that I've moved file_* near banner at the top of the file, that's no longer needed. I've removed dedent.
Status: NEW → ASSIGNED
Reporter | ||
Updated•8 years ago
|
Points: --- → 2
Reporter | ||
Comment 25•8 years ago
|
||
One thing i missed before:
We should enforce a maximum string length on group & probe names right away to avoid problems later.
Assignee | ||
Updated•8 years ago
|
Attachment #8761348 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 26•8 years ago
|
||
Attachment #8761348 -
Attachment is obsolete: true
Attachment #8761928 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 27•8 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #25)
> One thing i missed before:
> We should enforce a maximum string length on group & probe names right away
> to avoid problems later.
Just added that. I've enforced a 40char length limit on group and probe names.
This will allow us to fit the current longest key paths in the group name (e.g. "environment.system.gfx.adapter1") and all the current probe names (e.g. CACHE_SERVICE_LOCK_WAIT_MAINTHREAD_NSCACHEENTRYDESCRIPTOR_GETPREDICTEDDATASIZE is one of the longest probe names, with the group prefix taking a consistent chunk of its length).
Reporter | ||
Comment 28•8 years ago
|
||
Comment on attachment 8761928 [details] [diff] [review]
bug1276190.patch
Review of attachment 8761928 [details] [diff] [review]:
-----------------------------------------------------------------
This looks good with the below fixed.
Lets add proper in-tree documentation here in a separate patch that contains:
* what this file is good for
* how/when it is called
* what it generates
* the expected format
* restrictions on the individual fields
We should flag Benjamin for feedback on that specification and the initial Scalars.yaml from a data collection perspective.
::: toolkit/components/telemetry/parse_scalars.py
@@ +42,5 @@
> + def check_name(name, error_msg_prefix, allowed_char_regexp):
> + # Check if we only have the allowed characters.
> + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$'
> + if not re.search(chars_regxp, name):
> + raise TypeError(error_msg_prefix + " name must be alpha-numeric. Got: '{}'".format(name))
ValueError
@@ +46,5 @@
> + raise TypeError(error_msg_prefix + " name must be alpha-numeric. Got: '{}'".format(name))
> +
> + # Don't allow leading/trailing digits, '.' or '_'.
> + if re.search(r'(^[\d\._])|([\d\._])$', name):
> + raise TypeError(error_msg_prefix +
ValueError
@@ +52,5 @@
> + .format(name))
> +
> + # Enforce a maximum length on group and probe names.
> + MAX_NAME_LENGTH = 40
> + for n in [group_name, probe_name]:
Nit: move this before `def check_name`.
@@ +54,5 @@
> + # Enforce a maximum length on group and probe names.
> + MAX_NAME_LENGTH = 40
> + for n in [group_name, probe_name]:
> + if len(n) > MAX_NAME_LENGTH:
> + raise TypeError("'{}' must be limited to {} characters"\
`ValueError`.
Lets make this a more helpful message pattern: "Name X exceeds maximum name length of N characters."
@@ +116,5 @@
> + list_fields = [f for f in definition if isinstance(definition[f], list)]
> + for field in list_fields:
> + # Check for empty lists.
> + if len(definition[field]) == 0:
> + raise TypeError(self._name + ' - must not be empty.')
You need to mention the field name as well in the message.
@@ +244,5 @@
> +
> + for probe_name in group:
> + # We found a scalar type. Go ahead and parse it.
> + scalar_info = group[probe_name]
> + yield ScalarType(group_name, probe_name, scalar_info)
I'm not sure why this actually is a generator.
I think it would be clearer to just make this function `load_scalars()` and make it build and return a list.
Attachment #8761928 -
Flags: review?(gfritzsche) → review+
Assignee | ||
Comment 29•8 years ago
|
||
Attachment #8761928 -
Attachment is obsolete: true
Attachment #8762063 -
Flags: review+
Assignee | ||
Comment 30•8 years ago
|
||
Assignee | ||
Comment 31•8 years ago
|
||
Comment on attachment 8762099 [details] [diff] [review]
Part 2 - Add the docs
@Benjamin, would you mind giving a feedback on the docs from a data collection perspective?
We've put in place (and documented) a limit on the length of the data in the string scalar: this is currently capped at 50 chars, giving back an error on longer strings.
Do you have an opinion on the cap values from the data collection perspective? We could always bump the limit if needed.
Attachment #8762099 -
Flags: review?(gfritzsche)
Attachment #8762099 -
Flags: feedback?(benjamin)
Reporter | ||
Comment 32•8 years ago
|
||
Comment on attachment 8762099 [details] [diff] [review]
Part 2 - Add the docs
Review of attachment 8762099 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/telemetry/docs/index.rst
@@ +16,5 @@
> :maxdepth: 2
>
> pings
> common-ping
> + scalars
Nit: Lets move this after the environment.
::: toolkit/components/telemetry/docs/scalars.rst
@@ +3,5 @@
> +=======
> +
> +Historically we started to overload our histogram mechanism to also collect scalar data,
> +such as flag values, counts, labels and others.
> +The scalar measurement types are the suggested way to collect that kind of scalar data.
Add: "The serialized scalar data is submitted with the :doc:`main pings <main-ping>`."
@@ +27,5 @@
> + probe:
> + kind: int
> + ...
> +
> +Groups and probes names need to follow a few rules:
Nit: "Group and probe names".
@@ +31,5 @@
> +Groups and probes names need to follow a few rules:
> +
> +- they cannot exceed 40 characters each;
> +- group names must be alpha-numeric + '.', with no leading/trailing digit or '.';
> +- probe names must be alpha-numeric + '_', with no leading/trailing digit or '_'
Lets make this ``.`` and ``_``.
@@ +35,5 @@
> +- probe names must be alpha-numeric + '_', with no leading/trailing digit or '_'
> +
> +A probe can be defined as follows::
> +
> + a_scalar:
It might be confusing to not have the group name here.
@@ +49,5 @@
> +---------------
> +
> +- ``bug_numbers``: A list of unsigned integers representing the number of the bugs the probe was introduced in.
> +- ``description``: A single or multi-line string describing the purpose of the probe.
> +- ``expires``: The version number in which the scalar expires, e.g. "30"; a version number of type "N" and "N.0" is automatically converted to "N.0a1" in order to expire the scalar also in the development channels. A telemetry probe acting on an expired scalar will return an error. For scalars that never expire the value "never" can be used.
``never``?
@@ +51,5 @@
> +- ``bug_numbers``: A list of unsigned integers representing the number of the bugs the probe was introduced in.
> +- ``description``: A single or multi-line string describing the purpose of the probe.
> +- ``expires``: The version number in which the scalar expires, e.g. "30"; a version number of type "N" and "N.0" is automatically converted to "N.0a1" in order to expire the scalar also in the development channels. A telemetry probe acting on an expired scalar will return an error. For scalars that never expire the value "never" can be used.
> +- ``kind``: A string representing the scalar type. Allowed values are ``uint`` and ``string``.
> +- ``notification_emails``: A list of email addresses to notify in case of regressions.
"... in case of regressions or other issues."
@@ +56,5 @@
> +
> +Optional Fields
> +---------------
> +
> +- ``cpp_guard``: A string that gets inserted as an #ifdef directive around the automatically generated C++ declaration. This is typically used for platform-specific scalars, e.g.``ANDROID``.
``#ifdef``.
Missing space before ``ANDROID``.
@@ +58,5 @@
> +---------------
> +
> +- ``cpp_guard``: A string that gets inserted as an #ifdef directive around the automatically generated C++ declaration. This is typically used for platform-specific scalars, e.g.``ANDROID``.
> +- ``release_channel_collection``: This can be either ``opt-in`` (default) or ``opt-out``. With the former the scalar is submitted by default on pre-release channels; on the release channel only
> +if the user opted into additional data collection. With the latter the scalar is submitted by default on release and pre-release channels, unless the user opted out.
You need to remove the line break, the generated docs have this outside the list.
Attachment #8762099 -
Flags: review?(gfritzsche) → review+
Comment 33•8 years ago
|
||
Comment on attachment 8762099 [details] [diff] [review]
Part 2 - Add the docs
>diff --git a/toolkit/components/telemetry/docs/scalars.rst b/toolkit/components/telemetry/docs/scalars.rst
>+Required Fields
>+---------------
>+
>+- ``bug_numbers``: A list of unsigned integers representing the number of the bugs the probe was introduced in.
>+- ``description``: A single or multi-line string describing the purpose of the probe.
The description should be primarily *what* data is collected and *when* the data is collected. Purpose/why is not necessary.
>+- ``expires``: The version number in which the scalar expires, e.g. "30"; a version number of type "N" and "N.0" is automatically converted to "N.0a1" in order to expire the scalar also in the development channels. A telemetry probe acting on an expired scalar will return an error. For scalars that never expire the value "never" can be used.
What does "a telemetry probe acting on an expired scalar will return an error" mean in practice? I hope that it doesn't actually throw JS exceptions: that's kinda what we don't want, so that we can expire metrics without requiring teams to remove the code that collected them.
>+- ``notification_emails``: A list of email addresses to notify in case of regressions.
"in case of regressions" is not correct yet. Alerts about expiring probes will be sent here. More importantly though, this is the email address that I (as data steward) will use to verify that probes are still useful.
Attachment #8762099 -
Flags: feedback?(benjamin) → feedback-
Assignee | ||
Comment 34•8 years ago
|
||
Attachment #8762099 -
Attachment is obsolete: true
Attachment #8764269 -
Flags: review+
Assignee | ||
Comment 35•8 years ago
|
||
Comment on attachment 8764269 [details] [diff] [review]
Part 2 - Add the docs
Thanks for the feedback, Benjamin. Do you have any opinion on the length limit (50 chars) we enforce on the content for the scalar string type?
Attachment #8764269 -
Flags: feedback?(benjamin)
Comment 36•8 years ago
|
||
Not really. From a data-steward perspective it's not critical to have any limit, but it's nice defense-in-depth and I'm sure you want it just to limit ping size.
Updated•8 years ago
|
Attachment #8764269 -
Flags: feedback?(benjamin) → feedback+
Assignee | ||
Comment 37•8 years ago
|
||
As discussed over IRC, we can unfortunately have other, hidden services pulling histogram_tools.py out of mc. We don't want to break stuff while landing the scalar.
As a short term workaround, I've restored histogram_tools.py to its original state, so it doesn't use shared_telemetry_utils.py.
As a medium/longer term project and to discuss the issue, I've filed bug 1282098.
Attachment #8762063 -
Attachment is obsolete: true
Attachment #8764958 -
Flags: review+
Assignee | ||
Comment 38•8 years ago
|
||
https://hg.mozilla.org/integration/fx-team/rev/4c600177f0b58cb564008db25d8a3114af971c0e
Bug 1276190 - Add a script to generate headers with scalar data from Scalars.yaml. r=gfritzsche
https://hg.mozilla.org/integration/fx-team/rev/d8ecf377e54f5928fd3c25fd5402ddbef6385dcd
Bug 1276190 - Add the documentation. r=gfritzsche
Comment 39•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4c600177f0b5
https://hg.mozilla.org/mozilla-central/rev/d8ecf377e54f
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
You need to log in
before you can comment on or make changes to this bug.
Description
•