Closed Bug 1103945 Opened 10 years ago Closed 10 years ago

[mozlog] Create a command line tool to merge multiple raw structured logs

Tracking

(firefox38 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla38

Tracking Flags:

Tracking

Status

firefox38

---

fixed

People

(Reporter: davehunt, Assigned: makzmaciek, Mentored)

Details

(Whiteboard: [lang=py][good first bug])

Attachments

(2 files, 4 obsolete files)

Log concatenation script 10 years ago Ankit Goyal (deleted), text/plain	jgraham : feedback-	Details
merge_logs.py - proposed solution 10 years ago Maciek (deleted), text/x-python	jgraham : feedback+	Details
Command_line_tool_to_merge_multiple_raw_structured_logs.patch 10 years ago Maciek (deleted), patch		Details \| Diff \| Splinter Review
Implemented fixes after code-review 10 years ago Maciek (deleted), patch		Details \| Diff \| Splinter Review
Proposed solution 10 years ago Maciek (deleted), patch	jgraham : review+	Details \| Diff \| Splinter Review
Final solution 10 years ago Maciek (deleted), patch	jgraham : review+	Details \| Diff \| Splinter Review

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Description

•

10 years ago

We'd like a command line tool that accepts paths to multiple raw structured logs and combines them into a single log. This would be particularly useful for getting a unified log where we chunk the tests. We could take the earliest suite-start and latest suite-end for use in the combined log. This would allow us to show time saved when chunks are run in parallel. The user should be able to specify the format for the combined log file (or perhaps multiple).

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Updated

•

10 years ago

OS: Mac OS X → All

Hardware: x86 → All

Whiteboard: [lang=py][good first bug]

Akshay Parelkar

Comment 1

•

10 years ago

Hi James, I am interested in working on this script. Dave mentioned specifying a format for the combined log file. Is this the format of the file, i.e. txt, etc.? -Akshay

Akshay Parelkar

Updated

•

10 years ago

Flags: needinfo?(james)

James Graham [:jgraham]

Comment 2

•

10 years ago

Thanks for looking at this! So I suggest that in the first instance you make a tool that takes N structured log files and returns a single structured log that's (approximately) the concatenation of all the different structured logs. The output format is then fixed; it's the same line-delimited JSON as the input. There are a couple of tricks you might need here. The first thing is that you want the output to contain a *single* suite-start and suite-end message. The suite-start message should contain the combined list of all tests from the set of input structured log files. I guess to do this you want to read each file until you reach a suite_start message, grab the data you want and then read the rest of each file once you have constructed the output suite-start message. You probably also want to check that the run_info for each input file is the same and fail if it isn't. I can't think of a time where it makes a lot of sense to concatenate runs from different platforms. I don't think there are any fields specific to other things that could legitimately vary (e.g. the exact name of the field that the tests ran on), but if there are we may have to whitelist the run_info properties that make the output. Hopefully that makes some sense. I can give more detailed pointers if you need them.

Flags: needinfo?(james)

Vidhan Jain

Comment 3

•

10 years ago

Hii,I am new to Open Source.Is there anything I can do here.

Ankit Goyal

Comment 4

•

10 years ago

I would like to work on it. Would you attach a sample log file which you want to concatenate.

James Graham [:jgraham]

Comment 5

•

10 years ago

So for example: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/40ebad6c4c57c0501137945aac2890132653a40792ba226d8500c13cb662c682c40875b3297004476aef0973d1a03b11dea9f4a9475e1183d0bc76d4fd6f4821 http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/95570e14bf35ed30e27d2513de71a0d3558e011ec62ac31d6603c6aedca7a0e4175a7cd3f315b7c5bce70ed271e54f0c7d6b10cc70f3ca058b976c326d7ccc58 http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/3e1f37dc0e4e1123ea042129cdb6f7971c93b455cadc319c8e921e45aed0f009c1d06d31f96fda57cbb42da56d14c983191fbcd5b580685bc250a56a2866a548 http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/488c1e1af00c9a15de9388d09ad733c01fdd5bebb6d63f798f4d330175efb2358d9033b0f46b7ed6bb9e3dd78bcac8df9ec797fa99c2cb079094ae86be6c359a Should concatenate to a single file with a single suite_start and suite_end message.

Ankit Goyal

Comment 6

•

10 years ago

Attached file Log concatenation script (deleted) — Details

Pass path to log files as command line arguments. A file will be generated in same directory as script named as output.txt containing the concatenated log.

Attachment #8540628 - Flags: review?(james)

Ankit Goyal

Updated

•

10 years ago

Attachment #8540628 - Flags: review?(james) → feedback?(james)

James Graham [:jgraham]

Comment 7

•

10 years ago

(In reply to Ankit Goyal from comment #6) > Created attachment 8540628 [details] > Log concatenation script > > Pass path to log files as command line arguments. > A file will be generated in same directory as script named as output.txt > containing the concatenated log. Thanks for looking at this, and sorry for the tardiness of my reply. Unfortunately the attached patch doesn't quite meet the requirements. The concatenation we want to do here has to be aware of the semantics of a structured log file. In particular there should be exactly one suite_start and one suite_end message in the concatenated file, and the result should update the contents of those entries to reflect the actual data in the concatenated file. For example imagine two input files (in some made up syntax): SUITE_START [test1] TEST_START test1 TEST_END test1 PASS SUITE_END SUITE_START [test2] TEST_START test2 TEST_END test2 FAIL SUITE_END The result of concatenating these files should be: SUITE_START [test1, test2] TEST_START test1 TEST_END test1 PASS TEST_START test2 TEST_END test2 FAIL SUITE_END Does that make more sense?

James Graham [:jgraham]

Updated

•

10 years ago

Attachment #8540628 - Flags: feedback?(james) → feedback-

Maciek

Assignee

Comment 8

•

10 years ago

Hi! I would like to work on this issue. Can you assign it to me? Could you explain couple more things please? Should I discard events before suite_start and after suite_end? How other fields (besides run_info, time and tests) should be merged in suite_{start, end}? Should merged events be sorted or just appended file by file? -Maciek

James Graham [:jgraham]

Comment 9

•

10 years ago

> Could you explain couple more things please? > Should I discard events before suite_start and after suite_end? Let's keep them in the logs, before the suite_start, or after the suite_end, message. > How other fields (besides run_info, time and tests) should be merged in > suite_{start, end}? I think it should likely validate that run_info and device_info are the same in each log file (possibly with an option to override this heck, in which case those fields should be ommitted from the output). For the common fields, I think it makes sense to emit data that is accurate for the combining process: time - The current timestamp thread - name of the current thread pid - id of the current python process source - I think this will typically be the same for all the input files, so use that component - "log concatenation" or something? > Should merged events be sorted or just appended file by file? The structure I suggest is [all pre-suite_start events, unsorted] <suite_start> [all intermediate events, unsorted] <suite_end> [all post-suite_end events] That is, I think a valid implementation strategy would be to open log_1..log_N and for each in turn output each event before suite_start is reached. Then output a single suite_start, then from each file in turn output each event before suite_end is reached, output suite_end, and then the same again for the events after suite_end.

James Graham [:jgraham]

Updated

•

10 years ago

Assignee: nobody → makzmaciek

Maciek

Assignee

Comment 10

•

10 years ago

Attached file merge_logs.py - proposed solution (obsolete) (deleted) — Details

Please see proposed solution attached. What do you think about it? Should I create a patch to a repo? (which one?)

James Graham [:jgraham]

Comment 11

•

10 years ago

Comment on attachment 8549897 [details] merge_logs.py - proposed solution This looks like a great start, thanks! I think we want this script to live in mozilla-central under testing/mozbase/mozlog/mozlog/structured/scripts/ There are also some utility functions in there that you should be able to use for things like reading the log files. Also note that having code at the top level is not ideal (it seems likely that we will want to be able to import this code and use it as a module). For instructions on creating a patch and asking for review, see [1]. I appreciate that's a bit of a wall of text, so if you have more specific questions please ask. [1] https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch

Attachment #8549897 - Flags: feedback+

Maciek

Assignee

Comment 12

•

10 years ago

Attached patch Command_line_tool_to_merge_multiple_raw_structured_logs.patch (obsolete) (deleted) — Details — Splinter Review

Please find the patch attached. Should I request a review now? I've refactored a bit the script to follow standards from other scripts in the same directory.

Attachment #8549897 - Attachment is obsolete: true

James Graham [:jgraham]

Comment 13

•

10 years ago

This looks good! Please go ahead and request a review.

Maciek

Assignee

Updated

•

10 years ago

Attachment #8552018 - Flags: review?(james)

James Graham [:jgraham]

Comment 14

•

10 years ago

Comment on attachment 8552018 [details] [diff] [review] Command_line_tool_to_merge_multiple_raw_structured_logs.patch Review of attachment 8552018 [details] [diff] [review]: ----------------------------------------------------------------- This looks like a really good start; just some minor changes required. Please add the script to the commands dict in testing/mozbase/mozlog/mozlog/structured/scripts/__init__.py so that it is callable using the structlog command. It also looks like your diff is against the wrong base; it makes sense to use the hg or git commands to produce a diff. Thanks for making the patch! ::: structured/scripts/logmerge.py @@ +15,5 @@ > +def fill_process_info(event): > + event["time"] = int(round(time.time() * 1000)) > + event["thread"] = current_thread().name > + event["pid"] = os.getpid() > + event["component"] = "logmerging" Let's just leave this as-is. @@ +20,5 @@ > + return event > + > + > +def process_until(f, output, action): > + entry = json.loads(f.readline()) So, rather than doing this with json directly it would be great if we could use the reader module in mozlog. This obviously requires some changes in other places too, but I don't think it's too substantial a change. @@ +22,5 @@ > + > +def process_until(f, output, action): > + entry = json.loads(f.readline()) > + while entry['action'] != action: > + json.dump(entry, output) This looks like dump_entry? @@ +40,5 @@ > +def validate_start_events(events): > + for start in events: > + if not start['run_info'] == events[0]['run_info']: > + print("Error: different run_info entries", file=sys.stderr) > + sys.exit(-1) I think this exit code ought to be positive. @@ +51,5 @@ > + > + > +def get_parser(add_help=True): > + parser = argparse.ArgumentParser("logmerge", description='Merge multiple log files.', add_help=add_help) > + parser.add_argument('-o', dest='output_filename', default="output.json", help='output file') Can we default the output to sys.stdout, please?

Attachment #8552018 - Flags: review?(james)

Maciek

Assignee

Comment 15

•

10 years ago

Attached patch Implemented fixes after code-review (obsolete) (deleted) — Details — Splinter Review

Thanks for your feedback! Please see updated patch.

Attachment #8552018 - Attachment is obsolete: true

Attachment #8554863 - Flags: review?(james)

James Graham [:jgraham]

Comment 16

•

10 years ago

Comment on attachment 8554863 [details] [diff] [review] Implemented fixes after code-review Review of attachment 8554863 [details] [diff] [review]: ----------------------------------------------------------------- Just a couple of very minor fixups, otherwise looks good. ::: testing/mozbase/mozlog/mozlog/structured/scripts/logmerge.py @@ +16,5 @@ > +def fill_process_info(event): > + event["time"] = int(round(time.time() * 1000)) > + event["thread"] = current_thread().name > + event["pid"] = os.getpid() > + event["component"] = "logmerging" Let's just delete this line. @@ +21,5 @@ > + return event > + > + > +def process_until(f, output, action): > + for entry in reader.read(f): Being picky now, but I think this will be more obvious if you create the reader in main() and pass that around rather than the file object.

Attachment #8554863 - Flags: review?(james)

Maciek

Assignee

Comment 17

•

10 years ago

Attached patch Proposed solution (obsolete) (deleted) — Details — Splinter Review

Thanks again for review, please see another patch attached.

Attachment #8554863 - Attachment is obsolete: true

Attachment #8555465 - Flags: review?(james)

James Graham [:jgraham]

Comment 18

•

10 years ago

Comment on attachment 8555465 [details] [diff] [review] Proposed solution Review of attachment 8555465 [details] [diff] [review]: ----------------------------------------------------------------- Looks good! Thanks for doing this.

Attachment #8555465 - Flags: review?(james) → review+

James Graham [:jgraham]

Comment 19

•

10 years ago

I notice this patch doesn't have author information, so I can't check it in as-is. I don't know exactly what workflow you are using, bit with git "git show -p ref" should generate a patch for ref and with mercurial "hg export -r rev" should do it. Note that in either case you will need to configure the VCS with your name/email. Whilst you are doing that, you should probably set a commit message in the standard Mozilla format: Bug 1103945 - Add command line tool to merge multiple raw structured logs, r=jgraham But of course I can do that part more easily. let me know if you need any help with this.

Dave Hunt [:davehunt] [he/him] ⌚BST

Reporter

Updated

•

10 years ago

Flags: needinfo?(makzmaciek)

Maciek

Assignee

Comment 20

•

10 years ago

Attached patch Final solution (deleted) — Details — Splinter Review

Thanks for review. Hopefully final patch attached.

Attachment #8555465 - Attachment is obsolete: true

Flags: needinfo?(makzmaciek)

Attachment #8558043 - Flags: review?(james)

James Graham [:jgraham]

Updated

•

10 years ago

Attachment #8558043 - Flags: review?(james) → review+

James Graham [:jgraham]

Comment 21

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/8df082971441

Ryan VanderMeulen [:RyanVM]

Comment 22

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/8df082971441

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox38: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla38

You need to log in before you can comment on or make changes to this bug.