1170646 - A few M-C fixes to handle short read in Cache code ( from bug 1170564 Failure to deal with short read)

Assignee

Description

•

9 years ago

Attached patch A couple of fixes to Cache routines to handle short read. (obsolete) (deleted) — Details — Splinter Review

This blocks bug 1170564. This bugzilla entry is about >3. A few M-C fixes. in the bug 1170564 . +++ This bug was initially created as a clone of Bug #1170564 +++ After spending some time to weed out the failure to detect I/O errors from C-C TB code during message download [1], I have come to realize there are issues regarding short read. Short read refers to |read| system call returning without the full number of requested octets although the subsequent |read| calls would return the missing octets. That is, short read returns prematurely. Short read can happen during a read operation against a remotely mounted file. High-load on the remote server, or a transient network issue can cause such short read. It is not a stranger to FF developers. Network code is full of cautious programming to take care of such short read. Unfortunately, C-C TB (under linux at least) fails to handle the short read very well and as a result we see errors which I noticed during tests. So I set out to identify and fix these issues and this is a meta entry to track such errors. It turns out that performing additional writes after a short-write is automagically handled by |pt_Write|. Isn't it great? Then why not pt_Read? As I explain later in this post, we sometimes don't know how many we should read in advance. That's why. Short read simulated: I have come to learn that mounting one's profile to remote file server seems to cause strange errors. Since I don't have a CIFS server that seems to cause such short read very often under heavy I/O work load, I resort to simulation. I inject short read by preparing some functions such as |read| and |read64| that simulates short read and preload them using "export LD_PRELOAD=that-file.so" before running C-C thunderbird to find errors caused by short read. How to FIX after identifying the trouble spot: When you deal with short read issues there are a few things that we must keep in mind. At the level of |read| system call, we have no idea whether a short read ought to be followed by additional |read| or not. Then where should we make the decision? It depends on the type of |read| operation. There are two types of |read| operations. (a) # of octets to read is known in advance. The caller of |read| knows exactly that there are certain number of octets to be read in advance. E.g. (1) we check the size of JSON file by calling |stat| in advance and try to read that many octets in one |read|. [As it turns out this is a real-world problem.] (2) we try to read a "block" of a known data structure and sizeof(block) is already known in advance (at compile time, etc). A case in point is the "block" for CACHE operation. (There was an issue and it was fixed.) In these cases, we ought to perform additional |read| operations until all the expected number of octets are read (or until error condition or EOF is encountered.) in the face of short read. So we must provide a mechanism for such additional reads. In my patch, I have introduced a couple of variants of |Read| operation and |PR_Read| that performs additional read operations if we failed to read the desired # of octets in one run until all the octets are read. (b) # of octets to read is not known in advance. E.g., (1) There is a mime-encoded data in a file (as in attachment), and the program needs to read as many data as possible until the end of the data marker line is reached. (2) In general, suppose there is a data structure written in ASCII representation in a file, and also suppose the caller of |read| can know the end of the logical data structure by reading a certain textual construct that signals the end of the data. In this case, the caller of |read| needs to perform the |read| repeatedly until this end of data marker is read. Case (1) is obviously a particular case of (2). (3) There are similar variants. In these cases, we can NOT rely on the number of octets to read in advance. The size of the buffer passed to |read| is just an estimate, a ball park figure. We may have more data to read or much less data to read actually. We ought to perform additional |read| until the end of data marker is reached (or the error is signaled or EOF is reached.) Between (a) and (b), the main difference is case (a) knows the fixed number of octets to be read in advance, and case (b) does not know it. In practice, |read| calls of case (b) tend to happen inside loops and are relatively easy to spot, and no modification is necessary most of the time even if short read has to be considered. (Well there are cases where the ERROR condition was not detected at all in the current code and I tried to fix them.) The short read in case (a) needs to be handled at the spot where the required number of octets is known in the call chain. At the |read| system call level, we have no idea whether a particular short |read| ought to be followed by additional reads at all. To figure out where such problematic short read is encountered, and to learn whether the |read| is type (a) or type (b) above, I had to dump stack trace to figure out where such short read is NOT followed by necessary additional read operations. Using the stack trace, I tried to figure out where the decision to perform additional reads can be made and, if necessary, replaced the read with a version that performs such additional read operations. I think I explained enough background. There will be four bugzilla entries I intend to submit today and block this meta entriy. 1. NSS "dbm" code used for certificate and key handling. (Already reported in newsgroup/mailing list.) 2. Mostly C-C T-B fixes to take care of short read. 3. A few M-C fixes. Yes, there *ARE* places where short read is not handled in M-C! I was surprised myself. Maybe storing one's profile under remote file server was an arcane idea when mozilla code base was written years ago. But today, in corporate setup, PC is often used like a thin client and everything including your profile may be stored on a central server. 4. Case of reading JSON file from JavaScript. There is a known bug which I don't know how to fix. It is filed as a separate bugzilla. It is concerned with a reading of JSON file from JavaScript. There are a few more unresolved cases that manifest themselves during local |make mozmill| test suite operation with simulated short read injection, but I am not sure if they are real and not timing-dependent random errors that are seen before. [Yes, there are such random errors in |make mozmill|.] I will add them to known C-C T-B fixes and M-C fixes *iff* they turn out to be real bugs in the future. TIA [1] Bug 1121842 - [META] RFC: C-C Thunderbird - Cleaning of incorrect Close, unchecked Flush, Write etc. in nsPop3Sink.cpp and friends.

A couple of fixes to Cache routines to handle short read. 9 years ago ISHIKAWA, Chiaki (deleted), patch	michal : review-	Details \| Diff \| Splinter Review
A couple of fixes to Cache routines to handle short read. (Take-2) 9 years ago ISHIKAWA, Chiaki (deleted), patch		Details \| Diff \| Splinter Review
A couple of fixes to Cache routines to handle short read. (Take-3) 9 years ago ISHIKAWA, Chiaki (deleted), patch	michal : review+	Details \| Diff \| Splinter Review
A couple of fixes to Cache routines to handle short read. (Take-4) carrying over michal.novotny: review+ 9 years ago ISHIKAWA, Chiaki (deleted), patch	ishikawa : review+	Details \| Diff \| Splinter Review
Backout f600f0cd7bb3 because of Thunderbird regressions with OSX 9 years ago Kent James (:rkent) (deleted), patch	michal : review+ Sylvestre : approval-mozilla-aurora+ Sylvestre : approval-mozilla-beta+	Details \| Diff \| Splinter Review