Closed
Bug 262192
Opened 20 years ago
Closed 20 years ago
SIGSEGV in pk11_forceTokenAttribute in softoken
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.9.4
People
(Reporter: julien.pierre, Assigned: julien.pierre)
Details
Attachments
(1 file, 4 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review |
I was running an SSL session reuse stress test on AMD64 solaris and got a core
file after about 48hrs. Unfortunately, this was with optimized bits, and I can't
narrow down the crash to a particular line of code. The stack on the crashing
thread appears corrupt - dbx only shows a single stack frame. I'm trying to
reproduce the crash with debug bits.
Assignee | ||
Updated•20 years ago
|
Comment 1•20 years ago
|
||
Do you think this bug is only on Solaris 10 AMD64 or could affect any other
plateform ?
Comment 2•20 years ago
|
||
Any idea what object, attribute, and value was being forced?
Might give us some clues
Assignee | ||
Comment 3•20 years ago
|
||
Christophe,
In my previous SSL stress tests, I never saw a core file on any other platform,
so I'm not sure that they would be affected.
It's possible this has to do with the compiler optimizations . I was using the
highest level of optimizations in studio9, with the -xO5 and -fast options, in
order to get the best performance.
I'm now running a stress test with debug bits to find if I can reproduce the
crash. I will leave it running until monday. If it doesn't crash by then, I may
restart a test with the optimized bits to find if I can reproduce it again and
narrow it down to compiler optimizations ...
Nelson : unfortunately, the optimized core file really does not say much at all.
I can't get the value of arguments displayed.
(dbx) threads
t@1 a l@1 ?() sleep on 0x53dfb8 in __lwp_park()
t@2 a l@2 _pt_root() LWP suspended in mutex_trylock_adaptive()
t@3 a l@3 _pt_root() LWP suspended in __lwp_park()
o> t@4 a l@4 _pt_root() signal SIGSEGV in pk11_forceTokenAttribute()
t@5 a l@5 _pt_root() LWP suspended in __lwp_park()
t@6 a l@6 _pt_root() LWP suspended in __pollsys()
t@7 a l@7 _pt_root() LWP suspended in _close()
t@8 a l@8 _pt_root() LWP suspended in __pollsys()
t@9 a l@9 _pt_root() LWP suspended in __lwp_park()
t@10 a l@10 _pt_root() sleep on 0x5a89f8 in __lwp_park()
(dbx) thread t@4
t@4 (l@4) stopped in pk11_forceTokenAttribute at 0x7ffffface53e
0x00007ffffface53e: pk11_forceTokenAttribute+0x00be: movq
0x0000000000000030(%r12),%r10
(dbx) where
current thread: t@4
=>[1] pk11_forceTokenAttribute(), at 0x7ffffface53e
(dbx)
Comment 4•20 years ago
|
||
Julien,
thanks for emailing me the stack contents, disassembly and register values.
The stack does not appear to me to be clobbered.
The crash occurred at line 1714 in pk11_forceTokenAttribute
1713 attribute=pk11_FindAttribute(object,type);
1714 if ((type != CKA_LABEL) && (attribute->attrib.ulValueLen == len) &&
1715 PORT_Memcmp(attribute->attrib.pValue,value,len) == 0) {
the value of "attribute" returned by pk11_FindAttribute was NULL, and
line 1714 doesn't check for NULL before dereferencing it.
I found that the compiler had in-lined the functions pk11_narrowToTokenObject
and pk11_FindAttribute, which effectively caused the pk11_FindAttribute call
to be turned into a call to pk11_FindTokenAttribute. Pretty impressive
optimization.
The fact that pk11_FindTokenAttribute returned NULL may or may not be
indicative of another error, perhaps due to optimization. But the immediate
cause of the crash is the failure to check the attribute pointer value for
NULL.
Assignee | ||
Comment 5•20 years ago
|
||
Thanks, Nelson !
I think it would be useful to add a NULL check for the attribute here, but I'm
not certain what CKR error code to return.
Most other parts of the code return CKR_TEMPLATE_INCOMPLETE for this case. But
pk11_findAttribute doesn't say why it didn't find the attribute.
The proper return code might be CKR_ATTRIBUTE_TYPE_INVALID, or
CKR_OBJECT_HANDLE_INVALID . It's hard to tell.
However, I just found out that the tree that got the core file is a tip tree,
not 3.9 branch, which contained a patch made by Ian to enhance performance with
PKCS#11 contexts . This may very well be the cause of this crash. I will attach
the patch in question. Unfortunately it is one more variable if I have to
reproduce the problem.
Target Milestone: 3.9.3 → ---
Version: 3.9.2 → unspecified
Assignee | ||
Comment 6•20 years ago
|
||
Comment 7•20 years ago
|
||
The attribute type being forced was CKA_NEVER_EXTRACTABLE.
That attribute is forced in these places:
/security/nss/lib/softoken/pkcs11.c, line 1306
/security/nss/lib/softoken/pkcs11.c, line 1413
/security/nss/lib/softoken/pkcs11c.c, line 2926
/security/nss/lib/softoken/pkcs11c.c, line 3404
/security/nss/lib/softoken/pkcs11c.c, line 3408
Comment 8•20 years ago
|
||
The places that attempt to set CKA_NEVER_EXTRACTABLE are all in key (pair)
generation / unwrapping / deriving. Ian's patch seems to affect digesting
only, so it's not at all obvious that his patch would affect the behavior
of the operations relevant to this crash.
Comment 9•20 years ago
|
||
Hmm what is the object type that we are trying to force this attribute on?
The correct CKR value should CKR_ATTRIBUTE_TYPE_INVALID or CKR_HOST_MEMORY,
depending on why it couldn't be found. The Token object calls should return a
static attribute for this value, unless object type isn't what we thought it
was. (indicating a trash somewhere else).
Adding a null check here is a good sanity check anyway, and should be included
in the patch.
bob
Comment 10•20 years ago
|
||
Julien sent me a dump of the memory to which the "object" pointer points.
The content of that memory is NOT a valid PK11Object. That explains why
pk11_FindTokenAttribute returned NULL. But it also says that the NULL
pointer dereference was only a side-effect of the invalid object pointer,
and the real cause of the crash (the code responsible for the invalid object
pointer, or which trashed the PK11 Object) has not yet been identified. :(
Since the stack appears intact, we should be able to walk down the stack
until we find the source of the invalid pointer. That value appears several
times in the stack, so we should be able to find the origin.
Assignee | ||
Comment 11•20 years ago
|
||
FYI, my test with debug bits is still running and the crash has not been reproduced.
Comment 12•20 years ago
|
||
Julien, you should try to do an optimized build
with debug symbols. You can try adding -g after
the -xO5 and -fast options. The compiler will
tell you whether you can use -g or some other
variant of -g with that compiler optimization
level.
Assignee | ||
Comment 13•20 years ago
|
||
I made an "optimized debug build" after struggling a little bit with the
compiler. I restarted my test. I hope I can reproduce the crash and get more
info out of the new core ...
Assignee | ||
Comment 14•20 years ago
|
||
I haven't been able to reproduce this bug in over a week of trying,
unfortunately. I can't debug my old core file either because the specific build
has been overwritten, and I can't reproduce it as the particular compiler has
gone away too. We are just going to have make the defensive fix in the function
where the crash occured, and speculate about the root cause above it.
Assignee | ||
Comment 15•20 years ago
|
||
This patch checks for NULL pointers in the function where the crash was
observed.
As Nelson noted, the content of the PK11Object structure is corrupt, so I
attempt to check if slot and refCount are valid. If there are any other
assumptions I can make to validate the content of this structure, I'd like to
add more checks as well.
Assignee | ||
Comment 16•20 years ago
|
||
Attachment #162015 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #162016 -
Flags: superreview?(rrelyea0264)
Attachment #162016 -
Flags: review?(saul.edwards.bugs)
Comment 17•20 years ago
|
||
Comment on attachment 162016 [details] [diff] [review]
add missing assertions
R+.. though I think the first return after the first if should probably be
CKR_DEVICE_ERROR. It can only happen is there is some sort of internal
corruption. CKR_ARGUMENTS_BAD means the user of the PKCS #11 module passed in
bad arguments. That would already be checked by the code tha converts object id
into PK11Object *.
Attachment #162016 -
Flags: superreview?(rrelyea0264) → superreview+
Assignee | ||
Comment 18•20 years ago
|
||
Comment on attachment 160799 [details] [diff] [review]
Ian's patch for the tip
FYI, Ian's patch causes problems with the SSL3 cipher suite RSA with RC4 128
and SHA1 . The problem is that the server becomes non-interoperable with
standard implementations (MAC mismatch).
The RC4 128/ MD5 suite isn't affected.
This may not have anything to do with the cause of the crash, but I thought I
would mention it.
Attachment #160799 -
Attachment is obsolete: true
Comment 19•20 years ago
|
||
Julien, your observation in comment 18 is more than a little frightening.
Whiel this info is still fresh in your mind, please file a separate bug
about it and cc me on that bug. Please be specific about the RCS version
number and file name that contains the flaw, and how/why it adversely
affects MACs. Thanks!
Assignee | ||
Comment 20•20 years ago
|
||
Nelson, I'm reluctant to file a bug about a patch that hasn't been checked in
anywhere yet.
To observe the problem, you can apply Ian's patch to an otherwise virgin tip NSS
tree, run selfserv with the "-c n" option to select the RSA RC4 128 SHA1 cipher
suite, then connect with mozilla or other tools using pre-existing NSS libs.
They will not be able to connect. But if you use a client that is using the same
broken libs, such as occurs in all.sh, the client will work .
Given that Ian's patch deals with digests and attempts to optimize the use of
PKCS#11 context, if there is something wrong with it, it is not completely
surprising that it could affect MAC computation with SSL, as it does here for
SHA1. But two wrongs seem to make one right in this case, as I suppose both the
client and server perform the incorrect SHA1 computation with the same bad
result . I think this uncovers a critical flaw in our test coverage. Our tests
should be able to figure out that SSL interoperability has been broken by a code
change such as this one.
Assignee | ||
Comment 21•20 years ago
|
||
Actually I need to move the checks in the patch one level up the stack or it
won't do any good, since the same corruption in the PK11Object would crash
pk11_forceAttribute, the sole caller of pk11_forceTokenAttribute .
One more thing, about Ian's patch. I found that it just breaks SSL3 cipher
suites in general, not specifically RSA 128/SHA1. But all TLS cipher suites
still work with the patch. To reproduce the problem with the patch, run selfserv
(from the tip) with -S and -T options, to disable SSL2 and TLS respectively.
Then, the browser won't be able to connect.
Assignee | ||
Comment 22•20 years ago
|
||
With Nelson's help, I figured out that Ian's patch missed one place to flush the
new digest buffer, in PK11_DigestKey . Fixing that also fixed SSL3.
Unfortunately, the review of the patch shows it's very unlikely to have caused
the memory corruption in the PK11Object, so it's probably unrelated to the crash
in this bug.
Comment 23•20 years ago
|
||
Julien - I opened bug 265620 for my patch. Can you post your updated patch
(with Nelson's suggested fix) there? Thanks.
Comment 24•20 years ago
|
||
Comment on attachment 162016 [details] [diff] [review]
add missing assertions
The assertions added in this patch, e.g.
>+ PORT_Assert(object);
>+ PORT_Assert(object->refCount);
>+ PORT_Assert(object->slot);
>+ if (!object ||
>+ !object->refCount ||
>+ !object->slot) {
>+ return CKR_ARGUMENTS_BAD;
>+ }
>+
>+ to = pk11_narrowToTokenObject(object);
> PORT_Assert(to);
> if (to == NULL) {
> return CKR_DEVICE_ERROR;
> }
are good, as far as they go. But they don't really assert
that the object pointer really points to a valid PKCS11Object.
They merely assert that two memory values at various offsets
in the object are non-zero, which IMO has some significant
probability of being true even if the pointer is pointing at
random memory.
So, given that the object is a private struct to softoken,
I propose the following additional check be implemented.
I propose adding a new const char * pointer to struct
PK11ObjectStr, immediately after next and prev. Also,
there will be a global const character string, global to
the softoken only, that is the value "PKCS11Object".
Whenever a PKCS11Object is created/initialized, the new
pointer member must be set to point to that one string.
Thereafter, at any time that we want to check that we have
a valid PCKS11Object pointer, we merely have to check that
the value of that pointer equals the address of that string.
No need to do a string comparison.
We could use any magic value, and could be an int rather than
a char *, but having it be a char * has value in debugging.
With the string, you don't have to memorize magic values in
order to ask "what kind of object is this"? You just look
at the string and it tells you.
I suggest that if we assert that the object pointer points to
a real PKCS11Object, in all the functions that are likely to
have been in the crash's call stack, we will find the offending
code much sooner.
Assignee | ||
Comment 25•20 years ago
|
||
Attachment #162016 -
Attachment is obsolete: true
Assignee | ||
Comment 26•20 years ago
|
||
Nelson,
Thanks for the ideas. Either a string or a magic value is fine with me (we can
use 0xDEADBEEF). I think we should implement this on the tip . For the
short-term 3.9.4 patch release, I would like to use the shorter patch just
submitted.
Assignee | ||
Updated•20 years ago
|
Attachment #163079 -
Attachment description: move assertions one level up → move assertions one level up. Patch for NSS_3_9_BRANCH and tip for now; new patch for tip to follow later.
Attachment #163079 -
Flags: review?(nelson)
Assignee | ||
Comment 27•20 years ago
|
||
Comment on attachment 163079 [details] [diff] [review]
move assertions one level up. Patch for NSS_3_9_BRANCH and tip for now; new patch for tip to follow later.
Saul, please review for 3.9.4 check in. Thanks.
Attachment #163079 -
Flags: superreview?(saul.edwards.bugs)
Assignee | ||
Comment 28•20 years ago
|
||
I have checked in the patch to NSS_3_9_BRANCH due to our imminent 3.9.4 build.
Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c
new revision: 1.54.2.2; previous revision: 1.54.2.1
done
I would still like the reviews, though. Thanks.
Comment 29•20 years ago
|
||
Comment on attachment 163079 [details] [diff] [review]
move assertions one level up. Patch for NSS_3_9_BRANCH and tip for now; new patch for tip to follow later.
This patch has 2 "hunks". The first hunk is fine.
The second one asserts that a pointer is not NULL,
immediately AFTER dereferencing it. e.g.
> if (pk11_isToken(object->handle)) {
>+ PORT_Assert(object);
>+ PORT_Assert(object->refCount);
>+ PORT_Assert(object->slot);
>+ if (!object ||
>+ !object->refCount ||
>+ !object->slot) {
>+ return CKR_DEVICE_ERROR;
>+ }
> return pk11_forceTokenAttribute(object,type,value,len);
> }
Seems to me that these 3 assertions and the code that returns
CKR_DEVICE_ERROR should take place BEFORE we dereference object.
Also, I don't understand what is gained by moving these
assertions up out of pk11_forceTokenAttribute into a code path
that only occurs immediately priori to calling that function.
I thought the point of moving the assertions up was so that
they would also apply to the non-token cases. No?
Attachment #163079 -
Flags: review?(nelson) → review-
Comment 30•20 years ago
|
||
Comment on attachment 163079 [details] [diff] [review]
move assertions one level up. Patch for NSS_3_9_BRANCH and tip for now; new patch for tip to follow later.
Julien, Please correct this before shipping 3.9.4
Attachment #163079 -
Flags: superreview?(saul.edwards.bugs)
Assignee | ||
Comment 31•20 years ago
|
||
Nelson,
You are right, the third patch was not functionally different from the second
one. I moved the assertions before the pk11_isToken call .
Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c
new revision: 1.54.2.3; previous revision: 1.54.2.2
done
Assignee | ||
Comment 32•20 years ago
|
||
move assertions a few lines up. This is a cumulative diff
This patch has been checked in to both the NSS_3_9_BRANCH and the tip :
Checking in pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v <-- pkcs11u.c
new revision: 1.58; previous revision: 1.57
done
A more extensive patch will follow later to incorporate the magic logic and
Nelson's other ideas for the tip, so I'm keeping this bug open.
Attachment #163079 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #162016 -
Flags: review?(saul.edwards.bugs)
Assignee | ||
Comment 33•20 years ago
|
||
I have not seen this specific problem occur again. Marking fixed.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 34•20 years ago
|
||
Julien, please set the target milestone so we
know in which NSS release this bug was fixed.
Assignee | ||
Updated•20 years ago
|
Target Milestone: --- → 3.9.4
You need to log in
before you can comment on or make changes to this bug.
Description
•