Closed Bug 554596 Opened 15 years ago Closed 15 years ago

Non-standard IP address specifications in URLs are a security risk

Categories

(Core :: Networking, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 67730

People

(Reporter: usenet, Unassigned)

References

(Blocks 1 open bug, )

Details

Attachments

(2 files)

According to this post: http://www.viruslist.com/en/weblog?weblogid=208188044 , Firefox still supports a number of obscure and outdated formats for IP addresses, including octal and hex representations and representing the entire IP address as a single 32-bit number. Currently these seem to have no mainstream use other than as an obfuscation technique to evade third-party URL filters. According to the post cited above, this technique is currently being used in the wild. Removing support for these formats would increase security without removing any significant functionality.
The following test cases, all of which currently resolve to a google.com web server, are provided by the post cited above:
* http://0x42.0x66.0x0d.0x63
* http://0x42660d63
* http://1113984355
* http://00000102.00000146.00000015.00000143
There would seem to be two logical places to apply this restriction:
* at the text-string-to-IP-address lookup interface
* in the URL parser
Rather than attempt to detect abusive formats, implementations should instead rigorously check that the text string provided matches the RFC-based definitions of IP address formats before passing it down to lower-level routines. Whatever implementation is put in place should enforce similarly conservative restrictions on both IPv4 and IPv6 address strings.
net_IsValidHostName() in nsURLHelper.cpp and/or PR_StringToNetAddr() in prnetdb.c may be good places to put this logic.
Suggested logic: If a name is made up entirely of one or more sequences of digits, each optionally prefixed by "0x" or "0X" and separated by label separators, then it is invalid, _unless_ every one of those sequences is a decimal number in the range 0-255, without either leading zeroes or a 0x-prefix.
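A minimal Python sketch of the suggested logic; it is not the attached program. It assumes "." is the only label separator, treats a label as numeric-looking when it is all decimal digits or "0x"/"0X" followed by hex digits, and takes a bare host name rather than a full URL. The name classify_host and both regular expressions are hypothetical, chosen for illustration.

```python
import re

# Assumption: a "numeric-looking" label is decimal digits, or 0x/0X plus hex digits.
NUMERIC_LABEL = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")
# A canonical octet candidate: decimal, no leading zeroes, no 0x prefix, at most 3 digits.
CANONICAL_OCTET = re.compile(r"0|[1-9][0-9]{0,2}")


def classify_host(host):
    """Return "valid" or "bogus: numeric alias" for a bare host name."""
    labels = host.split(".")  # assumption: "." is the only label separator
    # Any label that is not numeric-looking makes this an ordinary DNS name.
    if not all(NUMERIC_LABEL.fullmatch(label) for label in labels):
        return "valid"
    # Every label is numeric-looking: accept only plain decimal octets in 0-255.
    if all(CANONICAL_OCTET.fullmatch(label) and int(label) <= 255 for label in labels):
        return "valid"
    return "bogus: numeric alias"


if __name__ == "__main__":
    for host in ("google.com", "0x42.0x66.0x0d.0x63", "1113984355", "1.2.3.4"):
        print("test case: http://%s %s" % (host, classify_host(host)))
```

As written, the rule does not constrain the number of labels, so a host such as 1.2 is accepted by this sketch; requiring exactly four labels would be a stricter variant.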
This is a small Python program designed to explore the problem space.
Output of enclosed program:
test case: http://google.com valid
test case: http://0x42.0x66.0x0d.com valid
test case: http://0x42.0x66.0x0d.0x63 bogus: numeric alias
test case: http://0x42.0x66.0xd.0x63 bogus: numeric alias
test case: http://0x42660d63 bogus: numeric alias
test case: http://1113984355 bogus: numeric alias
test case: http://00000102.00000146.00000015.00000143 bogus: numeric alias
test case: http://00000102.00000146.00000015.0x63 bogus: numeric alias
test case: http://0x42.0x66.0x0z.0x63 valid
test case: http://0x4266.0x0z.0x63 valid
test case: http://1.2.3.4 valid
Reviewing inet_pton.c and gethnamaddr.c in the source of glibc makes the puzzle deeper: these ambiguous hex and octal strings are clearly being resolved somewhere, but where?
OK, the badness appears to be happening in inet_aton(). Various implementations of inet_aton() have exciting semi-documented features such as two- and three-part dotted numerical addresses, thus:
a.b -- 8.24 bits -- example: http://0x42.0x660d63
a.b.c -- 8.8.16 bits -- example: http://0x42.0x66.0x0d63
Other interesting features seem to be numeric overflow bugs in some implementations, allowing even more aliases to be created, all of which resolve to the same IP address.
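The part-width rules in this comment can be sketched in pure Python. This illustrates the classic parsing behaviour only and is not the code of any particular libc; the function names are hypothetical, and unlike the buggy implementations mentioned here, it rejects out-of-range parts instead of overflowing.

```python
def parse_part(part):
    """Parse one dot-separated part as hex (0x...), octal (leading 0) or decimal."""
    if part[:2].lower() == "0x":
        digits, base = part[2:], 16
    elif len(part) > 1 and part.startswith("0"):
        digits, base = part[1:], 8
    else:
        digits, base = part, 10
    allowed = "0123456789abcdef"[:base]  # the digit characters legal in this base
    if not digits or any(ch not in allowed for ch in digits.lower()):
        return None
    return int(digits, base)


def aton_sketch(text):
    """Return the 32-bit address for a, a.b, a.b.c or a.b.c.d, or None if malformed."""
    values = [parse_part(part) for part in text.split(".")]
    if not 1 <= len(values) <= 4 or None in values:
        return None
    *leading, last = values
    tail_bits = 8 * (4 - len(leading))  # the last part fills all remaining bytes
    if any(v > 0xFF for v in leading) or last >= (1 << tail_bits):
        return None
    address = 0
    for v in leading:
        address = (address << 8) | v
    return (address << tail_bits) | last


def dotted_quad(address):
    """Format a 32-bit address as canonical decimal octets."""
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

Under these rules, every example form in this bug (four-part hex, four-part octal, a.b, a.b.c, and the single 32-bit number) parses to the same value, which dotted_quad() renders as 66.102.13.99.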
More useful information: http://tools.ietf.org/html/draft-main-ipaddr-text-rep-00 -- see section 2.1.1, "Early Practice", which explains how the 4.2BSD inet_aton() became the de-facto standard for IPv4 address interpretation, and that compatibility with this lingers to this day. It concludes:
"The 4.2BSD inet_aton() has been widely copied and imitated, and so is a de facto standard for the textual representation of IPv4 addresses. Nevertheless, these alternative syntaxes have now fallen out of use (if they ever had significant use). The only practical use that they now see is for deliberate obfuscation of addresses: giving an IPv4 address as a single 32-bit decimal number is favoured among people wishing to conceal the true location that is encoded in a URL. All the forms except for decimal octets are seen as non-standard (despite being quite widely interoperable) and undesirable."
http://www.pc-help.org/obscure.htm contains a number of different examples of IP address obfuscation techniques, including uses of the numeric overflows described above.
After some more thought, the proper solution to this looks much easier than the complex code I was considering above. In pseudo-code:
    def lookup_name(name):
        try:
            address = lookup_numeric_address(name)
        except:
            return DNS_lookup(name)
        if name == numeric_address_representation(address):
            return address
        else:
That was cut off prematurely. Let's try that again:
    def lookup_name(name):
        try:
            address = lookup_numeric_address(name)
        except:
            return DNS_lookup(name)
        if name == numeric_address_representation(address):
            return address
        else:
            raise bad_numeric_address
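A runnable take on the pseudo-code above, as a sketch only. It assumes Python's socket.inet_aton() stands in for lookup_numeric_address() and socket.inet_ntoa() for numeric_address_representation(); which non-canonical forms inet_aton() accepts depends on the platform's C library. The BadNumericAddress class and the dns_lookup parameter are hypothetical additions so that the DNS fallback can be swapped out. IPv4 only.

```python
import socket


class BadNumericAddress(ValueError):
    """The name parses as an IPv4 address but is not in canonical dotted-decimal form."""


def lookup_name(name, dns_lookup=socket.gethostbyname):
    """Resolve name, refusing numeric aliases such as hex, octal or single-number forms."""
    try:
        # Lenient numeric parse, delegated to the platform's inet_aton().
        packed = socket.inet_aton(name)
    except OSError:
        # Not numeric at all: treat it as an ordinary host name.
        return dns_lookup(name)
    canonical = socket.inet_ntoa(packed)  # always four decimal octets
    if name == canonical:
        return canonical
    raise BadNumericAddress("%r is an alias for %s" % (name, canonical))
```

The round trip through inet_ntoa() is what does the checking: any spelling that parses but does not come back out identically is rejected, so the code never needs to enumerate the obfuscated formats.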
A thought... since the code of inet_aton() in the GNU libraries _is_ the BSD code, and is already BSD-licenced, we can simply use a cut-down version of the code there to detect these sorts of numeric cases, triggering the double-check that the IP address is in the canonical form.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Blocks: 559469