Open Bug 811272 Opened 12 years ago Updated 2 years ago

[autoconfig] New account creation does not support IDN international domain names with non-ASCII characters

Categories

(Thunderbird :: Account Manager, enhancement, P5)

Tracking

(Not tracked)

People

(Reporter: jhorak, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: Workaround: Enter the punycode. )

From: https://bugzilla.redhat.com/show_bug.cgi?id=867611

-----------

Description of problem:
I am trying to configure an email account which contains an internationalized address (something like info@namewithñ.es). It is a valid domain name, but when I press "Continue" it keeps on trying the server forever. If I also try to click on "Manual config", then it stops, but it doesn't allow further configuration.

Version-Release number of selected component (if applicable):
thunderbird-15.0.1-1.fc17.i686

How reproducible:
Always

Steps to Reproduce:
1. Try to set up an account with an ñ in the domain name.

---------------

I'm getting the following exception in the Error Console:

Timestamp: 11/13/2012 02:20:57 PM
Error: uncaught exception: Hostname is empty or contains forbidden characters. Only letters, numbers, - and . are allowed.
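To illustrate the rule quoted in the error message and the "enter the punycode" workaround from the whiteboard, here is a minimal Python sketch. It is not Thunderbird's code: the function names `is_ascii_hostname` and `to_punycode` are made up for this example, and the conversion relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
import re

# Mirrors the rule quoted in the error message: letters, digits, "-" and "." only.
# (Illustrative pattern, not the one Thunderbird uses.)
ASCII_HOSTNAME = re.compile(r"[A-Za-z0-9.-]+")


def is_ascii_hostname(host: str) -> bool:
    """Return True if host is non-empty and uses only letters, digits, '-' and '.'."""
    return ASCII_HOSTNAME.fullmatch(host) is not None


def to_punycode(host: str) -> str:
    """Convert an internationalized hostname to its ASCII (xn--) form."""
    # Python's built-in "idna" codec converts each dot-separated label.
    return host.encode("idna").decode("ascii")


domain = "namewithñ.es"
print(is_ascii_hostname(domain))               # the raw IDN fails the ASCII-only rule
print(is_ascii_hostname(to_punycode(domain)))  # the converted form passes it
```

A user hitting this bug would type the converted form of the domain into the account wizard instead of the Unicode form.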
Such addresses are not supported yet. They will be intentionally and visibly blocked once bug 327812 and bug 80855 land. Only once bug 235312 is fixed we may allow them. That bug is for the backend and the account manager. I'll see what can be done in the Account wizard to reject unsupported domains properly.
Assignee: nobody → acelists
Let's land bug 327812 first, which already touches the wizard a bit and wires up the new hostname checking function. But I don't think it will solve all the exceptions shown here.
Depends on: 327812
There's a whole bunch of work to do with internationalized email addresses and support for IDN-based domains in Thunderbird. I'd be willing to do a brain dump of what to fix and roughly how, but I won't be able to do that until next week. If anyone is interested in working on this, please feel free to ping me directly.
(In reply to :aceman from comment #1) > Only once bug 235312 is fixed we may allow them. Well, given jhorak's example address info@namewithñ.es, bug 127399 will suffice.
Yes, but if the server name is first automatically derived from the IDN email address, then it will fail, as the account wizard (AW) will reject it. I'll check whether it rejects it properly and tells the user what needs to be done.
I can confirm this problem. The user is not informed about the problem, only in the error console.
But this must be done by somebody more familiar with the account wizard. Until then, TB does warn the user to "double-check the email address".
Assignee: acelists → nobody

If we want to fix this, we would need to adapt the email syntax regexp. However, we cannot just allow all characters. We need to exclude special characters, all punctuation apart from "-" and ".", and take care not to otherwise regress the regexp. But that's too short-sighted:

When used as From: in mail, IDN can cause serious problems. In contrast, in the browser, the IDN affects only your browser. The browser translates it into an xn-- punycode and resolves that, so no other software needs to be compatible. If we allow such email addresses with non-ASCII characters in the From: field of outgoing emails, then all email clients that do not support IDN in email addresses will fail.

The end user will not get responses to his emails.

So, if we support this, then we would need to convert this into punycode and send punycode as the From: address, to ensure that replies arrive.

Similar bugs:

Type: defect → enhancement
OS: Linux → All
Priority: -- → P5
Hardware: x86_64 → All
Whiteboard: Workaround: Enter the punycode.
Version: 16 Branch → Trunk
Summary: Thunderbird cannot configure internationalized email addresses → [autoconfig] New account creation does not support IDN international domain names with non-ASCII characters

(In reply to Ben Bucksch (:BenB) from comment #10)

> If we want to fix this, we would need to adapt the email syntax regexp. However, we cannot just allow all characters. We need to exclude special characters, all punctuation apart from "-" and ".", and take care not to otherwise regress the regexp. But that's too short-sighted:

For the From: address, I'd delegate to the server any check. If the server accepts the outgoing message, it will try and deliver it. It's its job, not Thunderbird's. Server must support SMTPUTF8.

If the server doesn't support SMTPUTF8, then non-ASCII characters in the domain name can be worked around by translating to IDNA. However, non-ASCII characters in the local part cannot be translated.
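The fallback described above can be sketched as follows. This is a hedged illustration in Python, not Thunderbird's implementation: `to_wire_address` and its `server_supports_smtputf8` parameter are hypothetical names, and the domain conversion uses Python's built-in `idna` codec (IDNA 2003).

```python
def to_wire_address(address: str, server_supports_smtputf8: bool) -> str:
    """Return the form of an address to use in the SMTP session (illustrative only)."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")

    # With SMTPUTF8, the address can be sent exactly as the user typed it.
    if server_supports_smtputf8:
        return address

    # Without SMTPUTF8, there is no ASCII encoding for the local part.
    if not local.isascii():
        raise ValueError("a non-ASCII local part requires SMTPUTF8")

    # The domain part can still be translated to its xn-- form.
    return f"{local}@{domain.encode('idna').decode('ascii')}"
```

With these assumptions, `mimi@foà.it` survives without SMTPUTF8 because only its domain needs converting, whereas `mimì@foà.it` raises an error.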

> When used as From: in mail, IDN can cause serious problems. In contrast, in the browser, the IDN affects only your browser. The browser translates it into an xn-- punycode and resolves that, so no other software needs to be compatible. If we allow such email addresses with non-ASCII characters in the From: field of outgoing emails, then all email clients that do not support IDN in email addresses will fail.

IDNA translation only serves when resolving. TB only needs to resolve the outgoing server's domain name, not that of any From:, To:, or Cc:.

> The end user will not get responses to his emails.

If recipients use compliant email clients, responses will arrive. Thunderbird seems to allow non-ASCII characters in incoming message header fields, although formally it doesn't (see bug #1571672).

> So, if we support this, then we would need to convert this into punycode and send punycode as the From: address, to ensure that replies arrive.

The server will convert to IDNA to look up the MX record, not Thunderbird. As mentioned above, converting to punycode is only needed if the outgoing server doesn't support SMTPUTF8. However, I'd suggest checking the outgoing server's capabilities when the outgoing server is being edited, so that everything can be done correctly before starting the SMTP session.
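As a rough sketch of such a capability check, the Python snippet below parses the text of an EHLO reply and looks for the SMTPUTF8 keyword. The helper names `ehlo_keywords` and `supports_smtputf8` and the sample reply are invented for illustration. On a live connection, Python's `smtplib.SMTP.has_extn("smtputf8")` after `ehlo()` should give the same answer.

```python
def ehlo_keywords(reply: str) -> set:
    """Collect the extension keywords advertised in a multi-line EHLO reply."""
    keywords = set()
    # The first line carries the server greeting, not an extension keyword.
    for line in reply.splitlines()[1:]:
        # Each following line is "250-" or "250 " plus a keyword and optional parameters.
        if len(line) < 4 or not line.startswith("250"):
            continue
        rest = line[4:].strip()
        if rest:
            keywords.add(rest.split()[0].upper())
    return keywords


def supports_smtputf8(reply: str) -> bool:
    """Return True if the EHLO reply advertises the SMTPUTF8 extension."""
    return "SMTPUTF8" in ehlo_keywords(reply)


# Made-up reply from a hypothetical server.
sample_reply = (
    "250-mail.example.org\r\n"
    "250-8BITMIME\r\n"
    "250-SIZE 35882577\r\n"
    "250 SMTPUTF8"
)
print(supports_smtputf8(sample_reply))
```

Running the check when the outgoing server is configured, rather than at send time, would let the client decide up front whether addresses must be converted.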

I'm unable to find a closed bug where someone complained that an accidental wrong character in one address would cause the server to reject the message. If there are very many recipients, it would be hard to find the one with the wrong character. That was several years ago. Nowadays, non-ASCII characters are supported both in the domain part and in the local part of email addresses. The checks at the location indicated by Jorg make little sense. Recall that most servers allow at most 100 recipients in a message; users should set up a mailing list to send to wider audiences. Catching a misspelled recipient is not something that can be done in the compose window using obsolete heuristics.

> For the From: address, I'd delegate to the server any check. If the server accepts the outgoing message, it will try and deliver it.

You're missing my point. What if the message is delivered, but the recipient's email client cannot deal with the From: address? The recipient will not be able to reply. Our user will send mail and not get responses from many people. Communication breakdown. This is not helpful.

So, if we send messages out, we need to put the From address as punycode.

Changes like this can happen only after all software supports it. In email, that typically takes 20 years for very successful standards like 8 bit SMTP or HTML email, and never for less successful ones. IDN is the latter class. I know you don't want to hear that, but:

I would rather let our user communicate by sending punycode than use ö as IDN in email domains, only for the replies never to arrive and the communication to break down. It's just not a good idea to work hard to add features, with the result of email being less reliable.

I say that as someone with non-ASCII characters in my native language and native names.

> If recipients use compliant email clients, responses will arrive.

That's a big "IF". Unless the market share is somewhere at 99.5% or higher, we cannot do this. I would think it's far far lower, even decades after IDN was introduced, which makes it a failed standard.

(In reply to Ben Bucksch (:BenB) from comment #14)

> > For the From: address, I'd delegate to the server any check. If the server accepts the outgoing message, it will try and deliver it.

> You're missing my point. What if the message is delivered, but the recipient's email client cannot deal with the From: address? The recipient will not be able to reply. Our user will send mail and not get responses from many people. Communication breakdown. This is not helpful.

> So, if we send messages out, we need to put the From address as punycode.

Don't you think users are able to enter punycode? There are plenty of tools, both local and online, whereby a user can convert a domain name, without Thunderbird imposing it.

Sure, there is a risk of stumbling upon obsolete clients that won't allow replies, like Thunderbird. I bet users of IDNs are well aware of the risk.

> Changes like this can happen only after all software supports it. In email, that typically takes 20 years

RFC 3490 dates from 2003 and was obsoleted by RFC 5890 in 2010. That makes 17 years of IDN experience. According to your rule of thumb, within another 3 years Thunderbird will have become totally obsolete. Is there an advantage in being the last client to upgrade?

Besides, if you force punycode conversion, what would you do with internationalized email addresses, that is, addresses whose local part contains non-ASCII characters? The use of UTF-8 there was introduced by RFC 5336 in 2008, obsoleted by RFC 6531 in 2012. Do we really need to wait another 8 years before implementing it?

> > If recipients use compliant email clients, responses will arrive.

> That's a big "IF". Unless the market share is somewhere at 99.5% or higher, we cannot do this. I would think it's far far lower, even decades after IDN was introduced, which makes it a failed standard.

All major mailbox providers are implementing EAI. If you consider Gmail and Microsoft alone you're well beyond 50%.

> there is a risk of stumbling upon obsolete clients that won't allow replies, like Thunderbird

You're confusing 3 matters here.

  1. I completely agree that Thunderbird should be able to receive and reply to IDN and punycode email addresses. If that is not working today, that should be fixed. It appears that was fixed in bug 127399.
  2. We currently support IDN for display in incoming emails, as far as I understand. But there is a real risk of homoglyphs causing phishing. It was a problem in Firefox, and phishing is an even bigger problem in email than on the web. I would proceed with extreme caution, due to the phishing implications. This was discussed in bug 1504526.
  3. I think we should send out From: addresses as punycode, not with non-ASCII characters, simply because we cannot know whether the recipient's email client will be able to read them and, consequently, reply. There's a high chance that it does not work, because many clients cannot do that, so sending that out is not an option. Therefore, we should send out punycode instead of the non-ASCII representation of email addresses. If the receiving email client supports IDN, it can still convert the punycode to non-ASCII and display that to the user, as appropriate.

In other words, if we send out From: as punycode, as we do now, IDN-supporting email clients can still display the From as non-ASCII characters. The display at the receiver side is up to the receiver, not to us.
But if we send out From: as non-ASCII characters, we break a lot of the recipients, and that's simply not an option.
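The receiver-side conversion mentioned here can be sketched in Python as follows. The function name `to_display_address` is hypothetical, and the decoding uses Python's built-in `idna` codec (IDNA 2003); a real client would also need to weigh the homoglyph concerns from point 2 before showing the Unicode form.

```python
def to_display_address(address: str) -> str:
    """Turn an xn-- domain in an address back into Unicode for display (illustrative)."""
    local, _, domain = address.rpartition("@")
    if not local:
        return address  # not of the form local@domain; leave untouched
    try:
        # The "idna" codec decodes each xn-- label and passes plain ASCII labels through.
        unicode_domain = domain.encode("ascii").decode("idna")
    except UnicodeError:
        return address  # already non-ASCII, or not valid punycode; show as received
    return f"{local}@{unicode_domain}"


# Build the on-the-wire form first, then convert it back for display.
wire_address = "info@" + "namewithñ.es".encode("idna").decode("ascii")
print(wire_address)
print(to_display_address(wire_address))
```

The address on the wire stays ASCII throughout; only the receiving client's display changes.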

The best we could do here would be to accept non-ASCII characters and then convert them back to punycode. But:

> Don't you think users are able to enter punycode?

Great! So, people who register a punycode email address can enter the punycode address in the account creation dialog when they set up their own email address, and it will work.
Even if we supported IDN domains here, we would do that only for local display for the owner of the address, but still send out punycode in the email on the wire. See above.

So, this is a lot of work for little gain.

(In reply to Ben Bucksch (:BenB) from comment #16)

> > there is a risk of stumbling upon obsolete clients that won't allow replies, like Thunderbird

> You're confusing 3 matters here.

> 1. I completely agree that Thunderbird should be able to receive and reply to IDN and punycode email addresses. If that is not working today, that should be fixed. It appears that was fixed in bug 127399.

Correct. I can receive and reply to mimi@foà.it. Note that, if I type the address as shown, the resulting message is sent with non-ASCII characters in the header. Thunderbird should have verified that the server supports SMTPUTF8, which I don't think it does.

However, when a message contains non-ASCII characters in the header, Thunderbird should have issued an ENABLE UTF8=ACCEPT command, as mentioned in bug #1571672. Without that, IMAP transmission of such messages is in jeopardy.
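A minimal sketch of the decision behind that command, in Python: the client would issue `ENABLE UTF8=ACCEPT` only if the server's CAPABILITY list contains both `ENABLE` and `UTF8=ACCEPT`. The function name `should_enable_utf8` and the sample capability lists are made up for illustration. With Python's `imaplib`, the equivalent live call would be `IMAP4.enable("UTF8=ACCEPT")`.

```python
def should_enable_utf8(capabilities) -> bool:
    """Return True if the server advertises both ENABLE and UTF8=ACCEPT."""
    caps = {cap.upper() for cap in capabilities}
    return "ENABLE" in caps and "UTF8=ACCEPT" in caps


# Made-up capability lists from two hypothetical servers.
modern_server = ["IMAP4rev1", "ENABLE", "UTF8=ACCEPT", "IDLE"]
legacy_server = ["IMAP4rev1", "IDLE"]

print(should_enable_utf8(modern_server))
print(should_enable_utf8(legacy_server))
```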

> 2. We currently support IDN for display in incoming emails, as far as I understand. But there is a real risk of homoglyphs causing phishing. It was a problem in Firefox, and phishing is an even bigger problem in email than on the web. I would proceed with extreme caution, due to the phishing implications. This was discussed in bug 1504526.

Resorting to punycode betrays the very purpose of IDN. Furthermore, as soon as users get used to unreadable gibberish in the header fields, the anti-phishing effect vanishes.

If I'm not mistaken, Firefox has a complicated function that checks for co-existence of Western characters with the Russian or Greek characters that can be used to create homoglyphs, such as 'о' (U+043E) and 'ⲟ' (U+2C9F). Adoption of such a function in Thunderbird would deserve its own bug.
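To give a feel for what such a check does, here is a deliberately crude Python sketch that flags a domain label mixing letters from more than one script. It is not Firefox's algorithm: it guesses the script from the first word of each character's Unicode name, which is only a heuristic, and the function names are invented for this example.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Guess the scripts used by the letters of a label (heuristic, by Unicode name)."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens are shared across scripts
        name = unicodedata.name(ch, "")
        # e.g. "LATIN SMALL LETTER O" -> "LATIN", "CYRILLIC SMALL LETTER O" -> "CYRILLIC"
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True if the label's letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("foà"))           # Latin letters only
print(is_mixed_script("p\u0430ypal"))   # contains Cyrillic 'а' (U+0430) among Latin letters
```

A production check would need proper script data and exceptions for languages that legitimately combine scripts, which is part of why it deserves its own bug.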

> 3. I think we should send out From: addresses as punycode, not with non-ASCII characters, simply because we cannot know whether the recipient's email client will be able to read them and, consequently, reply. There's a high chance that it does not work, because many clients cannot do that, so sending that out is not an option. Therefore, we should send out punycode instead of the non-ASCII representation of email addresses. If the receiving email client supports IDN, it can still convert the punycode to non-ASCII and display that to the user, as appropriate.

Again, that betrays the purpose of IDN. It doesn't solve the problem for mimì@foà.it because you can only convert the domain part. Unnecessary conversions can hamper DKIM signatures.

> In other words, if we send out From: as punycode, as we do now, IDN-supporting email clients can still display the From as non-ASCII characters. The display at the receiver side is up to the receiver, not to us.
> But if we send out From: as non-ASCII characters, we break a lot of the recipients, and that's simply not an option.

So you can rely on the recipient's email client for displaying readable stuff, but you cannot rely on it leaving it as-is when that's all right. Consider that older versions of Thunderbird, overly naive in not checking input ASCIIness, worked better than the overcomplicated current release.

While IDNA was devised with compatibility in mind, breaking existing software was the conclusion of EAI. The experimental RFC 5336 (UTF8SMTP Extension) provided for downgrade. The conclusion of the experiment, RFC 6531 (SMTPUTF8 Extension), opted for incompatibility instead. Thus, the community of people who use just ASCII addresses is not able to send mail to people who use non-ASCII ones. That is to say, the ASCII only community is bound to stay separated from the EAI community anyway. Now, consider that software that leaves text as-is won't hurt users in the ASCII-only community, because they never enter non-ASCII addresses.

> The best we could do here would be to accept non-ASCII characters and then convert them back to punycode. But:

> > Don't you think users are able to enter punycode?

> Great! So, people who register a punycode email address can enter the punycode address in the account creation dialog when they set up their own email address, and it will work.
> Even if we supported IDN domains here, we would do that only for local display for the owner of the address, but still send out punycode in the email on the wire. See above.

I thought the idea was to stick to UTF-8 everywhere. Most text editors and web forms work that way. Why should mail software differ? The one exception to UTF-8 everywhere is punycode, but you only need it for DNS queries. There are even DNS utilities which don't show punycode at all, e.g. kdig. Punycode outside of on-the-wire DNS makes less and less sense.

> So, this is a lot of work for little gain.

The more useless conversions, the more complicated work, the less gain.

(In reply to Alessandro Vesely from comment #17)

> Adoption of such function in Thunderbird would deserve its own bug.

See bug #1617385

Severity: normal → S3