The world of cybersecurity is full of threats, many of which are surprisingly subtle and challenging to detect. One such threat is the problem of so-called homoglyphs. CWE-1007, also known as “Insufficient Visual Distinction of Homoglyphs Presented to User”, is a weakness that attackers often exploit to deceive you and compromise your systems or data. In this blog article, you will get a deep insight into CWE-1007, understand its mechanisms, and learn how to protect yourself from such attacks. We will discuss examples, technical challenges, and best practices that can help you as a developer understand and mitigate this threat.
- What are Homoglyphs?
- CWE-1007: A detailed investigation
- Examples for CWE-1007
- Why is CWE-1007 dangerous?
- Possible attack scenarios
- Defence strategies against CWE-1007
- Technical challenges in homoglyph recognition
- Best practices for developers
- Case Study: Homoglyph Domain Attack
- Homoglyph recognition tools
- Which CWEs are often used in conjunction with CWE-1007?
  1. CWE-601: Open Redirect
  2. CWE-643: Improper Neutralization of Data within XPath Expressions
  3. CWE-79: Cross-Site Scripting (XSS)
  4. CWE-20: Improper Input Validation
  5. CWE-77: Command Injection
- Example application in Vaadin Flow
- Conclusion
- Next Steps
What are Homoglyphs?#
Before we delve into CWE-1007, we must understand what homoglyphs are. Homoglyphs are characters that look visually similar but are represented by different Unicode code points. This can involve different letters, numbers or symbols. A well-known example is the Latin letter “O” and the number “0”, which can look almost identical to the human eye. There are many other examples, such as “l” (lowercase L) and “I” (uppercase i), or Cyrillic letters that look like Latin letters.
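To make the difference between appearance and code point tangible, here is a minimal sketch that prints the code point and script of each character in a string. The class and method names are hypothetical and chosen only for this illustration.

```java
class CodePointInspector {

    // Describes every code point of the input as "U+XXXX(SCRIPT)", separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X(%s)", cp, Character.UnicodeScript.of(cp)));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String latin = "a";          // Latin small letter a
        String cyrillic = "\u0430";  // Cyrillic small letter a, looks the same in most fonts

        System.out.println(latin.equals(cyrillic)); // the two strings are not equal
        System.out.println(describe(latin));
        System.out.println(describe(cyrillic));
    }
}
```

Both strings render almost identically on screen, but the program reports different code points and different scripts for them.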
The visual similarity of homoglyphs is often exploited to deceive you. Attackers use such characters to create phishing websites, generate fake URLs, or corrupt code to make you believe you are dealing with a trustworthy resource. This is particularly problematic because we humans use visual patterns to make quick decisions and can more easily fall victim to such deceptions.
CWE-1007: A detailed investigation#
CWE-1007 refers to the insufficient visual distinction of homoglyphs when they are presented to a user. If a system does not distinguish between similar-looking characters or alert you to them, this can lead to significant security risks. You could accidentally click on a malicious link, visit a fake domain, or trust a fraudulent command.
This vulnerability is particularly common when displaying URLs, usernames, or commands. For example, you could click on a URL that appears to be correct but actually uses homoglyphs to represent a fake website. This allows attackers to steal passwords, credit card information, or other sensitive data. This problem is compounded by widespread internet use, as you constantly interact with potentially harmful content.
Examples for CWE-1007#
A typical example of this vulnerability is the use of fake domains that use homoglyphs. Let’s say you receive an email with a link to “paypa1.com” (which uses the number “1” instead of the letter “l”). Without careful consideration, you might think this is a legitimate link to the PayPal website. The same principle can also apply to usernames on social networks or even essential commands in a console.
Another example is the use of homoglyphs in source code. Attackers could insert fake characters into the code that look like legal characters but result in different functionality. This can be particularly dangerous in open-source projects or teams where multiple developers work on the same code and could miss similar characters. This leads to security vulnerabilities that can be exploited for attacks and poses a high risk to the integrity of the code.
An everyday example is a fake username on social networks. An attacker could create a username like “facebook_support” using the Cyrillic “е”, which looks visually similar to the Latin “e”. You might think it’s an official support channel and click on malicious links or reveal sensitive data.
Why is CWE-1007 dangerous?#
CWE-1007 is dangerous because humans rely heavily on visual cues to make decisions. The human eye is trained to recognise patterns and process information quickly, but often without closely examining each character. Attackers can specifically exploit this weakness to deceive you.
The danger of this vulnerability is not only that you could fall for phishing attacks. It can also lead to more severe security breaches, such as stealing credentials, tampering with software, or inserting malicious code into seemingly legitimate projects. In addition, this deception can also cause financial damage because you trust fake websites and enter your payment information, which can then be misused.
Another danger is that companies can lose their reputation due to these vulnerabilities. If you repeatedly fall for counterfeit versions of a well-known brand, your trust in that brand can be severely affected. Attackers use this method to undermine user trust and specifically to maximise damage to companies.
Possible attack scenarios#
Phishing attacks using fake URLs : Attackers can create a URL almost identical to a known website. You click on the link without noticing that the domain has a slight change (e.g. a Cyrillic letter instead of a Latin letter). This allows attackers to access sensitive information such as passwords or credit card details.
Code injection through fake characters : In complex software projects, homoglyphs can cause the code to execute differently than it seems at first glance. You or other developers may miss these characters, resulting in hard-to-find vulnerabilities in your code. This could be used to insert malicious functions that only become noticeable in the production environment.
Social engineering in social networks : An attacker could create a username almost identical to a trusted contact’s to deceive you and steal information. For example, someone could use the name “LinkedIn Support” with a slightly modified letter to trick you into giving up your login information or clicking on malicious links. The deception can be particularly effective if you don’t carry out additional security checks.
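All three scenarios rely on characters from another writing system being mixed into otherwise Latin text. A minimal sketch of a mixed-script check, assuming that a single name should normally use only one script, could look like this (the class name `MixedScriptDetector` is hypothetical):

```java
import java.util.EnumSet;
import java.util.Set;

class MixedScriptDetector {

    // Collects the scripts of all letters; digits and punctuation are ignored.
    static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    // True if the letters of the text belong to more than one script.
    static boolean isMixedScript(String text) {
        return scriptsOf(text).size() > 1;
    }

    public static void main(String[] args) {
        System.out.println(isMixedScript("paypal.com"));       // Latin letters only
        System.out.println(isMixedScript("p\u0430ypal.com"));  // second letter is Cyrillic
    }
}
```

This check is only a heuristic. It does not catch names written entirely in a lookalike script, and it would also flag legitimate text in languages that routinely combine several scripts.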
Defence strategies against CWE-1007#
To protect you from the threat of CWE-1007, technical and organisational measures are necessary. Here are some strategies that can help you minimise risk:
Unicode normalisation : Normalisation is a useful first step when handling potentially deceptive input. It converts different encodings of the same character, as well as compatibility variants such as full-width letters or ligatures, into one standard form, so that later checks see a consistent representation. Note that normalisation does not map characters from one writing system to another: a Cyrillic “а” is still Cyrillic after normalisation, so this measure has to be combined with the others described here.

import java.text.Normalizer;

public class UnicodeNormalizationExample {
    public static void main(String[] args) {
        String suspiciousString = "päypäl.com";
        // NFKC composes characters and folds compatibility variants into their standard form
        String normalizedString = Normalizer.normalize(suspiciousString, Normalizer.Form.NFKC);
        System.out.println("Normalized String: " + normalizedString);
    }
}

This example shows how a string is brought into NFKC form so that subsequent validation works on a single, consistent representation.
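It is worth checking what normalisation can and cannot do. The following sketch, with a hypothetical helper `nfkc`, compares a full-width spelling with a Cyrillic lookalike. Only the first is folded to plain ASCII.

```java
import java.text.Normalizer;

class NormalizationLimits {

    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width Latin letters are compatibility variants and fold to ASCII.
        String fullWidth = "\uFF50\uFF41\uFF59\uFF50\uFF41\uFF4C";
        System.out.println(nfkc(fullWidth).equals("paypal"));

        // The Cyrillic letter U+0430 is a distinct letter, so NFKC leaves it untouched.
        String cyrillic = "p\u0430ypal";
        System.out.println(nfkc(cyrillic).equals("paypal"));
    }
}
```

The second comparison fails, which is why normalisation alone is not a homoglyph defence and should be followed by a script or character-set check.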
User training : Training in dealing with suspicious emails, links and domain names is an essential line of defence. You should learn to check URLs carefully and be aware that characters from different writing systems can be used to deceive you. This training should include regular exercises and examples to increase awareness and improve your ability to recognise such attacks.
Security warnings in the browser : Modern browsers have mechanisms to protect you from deceptive internationalised domain names, for example by displaying a domain in its ASCII-only Punycode form (starting with “xn--”) when it mixes characters from different writing systems. Keep your browser up to date so that these protections stay effective. Browser extension developers could add further filtering mechanisms that promptly alert you to possible deception.
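Applications can apply a similar idea to the one browsers use. The sketch below relies on the JDK class `java.net.IDN` to convert a host name to its ASCII form and to report whether it is an internationalised name. The class and method names are hypothetical.

```java
import java.net.IDN;
import java.util.Locale;

class PunycodeCheck {

    // Converts a host name to its ASCII-compatible (Punycode) form.
    // IDN.toASCII throws IllegalArgumentException for input it cannot convert.
    static String toAsciiForm(String host) {
        return IDN.toASCII(host);
    }

    // A host is treated as internationalised if any label carries the "xn--" prefix.
    static boolean isInternationalised(String host) {
        for (String label : toAsciiForm(host).split("\\.")) {
            if (label.toLowerCase(Locale.ROOT).startsWith("xn--")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String host = "\u0430pple.com"; // the first letter is Cyrillic
        System.out.println(toAsciiForm(host));
        System.out.println(isInternationalised(host));
    }
}
```

Showing the ASCII form next to the rendered name, or instead of it, gives you a chance to notice that a familiar-looking domain is not what it seems.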
Code reviews and static code analysis tools : In software projects, developers should conduct code reviews to identify suspicious characters. Static code analysis tools can also help detect homoglyphs in code and mitigate potential security risks.
public class CodeReviewExample {
    public static boolean containsSuspiciousCharacters(String input) {
        // Flags any character outside the printable ASCII range (space to tilde)
        return !input.matches("^[\\x20-\\x7E]*$");
    }

    public static void main(String[] args) {
        String input = "paypaı.com"; // contains the homoglyph "ı" (dotless i, U+0131)
        if (containsSuspiciousCharacters(input)) {
            System.out.println("Suspicious characters found: " + input);
        }
    }
}

This example shows how a simple check can help identify potentially dangerous characters so that you can act accordingly.
Allowing specific character sets : Another risk mitigation measure is limiting the use of certain characters. For example, an application might specify that only Latin characters are allowed in usernames or URL paths to reduce the risk of homoglyph attacks. This restriction helps reduce the attack surface.
public class CharacterWhitelistExample {
    public static boolean isValidInput(String input) {
        // Only ASCII letters and digits are accepted
        return input.matches("^[a-zA-Z0-9]*$");
    }

    public static void main(String[] args) {
        String username = "us\u0435rname"; // contains the Cyrillic "е" (U+0435) instead of the Latin "e"
        if (isValidInput(username)) {
            System.out.println("Username is valid.");
        } else {
            System.out.println("Username contains invalid characters.");
        }
    }
}

This example shows a simple method for restricting the allowed characters in input to prevent the use of homoglyphs.
Technical challenges in homoglyph recognition#
Recognising homoglyphs represents a significant technical challenge. One reason for this is the number of characters in the Unicode standard. Unicode includes well over 100,000 characters from many different writing systems, and many of them look similar or identical to one another. An algorithm intended to identify such characters must be able to distinguish between visual similarity and actual meaning.
Another problem is that not all applications or systems can display Unicode correctly. In some cases, different characters may appear identical through the rendering process, making it even more difficult for you to distinguish between legitimate and fake content. These technical challenges require the development of robust checking mechanisms that ensure that such attempts at deception can be detected.
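One common approach to this problem is to reduce each string to a "skeleton" in which lookalike characters are replaced by a single representative, and then to compare skeletons instead of raw strings. The sketch below uses a deliberately tiny, hand-written mapping that is purely hypothetical. A real implementation would rely on a maintained data set such as the confusables data published with the Unicode security mechanisms. It uses `Map.of`, so it assumes Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;

class ConfusableSkeleton {

    // Hypothetical, incomplete mapping for illustration only.
    private static final Map<Integer, String> CONFUSABLES = Map.of(
        0x0430, "a",  // Cyrillic small a
        0x0435, "e",  // Cyrillic small ie
        0x043E, "o",  // Cyrillic small o
        0x0440, "p",  // Cyrillic small er
        0x0441, "c",  // Cyrillic small es
        0x0131, "i",  // Latin dotless i
        0x0031, "l",  // digit one
        0x0030, "o"   // digit zero
    );

    // Lower-cases the text and replaces every known lookalike by its representative.
    static String skeleton(String text) {
        StringBuilder sb = new StringBuilder();
        text.toLowerCase(Locale.ROOT).codePoints().forEach(cp -> {
            String mapped = CONFUSABLES.get(cp);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    // True if the candidate differs from the trusted name but has the same skeleton.
    static boolean looksLike(String candidate, String trusted) {
        return !candidate.equals(trusted) && skeleton(candidate).equals(skeleton(trusted));
    }

    public static void main(String[] args) {
        System.out.println(looksLike("paypa1.com", "paypal.com"));
        System.out.println(looksLike("p\u0430ypal.com", "paypal.com"));
        System.out.println(looksLike("example.com", "paypal.com"));
    }
}
```

The quality of such a check depends entirely on the mapping table, which is exactly where the technical difficulty described above shows up.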
Best practices for developers#
As a developer, you play a crucial role in preventing CWE-1007. Here are some best practices to consider when developing secure applications:
Input validation : User input data should always be validated to ensure that no dangerous characters are used. If possible, only certain character sets should be allowed.
public class InputValidationExample {
    public static boolean isValidInput(String input) {
        return input.matches("^[a-zA-Z0-9]*$");
    }

    public static void main(String[] args) {
        String userInput = "hello123";
        if (isValidInput(userInput)) {
            System.out.println("Input is valid.");
        } else {
            System.out.println("Input contains invalid characters.");
        }
    }
}

Escape and encode : Data used for display or transmission should always be escaped and encoded to ensure no harmful characters are inserted unnoticed.

import org.apache.commons.text.StringEscapeUtils;

public class EscapeEncodeExample {
    public static void main(String[] args) {
        String userInput = "<script>alert('XSS');</script>";
        String escapedInput = StringEscapeUtils.escapeHtml4(userInput);
        System.out.println("Escaped Input: " + escapedInput);
    }
}

This example shows how potentially malicious input can be correctly handled to prevent attacks such as Cross-Site Scripting (XSS).
Conscious use of fonts : Choosing the right font can help you recognise homoglyphs better. Some fonts distinguish more clearly between similar-looking characters, making it easier for you to spot differences. Examples of such fonts include ‘Consolas’, ‘Courier New’, and ‘DejaVu Sans Mono’. These fonts are handy when characters need to be clearly distinguished from each other, for example, in source code or security-related information.
Provide additional contextual information : If users are asked to review sensitive information like URLs or usernames, it can be helpful to provide additional contextual information that helps them verify legitimacy. This includes warnings or visual markers that point out characters that may be potentially dangerous.
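A visual marker can be as simple as replacing every character outside printable ASCII with its code point before the value is shown for review. The following is a minimal sketch of that idea. The class name and the marker format are assumptions made for this example.

```java
class SuspiciousCharacterHighlighter {

    // Replaces every code point outside printable ASCII with a marker such as [U+0430].
    static String highlight(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp <= 0x7E) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("[U+%04X]", cp));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The second letter is a Cyrillic lookalike of the Latin "a".
        System.out.println(highlight("p\u0430ypal.com"));
    }
}
```

For the input above, the method returns `p[U+0430]ypal.com`, which makes the foreign character impossible to overlook. In a user interface, the same information could be conveyed with colour or a tooltip instead.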
Case Study: Homoglyph Domain Attack#
A well-known case that illustrates the danger of homoglyphs is the fake “apple.com” domain demonstrated by a security researcher in 2017. The researcher registered a domain consisting of Cyrillic characters that several browsers at the time rendered as “apple.com”. Anyone following a link to this domain saw a familiar-looking address while actually being on a different site. In the hands of an attacker, such a domain combined with a copy of the genuine Apple website could be used to steal sensitive information, such as login credentials.
Such attacks highlight the importance of paying attention to the visual distinction of characters when presenting information. Without inspecting the underlying code points or the Punycode form of the domain, you have practically no way of recognising the fake domain by eye. This is why technical and human defences need to work together to protect you from deception.
Homoglyph recognition tools#
There are some tools and libraries designed explicitly for homoglyph recognition. These tools can help you, as a developer or security professional, identify and remediate potential vulnerabilities in your systems. The most well-known tools include:
dnstwist : A tool that generates domain names similar to a given domain and checks whether they have been registered, for example for phishing attacks. dnstwist can help you detect potential threats early and take countermeasures.
Homoglyph Detector Libraries : Various program libraries recognise homoglyphs in strings. These libraries can be integrated into applications to ensure suspicious characters are detected. Such libraries are particularly useful when users provide security-relevant input.
Unicode analysis tools : These tools can help check characters for their Unicode representation, ensuring that similar characters are not used unnoticed. Unicode analysis tools can help you identify and fix problems early.
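To get a feeling for what domain-permutation tools do, here is a simplified sketch that generates lookalike variants of a name by replacing one character at a time. It is not the algorithm of any particular tool, and the lookalike table is a small, hypothetical sample. It uses `Map.of`, so it assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HomoglyphVariantGenerator {

    // Hypothetical sample table: each character of the value is a lookalike of the key.
    private static final Map<Character, String> LOOKALIKES = Map.of(
        'a', "\u0430",        // Cyrillic small a
        'e', "\u0435",        // Cyrillic small ie
        'o', "\u043E" + "0",  // Cyrillic small o, digit zero
        'l', "1I",            // digit one, uppercase i
        'p', "\u0440"         // Cyrillic small er
    );

    // Returns all variants that differ from the label in exactly one position.
    static List<String> variants(String label) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < label.length(); i++) {
            String replacements = LOOKALIKES.get(label.charAt(i));
            if (replacements == null) {
                continue;
            }
            for (char replacement : replacements.toCharArray()) {
                result.add(label.substring(0, i) + replacement + label.substring(i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        variants("paypal").forEach(System.out::println);
    }
}
```

The generated variants could then be checked against DNS or against existing usernames to see whether someone is already impersonating your brand.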
Which CWEs are often used in conjunction with CWE-1007?#
CWE-1007 is often used with other vulnerabilities that undermine the security and trustworthiness of applications. Here are some of the most common CWEs that frequently appear associated with CWE-1007:
CWE-601: Open Redirect:#
This vulnerability occurs when an application allows users to be redirected to a different, potentially dangerous URL without sufficient validation. CWE-1007 can be used here to deceive users by presenting a seemingly legitimate but homoglyph-manipulated URL redirecting to a malicious resource.
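A common countermeasure is to compare the host of a redirect target against an allowlist, using the ASCII form of the host so that lookalike characters cannot slip through a visual comparison. The sketch below assumes a hypothetical allowlist containing `example.com` and `www.example.com`, and it uses `Set.of`, which requires Java 9 or later.

```java
import java.net.IDN;
import java.net.URI;
import java.util.Locale;
import java.util.Set;

class RedirectTargetValidator {

    // Hypothetical allowlist; a real application would load this from configuration.
    private static final Set<String> ALLOWED_HOSTS = Set.of("example.com", "www.example.com");

    static boolean isAllowed(String targetUrl) {
        try {
            String host = URI.create(targetUrl).getHost();
            if (host == null) {
                return false; // no parsable host, for example a relative or unusual URL
            }
            // Compare the ASCII form so that lookalike hosts never match an allowlist entry.
            String asciiHost = IDN.toASCII(host).toLowerCase(Locale.ROOT);
            return ALLOWED_HOSTS.contains(asciiHost);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL or host name
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("https://example.com/login"));
        System.out.println(isAllowed("https://ex\u0430mple.com/login")); // Cyrillic lookalike
    }
}
```

The validator is intentionally strict: anything it cannot parse or does not recognise is rejected, including relative targets, which an application may want to handle separately.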
CWE-643: Improper Neutralization of Data within XPath Expressions:#
When user input is used in XPath queries without sufficient validation, CWE-1007 can allow attackers to use visually similar characters to manipulate the query and access data that would not usually be available.
CWE-79: Cross-Site Scripting (XSS):#
Cross-site scripting occurs when an application does not adequately escape or validate user input returned for display. CWE-1007 can also be dangerous here if homoglyphs are used to hide malicious scripts in a context that looks harmless to the user.
CWE-20: Improper Input Validation:#
This general vulnerability occurs when an application does not sufficiently validate user input. CWE-1007 is particularly problematic if no normalisation and validation measures are taken, as visually similar characters could pass as legitimate input.
CWE-77: Command Injection:#
Command injection occurs when a user’s input is used in a way that results in the execution of system commands. By using homoglyphs, attackers could manipulate input so that the actual command deviates from the expected execution, which could give an attacker control over the system.
Combining CWE-1007 with these other vulnerabilities allows attackers to carry out significantly more complex and effective attacks. They leverage the visual deception of homoglyphs to exploit vulnerabilities that might be easier to detect without such deception.
Example application in Vaadin Flow#
To illustrate all the best practices discussed in a real application, here is an example application in Vaadin Flow. Vaadin Flow is a popular open-source Java framework for building modern web applications. This sample application shows how you can implement Unicode normalisation, input validation, escaping, and encoding methods in a secure web application to prevent homoglyph-based attacks.
import com.vaadin.flow.component.button.Button;
import com.vaadin.flow.component.notification.Notification;
import com.vaadin.flow.component.orderedlayout.VerticalLayout;
import com.vaadin.flow.component.textfield.TextField;
import com.vaadin.flow.router.Route;
import org.apache.commons.text.StringEscapeUtils;

import java.text.Normalizer;

@Route("homoglyph-protection")
public class HomoglyphProtectionView extends VerticalLayout {

    public HomoglyphProtectionView() {
        // Username input field
        TextField userInputField = new TextField("Enter username");
        userInputField.setHelperText("Only Latin letters and numbers allowed.");

        // Validation button
        Button validateButton = new Button("Validate", event -> {
            String userInput = userInputField.getValue();

            // Step 1: Unicode normalization
            String normalizedInput = Normalizer.normalize(userInput, Normalizer.Form.NFKC);

            // Step 2: Input validation - only Latin letters and numbers
            if (!isValidInput(normalizedInput)) {
                Notification.show("Username contains invalid characters.");
                return;
            }

            // Step 3: Escape potentially harmful input
            String escapedInput = StringEscapeUtils.escapeHtml4(normalizedInput);

            // Notification if input is valid
            Notification.show("Input is valid: " + escapedInput);
        });

        add(userInputField, validateButton);
    }

    // Input validation method that only allows Latin letters and numbers
    private boolean isValidInput(String input) {
        return input.matches("^[a-zA-Z0-9]*$");
    }
}

In this Vaadin Flow application, what you have learned before is put into practice:
Unicode normalisation : User input is normalised to NFKC so that different representations of the same character are reduced to one standard form before validation.
Input validation : The input is checked to allow only Latin letters and numbers. This prevents the use of homoglyphs from other character sets.
Escape and encode : The input is escaped before further processing so that characters with a special meaning in HTML are neutralised, which helps prevent attacks such as Cross-Site Scripting (XSS).
This sample application shows how to apply the secure user input best practices discussed previously in a real-world web application to minimise homoglyph risks.
Conclusion#
CWE-1007, the insufficient visual distinction of homoglyphs, is a weakness that is becoming increasingly important in today’s digital world. Attackers use the visual similarity of characters to deceive you, steal your data, or compromise systems. Homoglyph attacks can have serious consequences, especially if you are not adequately trained and your systems are not prepared.
Preventing such attacks requires technical solutions, user training, and developer best practices. By implementing appropriate defence strategies, you can minimise the risk of homoglyph attacks and improve the security of your systems and data. Everyone, from developers to end users, must be aware of the risks and take steps to mitigate them.
Next Steps#
To better understand and protect yourself from CWE-1007, you should:
- Train your employees about the danger of homoglyphs and teach them to recognise suspicious characters.
- Enable security features in your browsers and applications to warn users about fake domains.
- Integrate homoglyph recognition tools and libraries into your software development processes.
- Conduct regular security scans and code reviews to ensure no malicious characters have been inserted into your systems.
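As a starting point for the last item, the following sketch scans source lines and reports the position of every non-ASCII character. It assumes that your code base is expected to be plain ASCII. The class name and the report format are hypothetical, and `List.of` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SourceLineScanner {

    // Returns one finding per non-ASCII code point, e.g. "line 2, col 8: U+0430".
    static List<String> scan(List<String> lines) {
        List<String> findings = new ArrayList<>();
        for (int lineNo = 0; lineNo < lines.size(); lineNo++) {
            String line = lines.get(lineNo);
            for (int i = 0; i < line.length(); ) {
                int cp = line.codePointAt(i);
                if (cp > 0x7F) {
                    findings.add(String.format(Locale.ROOT,
                        "line %d, col %d: U+%04X", lineNo + 1, i + 1, cp));
                }
                i += Character.charCount(cp);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "int total = 0;",
            "int tot\u0430l = 1;" // looks like the same identifier, but is a different one
        );
        scan(source).forEach(System.out::println);
    }
}
```

A check like this could run as part of a build or a pre-commit hook. Projects with legitimate non-ASCII text in comments or string literals would need an exception list to keep the noise down.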
Security is an ongoing process, and the homoglyph threat requires continued vigilance and adaptation. Stay informed, train your team, and use the right tools to ensure homoglyphs aren’t a gap in your security strategy. Together, we can help improve the security of our digital world and successfully ward off homoglyph threats.
Happy Coding
Sven





