---
title: "Detecting Lookalike Domains: Typosquatting & Homoglyphs"
slug: "/resources/blog/how-are-typosquatting-and-homoglyph-attacks-detected"
description: "Three lookalike domain attack types, one detection framework. Catch typosquatting, homoglyphs, and combo-squatting using WHOIS and DNS signals."
---

# How Are Typosquatting and Homoglyph Attacks Detected?

Written by [Nadeem Khan](https://www.linkedin.com/in/nadeem-khan-75a069197), WhoisFreaks Team. Published: June 09, 2026. Last updated: June 10, 2026.

Security teams monitoring lookalike domain abuse face three distinct attack patterns. Typosquatting registers domains that look like yours after a typing error. Homoglyph attacks register domains that look identical to yours even when read carefully. Combo-squatting appends brand-trust keywords to a correctly spelled brand name to build convincing phishing infrastructure. Each works differently, and each requires a different detection signal to catch reliably at scale.

The challenge is not recognizing these attacks in principle. Most practitioners already know what typosquatting is. The challenge is catching them before they are weaponized, across 1,500+ TLDs, against a stream of hundreds of thousands of new domain registrations per day, with tooling that was mostly designed to detect only one of the three.

This guide covers how each attack type works, what detection signals each one produces, and how to build a pipeline that catches all three before a phishing campaign goes live.

**TL;DR**

Typosquatting exploits character-level errors (microsft.com). Homoglyph attacks exploit Unicode substitution (micrоsoft.com with a Cyrillic "о") to create domains that are visually identical to the real thing. Combo-squatting appends high-trust keywords to brand names (microsoft-login-secure.com) to bypass filters trained on misspelling patterns. Effective detection requires three parallel methods: string similarity scoring, Unicode normalization, and keyword enumeration. All three are fed by a daily stream of newly registered domains with WHOIS and DNS enrichment.

## The Three Attack Vectors

### Typosquatting: Exploiting Typing Errors

Typosquatting is the oldest of the three. The attacker registers a domain that is one or two characters different from a known brand name, targeting users who type URLs directly or skim a link without verifying it character by character. Common patterns include character omission (microsft.com), transposition (mircosoft.com), adjacent keyboard key replacement (micrpsoft.com), character substitution (micros0ft.com with a zero instead of an "o"), and TLD swapping (microsoft.co instead of microsoft.com).
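These patterns can be enumerated mechanically, which is useful for pre-computing a watchlist of likely variants. The sketch below is a minimal illustration, not a complete generator: the function name `typo_variants` and the digit-lookalike table are assumptions made for this example, and adjacent-key replacement is left out because it needs a keyboard layout table.

```python
# Hypothetical helper: enumerate simple typosquatting variants of a brand label.
# The lookalike table is a small illustrative subset, not an exhaustive list.
LOOKALIKE_DIGITS = {"o": "0", "l": "1", "i": "1", "e": "3"}

def typo_variants(label: str) -> set[str]:
    variants = set()
    for i in range(len(label)):
        # Character omission: drop the character at position i.
        variants.add(label[:i] + label[i + 1:])
        # Transposition: swap the characters at positions i and i + 1.
        if i + 1 < len(label):
            variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
        # Substitution: replace a letter with a lookalike digit.
        if label[i] in LOOKALIKE_DIGITS:
            variants.add(label[:i] + LOOKALIKE_DIGITS[label[i]] + label[i + 1:])
    # Swapping two identical neighbours reproduces the input, so remove it.
    variants.discard(label)
    return variants

variants = typo_variants("microsoft")
print(len(variants), sorted(variants)[:5])
```

Each generated label still has to be combined with the TLDs you monitor before it can be checked against registration data.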

CISA identified typosquatting as a documented, recurring threat vector in its guidance on domain hijacking and spoofing, noting its use in phishing campaigns, credential theft, and traffic diversion. Typosquatting works because human reading is predictive: familiar words are processed in chunks rather than letter by letter, so a missing or transposed character often goes unnoticed at browsing speed.

Detection is tractable. String edit distance algorithms (Levenshtein distance in particular) can identify typosquatting candidates reliably because the domain name itself encodes the attack. A domain with an edit distance of 1 or 2 from a known brand target is a candidate for triage. The problem is volume, not complexity.

### Homoglyph Attacks: Exploiting Visual Similarity

The mechanics here are fundamentally different. In a homoglyph attack (also called an IDN homograph attack), the attacker replaces one or more characters in a domain name with visually identical Unicode characters from a different alphabet. The Cyrillic letter "о" (U+043E) and the Latin letter "o" (U+006F) are indistinguishable in most browser fonts. The Greek lowercase omicron "ο" (U+03BF) produces the same result. Substituting one for the other creates a domain that passes visual inspection every time, regardless of how careful the user is.

The [WHOIS protocol defined in RFC 3912](https://www.rfc-editor.org/rfc/rfc3912) operates on the underlying ACE-encoded (Punycode) form of internationalized domain names. While a browser may display "xn--micrsoft-qbh.com" as "micrоsoft.com," the registration record uses the Punycode form. Detection logic must account for both forms. Comparing the Punycode string directly against the brand name yields a large edit distance, so the domain slips past any typosquatting threshold. Comparing the decoded Unicode string scores the Cyrillic "о" as an ordinary one-character substitution, no different from a typo, even though the two strings look identical on screen. Decoding the Punycode and then applying Unicode normalization and confusable mapping surfaces the substitution immediately.
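A minimal sketch of moving between the two forms, using Python's built-in `punycode` codec and the `unicodedata` module. The helper names `decode_label` and `scripts_used` are illustrative, and reading the script from the first word of each Unicode character name is a rough approximation rather than a standards-based script lookup.

```python
import unicodedata

def decode_label(label: str) -> str:
    """Return the Unicode form of a label, decoding Punycode if it has the xn-- prefix."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label

def scripts_used(label: str) -> set[str]:
    """Approximate the scripts in a label from the first word of each character name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

# Build a spoofed label with a Cyrillic "о" (U+043E) and convert it to ACE form.
spoof = "micr\u043esoft"
ace = "xn--" + spoof.encode("punycode").decode("ascii")

decoded = decode_label(ace)
print(ace)                    # the form a registration record would carry
print(decoded == spoof)       # decoding restores the display form
print(scripts_used(decoded))  # more than one script in a single label is a red flag
```

A label that mixes Latin with Cyrillic or Greek is not automatically malicious, but for a brand that only uses Latin characters it is a strong reason to run the confusable check.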

Major browsers including Chrome and Firefox display mixed-script IDN domains in their Punycode form in the address bar, a defense that Chrome tightened in 2017 specifically to counter homoglyph attacks. Email clients, however, typically render the display form. That makes email the primary delivery channel for homoglyph attacks, and the one where standard security tooling is least equipped to catch them.

### Combo-Squatting: Exploiting Brand Trust

Combo-squatting appends or prepends high-trust keywords to a legitimate brand name: microsoft-login.com, microsoft-account-verify.com, secure-paypal-billing.com. The brand name is spelled correctly. The edit distance from the real brand domain is large. Standard typosquatting detection misses it entirely, because the attack does not rely on character-level error. It relies on the attacker appearing to be a legitimate extension of the brand.

This pattern now dominates phishing campaigns targeting corporate credentials and financial accounts. A single attacker can generate hundreds of combo-squatting candidates in minutes by crossing a brand list against a keyword list: login, secure, account, billing, support, helpdesk, update, verify, portal, access, signin. Registration cost is low. Each domain looks independently plausible in email link text. And because each one is a new domain with no prior reputation, it passes through email gateway filters that rely on historical signals.

Detection requires a set-membership approach rather than a distance calculation. The question is not "how different is this from the real domain?" but "does this domain contain a brand term and a trust keyword together?"

## Why Detection Gets Hard at Scale

Volume is the first obstacle. Domain registrars process hundreds of thousands of new registrations daily across 1,500+ TLDs. A single organized campaign may register dozens of variants simultaneously: typosquatting variants, homoglyph versions of the same brand, and a set of combo-squatting domains, all in the same registration window. Each one is a low-cost, low-effort asset for the attacker. For the defender, each one requires detection, triage, and a response decision.

The timing problem compounds this. The window between a lookalike domain registration and its first use in a phishing campaign is often under 48 hours. Threat intelligence platforms that curate and distribute malicious domain indicators typically take 48 to 96 hours after a domain is weaponized to include it in a feed, a lag WhoisFreaks observes consistently across its NRD dataset. A brand protection team relying on reputation-based blocking will encounter the attack after it has already run, not before.

The third obstacle is tool fragmentation. Most available detection tools are built around one attack type. A tool that computes edit distance catches typosquatting. One that checks Unicode character tables catches homoglyphs. Neither catches combo-squatting, which requires a keyword-pattern match rather than a distance calculation. Attackers use all three techniques in parallel, sometimes in the same campaign against the same brand, which means a gap in any one detection method is an open lane.

## Detection Signal 1: WHOIS Registration Patterns

WHOIS data is available at the moment of registration. Before any DNS records are configured, before any web content is deployed, before any phishing email is sent, the registration record exists. For lookalike domain detection, WHOIS is the earliest available signal layer.

### Burst Registration and Volume Patterns

Legitimate organizations typically register one or two domains at a time. Lookalike domain campaigns register in bursts: dozens of brand-matching variants, often in the same TLD block, on the same day, through the same registrar, pointing to the same nameservers. A single new registration for "microsft.com" may be a defensive registration by the brand owner. Thirty domains matching the pattern "micros[*]ft.com" registered within a 12-hour window, all through the same registrar and behind the same privacy service, are a campaign.

WHOIS data surfaces both the isolated registration and the burst. Creation timestamp, registrar IANA ID, and nameserver assignment all appear in the registration record. Grouping newly registered domains by shared attributes (same registrar, same nameserver host, same registration window) identifies campaign infrastructure rather than isolated registrations. The signal is not the individual domain. It is the cluster.
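The grouping step can be sketched in a few lines. This is an illustration under stated assumptions: the records are made up, the field names mirror the WHOIS response fields used in this article, and `cluster_registrations` with its `min_size` cutoff is a hypothetical helper rather than part of any API.

```python
from collections import defaultdict

# Illustrative records only; registrar and nameserver values are invented.
SHARED_NS = ["ns1.example-host.net", "ns2.example-host.net"]
records = [
    {"domain_name": "microsft.com", "create_date": "2026-05-06",
     "registrar_name": "Example Registrar", "name_servers": SHARED_NS},
    {"domain_name": "micros0ft.com", "create_date": "2026-05-06",
     "registrar_name": "Example Registrar", "name_servers": SHARED_NS},
    {"domain_name": "micrsoft.com", "create_date": "2026-05-06",
     "registrar_name": "Example Registrar", "name_servers": SHARED_NS},
    {"domain_name": "unrelated-shop.com", "create_date": "2026-05-06",
     "registrar_name": "Other Registrar", "name_servers": ["ns1.other-dns.org"]},
]

def cluster_registrations(records, min_size=3):
    """Group domains by (registrar, creation date, nameserver set); keep large groups."""
    clusters = defaultdict(list)
    for rec in records:
        key = (rec["registrar_name"], rec["create_date"],
               tuple(sorted(rec["name_servers"])))
        clusters[key].append(rec["domain_name"])
    return {key: names for key, names in clusters.items() if len(names) >= min_size}

campaigns = cluster_registrations(records)
for (registrar, created, _), names in campaigns.items():
    print(registrar, created, names)
```

Grouping on the calendar date is a simplification; a rolling 12-hour or 24-hour window over the creation timestamp would catch bursts that straddle midnight.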

A live WHOIS lookup on a newly flagged candidate returns the registration date, registrar, and nameserver assignment in one call:

```
curl "https://api.whoisfreaks.com/v2.0/whois/live?apiKey=YOUR_API_KEY&domainName=domain.tld"
```

```
{
  "domain_name": "domain.tld",
  "create_date": "2026-05-06",
  "registrar_name": "Namecheap, Inc.",
  "name_servers": ["ns1.malicious.com", "ns2.malicious.com"],
  "domain_status": "clientTransferProhibited",
  "registrant_company": "Privacy service provided by Withheld for Privacy"
}
```

Three signals in one response: registered yesterday, nameserver on a known high-abuse host, registrant behind a privacy proxy. That combination (no DNS or content signals required yet) places this domain in the triage queue immediately.

### Privacy Shield as a Screening Signal

Privacy protection is standard for individual registrants, and most registrars now default to it for new registrations. The signal is not its presence but its combination with other factors. A domain like "paypal-billing-secure.com" registered today behind a privacy proxy is not proof of malicious intent on its own. But across a set of candidates from the same daily NRD feed, privacy-protected registration combined with a brand-keyword match and a registration age under 30 days produces a useful composite risk signal.

No single factor is sufficient. Together, the three narrow a large candidate list down to a population worth analyst time.
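One way to express that composite check, as a sketch. The brand list, the privacy-marker substrings, and the function names are assumptions for illustration; real privacy-service names vary by registrar and would need a maintained list.

```python
from datetime import date

BRANDS = {"paypal", "microsoft"}                       # assumed monitored brands
PRIVACY_MARKERS = ("privacy", "withheld", "redacted")  # assumed marker substrings

def composite_flags(domain, create_date, registrant_company, today):
    """Evaluate the three screening factors for one registration."""
    label = domain.lower().split(".")[0]
    age_days = (today - date.fromisoformat(create_date)).days
    return {
        "brand_match": any(brand in label for brand in BRANDS),
        "privacy_protected": any(m in registrant_company.lower() for m in PRIVACY_MARKERS),
        "under_30_days": age_days < 30,
    }

def worth_triage(flags):
    # No single factor is sufficient, so require all three together.
    return all(flags.values())

flags = composite_flags(
    "paypal-billing-secure.com", "2026-05-06",
    "Privacy service provided by Withheld for Privacy", today=date(2026, 5, 7),
)
print(flags, worth_triage(flags))
```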

### Registrant Infrastructure Pivoting

When a lookalike domain is confirmed malicious, its WHOIS record often connects to a broader cluster. Attackers who register domain portfolios for phishing campaigns frequently reuse registrant email addresses, nameserver providers, and registrar accounts. A [reverse WHOIS lookup](https://whoisfreaks.com/tools/whois/reverse/search) on the registrant email or nameserver from one confirmed lookalike domain surfaces every other domain registered with the same infrastructure, often exposing an entire campaign from a single pivot point.

[Historical WHOIS data](https://whoisfreaks.com/tools/whois/history/lookup) extends this backward in time. An email address or nameserver used across a prior phishing campaign will appear in historical records even if the current registrations have switched to privacy protection, because the registration details from earlier domains are preserved. This makes historical WHOIS particularly useful for connecting new lookalike registrations to known threat actor infrastructure.

When a confirmed lookalike domain exposes a registrant email, a single reverse WHOIS call maps the full campaign scope:

```
curl "https://api.whoisfreaks.com/v2.0/whois?apiKey=YOUR_API_KEY&whois=reverse&email=domains@protonmail.com"
```

```
{
  "total_results": 47,
  "domains": [
    { "domain_name": "microsft-login.com",    "create_date": "2026-05-06" },
    { "domain_name": "paypa1-secure.com",     "create_date": "2026-05-04" },
    { "domain_name": "wellsfarg0-verify.com", "create_date": "2026-05-01" },
    { "domain_name": "amazon-account-id.com", "create_date": "2026-04-29" }
  ]
}
```

One pivot on a single registrant email surfaces 47 domains across multiple brand targets, all registered inside a two-week window. That is a campaign, not a coincidence.

## Detection Signal 2: String Similarity Analysis

WHOIS data describes the registrant. String analysis describes how close the domain name is to a brand target. The two signals work in sequence: WHOIS narrows the registrant risk pool; string analysis narrows the name pattern pool.

### Levenshtein Distance for Typosquatting

Levenshtein distance counts the minimum number of single-character edits (insertions, deletions, substitutions) needed to transform one string into another. For typosquatting detection, compute the distance between the second-level domain of each candidate and every string in your brand list. A distance of 1 catches single-character variants (microsft, micosoft). A distance of 2 catches two-character variants, adjacent transpositions such as mircosoft (plain Levenshtein counts a swap as two substitutions, while the Damerau-Levenshtein variant counts it as one edit), and some TLD-swap patterns when the TLD is included in the comparison string.

Threshold selection depends on brand name length. Short brand names (three to four characters) produce high false-positive rates at distance 2; distance 1 is a better threshold there. Longer brand names (eight or more characters) can use distance 2 without generating unmanageable false-positive volume. The output of this step is a candidate list, not a confirmed threat list.
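A compact sketch of the distance calculation with a length-aware threshold. The function names are illustrative, and treating every brand shorter than eight characters as "short" is an assumption made here to cover the lengths the guidance above leaves open.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def threshold_for(brand: str) -> int:
    # Assumption: distance 2 only for brands of eight or more characters.
    return 2 if len(brand) >= 8 else 1

def is_typo_candidate(label: str, brand: str) -> bool:
    # Distance 0 means the exact brand label, for example on a different TLD.
    return levenshtein(label, brand) <= threshold_for(brand)

for label in ("microsft", "mircosoft", "micrpsoft", "contoso"):
    print(label, levenshtein(label, "microsoft"), is_typo_candidate(label, "microsoft"))
```

This is quadratic in label length per comparison, which is acceptable for short labels but worth batching or pre-filtering by length when the brand list is large.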

### Unicode Normalization for Homoglyphs

Before any string similarity check, normalize the domain name. Decode any Punycode (xn--) label to its Unicode form, convert all characters to their Unicode NFC form, then apply the Unicode Consortium's confusables mapping to replace known visually similar characters with their canonical equivalents, which for characters used to spoof Latin brand names are plain Latin letters. After normalization, the homoglyph domain "micrоsoft.com" (with a Cyrillic "о") becomes "microsoft.com" and the Levenshtein distance drops to zero, flagging it as an exact brand match rather than a miss.

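Without this normalization step, a homoglyph domain passes through string similarity checks unrecognized: in Punycode form it looks nothing like the brand, and in decoded form it is scored as an ordinary near-miss rather than a visual clone. The raw Unicode codepoints differ; the visual representations are identical. The normalization step is the only thing that closes that gap.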
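The normalization step might look like the sketch below. The `CONFUSABLES` table is a tiny hand-written subset chosen for this example; a real deployment would load the Unicode Consortium's full confusables data instead. The name `normalize_label` is likewise illustrative.

```python
import unicodedata

# Tiny illustrative subset of confusable characters mapped to Latin letters.
CONFUSABLES = {
    "\u043e": "o",  # Cyrillic small letter o
    "\u03bf": "o",  # Greek small letter omicron
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u0441": "c",  # Cyrillic small letter es
    "\u0440": "p",  # Cyrillic small letter er
}

def normalize_label(label: str) -> str:
    """Decode Punycode, apply NFC, lowercase, then fold known confusables."""
    if label.lower().startswith("xn--"):
        label = label[4:].encode("ascii").decode("punycode")
    label = unicodedata.normalize("NFC", label).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in label)

spoof = "micr\u043esoft"  # Cyrillic "о" in place of the first Latin "o"
print(spoof == "microsoft")                   # False: the raw codepoints differ
print(normalize_label(spoof) == "microsoft")  # True: the substitution is folded away
```

Running the brand comparison on the output of `normalize_label` rather than the raw label is what turns a visual clone into a distance-0 match.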

### Keyword Enumeration for Combo-Squatting

Combo-squatting detection does not use edit distance. It uses set membership. Maintain two lists: a brand list (monitored brand names and product names) and a keyword list (high-risk terms commonly used in phishing domains: login, secure, account, billing, support, helpdesk, update, verify, portal, access). For each newly registered domain, check whether the second-level domain string contains an entry from the brand list and an entry from the keyword list.
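A minimal sketch of the set-membership check. The brand list is an assumption, the keyword list is the one given above, and `combo_squat_match` is a hypothetical helper. It expects a registrable domain (as delivered in a newly registered domains feed) and uses plain substring matching, which will produce some false positives such as "access" inside "accessories".

```python
BRANDS = {"microsoft", "paypal"}  # assumed monitored brands
KEYWORDS = {"login", "secure", "account", "billing", "support",
            "helpdesk", "update", "verify", "portal", "access"}

def combo_squat_match(domain: str):
    """Return (brands, keywords) found in the first label, or None if either is absent."""
    label = domain.lower().split(".")[0]
    brands = sorted(b for b in BRANDS if b in label)
    keywords = sorted(k for k in KEYWORDS if k in label)
    if brands and keywords:
        return brands, keywords
    return None

for domain in ("microsoft-login-secure.com", "secure-paypal-billing.com", "microsoft.com"):
    print(domain, combo_squat_match(domain))
```

Running the label through confusable normalization first lets the same check catch combo-squatting domains that also use homoglyphs.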

For an inventory of what is already registered, the [Domain Discovery Tool](https://whoisfreaks.com/tools/domain/taken) searches all registered domains containing a specific keyword. Querying your brand name returns every domain currently incorporating that term across all TLDs, including combo-squatting registrations that automated filters may not yet have flagged. That gives a threat surface baseline before monitoring goes live.

The Domain Discovery API runs that same scan programmatically:

```
curl "https://api.whoisfreaks.com/v2.0/domain/taken?apiKey=YOUR_API_KEY&keyword=microsoft&pageSize=5"
```

```
{
  "total_results": 3842,
  "domains": [
    { "domain_name": "microsoft-login-secure.com",  "create_date": "2026-05-05", "status": "false" },
    { "domain_name": "microsoft-account-verify.net", "create_date": "2026-05-03", "status": "false" },
    { "domain_name": "microsoftsupport-helpdesk.com","create_date": "2026-04-30", "status": "false" },
    { "domain_name": "microsoft-billing-update.org", "create_date": "2026-04-28", "status": "false" },
    { "domain_name": "secure-microsoft-portal.com",  "create_date": "2026-04-25", "status": "false" },
    ...
  ]
}
```

3,842 registered domains contain "microsoft", and the five shown here all follow combo-squatting patterns. The response gives the threat surface baseline before a single monitoring rule is configured.

## Detection Signal 3: DNS and Infrastructure Signals

String similarity and WHOIS analysis both work at registration time. DNS signals arrive later, but they carry direct evidence of operational intent. The registration-time checks and the DNS checks therefore run in sequence rather than in parallel.

### MX Records as a Phishing Intent Indicator

An MX record on a lookalike domain means it is configured to receive email, and it usually accompanies the setup needed to send it. Most defensive registrations (brand owners registering typosquatting variants to block abuse) do not configure MX records. Phishing campaigns almost always do. A lookalike domain that scored a Levenshtein distance of 1 and then publishes an MX record within 24 hours of registration should be treated as high priority, regardless of what the registrant data shows.

Checking MX records across a candidate list requires one DNS query per domain. At scale, this runs as a batch job against the filtered output of the string similarity and WHOIS steps.
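The batch check can be sketched with the lookup function injected, so the filtering logic is testable offline. `default_mx_lookup` assumes the third-party dnspython package is installed; `domains_with_mx` and the stub data are hypothetical names for this example.

```python
def default_mx_lookup(domain: str) -> list[str]:
    """Return MX hostnames via dnspython (third-party; assumed to be installed)."""
    import dns.exception
    import dns.resolver  # imported lazily so the rest of the sketch runs without it
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        return []
    return [str(record.exchange) for record in answers]

def domains_with_mx(domains, lookup=default_mx_lookup):
    """One lookup per domain; keep only candidates that publish MX records."""
    return {d: hosts for d in domains if (hosts := lookup(d))}

# Offline demonstration with a stub lookup instead of live DNS.
stub_data = {"microsft-login.com": ["mx1.example.net."]}
flagged = domains_with_mx(
    ["microsft-login.com", "parked-variant.com"],
    lookup=lambda d: stub_data.get(d, []),
)
print(flagged)
```

In production the same function would be called without the `lookup` argument, ideally with rate limiting so a large candidate list does not overwhelm the resolver.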

### Shared Infrastructure Clustering

Lookalike domain campaigns rarely consist of one domain. When a candidate in your list resolves to an IP address shared with other recently registered brand-matching domains, or uses a nameserver cluster already associated with confirmed phishing infrastructure, the scope shifts from one suspicious registration to a campaign in progress.

Grouping candidates by shared A record, shared nameserver, or shared registrar account produces clusters. Prioritizing the cluster with the most domains, the most brand targets, and active MX records allocates analyst time where it will have the most impact. An individual domain may warrant monitoring. A cluster of 40 related registrations, several with MX records already live, warrants immediate action.
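The prioritization described above can be sketched as a sort over clusters. The candidate records, their field names, and `rank_clusters` are assumptions for illustration; the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

# Illustrative candidates that have already passed the filter steps.
candidates = [
    {"domain": "microsft-login.com", "brand": "microsoft", "ip": "203.0.113.10", "has_mx": True},
    {"domain": "paypa1-secure.com", "brand": "paypal", "ip": "203.0.113.10", "has_mx": True},
    {"domain": "micros0ft-verify.com", "brand": "microsoft", "ip": "203.0.113.10", "has_mx": False},
    {"domain": "paypal-helpdesk.net", "brand": "paypal", "ip": "198.51.100.7", "has_mx": False},
]

def rank_clusters(candidates, key="ip"):
    """Group by a shared attribute, then rank by size, brand spread, and live MX count."""
    groups = defaultdict(list)
    for cand in candidates:
        groups[cand[key]].append(cand)

    def priority(members):
        return (len(members),
                len({m["brand"] for m in members}),
                sum(m["has_mx"] for m in members))

    return sorted(groups.items(), key=lambda item: priority(item[1]), reverse=True)

ranked = rank_clusters(candidates)
top_ip, top_members = ranked[0]
print(top_ip, [m["domain"] for m in top_members])
```

Passing a different `key`, such as a nameserver field, reuses the same ranking for other kinds of shared infrastructure.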

## Building a Detection Pipeline at Scale

### Step 1: The Feed Layer

The pipeline starts with coverage. You cannot score a lookalike domain that never enters your detection system. A complete feed of newly registered domains across all monitored TLDs, delivered daily with WHOIS and DNS enrichment, is the input layer that everything else depends on.

The [WhoisFreaks Newly Registered Domains feed](https://whoisfreaks.com/products/newly-registered-domains) delivers daily enriched domain files across 1,528+ TLDs, with standard WHOIS records, cleaned WHOIS records (privacy-protected domains flagged or separated), and DNS records included. Security teams consume these via API or daily CSV download for integration into SIEM and SOAR pipelines. Because the feed is built from zone file analysis, direct registry integrations, and crawling rather than waiting on third-party aggregators, it surfaces new registrations within hours of when they occur, reducing the detection gap to well inside the 48-hour attack window.

Pulling the day's gTLD feed with WHOIS enrichment is a single API call:

```
curl "https://api.whoisfreaks.com/v3.1/domainer/gtld?apiKey=YOUR_API_KEY&whois=true&date=2026-05-07" \
  --output nrd-2026-05-07.csv
```

```
domain_name,create_date,registrar_name,registrant_company,name_server_1,...
microsft-login.com,2026-05-07,Namecheap Inc.,Withheld for Privacy,ns1.bullethosting.net,...
paypa1-secure.net,2026-05-07,GoDaddy.com LLC,Privacy service,ns1.domaincontrol.com,...
wellsfarg0.com,2026-05-07,Tucows Domains Inc.,Withheld for Privacy,ns1.bluehost.com,...
```

Each row in the output is a new domain registered that day, with the WHOIS fields your filter layer needs. The file feeds directly into the similarity scoring step without any intermediate processing.
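Reading that file into the filter layer might look like the sketch below. It uses only the columns visible in the sample above (the real header has more, elided there), and `iter_nrd_rows` is a hypothetical helper name.

```python
import csv
import io

# Sample rows copied from the feed excerpt, limited to the columns shown there.
SAMPLE = """domain_name,create_date,registrar_name,registrant_company,name_server_1
microsft-login.com,2026-05-07,Namecheap Inc.,Withheld for Privacy,ns1.bullethosting.net
paypa1-secure.net,2026-05-07,GoDaddy.com LLC,Privacy service,ns1.domaincontrol.com
wellsfarg0.com,2026-05-07,Tucows Domains Inc.,Withheld for Privacy,ns1.bluehost.com
"""

def iter_nrd_rows(fileobj):
    """Yield one dict per feed row with the fields the filters need."""
    for row in csv.DictReader(fileobj):
        domain = row["domain_name"].strip().lower()
        yield {
            "domain": domain,
            "label": domain.split(".")[0],  # string the similarity filters compare
            "created": row["create_date"],
            "registrar": row["registrar_name"],
            "registrant": row["registrant_company"],
            "nameserver": row["name_server_1"],
        }

rows = list(iter_nrd_rows(io.StringIO(SAMPLE)))
print([r["label"] for r in rows])
```

For the downloaded file, the same generator works on `open("nrd-2026-05-07.csv", newline="", encoding="utf-8")`, streaming rows rather than loading the whole feed into memory.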

### Step 2: The Filter Layer

Against every domain in the daily NRD file, run three parallel filters:

**Typosquatting filter:** Compute Levenshtein distance between the second-level domain and each entry in your brand list. Flag domains at distance ≤ 2, tightening the threshold to ≤ 1 for short brand names.

**Homoglyph filter:** Apply Unicode normalization and confusable mapping first. Then compute Levenshtein distance against the normalized string. Flag domains that match at distance 0 or 1 after normalization.

**Combo-squatting filter:** Check each domain name for the presence of any brand-list entry combined with any keyword-list entry within the second-level domain string. Flag matches regardless of edit distance.

What comes out is a small fraction of the day's NRD volume: domains worth scoring further in triage, not confirmed threats.

### Step 3: Triage and Risk Scoring

Score each candidate in the filtered list by signal weight:

| Signal | Risk Level |
| --- | --- |
| MX records present | High |
| Registered within past 7 days | High |
| Nameserver shared with confirmed malicious domain | Critical |
| Registrant email reused from prior malicious domain | Critical |
| Privacy-protected registration on brand-matching domain | Moderate |
| Registered through high-abuse registrar | Moderate |
| Domain parked, no DNS records | Low |

Routing is determined by signal combination, not individual flags:

*   Immediate analyst review: Any single Critical signal, or two or more High signals together.
*   Accelerated watchlist (48-hour review): One High signal without any Critical signal.
*   Standard watchlist (7 to 14 days): Moderate signals only, no High or Critical.
*   Passive watchlist: Low only (parked, no DNS records). Escalates automatically if MX records appear or the domain resolves.
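The table and routing rules above translate directly into a small function. The signal identifiers are names invented for this sketch, and sending a candidate with no signals at all to the passive watchlist is an assumption, since the rules do not cover that case.

```python
# Signal names are hypothetical identifiers for the rows of the table above.
SIGNAL_LEVELS = {
    "mx_present": "High",
    "registered_within_7_days": "High",
    "nameserver_shared_with_malicious": "Critical",
    "registrant_email_reused": "Critical",
    "privacy_protected_brand_match": "Moderate",
    "high_abuse_registrar": "Moderate",
    "parked_no_dns": "Low",
}

def route(signals):
    """Map a candidate's signals to a queue using the combination rules."""
    levels = [SIGNAL_LEVELS[s] for s in signals]
    high_count = levels.count("High")
    if "Critical" in levels or high_count >= 2:
        return "immediate analyst review"
    if high_count == 1:
        return "accelerated watchlist (48-hour review)"
    if "Moderate" in levels:
        return "standard watchlist (7 to 14 days)"
    # Low only, or no signals at all (assumption): keep watching passively.
    return "passive watchlist"

print(route(["mx_present", "registered_within_7_days"]))
print(route(["privacy_protected_brand_match"]))
print(route(["parked_no_dns"]))
```

Re-running `route` whenever a watched domain's DNS or WHOIS data changes gives the automatic escalation the passive watchlist calls for.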

### Step 4: The Response Layer

Three response paths exist once a domain clears triage.

Block submits the domain to a DNS firewall, email gateway, or IP blocklist for immediate filtering. This is the fastest path and the right default for any domain scoring Critical. It does not remove the domain, but it stops your users from reaching it.

Takedown pursues removal. For trademark infringement, a UDRP complaint filed with an ICANN-approved dispute resolution provider such as WIPO is the standard route. To qualify, the domain must be identical or confusingly similar to your mark, the registrant must have no rights or legitimate interests in it, and it must have been registered and used in bad faith. Combo-squatting domains with trust keywords appended to a registered trademark typically meet all three tests. Processing takes 45 to 60 days. For faster removal on actively phishing domains, a registrar abuse report through the registrar's ICANN-required abuse contact (listed in the WHOIS record as Registrar Abuse Contact Email) can result in suspension within 24 to 48 hours when evidence of active phishing is included.

Monitor places the domain in an active watchlist for ongoing WHOIS and DNS tracking. This is the right path when you lack trademark standing for a formal takedown but want visibility if the domain goes active. A parked domain today may have an MX record tomorrow.

[Brand monitoring through WhoisFreaks](https://whoisfreaks.com/products/brand-monitoring) automates the detection layer of this pipeline: keyword scanning across 1,528+ TLDs twice daily, with WHOIS-enriched alerts the moment a brand-matching registration appears, before the domain is configured for attack.

## Frequently Asked Questions

**Does a UDRP complaint apply to combo-squatting domains?**

Yes, if the domain contains a registered trademark and was registered and is being used in bad faith. Combo-squatting domains that append trust keywords to a trademark (microsoft-login.com, paypal-verify-account.com) typically satisfy all three UDRP criteria: confusing similarity to a mark, no legitimate registrant interest, and bad-faith registration and use evidenced by the phishing keyword pattern. Typosquatting variants are harder to win on: the misspelling must be close enough to cause consumer confusion, and the registrant may argue defensive or satirical intent. Homoglyph domains are the strongest UDRP candidates of the three: a visual clone of a trademark with no plausible legitimate purpose.

**How do you reduce false positives in Levenshtein-based typosquatting detection?**

Three adjustments move the needle most. First, normalize by brand name length: use a threshold of 1 for short names (three to four characters) and 2 for longer names (eight or more characters). A distance-2 threshold on a four-letter brand like "Visa" generates unworkable false-positive volume. Second, exclude known defensive registrations your own organization holds (these are legitimate edit-distance matches you do not need to triage). Third, apply a TLD allowlist. Filtering out registrations on low-risk TLDs (.edu, .gov, known country registries with strict registration requirements) reduces noise without materially reducing detection coverage. What remains after those three adjustments is a manageable candidate list that rewards analyst time rather than exhausting it.

The threat surface for lookalike domains grows with every new TLD launch and every attacker who discovers how cheap bulk registration is. The detection methods in this guide do not change that math. They change when your team finds out.

Set up [domain monitoring alerts](https://whoisfreaks.com/products/domain-monitoring) to track WHOIS and DNS changes on domains already in your watchlist, and pair them with the brand monitoring layer to cover both new registrations and changes to existing lookalike domains under observation.
