Composite DNS Queries: How Attackers Hide Malicious Domains in Trusted Services

Abstract

Many DNS-based services use composite queries: queries formed by embedding a referenced domain as a subdomain of the service’s own domain. For example, example-com.translate.goog refers to example.com as the source domain. Composite queries allow services to route users effectively on the Internet; however, they also present a security challenge by enabling malicious actors to bypass traditional DNS security controls.

Such services include content proxies (e.g., Google Translate), email security gateways (e.g., Microsoft Outlook Safe Links), content delivery networks, and DNS protocol–based services (such as antivirus lookups and security reputation systems). Malicious actors exploit these mechanisms by passing malicious content through well-known legitimate services, so that the embedded domain is not visited directly from the user’s device, reducing visibility for conventional threat detection systems.

We conducted a comprehensive analysis of services utilizing composite queries in our cloud customer traffic and summarized the purposes and risks they represent. We identified over 100 high-volume services in regular use. Their composite query activity accounts for about 7% of distinct domains, with many embedded domains not visible in direct traffic. Our analysis shows that over 100 known malicious or suspicious domains are embedded in Google Translate queries alone every day.

This blog explains how composite queries are constructed for different purposes and describes how we detect them and accurately extract potentially malicious domains.

Introduction

The Domain Name System (DNS) serves as the Internet’s address book, translating human-readable domain names like www.google.com into IP addresses that computers use to communicate. Network security controls trust this fundamental protocol, and attackers may attempt to exploit or bypass that trust.

Domain-embedding services represent a legitimate and widely used class of Internet infrastructure that encodes target domains within DNS query structures. These services create composite DNS queries where the target domain is embedded within the service domain structure, resulting in queries like:

example-com.translate.goog

subdomain.example.com.cdn-service.net

ZXhhbXBsZS5jb20.proxy-service.example

While domain-embedding services serve legitimate purposes—such as translation, email security, content delivery, and privacy protection—they can present security challenges when misused by malicious actors.
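As a rough illustration, the three example queries above can be decoded with a few lines of Python. This is a minimal sketch, not the detection system’s implementation: the function names are hypothetical, the hyphen rule (a dot becomes a single hyphen, a literal hyphen becomes a double hyphen) is an assumption about translate.goog-style encoding, and the third example is assumed to be unpadded Base64.

```python
import base64
import binascii
from typing import Optional


def strip_service(qname: str, service: str) -> str:
    """Return the part of qname to the left of the service domain."""
    qname = qname.rstrip(".")
    suffix = "." + service
    if not qname.lower().endswith(suffix):
        raise ValueError(f"{qname!r} is not under {service!r}")
    return qname[: -len(suffix)]  # case is preserved: Base64 is case-sensitive


def decode_hyphenated(label: str) -> str:
    """Assumed rule: '--' encodes a literal hyphen, a single '-' encodes a dot."""
    marker = "\x00"  # temporary stand-in that does not occur in hostnames
    return label.replace("--", marker).replace("-", ".").replace(marker, "-")


def decode_base64(label: str) -> Optional[str]:
    """Decode an unpadded Base64 label; return None if decoding fails."""
    padded = label + "=" * (-len(label) % 4)
    try:
        return base64.urlsafe_b64decode(padded).decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        return None


# Each line prints the embedded domain recovered from one example query.
print(decode_hyphenated(strip_service("example-com.translate.goog", "translate.goog")))
print(strip_service("subdomain.example.com.cdn-service.net", "cdn-service.net"))
print(decode_base64(strip_service("ZXhhbXBsZS5jb20.proxy-service.example",
                                  "proxy-service.example")))
```

Because DNS names are case-insensitive while Base64 is not, a resolver or log pipeline that lowercases query names would break the third decoder; the sketch assumes the original case is still available.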

The Security Blind Spot

Consider a malicious actor attempting to run a phishing campaign from malicious-domain.com. Under normal circumstances, this domain could be:

Blocked by protective DNS systems based on threat intelligence
Flagged by IP reputation systems when connections are established
Identified through SSL certificate analysis
Detected by network security monitoring tools

However, when the same malicious domain is accessed through a domain-embedding service as malicious--domain-com.translate.goog, the security landscape changes:

DNS firewalls observe queries to translate.goog (Google’s trusted domain) rather than malicious-domain.com
IP reputation systems see connections to Google’s infrastructure (highly trusted) rather than malicious hosting providers
SSL inspection encounters Google’s valid certificates rather than suspicious or self-signed certificates
Traditional threat detection fails because all observable indicators point to legitimate, trusted services

Figure 1. An example of a real-world phishing email containing a Google Translate link with an embedded malicious website.

This creates a security gap: the embedded domain can bypass traditional DNS-based security controls because the observable DNS traffic, IP addresses, and SSL certificates all belong to the legitimate service domain, not the actual embedded destination. These embedded domains reach client systems through various channels—shared links in emails, messaging applications, web pages, and other content—yet remain hidden from security controls focused on the observable service domain.

Addressing the Gap: A Paradigm Shift in DNS Security

The detection system shifts the security paradigm from “trust the observable DNS domain” to “extract and analyze the embedded destination.” Rather than accepting the service domain at face value, the system:

Identifies domain-embedding services through analysis of DNS traffic patterns
Extracts embedded domains using multiple decoding techniques
Validates extracted domains through statistical conformity analysis against legitimate domain baselines
Applies independent security analysis to embedded domains using threat intelligence, reputation feeds, and behavioral analysis
Enables granular policy enforcement where security decisions are based on embedded domain reputation and detected service specialization rather than service domain reputation

This paradigm transforms the security response from binary service-level decisions (“block all translate.goog” or “allow all translate.goog”) to context-aware, content-based policies (“allow translate.goog when embedding legitimate domains; block or alert when embedding known malicious domains”). Security response decisions—whether to block all queries to a service, block individual queries associated with known malicious embedded content, or allow traffic—should be made on a case-by-case basis to avoid disrupting critical security and functional services.
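A content-based policy of this kind might be sketched as follows. The category label, the two feed sets, and the decide function are hypothetical names chosen for illustration; the only rules taken from the text are that decisions depend on the embedded domain’s reputation and that some services should be monitored rather than blocked.

```python
from enum import Enum
from typing import Optional, Set


class Action(Enum):
    ALLOW = "allow"
    ALERT = "alert"
    BLOCK = "block"


# Hypothetical label for services that must stay reachable (for example
# reputation lookups): hits on these are alerted on, never blocked.
NEVER_BLOCK = {"metadata"}


def decide(category: str, embedded: Optional[str],
           malicious: Set[str], suspicious: Set[str]) -> Action:
    """Choose an action from the embedded domain, not the service domain."""
    if embedded is None:  # nothing could be extracted from the query
        return Action.ALLOW
    domain = embedded.lower().rstrip(".")
    if domain in malicious:
        return Action.ALERT if category in NEVER_BLOCK else Action.BLOCK
    if domain in suspicious:
        return Action.ALERT
    return Action.ALLOW


malicious_feed = {"malicious-domain.com"}
suspicious_feed = {"new-domain.example"}
print(decide("content_proxy", "example.com", malicious_feed, suspicious_feed))
print(decide("content_proxy", "malicious-domain.com", malicious_feed, suspicious_feed))
print(decide("metadata", "malicious-domain.com", malicious_feed, suspicious_feed))
```

The same service domain therefore yields different outcomes depending on what it embeds, which is the point of moving away from binary service-level decisions.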

Service Categories and Detection Characteristics

Service Categories

Observed generic domain-embedding services fall into several categories organized by abuse potential and security risk. Each category has distinct security implications while sharing common DNS traffic patterns that enable automated detection:

Category	Abuse Risk	Description
Content Proxy	High	Services that deliver the content of the embedded domain, either directly or in transformed form. This includes translation services, anonymizers, link wrappers, and web archives that fetch and serve content through their infrastructure. Enables malicious actors to hide domains within trusted infrastructure for phishing and malware delivery. Examples: Google Translate (translate.goog), Microsoft Outlook SafeLinks (protection.outlook.com), web archive services.
DNS Proxy	High	Services that provide DNS resolution access to embedded domains, returning DNS records for those domains via CNAME records or recursive forwarding. Enables DNS-based hiding of malicious domains. Examples: Public DNS resolvers with embedded query patterns, CNAME-based redirection services for click tracking and campaign management.
Metadata Services	Low	Services that return information about embedded domains (e.g., reputation scores, threat classifications, or status indicators) rather than delivering the domain’s content itself. These services provide critical security and operational intelligence and should not be blocked. Examples: SURBL (surbl.org), URIBL (uribl.com), antivirus lookup services, reputation systems, domain availability checkers.
Static Content	Medium	Domains with wildcard DNS records that return identical or non-specific content regardless of subdomain structure. While queries may appear to contain embedded domains, no actual domain-specific embedding occurs. Requires investigation to distinguish legitimate wildcard services from selective abuse where attackers use CNAME records for targeted domains. Examples: CDN wildcards, domain parking pages, anti-caching mechanisms with random prefixes.
Table 1: Categories of domain-embedding services organized by abuse risk and security implications

Common DNS Traffic Characteristics

Despite their diverse purposes and risk profiles, services across all categories share observable DNS traffic patterns that enable automated detection:

Subdomain Structure Patterns:

High subdomain diversity: Services generate hundreds to thousands of unique subdomains as they handle different embedded destinations
Structural consistency: Repeated patterns (prefixes, suffixes) appear across all queries to the same service
Encoding consistency: Each service uses a consistent encoding approach

DNS Query Characteristics:

DNS query characteristics vary between detected services. They are useful for attributing services to categories and for understanding the associated risks:

Query type distribution: The distribution of query types (A, AAAA, CNAME, TXT, etc.) provides insights into service function
Infrastructure patterns: Resolution targets and their stability over time
ASN and hosting analysis: Concentration or distribution of infrastructure across autonomous systems
Provider reputation: Assessment of hosting infrastructure providers

Volume and Temporal Patterns:

Sustained activity: Legitimate services show consistent query volumes over time
Multiple embedded destinations: True domain-embedding services handle diverse target domains, not just single-organization subdomains
Temporal stability: Service infrastructure and patterns remain stable across observation periods

Table 2 shows representative examples of direct DNS traffic compared to domain-embedding service traffic patterns across different risk categories:

Category	Example Service	Query Pattern	Pattern Description
Direct Traffic	N/A	google.com	Standard domain query
Direct Traffic	N/A	mail.example.com	Typical subdomain/hostname structure
Content Proxy (High Risk)	Google Translate	example-com.translate.goog	Embedded encoded domain
DNS Proxy (High Risk)	CDN Service	example.com.cdn-provider.net -> example.com	Direct subdomain embedding with CNAME
Metadata Services (Low Risk)	Antivirus Lookup	mfrggzdfmztwq2lk.av-service.net	Encoded domain for threat intelligence lookup
Table 2: DNS query patterns showing direct traffic vs. domain-embedding service traffic

Methodology: Multi-Stage Detection Pipeline

The detection system operates as a multi-stage pipeline, with each stage building upon the previous one, progressively refining and enriching detections of domain-embedding services.

Stage 1: Service Domain Detection

The first stage identifies service domain candidates by analyzing domains with abnormally high subdomain diversity. For each candidate, the system attempts to decode embedded domains using multiple decoding techniques and detects consistent structural patterns.
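The subdomain-diversity idea can be sketched in a few lines. This is an illustration under stated assumptions, not the production logic: the service domain is approximated as the last two labels of the query name (a real system would more likely consult the Public Suffix List), and the min_unique threshold is an arbitrary value picked for the toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def find_candidates(qnames: Iterable[str], min_unique: int = 3,
                    depth: int = 2) -> Dict[str, int]:
    """Map each candidate service domain to its count of distinct prefixes."""
    prefixes: Dict[str, Set[str]] = defaultdict(set)
    for qname in qnames:
        labels = qname.lower().rstrip(".").split(".")
        if len(labels) <= depth:
            continue  # no subdomain part to analyze
        service = ".".join(labels[-depth:])
        prefixes[service].add(".".join(labels[:-depth]))
    return {svc: len(p) for svc, p in prefixes.items() if len(p) >= min_unique}


sample = [
    "example-com.translate.goog",
    "example-org.translate.goog",
    "news-example-net.translate.goog",
    "www.example.com",
    "mail.example.com",
    "example.com",
]
print(find_candidates(sample))
```

On real traffic the threshold would be far higher, since the text describes services generating hundreds to thousands of unique subdomains.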

To validate decoded domains, the system builds statistical baselines from direct DNS traffic, capturing domain name properties. Decoded domains are scored against this baseline, allowing the system to filter out random strings and other non-domain components of composite queries.
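The text does not specify which domain name properties the baseline captures, so the following is only a toy stand-in for the idea. It assumes a baseline made of the top-level domains and character bigrams seen in direct traffic, and scores a decoded string by the fraction of its bigrams that the baseline has seen. The Baseline class and its methods are hypothetical.

```python
from typing import Iterable, List, Set


def bigrams(text: str) -> List[str]:
    """Return all overlapping two-character substrings of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class Baseline:
    """Toy baseline: TLDs and character bigrams observed in direct traffic."""

    def __init__(self, direct_domains: Iterable[str]) -> None:
        self.tlds: Set[str] = set()
        self.seen: Set[str] = set()
        for domain in direct_domains:
            head, _, tld = domain.lower().rstrip(".").rpartition(".")
            if head:
                self.tlds.add(tld)
                self.seen.update(bigrams(head))

    def score(self, candidate: str) -> float:
        """Fraction of candidate bigrams seen in the baseline (0.0 to 1.0)."""
        head, _, tld = candidate.lower().rstrip(".").rpartition(".")
        if not head or tld not in self.tlds:
            return 0.0  # not domain-shaped, or a TLD never seen directly
        grams = bigrams(head)
        if not grams:
            return 0.0
        return sum(g in self.seen for g in grams) / len(grams)


baseline = Baseline(["example.com", "sample.org", "maple.net"])
print(baseline.score("ample.com"))    # every bigram was seen in direct traffic
print(baseline.score("zq9x7.com"))    # random-looking string
print(baseline.score("example.xyz"))  # TLD absent from the baseline
```

A production baseline would need far more traffic and richer features; the sketch only shows how a score against direct-traffic statistics can separate plausible domains from random strings.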

Candidates are classified by confidence level based on the number of validated embedded domains, destination diversity, encoding consistency, and statistical conformity scores.

Stage 2: Enrichment and Validation

Detected service domains and embedded domains are enriched with external intelligence, including domain registration data, historical DNS observations, reputation feeds, and SSL certificates. External sources such as WHOIS and historical DNS records confirm that embedded domains are legitimate registered domains rather than random artifacts. Validation filters ensure temporal consistency, encoding consistency, destination diversity, and registration data coverage to reduce false positives.

Stage 3: Aggregation and Trend Analysis

The final stage aggregates validated detections across time periods to track service domain lifecycles, identify growing or declining services, and maintain historical context on domain-embedding service evolution.
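One simple way to label a service as growing or declining is to compare average daily activity in the first and second halves of an observation window. The sketch below assumes that approach; the classify_trend function and the 10% tolerance are illustrative choices, not details taken from the detection system.

```python
from statistics import mean
from typing import Sequence


def classify_trend(daily_counts: Sequence[int], tolerance: float = 0.10) -> str:
    """Compare mean activity in the first and second halves of the window."""
    half = len(daily_counts) // 2
    if half == 0:
        return "insufficient data"
    early = mean(daily_counts[:half])
    late = mean(daily_counts[-half:])
    if early == 0:
        return "growing" if late > 0 else "stable"
    change = (late - early) / early
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "declining"
    return "stable"


print(classify_trend([100, 110, 150, 170]))
print(classify_trend([200, 190, 100, 90]))
print(classify_trend([100, 100, 102, 101]))
```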

Results: Discovery and Analysis of DNS Traffic Blind Spots

The detection system reveals significant gaps in traditional DNS security monitoring and provides insights into domain-embedding service usage patterns.

Discovery of Previously Unobserved Traffic

Analysis of DNS traffic demonstrates the substantial blind spot created by composite queries:

Composite Query Volume: Queries to domain-embedding services account for a measurable fraction of total DNS query volume. This traffic typically bypasses traditional security analysis focused on direct domain queries.

Unique Domain Discovery: Activity associated with embedded domains represents approximately 7% of distinct domain activity in daily traffic. This includes over 10,000 domains per day that are not queried directly at our cloud DNS resolvers. A notable portion of these embedded domains are not observed in direct traffic within extended observation periods.
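The “not queried directly” measurement reduces to a set difference between the embedded domains extracted on a given day and the domains seen in direct queries. A minimal sketch, with hypothetical function names and toy data:

```python
from typing import Iterable, Set


def normalize(names: Iterable[str]) -> Set[str]:
    """Lowercase names and drop any trailing root dot before comparing."""
    return {name.lower().rstrip(".") for name in names}


def embedded_only(embedded: Iterable[str], direct: Iterable[str]) -> Set[str]:
    """Embedded domains that never appear in direct queries."""
    return normalize(embedded) - normalize(direct)


def share_unseen(embedded: Iterable[str], direct: Iterable[str]) -> float:
    """Fraction of embedded domains absent from direct traffic."""
    emb = normalize(embedded)
    return len(emb - normalize(direct)) / len(emb) if emb else 0.0


embedded_today = ["example.com", "Example.org", "hidden.example"]
direct_today = ["example.com", "example.org."]
print(embedded_only(embedded_today, direct_today))
print(share_unseen(embedded_today, direct_today))
```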

Service Domain Identification: The system identified approximately 100 domain-embedding services daily, categorized across multiple confidence levels based on embedding patterns, destination diversity, and encoding consistency.

Figure 2. Daily activity of domains using composite queries

Service Distribution: Analysis of domain-embedding traffic over a two-week period reveals significant concentration among a small number of services. Microsoft Outlook SafeLinks (outlook.com) dominates the landscape at 71.7% of observed traffic, followed by Google Translate (5.3%), Cloudflare (2.7%), URIBL (0.74%), and SURBL (0.45%). The remaining 19.1% is distributed across about a hundred smaller services. This distribution pattern remains consistent across all sampling periods, validating the stability of our observations. Figure 3 shows this distribution.

Figure 3. Distribution of embedded traffic between services

Embedded Domain Characteristics

Domain Age Distribution: Analysis reveals a notable presence of newly registered domains among embedded domains. On average, approximately 100 unique domains per day were registered within the past 7 days, and approximately 350 unique domains per day were registered within the past 30 days.

Figure 4. Embedded domains registered within the past 7 days and the past 30 days
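Counts like these can be derived from registration dates once embedded domains have been enriched with registration data. The following sketch assumes a hypothetical mapping from domain to registration date and counts the domains that fall within each look-back window.

```python
from datetime import date, timedelta
from typing import Dict, Iterable


def count_newly_registered(registered: Dict[str, date], observed_on: date,
                           windows: Iterable[int] = (7, 30)) -> Dict[int, int]:
    """Count domains registered within each look-back window, in days."""
    counts: Dict[int, int] = {}
    for days in windows:
        cutoff = observed_on - timedelta(days=days)
        counts[days] = sum(
            1 for reg in registered.values() if cutoff <= reg <= observed_on
        )
    return counts


# Toy registration data; real values would come from WHOIS or similar sources.
registrations = {
    "fresh.example": date(2024, 6, 28),
    "recent.example": date(2024, 6, 10),
    "old.example": date(2023, 1, 1),
}
print(count_newly_registered(registrations, observed_on=date(2024, 6, 30)))
```

The 30-day count includes the 7-day count, matching how the two figures are reported in the text.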

Service-Specific Analysis

Detailed analysis of three well-known domain-embedding services demonstrates diverse usage patterns and risk profiles, illustrating security considerations that range from open proxies to critical security infrastructure:

Google Translate (translate.goog) – Content Proxy

Google Translate is among the highest-volume services detected. Approximately 10,000 unique embedded domains are observed daily, with 6.6% not present in direct DNS traffic on the same day. Queries resolve to Google’s infrastructure, which fetches the embedded site’s content and serves it from a trusted domain. About 0.3% of embedded domains matched threat intelligence feeds (phishing sites, malware distribution, suspicious recently registered domains), representing tens to hundreds of malicious domains daily that bypass traditional security controls. While legitimate use cases represent the majority, even a small number of successful attacks can compromise organizations.

SURBL (multi.surbl.org) – DNS Protocol-Based Service

Email security gateways and spam filters generate moderate query volumes to this threat intelligence service. Analysis observed approximately 1,000 unique embedded domains, with 6.5% not appearing in direct DNS traffic. Rather than resolving embedded domains, SURBL returns encoded threat classifications as pseudo-IP addresses in the 127.0.0.x range. As essential security infrastructure, SURBL enables real-time threat assessment—60% of queried embedded domains represent known malicious or suspicious threats. This service should not be blocked, as it is critical to security operations.
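The query and answer format described here can be sketched without any network access. The function names are hypothetical, and reading the last octet of the 127.0.0.x answer as a bitmask is an assumption based on how such lists commonly encode results; which list each bit stands for is deliberately not hard-coded and should be taken from SURBL’s own documentation.

```python
import ipaddress
from typing import List

ZONE = "multi.surbl.org"


def surbl_query_name(domain: str, zone: str = ZONE) -> str:
    """Build the composite query name for a domain to be checked."""
    return f"{domain.lower().rstrip('.')}.{zone}"


def embedded_from_query(qname: str, zone: str = ZONE) -> str:
    """Recover the embedded domain from an observed lookup."""
    name = qname.lower().rstrip(".")
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError(f"{qname!r} is not a {zone} lookup")
    return name[: -len(suffix)]


def set_bits(answer: str) -> List[int]:
    """Return the bit values set in the last octet of a 127.0.0.x answer."""
    octets = ipaddress.IPv4Address(answer).packed
    if octets[:3] != bytes([127, 0, 0]):
        raise ValueError(f"{answer!r} is not in the 127.0.0.x range")
    return [1 << i for i in range(8) if octets[3] & (1 << i)]


print(surbl_query_name("malicious-domain.com"))
print(embedded_from_query("malicious-domain.com.multi.surbl.org"))
print(set_bits("127.0.0.24"))
```

Because the answer is a classification rather than a route to the embedded domain, the lookup itself carries no delivery risk, which is consistent with treating this service as one that should not be blocked.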

Microsoft Outlook SafeLinks (protection.outlook.com) – Email Security Gateway

Very high volume in enterprise environments reflects widespread Microsoft 365 adoption. Approximately 250,000 unique embedded domains were observed, with 0.4% not present in direct DNS traffic on the same day. Queries resolve to Microsoft’s protection infrastructure, which scans embedded URLs before redirecting users. 0.2% of embedded domains match threat intelligence feeds, amounting to over a thousand malicious or suspicious domains daily. While this is a lower malicious rate than for open proxies like Google Translate, the absolute volume demonstrates that significant malicious activity reaches users through this trusted channel. This service is an example of legitimate infrastructure that should not be interrupted, but whose embedded domains are a valuable input to threat analysis.

Conclusion

Modern DNS security requires looking beyond observable domains. Our research reveals that thousands of domains daily remain hidden within trusted infrastructure, invisible to traditional security tools focused on direct DNS queries.

Infoblox’s detection system provides visibility into this previously hidden activity. By extracting embedded domains from composite queries and applying independent threat analysis, security teams can now identify and respond to threats regardless of how attackers attempt to obscure them—while preserving the functionality of legitimate services that organizations depend on.

This capability represents a significant advancement in DNS security. Organizations gain comprehensive visibility across both direct and embedded domain activity, enabling context-aware policies that protect users without disrupting business operations. As domain-embedding services continue to grow in usage, this visibility becomes increasingly essential for maintaining effective security postures.

Vadym Tymchenko

Sr. Staff Data Scientist
