Python: LeetCode 242 — A Single-Map, Linear-Time Method to Check Anagrams
Learn a linear-time Python solution to LeetCode 242 that uses one frequency map to check anagrams efficiently, with performance analysis and interview tips.
Python developers encounter LeetCode 242 — the anagram check — early and often; the problem asks whether two strings contain the same characters with identical counts, possibly in a different order. The simplest robust approach in Python combines an upfront length check with a single frequency map: increment counts for characters from one string and decrement for the other, then verify that all counts return to zero. That pattern achieves O(n) time and minimal auxiliary space for the common lowercase-letter case, and it highlights principles applicable to string algorithms, hash maps, and interview preparation.
Why LeetCode 242 Still Matters for Everyday Coding
Anagram checking is more than a trivia question for interviews — it’s a compact exercise that exposes fundamental algorithmic trade-offs: sorting versus counting, time versus memory, and correctness when handling varied character sets. Engineers encounter the same underlying task in deduplication, log analysis, data validation, and basic natural language tooling. Understanding an efficient solution for LeetCode 242 gives you a template for handling frequency-based problems across search, validation, and preprocessing pipelines.
The Frequency-Counting Strategy and Why It Works
At its core, an anagram test verifies that the two inputs are equal as multisets of characters. There are two obvious strategies:
- Sort both strings and compare; this is simple but costs O(n log n) time.
- Count frequencies for one string, then compare with the other; this can be done in linear time O(n).
The single-map approach blends the counting steps: walk both strings in lockstep, increase the count for the character from the first string and decrease for the character from the second. If the strings have the same set of characters with identical multiplicities, all counts will cancel out and yield zeros. This reduces passes over the data and avoids creating two separate maps or sorting buffers.
Important practical checks are part of the strategy: verify lengths first (unequal lengths rule out anagrams), choose an appropriate counting structure (dictionary for arbitrary characters, fixed-size array for constrained alphabets), and consider normalization for Unicode input.
Step-by-step Walkthrough of the Single-Map Algorithm
- Validate input lengths. If len(s) != len(t), return False. This inexpensive test saves work for obviously different inputs.
- Create an empty counter. In Python this is typically a dict or collections.Counter. For ASCII-limited input, a list of fixed size may be faster.
- Iterate i from 0 to n-1:
  - Increment count[s[i]] by 1.
  - Decrement count[t[i]] by 1.
- After the loop, iterate over the count values. If any value is non-zero, return False; otherwise return True.
Because the loop performs two constant-time dictionary operations per character and a final pass over at most as many keys as distinct characters, runtime is O(n) and extra space is O(k) where k is the number of distinct characters. For lowercase English letters, k ≤ 26, so space is effectively O(1).
Python Implementation and Idiomatic Variants
Below are several Python variants that illustrate trade-offs between clarity, brevity, and micro-optimizations.
A clear and idiomatic function using a regular dict:
```python
def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    counts = {}
    for cs, ct in zip(s, t):
        counts[cs] = counts.get(cs, 0) + 1
        counts[ct] = counts.get(ct, 0) - 1
    return all(v == 0 for v in counts.values())
```
If you prefer the batteries-included approach, collections.Counter provides an expressive one-liner:
```python
from collections import Counter

def is_anagram_counter(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)
```
Counter is convenient and readable, but it constructs two full maps and performs a comparison; its asymptotic cost is still O(n), but the single-map method is slightly more memory-efficient in practice.
For inputs limited to lowercase ASCII letters, a fixed-size array can be fastest:
```python
def is_anagram_ascii(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    counts = [0] * 26
    base = ord('a')
    for cs, ct in zip(s, t):
        counts[ord(cs) - base] += 1
        counts[ord(ct) - base] -= 1
    return all(x == 0 for x in counts)
```
This avoids Python dictionary overhead and is suitable when constraints guarantee only 'a'–'z' characters.
Performance and Complexity Analysis
Time complexity:
- Single-map approach: O(n) where n is the length of the strings (after an O(1) length check). Each character produces two constant-time updates.
- Sorting approach: O(n log n) due to sort cost.
- Counter-based approach: O(n), but with extra allocation.
Space complexity:
- General-case (arbitrary Unicode): O(k), where k is number of distinct characters in input. In worst case k = n.
- Constrained alphabets (e.g., lowercase letters): O(1) because the counter size is bounded by a small constant.
Micro-performance considerations:
- Python dict operations are highly optimized, but for tight inner loops a preallocated list can be faster.
- Using zip(s, t) reduces index overhead compared to range-based indexing.
- Short-circuit returns on length mismatch or early non-zero detection can save time on average.
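The "early non-zero detection" point can be made concrete with a count-down variant: build a counter from the first string, then decrement while scanning the second and bail out the moment any count goes negative. The function name here is illustrative, not from the original problem.

```python
from collections import Counter

def is_anagram_early_exit(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    counts = Counter(s)
    for ch in t:
        counts[ch] -= 1
        if counts[ch] < 0:
            # t contains more of ch than s does; no need to scan further
            return False
    return True
```

Because the lengths are equal, no count can end positive unless another ends negative, so the negative check alone is sufficient.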
Real measurements depend on input distribution, interpreter version, and hardware. For interview problems, algorithmic clarity typically outweighs microbenchmarks; in production systems, choose the optimized variant only when profiling shows a bottleneck.
Edge Cases and Practical Considerations
Handling different kinds of inputs requires care:
- Unicode and normalization: Two visually identical strings can have different code point sequences (e.g., composed versus decomposed forms). Normalize inputs (unicodedata.normalize) before counting if anagram checks must respect human-perceived equivalence.
- Case sensitivity: Decide whether 'A' and 'a' are equivalent. Convert to a common case (str.casefold for Unicode-aware folding) when necessary.
- Whitespace and punctuation: Real-world text may include spaces, punctuation, or diacritics. Strip or transform characters according to the domain rules before counting.
- Large inputs and streaming: If strings are extremely large or arrive as streams, process them chunk by chunk while maintaining a single aggregate counter. Streaming requires either knowing in advance that both sequences have the same length or handling end-of-stream signaling explicitly.
- Memory-constrained environments: If you cannot store the entire counter (e.g., unbounded Unicode), consider hashing or fingerprinting techniques for probabilistic checks, accepting occasional false positives/negatives depending on the method.
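The normalization point can be folded into the single-map pattern directly. A sketch, assuming NFC normalization is the desired equivalence (the function name is illustrative):

```python
import unicodedata

def is_anagram_normalized(s: str, t: str) -> bool:
    # Normalize so composed and decomposed forms count as the same character
    s_n = unicodedata.normalize("NFC", s)
    t_n = unicodedata.normalize("NFC", t)
    if len(s_n) != len(t_n):
        return False
    counts = {}
    for cs, ct in zip(s_n, t_n):
        counts[cs] = counts.get(cs, 0) + 1
        counts[ct] = counts.get(ct, 0) - 1
    return all(v == 0 for v in counts.values())
```

Without the normalize calls, a composed "é" (one code point) and its decomposed form "e" + combining accent (two code points) would fail even the length check.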
When Sorting Is Still Reasonable
Sorting is simpler to implement and easy to reason about, and in many contexts the cost difference is negligible. Use sorting when:
- n is small (performance difference invisible).
- You prioritize concise code over micro-optimization.
- The platform provides a highly optimized sort and you want stable, straightforward behavior.
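For reference, the sorting baseline is a one-liner (name chosen here for contrast with the counting variants):

```python
def is_anagram_sorted(s: str, t: str) -> bool:
    # O(n log n), but hard to get wrong: equal multisets sort identically
    return sorted(s) == sorted(t)
```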
However, for large inputs or when you expect repeated checks, frequency counting is generally preferable.
Common Pitfalls and How to Avoid Them
- Forgetting the length check. Without this, your algorithm may report false positives when one string has extra characters that cancel out counts.
- Assuming ASCII where Unicode is present. Validate character domain assumptions before choosing array-based counters.
- Mutating inputs inadvertently. For readability, avoid in-place transformations unless necessary; return early when preconditions fail.
- Not normalizing in multilingual datasets. A missing normalization step can silently produce incorrect results.
Integration Scenarios: From Interviews to Production
LeetCode-style problems translate directly into interview discussions and small utility functions. In production, an anagram test might be used in:
- Data cleaning pipelines to detect near-duplicates.
- Validation steps for user-submitted tokens where character multiset equality matters.
- Preprocessing for NLP tasks that rely on bag-of-words or character-level features.
When integrating this logic into a larger codebase, provide unit tests that cover ASCII, Unicode, empty strings, and boundary lengths; include docstrings that state behavior with regards to case, whitespace, and normalization. Consider exposing configuration options (case_sensitive, ignore_whitespace, normalize_unicode) for library code.
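One way such a configurable wrapper might look; the function name, parameter names, and the choice of NFC normalization are illustrative assumptions, not a prescribed API:

```python
import unicodedata
from collections import Counter

def check_anagram(s: str, t: str, *,
                  case_sensitive: bool = True,
                  ignore_whitespace: bool = False,
                  normalize_unicode: bool = False) -> bool:
    """Multiset-equality check with configurable preprocessing."""
    def prep(x: str) -> str:
        if normalize_unicode:
            x = unicodedata.normalize("NFC", x)  # assumed equivalence form
        if not case_sensitive:
            x = x.casefold()
        if ignore_whitespace:
            x = "".join(x.split())
        return x
    return Counter(prep(s)) == Counter(prep(t))
```

Keyword-only flags keep call sites self-documenting, and preprocessing is isolated in one place so each option can be unit-tested independently.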
Developer Tips for Interviewers and Candidates
- Explain the trade-offs: show awareness of sorting vs counting and justify your choice.
- Demonstrate an understanding of edge cases: mention normalization, case sensitivity, and whitespace handling.
- Show incremental refinement: start with a clear, correct implementation; then offer optimizations (Counter vs single-map vs fixed array) and explain when each is appropriate.
- Write tests on the whiteboard or in your submission: include base cases, repeated characters, and non-overlapping alphabets.
- Discuss complexity explicitly: O(n) time, O(k) space, with k bounded by the character set.
These signals show both algorithmic competence and practical engineering judgment.
Broader Implications for Software Engineering and Tooling
The single-map anagram check is small, but it illustrates larger engineering patterns. Frequency counting underpins problems in analytics, cybersecurity (frequency-based anomaly detection), search (inverted index term frequencies), and even machine learning feature extraction. Mastering the idiom of counting-and-cancelling gives engineers a transferable technique for:
- Designing memory-efficient algorithms by exploiting bounded alphabets.
- Reasoning about algorithmic invariants (counts summing to zero).
- Choosing the right data structure for workload characteristics — dicts for flexibility, arrays for speed.
This pattern also highlights the value of language-standard libraries (collections.Counter), which reduce boilerplate. In modern stacks, integrating such utilities with observability and tests helps catch domain-specific pitfalls early, like unexpected encodings or maliciously large inputs.
Testing Strategy and Benchmarks
A practical test matrix should include:
- Pair of empty strings.
- Identical strings.
- Permutations with repeated characters.
- Strings of different lengths.
- Strings with Unicode composed/decomposed forms.
- Very large random strings to measure performance and memory.
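Assuming the dict-based is_anagram shown earlier (reproduced here so the snippet runs standalone), most of that matrix translates directly into assertions:

```python
def is_anagram(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    counts = {}
    for cs, ct in zip(s, t):
        counts[cs] = counts.get(cs, 0) + 1
        counts[ct] = counts.get(ct, 0) - 1
    return all(v == 0 for v in counts.values())

assert is_anagram("", "")                          # pair of empty strings
assert is_anagram("listen", "listen")              # identical strings
assert is_anagram("aabbcc", "baccab")              # permutation, repeated chars
assert not is_anagram("ab", "abc")                 # different lengths
assert not is_anagram("caf\u00e9", "cafe\u0301")   # un-normalized Unicode forms differ
```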
Microbenchmarks should compare:
- dict-based single-pass implementation
- collections.Counter-based approach
- fixed-size array for constrained alphabets
Run benchmarks on representative hardware and Python versions. Use timeit, perf, or benchmarking frameworks and interpret results in context; a 10–20% runtime difference in microbenchmarks may be irrelevant unless the function is in a hot path.
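A minimal timeit harness along these lines can compare two of the variants; the sizes and repeat counts here are arbitrary choices for illustration:

```python
import random
import string
import timeit
from collections import Counter

def is_anagram_dict(s: str, t: str) -> bool:
    if len(s) != len(t):
        return False
    counts = {}
    for cs, ct in zip(s, t):
        counts[cs] = counts.get(cs, 0) + 1
        counts[ct] = counts.get(ct, 0) - 1
    return all(v == 0 for v in counts.values())

def is_anagram_counter(s: str, t: str) -> bool:
    return Counter(s) == Counter(t)

s = "".join(random.choices(string.ascii_lowercase, k=100_000))
t = "".join(random.sample(s, len(s)))  # a guaranteed permutation of s

for fn in (is_anagram_dict, is_anagram_counter):
    elapsed = timeit.timeit(lambda: fn(s, t), number=10)
    print(f"{fn.__name__}: {elapsed:.3f}s for 10 runs")
```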
When to Use This Pattern Outside of Anagrams
The add/subtract frequency-map trick generalizes:
- Compare inventories in distributed systems by streaming diffs.
- Verify that two logs contain the same set of events regardless of order.
- Implement multiset equality tests for datasets and collections.
Anywhere you need to assert that two multisets are equal, this approach is natural and efficient.
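For instance, the log-comparison case reduces to a multiset-equality check over event names (the logs and helper name below are hypothetical):

```python
from collections import Counter

# Hypothetical event logs: same events, different arrival order
log_a = ["login", "read", "write", "read", "logout"]
log_b = ["read", "login", "logout", "read", "write"]

def same_events(a, b) -> bool:
    # Equal multisets: each event type occurs the same number of times
    return Counter(a) == Counter(b)
```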
Reflecting on the algorithmic theme, frequency counting is a cornerstone technique alongside two-pointers, sliding windows, and prefix sums in string and array problems. Investing time to master these idioms accelerates problem-solving in interviews and real projects alike.
Looking ahead, as applications increasingly process diverse text sources, nuance around encoding, normalization, and locale-aware comparisons will gain prominence. Tools and libraries will continue to add utilities that make these details easier to handle, but the core algorithmic insight — using frequency counts for O(n) verification — will remain valuable. Enhanced standard libraries, better Unicode defaults, and hardware-accelerated string processing may change micro-optimizations, but the underlying trade-offs between sorting and counting will persist as a useful design decision.