Solana Address Prefix Character Probability Analysis

Why Does the Same Character Have Different Difficulty?

May 15, 2025

Recently, someone opened an issue on my SolVanityCL repository about an interesting problem: for certain strings used as prefixes or suffixes of an address, the time to find a matching address can vary significantly.

I didn’t think it was a big problem initially, so I simply replied that it was caused by the algorithm: Solana public keys are unpredictable, even when private keys are incremented consecutively, so the estimated time is not accurate. However, he told me that he had tried Solana’s official keygen command-line tool and found that the time gap wasn’t as large as with my tool.

This caught my interest, so I took some time to find the reason for the problem. I was sure the Ed25519 implementation I used was correct, because otherwise the public keys would not match the results from Solana’s official command-line tool.

I checked the source code of Solana keygen and found some details, but they were not what I expected. I thought it would use a different method to get the random seed, but it still gets random bytes from the system random number generator. What it does include is pruning behavior. When a user specifies some prefixes, pruning can filter out invalid addresses before matching the prefix patterns, reducing the time needed for prefix/suffix pattern matching. It still can’t skip the public key derivation process, though.

For example, suppose the user specifies the prefix character a. If a generated address is 44 characters long, there’s no need to match the pattern at all, because an address that starts with a is always 43 characters long. It’s amazing, isn’t it? I will explain the details in this article, and you’ll see why the choice of character affects the prefix-matching probability.
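The length check described above can be sketched in a few lines. This is only an illustration of the idea, not the code used by Solana keygen; `can_skip` is a hypothetical helper name, and the rule it encodes (a first character after ‘J’ in the Base58 alphabet never starts a 44-character address) is the one derived later in this article.

```python
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def can_skip(address: str, prefix: str) -> bool:
    """Return True when `address` cannot start with `prefix`, judged by length alone.

    Hypothetical helper: if the first wanted character comes after 'J' in the
    Base58 alphabet, only addresses shorter than 44 characters can match.
    """
    first_index = BASE58_ALPHABET.index(prefix[0])
    needs_short_address = first_index > BASE58_ALPHABET.index("J")
    return needs_short_address and len(address) == 44

# A 44-character address can be rejected for prefix "a" without comparing characters.
print(can_skip("J" * 44, "a"))
# A 43-character address still has to go through the normal prefix comparison.
print(can_skip("a" * 43, "a"))
```

The check only saves the string comparison; the key pair and the Base58 string still have to be computed first.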

Before we dive in, I want to list all the characters in the Base58 alphabet: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

An Invisible Rule for Public Keys

The length of an Ed25519 public key is 32 bytes, so its theoretical maximum value occurs when every byte equals 255:

>>> from base58 import b58encode
>>> max_bytes = (2**256 - 1).to_bytes(32, 'little')
>>> b58encode(max_bytes)
b'JEKNVnkbo3jma5nREBBJCDoXFVeKkD56V3xKrvRmWxFG'
>>> len(b58encode(max_bytes))
44

After Base58 encoding, this maximum value is 44 characters long and starts with ‘J’. This means that in a 44-character address, the first character can never be one that comes after ‘J’ in the Base58 alphabet (K, L, …, z). If one of those characters appeared first, the address would decode to a value larger than $2^{256} - 1$, which does not fit in a 32-byte public key.

Meanwhile, for the characters K, L, …, z to appear as the first character, the address must be 43 characters long.
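A quick way to check this bound is to decode candidate strings as plain base-58 numbers and compare them with $2^{256}$. The sketch below uses a small hand-written decoder (`b58_to_int` is a name introduced here for illustration) so that it runs without any third-party package.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58_to_int(text: str) -> int:
    """Interpret a Base58 string as a base-58 integer (most significant digit first)."""
    value = 0
    for ch in text:
        value = value * 58 + ALPHABET.index(ch)
    return value

# Smallest 44-character string that starts with 'K' ('1' is the zero digit).
smallest_k44 = "K" + "1" * 43
print(b58_to_int(smallest_k44) > 2**256 - 1)   # True: too large for 32 bytes

# Largest 43-character string: it still fits comfortably in 32 bytes.
largest_43 = "z" * 43
print(b58_to_int(largest_43) < 2**256)         # True
```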

Base58 character length	Number range	Probability (32-byte range)
44 characters	$58^{43} \leq N < 2^{256}$	≈ 94.20%
43 characters	$58^{42} \leq N < 58^{43}$	≈ 5.70%
≤ 42 characters	$N < 58^{42}$	≈ 0.10%
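The percentages in this table can be reproduced with exact integer arithmetic. The sketch below assumes, as the table does, that the key is a uniformly random integer $N$ in $[0, 2^{256})$ and classifies it only by the number of base-58 digits; the extra ‘1’ characters produced by leading zero bytes are ignored here.

```python
from fractions import Fraction

TOTAL = 2**256  # number of possible 32-byte values

# Share of values that need 44, 43, or at most 42 base-58 digits.
p_44 = Fraction(TOTAL - 58**43, TOTAL)
p_43 = Fraction(58**43 - 58**42, TOTAL)
p_short = Fraction(58**42, TOTAL)

for label, p in (("44 characters", p_44), ("43 characters", p_43), ("<= 42 characters", p_short)):
    print(f"{label}: {float(p):.2%}")

# The three ranges cover every possible value exactly once.
print(p_44 + p_43 + p_short == 1)
```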

The 43-character address space is only about 6% the size of the 44-character space. ‘J’ is the first character of the theoretical maximum 44-character address, so as a first character its address space is larger than that of any character that can only start a 43-character address, but smaller than that of the other first characters of 44-character addresses.

The second character of this maximum address is ‘E’, so when the first character is ‘J’, the second character cannot come after ‘E’ and must lie in the range 1–E (14 possibilities). Therefore, among 44-character addresses, ‘J’ is only about 14 / 58 ≈ 24.1% as likely to be the first character as each of the characters 2–H.

Real Test

I will use actual code to measure how often each character appears as the first character. Because the theoretical maximum value does not correspond to a point on the Ed25519 curve, real measurements are a useful check on the analysis.

import collections
import secrets

import base58
from nacl.bindings import crypto_sign_seed_keypair

counter_starts  = collections.Counter()  # first character
counter_seconds = collections.Counter()  # second character
counter_both    = collections.Counter()  # first two characters, only when they are equal
counter_ends    = collections.Counter()  # last character

def increase_key32(private_key: bytes) -> bytes:
    """Treat the 32-byte seed as a big-endian integer and add 1."""
    next_number = int.from_bytes(private_key, "big") + 1
    return next_number.to_bytes(32, "big")

# Random 29-byte head with the last 3 bytes zeroed, so 5,000,000 increments cannot overflow.
private_key = secrets.token_bytes(29) + b'\x00' * 3
for _ in range(5_000_000):
    # crypto_sign_seed_keypair returns (public_key, secret_key); only the public key is needed.
    pk, _sk = crypto_sign_seed_keypair(private_key)
    addr = base58.b58encode(pk).decode()
    counter_starts[addr[0]]  += 1
    counter_seconds[addr[1]] += 1
    counter_ends[addr[-1]]   += 1
    if addr[0] == addr[1]:
        counter_both[f'{addr[0]}{addr[1]}'] += 1
    private_key = increase_key32(private_key)

In 43-character addresses, the first character is not limited to K–z: the characters 2–J can appear there as well. The overall probability of a 43-character address is about 5.7%, and its first character can be any of the 57 characters 2–z, so the per-character probability is 5.7% / 57 ≈ 0.10%.

A first character in the range 2–J can therefore come from two cases:

44-character addresses: the probability that the first character is ‘1’ is 1/256. For the other characters, the average probability is $\frac{0.942 - \frac{1}{256}}{16.241} \approx 5.78\%$, where $16.241 = 16 + \frac{14}{58}$ counts the 16 full characters 2–H plus the partial share of ‘J’.
43-character addresses: each character’s probability is approximately 0.10%.

Thus, the probability that the first character is a given character in the K–z range is about 0.10% / (5.78% + 0.10%) ≈ 1.7% of that for a character in the 2–H range.

>>> print(counter_starts)
Counter({'9': 296710, 'C': 295743, ..., '2': 289724, 'J': 71823, '1': 19541, 'r': 5192,  ..., 'T': 4894})

If we compare ‘r’ and ‘C’: $\frac{5192}{295743} \approx 1.76\%$, which is very close to our calculation. If we compare ‘J’ and ‘C’: $\frac{71823}{295743} \approx 24.3\%$, which is also close to our expectation.
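For comparison with the measured counts, the first-character probabilities can also be computed exactly under the assumption that the 32-byte key is a uniformly random integer. The sketch below is a slightly finer model than the averages used in this article: it treats keys whose first byte is zero separately, because they always encode with a leading ‘1’. The helper names `overlap` and `first_char_probability` are introduced here for illustration.

```python
from fractions import Fraction

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
TOTAL = 2**256
FIRST_BYTE_NONZERO = 2**248  # smallest value whose first byte is not 0x00

def overlap(lo: int, hi: int, a: int, b: int) -> int:
    """Number of integers shared by the half-open ranges [lo, hi) and [a, b)."""
    return max(0, min(hi, b) - max(lo, a))

def first_char_probability(ch: str) -> Fraction:
    """Probability that a uniformly random 32-byte value encodes with first character `ch`."""
    if ch == "1":
        # A leading '1' appears exactly when the first byte is 0x00.
        return Fraction(FIRST_BYTE_NONZERO, TOTAL)
    digit = ALPHABET.index(ch)
    count = 0
    # Values with a nonzero first byte have either 43 or 44 base-58 digits.
    for digits in (43, 44):
        unit = 58 ** (digits - 1)
        count += overlap(digit * unit, (digit + 1) * unit, FIRST_BYTE_NONZERO, TOTAL)
    return Fraction(count, TOTAL)

for ch in "1CJr":
    print(f"{ch}: {float(first_char_probability(ch)):.3%}")

# Sanity check: the 58 probabilities add up to exactly 1.
print(sum(first_char_probability(ch) for ch in ALPHABET) == 1)
```

Under this model ‘C’ comes out near 5.9%, ‘J’ near 1.4%, and ‘r’ near 0.1%, which is in line with the counts above.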

Remaining characters

>>> print(counter_seconds)
Counter({'3': 95358, ..., '1': 89821, ..., 'm': 83861})

The remaining positions (for example, the second character) show approximately equal frequencies for every character, which is consistent with Ed25519 public keys being uniformly distributed.

In other words, the difficulty of matching an address pattern depends only on the first character. The root reason is that the number of possible values represented by 32 bytes ($2^{256}$) is not equal to the number of possible 44-character Base58 strings ($58^{44}$). If these two numbers were equal, there would be no significant probability gap around character ‘J’.

I will not print the result of counter_ends; you can test it yourself. The result is similar to counter_seconds: each character has the same frequency.

Is character ‘1’ special?

You might notice that the number of times the first character is ‘1’ is greater than that for K–z but less than that for 2–H. If we compute the frequency of ‘1’: 19541 / 5000000 = 0.0039082, which is close to 1/256 (≈ 0.00390625).

This is a special case in Base58. The encoding treats the input as one large number and converts it to base 58, so a single byte (0 to 255) does not map to a single character (58 possibilities). Leading zero bytes are the exception: they would vanish from the number, so each one is written out as a literal ‘1’. The character ‘1’ therefore appears as the first character only when the first byte is 0, which is why the probability of ‘1’ as the first character is 1/256.
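The leading-zero rule is easy to see with a minimal Base58 encoder. The version below is a self-contained sketch written for this illustration (it is not the `base58` package used earlier, although it follows the same convention).

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    """Minimal Base58 encoder: big-endian integer conversion plus one '1' per leading zero byte."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = ALPHABET[remainder] + digits
    leading_zero_bytes = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zero_bytes + digits

# The first byte is 0x00, so the address starts with '1'.
print(b58encode(b"\x00" + b"\xff" * 31)[0])
# Two leading zero bytes give the prefix '11'.
print(b58encode(b"\x00\x00" + b"\xff" * 30)[:2])
# With a nonzero first byte, the first character is never '1'.
print(b58encode(b"\x01" + b"\x00" * 31)[0] != "1")
```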

Now let’s look at the case where the first character equals the second character:

>>> print(counter_both)
Counter({'BB': 5275, ..., '88': 4885, 'vv': 105, ..., '11': 65, 'ZZ': 65})

In the first-character statistics, ‘1’ occurs more often than K–z, but in counter_both the occurrence of ‘11’ is much smaller. Calculating its frequency: 65 / 5000000 = 0.000013, which is close to 1/(256×256).

Therefore, prefixes consisting entirely of ‘1’s do behave differently: each ‘1’ corresponds to a zero byte, and once the prefix is longer than 2 characters, it is the least likely prefix of its length.

Final decision

Assume the length of the prefix or suffix you want to match is $L$ .

Suffix matching

Each character is equally likely, so the probability of matching a suffix of length $L$ is $(\frac{1}{58})^{L}$ .

Prefix matching

Prefix matching is more complex. First, use the empirical probabilities in the following table for the first character:

Range	Characters	Total probability	Single-character probability $p(C)$
‘1’	only 1	0.39%	0.39%
‘2’–‘9’, ‘A’–‘H’	16 characters total	94.15%	5.88%
‘J’	only J	1.45%	1.45%
‘K’–‘Z’, ‘a’–‘z’	40 characters total	4.0%	0.1%

Then compute the full-prefix probability by cases:

All ‘1’s: every ‘1’ corresponds to a zero byte, so $P = \left(\frac{1}{256}\right)^{L}$.
First character not ‘1’: $P = p(C) \left(\frac{1}{58}\right)^{L - 1}$.
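These rules can be turned into a rough estimator. The sketch below hard-codes the empirical first-character values from the table, and the function names are introduced here for illustration. Prefixes that start with ‘1’ but are not all ‘1’s (such as 1A) are not covered by the two cases above, so the sketch rejects them rather than guessing. The expected number of attempts is taken as $1/P$, the mean of a geometric distribution.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def first_char_probability(ch: str) -> float:
    """Empirical p(C) values copied from the table in this article."""
    if ch == "1":
        return 0.0039
    if ch in "23456789ABCDEFGH":
        return 0.0588
    if ch == "J":
        return 0.0145
    return 0.001  # 'K'-'Z' and 'a'-'z'

def suffix_probability(suffix: str) -> float:
    """Every suffix character is treated as equally likely."""
    return (1 / 58) ** len(suffix)

def prefix_probability(prefix: str) -> float:
    """Apply the two prefix cases; other prefixes starting with '1' are out of scope."""
    if not prefix or any(ch not in ALPHABET for ch in prefix):
        raise ValueError("prefix must be a non-empty Base58 string")
    if set(prefix) == {"1"}:
        return (1 / 256) ** len(prefix)
    if prefix[0] == "1":
        raise ValueError("mixed prefixes starting with '1' are not covered by this sketch")
    return first_char_probability(prefix[0]) * (1 / 58) ** (len(prefix) - 1)

def expected_attempts(probability: float) -> float:
    """Mean number of independent trials until the first match."""
    return 1 / probability

for prefix in ("AB", "So", "11"):
    p = prefix_probability(prefix)
    print(f"{prefix}: P = {p:.3e}, about {expected_attempts(p):,.0f} attempts on average")
```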

Disclaimer: All of the above probability analyses assume that the integer corresponding to an Ed25519 public key is uniformly distributed over $[0, 2^{256} - 1]$. Additionally, the values given are theoretical probabilities and do not guarantee that you will obtain a matching address within the expected number of trials.