It's no secret that I am fascinated by letters and letter patterns in much the same way I am with numbers and number patterns. Really all kinds of patterns interest me for the most part. But today I want to chat about the generation of names for any purpose...
Not long ago I was involved in an extended Dungeons & Dragons campaign. This campaign called for me to generate dozens upon dozens of nonplayer characters that the people who were playing the game would need to interact with. Since it is impossible to tell beforehand who the characters will choose to talk to, you kind of have to flesh out ALL the people they might interact with. Which is kind of a pain in the ass. First order of business, for me anyway, is giving each character a name. The name is like an anchor from which I draw connotations, and it helps me keep the characters separate in my head.
It also makes things more real for the players, so when they greet the stableboy, let's say, and ask him what he's called, I'm not sitting there going "duh... Joe".
Occasionally just by thinking real hard you can come up with names you really like. For example for a novel that I ended up never writing, I picked the name "January Day McKane" for the heroine of the story. I loved that name.
Most of the time, however, generating names is a chore, and if you need to make 100 names or so, you could be sitting there all day. Similarly, choosing a name for your baby can be a lot of work because you want to get something nice, that you and your significant other like, and that has no negative connotations that you can think of. (You don't want, for example, to name your son "arson".)
Further, a lot of people trying to name their baby desperately seek to differentiate their child from others by giving them a name that is unique. Something that will make them stand out from all the Jacobs and Emilys. Some folks get the brilliant idea of changing I's to Y's or C's to K's or G's to J's... in other words, deliberately misspelling their child's name. I generally despise this, partly because it is so very unoriginal, and partly because the poor kid spends the rest of his life saying "No, that's Tymothy with a Y, not with an I."
These reasons are why I invested time back in the 1990s in building name generation software. My early attempts were pretty gross.
The first try was a program that just strung together random letters. It turned out pretty quick that this was no good. The software kept churning out names like WHVCWJHGCYE or OOOQF that would be completely unpronounceable by anyone other than Strong Bad (come on, fhqwhgads).
The next attempt was a program that alternated random consonants and vowels. It seemed like a great idea but the results still left a lot to be desired. ILOHOWUSECA and QEXYJUKILO certainly could be pronounced (with some effort) but they sounded unrooted... they didn't sound anything like the language.
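For the curious, here is roughly what those first two attempts amount to. This is a Python sketch written for this article, not the original code; the function names are made up, and counting Y as a vowel is an assumption suggested by the QEXYJUKILO example.

```python
import random
import string

VOWELS = "AEIOUY"  # assumption: Y counts as a vowel
CONSONANTS = "".join(c for c in string.ascii_uppercase if c not in VOWELS)

def random_letters(length, rng=random):
    """First attempt: every letter is picked uniformly at random."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))

def alternating(length, rng=random):
    """Second attempt: alternate consonants and vowels, starting with either."""
    start_with_vowel = rng.random() < 0.5
    letters = []
    for i in range(length):
        use_vowel = (i % 2 == 0) == start_with_vowel
        letters.append(rng.choice(VOWELS if use_vowel else CONSONANTS))
    return "".join(letters)

print(random_letters(8))   # unpronounceable soup
print(alternating(10))     # pronounceable, but rootless
```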
As I tried different approaches, I found myself slowly trying to approximate the rules of names in code. The problem was I wasn't going to get there without way more effort than I was prepared to make. So instead I took a different tack. I wrote a piece of software that looked at a text sample, analyzed the letter patterns in the sample, and then produced new names based on the patterns in the sample.
This worked MUCH better. My first approach was what I called "letter doublet analysis". It's pretty simple: when the code chews through the sample text, it keeps track of the number of times one letter follows another. For example, given the text:
five six seven vat
We derive the following pairs: FI, IV, VE, SI, IX, SE, EV, VE, EN, VA, AT. Note that VE appears twice. The fact that it occurred twice is important, because it establishes that VE should occur twice as often as VA, which only occurred once. This means that when generating names, we are more likely to take the more common route, which is good, because that will produce something that sounds more like a name.
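The counting step can be sketched in a few lines of Python. The function name count_doublets is invented for this illustration; it only counts pairs inside words, exactly as in the list above.

```python
from collections import Counter

def count_doublets(text):
    """Count how often each letter is directly followed by another, within words."""
    counts = Counter()
    for word in text.upper().split():
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts

counts = count_doublets("five six seven vat")
print(counts["VE"], counts["VA"])  # 2 1
```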
Having analyzed the letter patterns, we pick a starting letter and see what happens. In this case let's pick 'S'. According to our list of letter patterns above, S can only be followed by I or E, with either choice being equally likely. So let's pick E and keep going. By the same rules, E can be followed only by V or N, so let's pick N. N cannot be followed by anything, so we are stuck, having created the word SEN.
To avoid being stuck, I also keep track of the beginnings and ends of words as letter patterns. Using * to mean the end or beginning of a word, we get these further pairs from the sample text above: *F, E*, *S, X*, *S, N*, *V, T*. Now when we get to N we have a clear indication of what to do next: N can be followed by end-of-word. This also allows us to make intelligent choices about the start of words. All the words in our sample begin with F, S, or V... therefore, so should the names in our output. Further, S is the starting letter twice as often as F or V (there are two *S pairs in the list), and this should be reflected in our output.
If we were to use letter doublet analysis, we should expect to be able to generate a large number of new words from this sample text. This isn't an exhaustive list, but it covers enough to give you an idea: fivat, five, fiven, fiveva, fiveve, fix, sen, sevat, seve, seven, seveva, seveve, sivat, sive, siven, siveva, siveve, six, vat, ven, vevat, veve, veven, veveva, veveve. (Strictly speaking, the entries ending in "va" break the rules, since A is only ever followed by T in the sample; they look like longer words such as fivevat that got cut off at six letters.) Some of the words produced are the same words we fed in, which makes sense; that really should happen if the rules we are following are based on the input text. And to boot, they all seem pretty pronounceable. Great!
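Here is a minimal Python sketch of doublet analysis with the * boundary marker, plus a weighted random walk to generate a word. It is an illustration under my own assumptions, not the original program: the names build_doublet_table, generate, and all_words are invented, and the length caps are arbitrary.

```python
import random
from collections import Counter, defaultdict

END = "*"  # marks both the start and the end of a word, as in the text

def build_doublet_table(text):
    """Map each letter (or the boundary marker) to a Counter of what follows it."""
    table = defaultdict(Counter)
    for word in text.lower().split():
        marked = END + word + END
        for current, following in zip(marked, marked[1:]):
            table[current][following] += 1
    return table

def generate(table, rng=random, max_len=12):
    """Weighted random walk from the start marker until the end marker is drawn.

    Returns None if the walk has not ended after max_len letters.
    """
    word, current = "", END
    for _ in range(max_len + 1):
        options = table[current]
        current = rng.choices(list(options), weights=list(options.values()))[0]
        if current == END:
            return word
        word += current
    return None

def all_words(table, max_len):
    """Enumerate every word of at most max_len letters that the table allows."""
    found, stack = set(), [""]
    while stack:
        word = stack.pop()
        last = word[-1] if word else END
        for following in table[last]:
            if following == END:
                found.add(word)
            elif len(word) < max_len:
                stack.append(word + following)
    return sorted(found)

table = build_doublet_table("five six seven vat")
print(generate(table))
print(all_words(table, 6))
```

Because the walk draws each next letter in proportion to its count, S starts a word about half the time here (two of the four start pairs). With a six-letter cap, all_words finds 23 words, including very short ones like se and ve.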
It turned out, however, that letter doublet analysis wasn't good enough. Consider the following input text:
christine harley
Based on these two words, results such as chrley, hristine, hriney, and hrley are all acceptable. Blech. The problem is that it's okay for H to be followed by an R, but ONLY when H was preceded by a C.
But CHR is not a letter doublet, it is a letter triplet, and letter doublet analysis is never going to capture rules like this. So I modified my name generation software to do letter triplet analysis if desired. This requires keeping statistics on how often each PAIR of letters is followed by a particular letter. Instead of keeping track of the number of times H follows C, and R follows H, the algorithm keeps track of how many times R follows CH.
As with letter doublet analysis, you also have to pay attention to the ends and beginnings of words, but you express them in terms of triplets of letters. For example CHUCK reduces to these triplets: **C, *CH, CHU, HUC, UCK, CK*, and K**. Given these rules, the letter C would only appear at the beginning of a word, or after the letters HU. The letter H would only appear if preceded by *C (C at the beginning of a word). So you couldn't, for example, generate the word CHUCHUCK using letter triplet analysis, whereas letter doublet analysis on the same input text produces CHUCHUCK quite easily, because all it knows is that C can be followed by H.
The end result of letter triplet analysis is that the names generated tend to sound a lot better, but you get a lot fewer of them, especially if your input text is short.
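Breaking a word into triplets is a one-liner once the word is padded with two markers on each side. A small Python sketch (the function name triplets is just for illustration):

```python
def triplets(word, marker="*"):
    """Split a word into overlapping triplets, padded with two markers per side."""
    padded = marker * 2 + word.upper() + marker * 2
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(triplets("chuck"))
# ['**C', '*CH', 'CHU', 'HUC', 'UCK', 'CK*', 'K**']
```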
For example, the input text of "christine harley" produces only the following names through letter triplet analysis: christine, harley. This is not surprising because the names have no letter pairs in common. If you choose names with pairs in common, then you give the software ways to build hybrid names. If your input text was "samantha amanda" for example, then you can get new names like: amantha, samanda. Add "tamara" to the input text and more possibilities arise: amara, samara, tamanda, tamantha. Letter triplet analysis works great, especially with a large sample of input text. Sure it will definitely produce lots of garbage names that you won't like, but mixed in with the crud will be some gems.
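To check those hybrid names, here is a Python sketch that builds a triplet table and lists every name it allows. The function names and the 12-letter cap are assumptions of this sketch, and it pads the end of each word with a single marker only, since nothing ever needs to follow end-of-word.

```python
from collections import Counter, defaultdict

END = "*"

def build_triplet_table(text):
    """Map each pair of symbols to a Counter of the symbols that follow it."""
    table = defaultdict(Counter)
    for word in text.lower().split():
        padded = END * 2 + word + END
        for i in range(len(padded) - 2):
            table[padded[i:i + 2]][padded[i + 2]] += 1
    return table

def all_names(table, max_len=12):
    """Enumerate every name of at most max_len letters that the table allows."""
    found, stack = set(), [END * 2]
    while stack:
        padded = stack.pop()
        for following in table[padded[-2:]]:
            if following == END:
                found.add(padded[2:])
            elif len(padded) - 2 < max_len:
                stack.append(padded + following)
    return sorted(found)

print(all_names(build_triplet_table("christine harley")))
print(all_names(build_triplet_table("samantha amanda")))
print(all_names(build_triplet_table("samantha amanda tamara")))
```

The first call gives back only the two input names, the second adds amantha and samanda, and the third yields nine names in total.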
So why not analyze quadruplets? A couple of reasons actually, the most important being that in order to generate a new name, there would need to be names in the input text that share letter triplets in common. This probably doesn't happen enough for it to be really worth it. Secondly, doing letter doublet analysis requires one to keep track of 729 different possible letter combinations and the number of times each one occurs in the input text. Letter triplet analysis requires one to track 19,683 different possible letter combinations and their incidence counts. Going to quadruplets would require the software to track 531,441 (half a million) combinations--read: lots of memory, lots of time. Why go to all that effort when the benefits are likely to be small anyway?
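Those counts are just powers of 27, which is what you get if you treat the word-boundary marker as a 27th symbol alongside the 26 letters. A quick Python check (table_size is a made-up helper name):

```python
SYMBOLS = 26 + 1  # 26 letters plus the word-boundary marker

def table_size(n):
    """Number of possible n-symbol combinations a full table must cover."""
    return SYMBOLS ** n

for n, label in [(2, "doublets"), (3, "triplets"), (4, "quadruplets")]:
    print(f"{label}: {table_size(n):,}")
# doublets: 729
# triplets: 19,683
# quadruplets: 531,441
```

These are the sizes of a fully allocated table. A table that only stores the combinations actually seen in the sample would be much smaller, though that is a different design from the one described here.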
So when generating names I use both types of analysis. If the text sample is really small, I use letter doublet to shake things up, otherwise I use letter triplets. My daughter's middle name "Abrielle" was generated in this way from a list of girls' names that my wife and I liked. Later we found out that "Abrielle" was in fact a real (archaic) French name, meaning "April". (Our kid was born in April... pretty cool.)
One of the cool things about letter triplet analysis is how much the output resembles the input. If you feed in French names, the stuff you get out sounds French. If you feed in Italian names, you get Italian-sounding words out. If you feed in the names of minerals, or flowers, the output sounds like it might be minerals or flowers. Simple statistical analysis of the words, and then following those statistics in the generation results in completely new words that might not look out of place among the real words. For example, can you pick out the fake chemical elements in this list: cadmium, calcium, californium, callium, carbon, caridium, cerium, cerbismine, cesium, chlorine, chromium, cobalt, coberium, copper, curium, curypton? If you're very familiar with the table of elements it will be no problem. If you are writing Star Trek fiction, they'll all do, thanks.
You can also blend different sample texts together to generate words that sound completely new, or something like both samples. For example if you feed in a bunch of French names, and then also feed in a bunch of Italian names, you'll get Fritalian output. Weird. For example in my recent D&D campaign I had to come up with hundreds of names for elves. For female elves, I used a blending of the statistics from Celtic and Welsh female names, and flower names. For male elves I used Celtic and Welsh male names, and tree names. This allowed me to come up with some wonderful blended names that sounded very authentic (some examples: Afrael, Aliaric, Braewanon, Foelle, Menae, and Viliza). Generally, the quality of the output with these types of analyses depends heavily on the quality of the input. If your input text is a mishmosh of unrelated names, you should expect the output to have a similar feel.
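Blending is nothing more than adding the counts from each sample together. A Python sketch of the idea, with tiny placeholder samples that are not the lists I actually used (count_triplets is an invented name):

```python
from collections import Counter

def count_triplets(text):
    """Count overlapping triplets in each word, padded with ** on both sides."""
    counts = Counter()
    for word in text.lower().split():
        padded = "**" + word + "**"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts

# Placeholder samples, purely for illustration.
welsh = count_triplets("bronwen rhiannon")
flowers = count_triplets("rose briar")

blended = welsh + flowers  # Counter addition sums the counts per triplet
print(blended["*br"])  # 2: one from bronwen, one from briar
```

Because the counts are summed, a bigger sample contributes more to the blend than a smaller one.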
Now early this morning, when I woke up (at about 4:30 am), I decided to see if I could take the name generation algorithms I coded all those years ago and rewrite them in JavaScript. It was a little tough in spots--string and array manipulation in JavaScript is very different from Visual Basic--but I managed to pull it off. Here then is the online version of my name generator.
DISCLAIMER: The name generator is something I threw together in a couple hours. Feel free to use it but I don't guarantee that it is compatible with all browsers. It may have bugs. It might lock up your browser. It might destroy the Earth. I certainly hope not, because that's where I keep all my stuff. Anyway, use at your own risk.
There's an explanation of how to use the generator below.
I'm including the Name Generator in this article as an embedded frame. If your browser doesn't support embedded frames, or if you would just rather work with the generator outside of the context of this article, click here. Feel free to play with it. I think it is a lot of fun, and if you like fooling about with letter patterns, you will probably enjoy it too.
The basic steps are:
- enter text in the box at the top
- click analyze and wait a few seconds
- click either of the generate buttons
You can enter (or better yet, paste) any sample text you want into the text area at the top (or you can click one of the category buttons and some sample text will be provided). Clicking analyze will then cause the software to study the text and tabulate all the doublets and triplets. Depending on how much sample text there is, this could take a few seconds. Then you can click either of the generate buttons to generate some names based on the sample text. One of the buttons uses doublet analysis, and the other triplet analysis. I think you'll find the triplet analysis better, but doublet analysis still churns out a good name every now and again. Keep in mind that both types of analysis will churn out garbage names you will have to pick through, but I find picking through them kind of amusing.
Note that each time you analyze sample text, it gets blended with the previous analysis (if you do boys' names and then you do girls' names, you'll get "transgendered" results). This allows you to combine rocks with elements for example, or girls with trees. If you want to do a fresh analysis, you can throw away all the old statistics by clicking Reset.
The generate buttons will each generate 100 names. This list of names will be sorted, and duplicates will be removed. If you don't see anything you like, just click one of the generate buttons again and a new set of 100 names will be generated.
The category buttons are fairly obvious. Boys and Girls give the 1000 most popular names of the respective gender for newborns in the USA in 2003. Surnames gives the 1000 most popular surnames of the 1990 US census. Keep in mind that the boys, girls, and surnames are ethnically diverse, so they'll generate some odd output. If you find a list of names that is ethnically homogeneous somewhere, try pasting that in instead. Here you can think of the "elements" as an example of homogeneous names.
I think the name generator is particularly cool for baby names. If you have a long list of names you really like, maybe the rules extracted by the software will capture whatever quality it is about those names that you like, and will therefore generate names that appeal to you. Or, you can take the names of every member of your family, your grands and great grands, and heck any favorite aunts, uncles, or cousins, and then analyze them. You could make a new name based on your particular family group.
Trying to come up with the name of a nonexistent Austrian town for a short story? No problem, grab an atlas and feed the name generator a list of Austrian town names, and generate an entire tour guide worth of mythical Austrian cities.
You can even weight the results. If you are generating boys' names and you are particularly fond of the name Horace for some reason, simply include the name Horace in the input text multiple times. If you start with a list of 50 names, and you add 50 Horaces to the list, you're gonna get a lot of names out that have bits of Horace in them.
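The weighting works because repeated names push up the counts for their letter patterns. Here is a small Python illustration using just the starting letter; the four-name list is a placeholder, not real data, and start_share is a made-up helper.

```python
from collections import Counter

def start_share(names, letter):
    """Fraction of names in the sample that start with the given letter."""
    starts = Counter(name[0].lower() for name in names)
    return starts[letter] / len(names)

names = ["adam", "brian", "carl", "david"]  # placeholder list

print(start_share(names, "h"))                   # 0.0
print(start_share(names + ["horace"] * 4, "h"))  # 0.5
```

Matching the list size with an equal number of Horaces means half of all generated names will start with H, and the same boost applies to every other pattern inside the name.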





