Custom generators¶

Configuration rules¶

Configuration is a flat dictionary of rules:

{
    '<rule_id>': {
        'comment': 'Some info about this rule. Not mandatory.',
        'type': '<nested|cartesian|words|phrases|const|number>',
        # additional fields, depending on type
    },
    ...
}

<rule_id> is the identifier of the rule. The root rule must be named 'all': it is the rule used when you call generate() or generate_slug() without arguments.

There are six types of configuration rules.

Words list¶

A ground-level building block. Chooses a random word from a list, with equal probability.

# This will produce random color
'color': {
    'type': 'words',
    'words': ['red', 'green', 'yellow']
},
# This will produce random taste
'taste': {
    'type': 'words',
    'words': ['sweet', 'sour']
},
# This will produce random fruit
'fruit': {
    'type': 'words',
    'words': ['apple', 'banana']
},

How is whitespace handled?

By default, leading/trailing whitespace is stripped, and whitespace in the middle of a word is forbidden. You can change this behavior on a per-list basis using the strip_whitespace and allow_whitespace parameters.

An empty word always raises an exception.

When words are in a *.txt file, leading/trailing whitespace for each line is always stripped.

Phrases list¶

Same as words list, but each element is one or more words.

# This will produce random color
'color': {
    'type': 'phrases',
    'phrases': ['red', 'green', 'navy blue', ['royal', 'purple']]
}

A phrase can be written as a string (words are separated by spaces) or as a list of words.

How is whitespace handled, and what about custom separators?

Each phrase is processed at initialization time with the following algorithm:

1. If a phrase is defined as a string (like 'navy blue' above):

   1.1. If strip_whitespace=True, leading/trailing whitespace is stripped.

   1.2. The phrase is split into words by whitespace (everything that matches r'\s+'), or using the separator parameter if it’s defined for this phrase list. The separator can be a plain string, or a regular expression starting with re:

2. For each word in a phrase, if strip_whitespace=True, leading/trailing whitespace is stripped.

An empty word or an empty phrase always raises an exception, and whitespace in the middle of a word raises an exception if allow_whitespace=False (default).

When phrases are in a *.txt file, leading/trailing whitespace for each line is always stripped.
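The processing steps above can be sketched in plain Python. This is only an illustration, not coolname's actual code: the function name normalize_phrase is hypothetical, and regular-expression separators are left out, so only a plain-string separator is handled.

```python
import re


def normalize_phrase(phrase, strip_whitespace=True, allow_whitespace=False,
                     separator=None):
    """Turn a phrase (string or list of words) into a validated list of words.

    Hypothetical helper that mirrors the documented steps.
    """
    if isinstance(phrase, str):
        # Step 1.1: strip the phrase itself.
        if strip_whitespace:
            phrase = phrase.strip()
        # Step 1.2: split by whitespace, or by a plain-string separator.
        if separator is None:
            words = re.split(r'\s+', phrase)
        else:
            words = phrase.split(separator)
    else:
        words = list(phrase)
    # Step 2: strip every word.
    if strip_whitespace:
        words = [word.strip() for word in words]
    # Empty words and empty phrases are always rejected.
    if not words or any(not word for word in words):
        raise ValueError('empty word or empty phrase')
    # Whitespace in the middle of a word is rejected unless explicitly allowed.
    if not allow_whitespace:
        for word in words:
            if re.search(r'\S\s+\S', word):
                raise ValueError('whitespace inside word: %r' % word)
    return words


print(normalize_phrase('  navy blue '))
print(normalize_phrase(['royal', 'purple']))
print(normalize_phrase('navy, blue', separator=','))
```

With a custom separator, a word such as 'b c' still contains inner whitespace, so it is accepted only when allow_whitespace=True.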

Nested list¶

Chooses a random word (or phrase) from any of the child lists. The probability of picking from a given child list is proportional to that list’s length.

# This will produce random adjective: color or taste
'adjective': {
    'type': 'nested',
    'lists': ['color', 'taste']
},

Child lists can be of any type.

The number of child lists is not limited.

The length of a nested list is the sum of the lengths of all its child lists.
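To make the probabilities concrete, here is a small worked calculation for the 'adjective' rule above. It assumes both children are plain words lists, so the nested list behaves like their concatenation; it does not call coolname.

```python
from fractions import Fraction

children = {
    'color': ['red', 'green', 'yellow'],
    'taste': ['sweet', 'sour'],
}

# The nested list behaves like the concatenation of its child lists.
adjective = [word for words in children.values() for word in words]

# Every individual word is equally likely...
word_probability = Fraction(1, len(adjective))

# ...so each child list is picked in proportion to its length.
list_probability = {
    name: Fraction(len(words), len(adjective))
    for name, words in children.items()
}

print(len(adjective))             # 5
print(word_probability)           # 1/5
print(list_probability['color'])  # 3/5
print(list_probability['taste'])  # 2/5
```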

Constant¶

It’s just a word. Useful for prepositions.

'of': {
    'type': 'const',
    'value': 'of'
},

Cartesian list¶

A Cartesian list works like a slot machine: given N child lists, it produces a list of N items by choosing one random word (or phrase) from every child list.

# This will produce a random list of 4 words,
# for example: ['my', 'banana', 'is', 'sweet']
'all': {
    'type': 'cartesian',
    'lists': ['my', 'fruit', 'is', 'adjective']
},
# Additional const definitions
'is': {
    'type': 'const',
    'value': 'is'
},
'my': {
    'type': 'const',
    'value': 'my'
},

The length of a Cartesian list is the product of the lengths of its child lists.

Let’s try the config defined above:

>>> from coolname import RandomGenerator
>>> generator = RandomGenerator(config)
>>> for i in range(3):
...     print(generator.generate_slug())
...
my-banana-is-sweet
my-apple-is-green
my-apple-is-sour
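The fragments shown so far can be assembled into one config dictionary, and the length rules (sum for nested lists, product for Cartesian lists) can be checked with a short recursive function. This is a sketch that does not use coolname itself: rule_length is a hypothetical helper, and treating a constant as a list of length 1 is an assumption.

```python
from math import prod

config = {
    'all': {'type': 'cartesian', 'lists': ['my', 'fruit', 'is', 'adjective']},
    'my': {'type': 'const', 'value': 'my'},
    'is': {'type': 'const', 'value': 'is'},
    'fruit': {'type': 'words', 'words': ['apple', 'banana']},
    'adjective': {'type': 'nested', 'lists': ['color', 'taste']},
    'color': {'type': 'words', 'words': ['red', 'green', 'yellow']},
    'taste': {'type': 'words', 'words': ['sweet', 'sour']},
}


def rule_length(config, rule_id):
    """Count the distinct results a rule can produce (hypothetical helper)."""
    rule = config[rule_id]
    kind = rule['type']
    if kind == 'const':
        return 1  # assumption: a constant counts as a single item
    if kind == 'words':
        return len(rule['words'])
    if kind == 'phrases':
        return len(rule['phrases'])
    child_lengths = [rule_length(config, child) for child in rule['lists']]
    if kind == 'nested':
        return sum(child_lengths)
    if kind == 'cartesian':
        return prod(child_lengths)
    raise ValueError('unsupported rule type: %r' % kind)


print(rule_length(config, 'adjective'))  # 3 + 2 = 5
print(rule_length(config, 'all'))        # 1 * 2 * 1 * 5 = 10
```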

Warning

You can have many nested lists, but you should never put a Cartesian list inside another Cartesian list.
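If you build configs programmatically, a quick check like the one below can catch this mistake early. It is a sketch only: cartesian_inside_cartesian is a hypothetical helper, not part of coolname, and it assumes the warning also covers indirect nesting (a Cartesian list reached through a nested list).

```python
def cartesian_inside_cartesian(config):
    """Return (outer, inner) rule id pairs where a Cartesian list contains another."""

    def descendants(rule_id, seen):
        # Walk every rule reachable through 'lists', visiting each rule once.
        for child in config[rule_id].get('lists', []):
            if child not in seen:
                seen.add(child)
                yield child
                yield from descendants(child, seen)

    problems = []
    for rule_id, rule in config.items():
        if rule['type'] != 'cartesian':
            continue
        for child in descendants(rule_id, set()):
            if config[child]['type'] == 'cartesian':
                problems.append((rule_id, child))
    return problems


bad_config = {
    'all': {'type': 'cartesian', 'lists': ['pair', 'noun']},
    'pair': {'type': 'cartesian', 'lists': ['adjective', 'noun']},
    'adjective': {'type': 'words', 'words': ['crouching', 'hidden']},
    'noun': {'type': 'words', 'words': ['tiger', 'dragon']},
}

print(cartesian_inside_cartesian(bad_config))  # [('all', 'pair')]
```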

Number¶

To add a random number to your slugs, use configuration like this:

'all': {
    'type': 'cartesian',
    'lists': ['word', 'number']
},
'word': {
    'type': 'words',
    'words': ['dog', 'cat', 'bird']
},
'number': {
    'type': 'number',
    'digits': '3'  # default is 3 if omitted; min=1, max=7
}

Result:

>>> from coolname import RandomGenerator
>>> generator = RandomGenerator(config)
>>> for i in range(3):
...     print(generator.generate_slug())
...
cat-798
bird-931
cat-83

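Generated numbers start from 1 and have at most as many digits as the digits parameter specifies (so a result like cat-83 is possible with digits set to 3).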
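The range of possible values can be sketched as follows. The function random_number is hypothetical and the uniform distribution is an assumption; only the bounds (from 1 up to the largest number with the given count of digits, and digits between 1 and 7) come from the description above.

```python
import random


def random_number(digits=3, rng=random):
    """Pick a number between 1 and the largest value with `digits` digits."""
    if not 1 <= digits <= 7:
        raise ValueError('digits must be between 1 and 7')
    # For digits=3 this is the range 1..999.
    return rng.randint(1, 10 ** digits - 1)


rng = random.Random(0)  # seeded only to make the demo repeatable
samples = [random_number(3, rng) for _ in range(5)]
print(all(1 <= n <= 999 for n in samples))  # True
```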

Length limits¶

Number of characters¶

There are two limits:

max_length
This constraint is hard: you can’t create a RandomGenerator instance if some word (or phrase) in some rule exceeds that rule’s limit.

For example, this will fail because "jaguar" has 6 characters:

{
    "all": {
        "type": "words",
        "words": ["cat", "tiger", "jaguar"],
        "max_length": 5
    }
}

Different word lists and phrase lists can have different limits. If you don’t specify max_length, there is no limit.

Note: when max_length is applied to phrase lists, spaces are not counted. So this will work:

{
    "all": {
        "type": "phrases",
        "phrases": ["big cat"],
        "max_length": 6
    }
}
max_slug_length
This constraint is soft: if a result is too long, it is silently discarded and the generator rolls the dice again. This allows you to have longer-than-average words (and phrases) which still fit nicely with shorter words (and phrases) from other lists.

Of course, it’s better to keep the fraction of “too long” combinations low, as it affects performance. In fact, RandomGenerator performs a sanity test upon initialization: if the probability of getting a “too long” combination is unacceptable, it raises an exception.

For example, the configuration below has 9 combinations in total, but only 7 of them can be generated: the other 2 (green-square and green-circle) never appear because they exceed the max slug length:

{
    "adjective": {
        "type": "words",
        "words": ["red", "blue", "green"]
    },
    "noun": {
        "type": "words",
        "words": ["line", "square", "circle"]
    },
    "all": {
        "type": "cartesian",
        "lists": ["adjective", "noun"],
        "max_slug_length": 11
    }
}

Both of these limits are optional. The default configuration uses max_slug_length = 50, which matches the default maximum length of a Django slug field.
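Both limits can be reproduced with a few lines of plain Python. The helpers check_max_length and split_by_slug_length are hypothetical names used for illustration; they mimic the documented behavior (spaces are not counted for max_length, and the hyphen separator counts toward max_slug_length) without calling coolname.

```python
from itertools import product


def check_max_length(items, max_length):
    """Hard limit: raise if any word or phrase is too long (spaces not counted)."""
    for item in items:
        if len(item.replace(' ', '')) > max_length:
            raise ValueError('%r has more than %d characters' % (item, max_length))


def split_by_slug_length(lists, max_slug_length, separator='-'):
    """Soft limit: separate combinations that fit from those that get discarded."""
    kept, discarded = [], []
    for combination in product(*lists):
        slug = separator.join(combination)
        if len(slug) <= max_slug_length:
            kept.append(slug)
        else:
            discarded.append(slug)
    return kept, discarded


check_max_length(['big cat'], 6)  # passes: "bigcat" has 6 characters

try:
    check_max_length(['cat', 'tiger', 'jaguar'], 5)
except ValueError as error:
    print(error)  # 'jaguar' has more than 5 characters

kept, discarded = split_by_slug_length(
    [['red', 'blue', 'green'], ['line', 'square', 'circle']],
    max_slug_length=11,
)
print(len(kept))   # 7
print(discarded)   # ['green-square', 'green-circle']
```

The real generator does not enumerate combinations like this; according to the description above, it discards a too-long result and tries again.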

Number of words¶

Use the number_of_words parameter to enforce a particular number of words in each phrase of a given list.

This constraint is hard: you can’t create a RandomGenerator instance if some phrase in a given list has the wrong number of words.

For example, this will fail because the last item has 3 words:

{
    "all": {
        "type": "phrases",
        "phrases": [
            "washing machine",
            "microwave oven",
            "vacuum cleaner",
            "large hadron collider"
        ],
        "number_of_words": 2
    }
}
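The check itself is easy to picture. The sketch below uses a hypothetical helper, check_number_of_words, and is not coolname's implementation; it simply counts the words in each phrase and raises on the first mismatch.

```python
def check_number_of_words(phrases, number_of_words):
    """Raise ValueError if any phrase does not have exactly number_of_words words."""
    for phrase in phrases:
        # A phrase may be a space-separated string or a list of words.
        words = phrase.split() if isinstance(phrase, str) else list(phrase)
        if len(words) != number_of_words:
            raise ValueError(
                '%r has %d words, expected %d' % (phrase, len(words), number_of_words)
            )


phrases = [
    'washing machine',
    'microwave oven',
    'vacuum cleaner',
    'large hadron collider',
]

try:
    check_number_of_words(phrases, 2)
except ValueError as error:
    print(error)  # 'large hadron collider' has 3 words, expected 2
```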

Configuration files¶

Another small example: a pair of (adjective, noun) generated as follows:

(crouching|hidden) (tiger|dragon)

Of course, you can just feed the config dict into the RandomGenerator constructor:

>>> from coolname import RandomGenerator
>>> config = {
...     'all': {'type': 'cartesian', 'lists': ['adjective', 'noun']},
...     'adjective': {'type': 'words', 'words': ['crouching', 'hidden']},
...     'noun': {'type': 'words', 'words': ['tiger', 'dragon']},
... }
>>> g = RandomGenerator(config)
>>> g.generate_slug()
'hidden-dragon'

but this becomes inconvenient as the number of words grows. So coolname also supports a mixed file format: you specify the rules in a JSON file and move long word lists into separate plain-text files (one file per "words" rule).

For our example, we would need three files in a directory:

my_config/config.json

{
    "all": {
        "type": "cartesian",
        "lists": ["adjective", "noun"]
    }
}

my_config/adjective.txt

crouching
hidden

my_config/noun.txt

dragon
tiger

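Note: only config.json is mandatory. You can name the other files as you want; in this example, each file name without the .txt extension is the rule id referenced from config.json.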

Use the auxiliary load_config function to load a config from a directory:

>>> from coolname.loader import load_config
>>> config = load_config('./my_config')

That’s all! The loaded config now contains all the same rules, and we can create a RandomGenerator object:

>>> config
{'adjective': {'words': ['crouching', 'hidden'], 'type': 'words'}, 'noun': {'words': ['dragon', 'tiger'], 'type': 'words'}, 'all': {'lists': ['adjective', 'noun'], 'type': 'cartesian'}}
>>> g = RandomGenerator(config)
>>> g.generate_slug()
'hidden-tiger'

Text file format for words¶

Basic format is simple:

# comment
one
two  # inline comment

# blank lines are OK
three

You can also specify options like this:

max_length = 13

This is equivalent to adding the same option to the config dictionary:

{
    'type': 'words',
    'words': ['one', 'two', 'three'],
    'max_length': 13
}

Options should be placed at the beginning of the text file, before the first word.

Text file format for phrases¶

For phrases, the format is the same as for words. If any line in a file has more than one word, the whole file is automatically treated as a "phrases" list instead of a "words" list.

For example, this file:

one
two

# Here is the phrase
three four

is translated to the following rule:

{
    "type": "phrases",
    "phrases": [
        ["one"], ["two"], ["three", "four"]
    ]
}
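The text file format described in the last two sections can be summarized with a small parser. This is a sketch under stated assumptions, not coolname's loader: parse_word_file is a hypothetical name, options are recognized as name = value lines before the first word, and only integer option values are converted (anything else is kept as a string).

```python
import re

OPTION_RE = re.compile(r'^(\w+)\s*=\s*(.+)$')


def parse_word_file(text):
    """Turn the contents of a word list file into a rule dictionary."""
    options = {}
    items = []
    for raw_line in text.splitlines():
        # Drop comments (full-line and inline) and surrounding whitespace.
        line = raw_line.split('#', 1)[0].strip()
        if not line:
            continue  # blank lines are OK
        match = OPTION_RE.match(line)
        if match and not items:
            # Options are only recognized before the first word.
            key, value = match.groups()
            options[key] = int(value) if value.isdigit() else value
            continue
        items.append(line.split())
    if any(len(words) > 1 for words in items):
        # At least one multi-word line: the whole file becomes a phrases list.
        rule = {'type': 'phrases', 'phrases': items}
    else:
        rule = {'type': 'words', 'words': [words[0] for words in items]}
    rule.update(options)
    return rule


words_text = """\
# comment
max_length = 13
one
two  # inline comment

# blank lines are OK
three
"""

phrases_text = """\
one
two

# Here is the phrase
three four
"""

print(parse_word_file(words_text))
print(parse_word_file(phrases_text))
```

The first call yields a "words" rule with max_length set to 13, and the second yields the "phrases" rule shown above.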

Unicode support¶

The default implementation uses English, but you can create a configuration in any language: just save the config files in UTF-8 encoding.