Brillig Understanding, Inc.

ChatScript is an open-source natural language engine, available on SourceForge and GitHub.

ChatScript competes with machine learning as a technology.



Design Constraints and Goals: Memory, Speed, Conciseness, Preciseness, Generality

ChatScript is a rapidly evolving engine, with new capabilities and releases at least every month. Underlying this evolution is a set of design constraints and goals.

I come from the video game industry, which pays enormous attention to both memory and speed, so ChatScript is built as a production-quality system. It can run a fully written chatbot locally on an iPhone using 16 MB of memory, and a low-grade server can serve that same chatbot to a couple of thousand users simultaneously.

Writing a hand-crafted chatbot is a large authoring task. Every character you have to type is a cost, so ChatScript is designed for great conciseness of expression, which speeds up authoring. Most rules fit on a single line. This is why ChatScript lets you define concepts, and why it looks at the canonical form of the input as well as the original input.

My intention is that you can express precisely what you want. Although this is not yet fully exploited, ChatScript can represent words down to a specific meaning. You can write patterns that match only the noun use of bat, or only the meaning of bat as a flying creature. The combination of a script and the engine may not be good enough to recognize that the word bat in a sentence refers only to the creature, but if you write code to figure that out (which is possible in many cases), your patterns can be specific enough to react only to that meaning. Precise expression is also why patterns can test for the absence of words and why ChatScript keeps punctuation and case in its patterns.

More generally, ChatScript is intended to allow chatbots to be embodied. That means a ChatScript brain can drive a body in a real or virtual world, and that it supports planning and inferencing in addition to mere natural language processing. This is why ChatScript can represent information as fact triples, has a built-in graph traversal query language, and has built-in planning capability.

ChatScript began life as a tool for making chatbots, but its goal has since evolved into the general handling of many natural language needs.

Basic Features:

Powerful pattern matching aimed at detecting meaning.

Simple rule layout combined with C-style general scripting.

Built-in WordNet dictionary for ontology and spell-checking.

Extensive, extensible ontology of nouns, verbs, adjectives, and adverbs.

Data as fact triples enables inferencing.

Rules can examine and alter engine and script behavior.

Planner capabilities allow a bot to act in real/virtual worlds.

Remembers user interactions across conversations.

Document mode allows you to scan documents for content.

Ability to control local machines via popen.

Ability to read structured JSON data from websites.

Postgres support for big data or large-user-volume chatbots.



OS Features

Runs on Windows, Linux, Mac, iOS, and Android.

Fast server performance supports thousands of simultaneous users.

Multiple bots can cohabit on the same server.



Support Features

Mature technology in use by various parties around the world.

Integrated tools to support maintaining and testing large systems.

UTF-8 support allows scripts to be written in any language.

User support forum on Chatbots.org



How ChatScript works

ChatScript (CS) is a scripting language for interactivity. Each exchange with the user is called a volley. Volleys are always asynchronous. Each volley consists of accepting an incoming input from an arbitrary user, loading data about that user and their state, computing a response, writing out the new state, and sending the response to the user.
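The volley cycle above can be sketched as a toy Python model. This is not ChatScript's engine: the dictionary `saved_state` stands in for per-user state on disk, and `compute_response` is a hypothetical placeholder for running the script. It only shows that each volley loads one user's state, computes, and saves before replying, so different users never share state.

```python
import json

# Hypothetical stand-in for per-user state on disk: user id -> JSON text.
saved_state: dict[str, str] = {}

def compute_response(text: str, state: dict) -> str:
    # Placeholder "script": count this user's volleys and echo the input.
    state["volleys"] = state.get("volleys", 0) + 1
    return f"Volley {state['volleys']}: you said '{text}'."

def volley(user_id: str, text: str) -> str:
    # 1. Accept input from an arbitrary user; 2. load that user's state.
    state = json.loads(saved_state.get(user_id, "{}"))
    # 3. Compute a response from the input and the loaded state.
    reply = compute_response(text, state)
    # 4. Write out the new state; 5. send the response.
    saved_state[user_id] = json.dumps(state)
    return reply
```

Because nothing is kept in memory between calls except the saved state, interleaved volleys from different users behave independently.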

Topics and Rules: The fundamental code mechanism of ChatScript is the topic, which is a collection of rules. Rules have pattern and code components. Within a topic each rule is considered in turn by matching its pattern component. Patterns can access global data and the user's input, can perform comparisons, and can memorize sections of input data. If the pattern fails, the next rule in the topic is considered. If a pattern succeeds, the rule's code section is then executed to completion (barring error conditions).

A rule's code can be a mixture of CS script to execute and words to say to the user. Code can invoke other topics or directly request execution of a specific rule. When a rule's code completes, if user output has been generated, then by default no more rules are initiated anywhere in the system, although rules already in progress complete their code. If no output was generated, the topic continues on to the next rule and tries to match its pattern. When a topic completes without generating output, it simply returns to the code that called it, which continues executing normally.
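The rule-by-rule flow of a topic can be modeled in a few lines of Python. This is a toy sketch, not ChatScript syntax: each rule is assumed to be a `(pattern, code)` pair where the pattern is a predicate on the input and the code may return output. The names `run_topic`, `note_greeting`, and `topic` are hypothetical.

```python
from typing import Callable, Optional

# A rule is a (pattern, code) pair; the code may return user output.
Rule = tuple[Callable[[str], bool], Callable[[str], Optional[str]]]

def run_topic(rules: list[Rule], text: str) -> Optional[str]:
    for pattern, code in rules:
        if not pattern(text):
            continue              # pattern failed: consider the next rule
        output = code(text)       # pattern matched: run its code to completion
        if output:
            return output         # output generated: initiate no more rules
    return None                   # no output: return quietly to the caller

log: list[str] = []

def note_greeting(text: str) -> Optional[str]:
    log.append("greeted")         # does some work but produces no output
    return None

topic: list[Rule] = [
    (lambda t: "hello" in t, note_greeting),
    (lambda t: "hello" in t, lambda t: "Hi there!"),
    (lambda t: True, lambda t: "Sorry, I don't follow."),  # catch-all rule
]
```

For "hello world", the first rule matches but says nothing, so the second rule runs and its output stops the topic before the catch-all is reached.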

Functions: Topics are not functions and do not take arguments. CS provides system functions, and you can write user functions in ChatScript. Function names always start with ^, as in ^match(argument1 argument2), and no commas are used to separate the arguments (since commas themselves might be legal arguments). These are classic functions in that they have arguments and a collection of code to execute. Their code can generate output and/or make calls to other functions, including invoking topics and rules. Functions are a convenient way to abstract and share code.
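To show why whitespace-separated arguments let a comma be an ordinary argument, here is a simplified Python parser for a call such as ^match(argument1 argument2). It is an illustrative sketch only: it ignores quoted strings and nested calls, and `parse_call` and `^myfunc` are hypothetical names.

```python
def parse_call(call: str) -> tuple[str, list[str]]:
    # Expect the shape ^name(args...); anything else is rejected.
    if not call.startswith("^") or not call.endswith(")") or "(" not in call:
        raise ValueError(f"not a function call: {call!r}")
    name, _, rest = call.partition("(")
    # Arguments are split on whitespace only, so "," survives as an argument.
    return name, rest[:-1].split()
```

With this convention, `^myfunc(a , b)` receives three arguments, the middle one being a literal comma.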

Rejoinders: So how does CS handle the user's reply to something it just said? A rule that generates user output may be immediately followed by rules called rejoinders. Rejoinders are intended to analyze the user's next input to see whether certain expectations are met and to decide what to do. If, for example, we output a yes-or-no question, one rejoinder might look for a yes answer while another hunts for a no answer. When CS outputs text to the user from a rule that has rejoinders, CS notes that rule. When new user input arrives, CS immediately tries the rejoinder rules to see if any of them match the input. All function calls and topic calls that were on the stack during the previous volley are gone; CS is just in the here and now of this topic and the rejoinders of that rule. If CS finds a matching rejoinder, it continues in this topic. If it doesn't, CS falls back to whatever the control script dictates for general user input.
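The rejoinder behavior can be sketched as a small Python class. This is a toy model under stated assumptions, not CS internals: a rejoinder is reduced to a `(keyword, reply)` pair, and `Bot`, `respond`, and `control_script` are hypothetical names. The key point is that only the noted rejoinders survive between volleys, and they are tried first.

```python
from typing import Optional

class Bot:
    def __init__(self) -> None:
        # Rejoinders of the rule that last produced output, if any.
        self.pending_rejoinders: Optional[list[tuple[str, str]]] = None

    def respond(self, text: str) -> str:
        rejoinders = self.pending_rejoinders
        self.pending_rejoinders = None   # nothing else carries over from last volley
        if rejoinders:
            for keyword, reply in rejoinders:
                if keyword in text.lower().split():
                    return reply         # matched: continue in this topic
        return self.control_script(text) # no match: general processing

    def control_script(self, text: str) -> str:
        if "pets" in text.lower():
            # A rule that asks a yes/no question and notes its rejoinders.
            self.pending_rejoinders = [("yes", "What kind of pet?"),
                                       ("no", "Would you like one?")]
            return "Do you have a pet?"
        return "Tell me more."
```

After "Do you have a pet?", a reply containing "yes" is handled by the rejoinder; any unrelated reply falls through to the control script.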

User variables: In addition to script code, ChatScript has data. It supports global user variables whose names always start with $, e.g., $tmp. Global means they are visible everywhere. You don't have to predeclare them: you can use one directly, or bring one into existence simply by assigning to it, e.g.,

$myvariable = 1 + $yourvariable

$myvariable is created if it doesn't already exist. And if $yourvariable hasn't been created, it will be interpreted as 0 or null depending on context (here it is 0).
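A rough Python model of this behavior, assuming (for the sketch only) that an unassigned variable reads as the empty string "" and that numeric contexts turn that null into 0. Values are always stored as text. The helpers `get_text`, `get_number`, and `assign` are hypothetical, and only integers are handled.

```python
# Every user variable holds text; unassigned variables read as "" (null).
variables: dict[str, str] = {}

def get_text(name: str) -> str:
    return variables.get(name, "")

def get_number(name: str) -> int:
    value = get_text(name)
    return int(value) if value else 0   # null becomes 0 in arithmetic

def assign(name: str, value) -> None:
    variables[name] = str(value)        # creates the variable if needed

# Rough equivalent of: $myvariable = 1 + $yourvariable
assign("$myvariable", 1 + get_number("$yourvariable"))
```

Reading `$yourvariable` does not create it; only the assignment brings `$myvariable` into existence, holding the text "1".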

User variables always hold text strings as values. Numbers are represented as strings of digits, which are converted into binary formats internally as needed. Text comes in three flavors. First are simple words (arbitrary contiguous characters with no spaces). Second are passive strings like "meat-loving plants". Third are active strings like ^"I like $value". Active strings contain references to functions or data and are evaluated when used, producing a passive string with the appropriate values substituted in. Other languages would call a CS active string a format string and would have to pass it to a function like sprintf along with the arguments to embed. CS embeds the arguments directly in the string, and any use of the active string implicitly performs the equivalent of sprintf.
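The active-string idea can be approximated in Python with a regular-expression substitution. This sketch handles only plain `$name` and `$$name` variable references (no embedded function calls), and `evaluate_active` is a hypothetical helper, not a ChatScript function. Where Python would write `"I like %s" % value`, the variable reference lives inside the string itself.

```python
import re

def evaluate_active(template: str, variables: dict[str, str]) -> str:
    # Replace each $name or $$name with that variable's current text
    # (unset variables become empty), yielding a passive string.
    return re.sub(r"\$\$?\w+",
                  lambda m: variables.get(m.group(0), ""),
                  template)
```

Evaluating the same template twice with different variable values yields different passive strings, which is what distinguishes an active string from a passive one.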

User variables also come in permanent and transient forms. Permanent variables start with a single $ and are preserved across user interactions (they are saved to and restored from disk). Transient variables start with $$ and disappear completely once the current interaction ends (they are not saved to disk).
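The permanent/transient split can be modeled as a filter applied when state is saved. This is a toy sketch: JSON text stands in for ChatScript's own on-disk format, and `save_user_variables` and `load_user_variables` are hypothetical names.

```python
import json

def save_user_variables(variables: dict[str, str]) -> str:
    # Keep only permanent ($name) variables; drop transient ($$name) ones.
    permanent = {k: v for k, v in variables.items() if not k.startswith("$$")}
    return json.dumps(permanent)

def load_user_variables(saved: str) -> dict[str, str]:
    # Restore the permanent variables at the start of the next volley.
    return json.loads(saved)
```

A `$$scratch` value used during a volley is simply absent when the next volley loads the saved state.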

Function variables: ChatScript also has function argument variables, whose names always start with ^ and have local (lexical) visibility. Here is a sample user function header:

outputmacro: ^myfunction(^argument1 ^argument2)

Facts: ChatScript supports structured triples of data called facts, which can be found by querying for them. Each of a fact's three fields is either a text string or a reference to another fact. So you might have a fact like (I eat "meat-loving plants") and query CS to find what eats meat-loving plants, or what I eat. Or, even more generally, what I ingest (using relationship properties of words). JSON data returned from website calls is represented entirely as facts, so you can query it to find the bits of data you seek.

Like user variables, facts can be created as transient or permanent. Permanent facts are saved across user interactions; transient ones disappear automatically. When a user variable points at a fact, the variable holds the fact's index as a text number.
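The fact mechanics described above can be sketched as a toy Python fact store. This is not ChatScript's API: `Fact`, `create_fact`, `query`, and `end_of_volley` are hypothetical names, fields here are plain text (the sketch omits fields that reference other facts), and `None` acts as a query wildcard. It shows querying a triple from either end, storing a fact's index as text on a variable, and dropping transient facts.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    subject: str
    verb: str
    obj: str
    transient: bool = False

facts: list[Fact] = []

def create_fact(s: str, v: str, o: str, transient: bool = False) -> str:
    facts.append(Fact(s, v, o, transient))
    return str(len(facts) - 1)       # the fact's index, as a text number

def query(s: Optional[str] = None, v: Optional[str] = None,
          o: Optional[str] = None) -> list[Fact]:
    # None matches anything in that field.
    return [f for f in facts
            if (s is None or f.subject == s)
            and (v is None or f.verb == v)
            and (o is None or f.obj == o)]

def end_of_volley() -> list[Fact]:
    # Only permanent facts survive to be saved.
    return [f for f in facts if not f.transient]
```

Asking "what eats meat-loving plants" is `query(v="eat", o="meat-loving plants")`, and "what do I eat" is `query(s="I", v="eat")`.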

Output: Some of the text in a rule's output code is intended for the user. There is pending output and committed output. Pending output consists of the words in the code that are not part of executable script. They accumulate in a pending output stream, and when the rule finishes successfully, the output is committed. If the rule fails, the pending output is canceled. You can also call functions that commit output directly, regardless of whether the rule subsequently fails.
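Pending versus committed output behaves like a small transaction buffer. The Python class below is a toy sketch with hypothetical method names: `say` models ordinary rule text, `say_now` models a function call that commits immediately, and `finish_rule` commits or cancels the pending words depending on whether the rule succeeded.

```python
class OutputBuffer:
    def __init__(self) -> None:
        self.committed: list[str] = []
        self.pending: list[str] = []

    def say(self, words: str) -> None:
        self.pending.append(words)       # ordinary rule text stays pending

    def say_now(self, words: str) -> None:
        self.committed.append(words)     # direct commit bypasses pending

    def finish_rule(self, succeeded: bool) -> None:
        if succeeded:
            self.committed.extend(self.pending)
        self.pending.clear()             # failure cancels pending output
```

In a failing rule, only text sent through the direct-commit path reaches the user.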

Marking: When CS receives user input, it tokenizes it into sentences and analyzes each sentence in turn. It “marks” each word of the sentence with the concepts it belongs to. Concept names always begin with ~. Usually concepts are explicit enumerations of words; for example, ~animals lists all known animals and ~ingest lists all verbs that imply ingestion. Sometimes concepts are implicit collections handled directly by the engine: ~number is the implied set of all numbers (we wouldn't want to actually enumerate them all), ~noun is the set of all nouns, and ~mainsubject is the current subject of the sentence. After this marking analysis, patterns can efficiently determine whether a particular concept is matched at a particular position in the sentence. CS actually analyzes two streams of input: the user's original input and its canonical form. So the system marks the input sentence “my cat eats mice” and also the parallel sentence “I cat eat mouse”, allowing patterns to catch general meanings of words as well as specific ones.
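A marking pass can be sketched in Python as building a map from concept name to word positions, checking both the original word and its canonical form. The concept lists and the canonical-form table below are tiny hypothetical stand-ins for ChatScript's dictionary, and `~number` is modeled as an implicit concept tested by rule rather than by enumeration.

```python
CONCEPTS = {
    "~animals": {"cat", "mouse", "dog"},
    "~ingest": {"eat", "drink", "devour"},
}
CANONICAL = {"my": "I", "eats": "eat", "mice": "mouse"}

def mark(sentence: str) -> dict[str, set[int]]:
    words = sentence.split()
    canonical = [CANONICAL.get(w, w) for w in words]
    marks: dict[str, set[int]] = {}
    for position, (orig, canon) in enumerate(zip(words, canonical)):
        # Explicit concepts: match either the original or canonical word.
        for concept, members in CONCEPTS.items():
            if orig in members or canon in members:
                marks.setdefault(concept, set()).add(position)
        # Implicit concept: decided by a test, not an enumerated list.
        if canon.isdigit():
            marks.setdefault("~number", set()).add(position)
    return marks
```

For “my cat eats mice”, “eats” and “mice” are marked only because their canonical forms “eat” and “mouse” appear in the concept lists; a pattern then asks whether a concept is marked at a given position instead of rescanning the words.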

Memorizing: Rule patterns can dictate memorizing part of the input that matches a pattern element. The memorized data goes onto “match variables”, which are numbered _0, _1 … in the order in which the data is captured. CS memorizes both the original input and the canonical form of it. The pattern can use match variables in comparisons and the output can also access the data captured from the input.
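Memorizing can be illustrated with a hard-coded Python matcher for a pattern shaped like "I like <animal>" that captures the animal word. The matcher, animal set, and canonical table are hypothetical; the sketch only shows a match variable (`_0` here, the first list entry) holding both the original and canonical forms of the captured word.

```python
from typing import Optional

ANIMALS = {"cat", "dog", "mouse"}
CANONICAL = {"cats": "cat", "dogs": "dog", "mice": "mouse"}

def match_i_like_animal(sentence: str) -> Optional[list[tuple[str, str]]]:
    words = sentence.rstrip(".!?").split()
    for i in range(len(words) - 2):
        candidate = words[i + 2]
        canon = CANONICAL.get(candidate.lower(), candidate.lower())
        if (words[i].lower() == "i" and words[i + 1].lower() == "like"
                and canon in ANIMALS):
            # Match variables in capture order: _0 = (original, canonical).
            return [(candidate, canon)]
    return None
```

Output code could then use either form, for example echoing the user's own "Mice" or the canonical "mouse".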

Control flow & errors: CS scripts execute everything as a call and return (there is no GOTO). The return values are the current pending output stream and a code that indicates a control result. That result partly determines how additional rules in the calling topics or functions execute: you can make a rule return a failure or success code that propagates and affects the current function, rule, topic, sentence, or input. So a failure or success deep in the call chain can, if desired, end all further script execution by sending the right code back up the calling sequence. When code returns the “noproblem” value, all callers complete what they are doing, but if user output was created, they will likely not initiate any new rules.
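Propagation of control codes can be sketched as nested Python functions for input, topic, and rule scopes. This is a simplified model, not CS internals: the `Scope` enum and runner names are hypothetical, only three of the scopes mentioned above are modeled, and `None` plays the role of “noproblem”. Each scope absorbs a code aimed at itself and passes codes aimed at wider scopes up to its caller.

```python
from enum import Enum
from typing import Callable, Optional

class Scope(Enum):
    RULE = "rule"
    TOPIC = "topic"
    INPUT = "input"

Code = Optional[Scope]          # None stands in for "noproblem"
Action = Callable[[], Code]

def run_rule(actions: list[Action], log: list[str]) -> Code:
    for action in actions:
        code = action()
        if code is not None:
            return code         # stop this rule and hand the code upward
    log.append("rule completed")
    return None

def run_topic(rules: list[list[Action]], log: list[str]) -> Code:
    for actions in rules:
        code = run_rule(actions, log)
        if code == Scope.RULE:
            continue            # only this rule failed: try the next rule
        if code == Scope.TOPIC:
            return None         # this topic ends; its caller carries on
        if code is not None:
            return code         # a wider scope was named: keep propagating
    return None

def run_input(topics: list[list[list[Action]]], log: list[str]) -> None:
    for rules in topics:
        if run_topic(rules, log) == Scope.INPUT:
            log.append("input ended")
            return              # no further script runs for this input
```

A code of `Scope.INPUT` raised inside a rule therefore unwinds through its topic and stops every remaining topic for that input.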

