Brillig Understanding, Inc.
Machine Learning vs ChatScript technology
Machine Learning (ML) can dance rings around ChatScript in some areas. It is wonderful for autonomous vehicles, image recognition, and games but it has serious limitations for natural language, where ChatScript can dance rings around ML.
The way we describe CS is that it can find meaning, as precisely or as vaguely as you need it to. From "how much is this lamp" (only one specific way to accept cost of a specific object) to "~cost * ~object" (price of any object in the universe) to "I like *" (user says they like something). In each of these you can write a single simple rule pattern to detect the intent. And each would take lots of sample inputs to train ML to do it.
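The idea of one concept-based pattern covering many phrasings can be illustrated with a minimal Python sketch. This is not ChatScript syntax; the concept names (~cost, ~object) and their member words are toy assumptions mimicking a pattern like "~cost * ~object":

```python
# Toy concept sets standing in for ChatScript concepts (illustrative only).
CONCEPTS = {
    "~cost": {"cost", "price", "much"},       # words that signal cost
    "~object": {"lamp", "chair", "table"},    # objects of interest
}

def matches_cost_of_object(words):
    """True if a cost word is followed later by an object word,
    mimicking the pattern (~cost * ~object)."""
    cost_at = [i for i, w in enumerate(words) if w in CONCEPTS["~cost"]]
    if not cost_at:
        return False
    first = min(cost_at)
    return any(w in CONCEPTS["~object"] for w in words[first + 1:])

print(matches_cost_of_object("how much is this lamp".split()))  # True
print(matches_cost_of_object("i like this lamp".split()))       # False
```

One rule covers every cost-word/object-word combination; ML would need a training sentence for each.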
The issues are how well each does, how much effort it is to program meaning detection in either, and what other restrictions or abilities exist for either. Neither ChatScript nor ML understands all meanings, but each can be used to hunt for specific predictable meanings. ML is trained in how to do pattern matching; CS is scripted (programmed) in how to do it. ML interpolates and extrapolates from what it has been taught, leading to errors of overgeneralization. CS does precisely what you request of it, leading to whatever errors you script in.
Bots and Bot Platforms and Hubris
One can write and deploy a bot entirely by oneself. But typically when a company wants to build a complex bot serving lots of customers, they will turn to a bot platform, a place that supports building bots and hosts lots of different people's bots. ML is not complete without a bunch of support systems. So when we talk about ML, while there are open source ML libraries you can use, many people use the tuned services of a bot platform. Major companies like Amazon, Google, Microsoft, Facebook and others provide such services. Which in turn provide limitations on what you can do.
When you talk about ChatScript, it itself is a complete open source server. But companies still need to build infrastructure to support it (load balancers, logging systems, deployment systems, etc.). And currently there is only one bot platform for CS (Kore.ai). It is used by many major enterprise customers doing conversations at scale.
Bot platforms have issues like privacy and cost. You send your data to them and you pay to use them. However trustworthy you may think the big companies are (and who trusts Facebook these days?), if you have highly sensitive data you have no guarantee that all their employees are trustworthy. CS can run not just on-premises (Kore systems can be deployed on-premises so Kore sees no data), but on-device with no internet connectivity at all (total privacy and no cost).
It would be the height of hubris for me to claim I successfully compete against Amazon, Google, Facebook, Microsoft, and IBM. So I won't. I will note that others have said that in the area of natural language, ChatScript is a competing alternative.
VentureBeat, How to pick a platform for your financial sector chatbot, Feb 2017
Conversate's 25 platforms table, May 2017
And finally there are the Gartner reports, which are highly respected, but since they are commercial proprietary documents I can't reprint the 2019 Chatbot report here. It described 9 platforms, including Watson, Google, Microsoft, Kore, and ChatScript itself. It said that ChatScript "powers many of the vendors in the chatbot segment".
Machine learning works by giving it a series of sentences of tokens (what we think of as words, though in actuality it has no understanding of them as words). Given enough sentences, the system can learn to “generalize” and detect the intent of an input sentence. Intent typically means what command you want a bot to execute. Correspondingly, you can use ChatScript to script detection of an intent. To ChatScript, tokens "are" words, and it does "understand" how words relate to each other grammatically (conjugations) or semantically (ontologically).
ML does not understand the meaning of words, so singular and plural forms, verb conjugations, and typos are often all different words to it (meaning more training data AND a need to strap on some kind of automatic spell checker on your end). In fact, every single number is a unique word (including one vs 1). CS has a built-in dictionary, spell-checking, and ontology of words. And ways to address collections of words (concepts), so that, for example, the set of all numbers can be referenced as a single token in a CS pattern. In fact, ChatScript patterns can access three different input streams: the original user's input, the adjusted input after spell-correction, contraction expansion, and other cleanups are done, and a canonical form of the adjusted input.
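The three input streams can be illustrated with a minimal Python sketch. The correction and canonicalization tables here are tiny toy stand-ins, not ChatScript's actual dictionary or spell-checker:

```python
# Toy tables standing in for spell-correction and canonicalization.
SPELLFIX = {"wether": "weather", "dont": "do not"}
CANONICAL = {"told": "tell", "is": "be", "cats": "cat"}

def views(raw):
    """Return the three views of an input: original tokens, adjusted
    tokens (after cleanup), and canonical (lemma-like) tokens."""
    original = raw.lower().split()
    adjusted = []
    for w in original:
        adjusted.extend(SPELLFIX.get(w, w).split())
    canonical = [CANONICAL.get(w, w) for w in adjusted]
    return original, adjusted, canonical

o, a, c = views("He told me the wether")
print(a)  # ['he', 'told', 'me', 'the', 'weather']
print(c)  # ['he', 'tell', 'me', 'the', 'weather']
```

A pattern can then match against whichever view is most convenient, e.g. the canonical view so one rule covers all conjugations.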
ChatScript is extremely powerful for handling natural language, at the cost of a steep learning curve for training programmers. Typically we recommend combining language experts (a marketer or English major or lawyer) with programmers, because programmers tend to be lousy at thinking of alternative ways of expressing an intent. And programmers whose native tongue is not English (the presumed language of the bot for the moment) are even worse at it.
ChatScript is harder for most programmers to learn because it requires a new mindset. Programmers are used to imperative languages (like C++ and Java). They take a while to transition to different mindsets: recursive languages (LISP) or purely functional ones that don't believe in assignment statements (Haskell). ChatScript, while having a large imperative component that is easy to learn, involves yet another paradigm shift, that of using patterns to enable rules (expert systems) where each rule acts like a subroutine. It is not a clear execution decision tree like most programming languages.
In contrast, using ML involves almost no skill (which is why companies love it and live with its limitations). Someone skilled picks the ML technique to use (or you buy into a bot platform), and then anyone can add sentences as examples. If the system makes a mistake, you just add the new sentence with the right or wrong label and tell it to relearn. This seems so easy (though relearning takes noticeable time for a bot with many intents). Almost every bot development environment uses this technique. But revising the input data set with failed examples can go on endlessly.
There are hidden issues with ML around data and limitations on understanding complex NL input. First, no one warns bot developers that typically at least a thousand sample sentences per intent are needed. ML works with big data, data the typical bot developer does not have. This means the developer will give a bunch of examples, think it works, release it to the public, get back failures one by one, and then add new sentences. This will take quite some time to actually get the system to work well, time the developer does not expect. Facebook's AI Research team estimated that they would need many millions of samples to optimize a single restaurant booking intent. In contrast, ChatScript uses small data. One can use Fundamental Meaning to define intents by using a verb, its synonyms, and a direct object noun, adding in idiom mapping as needed.
CS pattern matching can be much more precise than ML's. ML is typically heavily dependent upon detecting and reacting to key words. It is not that good at distinguishing context that invalidates detection. An intent of "see a doctor" will likely be selected wrongly if you say "I don't need to see a doctor", whereas CS can be given contexts in which words become moot.
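The difference can be sketched in a few lines of Python. The word lists and intent name are illustrative assumptions: a bare keyword spotter fires under negation, while a context-aware rule checks for negators first:

```python
# Toy negator list (illustrative assumption).
NEGATORS = {"not", "don't", "no", "never"}

def keyword_intent(words):
    """Keyword-only detection, the failure mode described above."""
    return "see_doctor" if "doctor" in words else None

def context_intent(words):
    """Context-aware rule: the keyword is moot under negation."""
    if "doctor" in words and not NEGATORS.intersection(words):
        return "see_doctor"
    return None

utterance = "i don't need to see a doctor".split()
print(keyword_intent(utterance))  # see_doctor  (wrong)
print(context_intent(utterance))  # None        (negation detected)
```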
The big ML success stories you read about are done with millions of input samples. Most people build NL bots with inadequate data, leading to crappy bots that train the human to know what works instead of allowing anything reasonable. Simple bots like Facebook's WeatherUnderground bot fail in ridiculously simple ways after being there for 2 years. Intent recognition ML requires a large amount of sample inputs per intent and constant monitoring and logging of failed (and even successful) results to adjust your training data. How many intents will you have --- do you have the training data already or do you have to create it from scratch? CS does not require preexisting data.
Generating ML Training Data
Classically you provide a collection of sample inputs to some third-party service and ask them to label the intents. You pay a third party because you cannot tie up your own valuable staff doing work you can hand over to unskilled labelers.
And then you say to yourself "but that person might make a mistake". So you hand the inputs over to 3 people to label, going for a majority rule answer. You may discover that all 3 disagree, so maybe you up it to 5 people. Training is getting more expensive.
But really, you are looking at the issue incorrectly. EVERY answer you get from the humans is correct. You are trying to simulate human judgement. So accept all the labels and be grateful if your system picks one of them.
Intent and Entity Detection
The typical NL system requires both intent and entities. Intent is what you want to do roughly (like tell me the weather) and entities are data that make the request specific (in Seattle three days from now).
Most people think of ML in conjunction with intent detection. ML requires being trained in advance to detect intent. CS does not, because it can understand "meaning". For example, Kore can handle intent recognition arising from dynamic menus returned from API calls to some website. ML cannot, because it needs to be trained in advance. You can load an entire bot dynamically in a CS bot platform (for example, redefining its abilities using some bot definition tool and then updating a live server with the revised bot definition).
Entities are the data detected to go along with the intent, like the city in “what is the weather in Seattle”. Since machine learning does not handle entities easily, systems provide predefined entity types. One disadvantage of such systems is that entities are rarely definable by the bot author (with the exception of enumeration data). So if the system doesn't currently support the entity you need, you are stuck. ChatScript allows you to define any entity you need.
ChatScript is REALLY fast. Given an input of "my iphone is frozen", ML determined the intent (SMARTPHONES-FROZEN) in 1087 ms. In 1/10 of that time, CS performed a full natural language analysis (pos-tagging, parsing the sentence), determined the same intent, found the entity and brand was an Apple iphone, and issued the dialog response to the user. Similarly, for an input of "my stereo makes a buzzing sound", ML found the intent (HOME_STEREO_THEATER-BAD_SOUND) in 1454 ms, whereas CS took only 54 ms to determine the same information (again with full NL analysis) and that the product was a stereo and make a response.
ML use comes with a confidence level and you have to decide what level of confidence you want to accept. If ML says the probability of it being intent A is 75%, do you use it or not? Consider a simple classifier ML for JustAnswer, which wants to decide which of some 15 medical specialties to put the user's input in. Users often give long inputs, but for short inputs where to humans the answer is completely obvious (100% confidence), ML does not register strong confidence.
I have terminal cancer - oncology 89%
My son has a fever - pediatrics 89%
I am having delusions - mental health 89%
Does this mean we should accept these, knowing that ML expects to be wrong 11% of the time here? How low a confidence do we accept, and does it vary based on input length? CS matching is binary: yes or no. Fixing an answer that is wrong is easy.
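The thresholding decision every consumer of ML confidences must make can be sketched as follows. The threshold value and the fallback action are invented for illustration:

```python
# Sketch of confidence gating; the 0.85 cutoff is an arbitrary example.
def route(prediction, confidence, threshold=0.85):
    """Accept the ML label only above the threshold; otherwise fall back
    to some recovery behavior (here, asking the user to clarify)."""
    return prediction if confidence >= threshold else "ask_clarifying_question"

print(route("oncology", 0.89))     # oncology
print(route("estate_law", 0.25))   # ask_clarifying_question
```

Wherever you set the threshold, you trade wrong acceptances against unnecessary clarifying questions; a binary matcher has no such dial to tune.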
ML for Natural Language is fine for simple intents, but cannot read and extract information from narratives. Consider "My son is dying of throat cancer. His oncologist has been giving him aspirin to cure him. Can he sue?". Our ML would say, unconfidently, this is medical 47.2% (specialty: oncology 42%). Its 2nd guess is legal: 47% (specialty: estate law 25%). Whereas by interpreting the user's story, ChatScript could say it's clearly legal (malpractice) and glean the diagnosis, the treatment, the prognosis, and who this is about. This capability is equally useful for extracting data from reams and reams of user logs.
Combining ML and CS
The most clever systems use both CS and ML. One reason is that they have different types of failures. You can then write logic to combine them and decide which is right. Having CS in control of who wins is common, but I have seen a system where another ML system is used as a binary classifier to decide which is more likely to be right.
CS can be used to enhance or reduce ML confidence levels. You can write CS rules to override ML values in particular situations that are hard to fix with training data.
ML is bad at fine distinctions. Suppose you have a bot that wants to connect you to professional help. It takes user input and decides if you need computer help or legal help or other. ML can tell you the area (legal) and even the specialty (family law). But it's pretty much hopeless at distinguishing "I was divorced but now I'm getting married" from "I was married but now I'm getting divorced". Or consider that NL has a notion of "stop words", words which are filtered out before or after processing of natural language data because they are so common (like "a" or "the"). ML has difficulty handling such words, which can be critical to recognizing intent. Consider "I want a meeting with John" == "schedule a meeting" versus "I want the meeting with John" == "retrieve an existing meeting". Easy with CS and hard with ML. Hence, I use ML to select gross intent and CS to handle the refined actual issues.
One CS user wrote a medical patient bot for training doctors doing intake questioning. They then tried the same using ML. The ML was slightly worse (not seriously so). Their mistakes differ (ML being optimistic and CS coded pessimistically). But then he wrote an ML classifier and trained it to decide which answer was going to be correct, ML or CS. That chosen result was noticeably better than either system by itself.
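The arbiter idea above can be sketched in Python. All three components here are trivial stand-ins: the real ML model, CS rules, and meta-classifier are replaced by placeholder functions, since the point is only the control flow:

```python
def ml_answer(text):
    """Stand-in for the ML model (which tends optimistic)."""
    return "optimistic_label"

def cs_answer(text):
    """Stand-in for the ChatScript rules (coded pessimistically)."""
    return "pessimistic_label"

def arbiter(text):
    """Stand-in for the trained meta-classifier: here it just
    trusts CS for short inputs and ML for longer ones."""
    return "cs" if len(text.split()) <= 4 else "ml"

def combined(text):
    """Route to whichever system the arbiter predicts is correct."""
    return cs_answer(text) if arbiter(text) == "cs" else ml_answer(text)

print(combined("my arm hurts"))                            # pessimistic_label
print(combined("my arm hurts after a fall off a ladder"))  # optimistic_label
```

In the real system described above, the arbiter was itself an ML classifier trained on cases where ML and CS disagreed.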
And if you don't yet have the training data for ML, you can use CS as a quick way to build and deploy a system from which you can then gather user inputs, training ML later.
Fine, so what else can't ML systems do? Bot platforms must learn each bot's intents independently. So you need to name the bot in addition to the intent, e.g., “Alexa, use WeatherUnderground and tell me the weather in Seattle.” Having to name the bot on every interaction is tedious. This is not required with ChatScript. One central bot can determine which bot is needed to handle an intent.
ML treats the user's input as one thing, even if it consists of multiple sentences, and has no ability to detect and represent a conjunction of requests like "tell me the weather in Seattle and book me a flight there for Tuesday". Nor a sequence of sentences, if that were broken into two sentences in the same volley. In fact, ML systems typically strip punctuation from the input to improve matching. This does not matter to them because they have no use for punctuation in disambiguating meaning. And they certainly cannot handle pronoun resolution.
In fact, because ML platforms typically run independent bots, they cannot pass along data from one bot to another nor track context of recent conversations, nor detect multiple requests in a sentence. So ML cannot handle these:
Tell me the weather in Seattle. And in Chicago.
Tell me the weather in Seattle and book me a flight there.
Nor can you interrupt a conversation with a bot.
User: Book me a flight to Seattle.
Bot: When would you like to fly?
User: First, schedule a meeting for next week.
… conversation about meeting
… when complete, return to flying request.
ChatScript can handle these things.
Pronoun Resolution and Ellipsis
Because ML itself handles inputs independently, it is not good for resolving pronouns or ellipsis (omission of words that can be determined by context). But that is an unfair comparison, because this really would be the responsibility of the dialog manager. And you can kludge things together in the various dialog managers of the major bot platforms that come with ML.
Tell me the weather in Seattle. And in Chicago.
Here you are omitting "tell me the weather". Because CS comes with a dialog manager integrated in, it is easy to detect the initial intent + entity. Handle that. Then detect in the next sentence that no intent is recognized, but an entity matching the needs of the most recent intent is found, and so run with that. It may well also be doable in other bot platforms, but it will be awkward.
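The ellipsis mechanism just described can be sketched in Python: if a new sentence carries no intent but does carry an entity fitting the previous intent, reuse that intent. The detection functions are toy lookups, not real NL analysis:

```python
# Toy entity vocabulary (illustrative assumption).
CITIES = {"seattle", "chicago"}

def detect(sentence):
    """Toy intent/entity detection for this sketch."""
    words = set(sentence.lower().replace(".", "").split())
    intent = "get_weather" if "weather" in words else None
    entity = next((w for w in words if w in CITIES), None)
    return intent, entity

def dialog(sentences):
    """Process a volley, inheriting the prior intent on ellipsis."""
    last_intent, results = None, []
    for s in sentences:
        intent, entity = detect(s)
        if intent is None and entity and last_intent:
            intent = last_intent          # ellipsis: inherit prior intent
        last_intent = intent or last_intent
        results.append((intent, entity))
    return results

print(dialog(["Tell me the weather in Seattle", "And in Chicago"]))
# [('get_weather', 'seattle'), ('get_weather', 'chicago')]
```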
Pronoun resolution is another thing a dialog manager can do. In fact, most commands to a bot don't need to resolve anything but 'it' and 'there'. In car automation, for example, it's immaterial whether you say "I want more heat" or "We want more heat". And nothing the car does is going to need to distinguish gender pronouns. That means you only need to deal with pronouns for an object (it) and a location (there). Full-scale computer-science pronoun resolution becomes unnecessary when you can just track the most recent object or location reference. Unlike narrative text, which may refer to multiple objects, commands to bots rarely do. So it's trivial in CS and less trivial in other bot platforms. The other platforms can certainly remember the most recent use of entities from a detected intent and pass them around with every intent (awkward but doable). But they cannot detect an entity that occurs when not involved in an intent. CS can. "While we're driving to New York, play the Beatles for me" is intent "play music" and entity Beatles, with a side nod to a location that ML will not detect and track.
Similarly, using a dialog manager, ML could fake "it" detection as the most recent entity of an intent. But it would be very awkward to handle the request "crank it up", where it could refer to car speed, radio volume, left window, etc. You could write a bunch of training sentences that allow ML to come back with 3 possible intents of equal probability (tedious, awkward, and the confidence score will be unreliable). And then in your dialog manager try to consult saved context to pick one. But, as I said, it would be awkward, bending the expectations of how to use these platforms.
This is easy to do in CS. CS pattern detection is not limited to requiring things be intents. CS is meaning based. "up" is both a directional concept (vertical direction) and an amplification concept (more). I would merely write rules that detect words meaning direction and amplification status, put them lower than ones that detect a specific intent (so we are finding a homeless modifier). Have those rules in their output code section check if one of the entities of car speed, portal (windows are portals), and audio volume was most recently mentioned, and execute the appropriate action. Then this conversation is handled:
Turn on the radio
Crank it up
Not so much
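The conversation above can be sketched with most-recent-entity tracking in Python. The device vocabulary and the tiny up/down grammar are illustrative assumptions, not ChatScript code:

```python
# Toy device vocabulary (illustrative assumption).
DEVICES = {"radio", "window", "speed"}

class Context:
    """Tracks the most recently mentioned device so 'it' can be resolved."""
    def __init__(self):
        self.last_device = None

    def handle(self, sentence):
        words = sentence.lower().split()
        device = next((w for w in words if w in DEVICES), None)
        if device:
            self.last_device = device
        elif "it" in words:
            device = self.last_device        # pronoun resolution
        if device and ("up" in words or "on" in words):
            return (device, "more")
        if device and ("down" in words or "off" in words):
            return (device, "less")
        return None

ctx = Context()
print(ctx.handle("Turn on the radio"))  # ('radio', 'more')
print(ctx.handle("Crank it up"))        # ('radio', 'more')
```

Because the context object persists across turns, "it" resolves to the radio without any per-intent training sentences.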
Integration with other abilities
You cannot build a chatbot without a dialog manager. ML itself does not include a dialog manager, and the ones provided by the major bot platforms using ML are clunky and awkward to use for any large set of conversational nodes. An example is DialogFlow (Google, formerly API.ai). Because they are RESTful interfaces, you have to designate what data to pass back along with the intent, and then pass that data back in on the next intent as context, with all state memory of the conversation saved on systems you have to program on your end. The primary cost in creating long dialogs is the typing cost for defining them. DialogFlow is awkward to write for. As a Gartner report put it: "this is a visual tool to build decision trees ... but clients report limited functionality." CS was designed with dialog management built in. It handles conversational state automatically for users, across an infinity of time. Writing conversation is fast and easy using a simple text editor.
And if you want to use parsing or other NL abilities, the systems that do support them (like IBM Watson) require horrendous XML interfacing code. Or you have to supply your own infrastructure to do things beyond the simple bot platform.
ChatScript can invoke other website's JSON APIs. It can talk to Postgres or Mongo or MySQL or Microsoft SQL databases. And it even has its own built-in data representation abilities. So without writing any external code, you could create a bot that can handle home improvement questions. Just locally define a table of products (e.g., dishwasher or snowmobile), possible brands (Miele or Snow Pro), model numbers in their product lines (if you have them), and issues ("won't start" or "has blinking error code"). Then an input like "I have a dishwasher that fails to start. It's a Miele CF9ett6." and all that data can be pulled out. Or this input "My Miele won't begin to wash. It's a CF9ett6". The system can use its structures to figure out you have a dishwasher. Try doing that strictly in ML. Or imagine the work to write external systems to manage that somehow.
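The table-driven approach just described can be sketched in Python. The tables and matching logic here are toy stand-ins for what the text describes doing inside ChatScript's own data representation:

```python
# Toy local tables: product keywords, brands with their product lines,
# and issue phrasings (all contents are illustrative assumptions).
PRODUCTS = {"dishwasher": {"dishwasher", "wash"}, "snowmobile": {"snowmobile"}}
BRANDS = {"Miele": {"dishwasher"}, "Snow Pro": {"snowmobile"}}
ISSUES = {"wont_start": {"won't start", "fails to start", "won't begin"}}

def extract(text):
    """Pull product, brand, and issue out of a sentence via the tables."""
    low = text.lower()
    brand = next((b for b in BRANDS if b.lower() in low), None)
    product = next((p for p, kws in PRODUCTS.items()
                    if any(k in low for k in kws)), None)
    if product is None and brand:
        # Infer the product from the brand's known product line.
        product = next(iter(BRANDS[brand]))
    issue = next((i for i, phrases in ISSUES.items()
                  if any(ph in low for ph in phrases)), None)
    return {"product": product, "brand": brand, "issue": issue}

print(extract("My Miele won't begin to wash. It's a CF9ett6."))
# {'product': 'dishwasher', 'brand': 'Miele', 'issue': 'wont_start'}
```

Note how the second example sentence from the text never says "dishwasher"; the tables let the brand imply the product.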
By the way, almost all companies use ML to perform sentiment analysis. But a paper reports that about one in four sentiment analysis predictions by Amazon’s Comprehend change when a random shortened URL or Twitter handle is placed in the text, and that Google Cloud’s Natural Language and Amazon’s Comprehend make mistakes when the names of people or locations are changed in the text. “The [sentiment analysis] failure rate is near 100% for all commercial models when the negation comes at the end of the sentence (e.g. ‘I thought the plane would be awful, but it wasn’t’), or with neutral content between the negation and the sentiment-laden word,” the paper reads. None of that is a problem for a ChatScript sentiment analyzer.
And since ML uses confidence values, which dilute when you have different intentions to detect, sentiment analysis is rarely precise with ML. IBM Watson, for example, gives a mushy analysis a lot of the time. A score above .5 indicates some likelihood of the emotion (higher is stronger). .5 means entirely uncertain. Below .5 means not likely there (lower is more certainly absent). Kore's sentiment analysis using ChatScript is more effective (more definitive, closer to human assessment) and was merely an experimental creation, not a tuned product.
Lincoln's Gettysburg Address yields this:
IBM- They have no opinion on fear. Absolutely sure that there is no anger and disgust. They think there is no joy and sadness.
Kore- The general tone is definitely positive. There is some joy and no fear. Anger and disgust were not detected.
Positive: .83
FDR's Day of Infamy speech yields this:
IBM- They are unsure if fear or sadness present or not, maybe a bit. They are very sure that there is no anger, disgust and joy.
Kore- The tone is not positive and is very sad. There is no fear and no anger. Disgust and joy are not detected.
An issue with bot platforms is that they make voice and text intent processing into a black box. You cannot intervene. So, for example, if you know that under some circumstances voice recognition will make a mistake with a word, you can't correct it. Or suppose you want to allow a user to define their own meanings. E.g., a car application accepts the command "raise window" and you want the user to be able to say "smumble means raise" (maybe the user wants to define French equivalents of commands). You can't do that with ML, but you can with CS, because you can intercept the input from voice and do other things with it before you let the rest of your script handle it. If you are not on a bot platform, you could intercept it yourself, but writing Python code or some such to manage it will not be pleasant.
Testing, Debugging, and Regression
Testing and debugging ML is a nightmare because you cannot reliably predict which intent some unknown sentence will map to. Expect a 15% error rate on a well-trained system, and when you find an error, since you don't know why it failed, you just have to add the error sentence into your training data, along with a bunch more sentences that do and do not match the intent, to help it distinguish.
Just picture this ML classification failure I have seen, in a system classifying legal inputs into one of 17 specialties. The chatbot asks what state the user is in; the user replies "Ky.". For that input, ML believes with over 90% confidence that this conversation belongs in the legal specialty of family law. It may make no sense to you, and debugging why is not really an option. But I can imagine that some sample input in family law had that abbreviation for Kentucky in it, and no other input anywhere did, so it became a unique keyword of obvious significance. Training it out of existence presumably means adding it to some inputs in other areas of law as well.
With ChatScript, you know what the pattern is and you can explicitly alter it. CS also has built-in capabilities to verify that rules can match, either locally or globally. The biggest issue with expert system technology is that it is easy to add a rule elsewhere that breaks your system by overriding the rule you wanted to fire. CS can check for that automatically. And you can create a regression suite, and CS can tell you if it does the right thing. This is not a mere diff against an expected output. If you edit your intended output, regression may still pass because it can tell that the rule that generated the output was the same rule as before, or that, while the order of rules may have changed, the pattern of the rule is the same.
One can certainly do regression testing of ML. If you use a bot platform, it will be an intellectual annoyance that you have to pay for those transactions, but it really doesn't amount to much cost.
Full human conversation
Obviously, ChatScript has been used to make Loebner-winning (Turing test) chatbots that handle actual conversation. A lot of that is FAQ-type answers for questions like "what does your mother do" or "do you have any siblings". A Loebner-level chatbot like Rose has 9000 or more such answers ready and handles that in 1-2 rules per answer (a rule pattern is typically a single line). ML would nominally require 9,000,000 training sentences to handle that. Good luck with that. We've never seen a Loebner-winning ML-based chatbot.
Mostly people use bots for FAQ question answering (when are you open) or command behavior (create an appointment). Most FAQ behaviors can be addressed using database lookups like Lucene to return the data (though obviously CS can be used, just not recommended if FAQs run into the thousands).
Production grade features
ChatScript is a complete high-performance scalable server. An application could interact with it directly and be complete. You can easily deploy it on Amazon or other cloud services. ML requires a server to handle it, and other servers to support it.
ChatScript lends itself to version control, change accountability, and automated deployment. You define a bot as a readable text file that gets compiled. Obviously ML can have version control on its input data, but you cannot do code reviews or understand the implications of adding a single additional sentence. With CS, programmers can examine the changes and see if they make sense.
Command intent behavior is managed in a CS bot platform by "fundamental meaning" (explained at the end). The basic sentence is typically subject (you) - command verb - object (what to act upon), along with entity data. To generate an intent pattern, you start by taking an example of some input, like "tell me the weather". Strip it down to the basics and you have (YOU)-tell-weather. That is the intent. Then you create a concept set for the verb, putting in all the synonyms you can think of for tell (explicate, describe, say...), and likewise for weather (rain, snow, temperature, ...). That can all be done quickly and becomes the equivalent of thousands of ML training sentences (because each synonym needs to appear in a training sentence, and the number of sentences becomes at minimum the product of the sizes of the two sets, often more). And you need to define patterns for idioms, like "how is it out" == tell weather. But each idiom can be written as a single rule, using a simple text editor, and so this intent could be coded by a skillful CS programmer in a few minutes. Generating the list of training sentences would take lots longer, and actually running the training itself to create the model even longer. Then later, if an input fails to match, you can easily debug why, adjust your pattern, recompile, redeploy, and you are ready to go again.
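The size comparison above can be made concrete with a Python sketch: one pattern over two concept sets covers what ML needs at least |verb set| x |object set| training sentences for. The synonym lists are illustrative:

```python
# Toy concept sets for the (YOU)-tell-weather intent (illustrative).
TELL = {"tell", "give", "say", "describe", "explicate"}
WEATHER = {"weather", "rain", "snow", "temperature", "forecast"}

def matches_tell_weather(words):
    """One pattern: any tell-synonym plus any weather-synonym."""
    ws = set(words)
    return bool(ws & TELL) and bool(ws & WEATHER)

print(matches_tell_weather("tell me the weather".split()))           # True
print(matches_tell_weather("describe the forecast for me".split()))  # True

# Minimum ML training sentences just to see every verb/object pair once:
print(len(TELL) * len(WEATHER))  # 25
```

With realistic concept sets of dozens of words each, the product easily runs into the thousands, while the CS side remains one rule.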
For an ML-based bot platform, you need to know the name of the bot you want to talk to. With Alexa that means saying "Alexa" and then saying the bot name. How do you know the name? You have to do a search for it in advance. Wouldn't it be wonderful if you could just ask Alexa to do something and she would figure out which bot you needed?
Fundamental meaning means you don't need to know the name of the bot you want to talk to. Kore.ai's Kora bot can do this. She can consult the intents supported by all available bots (fundamental meaning intents) and find ones relevant to the incoming request. There is no "discovery" problem. The right bot gets selected automatically (or, if there were multiple choices and you had not established a default, she asks you which of some bot list you wanted to use). And so when you make two requests in a single input, she can perform the first with one bot and the second with another, even passing some of the data you supplied for the first bot to be used by the second one.
As a tongue-in-cheek aside, an MIT Technology Review article computed the carbon emissions of training a significant deep learning ML model, taking into account the energy it consumes and the average fuel mix in the US used to create that energy. They found that it emitted as much carbon as the entire lifetime of a car, including fuel. An American life emits 36K pounds of CO2 equivalent in a year. The car emitted 126K pounds. And an ML transformer (213M parameters) with neural architecture search emits 626K pounds to train. The carbon cost of a ChatScript bot is too small to measure.
And that's why we say ChatScript dances rings around ML when NL is involved.