Visual Chatbot Introduction

As Artificial Intelligence (AI) continues to increase its ability to convincingly replace humans, software developers and businesses alike dream up more and more creative ways to apply it in order to generate profit. Online chatbots have been a cornerstone of the customer service side of online businesses since the mid-2000s, receiving customer feedback and answering questions rapidly - though often ineffectively.

In a recent survey, over half of online customers were unsatisfied with their ACX (automated customer experience) interactions, citing slow response speed (59%), inability to understand context (51%), and low accuracy of solutions (44%) as their biggest pet peeves. Certain companies, like Ada, boast far higher NPS (Net Promoter Score) numbers, but the average scores in the industry show that many tools are lacking.

Fortunately, new developments in AI promise more efficient and multi-faceted ways of interacting with users. The integration of visual information processing into traditional chatbots can add a whole new dimension to the automated conversation experience.

What is a visual chatbot?

A visual chatbot is an AI computer program that is able to interpret images or video provided by users in real time and respond to questions about the visual content.

An effective visual bot could identify damage on your vehicle following an accident and give you a quote for repair costs. It could scan your face and tell you which glasses would look best on you. It could even walk you through installing a light fixture or fixing your wifi router.

Given sufficient time and investment, the possibilities of visual bots are practically endless. These systems could replace thousands of customer service workers and increase customer satisfaction, saving businesses huge sums of money.

What distinguishes visual chatbots from standard chatbots?

As high-tech as they may sound, chatbots - short for chatterbots - have been around since the 1960s, when Joseph Weizenbaum created ELIZA.

Eliza Chatbot Image 1

Eliza Chatbot Image 2

Figure 1: A brief and not particularly helpful therapy session with a version of ELIZA.

Chatbots have come a long way since this early attempt at a natural language processing program (also known as an NLP chatbot). If you want to see for yourself, try comparing ELIZA (1966) to ALICE (Artificial Linguistic Internet Computer Entity, 1995). You can even take a look at a hilariously unproductive conversation between two early chatbots, ELIZA and PARRY (AKA “ELIZA with attitude”; developed in 1972).

Despite this progress, the vast majority of chatbots continue to rely on typed input from users, or occasionally spoken dialogue. One could say that standard chatbots have ears.

Visual chatbots differ from this norm in one major way - they have eyes. These advanced bots can interpret not just language-based data from users, but visual data as well. Their possible applications in the business world have been explored and expanded throughout the last decade, and 2021 will surely be no exception. So why are businesses so interested in visual chatbots?

Why do I need my chatbot to see?

With traditional chatbots projected to save businesses over 8 billion dollars annually by next year, it is no wonder that business leaders are interested in adding more features and abilities to this technology.

They say a picture is worth a thousand words, which makes a video worth a million (depending on duration and frame rate, but let’s not get technical). Information that the client may not know themselves, such as the make and model of their product, could easily be determined by a visual chatbot with access to a vast database of images and information to compare it to. This could accelerate the problem-solving process and give customers the results they want with far more efficiency and convenience.

At the end of the day, a chatbot’s effectiveness at responding to a user’s query is limited by the user’s ability to communicate it. Even if the client has all the information the program requires to resolve their problem, language is inherently imprecise and includes bias, intonations and colloquialisms that today's computer programs are not yet equipped to interpret. By replacing language and its associated inaccuracies with visual data about the problem itself, the computer program can receive the information in as raw and unbiased a form as possible. This again allows for a faster and clearer path to problem resolution.

Finally, visual chatbots have the ability to simulate human interaction on a whole new level. A traditional chatbot can’t tell you how you look in a pair of pants or help find the right makeup shade for your skin tone like a retail worker could. This ability would challenge the idea that computers could never replace human workers because they can’t see and understand your problems or provide opinions.

Examples of visual chatbot use cases

Although visual chatbot technology is still very much under development, many examples already exist that put the principles of visual dialogue to practice. Despite their imperfections, these early models demonstrate the potential effectiveness of more advanced bots in a variety of contexts.

Customer support is probably the most rapidly growing application of visual bots due to the massive financial drivers behind it. One example is IKEA's 2018 collaboration with GrokStyle to create a visual search function in the IKEA Place AR app. The addition allows customers to take a picture of a piece of furniture and receive listings of similar products in return. The app can also display a piece of furniture over an image or live video of a room to show what it would look like before the customer buys the product.

The project gathered $2 million in funding before it even began, and has been praised for its accuracy and effectiveness. The next step, of course, is for IKEA’s visual chat program to help stressed customers assemble their notoriously frustrating furniture.

Chatbots are said to be 4 times more effective than traditional lead generation methods at converting website visitors into leads. With this in mind, it makes sense that a visual aspect could make them even more effective by adding another level of connection.

The healthcare industry has also begun to embrace the possibilities of AI. From period-tracking apps to mental health support and care for cancer patients and survivors, the frontier of health chatbots has advanced dramatically in recent years.

In 2019, Northwell Health launched a chatbot with the goal of reducing colonoscopy no-shows at two New York hospitals that had been struggling with a lack of patient follow-through. Concerningly, studies had previously shown that less privileged patients were the most likely to be no-shows, with as many as 40% not showing up to their procedures.

Sabina Zak, Northwell Health’s vice president of community health and health services research, explains that “health literacy is a strong determinant of wellness among vulnerable populations”. 

The Northwell Health chatbot was found to produce 97% patient satisfaction and a reduction in post-acute care expenses.

The addition of visual dialogue capabilities to online healthcare chatbots could allow patients to show the bot their symptoms or injuries to retrieve more information about them. For obvious ethical reasons, AI is not yet able to diagnose or treat patients. However, it proves highly effective as a supplementary source of information for worried patients with busy healthcare providers. With sufficient investments and the careful oversight of physicians, who knows what the future of online medicine will hold.

In the context of a global pandemic, this tool could be especially helpful at reducing hospital traffic and ensuring that nervous patients can get their questions answered without having to leave the house.

Steps towards integrated computer vision

Integrating visual information processing capabilities into existing interactive chatbots is not as simple as giving the program access to a webcam. There are several steps in the development of this kind of technology:

Recognizing images

In order to recognize what the user is showing it, a visual chatbot needs to be able to draw from an extensive pre-established database of images to compare with the input it is receiving. Once it determines a match for the visual input provided by the user, the bot can use the information associated with the pre-existing images to identify what it is being shown.

Google reverse image search does just this - it receives an image input from the user and compares it to images it has already seen in order to determine what might be in the user’s image.
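The matching step described above can be sketched as a nearest-neighbour lookup. This is only a minimal illustration, not how Google or any production system does it: it assumes each image has already been reduced to a short list of numbers (a feature vector), and the `REFERENCE_IMAGES` table, its labels, and its values are all made up for the example.

```python
import math

# Hypothetical reference database: label -> feature vector.
# In a real system the vectors would come from an image model;
# these are hand-written toy values.
REFERENCE_IMAGES = {
    "coffee mug": [0.9, 0.1, 0.0],
    "golden retriever": [0.1, 0.8, 0.3],
    "wifi router": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(features, database=REFERENCE_IMAGES):
    """Return (label, score) for the reference image closest to the input."""
    best_label, best_score = None, -1.0
    for label, reference in database.items():
        score = cosine_similarity(features, reference)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy feature vector standing in for a user's photo.
label, score = recognize([0.05, 0.25, 0.85])
print(label)
```

The score returned alongside the label matters as much as the label itself: a bot can use it to decide whether the match is strong enough to act on or whether it should ask the user for a clearer picture.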

Translating images into text

At this stage, the bot should be able to not just recognize an image, but identify its content in written language and retrieve relevant text information. This requires not only an extensive image database, but relevant textual information to go with each image. Museum chatbots, now widely used to provide curious visitors with additional information about specific exhibits, are one example.
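As a rough sketch of this step, the snippet below turns a recognition result (a label plus a confidence score) into a written reply by looking up text stored alongside each label. The `EXHIBIT_INFO` records, the `describe` function, and the 0.8 threshold are hypothetical choices made for illustration, not part of any particular museum chatbot.

```python
# Hypothetical exhibit records; labels and descriptions are placeholders.
EXHIBIT_INFO = {
    "bronze statue": "Placeholder text describing the bronze statue exhibit.",
    "marble bust": "Placeholder text describing the marble bust exhibit.",
}


def describe(label, score, info=EXHIBIT_INFO, threshold=0.8):
    """Turn a recognition result into a written reply."""
    # Below the (assumed) confidence threshold, ask again rather than guess.
    if score < threshold:
        return "I am not sure what this is. Could you try another photo?"
    details = info.get(label)
    # The image was recognized, but no text is stored for it.
    if details is None:
        return f"This looks like the {label}, but I have no further information on it."
    return f"This looks like the {label}. {details}"


print(describe("marble bust", 0.93))
```

Keeping the text records separate from the image matching means the descriptions can be edited or expanded without touching the recognition step.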

Formulating replies

Chatbots are interactive by definition, and must be able to reply to textual questions about provided visual content in conversation. This is known as interactive visual dialogue, which will be discussed later.

This chatbot demo accomplishes this by identifying several different key features in an image and comparing them to images and information from the Internet. Visual chatbots are “intelligent” because they learn: as they accumulate a greater database of information and conversation patterns over time, their recognition accuracy and response quality improve.

Generating an image

At the most advanced level, visual chatbots can create smart images in response to visual input. In 2018, CCC Information Services Inc. introduced a visual bot capable of generating a damage “heat map” from an image of a user's vehicle after an accident. This allows clients to receive a quote instantly, see a visual explanation for that quote, and speed up the insurance claim process.

What is interactive visual dialogue?

Interactive visual dialogue involves AI engaging in natural conversation with a human user about visual content. The program must be able to answer questions about the content of the image by piecing together the context through image recognition, chat history, and the question itself.

Virginia tech visual chatbot Illustration

Figure 2: An example from Virginia Tech of the process through which a bot capable of visual dialogue returns an answer to a question about an image.

Such a feat can be achieved only through a particularly extensive data set for the model to draw from, as well as an advanced ability to interpret language and formulate replies.
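To make the three ingredients concrete (image content, chat history, and the question itself), here is a deliberately simple rule-based sketch. Real visual dialogue systems rely on trained models rather than keyword matching; the `detected_objects` dictionary stands in for the output of an image recognition step, and every name and value in it is an assumption made for the example.

```python
def find_object(text, detected_objects):
    """Return the first detected object whose name appears in the text."""
    text = text.lower()
    for name in detected_objects:
        if name in text:
            return name
    return None


def answer(question, detected_objects, history):
    """Answer a question using the image content and earlier questions."""
    target = find_object(question, detected_objects)
    if target is None:
        # The question says "it" or similar: fall back to the most
        # recent earlier question that named an object.
        for past_question, _ in reversed(history):
            target = find_object(past_question, detected_objects)
            if target is not None:
                break
    if target is None:
        return "I am not sure which object you mean."
    lowered = question.lower()
    for attribute, value in detected_objects[target].items():
        if attribute in lowered:
            return f"The {target}'s {attribute} is {value}."
    return f"I can see the {target}, but I cannot answer that about it."


# Hypothetical output of an image recognition step.
detected = {
    "mug": {"colour": "blue"},
    "dog": {"breed": "beagle", "colour": "brown"},
}

history = []
for question in ["What breed is the dog?", "What colour is it?"]:
    reply = answer(question, detected, history)
    history.append((question, reply))
    print(reply)
```

The second question never names the dog; the bot can only answer it because the chat history records that the dog was the last object discussed.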

Future applications of visual dialogue could allow users to learn more about something they see by asking questions about an image they provide of it. Much like a QR code on an exhibit can allow gallery-goers to read more information about a piece of artwork, interactive visual dialogue could allow users to know more without seeing more. It also offers two added benefits: users can ask specific questions about particular aspects of the visual content, and the bot can draw from the internet and an image database rather than requiring a specific content page to be designed and associated with a matrix barcode.

How to create a visual chatbot

So you’ve seen the vast applications of visual chatbots and want to apply them to improve your business. What's next?

The most popular visual chatbot demo has been available to all internet users to experiment with since 2017. It can identify the different components of an image and respond to basic questions, such as “what colour is the mug?” or “what breed of dog is shown?”. The open-source chatbot code is available to be used and modified on GitHub.

If coding your own AI isn’t up your alley, Ada is a good place to start. By providing a no-code AI chatbot design platform, Ada offers a more polished approach to creating your own visual chatbot for your business. We consistently see improvements in CX (customer experience), with customer wait times reduced by up to 98%, all while cutting costs and driving profit. Companies such as Shopify, Zoom, and Chatter Wireless have all used Ada’s AI to automate their online customer service with strong results.

If you’re interested in booking a demo, you can do so here.