By Meagan Bryson
AnswerRocket Co-founder, CTO, and Chief Scientist Mike Finley has a knack for breaking down complicated concepts and making them much easier to understand. Mike dives into how large language models work, and what makes generative AI different from other types of AI.
He also discusses what large language models (LLMs) could look like in the future, and how AnswerRocket has made the perfect pairing between our augmented analytics platform and ChatGPT’s conversational capabilities.
Read the transcript of the interview below or watch the YouTube video here:
Question: From Day 1, how has AnswerRocket leveraged AI and natural language?
Mike Finley: Since day one, AI–and specifically natural language–have been at the heart of what we’re trying to bring about. The reason for that is, fundamentally, the very first computer program was written, whatever, almost 200 years ago. A program, by definition, is a human speaking in computer language. The goal, the AnswerRocket goal, was to stop that, to have the computer speak the human language, right? That was our very initial purpose and to erase that 200 years of history of people having to speak in computer and instead have the computer learn to speak what we do. It’s been a core aspect of the solution from the beginning…
The idea of a prompt of engaging with text or even with voice, whether it’s from mobile or from desktop, basically communicating naturally with the machine and having the machine naturally communicate back to you, that’s been kind of the central theme of how we’ve been seeking to democratize access to data, right?
That’s the idea, that democratization happens because everybody can speak, right?
There’s some really good studies out there that talk about, for every person that can read a spreadsheet, there are ten people that can read a sentence, right?
For every person that can read a sentence, there are ten people that can read a graph, right?
If we can understand people’s natural language and talk back to them in pictures and in words, that solves a problem that’s really never been solved before. And that’s been our goal.
Question: How do LLMs benefit the end user?
Mike Finley: The fundamental technique of AI working using these things called neural networks, or neurons, artificial neurons, has been around since the 1950s. Again, nearly 100 years of this concept, and they’ve been getting larger and better. As machines have gotten better and faster, the techniques have improved and so forth. There was this fundamental idea from a long time ago that said, “Hey, can we make something that’s like a human neuron and make a group of those like a human brain?” That idea has really been evolving and evolving.
Suddenly when a machine starts to act like a human in terms of being able to translate languages and understand concepts or finish stories or whatever, these large language models are really just big groups of those early designs, of those early neural networks. Now they’re organized in a very special way, right? There have been major advances kind of in layers, things like deep belief networks and autoencoders, things like transformers. These are layers of the technology that have happened along the way.
Fundamentally, a large language model is just a really big group of artificial neurons organized in a special way, the same way that our real neurons are organized in a very special way. They’re set up in a way that they can take in what a person is communicating and they can answer back in that same fashion.
Question: What’s unique about a generative AI model?
Mike Finley: A large language model is part of a group of algorithms, a kind of program, right, that’s called an autoencoder, right? I remember the very first one that I worked with was in 2007. It turns out the Post Office has a giant database of handwritten digits because the Post Office needs to be able to read all kinds of crazy envelopes where people have written really sloppy eights that look like threes and really sloppy fours that look like nines. They have tens of thousands of examples curated in a database somewhere.
There’s a guy named Geoffrey Hinton who’s out of the University of Toronto. I think he’s at Google now, but he took all that data and he fed it into a computer. The brilliant thing about his work is that he didn’t tell the computer which ones were threes and which ones were eights, which ones were twos, and which ones were fives, right? He just gave it all to the computer and said, you figure it out. This computer, sure enough, created ten separate groupings of these things, and it figured out what zeros and ones and twos and threes and fours and so forth were in that grouping, right, using this autoencoding technique. I recreated the results of that original paper, and so did everybody else out there in the AI community. When I recreated that, the wildest part about it was that not only could it recognize all those previously human-made handwritten digits, but you could ask it for an original two and it would make an original two.
It wouldn’t look like a bunch of static, right? It would look like a two that was nowhere in the database, right? This is kind of a chilling feeling, right?
That’s called a generative model, right? And that’s the G in GPT. The generative model says, hey, after you’ve shown this thing, all the examples of everything that you want to train it on, that you want it to learn, then you can just kind of poke it and say, okay, well, now what do you think? It’ll actually generate back to you something that’s like all the stuff you trained it on, right? Before these generative models, AI was all about saying, let me beat a bunch of examples into the machine until it guesses right on the next one, right? Adjusting all these really fine-tuned numbers. These generative models all of a sudden can kind of flip that around, right?
They can not just be able to look at the inputs from the outside world and say, “Oh, I think that’s a cat or it’s a dog. I think it’s a three or a five,” or I think whatever the conclusion is from that algorithm. But they can say, “The recommendation for you is to watch the next John Wick movie”, right? Those kinds of things suddenly come out of these generative models that are able to basically be kind of poked from the outside and caused to go into motion and cause to do their work.
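As a rough illustration of the two ideas in this answer (finding groups without being told the labels, then producing a new example that was never in the data), here is a minimal Python sketch. It is not the autoencoder from the work Mike describes: it swaps in a much simpler technique, two-group k-means on a handful of made-up numbers followed by sampling from a bell curve fitted to one group. The data, the function names `two_means` and `generate`, and the random seed are all assumptions made for the example.

```python
import random
import statistics

# Made-up, unlabeled measurements: two natural groups, but no labels are given.
data = [1.0, 1.2, 0.9, 1.1, 0.8, 5.0, 5.2, 4.9, 5.1, 4.8]

def two_means(points, iterations=10):
    """Split points into two groups by repeated re-centering (k-means, k=2)."""
    centers = [min(points), max(points)]  # deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Assign each point to whichever center is closer.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        centers = [statistics.mean(g) for g in groups]
    return centers, groups

def generate(group, rng):
    """Draw a new value that resembles the members of one group."""
    return rng.gauss(statistics.mean(group), statistics.stdev(group))

centers, groups = two_means(data)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
new_value = generate(groups[1], rng)

print("group centers:", centers)
print("a generated value resembling the second group:", new_value)
```

The generated value lands near the second group without being a copy of any training point, which is the small-scale version of asking for "an original two."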
Question: How do LLMs work with foreign languages?
Mike Finley: Well, going back to my handwritten digits example: I grew up in Spain, and we put lines across our sevens, right? You draw a normal seven, and then you put a line through it. You do the same thing with Z’s, right?
Well, that’s almost like another variety of language, right? It still knows to put those things with the sevens. It doesn’t get confused and put them with the fours or with the twos.
Well, it turns out when you take a language model and you train it on a whole lot of English and then you train it on a whole lot of French and a whole lot of Spanish and Hebrew and Chinese, it turns out it’s able to learn all those separately.
This is really where these emergent phenomena come from, right? The idea of emergence in AI is unexpected results that come out of the combination of two or more things. There’s an emergent phenomenon, which is that the idea of “house” in one language and the idea of “house” in another language are really closely related inside the quote-unquote mind of the machine, right? That it determines that those two concepts “house” and “casa,” which is the Spanish word for house, that those two things are really very similar to each other because of the way that they relate to everything else.
The English model showed that a person opens the door of a house and dogs live in a dog house, and houses have roofs. Well, all those same kinds of ideas exist in Spanish, or in Hebrew, or in French, or in Chinese. The machine is organizing all these thoughts (let’s not get philosophical, but let’s call them thoughts), and when it’s organizing all of those thoughts, it turns out the concepts that it builds in two different languages end up being very similar. It’s able to translate simply by relating where each of those concepts is within the context of its thoughts, right, of what it’s done with the word. Strictly, they’re not “thoughts,” they’re called “embeddings.” The idea of an embedding is: what is the large language model’s concept of a word, or a phrase, or a sentence, or a paragraph? It’s the same as if you went up to a human and said, “A red fire hydrant was on the corner.”
Well, your mind has now gone into a certain state. If I put you in an MRI machine, I could scan your mind and it would be a certain state, right? You have that thought in your brain. The embedding is the representation of the thought in the brain of the large language model, right? It turns out again that I can take the same phrase in two different languages and your brain would go into the same shape if it understood those two languages. I can give that same phrase in two different languages to a large language model and it will get a very similar shape as far as how it relates those things ultimately inside of its embedding.
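To make the "similar shape" idea concrete, here is a minimal Python sketch that treats an embedding as a short list of numbers and measures how closely two of them point in the same direction (cosine similarity). The three-number vectors are hand-made for illustration and are not taken from any real model; actual embeddings are learned and have hundreds or thousands of dimensions. The names `toy_embeddings`, `cosine_similarity`, and `nearest` are hypothetical.

```python
import math

# Hand-made toy "embeddings" (an assumption for illustration only).
toy_embeddings = {
    "house": [0.90, 0.10, 0.05],
    "casa":  [0.88, 0.12, 0.07],
    "dog":   [0.10, 0.95, 0.20],
}

def cosine_similarity(a, b):
    """Return 1.0 when two vectors point the same way, near 0 when unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, vocabulary):
    """Find the other word whose vector is most similar to the given word's."""
    return max(
        (other for other in vocabulary if other != word),
        key=lambda other: cosine_similarity(vocabulary[word], vocabulary[other]),
    )

print(cosine_similarity(toy_embeddings["house"], toy_embeddings["casa"]))
print(cosine_similarity(toy_embeddings["house"], toy_embeddings["dog"]))
print(nearest("house", toy_embeddings))
```

With these toy numbers, "house" and "casa" score far closer to each other than either does to "dog," so a nearest-neighbor lookup acts like a very crude translation step.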
Question: Just how large will LLMs get?
Mike Finley: Starbucks, if you say, “I want a hot drink,” they have three sizes, right? The 20 is the largest one. If you want a cold drink, they actually have four sizes, right? There’s an extra large cold one and you might say, “Why?”
“Why is there an extra large cold one and not an extra large hot one?”
The reason is because the top seller in hot drinks is the middle size. In other words, they made them bigger and bigger until sales started going down. They did that for hot and it turns out they only need three sizes. Do that for the cold. It turns out you need four. You need that fourth size to know that you’ve reached the peak of the performance.
Well, same thing in large language models, right? We’ve been making bigger and bigger language models, really since I was a child, right? Because a long time ago in a galaxy far away, you would have two layers of 30 neurons. That was worth writing a paper about in a scientific journal, right?
Well, now two layers of 30 neurons is one ten-trillionth of the size of the neural network. Not literally, but it’s a very small network. These networks keep growing bigger and bigger. Now, unlike the Starbucks drinks, where they’ve figured out the size that gives the biggest return, we haven’t figured it out yet for language models. We keep making them bigger and they keep getting better. The question is, where is the end?
Question: What differentiates ChatGPT from other LLMs?
Mike Finley: The original language model, the way that it’s trained, the same way that the post office said, “Hey, here’s a whole bunch of digits, figure it out,” right? The way that these language models are trained is, “Here’s a whole bunch of Internet stuff. Figure it out,” right? Literally the figure it out part is tell me what the next word is going to be. I’m going to give you 1,000 words. Tell me what the next word is going to be.
The language model, on one hand, you could say, “Yeah, well I could just have a database. I could save the Internet and then I can tell you what the next word is going to be, because all I gotta do is go find the 1,000 words that you gave me, find it on the internet somewhere and look at the next word and boom, I’ve got the answer.” Right?
Now, if you do that, you haven’t generalized at all, right? Because now if I change one of those thousand words, you’re not going to know what the next word in that list is.
Language models aren’t really, they’re not allowed to memorize that way, right? They’re not allowed to, meaning that the program as it’s written doesn’t allow the language model to simply store away what it’s been taught. It has to generalize it. It has to say, oh, you said, having had three different misspellings of that verb, I know that all those are the same thing, right?
Or you said something in English and it looks a lot like something that I saw in French, so I’m going to remember that in the same way I’m not going to remember that it exists in English or that it exists in French. I’m simply going to remember the concept of whatever that idea is.
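The difference between memorizing and generalizing can be shown with a small Python sketch. It builds (context, next word) training pairs by sliding a window over a tiny made-up corpus, then compares two predictors: an exact lookup table, and a crude stand-in for generalization that only counts which word tends to follow the last word of the context. Neither is how a neural language model works internally; the corpus, the three-word context length, and the function names are assumptions chosen to keep the example short.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model would see billions of such windows.
corpus = (
    "the dog sat on the mat . the cat sat on the rug . a dog sat on a mat ."
).split()
CONTEXT = 3  # stands in for the "1,000 words" in the interview

# Slide a window over the text: each context is paired with the word after it.
examples = [
    (tuple(corpus[i:i + CONTEXT]), corpus[i + CONTEXT])
    for i in range(len(corpus) - CONTEXT)
]

# Predictor 1: pure memorization, an exact lookup of contexts seen in training.
memorized = {}
for context, next_word in examples:
    memorized.setdefault(context, next_word)

def predict_memorized(context):
    return memorized.get(tuple(context))  # None if this exact context is new

# Predictor 2: a crude generalizer that only looks at the last context word.
follows = defaultdict(Counter)
for context, next_word in examples:
    follows[context[-1]][next_word] += 1

def predict_generalized(context):
    counts = follows.get(context[-1])
    return counts.most_common(1)[0][0] if counts else None

seen = ["the", "dog", "sat"]
changed = ["my", "dog", "sat"]  # one word swapped, never seen in training
print(predict_memorized(seen), predict_generalized(seen))
print(predict_memorized(changed), predict_generalized(changed))
```

Changing a single word breaks the lookup table, while the predictor that has absorbed a pattern ("sat" is usually followed by "on") still gives a sensible answer.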
Basically, this idea of training it to learn what the next thing is, that’s how the language model gets really smart. You can imagine if you took every list of 1,000 words on the entire internet and gave it that as an example, then you’ve got billions of examples to train with, right? That super big training set is really what allows us to keep building bigger and bigger networks to learn everything that’s going on in here, all right? If all you ever did was learn to say what the very next word is, then you wouldn’t really be able to have a lot of chat dialogues.
Because all that stuff on the internet, Wikipedia, and the laws of the state of Kansas, and the case history for the courts of Mexico, whatever it is, all of that, it’s not full of a bunch of chats, right? It’s not full of a bunch of conversational language. There’s some, sure, you could watch interviews if you transcribe YouTube videos, you’re going to get a nice list of chats, right, of conversational chat, so it could learn some of how to predict the next word. It’s a really bizarre thing to try to teach one of these language models how to chat because it needs to say something and then wait for you to answer, not try to guess your next word. It needs to let you answer and then it needs to try to continue what it said before knowing what you said, right? It’s a very different kind of way of thinking about language.
If all you ever learned to do was predict the next word, then you’re kind of a know-it-all, right? You want to just keep talking like I’m doing right now, instead of letting the other person have a conversation.
What ChatGPT specifically was all about was saying, let’s actually train it to…
…know when to stop.
…to know what to say.
…to know how to listen.
Essentially, the reinforcement learning came in as a secondary stage that says, “Yeah, I know you know how to finish every sentence anybody ever said on the entire internet. You’re so smart. Great. Now I don’t need you to do that. What I need you to do is actually have a sensible conversation with a human.” There has to be some reinforcement, just like there would have to be if you had a person who had never had a conversation and suddenly they’re out trying to have a drink at a bar.
Well, they wouldn’t be much fun to have a drink with, right?
Some reinforcement learning goes through the process of getting that person to stop guessing what the next word is going to be and do a little bit of active listening.
Question: How is AnswerRocket using LLMs?
Mike Finley: You can think of it as kind of a wonderful marriage, right? AnswerRocket has been in development for a number of years, learning how to understand databases and find insights in data for various different industries, right? For pharma, for banking, for insurance, for packaged goods, for video games, whatever. That insight-seeking behavior is what we’ve been teaching AnswerRocket, with a thin layer of natural language understanding around it and a fairly sophisticated insight selection model, right? We have become really good at saying, “Hey, based on the data that you have and the question that you’ve asked me, here are some things that I can tell you that are really insightful, like an analyst would have provided you.”
Now, GPT comes along and what it does really well is it understands the question the user might ask, right? Where AnswerRocket might have needed very specific time frames, measures, facts to identify the question, GPT can tell right away,
“Oh, you’re trying to compare two things to see if they grew the same, right?” Or
“Oh, what you’re after is a deep understanding of where the anomalies are in this list of things, right?”
GPT understands that, but it doesn’t know how to get that answer. It understands what you want to know, but it has no idea how to get it. AnswerRocket is kind of a perfect mate to that because GPT can understand what the user wants and connect up with AnswerRocket. Now all of a sudden, Max understands you because of how smart GPT is and it knows how to answer because of how smart AnswerRocket is.
Now, Max does all the analysis and finds out the comparison or the trends or the outliers that you’re after. The question is, how does Max explain it back to the user? Now, Max normally would give a chart, it would give a list of facts, it would cite a whole bunch of different really interesting things.
The great thing about going back through GPT on the way out back to the user is that we can get GPT to explain those facts back to the end customer. GPT can say, “Oh, I’m reading good news! I’m telling you something really great. Isn’t it wonderful that this has occurred?”
Max didn’t know that. Max didn’t know if it was good or bad that sales went up. That’s just a number to Max, right? GPT knows that sales going up is a good thing, right? These are logical constructs that exist in language, right?
Unless we kind of build that in and hand tune it and tweak it, Max, or AnswerRocket, doesn’t really have that capability on its own. So now, Max, not only can it understand you, but it has legs to go search the database. It has the brain to go understand and analyze the insights. It’s got the voice to give it all back to the user in a way that makes it really feel like you’re collaborating with a coworker, collaborating with an analyst that knows all of your data inside out better than you ever really want to, and is going to accelerate your day by taking the bulk out of the analysis time and giving you back the time to consider what the results are, to figure out what you’re going to do next.
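The three-step flow described in this answer (understand the question, run the analysis, explain the result) can be sketched in Python as below. This is only an illustration of the shape of such a pipeline under stated assumptions, not AnswerRocket's or OpenAI's actual code or API: the sales figures are made up, and `understand_question` and `narrate` are simple keyword and string-formatting stubs standing in for where a language model would be called.

```python
from dataclasses import dataclass

# Made-up sales figures (last year, this year), for illustration only.
SALES = {"Brand A": (100.0, 120.0), "Brand B": (200.0, 210.0)}

@dataclass
class Intent:
    analysis: str   # what kind of question this is
    subjects: list  # which things the user asked about

def understand_question(question: str) -> Intent:
    """Stand-in for the language model on the way in (hypothetical stub)."""
    text = question.lower()
    subjects = [name for name in SALES if name.lower() in text]
    analysis = "compare_growth" if "grow" in text else "unknown"
    return Intent(analysis, subjects)

def run_analysis(intent: Intent) -> dict:
    """Stand-in for the analytics engine: compute facts from the data."""
    if intent.analysis != "compare_growth":
        raise ValueError(f"unsupported analysis: {intent.analysis}")
    return {
        name: (SALES[name][1] - SALES[name][0]) / SALES[name][0]
        for name in intent.subjects
    }

def narrate(facts: dict) -> str:
    """Stand-in for the language model on the way out: numbers to words."""
    parts = []
    for name, growth in facts.items():
        verb = "grew" if growth >= 0 else "shrank"
        parts.append(f"{name} {verb} {abs(growth):.0%}")
    return "; ".join(parts) + "."

def answer(question: str) -> str:
    return narrate(run_analysis(understand_question(question)))

print(answer("Did Brand A and Brand B grow the same?"))
# prints: Brand A grew 20%; Brand B grew 5%.
```

Keeping the three steps as separate functions mirrors the division of labor in the interview: the language steps can be swapped for a real model without touching the analysis step, and the analysis step never needs to know how the question was phrased.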
Question: How will LLMs have evolved in 2 years?
Mike Finley: First prediction in a two-year time frame: GPT will not be the only standout.
So this technology (GPT) is unique and special, but OpenAI doesn’t have a monopoly on it in any sense.
I think there’ll be a lot more models that perform as well as or better than GPT. They’ll be specialized in different areas. The next part is going to be the movement of the model “down.”
Right now, it’s in the cloud because that’s where you can have really expensive infrastructure and apply it to a lot of different people. In maybe not a two-year time frame, but not much longer, it’ll be in your phone, right? It’ll be something that’s rapidly available, widely available to everyone. Not the giant version that’s been trained on everything that’s ever been written, right? A small enough version that can make an appointment for you and, whatever, talk to your friend’s chat agent to settle whether or not you’re going to have a drink after work, right?
Those kinds of things will all be in place, right? Because it’s much easier. We’ve been in a world where getting my phone to talk to your phone is this tedious process of writing API calls and IP addresses and interconnections and making things intertwined. Human language evolved because it is extremely simple and yet it conveys so much information. Machines will be talking to machines using human languages, right? Not because they’re great languages, but because they’re just really good at capturing meaning in a very small package, right? There’ll be a lot more of that kind of thing. We’ll have individualized agents, and they’ll be pushed further out to the edge. The last thing I guess in this bucket is that these models are going to get smarter and smarter. Now, again, the models learn from documentation that exists out there in the real world, right? They learn how things work because of documents that exist in the real world.
They don’t learn things that are not out there documented in the real world. Pricing is a good example. Many companies have experts who are really good at kind of saying:
“What should we do with pricing for next season?”
“What should we do for the next promotion based on this new product launch?”
These are human experts. Their knowledge, the way that they think, has never been written down on a piece of paper in a way that’s codified in rules that you could feed into a large language model. Instead, the language model is just going to have to absorb all these decisions. It’s going to have to absorb thousands of examples of decisions made by humans and use those to say, oh, we should raise the price or we should lower the price, or we should change the product name, or we should innovate the packaging, whatever the thing is that the human would have done.
Over the next couple of years, you’re going to see models that are trained with a lot more depth of that kind of experience, right? GPT wasn’t trained with marketing mix models for every brand in America for the last 30 years. In a couple of years, you will be able to have a large language model that is trained on that kind of depth of industry-specific information, right? And same with weather forecasts. You can imagine every other category of large amounts of information that formerly was a human expert grokking it all and becoming the person that relayed it. Now that’s going to be these language models. As those training sets get built up and up, they’ll be able to start making a lot of those decisions or recommendations like human users previously would do.
What that’s going to do is really, again, open up the possibility. It may not be better at analyzing the trajectory of the price of your competitors, but you can apply it to all your competitors every day, which you could never do before, right?
You used to be able to do it for one competitor once a week. Now it’ll do it for every competitor every day, right? That’s going to again change the game in business and make it better for consumers because pricing will be lower, and better for the manufacturers because they’ll be spending less to get better results. Ultimately there’ll be less waste in the system because it does a better job of understanding the trends of consumers and weather and shipments and all these different things.
I see enormous amounts of efficiency and profitability and improvement getting built in because language models ultimately just remove the friction of understanding between systems.
In conclusion, AnswerRocket has embraced AI and natural language processing since its inception, aiming to bridge the gap between humans and computers in terms of communication. By enabling users to interact with data through natural language queries, AnswerRocket has democratized access to information, catering to a broader audience that may not be proficient in reading spreadsheets or interpreting graphs. Leveraging large language models (LLMs), such as generative AI models, AnswerRocket empowers users to engage in seamless conversations with machines and obtain valuable insights. LLMs have revolutionized the AI landscape by simulating human-like understanding and generating original content. Moreover, these models have proven their ability to comprehend and translate foreign languages, capitalizing on the concept of emergence to recognize similarities across languages. As the size of LLMs continues to grow, their capabilities expand, prompting exciting possibilities for the future. With the introduction of ChatGPT, AnswerRocket has further refined the conversational aspect of LLMs, emphasizing active listening and enabling meaningful interactions between humans and the system. AnswerRocket’s integration of LLMs has paved the way for advanced data analysis and intuitive user experiences, revolutionizing the field of data analytics and unlocking new possibilities for businesses across various industries.