How ChatGPT works
Learn the fascinating inner workings of AI assistants like ChatGPT
2024-11-29
This tutorial is part 2 of our free 'Learn how to use ChatGPT' course.
In this tutorial we'll cover:
- What's generative AI?
- What happens when you ask ChatGPT a question?
- Is it always accurate?
- Can ChatGPT do anything besides text generation?
- Why are there various models within ChatGPT?
- How does ChatGPT compare to other AI assistants?
In the previous tutorial, we learned that LLMs like ChatGPT are "taught" through machine learning, the technological process involved in the input. But what about the output? How does ChatGPT formulate its responses?
The answer lies in a technology called generative AI.
What's generative AI?
While machine learning and deep learning focus on understanding and interpreting data, generative AI takes it a step further by creating new data that mimics the original in novel ways. It's akin to a chef who can think up new dishes based on what they learned from their training and from studying recipes.
Generative AI is used for diverse creative and practical applications, such as generating realistic images, videos, and text for chatbots and virtual assistants, creating new video game environments, and more.
So what happens behind the scenes during a ChatGPT interaction?
When you type a query into ChatGPT and it responds, several processes work in tandem:
- You type your question, e.g., "What was Isaac Newton's role in the development of classical mechanics?"
- ChatGPT uses a process known as natural language processing (NLP) to comprehend the meaning, context, and intent behind the question.
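- ChatGPT then draws on the knowledge it absorbed during training to work out what a relevant response should contain. It doesn't look facts up in a database of its training data; what it learned is encoded in the model itself.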
- Once ChatGPT has compiled the information it needs, it uses a process called "language generation" or "text generation" (this is where generative AI comes into play) to generate a coherent and meaningful response.
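To make the generation step more concrete, here is a deliberately tiny sketch of producing text one word at a time. It is not how ChatGPT is implemented: the corpus, the function names, and the simple "which word followed which" table are all assumptions made for illustration, whereas a real LLM uses a neural network trained on vastly more text. The core idea it shows is the same, though: pick a plausible next word, append it, and repeat.

```python
import random
from collections import defaultdict

# Tiny stand-in for training data (hypothetical; real models see far more text).
CORPUS = (
    "newton formulated the laws of motion . "
    "newton formulated the law of universal gravitation . "
    "the laws of motion underpin classical mechanics ."
)

def build_bigram_table(text):
    """Record, for each word, the words that followed it in the corpus."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, max_words=12, seed=0):
    """Generate text one word at a time by sampling a plausible next word."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < max_words:
        # Stop at the end of a sentence or when no continuation is known.
        if output[-1] == ".":
            break
        candidates = table.get(output[-1])
        if not candidates:
            break
        output.append(rng.choice(candidates))
    return " ".join(output)

table = build_bigram_table(CORPUS)
print(generate(table, "newton"))
```

Because each word is chosen from what tended to follow the previous one, the output reads fluently even though nothing checks whether it is true. That is worth keeping in mind for the next section.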
Is it always accurate?
LLMs have been through extensive training, meaning they’re impressively correct most of the time. But they’re not 100% reliable.
At times, AI can produce an answer that’s misleading or factually incorrect. This is known as a hallucination, and can be caused by:
- Being asked an ambiguous question, which causes the AI to make incorrect assumptions.
- Limitations in the AI's knowledge, particularly if the required information is missing from its training data.
- Missing context in the prompt, which leaves the AI to fill in the gaps itself.
- The AI's inherent design constraints, like the inability to access current information or comprehend context beyond its training cut-off date.
Can ChatGPT do anything besides text generation?
ChatGPT isn’t just a text generator. It comes packed with some powerful, additional tools to help you tackle tasks far beyond text content creation. Let’s dive into each of them:
DALL-E image generation
DALL-E is OpenAI’s image generation model, which they’ve natively integrated into ChatGPT. It’s based on a transformer model like GPT, but instead of generating text, it produces images from text descriptions. It’s trained on massive datasets of images and captions, learning to generate new visuals based on your prompt. When you use DALL-E, the model converts the text input into a latent representation of an image, which is then rendered in various styles or formats.
With DALL-E, you can generate custom images based on text prompts. Whether it's illustrations for blog posts, marketing collateral, or just creative visuals, DALL-E delivers high-quality, AI-generated images in seconds. Just type your idea, and the model handles the rest.
Web browsing
Web browsing uses techniques related to Retrieval-Augmented Generation (RAG). When ChatGPT browses the web, it retrieves relevant content from real-time sources and integrates it into its responses. The RAG framework allows it to first gather information and then generate a more accurate, up-to-date answer based on that data. This retrieval step is key for providing current insights or information not present in the training data.
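As a rough sketch of the retrieve-then-generate idea, the example below ranks a few documents by word overlap with the question and then builds a prompt containing only the best matches. Everything here is an assumption for illustration: the sample documents, the function names, and the overlap scoring are hypothetical, and ChatGPT's actual browsing pipeline is not public in this level of detail. In a real system, the retrieved text would come from live web pages and the finished prompt would be passed to the language model.

```python
import re

# Hypothetical stand-ins for pages a search step might return.
DOCUMENTS = [
    "The city council approved the new transit budget on Tuesday.",
    "A recipe for sourdough bread requires flour, water, and salt.",
    "Transit ridership rose ten percent after the budget was approved.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the best matches that share at least one word.
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, documents):
    """Combine the retrieved sources and the question into one prompt."""
    sources = retrieve(query, documents)
    context = "\n".join(f"- {doc}" for doc in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

print(build_prompt("What happened with the transit budget?", DOCUMENTS))
```

The unrelated sourdough document never reaches the prompt, which is the point of the retrieval step: the model answers from a small set of relevant, current sources rather than from its training data alone.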
This is great for market research, competitor analysis, or keeping up with breaking news, all without leaving your chat window.
Code interpreter and data analysis
The code interpreter runs Python code within a sandboxed environment. It's a type of "interactive computation" where your inputs trigger a real-time execution of scripts. This feature bridges natural language with programming, allowing ChatGPT to manipulate data, analyze datasets, and generate visualisations. It works much like a lightweight IDE within the chat interface, letting you use Python for tasks like statistical analysis, data plotting, and more.
Whether you’re crunching numbers or building scripts, ChatGPT can help streamline your workflow with its code interpreter skills.
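For a sense of what this looks like in practice, here is the kind of short script the code interpreter might write and run if you asked it to summarize some sales figures. The numbers and names are hypothetical, and the script ChatGPT actually produces will vary with your request; this is only a sketch using Python's standard library.

```python
import statistics

# Hypothetical monthly sales figures a user might paste or upload.
monthly_sales = {"Jan": 1200, "Feb": 1350, "Mar": 980, "Apr": 1600}

def summarize(sales):
    """Compute a few basic statistics over the monthly figures."""
    values = list(sales.values())
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "best_month": max(sales, key=sales.get),
    }

summary = summarize(monthly_sales)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because the code is executed rather than predicted, the figures in the answer come from real computation, which makes this feature more dependable for arithmetic than plain text generation.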
Canvas
You can also interact with ChatGPT via a more interactive canvas using the “GPT-4o with canvas” model. Canvas is an editable workspace that appears alongside ChatGPT's traditional chat interface. It’s optimized for writing and coding tasks and allows you to create more polished, final content within an editable document-like page.
You can jump in and edit ChatGPT’s initial generation yourself in Canvas, highlight content for ChatGPT to re-write or explain, or even work with one-click experiences to have ChatGPT shorten, lengthen, or polish the overall document. If you’re familiar with Claude Artifacts, this is OpenAI’s response to that. We find it to be a more collaborative experience than Artifacts, and possibly a glimpse at the future of AI chat assistant user experience.
Why are there various models within ChatGPT?
You might’ve noticed different models within ChatGPT, like GPT-4o, GPT-o1, or GPT-o1-mini (OpenAI’s official names for the latter two are o1 and o1-mini). But why so many models? It comes down to performance, specialization, and efficiency. Here’s a quick breakdown:
GPT-4o
This is the workhorse of the GPT family, optimized for speed and efficiency without sacrificing much in terms of response quality. GPT-4o is a leaner, faster version of the original GPT-4, designed to handle most everyday tasks at a lower cost (which is primarily relevant if you’re using the API), while still delivering high-quality responses. It’s your go-to model when you need reliable performance for general-purpose use.
GPT-o1
When you need even more precision or have highly complex tasks, GPT-o1 is a step up. It has chain-of-thought reasoning built in (we’ll explain more about this in a later lesson), so it’s designed to handle deeper context, more nuanced reasoning, and longer conversations without losing track. Essentially, it’s tuned for scenarios where quality and comprehension matter most, but it’s a bit slower than GPT-4o.
GPT-o1-mini
This model balances between power and efficiency. Think of GPT-o1-mini as a mid-tier solution for when you don’t need the full power of GPT-o1 but still want better reasoning and memory compared to GPT-4o. It’s more efficient for tasks that require intermediate complexity, like data analysis, or slightly more technical content generation.
Different models exist to meet different needs, whether it’s speed, cost, or complexity. Sometimes you need blazing-fast answers; other times, precision and depth are more important. By offering this range, ChatGPT can be versatile depending on your specific use case. That said, for most use cases, GPT-4o is all you need.
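If it helps to see that guidance as a rule of thumb, the sketch below maps a rough task category to one of the models described above. The categories and the function name are hypothetical labels invented for this example, not an official OpenAI recommendation, and the model names are written as they appear in this tutorial.

```python
# Illustrative mapping of task complexity to the models described above.
MODEL_BY_TASK = {
    "everyday": "GPT-4o",           # fast, general-purpose
    "intermediate": "GPT-o1-mini",  # more reasoning, still efficient
    "complex": "GPT-o1",            # deepest reasoning, slowest
}

def suggest_model(task_complexity="everyday"):
    """Return a suggested model, falling back to GPT-4o for anything unrecognized."""
    return MODEL_BY_TASK.get(task_complexity, "GPT-4o")

print(suggest_model("complex"))
print(suggest_model())
```

The fallback reflects the advice above: when in doubt, start with GPT-4o.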
How does ChatGPT compare to other AI assistants?
While ChatGPT shares many capabilities with other AI assistants like Claude, there are often distinctions:
- Model quality: Because newer, more advanced models are released at different times, the best-performing model varies. When GPT-4 was introduced, it was the most capable model, but Anthropic's subsequent release of Claude 3.5 Sonnet seems to have taken the lead for some specific tasks. The leading model seems to change from month to month, but OpenAI’s GPT line is consistently at or near the top of the list.
- Features: ChatGPT boasts a rich feature set, including image analysis and generation, code generation, custom GPTs, memory, web browsing, and more. While Claude and other AI chatbots offer many similar functionalities, ChatGPT currently provides a more comprehensive suite of features. However, this feature gap is shrinking, as new features are continually added to all assistants.
- Platform: ChatGPT has a web version, mobile app, and desktop app — and boasts advanced, multi-modal functionality like voice mode. If you’re looking for a consistent, powerful experience across all devices, ChatGPT is a great option. However, other platforms, like Claude, are catching up in this area, launching their suite of device-native apps as well.
- Guidelines: ChatGPT is guided by its own set of principles and guidelines, which may differ from those governing other AI models, resulting in varying outputs and behaviours.
Now that you understand ChatGPT's inner workings, the next tutorial will walk through the plans and environments OpenAI provides so you can get started with the optimum ChatGPT setup.