Understanding Generative AI Methods

Welcome to the Course 🚀

Welcome to Introduction to Generative AI. If your work involves writing, research, presentations, or visuals, generative AI is already becoming part of your workflow. In this first unit, you’ll build a plain-language mental model for what generative AI does, how it works, and where it can go wrong.

You’ll cover:

- The five major AI capabilities: text generation, image generation, image description, web search, and automation
- The three main model families: large language models, diffusion-style image models, and multimodal models
- Key limitations like context windows and hallucinations, plus why human verification matters

Five Capabilities, Not One Tool ⚒️

When people say "AI," they may be talking about five different capabilities, and mixing them up can lead you to use the wrong tool for the job.

Capability	What it does	Useful for
Text generation	Drafts written content	Emails, summaries, talking points
Image generation	Creates pictures from a description	Hero slides, illustrations, mockups
Image description / vision AI	Reads a picture and describes what’s in it	Alt text, triage photos, chart interpretation
Web search	Retrieves current information from the internet	Recent facts, live sources, up-to-date research
Automation	Chains AI capabilities together with your other tools	Multi-step workflows and repeated tasks

The practical move: before you open a tool, name which capability you actually need. "I need a draft" is text generation. "I need a visual" is image generation. "What's in this photo?" is image description. Asking the wrong tool the right question is a top reason people decide AI "doesn't work."

The Three Model Families Behind AI Tools 🧑‍🧑‍🧒

Three model families sit underneath those capabilities, and a rough mental picture of each will save you a lot of guesswork.

Large language models (LLMs) power text generation. Picture an extremely well-read autocomplete: given everything you've typed so far, the model predicts a likely next word (strictly, a token, which may be only part of a word), then the next, then the next, all the way to the end of the response. It learned these patterns from massive amounts of human-written text. It doesn't "understand" in the way you do; it pattern-matches at a scale that feels like understanding.

Diffusion-style image models power image generation. Think of them as starting with a screen of static (random noise) and gradually "uncrumpling" it into a coherent picture that matches your description, one denoising pass at a time. They learned what a "warm, candid office photo" looks like by training on millions of captioned images.

Multimodal models can take in more than one type of input, like text plus an image, and reason across them. That's what lets a single tool answer "what's wrong with this shipment photo?" or "turn this whiteboard sketch into a summary." Image description and most modern chat tools live here.
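To make the "well-read autocomplete" picture of an LLM concrete, here is a minimal Python sketch of next-word prediction. It is a toy, not how production models are built: it only counts which word follows which in a tiny made-up corpus, while real LLMs use neural networks trained on vastly more text. The corpus and the function names (`build_bigram_counts`, `predict_next`, `complete`) are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration only).
CORPUS = (
    "the report is due on friday . "
    "the report is due on monday . "
    "the report is due on friday . "
    "the meeting is on friday ."
)


def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def complete(counts, start, max_words=6):
    """Extend `start` one predicted word at a time, stopping at a period."""
    output = [start]
    for _ in range(max_words):
        next_word = predict_next(counts, output[-1])
        if next_word is None or next_word == ".":
            break
        output.append(next_word)
    return " ".join(output)


counts = build_bigram_counts(CORPUS)
print(complete(counts, "the"))      # the report is due on friday
print(complete(counts, "meeting"))  # meeting is due on friday
```

Starting from "meeting", the sketch produces "meeting is due on friday", a fluent sentence that never appears in the corpus. Even in this toy, prediction from patterns yields text that sounds right without being checked against any source, which is why human verification matters.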

Prediction, Context Windows, and Hallucinations ⚠️
