Every week there’s a new headline about AI “matching” human performance on some benchmark. These headlines are almost always technically true but misleadingly phrased: they describe a narrow capability, while the implication is broad competency. That gap is probably where most people get their wrong ideas about what these tools can actually accomplish.
Here are five things that AI in 2026 still can’t do consistently, and what each one means for how you use these tools in daily life.
1. Let you know when it is producing a fabrication
Today’s language models (ChatGPT, Claude, Gemini, etc.) will give you a confident-sounding answer whether or not they actually know the information behind it. For a simple example, ask for the population of a small country, the date of a minor historical event, or the specifications of an obscure item. The response will sound equally confident whether it is completely accurate or completely fabricated.
The term for this phenomenon is hallucination, and it is not something the next model update will fix. Language models lack a mechanism for representing “I don’t know.” They are trained to generate responses, not to decline to generate them. Newer models hallucinate less than earlier ones, but no model’s hallucination rate is near zero, and the most dangerous hallucinations occur on questions that sound almost identical to things the model does know.
What to do: whenever the facts matter (medical, legal, financial, historical), verify any specific claim an AI makes against a primary source before acting on it. An AI is fantastic for explaining concepts and identifying things to research. It is not credible as a final authority on facts.
2. Reason about what someone else knows or does not know
AI can be extremely good at analyzing what is in front of it. It is relatively poor at reasoning about what is in front of someone else. If you ask it to develop a negotiation plan, analyze what your competitor knows about your company, or predict how someone will respond to news, the response sounds intelligent and fluent but is generally superficial.
AI lacks a structural mechanism for maintaining dynamic representations of other individuals’ mental states. It can discuss general principles for reasoning about others, but it cannot actually apply them. Systems designed specifically for this type of reasoning exist (e.g., adaptive decision tree modeling in engineering platforms), but they are built differently from typical chatbots.
What to do: use an AI for creating draft material, reformulating concepts, and surfacing options. Don’t rely on an AI as your strategic advisor in any scenario involving another person’s decisions. That role remains your responsibility.
3. Create a consistent multi-step plan
If you ask an AI to create a plan with five or six steps that each depend on what came before, you’ll see the relationships between the steps begin to disintegrate by the fourth step. Each individual step usually appears reasonable. However, the dependencies between the steps get lost.
This issue is connected to both of the previous ones. The architecture optimizes for the surface-level plausibility of each step in isolation, not for coherence across multiple steps. You may get a viable first draft of a multi-step plan, but you still need a human to do the integration and dependency checking that determine whether the plan is valid.
What to do: use an AI to help you brainstorm potential steps for a plan. Have the AI assess the validity of a plan you’ve developed (it is quite skilled at identifying inconsistencies in plans it did not create). Do not treat an AI-generated multi-step plan as a finished product until you have inspected it yourself.
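If you are comfortable with a little Python, part of that manual inspection can be mechanized. The sketch below is one possible approach, not a standard tool: it assumes you have written the plan down as a dictionary mapping each step to the steps it depends on. The `check_plan` helper and the event-planning steps are made up for illustration. It flags steps that depend on something the plan never defines, and it catches circular dependencies.

```python
from graphlib import CycleError, TopologicalSorter


def check_plan(plan):
    """Check a plan given as {step: set of steps it depends on}.

    Returns (order, problems): a valid execution order (empty if the
    plan contains a cycle) and a list of human-readable problems.
    """
    problems = []
    # A step that depends on something never defined is a broken link.
    for step, deps in plan.items():
        for dep in sorted(deps):
            if dep not in plan:
                problems.append(f"{step!r} depends on undefined step {dep!r}")
    try:
        # static_order() yields every step after all of its dependencies.
        order = list(TopologicalSorter(plan).static_order())
    except CycleError as err:
        # err.args[1] holds the list of steps that form the cycle.
        problems.append(f"circular dependency: {err.args[1]}")
        order = []
    return order, problems


# Hypothetical AI-drafted plan for organizing an event.
draft_plan = {
    "book venue": set(),
    "send invitations": {"book venue"},
    "confirm headcount": {"send invitations"},
    "order catering": {"confirm headcount"},
    "print name badges": {"finalize guest list"},  # never defined above
}

order, problems = check_plan(draft_plan)
for problem in problems:
    print("Problem:", problem)
```

A check like this only confirms that the stated dependencies are internally consistent. Whether the dependencies themselves are the right ones is still a judgment call for you.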
4. Determine when it has been cajoled into an inaccurate answer
Try this: ask any modern AI a factual question and get its answer. Then gently challenge its confidence: “Are you certain? I believe it’s really X.” See how frequently the AI abandons its answer and adopts your erroneous one.
This behavior is called sycophancy. The training process for these models introduces a bias toward avoiding conflict with users, and this bias can override the AI’s underlying knowledge. As a result, an AI assistant is relatively easy to talk into agreeing with your incorrect answers, especially if you appear confident.
What to do: do not argue with an AI when you are trying to get an accurate answer. State your question clearly and take the AI’s initial response as its answer. If you sincerely believe that answer is incorrect, do not try to persuade the AI to adopt an alternative view, because after the back-and-forth both answers will likely be equally unreliable.
5. Count, spell or complete basic arithmetic operations in word problems
This issue seems humorous until you encounter it in a matter that affects your interests. Ask a cutting-edge AI how many times the letter “R” appears in “strawberry,” and it will give the wrong answer at least some of the time. Ask how many words are in a sentence it just wrote, and you will often receive a wrong total. Give it a multi-step math problem expressed in words, and the AI may identify all the components of the problem correctly yet still get the arithmetic wrong.
The reason is that language models operate on tokens, not individual letters or digits. The direct relationship between a token and the characters it contains is inaccessible to the model’s reasoning processes. More recent models have addressed well-known failure cases (for example, “strawberry”), but small variations on similar queries continue to elicit errors.
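To make the token idea concrete, here is a toy illustration in Python. It is only a sketch: the three-piece vocabulary and the ID numbers are invented, and real tokenizers learn tens of thousands of pieces and may split “strawberry” differently. The point is simply that the model receives a short list of numbers, while the letter count lives in the characters it never sees directly.

```python
# Invented vocabulary for illustration only; not any real model's tokenizer.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into vocabulary pieces, always taking the longest match."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids


token_ids = toy_tokenize("strawberry")
print(token_ids)                # [101, 102, 103] is what the model "sees"
print("strawberry".count("r"))  # 3 is what ordinary code sees
```

Nothing in the three numbers says how many times “r” appears, which is why a question that is trivial for ordinary code can trip up a model.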
What to do: never rely solely on an AI for word counts, character counts, exact arithmetic, or precise text manipulation without verifying its results. For anything that demands exactness, use a tool built for that job (a spreadsheet for calculations, a word counter for counting words, a calculator for arithmetic).
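If you happen to have Python available, a few lines of it are another exact tool for these jobs. The snippet below is a minimal sketch: the helper names and the word problem (three boxes of 12 pens at 1.25 each, with a 10% discount) are made up for illustration.

```python
from fractions import Fraction


def count_letter(text, letter):
    """Count occurrences of a letter, ignoring upper/lower case."""
    return text.lower().count(letter.lower())


def count_words(sentence):
    """Count whitespace-separated words."""
    return len(sentence.split())


def pens_total():
    """Made-up word problem: 3 boxes x 12 pens x 1.25 each, minus 10%."""
    # Fraction keeps the arithmetic exact instead of using binary floats.
    return 3 * 12 * Fraction("1.25") * (1 - Fraction(10, 100))


print(count_letter("strawberry", "R"))                             # 3
print(count_words("The quick brown fox jumps over the lazy dog"))  # 9
print(float(pens_total()))                                         # 40.5
```

Note that `count_words` treats anything separated by spaces as a word, so it may differ slightly from a word processor’s count for text with hyphens or stray punctuation.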
–
Current AI is very effective at producing fluent, reasonably competent output on a virtually unlimited range of topics. It is poor at work requiring precision, at reasoning about other people’s thoughts and feelings, at maintaining consistent multi-step plans, and at honestly admitting its limitations. Once you understand this, AI becomes far more valuable, because you know which tasks to assign it and which remain your domain.
People with extensive experience testing AI systems adversarially have been quantifying this gap for years. The gap is narrowing, but it is unlikely to close entirely under current architecture designs.
Use AI for what it excels at. Validate what it generates. Trust AI as little as possible in the five areas above. That is essentially the entire operating manual most users need.