AI agents get office tasks wrong around 70% of the time, and a lot of them aren't AI at all

2025-10 review: This industry changes so quickly that it is hard to keep up.

According to a Gartner prediction, more than 40% of agentic AI projects will be canceled by the end of 2027 because of rising costs, unclear business value, and inadequate risk controls. That would leave roughly 60% of projects running, even though measured task success rates for agents are only about 30% to 35%. Agentic AI refers to machine learning models that automate office processes by connecting to various applications and APIs, for example searching a mailbox for emails that make exaggerated claims and analyzing them. The article points out that although science fiction has long fired the imagination with this idea, as in Captain Picard's "Tea, Earl Grey, hot" command in Star Trek: The Next Generation, the technology has not yet reached that level of effectiveness in reality and carries cybersecurity and privacy risks.

To test how agentic AI actually performs, researchers at Carnegie Mellon University (CMU) built TheAgentCompany, a simulated environment for evaluating AI agents on tasks such as web browsing, coding, operating applications, and internal communication. The results show that the best model, Gemini-2.5-Pro, fully completed only about 30.3% of tasks, and other models did worse; failures included mishandling pop-up interfaces and errors in sending messages. Meanwhile, the Salesforce team's CRMArena-Pro benchmark for customer relationship management (CRM) found that even leading large language model (LLM) agents succeed about 58% of the time in single-turn conversations and only about 35% in multi-turn interactions. All of the models tested generally lacked the ability to protect confidential data, which further increases corporate cybersecurity risk.

The report also notes that many products marketed as agentic are just repackaged traditional AI assistants, robotic process automation (RPA), or chatbots, and that only a very small share of vendors truly offer agentic capabilities. Gartner still expects that by 2028 about 15% of daily work decisions will be made automatically by AI agents and 33% of enterprise software products will incorporate the technology. For now, though, the results fall short of complex business needs, and in some application areas, such as handling email, errors are more likely to have serious consequences.

In the discussion, developers and industry professionals disagree about the prospects for agentic AI. Some commenters think current progress may be hitting a plateau or settling into a "human-in-the-loop" era, and argue that agentic AI is essentially no different from existing automation tools. Others suggest building agentic technology as libraries rather than frameworks, which can keep execution predictable while preserving flexibility and composability. Still others note that although agentic AI can slightly improve efficiency in some applications, significant breakthroughs in speed, context window length, and cost are still needed before it meets the practical needs of office automation.
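
The "libraries rather than frameworks" idea from the discussion can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the article or the thread: `classify_email` is a keyword-based stand-in for a model call, and the function names and sample inbox are invented for illustration. The point is only that the caller composes small functions in ordinary code, so the order of execution is fixed and predictable.

```python
from typing import Callable


def classify_email(text: str) -> str:
    """Hypothetical stand-in for a model call: label an email 'hype' or 'ok'."""
    hype_phrases = ("revolutionary", "guaranteed", "100% accurate")
    lowered = text.lower()
    return "hype" if any(p in lowered for p in hype_phrases) else "ok"


def filter_emails(emails: list[str], classify: Callable[[str], str]) -> list[str]:
    """Return the emails the given classifier labels 'hype'."""
    return [e for e in emails if classify(e) == "hype"]


def summarize(flagged: list[str]) -> str:
    """Produce a short, deterministic summary of the flagged emails."""
    return f"{len(flagged)} flagged email(s)"


# The caller wires the steps together explicitly. Each function can be
# swapped or tested on its own, and no agent loop decides what runs next.
inbox = [
    "Our revolutionary AI agent is 100% accurate!",
    "Meeting moved to 3pm.",
]
flagged = filter_emails(inbox, classify_email)
report = summarize(flagged)
print(report)
```

In a framework-style design, by contrast, the framework would own the control loop and call back into user code. Passing `classify` as a plain argument is one way to keep that control with the caller.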

https://news.ycombinator.com/item?id=44412349