AI-driven supervision tools are now central to modern RegTech strategies, particularly in communications surveillance and misconduct detection. Yet as firms invest in AI to reduce alert volumes and ease compliance workloads, bold claims of “99% accuracy” or near-total false positive reduction continue to circulate.
In reality, those numbers rarely tell the full story. In the second part of a two-part series, Theta Lake examines the metrics that genuinely matter, the modelling techniques used to reduce false positives, and the practical steps firms can take to validate vendor claims before deployment.
One of the most common misconceptions in AI-enabled supervision relates to “accuracy”. In environments such as corporate communications monitoring, the base rate of actual misconduct is extremely low, often below 0.01%.
In these highly imbalanced datasets, a model could label every message as “no fraud” and still achieve 99.99% accuracy—while failing to detect any genuine misconduct. In such scenarios, accuracy becomes almost meaningless. What truly matters is how effectively a system identifies rare but critical positive cases amid vast volumes of benign communications.
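This accuracy paradox can be illustrated with a few lines of arithmetic. The sketch below uses a hypothetical dataset sized to the base rate the article cites (0.01% misconduct); the counts are illustrative assumptions, not real surveillance figures.

```python
# Sketch: why raw accuracy misleads on imbalanced data.
# Hypothetical dataset: 1,000,000 messages, 100 genuine misconduct
# cases (a 0.01% base rate, matching the figure in the article).

total = 1_000_000
positives = 100                      # actual misconduct messages

# A degenerate "model" that labels every message "no fraud":
true_negatives = total - positives   # all benign messages, correctly ignored
false_negatives = positives          # every real case missed

accuracy = true_negatives / total
recall = 0 / positives               # no misconduct detected at all

print(f"accuracy: {accuracy:.4%}")   # 99.9900% -- looks excellent
print(f"recall:   {recall:.0%}")     # 0% -- catches nothing
```

A system can therefore report near-perfect accuracy while delivering zero detection value, which is why accuracy alone should never be the headline metric.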
To assess performance properly, firms must focus on precision, recall and the F1-score. Precision answers a simple but vital question: of all the alerts flagged as misconduct, how many were genuinely problematic? High precision means fewer false positives and greater trust in the system. Read the full article.
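The three metrics named above follow directly from the counts in a confusion matrix. The sketch below computes them from hypothetical alert counts (the numbers are assumptions chosen for illustration, not vendor benchmarks).

```python
# Sketch: precision, recall and F1-score from hypothetical alert counts.
# The counts below are illustrative assumptions, not real figures.

tp = 80   # true positives: alerts that were genuine misconduct
fp = 320  # false positives: alerts raised on benign messages
fn = 20   # false negatives: genuine cases the system missed

precision = tp / (tp + fp)  # of all alerts, how many were real?
recall = tp / (tp + fn)     # of all real cases, how many were caught?
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision: {precision:.2f}")  # 0.20
print(f"recall:    {recall:.2f}")     # 0.80
print(f"F1-score:  {f1:.2f}")         # 0.32
```

Because F1 is the harmonic mean of precision and recall, it stays low whenever either metric is poor, making it a far more honest single-number summary than accuracy for rare-event detection.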