The rise of generative AI has placed data at the center of discussions worldwide, amid concerns about privacy, cybersecurity and the accuracy of AI-generated content. A recent survey of risk and compliance professionals by Riskonnect highlighted several important issues, including employee misuse, ethical risks and intellectual property concerns. Similar findings have been echoed by major institutions such as KPMG and MIT. Together, these findings reinforce the need for businesses to manage the data behind their AI systems carefully.

Organizations need to ensure that they use the right data for training and decision-making. This need was evident at Data Day Texas, where experts emphasized understanding the context of data. Simply having large amounts of data is no longer enough.

It was once widely believed that more data means better results, but recent developments are challenging that notion. Companies such as Databricks, IBM and Snowflake are focusing on right-sized AI models, which prioritize efficiency over sheer data volume.

One key challenge in AI is that models often function as “black boxes,” which makes it difficult to trace how they arrive at decisions. AI systems pull data from many sources, some of them unverified. This increases the risk of hallucinations, in which a model produces incorrect or misleading information without a clear explanation.

Ole Olesen-Bagneux introduced the concept of the meta grid at Data Day Texas. It is a framework for mapping metadata rather than raw data. Metadata provides valuable context, offering insight into data sources, business logic and processing methods. Andrew Nguyen of Best Buy Health illustrated this with a medical example, noting that the meaning of a recorded patient condition can vary depending on whether a doctor, a student or a clinician entered the information.
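To make the medical example concrete, the sketch below attaches metadata (source system, the role of the person who entered the value, and how it was produced) to each raw value, then checks whether the same value was recorded by more than one role. It is a minimal illustration under stated assumptions, not the meta grid itself: the `Metadata` and `Record` classes, their field names, and the helpers `roles_by_value` and `is_context_dependent` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    # Hypothetical fields chosen for illustration; not defined by the meta grid.
    source: str       # system the value came from
    entered_by: str   # role of the person who entered it, e.g. "doctor"
    processing: str   # how the value was produced


@dataclass(frozen=True)
class Record:
    value: str        # the raw data value, e.g. a condition label
    meta: Metadata    # context that travels alongside the value


def roles_by_value(records: list[Record]) -> dict[str, set[str]]:
    """Map each raw value to the set of roles that entered it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record in records:
        index[record.value].add(record.meta.entered_by)
    return dict(index)


def is_context_dependent(records: list[Record], value: str) -> bool:
    """Treat a value as context-dependent if more than one role recorded it."""
    return len(roles_by_value(records).get(value, set())) > 1


records = [
    Record("hypertension", Metadata("ehr", "doctor", "diagnosis")),
    Record("hypertension", Metadata("teaching_db", "student", "practice case")),
    Record("asthma", Metadata("ehr", "clinician", "intake note")),
]
print(is_context_dependent(records, "hypertension"))  # True
print(is_context_dependent(records, "asthma"))        # False
```

Looking only at the `value` column, both "hypertension" rows appear identical; the metadata is what reveals that one came from a diagnosis and the other from a student exercise.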

Data challenges extend beyond structured databases to unstructured data such as text, voice and images. AI models often struggle to distinguish reliable sources from unreliable ones. Today’s tools offer some help, but they cannot yet fully capture context. AI models alone cannot determine whether a source is appropriate for answering a given query.