Ahead of the AI & Big Data Expo Europe, AI News sat down with Ivo Everts, Senior Solutions Architect at Databricks, to explore the company’s groundbreaking developments in open-source AI and data governance.
Databricks has made waves with its DBRX model, a large language model (LLM) that has set a new benchmark in the industry. According to Everts, “DBRX surpassed all leading open-source models on key benchmarks and delivers inference speeds up to twice as fast as models like Llama2-70B.” This success is largely due to a variety of technological innovations that have optimized training efficiency.
Everts believes the DBRX model is one of the best in the open-source space, performing exceptionally well on standard industry benchmarks for language understanding (MMLU), programming (HumanEval), and mathematics (GSM8K). The ultimate goal of DBRX is to democratize AI, allowing businesses to train custom LLMs on their own data in a cost-effective manner.
In line with its commitment to open ecosystems, Databricks has open-sourced its Unity Catalog, a critical tool for data governance. Everts explains, “By open-sourcing Unity Catalog, we’re making it more accessible across cloud platforms like AWS and Azure, as well as on-premises systems.” This flexibility ensures organizations can implement uniform data governance policies regardless of where their data is stored or processed.
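The idea of a policy that does not depend on where the data lives can be illustrated with a toy model. The sketch below is not Unity Catalog's API: the `Table` and `Catalog` classes, the table name, the principal, the privilege strings, and the storage paths are all hypothetical, chosen only to show access rules attached to a logical name rather than to a storage location.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Table:
    """A hypothetical table reference: one logical name, one physical location."""
    name: str       # logical name, e.g. "sales.orders" (made up for this sketch)
    location: str   # where the data physically lives


@dataclass
class Catalog:
    """A toy catalog that stores grants per logical table name."""
    grants: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def grant(self, principal: str, privilege: str, table: Table) -> None:
        # Record the grant against the logical name only.
        self.grants.setdefault(table.name, set()).add((principal, privilege))

    def is_allowed(self, principal: str, privilege: str, table: Table) -> bool:
        # The check never inspects table.location, so the same rule
        # applies wherever the data is stored.
        return (principal, privilege) in self.grants.get(table.name, set())


catalog = Catalog()
orders_cloud = Table("sales.orders", "s3://example-bucket/orders")
orders_onprem = Table("sales.orders", "hdfs://example-cluster/orders")

# Grant once, against the cloud copy.
catalog.grant("analysts", "SELECT", orders_cloud)

# The same grant is honored for the on-premises copy of the same logical table.
cloud_ok = catalog.is_allowed("analysts", "SELECT", orders_cloud)
onprem_ok = catalog.is_allowed("analysts", "SELECT", orders_onprem)
modify_ok = catalog.is_allowed("analysts", "MODIFY", orders_onprem)
```

In this sketch, a single grant covers both copies because the lookup is keyed on the logical name. A real governance layer would also have to handle hierarchy, inheritance, and auditing, none of which are modeled here.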
Key features of Unity Catalog include:
Databricks has also introduced a new product called Databricks AI/BI, designed to combine generative AI with traditional business intelligence. Everts believes that for BI tools to be truly intelligent, they must understand the unique semantics of business data.
The AI/BI tool consists of:
Databricks also introduced Mosaic AI, a platform designed to streamline the development and deployment of machine learning and generative AI applications. Mosaic AI offers several powerful components:
A key innovation within Mosaic AI is cost-effective training of custom LLMs. “We focus on fast startup times and live prompt evaluation, enabling users to monitor how responses change throughout training,” explains Everts.
Central to these innovations is the Data Intelligence Platform by Databricks, which merges the power of data lakes and warehouses. Using Delta Lake technology, it enables real-time data processing and sharing across organizational boundaries. Delta Sharing ensures secure data exchange, while MLflow, PyTorch, and TensorFlow support machine learning and AI model development.
The platform’s cloud-native architecture, combined with the Photon engine, ensures scalability and performance. “The Data Intelligence Platform is designed to handle everything from ETL pipelines to data governance, offering a truly unified AI and data solution,” adds Everts.
Databricks is a key sponsor of AI & Big Data Expo Europe, where it will showcase its open-source AI and data governance solutions. Everts hints at exciting demonstrations at the company's booth, including a custom GenAI app that lets users generate their own cartoon image using open-source models from Hugging Face and data from Unity Catalog.
For those attending the expo, Databricks will be located at booth #280, where its experts will provide in-depth insights into the future of open-source AI and how organizations can improve their data governance strategies.
With these advances, Databricks is paving the way for a future where businesses can seamlessly leverage open-source AI models and maintain robust data governance across multi-cloud environments. Its innovations are changing how organizations manage data, build custom AI models, and explore business intelligence, while making these capabilities more accessible and cost-effective.