Databricks: Transforming Open-Source AI and Data Governance with Innovation

By Tirupati Rao

Ahead of the AI & Big Data Expo Europe, AI News sat down with Ivo Everts, Senior Solutions Architect at Databricks, to explore the company’s groundbreaking developments in open-source AI and data governance.

The Power of the DBRX Model

Databricks has drawn attention with DBRX, its open-source large language model (LLM). According to Everts, “DBRX surpassed all leading open-source models on key benchmarks and delivers inference speeds up to twice as fast as models like Llama2-70B.” Much of that efficiency comes from the model's fine-grained mixture-of-experts (MoE) architecture, which activates only a fraction of its parameters for each token, along with other improvements to training efficiency.

Everts believes DBRX is one of the best models in the open-source space, performing exceptionally well on industry-standard benchmarks for language understanding (MMLU), programming (HumanEval), and mathematics (GSM8K). The ultimate goal of DBRX is to democratize AI by letting businesses train custom LLMs on their own data cost-effectively.

Unity Catalog: Redefining Data Governance

In line with its commitment to open ecosystems, Databricks has open-sourced its Unity Catalog, a critical tool for data governance. Everts explains, “By open-sourcing Unity Catalog, we’re making it more accessible across cloud platforms like AWS and Azure, as well as on-premises systems.” This flexibility ensures organizations can implement uniform data governance policies regardless of where their data is stored or processed.

Key features of Unity Catalog include:

  • Centralized Data Access Management: Organizations can streamline access controls across multiple data assets.
  • Role-Based Access Control (RBAC): This feature allows tailored permissions based on specific user roles.
  • Data Lineage and Auditing: Unity Catalog makes it easier to track data usage and dependencies, offering detailed audit trails to ensure compliance.
  • Cross-Cloud and Hybrid Support: Unity Catalog is designed to provide consistent governance across multi-cloud and hybrid environments.
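
To make the access-control and auditing ideas above concrete, here is a minimal sketch of role-based access control with an audit trail. It is a toy model written for illustration only, not Unity Catalog's actual API: the class name `Catalog`, its methods, the privilege constants, and the asset name `sales.orders` are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical privilege names, chosen for illustration only.
SELECT = "SELECT"
MODIFY = "MODIFY"


@dataclass
class Catalog:
    """Toy governance catalog: roles hold grants, users hold roles."""

    # role -> set of (asset, privilege) pairs granted to that role
    grants: dict = field(default_factory=dict)
    # user -> set of roles the user belongs to
    memberships: dict = field(default_factory=dict)
    # append-only record of every access check: (user, asset, privilege, allowed)
    audit_log: list = field(default_factory=list)

    def grant(self, role: str, asset: str, privilege: str) -> None:
        """Give a role a privilege on a data asset."""
        self.grants.setdefault(role, set()).add((asset, privilege))

    def add_member(self, user: str, role: str) -> None:
        """Assign a user to a role."""
        self.memberships.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, asset: str, privilege: str) -> bool:
        """Check access through the user's roles and log the decision."""
        allowed = any(
            (asset, privilege) in self.grants.get(role, set())
            for role in self.memberships.get(user, set())
        )
        self.audit_log.append((user, asset, privilege, allowed))
        return allowed


catalog = Catalog()
catalog.grant("analyst", "sales.orders", SELECT)
catalog.grant("engineer", "sales.orders", SELECT)
catalog.grant("engineer", "sales.orders", MODIFY)
catalog.add_member("alice", "analyst")

can_read = catalog.is_allowed("alice", "sales.orders", SELECT)   # True
can_write = catalog.is_allowed("alice", "sales.orders", MODIFY)  # False
```

Because permissions attach to roles rather than to individual users, changing what analysts may do is a single grant, and every check lands in one central log regardless of where the underlying data lives.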

Introducing Databricks AI/BI for Enhanced Data Visualization

Databricks has also introduced a new product called Databricks AI/BI, designed to combine generative AI with traditional business intelligence. Everts believes that for BI tools to be truly intelligent, they must understand the unique semantics of business data.

The AI/BI tool consists of:

  1. AI-Powered Dashboards: These offer a low-code interface for creating interactive dashboards with built-in features like visualizations, cross-filtering, and periodic reporting.
  2. Genie: A conversational AI tool that interprets natural language queries to generate adaptive visualizations and insights. “Genie learns from the data to improve over time and can answer follow-up questions,” says Everts.

Mosaic AI: The Future of AI and ML Development

Databricks also introduced Mosaic AI, a platform designed to streamline the development and deployment of machine learning and generative AI applications. Mosaic AI offers several powerful components:

  • Unified Tooling: A comprehensive suite for building and managing AI/ML solutions.
  • Generative AI Patterns: This includes tools for prompt engineering, fine-tuning, and pre-training models to meet evolving business needs.
  • Centralized Model Management: A feature that supports centralized deployment, governance, and querying of AI models.

One of the key innovations within Mosaic AI is its ability to offer cost-effective training of custom LLMs. “We focus on fast startup times and live prompt evaluation, enabling users to monitor how responses change throughout training,” explains Everts.

The Databricks Data Intelligence Platform: Combining AI with Data Governance

Central to these innovations is the Databricks Data Intelligence Platform, which combines the strengths of data lakes and data warehouses. Built on Delta Lake, it supports real-time data processing, while Delta Sharing enables secure data exchange across organizational boundaries. MLflow, PyTorch, and TensorFlow support machine learning and AI model development.

The platform’s cloud-native architecture, combined with the Photon engine, ensures scalability and performance. “The Data Intelligence Platform is designed to handle everything from ETL pipelines to data governance, offering a truly unified AI and data solution,” adds Everts.

Showcasing Innovations at AI & Big Data Expo Europe

Databricks is a key sponsor of AI & Big Data Expo Europe, where it will showcase its open-source AI and data governance solutions. Everts hints at demonstrations at the company's booth, including a custom GenAI app that lets visitors generate a cartoon image of themselves using open-source models from Hugging Face and data from Unity Catalog.

For those attending the expo, Databricks will be located at booth #280, where its experts will provide in-depth insights into the future of open-source AI and how organizations can improve their data governance strategies.


With these advances, Databricks is paving the way for a future where businesses can seamlessly leverage open-source AI models and maintain robust data governance across multi-cloud environments. Its innovations are changing how organizations manage data, build custom AI models, and explore business intelligence, while making these capabilities more accessible and cost-effective.