The #1 Open Source Metadata Platform
DataHub is an extensible data catalog that enables data discovery, data observability and federated governance to help tame the complexity of your data ecosystem.
Built with ❤️ by Acryl Data and LinkedIn.
Get Started Now
Run the following commands to get started with DataHub.
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
datahub docker quickstart
Metadata 360
Combine technical, operational and business metadata to provide a 360-degree view of your data entities.
Shift-left
Apply “shift-left” practices to pre-enrich important metadata using ingestion transformers, support for dbt meta-mapping and other features.
Active Metadata
Act on changes in metadata in real time by notifying key stakeholders, circuit-breaking business-critical pipelines, propagating metadata across entities, and more.
Open Source
DataHub was originally built at LinkedIn and subsequently open-sourced under the Apache 2.0 License. It now has a thriving community with over a hundred contributors, and is widely used at many companies.
Forward-Looking Architecture
DataHub follows a push-based architecture, which means it's built for continuously changing metadata. The modular design lets it scale with data growth at any organization, from a single database under your desk to multiple data centers spanning the globe.
Massive Ecosystem
DataHub has pre-built integrations with your favorite systems: Kafka, Airflow, MySQL, SQL Server, Postgres, LDAP, Snowflake, Hive, BigQuery, and many others. The community is continuously adding more integrations, so this list keeps getting longer and longer.
The Origins of DataHub
Explore DataHub's journey from a search and data discovery tool at LinkedIn to the #1 open source metadata management platform, through the lens of its founder and some amazing community members.
A Modern Approach to Metadata Management
Automated Metadata Ingestion
Push-based ingestion can use a prebuilt emitter or can emit custom events using our framework.
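As a rough illustration of the push-based path, the sketch below sends a single dataset description to DataHub with the REST emitter from the acryl-datahub Python package. It assumes a quickstart instance listening on http://localhost:8080 and a reasonably recent package version. The helper names dataset_urn and push_description, and the shop.orders table, are hypothetical and exist only for this example; the package also ships its own URN builder in datahub.emitter.mce_builder.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN by hand (hypothetical helper for this sketch)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def push_description(
    urn: str,
    description: str,
    server: str = "http://localhost:8080",  # assumed quickstart address
) -> None:
    """Emit one dataset-properties aspect to DataHub over REST."""
    # Imported lazily so the URN helper above works without acryl-datahub installed.
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server=server)
    event = MetadataChangeProposalWrapper(
        entityUrn=urn,
        aspect=DatasetPropertiesClass(description=description),
    )
    emitter.emit(event)
```

Calling push_description(dataset_urn("mysql", "shop.orders"), "Orders placed through the web store") against a running instance should attach that description to the dataset. Exact class and argument names may differ between package versions, so treat this as a starting point rather than a reference.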
Pull-based ingestion crawls a metadata source. We have prebuilt integrations with Kafka, MySQL, MS SQL, Postgres, LDAP, Snowflake, Hive, BigQuery, and more. Ingestion can be automated using our Airflow integration or another scheduler of choice.
Learn more about metadata ingestion with DataHub in the docs.
source:
  type: "mysql"
  config:
    username: "datahub"
    password: "datahub"
    host_port: "localhost:3306"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
datahub ingest -c recipe.yml
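The same recipe can also be run from Python, which may be convenient when ingestion is triggered by a scheduler rather than by hand. The sketch below mirrors the MySQL recipe as a dictionary and hands it to the Pipeline class from acryl-datahub. It assumes the package is installed with the MySQL plugin and that both MySQL and DataHub are reachable at the addresses shown. The names RECIPE and run_recipe are hypothetical.

```python
# The MySQL-to-DataHub recipe expressed as a plain dictionary.
RECIPE = {
    "source": {
        "type": "mysql",
        "config": {
            "username": "datahub",
            "password": "datahub",
            "host_port": "localhost:3306",
        },
    },
    "sink": {
        "type": "datahub-rest",
        "config": {"server": "http://localhost:8080"},
    },
}


def run_recipe(recipe: dict) -> None:
    """Run one ingestion pass and raise if the pipeline reports failures."""
    # Imported lazily so the recipe can be inspected without acryl-datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

A scheduler task would then only need to call run_recipe(RECIPE). In practice, credentials would be read from the environment or a secret store rather than hard-coded as they are here.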
Discover Trusted Data
Browse and search over a continuously updated catalog of datasets, dashboards, charts, ML models, and more.
Understand Data in Context
DataHub is the one-stop shop for documentation, schemas, ownership, data lineage, pipelines, data quality, usage information, and more.