v0.8.44
Release date: 2022-09-01 12:13:59
Latest datahub-project/datahub release: v0.13.3 (2024-05-24 07:11:13)
Release Highlights
Known Issues
Standalone Kafka Consumers
We have identified that using standalone Kafka consumers (MCP/MCL messages) has been broken since v0.8.44. The root cause is that some Spring bean dependencies were not correctly excluded.
This went undetected in our testing infrastructure because, until recently, our tests were not run with standalone consumers. The underlying issue has been fixed by https://github.com/datahub-project/datahub/pull/5827, and as of https://github.com/datahub-project/datahub/pull/5856 we run all our smoke tests with standalone consumers to prevent this from happening in the future. The fix will be released in v0.8.46.
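As a rough illustration of the affected range described above (broken since v0.8.44, fix planned for v0.8.46), the sketch below checks whether a given release tag falls inside it. The helper names are hypothetical and not part of DataHub; the version bounds are taken only from this note.

```python
import re

# Bounds taken from the note above: broken since v0.8.44, fix planned for v0.8.46.
FIRST_AFFECTED = (0, 8, 44)
FIRST_FIXED = (0, 8, 46)


def parse_version(tag: str) -> tuple:
    """Turn a tag such as 'v0.8.44' into a comparable tuple (0, 8, 44)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def standalone_consumers_affected(tag: str) -> bool:
    """Hypothetical helper: True if the tag lies in the affected range."""
    return FIRST_AFFECTED <= parse_version(tag) < FIRST_FIXED


if __name__ == "__main__":
    for tag in ("v0.8.43", "v0.8.44", "v0.8.45", "v0.8.46"):
        print(tag, standalone_consumers_affected(tag))
```

If you run standalone consumers on a release in this range, upgrading once v0.8.46 is available is likely the simplest path.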
[Helm] DataHub Actions Container
We recently rolled out support for running ingestion in debug mode. This requires a bump in the datahub-actions container to either HEAD (latest) or v0.0.7. The correct version is set as the default in v0.2.103.
User Experience
- Improvements to UI-based ingestion: view live logs during execution, view an ingestion summary (i.e., the number of entities ingested), and roll back ingestion runs. The UI also surfaces CLI-run ingestion jobs.
- New look on Homepage: Domains have been promoted to the top of the fold, so they are listed above Entity cards and Platform cards
- Improvements to searching for Looker resources - when searching for a measure or dimension, we will now surface Looks & Dashboards that reference those fields
- The DataHub Docs Site has a new look! We are reorganizing content to make it easier and more intuitive for DataHub Developers and End-Users alike to navigate our resources.
- Improved Error Handling on the UI: much nicer messaging when exceptions are caught by the frontend application.
- Misc minor bug fixes and improvements
Developer Experience
- Eternal personal access tokens are now supported
- Deprecated support for Python 3.6 (we expect this to have little-to-no impact on the Community based on pip download data)
Metadata Ingestion
- Improved documentation for Domains transformer
- Stateful Ingestion now supported for Glue
- data-lake source has been deprecated in favor of the s3 source
- Chart Entity now supports chartUsageStatistics
- dbt ingestion supports auto-extracting owner from the meta block
- Improved Snowflake Connector is now available; we expect this to provide a reduction in ingestion run-time and lower levels of complexity
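For the dbt owner-extraction item above, the following is a minimal sketch of an ingestion recipe expressed as a Python dictionary. The enable_owner_extraction option name comes from the changelog below (PR 5707); the other field names (manifest_path, catalog_path, target_platform), the file paths, and the sink server URL are assumptions for illustration and should be checked against the dbt source documentation for your version. The build_dbt_recipe helper is hypothetical.

```python
import json


def build_dbt_recipe(manifest_path, catalog_path, target_platform, extract_owners=True):
    """Hypothetical helper that assembles a dbt recipe as a plain dict."""
    return {
        "source": {
            "type": "dbt",
            "config": {
                # Assumed field names; verify against the dbt source docs.
                "manifest_path": manifest_path,
                "catalog_path": catalog_path,
                "target_platform": target_platform,
                # Option named in this release's changelog (PR 5707).
                "enable_owner_extraction": extract_owners,
            },
        },
        # Assumed sink settings for a local deployment.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


if __name__ == "__main__":
    recipe = build_dbt_recipe(
        "./target/manifest.json",
        "./target/catalog.json",
        "postgres",
        extract_owners=False,  # turn owner extraction off
    )
    print(json.dumps(recipe, indent=2))
```

The resulting dictionary could be written out as a recipe file or handed to whatever tooling you use to launch ingestion; only the recipe's shape is shown here, not how it is run.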
What's Changed
- chore(ingest): remove orderedset dependency by @hsheth2 in https://github.com/datahub-project/datahub/pull/5591
- refactor(ingest): simplify upgrade version stats by @hsheth2 in https://github.com/datahub-project/datahub/pull/5588
- feat(metadata-service-auth): add support for eternal personal access tokens by @ksrinath in https://github.com/datahub-project/datahub/pull/5433
- fix(ci): paths for github workflows by @anshbansal in https://github.com/datahub-project/datahub/pull/5595
- fix(ingest): Fix ingest Clickhouse without password by @liyuhui666 in https://github.com/datahub-project/datahub/pull/5511
- fix(ci): cleanup sleeps to instead use retries by @anshbansal in https://github.com/datahub-project/datahub/pull/5597
- Kafka form Addition and resolved confilict by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5598
- fix(ingest): Fix minor logging bug in the glue source. by @rslanka in https://github.com/datahub-project/datahub/pull/5605
- fix(ci): use different image for smoke base image by @anshbansal in https://github.com/datahub-project/datahub/pull/5607
- fix(ci): cancel docker-unified workflow only on PRs on new commits by @anshbansal in https://github.com/datahub-project/datahub/pull/5608
- fix(ci): add env variable for creds smoke test by @anshbansal in https://github.com/datahub-project/datahub/pull/5609
- fix(ui) Followups to recent changes to UI ingestion forms by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5602
- docs(transformers): Add domain transformer documentation in transformers readme by @mohdsiddique in https://github.com/datahub-project/datahub/pull/5606
- feat(model): adding status aspect to assertions by @shirshanka in https://github.com/datahub-project/datahub/pull/5612
- fix(ingest): use default telemetry ID when config is unwritable by @hsheth2 in https://github.com/datahub-project/datahub/pull/5614
- chore(ingest): drop python 3.6 support by @hsheth2 in https://github.com/datahub-project/datahub/pull/5521
- fix(ui): Split based on Data Platform delimiter in Lineage viz by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5613
- feat(search): Sticky search filters + misc bug fixes & improvements by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5601
- fix(graphql): handle null source values in ml features & primary keys by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5626
- fix(graph service): only query for entities that should have lineage [Breaking Change] by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5539
- feat(model): Add optional message field to auditstamp by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5611
- fix(ingest): fix indenting issue in azure ad connector by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5627
- feat(tokens) Create and display non-expiring tokens on the frontend by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5630
- Schema tab: Fixed the header issue by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5622
- build(docs-website): only show release notes for recent releases by @hsheth2 in https://github.com/datahub-project/datahub/pull/5621
- docs(README): update links and reorg content by @maggiehays in https://github.com/datahub-project/datahub/pull/5618
- perf(operations): performance improvement to operations tab via reduced fetching by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5632
- feat(ui) Retrieve last ingested timestamp and display on frontend by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5600
- Update README.md and maintaining consistency by @hemanthkotaprolu in https://github.com/datahub-project/datahub/pull/5623
- fix(ingest): fix delta-lake dict iteration bug by @hsheth2 in https://github.com/datahub-project/datahub/pull/5625
- fix(ingest): okta - make async loop init more robust by @shirshanka in https://github.com/datahub-project/datahub/pull/5640
- fix(ingest): cli - handle exception in upgrade check by @shirshanka in https://github.com/datahub-project/datahub/pull/5641
- build(docs-website): make codegen script idempotent by @hsheth2 in https://github.com/datahub-project/datahub/pull/5620
- docs(airflow): fix formatting by @hsheth2 in https://github.com/datahub-project/datahub/pull/5617
- fix(ui): Fixing minor search redirect filtering issue introduced by sticky filters by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5643
- fix(ingestion): Update developer docs by @szalai1 in https://github.com/datahub-project/datahub/pull/5644
- feat(ui): Adding slack handle to corp group info by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5645
- fix(delta-table): allow env, credential file based s3 auth by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5636
- feat(GraphQL API): Add "browsePaths" field to browsable entity types by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5646
- feat(ingest): generate a list of aspects in codegen by @hsheth2 in https://github.com/datahub-project/datahub/pull/5633
- feat(ingestion): Glue stateful ingestion by @amanda-her in https://github.com/datahub-project/datahub/pull/5553
- feat(ingest): add snowflake-beta source by @mayurinehate in https://github.com/datahub-project/datahub/pull/5517
- fix(ingest): remove alphabet field from allow/deny config by @hsheth2 in https://github.com/datahub-project/datahub/pull/5629
- feat(mssql): add multi database ingest support by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5516
- chore(ingest): drop data-lake source in favor of s3 source by @hsheth2 in https://github.com/datahub-project/datahub/pull/5628
- fix(ingest): use mongodb ping command to test connection by @hsheth2 in https://github.com/datahub-project/datahub/pull/5650
- fix(ingest): remove profile_sql_table event by @hsheth2 in https://github.com/datahub-project/datahub/pull/5616
- fix(ci): use graphql instead of restli by @anshbansal in https://github.com/datahub-project/datahub/pull/5610
- feat(ingest): rest_emitter - Adding option to disable ssl by @szalai1 in https://github.com/datahub-project/datahub/pull/5642
- feat(ingest): GE Profile/Action Trino support by @aezomz in https://github.com/datahub-project/datahub/pull/5361
- Stats Tab: Table and column stats hide when there is no data by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5651
- fix(ingest): redash - fix redash dashboard url bug by @de-kwanyoung-son in https://github.com/datahub-project/datahub/pull/5500
- Glossary: Worked on the refetching data issue by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5638
- feat(ingestion) Fetch live logs on an ingestion run from UI by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5653
- fix(spark-lineage): Create application setup on sqlevent start by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5657
- fix(ui) Remove constraint for searching with less than 3 characters by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5654
- docs: adds ABLY as DataHub adopter by @de-kwanyoung-son in https://github.com/datahub-project/datahub/pull/5656
- fix(siblings): set sleep after checking if the restore step should run by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5660
- fix(users): add origin aspect to corpuser by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5662
- feat(domains): highlighting domain recommendation cards on homepage by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5655
- feat(ingestion) Followups to live ingestion logs in UI by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5676
- feat(test): add option to send to slack thread by @anshbansal in https://github.com/datahub-project/datahub/pull/5673
- chore(ingest): set min stackprinter version by @hsheth2 in https://github.com/datahub-project/datahub/pull/5666
- docs(airflow): fix note formatting by @hsheth2 in https://github.com/datahub-project/datahub/pull/5679
- docs: fixes typos in Business Glossary docs by @topleft in https://github.com/datahub-project/datahub/pull/5615
- fix(docs) Fix link from Business Glossary ingestion page by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5680
- Worked on the Hive ingestion form by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5661
- feat(ingestion): Support for displaying history of CLI ingestion runs in the "Manage Ingestion" UI by @rslanka in https://github.com/datahub-project/datahub/pull/5639
- Search Page: Pagination Issue by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5685
- feat(ingestion-ui) Display CLI-based ingestion sources in UI by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5681
- fix(schema-history): make latestVersion field on result optional by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5689
- feat(ingest): file - add support for folders, large files, improve co… by @shirshanka in https://github.com/datahub-project/datahub/pull/5692
- feat(ingest): rest-sink - stability improvements to handle large inpu… by @shirshanka in https://github.com/datahub-project/datahub/pull/5693
- Add UP_FOR_RETRY DPI run result by @divyamanohar-stripe in https://github.com/datahub-project/datahub/pull/5664
- feat(ingest): add support for a event failure log + reporting cancelled runs on cli by @shirshanka in https://github.com/datahub-project/datahub/pull/5694
- fix(doc): Fixing boolean type in datahub rest emitter's json schema by @treff7es in https://github.com/datahub-project/datahub/pull/5695
- fix(ui) Refresh executions on Ingestion page when they are visible by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5698
- fix(ingest): emit status aspect for entities ingested from okta and azure_ad by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5700
- feat(kafka-setup): Adds SASL SSL support in kafka setup docker image by @pedro93 in https://github.com/datahub-project/datahub/pull/5697
- fix(ingest): refactor sync-async config, thread-safety for sink repor… by @shirshanka in https://github.com/datahub-project/datahub/pull/5705
- feat(ingest): add enable_owner_extraction option to dbt by @hsheth2 in https://github.com/datahub-project/datahub/pull/5707
- feat(ingestion): add github_info config for dbt by @remisalmon in https://github.com/datahub-project/datahub/pull/5648
- docs(ingest): add info about datahub auth tokens with airflow by @hsheth2 in https://github.com/datahub-project/datahub/pull/5703
- fix(airflow): Stable tag order in DataFlow/DataJobs by @treff7es in https://github.com/datahub-project/datahub/pull/5696
- fix(ingest): add pymongo srv extra by @hsheth2 in https://github.com/datahub-project/datahub/pull/5701
- fix(ui): Long overdue - Fix red error screens during OIDC login, logout exception scenarios by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5708
- feat(ingest): better reporting for file source, friendlier stats names by @shirshanka in https://github.com/datahub-project/datahub/pull/5710
- Worked on postgres ingestion form integration by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5671
- feat(ingest): Add mode option to presto-on-hive source by @szalai1 in https://github.com/datahub-project/datahub/pull/5659
- Worked on the alignment of all data in domain list by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5713
- feat(retention) Enable retention and set max versions for executionRequests by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5704
- fix(ingestion): Fix nifi integration tests. by @rslanka in https://github.com/datahub-project/datahub/pull/5718
- build(deps): bump nbconvert from 6.5.0 to 6.5.1 in /docker/datahub-ingestion by @dependabot in https://github.com/datahub-project/datahub/pull/5716
- feat(ingest): remove nulls during serialization by @shirshanka in https://github.com/datahub-project/datahub/pull/5719
- feat(looker): index looker charts and dashboards by business term by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5649
- fix(GMS): No such classes directory file:///etc/datahub/plugins/auth/r… by @mohdsiddique in https://github.com/datahub-project/datahub/pull/5720
- fix(ingestion): ingest tables from dba_tables in oracle source by @mohdsiddique in https://github.com/datahub-project/datahub/pull/5592
- fix(ingest): redshift-usage: check full table/schema names with AllowDenyPattern by @hsheth2 in https://github.com/datahub-project/datahub/pull/5702
- Worked on the scroll to top of the page after pagination change by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5714
- feat(ingest): round time to 2 decimal places by @anshbansal in https://github.com/datahub-project/datahub/pull/5721
- fix(superset): do not crash when display_uri is not set by @daha in https://github.com/datahub-project/datahub/pull/5711
- fix(deps): remove tdigest dependency and associated code by @shirshanka in https://github.com/datahub-project/datahub/pull/5729
- fix(ingest): bigquery - Not setting ge config schema when profiling with temp table by @treff7es in https://github.com/datahub-project/datahub/pull/5737
- feat(ingest): file - allow filter by aspect and get stats by @anshbansal in https://github.com/datahub-project/datahub/pull/5738
- fix(ingest): looker - soft-deleted charts should re-emerge on re-disc… by @shirshanka in https://github.com/datahub-project/datahub/pull/5732
- feat(elasticsearch): Add nested type display by @liyuhui666 in https://github.com/datahub-project/datahub/pull/5524
- fix(docs): fixes issue with auto-generated ingestion doc by @shirshanka in https://github.com/datahub-project/datahub/pull/5733
- feat(mysql): support multiple database in single recipe by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5684
- fix(ingest): tweak mongodb schema inference to fix test by @hsheth2 in https://github.com/datahub-project/datahub/pull/5744
- fix(bootstrap): Remove malformed test in bootstrap.json by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5747
- docs(site redesign): Overhaul Docs Site by @maggiehays in https://github.com/datahub-project/datahub/pull/5731
- fix(ingestion): Fix SQL Lineage Parser to handle special tokens with a hyphen in table and column names. by @rslanka in https://github.com/datahub-project/datahub/pull/5748
- Snowflake beta improvements by @mayurinehate in https://github.com/datahub-project/datahub/pull/5736
- chore(ingest): update mixpanel api endpoint by @hsheth2 in https://github.com/datahub-project/datahub/pull/5750
- feat(model): add chartUsageStatistics to the chart entity by @shirshanka in https://github.com/datahub-project/datahub/pull/5753
- fix(ui): Improve Error Messaging on the UI by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5752
- chore(ingest): add vulture config and remove some dead code by @hsheth2 in https://github.com/datahub-project/datahub/pull/5745
- fix(doc): presto-on-hive - Removing new lines from docs to fix doc generation by @treff7es in https://github.com/datahub-project/datahub/pull/5755
- feat(restore-indices): add multithreading and add aspectName, urn filter by @anshbansal in https://github.com/datahub-project/datahub/pull/5712
- fix(GMS): fix no such classes directory file:///etc/datahub/plugins/auth/resources by @mohdsiddique in https://github.com/datahub-project/datahub/pull/5743
- feat(ingestion) Add ability to rollback ingestion from UI - BE PR by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5739
- feat(ingestion-ui) Add ability to set debug_mode on UI ingestion sources by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5762
- fix(search): validate entities exist before returning search results in EntityClient by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5751
- feat(ingestion-ui) Add ability to rollback ingestion runs from the UI - FE only by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5740
- fix(ingest): proper null skip logic in serialization by @hsheth2 in https://github.com/datahub-project/datahub/pull/5749
- fix(ingest): snowflake-beta fix missing initialization of variable by @mayurinehate in https://github.com/datahub-project/datahub/pull/5757
- fix(ingest): add databricks dep for hive by @hsheth2 in https://github.com/datahub-project/datahub/pull/5764
- feat(ingest): add config to extractor interface by @hsheth2 in https://github.com/datahub-project/datahub/pull/5761
- chore: update server-side telemetry endpoint by @hsheth2 in https://github.com/datahub-project/datahub/pull/5759
- feat(ingestion): bigquery - Bigquery beta connector - first cut by @treff7es in https://github.com/datahub-project/datahub/pull/5663
- feat(ingestion): looker chart usage statistics by @mohdsiddique in https://github.com/datahub-project/datahub/pull/5652
- feat(restore-indices): add urn like filter by @anshbansal in https://github.com/datahub-project/datahub/pull/5770
- feat(restore-indices): add timing info by @anshbansal in https://github.com/datahub-project/datahub/pull/5773
- feat(simplified homepage): adding option to show limited entity types on homepage by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5678
- fix(ingest): add pydantic version upper bound by @hsheth2 in https://github.com/datahub-project/datahub/pull/5775
- Worked on the Secret Fields in ingestion form by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5727
- feat(cli): add spinner to indicate progress by @shirshanka in https://github.com/datahub-project/datahub/pull/5769
- feat(roles): add roles feature to DataHub by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5767
- feat(model): add storage size to dataset profiles by @shirshanka in https://github.com/datahub-project/datahub/pull/5777
- docs(roles): add documentation about roles by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5778
- fix(ui): Remove add limit on Entity Profile for glossary terms and tags by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5780
- fix(ci): Attempting to fix failing smoke tests by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5760
- fix(tags) Add creator of tag as the owner of it by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5787
- docs(lookml): updating github_info in lookml docs by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5779
- fix(audit logs) Set actor urn on audit stamp through Java Entity Client by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5788
- feat(ingestion-ui) Add test connection button to Looker form by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5794
- fix(ingestion): fix looker chart-usage by @mohdsiddique in https://github.com/datahub-project/datahub/pull/5791
- fix(ingest): Fix oauth config validation in snowflake. by @rslanka in https://github.com/datahub-project/datahub/pull/5796
- fix(bootstrap): Creating dedicated thread pool for executing async bootstrap steps + misc fixes by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5798
- feat(previews): add previews for glossary terms, tags, and domains by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5784
New Contributors
- @hemanthkotaprolu made their first contribution in https://github.com/datahub-project/datahub/pull/5623
- @szalai1 made their first contribution in https://github.com/datahub-project/datahub/pull/5644
- @amanda-her made their first contribution in https://github.com/datahub-project/datahub/pull/5553
- @de-kwanyoung-son made their first contribution in https://github.com/datahub-project/datahub/pull/5500
- @topleft made their first contribution in https://github.com/datahub-project/datahub/pull/5615
- @divyamanohar-stripe made their first contribution in https://github.com/datahub-project/datahub/pull/5664
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.43...v0.8.44