v0.10.0

Release date: 2022-07-18 10:50:31

Latest microsoft/SynapseML release: v1.0.5 (2024-08-30 10:16:51)

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights


OpenAI Language Models	.NET, C#, and F# Support	Full MLFlow Support	Live Demos in Browser
Embed 175-billion parameter models into your databases with ease	Use or train any SynapseML model from .NET	Quick and easy MLOps, model management, and autologging	Explore the SynapseML library with zero setup
Learn More	Getting Started Guide	Explore the Docs	Run in Browser

New Features

General ✨

SynapseML now supports .NET, C#, F#, and other .NET ecosystem languages in addition to Scala, Python, and R. Please see our Setup Guide and LightGBM from .NET example for more details. (#1539, #1156, #1443)
SynapseML is now usable from your browser with zero setup using Binder. Quickly explore our demos in Binder. (#1487, #1493)

Azure Cognitive Services for Big Data 🧠

Added OpenAI GPT-3 Sentence Completion Transformer. Use this feature to embed 175-billion parameter language models into distributed pipelines and databases to solve a variety of general purpose NLP tasks across natural language and code. (#1495, #1541)
Added an example of Sentence Completion with GPT-3 (#1564)
Added support for Form Recognizer V3.0 (#1269)
Improved MVAD usability with async training and better data validation (#1477)
Upgraded the univariate anomaly detection version to v1.1-preview (#1440)
Added a multivariate anomaly detection sample notebook (#1365)
Added a Text to Speech example to cognitive service overview (#1350)
Added opinion mining to TextSentiment Models (#1449)
Fixed Azure Maps schemas (#1553)
Removed modelID param validators in FormRecognizerV3 (#1551)
Fixed form recognizer and form ontology learner issues (#1506)
Fixed setServiceName python method in OpenAI (#1498)
Fixed error in Text Analytics Analyze schema
Improved error handling for MVAD (#1448, #1391)
Removed unused concurrency parameter for MVAD (#1383)
Improved robustness of flood risk notebook by adding polling (#1427)

Responsible AI at Scale 😇

Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
Added a notebook for ICE and PDP feature explainers (#1318)
Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
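
For readers new to the technique, the idea behind a partial dependence curve can be sketched in plain Python. This illustrates the general method only; it is not SynapseML's implementation or API, and the names `partial_dependence` and `toy_model` are hypothetical, introduced just for this example. For each candidate value of one feature, the sketch overwrites that feature in every row and averages the model's predictions.

```python
from statistics import mean
from typing import Callable, List, Sequence


def partial_dependence(
    predict: Callable[[List[float]], float],
    rows: Sequence[Sequence[float]],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Return one averaged prediction per grid value (hypothetical helper)."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)              # copy so the input rows stay untouched
            modified[feature_index] = value   # force the feature to the grid value
            predictions.append(predict(modified))
        curve.append(mean(predictions))       # average over the whole dataset
    return curve


def toy_model(row: List[float]) -> float:
    """Stand-in model (an assumption for illustration): 2 * x0 + x1."""
    return 2.0 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pdp = partial_dependence(toy_model, rows, feature_index=0, grid=[0.0, 1.0])
print(pdp)
```

Because the toy model is linear, the curve rises by 2 for each unit increase in the first feature, which matches that feature's coefficient. An ICE plot keeps the per-row predictions instead of averaging them.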

MLFlow 🔃

Added documentation for MLFlow autologging (#1508)
Added documentation on the SynapseML-MLFlow integration (#1428)

LightGBM on Spark 🌳

Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
Added seed parameters to LightGBM (#1387)
Added a method to get LightGBM native model string directly (#1515)
Fixed issue with validation data creation during useSingleDataset mode (#1527)
Fixed multiclass training with initial scores (#1526)
Fixed saving LightGBM model iterations with early stopping (#1497)
Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
Fixed issue where an empty partition is chosen as the main worker in singleDatasetMode (#1458)
Fixed bug with data repartitioning in LightGBMRanker (#1368)
Fixed outdated docs for useSingleDatasetMode (#1562)
Refactored LightGBM class structure to improve logging and debugging (#1557)

Vowpal Wabbit 🐇

Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
Fixed issues with building quadratic interaction terms (#1460)

Isolation Forests 🌲

Added an Isolation Forest Multivariate Anomaly Detection sample notebook (#1483)

Additional Updates

Maintenance 🔧

Removed unused debugging code (#1546)
Removed Synapse test exclusion for Explanation Dashboard notebook (#1531)
Made python style checks verbose (#1532)
Fixed library checking while installing library on Databricks cluster (#1488)
Upgraded and fixed Dockerfiles (#1472)
Added Developer Docker Image build to pipeline (#1480)
Fixed ADO area path in Issue Linker (#1464)
Fixed master version badge display
Improved Databricks error reporting
Updated Azure CLI to stop build errors
Fixed SSL handshake flakiness
Added itsdangerous as a dependency to ADB tests (#1412)
Turned on debug for pr to work item workflow
Pointed pr linker to official implementation
Changed GitHub action trigger from pull_request_target to pull_request (#1413)
Fixed issue where Unit Tests were not executing (#1409)
Added Azure DevOps PR linker (#1394)
Updated GH PAT name (#1389)
Re-enabled Synapse E2E Tests (#1517)
Updated SynapseE2E Tests to Spark 3.2 (#1362)
Fixed ADO issue/PR linking (#1463)
Cleaned up extra MVAD models and improved network resiliency (#1457)
Updated azure blob client version (#1563)
Fixed docker security vulnerability (#1561)
Streamlined scalastyle hook (#1530)
Updated CODEOWNERS (#1523)
Updated OpenAI resource info (#1525)
Fixed semantic PR checking (#1503)
Updated docker images to remain compliant (#1500)
Added component governance explicitly to build so timeout variable works (#1489)
Fixed path for notebook test files in gitignore (#1485)
Increased component governance timeout (#1482)
Added conda caching to build
Stopped build from failing after 1 hour
Fixed flaking MVAD test
Refactored build pipeline definitions
Split Synapse tests into multiple tests (#1377)
Moved from ADO Pipelines to GitHub Workflows (#1406)

Website Improvements 💻

Fixed MathJax expressions rendering (#1343)
Fixed google analytics gtags (#1434)
Corrected placement of BingSiteAuth.xml config (#1445, #1439)
Fixed website security and upgraded Docusaurus (#1545)
Moved Geospatial Services to its own folder (#1345)
Bumped minimist from 1.2.5 to 1.2.6 in /website (#1455)
Bumped node-forge from 1.2.1 to 1.3.0 in /website (#1451)
Bumped prismjs from 1.25.0 to 1.27.0 in /website (#1430)
Bumped follow-redirects from 1.14.7 to 1.14.8 in /website (#1402)
Bumped nanoid from 3.1.23 to 3.2.0 in /website (#1355)
Bumped shelljs from 0.8.4 to 0.8.5 in /website (#1347)
Bumped follow-redirects from 1.14.1 to 1.14.7 in /website (#1348)
Bumped cross-fetch from 3.1.4 to 3.1.5 in /website (#1496)
Bumped async from 2.6.3 to 2.6.4 in /website (#1481)
Pinned onnxmltools to a specific version (#1524)

Bug Fixes 🐞

Fixed twitter sentiment detection notebook (#1544)
Fixed issue with DataConversion serialization (#1505)
Fixed typos in TestBase (#1501)
Fixed issue in GridSpace python API (#1470)
Fixed reflective class loading in IntelliJ (#1456)
Removed verbose ComputeModelStatistics output and converted scoredLabelsCol to DoubleType (#1361)
Fixed flaking in geospatial notebooks

Code Style 🎶

Improved style checks using pre-commit (#1538, #1528, #1535)
Formatted code and notebooks with Black style checker (#1522, #1520)

Documentation 📘

Tabularized badges for readability (#1486)
Added a PR template (#1418)
Improved installation readme (#1369, #1422)
Added a Security readme (#1511)
Updated the Azure Synapse readme (#1372)
Removed reference to custom maven resolver
Added pointer to docs on synapse pool configuration
Fixed typos in readme (#1516)

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:


Serena Ruan	Ric Serradas	Puneet Pruthi
Serena is a Software Engineer II on the Synapse team in Beijing and a force of nature. In this release, Serena has continued her prolific contribution streak by adding language support for .NET, C#, and F# and integrating SynapseML with MLFlow. Additionally, Serena has contributed several features to the MLFlow and Spark.NET open-source communities so that these systems can work better for every user. These contributions are just some of the many amazing things Serena has accomplished during this release, and her devotion and craft are pivotal to the ecosystem.	Ric is a Senior Engineering Manager on the OneNote team with a shining personality and drive to collaborate. In just a few weeks Ric hit the ground running by setting up an automated link between GitHub and Azure DevOps, building the first working version of SynapseE2E tests, and re-writing our entire build in GH Actions. Furthermore, Ric worked tirelessly through nights and weekends to land his contributions.	Puneet is a Senior Engineer on the SynapseML team with a knack for engineering systems and dockerization. Puneet's contributions to the library include architecting the new binder integration, driving our Synapse E2E tests to completion, and improving SynapseML’s infrastructure around community engagement. Puneet is constantly thinking of ways to improve the community and we value his effort.

Mark Niehaus	Keerthi Yanda	Yagna Oruganti
Mark is a Senior Software Engineer on the SynapseML team with a deep knowledge of the .NET ecosystem and infrastructure development. In this release, Mark architected SynapseML’s .NET binding blob publishing strategy, drove the OpenAI GPT-3 bindings to completion, and wrote a detailed GPT-3 walkthrough. Mark completed these projects while supporting the Time Series Insights service, speaking to his ability to keep multiple plates spinning at a time.	Keerthi is a Software Engineer II on the SynapseML team. Despite joining Microsoft just a few months ago, Keerthi has quickly learned the SynapseML ropes to take command of our integration with the Azure Synapse platform. Huge kudos to her for braving long build times and daunting error messages to make sure SynapseML works out of the box on Synapse Analytics clusters.	Yagna is a Senior Data and Applied Scientist on the Industry AI team with a talent for building solutions that integrate many community tools to solve customer challenges. Yagna's first contribution to SynapseML was a masterpiece of a demo showing how to use Isolation Forests, MLFlow, Tabular SHAP, and the interpret-ml explanation dashboard in a single anomaly detection example.

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML:

Serena Ruan @serena-ruan, Eric Dettinger, Scott Votaw @svotaw, Puneet Pruthi @ppruthi, Ric Serradas @riserrad, Mark Niehaus @niehaus59, Kyle Rush @k-rush, Keerthi Yanda @KeerthiYandaOS, Yagna Oruganti @YagnaDeepika, Jason Wang @memoryz, Ilya Matiach @imatiach-msft, Yazeed Alaudah @yalaudah, Elena Zherdeva @ezherdeva, Kashyap Patel @ms-kashyap, Martha Laguna @martthalch @marthalc, Alex Li @liyzcj, Maria Guirguis @maguir, Alexandra Savelieva @alsavelv, @netang, Sudhindra Kovalam @SudhindraKovalam, Markus Cozowicz @eisber, Tom Finley, Markus Weimer, Jeff Zheng, James Verbus @jverbus, Chris Hoder, Misha Desai, Nellie Gustafsson, Eren Orbey, Beverly Kodhek, Louise Han @jr-MS, Justyna Lucznik, Kim Manis, Mitrabhanu Mohanty, Bogdan Crivat, Anand Raman, William T. Freeman, James Montemagno, Luis Quintanilla, Dennis Kennedy, Ryan Hurey, Jarno Ensio, Brian Mouncer, Steve Suh @suhsteve, Akshaya Annavajhala (AK), Guolin Ke, Tara Grumm, Niharika Dutta @Niharikadutta, Andrew Fogarty, Juanyong Duan, Weichen Xu @WeichenXu123, Spark.NET Team, ONNX Team, Azure Global, Vowpal Wabbit Team, LightGBM Team, MSFT Garage Team, MSR Outreach Team, Speech SDK Team, MLflow Team

Learn More


Visit our website for the latest docs, demos, and examples	Read more about SynapseML's GA release in the Microsoft Research Blog	Learn more about our .NET bindings and code generation system.

Watch a demonstration of SynapseML to create a multilingual search engine.	Read our Paper from IEEE Big Data '21	Explore our integration with the Azure OpenAI Service
