v0.11.0

microsoft/SynapseML

Release date: 2023-03-05 21:37:53

Latest microsoft/SynapseML release: v1.0.5 (2024-08-30 10:16:51)

SynapseML: Simple and distributed machine learning
Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.11.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights

  • ChatGPT and GPT-4 at Scale: Intelligent chat and embeddings. Simplified prompting APIs.
  • Simple Deep Learning: Train custom image and text classifiers with ease.
  • LightGBM v2: Higher performance, >10x lower memory footprint, same API.
  • ONNX Model Hub: Embed >150 state-of-the-art deep networks into your pipelines.
  • Causal Learning: Discover and measure causal treatment effects.
  • Vowpal Wabbit v2: New second-generation integration.
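
The simplified prompting highlight can be pictured as a row-wise template fill: each row's column values are substituted into a prompt, and the model's reply becomes a new column. The sketch below is plain Python and is not the SynapseML API. The helper names (`render_prompt`, `add_llm_column`, `echo_model`), the column names, and the template text are all hypothetical, and the model call is replaced by a stand-in function so the example runs offline.

```python
from typing import Callable, Dict, List


def render_prompt(template: str, row: Dict[str, str]) -> str:
    """Fill {column} placeholders in the template with values from one row."""
    return template.format(**row)


def add_llm_column(
    rows: List[Dict[str, str]],
    template: str,
    output_col: str,
    complete: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return new rows with one extra column holding the model reply per row."""
    return [
        {**row, output_col: complete(render_prompt(template, row))}
        for row in rows
    ]


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for a real completion call; it only echoes the prompt.
    return f"<reply to: {prompt}>"


rows = [
    {"text": "apple", "category": "fruits"},
    {"text": "mercedes", "category": "cars"},
]
template = "List three more {category} similar to {text}."

result = add_llm_column(rows, template, "completion", echo_model)
for row in result:
    print(row["completion"])
```

In a distributed setting the same per-row logic would run in parallel across partitions of a dataset; swapping `echo_model` for a real client is the only change this sketch would need.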

New Features

General ✨

Open AI 🤖

Deep Learning 🕸

Azure Cognitive Services for Big Data 🧠

Causal Learning 📈

LightGBM 🌳

Vowpal Wabbit 🐇

Additional Updates

Bug Fixes 🐞

Build 🏭

Documentation 📘

Maintenance 🔧

Deprecations and Removals 🗑️

Testing 💚

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

Scott Votaw is a Principal Engineer on the SynapseML team who has solved some of SynapseML’s toughest challenges in record time. In this release, Scott contributed the new LightGBM streaming execution mode and fully replaced our deep learning stack with the ONNX Runtime. These efforts were massive lifts, requiring huge changes to the LightGBM native libraries and complex dependency management jujitsu, respectively. Scott brings his love for the craft to every project he works on, so keep your eyes peeled for more amazing feats of engineering from him in future releases.

Serena Ruan is a Software Engineer II on the SynapseML team and operates on a separate plane of existence from the rest of us mere mortals. Following up on prior major contributions like .NET support, form recognition, translation, and creating the SynapseML website, Serena contributed the Simple Deep Learning package for this release. This package makes it easy to train modern deep text and vision networks from Hugging Face and torchvision on Spark clusters. Serena seeks only the most difficult engineering challenges, and her contributions have laid the groundwork for many more deep-learning-based algorithms in SynapseML.

Haizhou (Dylan) Wang is a Senior Software Engineer on the CSX Data team and a first-time contributor to the SynapseML library. Dylan contributed the new SynapseML causal learning package for the v0.11 release. This package helps users discover the effectiveness of things like medical treatments or economic policies even without controlled experiments. With his elegant contributions, Dylan has laid the foundation for more causal collaborations with the EconML library.
Markus Cozowicz is a Principal Applied Scientist who (just!) joined the SynapseML team. Despite only recently coming on board officially, Markus has long been a prolific contributor to the library and built the Vowpal Wabbit and Isolation Forest integrations. In this release, Markus contributed the second generation of the Vowpal Wabbit integration, improving its generality and applicability. He also expanded the OpenAI integration to support embeddings and simplified prompt templating. Our team is incredibly lucky to have such a consistent and thoughtful collaborator.

Brendan Walsh is a Senior Engineer on the SynapseML team who recently joined after a long tenure on the Cognitive Services team, where he developed their containerized cognitive service effort and co-authored the SynapseML publication on large-scale microservices. Brendan used this expertise to onboard Emotion Detection for text-to-speech models. He then went on to use this new emotive reading capability to create and donate thousands of audiobooks to the open source community. You can learn more about Brendan’s awesome technical philanthropy efforts at https://aka.ms/audiobook.

Jessica Wang is a Software Engineer who recently joined the SynapseML team. Already, Jessica has grown into the role of the SynapseML benevolent “doc”tator. In this release, Jessica has worked hard to ensure that the SynapseML notebooks work across a wide variety of Spark platforms and are simple to get started with. This work requires knowledge of the entire library’s surface area, and we are thankful Jessica has worked so hard to learn this breadth of content. If you have been following notebook examples from https://aka.ms/spark, you have Jessica to thank!
Kyle Rush is a Senior Software Engineer on the SynapseML team with a penchant for architecture and a streak of taking on big responsibility behind the scenes. Kyle has been instrumental in expanding our testing infrastructure to new platforms so that the lights stay on even as the number of contributions increases. This often requires nontrivial code and delicate cross-team collaboration, and Kyle has both the engineering might and the charismatic finesse to make sure these systems can be spun up successfully.

Avrilia Floratou is a Principal Scientist Manager in the Gray Systems Lab, a first-time SynapseML contributor, and a delightful collaborator. In this release, Avrilia contributed the first prototype of the simplified OpenAI prompting transformer. This contribution makes it easy to ask ChatGPT and other LLMs questions about large datasets and to create new LLM-derived columns in databases. You can learn more about her work through the OpenAI Docs and prompting demo.

Jason Wang is a Principal Software Engineer on the CSX Data team and has a long history of not only contributing huge features to SynapseML, but also actively maintaining his contributions. In this release, Jason’s work on the ONNX model hub protocol enables quick access to more than 150 pretrained deep networks from the Java and Scala ecosystems. Jason has also been instrumental in fixing the most difficult and arduous bugs, some even stemming from the core Spark runtime. Finally, we deeply appreciate Jason’s leadership in the community: he consistently encourages and helps others contribute, and his impact extends far beyond his own personal contributions.
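
For readers curious how treatment effects can be estimated without a controlled experiment, the sketch below illustrates the general double machine learning idea (cross-fitted residual-on-residual regression) on simulated data. It is a minimal, standard-library Python illustration and not the SynapseML causal package or its API. The function names, the single confounder, the linear nuisance models, and the simulated coefficients are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def double_ml_effect(x, t, y, n_folds=2):
    """Cross-fitted residual-on-residual estimate of the effect of t on y."""
    n = len(x)
    res_t = [0.0] * n
    res_y = [0.0] * n
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        x_train = [x[i] for i in train]
        # Nuisance models are fit on the other folds only (cross-fitting).
        a_t, b_t = fit_line(x_train, [t[i] for i in train])
        a_y, b_y = fit_line(x_train, [y[i] for i in train])
        for i in test:
            res_t[i] = t[i] - (a_t + b_t * x[i])
            res_y[i] = y[i] - (a_y + b_y * x[i])
    # Regress outcome residuals on treatment residuals (no intercept).
    num = sum(rt * ry for rt, ry in zip(res_t, res_y))
    den = sum(rt * rt for rt in res_t)
    return num / den


def simulate(n=5000, true_effect=2.0, seed=0):
    """Simulated data where the confounder x drives both treatment and outcome."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    t = [0.8 * xi + rng.gauss(0, 1) for xi in x]
    y = [true_effect * ti + 1.5 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
    return x, t, y


x, t, y = simulate()
naive = fit_line(t, y)[1]        # ignores the confounder, so it is biased upward
dml = double_ml_effect(x, t, y)  # should land near the simulated effect of 2.0
print(f"naive slope: {naive:.2f}, double-ML estimate: {dml:.2f}")
```

The naive slope absorbs the confounder's influence, while the residualized estimate removes the part of both treatment and outcome that the confounder explains before comparing them. Real applications typically use more flexible nuisance models than the straight lines assumed here.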

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML:

Eric Dettinger, Markus Weimer, Serena Ruan @serena-ruan, Scott Votaw @svotaw, Haizhou (Dylan) Wang @dylanw-oss, Puneet Pruthi @ppruthi, Markus Cozowicz @eisber, Brendan Walsh @BrendanWalsh, Jessica Wang @JessicaXYWang, Kyle Rush @k-rush, Avrilia Floratou, Jason Wang @memoryz, Mark Niehaus @niehaus59, Keerthi Yanda @KeerthiYandaOS, Ilya Matiach @imatiach-msft, Kashyap Patel @ms-kashyap, Martha Laguna @martthalch @marthalc, Sarah Shy @sarahshy, @ocworld, @adityakode, @nightscape, Alexandra Savelieva @alsavelv, Tom Finley, Jeff Zheng, James Verbus @jverbus, Chris Hoder, Misha Desai, Nellie Gustafsson, Eren Orbey, Beverly Kodhek, Louise Han @jr-MS, Raj Rikhy, Marcos Campos, Mike Estee, Brice Chung, Justyna Lucznik, Kim Manis, Mitrabhanu Mohanty, Bogdan Crivet, Anand Raman, William T. Freeman, Akshaya Annavajhala (AK), Guolin Ke, Spark.NET Team, ONNX Team, Azure Global, Vowpal Wabbit Team, LightGBM Team, MSFT Garage Team, MSR Outreach Team, Speech SDK Team, MLflow Team

Learn More

  • Visit our website for the latest docs, demos, and examples
  • Watch our AI Show episode on creating and donating thousands of audiobooks
  • Learn the basics of SynapseML
  • Read the v0.11 Update blog
  • Apply OpenAI language models to your large datasets
  • Learn how to check data quality with GPT-3

Changes:

  • c1aa5a542e26d84912cf93b8780d62d8952cad01 docs: Fix type in old version of Interpretability - Explanation Dashboard.md (#1846)
  • 48f9c4c46e2a80d81639bcfcb1fb374e038e4de0 feat: OpenAI Prompt Template support (#1843)
  • fbbb4336d12fe1d3124295aa3c42f16d522d133e chore: fix form recognition tests (#1842)
  • b90425c41cbff7451c1bece19b59b254660c5b7b build: bump amannn/action-semantic-pull-request from 5.0.2 to 5.1.0 (#1839)
  • 31c4ea34bea59d8456d29f2791117ee3ef1a5627 fix: correct copy/paste error in acr cleanup (#1838)
  • c99796f955692f041a707e729216be05eeae9e02 feat: add maxNumClasses param to LightGBMClassifier for multi-class (#1841)
  • eb0bbe335ffdb00e58dd89947f0542e9a602ba31 fix: modify synapse test config, modify isolation forest notebook for testing (#1833)
  • 8af5112d33f5de24f97f5d8f59fd9d403e412bd6 docs: remove hyperOpt exclusion - mlflow on synapse (#1837)
  • d66796d7c743d63e9e0b2bd1b45874f448b61aff build: bump http-cache-semantics from 4.1.0 to 4.1.1 in /website (#1826)
  • 7038e1da31f9ef48f119a7805110ea190155f97c feat: add Azure OpenAI embedding support (#1832)
  • 6c6d89b54a595718a149e8accc9de3b311fadb22 chore: turn off failing synapse tests (#1835)
  • eb355817285564bd062eb6deab9755146ba00157 docs: add hyperopt sample (#1828)
  • 2262a9bf5d3db651f9d87cd2a3849e1154d600ad feat: add aad auth for openAI (#1829)
  • a7e20ce39d479a680da5811f6fa04975e0b407c0 docs: Add a notebook for advanced cognitive service usage (#1825)
  • 3eed94c205629764119ec1ba04b93f426c201cd8 test: Update test to validate ATE value should be positive with the test data (#1821)
  • 8aa4ae17dc9e6e45fa53047f9827edc5beb2b8ac chore: re-enable E2E tests for Synapse-Extension (#1823)
  • 1dcb5881235994c42ad88aafdfffebeb9ae29ea4 docs: update spark3.2 installation on Synapse (#1815)
  • e36643f1a71e39b0c6bfe6c53935b857453dcd61 fix: Fix DML regression bug, should remove both treatment and outcome columns as feature columns (#1820)
  • 4ef8e30167a304ecd2a58dbf4a443d91e062d8ad fix: Add spark config to fix ArrayStoreException in Synapse - Add back HyperparameterTuning nbs to test pipeline (#1757)
  • eb403728f9162565fd3ec124012ed671b92447e0 fix: Add TreatmentCol Type check at the very beginning (#1816)
  • af0a218bcc9b03d9b677ba7b6978a09fc64ba46b docs: update required spark and python version on website doc (#1812)
  • e0c5364df052e6e896e01e5a3e971876771a7170 build: bump ua-parser-js from 0.7.31 to 0.7.33 in /website (#1809)
  • e212f5df2f3c1da89409f3214a4d5c82dc1f135f chore: add retry to commands (#1814)
  • bd1e0a61f3fc0dc109702e613d21eadd04c56565 fix: breeze NoSuchMethodError (#1807)
  • 13ff3467fd0778761463ed3ebfa652e07dc5ad1c chore: Disable synapse-extension tests, add params to pipeline (#1810)
  • 4c8d2e972f1bfb33efc6aced2b8a9a542de119ec doc: apply diffs from website/docs to website/versioned_docs (#1808)
  • 7d8d6fd634bee190520853ebf69f6e62b5ac15b7 replicating the unit test data (#1806)
  • fb47138d7e881ecb654d5d039893e63bb13abce9 docs: DoubleMLEstimator document and sample notebook (#1730)
  • 8ba77e4eea742c4728a662ba68a660db6c974054 build: Return values from TaskKeys (#1775)
  • 94ed68512d41434af28f26097dbe2ebef5c1643c chore: disable Interpretability tests (#1803)
  • 54e7ac6143fc25ed2300e85fddb4e3ce21922e84 feat: add setting for getting word level timing information from SpeechToText (#1801)
  • d8d523c6ba603d62744b06608265bb5bfbdad8df feat: annual Vowpal Wabbit improvements (#1579)
  • 01e31dc69102694cbd05be1757c69b20efc13651 test: fix issue with missing causal tests (#1799)
  • 9d92349ac6805321838eee0a9e6ca8407cdbf16f test: remove interpretability exclusion (#1798)
  • d308dc4497abba715f245f4e1c3cb130968d26d2 refactor: enhancement to aad token (#1797)
  • 333dedbdc8342ed2e73df29821c5a8912f3e58fd feat: upgrade MVAD to v1.1 (#1788)
  • 54a749638c3383fc0609bed86af088d5d960223a chore: fix typo in chron build def
  • c653ed77a13d7d99238c52ceb6aa4c4a6290ea3a docs: Clean latex - Data Balance Analysis (#1796)
  • 76e7b73954cc1dc0da9e6c7022a42ed8a873329a chore: linx fixes for README and features (#1794)
  • fc3a7a6a1485b2b3229a76648c2836d6c4fe4f53 chore: Re-enable e2e tests, add cron schedule to build definition (#1774)
  • a95fad403184ae4fa4b886c28b2ea072b4344373 docs: Add docs for LightGBM execution mode (#1779)
  • 44bcbf1fdcf5be829a46688ddc7e9cba06f98f13 improve documentation - bug, typo, correctness (#1791)
  • 77da4b40822a5e40afdbe5a6956666ffc9a0068f chore: acrolinx fixes for reference, mlflow and getting_started in 0.10.2 (#1793)
  • 8cc6a16a0e30cf6d8e58cb64f209c738931b9adf docs: mollify acrolinx (#1792)
  • 974e36aaad0714a3e9ca83caa6a1130fb51f249a docs: Added more up-to-date ONNX docs (#1781)
  • dc57deaf43010c91a181abe42776d43426f2e0cc feat: add aad authentication support for cognitive services (#1778)
  • dd1563fc3004143c120de3b58d77d4f66338b8a2 build: bump json5 from 2.2.1 to 2.2.3 in /website (#1785)
  • 3d8c84d7ffbe246a1e7651f75c0347b07fc5becd chore: fix style (#1790)
  • 53788bd94cf0cc4badaaa91651b02221d297cf69 fix: small tweaks to clean_acr (#1787)
  • 851efdc98de7c61480912c14bf192ad222c41b97 build: bump actions/upload-artifact from 3.1.1 to 3.1.2 (#1786)
  • 9978c3b760b88839f0d5f574c159d7ed96323994 build: bump ossf/scorecard-action from 2.1.1 to 2.1.2 (#1777)
  • 421e3fe3c09f7871db80f588654ef587d47adcea Fix: fix annamespace import for Experimental (#1780)
  • 8dc4a582bb3e3f9c477583e850b76ddd4254973d build: bump ossf/scorecard-action from 2.1.0 to 2.1.1 (#1773)
  • 61435147a64a52680152b3e55275a139a8b6b787 build: Remove unnecessary SbtPlugin settings (#1771)
  • de21adaa9408df0ec566711f6ccd9fb1bc3ac903 chore: disable synapse-internal tests
  • d0a9f20df0f753c93ad459cd401197165b64be96 feat: Causal DoubleMLEstimator (#8) (#1715)
  • 7ab63a14ac1a9f14174b16b1be88ea99a94b5db4 fix: Update synapse-extension test environment, enable cleanup of old arti… (#1769)
  • be02bf78b4ab3b789f6cc773cf71653acc8e0933 build: bump ossf/scorecard-action from 2.0.6 to 2.1.0 (#1770)
  • 0bf4772ec35b0cd9b91d7da0c97d2fb1ad193cd1 build: bump actions/upload-artifact from 3.1.0 to 3.1.1 (#1766)
  • bb6e37b956c62378b13cebeed0b63b2916e4a98c Update codeql.yml (#1765)
  • 8a30fd482b9f564ff96277d712f1ec856a1c00cc Create codacy.yml
  • d9810b157d68f22bda50b4dbfd271514e49a812a Create codeql.yml
  • 630a442c1fcf7556e3514fd91dd8bf6d88a502ab Create scorecards.yml
  • 7785cb5e15fe17b51d3610d95f045de5e0a72d26 feat: add new AnalyzeText API (#1760)
  • cf14041be1e17e030af18fa1eb2afc96623e3e65 chore: delete old models before tests rather than after (#1759)
  • 9e32a99b40151136d4af0c2bbb1ffba8865f45a1 fix: fix failing SpeakerEmotionInferenceSuite
  • 4a2595461fb1d68609ec74ce4f7d33e58095452e fix: delete too many anomaly models (#1758)
  • adad80d4454dd4fab4aacc93fa34e2aefb5ff75b chore: remove cntk and downloader tests from build
  • 046689d9666c334aa0e11a603882afc62c8d1775 chore: fix codecov.yaml
  • 3a3be327d0022d3181ac22ea90c9b6b2b5beae92 feat: Add LightGBM streaming execution mode (#1580)
  • b205cc47b0e43ddcd1a947246bd568fbbe897630 fix: fix modelVersion param in TextAnalytics (#1756)
  • b797d6c2ec9cea652ba01fcc03ca91baed413b76 fix: make logging infrastructure consistent and add logging checks (#1755)
  • 557470bcec78acba3adcd9524337be628446311a fix: fix website sidebars and vulnerabilities in packages (#1753)
  • 9c98609a23bb82f4bfeb2bcabf0b7c9840061691 docs: update deepvision docs on website (#1752)
  • 629da631a263e46fbccd98d2026c73291087a46b refactor: move different cognitive services into sub packages (#1746)
  • 37f2e90dd948419a7b31ec1bcbafb079f7fdb9ea chore: fix clean acr (#1751)
  • c6cc0a88a14fbbce6296200d1d0b258cd4220c5f fix: Add docs for passThroughArgs (#1749)
  • b6ef511ab10ca8101d7637cbd912b0efdabfb9fa docs: Pinning binder to latest released version
  • 2a89e13870b136ffcfa5262956878bcfe8c6e578 feat: Delete CNTKand related utils (#1743)
  • 98add7a4b2c260660d4b92dabd9d5a8ee4bd67d4 chore: fix conda env creation (#1748)
  • 558f5d887302cb9b2af48d3a373002bba292eeb9 chore: bump spark to 3.2.3 (#1744)
  • 70843d5c13d995742c277ae13b015b85c348e0a8 chore: bump docusaurus (#1740)
  • 2d06b94683282770f126176e2a7ab9be1dd5db80 build: bump loader-utils from 2.0.3 to 2.0.4 in /website (#1719)
  • aa69541cd290301eedcd7f0e386498d22ea99b0d docs: removing beta tag from R

This list of changes was auto-generated.
