0.125.0

jqnatividad/qsv

Release date: 2024-04-01 20:38:32

Latest release of jqnatividad/qsv: 0.134.0 (2024-09-10 20:11:27)

In this release, we focused on the 🏎️ need for even more speed 🏎️.

This was done primarily by tweaking several supporting qsv crates. qsv-docopt now parses command-line arguments slightly faster. qsv-stats, the crate behind commands like stats, schema, tojsonl, and frequency, has been further optimized for speed. qsv-dateparser has been updated to support new timezone handling options in datefmt. qsv-sniffer also got a speed boost.

Per the benchmark suite, stats is 25% faster (1.563 secs vs 2.067 secs) when computing the 13 "streaming" stats and 13% faster when computing --everything (17 columns of additional stats; 3.149 secs vs 3.656 secs) on the 1M-row, 41-column, 520 MB sample of NYC's 311 data.
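The percentages above can be reproduced from the quoted timings. The release notes do not say how "faster" is defined, so the sketch below assumes it means the reduction in elapsed time; the throughput-based figure is shown alongside for comparison. The function names are illustrative and not part of qsv.

```python
def percent_less_time(old_secs: float, new_secs: float) -> float:
    """Reduction in elapsed time, as a percentage of the old time."""
    return (old_secs - new_secs) / old_secs * 100.0


def percent_more_throughput(old_secs: float, new_secs: float) -> float:
    """Increase in work done per second, as a percentage of the old rate."""
    return (old_secs / new_secs - 1.0) * 100.0


# Timings quoted in the release notes (seconds): (old, new).
streaming = (2.067, 1.563)   # the 13 "streaming" stats
everything = (3.656, 3.149)  # stats --everything

streaming_reduction = percent_less_time(*streaming)
everything_reduction = percent_less_time(*everything)
streaming_throughput = percent_more_throughput(*streaming)
everything_throughput = percent_more_throughput(*everything)

print(f"streaming:  {streaming_reduction:.1f}% less time, "
      f"{streaming_throughput:.1f}% more throughput")
print(f"everything: {everything_reduction:.1f}% less time, "
      f"{everything_throughput:.1f}% more throughput")
```

Under the elapsed-time reading, the results land close to the quoted 25% and 13%; under the throughput reading the gains come out larger.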

The count command has been refactored to utilize Polars' SQLContext, which leverages LazyFrame evaluation to automagically count even very large files in just a few seconds. Previously, count was already using Polars, but it mistakenly fell back to a slower counting mode. Now, it consistently delivers fast performance, even without an index. On the same benchmark suite, it takes 0.052 secs vs 0.503 secs - almost 10x faster!

As count is not just a top-level command but also a helper that several other qsv commands rely on, this gives the entire suite a nice performance boost.

Continuing on the performance front, the excel command now has a new short --metadata mode, which returns a "shorter" version of the metadata report that lists only the workbook's top-level metadata (sheet index, sheet name, sheet type, visibility) instead of the full metadata report (which also has info like num rows, column metadata, etc.). On the benchmark suite, the short metadata report takes all of 0.005 secs vs 11.237 secs for the 1M row xlsx version of the same NYC 311 data - more than 3 orders of magnitude faster! (It may actually be faster still, since 0.005 secs is at the limit of what hyperfine can measure.)
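As a quick check of the "almost 10x" and "more than 3 orders of magnitude" claims, the speedup factor is the old time divided by the new time, and the number of orders of magnitude is the base-10 logarithm of that factor. This is a minimal sketch using only the timings quoted in these notes; the helper name is illustrative.

```python
import math


def speedup_factor(old_secs: float, new_secs: float) -> float:
    """How many times faster the new timing is than the old one."""
    return old_secs / new_secs


# Timings quoted in the release notes (seconds).
count_speedup = speedup_factor(0.503, 0.052)   # count, old vs new
excel_speedup = speedup_factor(11.237, 0.005)  # excel full vs short metadata

# Orders of magnitude = log10 of the speedup factor.
excel_orders = math.log10(excel_speedup)

print(f"count: {count_speedup:.1f}x faster")
print(f"excel short metadata: {excel_speedup:.0f}x faster "
      f"({excel_orders:.2f} orders of magnitude)")
```

Because the 0.005 secs figure sits at the measurement floor, the excel factor computed here is best read as a lower bound.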

The datefmt command also got some major enhancements with new timezone handling and timestamp parsing options, though at the cost of a small 15% performance penalty.

Lastly, we are excited to announce that qsv will be featured at the CSV,Conf,V8 conference in Puebla, Mexico on May 28-29. I'll be presenting a talk titled "qsv: A Blazing Fast CSV Data-Wrangling Toolkit". Hope to see you there!


Added

Changed

Fixed

Full Changelog: https://github.com/jqnatividad/qsv/compare/0.124.1...0.125.0

Related links: original page, download (tar), download (zip)

1. qsv-0.125.0-aarch64-apple-darwin.zip (119.92 MB)

2. qsv-0.125.0-aarch64-unknown-linux-gnu.zip (14.72 MB)

3. qsv-0.125.0-geocode-index.bincode (14.12 MB)

4. qsv-0.125.0-geocode-index.bincode.cities15000 (14.12 MB)

5. qsv-0.125.0-geocode-index.bincode.cities15000.sz (5.58 MB)

6. qsv-0.125.0-i686-pc-windows-msvc.zip (14.19 MB)

7. qsv-0.125.0-i686-unknown-linux-gnu.zip (15.21 MB)

8. qsv-0.125.0-x86_64-apple-darwin.zip (133.7 MB)

9. qsv-0.125.0-x86_64-pc-windows-gnu.zip (31.45 MB)

10. qsv-0.125.0-x86_64-pc-windows-msvc.zip (136.72 MB)

11. qsv-0.125.0-x86_64-unknown-linux-gnu.zip (190.27 MB)

12. qsv-0.125.0-x86_64-unknown-linux-musl.zip (57.34 MB)

13. qsv-0.125.0.msi (32.55 MB)
