0.123.0
Release date: 2024-03-05 22:18:55
Latest jqnatividad/qsv release: 0.134.0 (2024-09-10 20:11:27)
OPEN DATA DAY 2024 Release! 🎉🎉🎉
In celebration of Open Data Day, we're releasing qsv 0.123.0 - the biggest release ever with 330+ commits! qsv 0.123.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.
We've been baking qsv pro for a while now, and it's almost ready for release. qsv pro is a cross-platform Desktop Data Wrangling tool marrying an Excel-like UI with the power of qsv, backed by a cloud-based data cleaning, enrichment and enhancement service. It is easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.
Stay tuned!
Highlights:
- `sqlp` now has an automatic `read_csv()` fast path optimization, often making optimized queries run dramatically faster. For example, a non-trivial SQL aggregation on an 18-column, 657 MB CSV with 7.43 million rows that took 6.09 seconds now takes just 0.14 seconds with the optimization - 🚀 43.5x FASTER 🚀 ! [^1]

[^1]: Measurements taken on an Apple Mac Mini 2023 model with an M2 Pro chip with 12 CPU cores & 32GB of RAM, running macOS Sonoma 14.4.
# with fast path optimization turned off
/usr/bin/time qsv sqlp taxi.csv --no-optimizations "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
6.09 real 6.82 user 0.16 sys
# with fast path optimization, fully exploiting Polars' multithreaded, mem-mapped CSV reader!
/usr/bin/time qsv sqlp taxi.csv "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
0.14 real 1.09 user 0.09 sys
# in contrast, csvq takes 72.46 seconds - 517.57x slower
/usr/bin/time csvq "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
+----------+---------------------+
| VendorID | SUM(total_amount) |
+----------+---------------------+
| 1 | 52377417.529256366 |
| 2 | 89959869.1264675 |
| 4 | 600584.6099999828 |
+----------+---------------------+
72.46 real 65.15 user 75.17 sys
"Traditional" SQL engines
qsv and csvq both operate on "bare" CSVs. For comparison, let's contrast qsv's performance against "traditional" SQL engines that require setup and import (aka ETL). Not counting setup and import time (which alone takes several minutes), we get:
sqlite 3.43.2 takes 2.910 seconds - 20.79x slower
sqlite> .timer on
sqlite> select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID;
1,52377417.53
2,89959869.13
4,600584.61
Run Time: real 2.910 user 2.569494 sys 0.272972
PostgreSQL 15.6 using PgAdmin 4 v6.12 takes 18.527 seconds - 132.34x slower
even with an index, qsv sqlp is still 5.96x faster
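
The "x slower" figures quoted above are each engine's wall-clock ("real") time divided by the 0.14 second fast path time. As a quick check, the arithmetic can be sketched as follows; the dictionary labels and the `slowdown` helper are illustrative names, and only the timings come from the output above.

```python
# Wall-clock timings in seconds, copied from the benchmark output above.
BASELINE = 0.14  # qsv sqlp with the fast path optimization enabled

timings = {
    "qsv sqlp --no-optimizations": 6.09,
    "csvq": 72.46,
    "sqlite 3.43.2": 2.910,
    "PostgreSQL 15.6": 18.527,
}


def slowdown(seconds: float, baseline: float = BASELINE) -> float:
    """Return how many times slower `seconds` is than `baseline`, to 2 decimals."""
    return round(seconds / baseline, 2)


# Ratio of each engine's time to the fast path time.
ratios = {name: slowdown(seconds) for name, seconds in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x slower than the fast path")
```

The ratios come out to 43.5, 517.57, 20.79 and 132.34, matching the figures in the text. The 5.96x figure for PostgreSQL with an index is not recomputed because the indexed query time is not given.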
- `sqlp` now supports JSONL output format and adds compression support for Avro and Arrow output formats.
- `fetch` now has a `--disk-cache` option, so you can cache web service responses to disk, complete with cache control and expiry handling!
- `jsonl` is now multithreaded with additional `--batch` and `--job` options.
- `split` now has three modes: split by record count, split by number of chunks and split by file size.
- `datefmt` is a new top-level command for date formatting. We extracted it from `apply` to make it easier to use, and to set the stage for expanded date and timezone handling.
- `enum` now has a `--start` option.
- `excel` now has a `--keep-zero-time` option and now has improved datetime/duration parsing/handling with upgrade of calamine from 0.23 to 0.24.
- `tojsonl` now has `--trim` and `--no-boolean` options and eliminated false positive boolean inferences.
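
To make the three `split` modes listed above concrete, here is a minimal Python sketch of the idea. It is not qsv's implementation (qsv is written in Rust and handles headers, indexes and output files). The function names are hypothetical, records are modeled as plain strings, and header bytes are ignored when sizing chunks.

```python
import math
from typing import Iterator, List

Row = str  # one serialized CSV record, without its trailing newline


def split_by_count(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Mode 1: consecutive chunks of at most `size` records."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def split_by_chunks(rows: List[Row], chunks: int) -> Iterator[List[Row]]:
    """Mode 2: at most `chunks` chunks, by deriving a per-chunk record count."""
    if not rows:
        return
    size = math.ceil(len(rows) / chunks)
    yield from split_by_count(rows, size)


def split_by_kb(rows: List[Row], kb_size: int) -> Iterator[List[Row]]:
    """Mode 3: greedily pack records so a chunk does not exceed kb_size KiB.

    A single record larger than the limit still gets a chunk of its own.
    """
    limit = kb_size * 1024
    chunk: List[Row] = []
    used = 0
    for row in rows:
        row_bytes = len(row.encode("utf-8")) + 1  # +1 for the newline
        if chunk and used + row_bytes > limit:
            yield chunk
            chunk, used = [], 0
        chunk.append(row)
        used += row_bytes
    if chunk:
        yield chunk


if __name__ == "__main__":
    rows = [f"{i},value" for i in range(10)]
    print([len(c) for c in split_by_count(rows, 4)])
    print([len(c) for c in split_by_chunks(rows, 3)])
    print([len(c) for c in split_by_kb(rows, 1)])
```

The size-based mode closes a chunk before adding the record that would push it over the limit, which is one simple way to keep every chunk at or under the requested size.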
Added
- `apply`: add `gender_guess` operation https://github.com/jqnatividad/qsv/pull/1569
- `datefmt`: new top-level command for date formatting. https://github.com/jqnatividad/qsv/pull/1638
- `enum`: add `--start` option https://github.com/jqnatividad/qsv/pull/1631
- `excel`: added `--keep-zero-time` option; improved datetime/duration parsing/handling with upgrade of calamine from 0.23 to 0.24 https://github.com/jqnatividad/qsv/pull/1595
- `fetch`: add `--disk-cache` option https://github.com/jqnatividad/qsv/pull/1621
- `jsonl`: major performance refactor! Now multithreaded with addl `--batch` and `--job` options https://github.com/jqnatividad/qsv/pull/1553
- `sniff`: added addl mimetype/file formats detected by bumping `file-format` from 0.23 to 0.24 https://github.com/jqnatividad/qsv/pull/1589
- `split`: add `<outdir>` error handling and add usage text examples https://github.com/jqnatividad/qsv/pull/1585
- `split`: added `--chunks` option https://github.com/jqnatividad/qsv/pull/1587
- `split`: add `--kb-size` option https://github.com/jqnatividad/qsv/pull/1613
- `sqlp`: added JSONL output format and compression support for AVRO and Arrow output formats in https://github.com/jqnatividad/qsv/pull/1635
- `tojsonl`: add `--trim` option https://github.com/jqnatividad/qsv/pull/1554
- Add QSV_DOTENV_PATH env var https://github.com/jqnatividad/qsv/pull/1562
- Add license scan report and status by @fossabot in https://github.com/jqnatividad/qsv/pull/1550
- Added several benchmarks for new/changed commands
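
The `fetch` disk cache listed above stores web service responses on disk and expires them. The general technique can be sketched in Python as below. This is an illustration under stated assumptions, not qsv's code: `DiskCache` and `cached_fetch` are hypothetical names, the on-disk layout (one JSON file per URL hash) is invented for the example, and a fixed time-to-live stands in for real cache-control handling.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable, Optional


class DiskCache:
    """Hypothetical file-per-URL response cache with a fixed time-to-live."""

    def __init__(self, directory: str, ttl_seconds: float) -> None:
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl_seconds

    def _path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.dir / f"{digest}.json"

    def get(self, url: str, now: Optional[float] = None) -> Optional[str]:
        """Return the cached body, or None if missing or expired."""
        path = self._path(url)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        now = time.time() if now is None else now
        if now - entry["stored_at"] > self.ttl:
            path.unlink()  # expired: drop the stale entry
            return None
        return entry["body"]

    def put(self, url: str, body: str, now: Optional[float] = None) -> None:
        """Store the body together with the time it was cached."""
        now = time.time() if now is None else now
        entry = {"stored_at": now, "body": body}
        self._path(url).write_text(json.dumps(entry), encoding="utf-8")


def cached_fetch(
    url: str,
    fetch: Callable[[str], str],
    cache: DiskCache,
    now: Optional[float] = None,
) -> str:
    """Serve from the cache when possible; otherwise call `fetch` and store the result."""
    body = cache.get(url, now)
    if body is None:
        body = fetch(url)
        cache.put(url, body, now)
    return body
```

Passing `now` explicitly keeps the expiry logic testable without sleeping; in normal use it is left as `None` and the wall clock is used.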
Changed
- `luau`: bumped Luau from 0.606 to 0.614
- `freq`: major performance refactor - https://github.com/jqnatividad/qsv/commit/1a3a4b4f54f7459ce120c2bc907385ad72d34d8e
- `split`: migrate to rayon from threadpool https://github.com/jqnatividad/qsv/pull/1555
- `split`: refactored to actually create chunks <= desired `--kb-size`, obviating need for hacky `--sep-factor` option https://github.com/jqnatividad/qsv/pull/1615
- `tojsonl`: improved true/false boolean inferencing false positive handling https://github.com/jqnatividad/qsv/pull/1641
- `tojsonl`: fine-tune boolean inferencing https://github.com/jqnatividad/qsv/pull/1643
- `schema`: use parallel sort when sorting enums for fields https://github.com/jqnatividad/qsv/commit/523c60a36bf45b4df5e66f3951a91948c22d5261
- Use array for rustflags to avoid conflicts with user flags by @clarfonthey in https://github.com/jqnatividad/qsv/pull/1548
- Make it easier and more consistent to package for distros by @alerque in https://github.com/jqnatividad/qsv/pull/1549
- Replace `simple_home_dir` with `simple_expand_tilde` crate https://github.com/jqnatividad/qsv/pull/1578
- build(deps): bump rayon from 1.8.0 to 1.8.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1547
- build(deps): bump rayon from 1.8.1 to 1.9.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1623
- build(deps): bump uuid from 1.6.1 to 1.7.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1551
- build(deps): bump jql-runner from 7.1.2 to 7.1.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1552
- build(deps): bump jql-runner from 7.1.3 to 7.1.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1602
- build(deps): bump jql-runner from 7.1.5 to 7.1.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1637
- build(deps): bump flexi_logger from 0.27.3 to 0.27.4 by @dependabot in https://github.com/jqnatividad/qsv/pull/1556
- build(deps): bump regex from 1.10.2 to 1.10.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1557
- build(deps): bump cached from 0.47.0 to 0.48.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1558
- build(deps): bump cached from 0.48.0 to 0.48.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1560
- build(deps): bump cached from 0.48.1 to 0.49.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1618
- build(deps): bump chrono from 0.4.31 to 0.4.32 by @dependabot in https://github.com/jqnatividad/qsv/pull/1559
- build(deps): bump chrono from 0.4.32 to 0.4.33 by @dependabot in https://github.com/jqnatividad/qsv/pull/1566
- build(deps): bump mlua from 0.9.4 to 0.9.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1565
- build(deps): bump mlua from 0.9.5 to 0.9.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1632
- build(deps): bump serde from 1.0.195 to 1.0.196 by @dependabot in https://github.com/jqnatividad/qsv/pull/1568
- build(deps): bump serde from 1.0.196 to 1.0.197 by @dependabot in https://github.com/jqnatividad/qsv/pull/1612
- build(deps): bump serde_json from 1.0.111 to 1.0.112 by @dependabot in https://github.com/jqnatividad/qsv/pull/1567
- build(deps): bump serde_json from 1.0.112 to 1.0.113 by @dependabot in https://github.com/jqnatividad/qsv/pull/1576
- build(deps): bump serde_json from 1.0.113 to 1.0.114 by @dependabot in https://github.com/jqnatividad/qsv/pull/1610
- bump Polars from 0.36 to 0.37 https://github.com/jqnatividad/qsv/pull/1570
- build(deps): bump polars from 0.37.0 to 0.38.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1629
- build(deps): bump polars from 0.38.0 to 0.38.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1634
- build(deps): bump strum from 0.25.0 to 0.26.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1572
- build(deps): bump indexmap from 2.1.0 to 2.2.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1575
- build(deps): bump indexmap from 2.2.1 to 2.2.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1579
- build(deps): bump indexmap from 2.2.2 to 2.2.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1601
- build(deps): bump indexmap from 2.2.4 to 2.2.5 by @dependabot in https://github.com/jqnatividad/qsv/pull/1633
- build(deps): bump robinraju/release-downloader from 1.8 to 1.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1574
- build(deps): bump itertools from 0.12.0 to 0.12.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1577
- build(deps): bump rust_decimal from 1.33.1 to 1.34.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1580
- build(deps): bump rust_decimal from 1.34.0 to 1.34.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1582
- build(deps): bump rust_decimal from 1.34.2 to 1.34.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1597
- build(deps): bump reqwest from 0.11.23 to 0.11.24 by @dependabot in https://github.com/jqnatividad/qsv/pull/1581
- build(deps): bump tokio from 1.35.1 to 1.36.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1583
- build(deps): bump tempfile from 3.9.0 to 3.10.0 by @dependabot in https://github.com/jqnatividad/qsv/pull/1590
- build(deps): bump tempfile from 3.10.0 to 3.10.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1622
- build(deps): bump indicatif from 0.17.7 to 0.17.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1598
- build(deps): bump csvs_convert from 0.8.8 to 0.8.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1596
- build(deps): bump ahash from 0.8.7 to 0.8.8 by @dependabot in https://github.com/jqnatividad/qsv/pull/1599
- build(deps): bump ahash from 0.8.8 to 0.8.9 by @dependabot in https://github.com/jqnatividad/qsv/pull/1611
- build(deps): bump ahash from 0.8.9 to 0.8.10 by @dependabot in https://github.com/jqnatividad/qsv/pull/1624
- build(deps): bump ahash from 0.8.10 to 0.8.11 by @dependabot in https://github.com/jqnatividad/qsv/pull/1640
- build(deps): bump governor from 0.6.0 to 0.6.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1603
- build(deps): bump semver from 1.0.21 to 1.0.22 by @dependabot in https://github.com/jqnatividad/qsv/pull/1606
- build(deps): bump ryu from 1.0.16 to 1.0.17 by @dependabot in https://github.com/jqnatividad/qsv/pull/1605
- build(deps): bump anyhow from 1.0.79 to 1.0.80 by @dependabot in https://github.com/jqnatividad/qsv/pull/1604
- build(deps): bump geosuggest-core from 0.6.0 to 0.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1607
- build(deps): bump geosuggest-utils from 0.6.0 to 0.6.1 by @dependabot in https://github.com/jqnatividad/qsv/pull/1608
- build(deps): bump pyo3 from 0.20.2 to 0.20.3 by @dependabot in https://github.com/jqnatividad/qsv/pull/1616
- build(deps): bump crossbeam-channel from 0.5.11 to 0.5.12 by @dependabot in https://github.com/jqnatividad/qsv/pull/1627
- build(deps): bump log from 0.4.20 to 0.4.21 by @dependabot in https://github.com/jqnatividad/qsv/pull/1628
- build(deps): bump sysinfo from 0.30.5 to 0.30.6 by @dependabot in https://github.com/jqnatividad/qsv/pull/1636
- build(deps): bump qsv-sniffer from 0.10.1 to 0.10.2 by @dependabot in https://github.com/jqnatividad/qsv/pull/1644
- deps: bump halfbrown from 0.24 to 0.25 https://github.com/jqnatividad/qsv/commit/b32fc7161715fc0d3cc96b1566f89354bea36abf
- apply select clippy suggestions
- update several indirect dependencies
- pin Rust nightly to 2024-02-23 - the nightly that Polars 0.38 can be built with
Fixed
- fix: fix feature = "cargo-clippy" deprecation by @rex4539 in https://github.com/jqnatividad/qsv/pull/1626
- `stats`: fixed cache.json file not being updated properly https://github.com/jqnatividad/qsv/commit/b9c43713b0943baf2d70eb7089e1d8f05b848b9d
Removed
- Removed `datefmt` subcommand from `apply` https://github.com/jqnatividad/qsv/pull/1638
New Contributors
- @clarfonthey made their first contribution in https://github.com/jqnatividad/qsv/pull/1548
- @alerque made their first contribution in https://github.com/jqnatividad/qsv/pull/1549
- @fossabot made their first contribution in https://github.com/jqnatividad/qsv/pull/1550
- @rex4539 made their first contribution in https://github.com/jqnatividad/qsv/pull/1626
Full Changelog: https://github.com/jqnatividad/qsv/compare/0.122.0...0.123.0
1. qsv-0.123.0-aarch64-apple-darwin.zip 100.5MB
2. qsv-0.123.0-aarch64-unknown-linux-gnu.zip 13.63MB
3. qsv-0.123.0-geocode-index.bincode 13.58MB
4. qsv-0.123.0-geocode-index.bincode.cities15000 13.58MB
5. qsv-0.123.0-geocode-index.bincode.cities15000.sz 2.35MB
6. qsv-0.123.0-i686-pc-windows-msvc.zip 13.27MB
7. qsv-0.123.0-i686-unknown-linux-gnu.zip 14.23MB
8. qsv-0.123.0-x86_64-apple-darwin.zip 110.99MB
9. qsv-0.123.0-x86_64-pc-windows-gnu.zip 30.66MB
10. qsv-0.123.0-x86_64-pc-windows-msvc.zip 115MB
11. qsv-0.123.0-x86_64-unknown-linux-gnu.zip 141.31MB
12. qsv-0.123.0-x86_64-unknown-linux-musl.zip 42.18MB
13. qsv-0.123.0.msi 31.6MB