v3.0.0
Released: 2022-07-21 16:10:44
Latest INL/BlackLab release: v3.0.1 (2022-10-06 19:11:25)
Changed
- Minimum Java version was raised from 8 to 11.
- Based on Lucene 8. Thanks to @zhyongwei for the initial version update. Further changes were made to how DocValues are used, as this API is now sequential instead of random-access.
- Smarter default config values based on the number of CPU cores and the maximum heap memory. When a setting is missing, a debug message shows which default value was chosen and how it was determined.
- Corpora larger than 2^31 tokens are now supported. The few operations that don't support this yet will produce a clear error message. This functionality can be disabled with the `search.enableHugeResultSets` setting (default `true`), which might slightly improve performance.
- Warn if an annotation named 'word' or 'lemma' has no explicit `sensitivity` declared. Due to a special case, these will automatically get sensitivity `sensitive_insensitive`, but this quirk is deprecated and should not be relied upon.
- Clearer error message if no `indexLocations` were found.
- BLS now resolves symlinks while scanning indexLocations.
- BLS now allows dots in index names (in addition to underscore and dash).
- DocIndexerXPath now throws an exception if it encounters a non-UTF8 doc.
- FileProcessor should now handle files larger than 4 GB (although such files may lead to other problems, e.g. excessive memory use).
- When a search is interrupted, there should now be a better indication of why.
- Stack trace should be included in more error responses if in debug mode.
- 'Unauthorized to view content' error now refers to documentation.
- If a format config contains an error, report the file it occurs in.
- Document that the first annotation declared becomes the main annotation.
- BLS now also looks at the `X-Forwarded-For` header to determine debug mode.
- BLS now accepts wildcards in the debug mode IP configuration.
- Update Jackson, revert YAML bug workaround.
- Improve how search/count times are reported in BLS.
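
To illustrate why the 2^31 boundary in the list above matters: a Java `int` tops out at 2^31 - 1, so token positions in a larger corpus only fit in a `long`, and operations that still narrow to `int` have to fail loudly rather than wrap around. The sketch below is not BlackLab code; the class and method names are hypothetical and exist only for this example.

```java
public class TokenCountSketch {
    /** Largest token count whose positions still fit in a Java int (2^31 - 1). */
    static final long MAX_INT_ADDRESSABLE = Integer.MAX_VALUE;

    /** Returns true if a corpus of this many tokens needs 64-bit positions. */
    static boolean needsLongPositions(long tokenCount) {
        return tokenCount > MAX_INT_ADDRESSABLE;
    }

    /** Narrows a position to int, throwing instead of silently wrapping around. */
    static int toIntPosition(long position) {
        return Math.toIntExact(position); // ArithmeticException on overflow
    }

    public static void main(String[] args) {
        long threeBillion = 3_000_000_000L;
        System.out.println(needsLongPositions(threeBillion)); // prints "true"
        System.out.println((int) threeBillion); // unchecked cast wraps: prints "-1294967296"
        try {
            toIntPosition(threeBillion);
        } catch (ArithmeticException e) {
            System.out.println("clear error instead of a wrapped value");
        }
    }
}
```

The checked conversion mirrors the "clear error message" behaviour described for operations that do not yet support huge corpora, under the assumption that such operations still use 32-bit indices internally.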
New
- Added `naf` (NLP Annotation Format) to the builtin formats.
- `FrequencyTool` is a command-line tool that allows you to get frequency lists for an entire corpus.
Java API
- `Hits`, `HitsInternal(Mutable)`, CapturedGroups and other interfaces refactored to make (im)mutability more explicit.
- `Doc` and `DocImpl` classes were removed. Now that we use `DocValues` everywhere, caching Lucene documents doesn't make sense.
- Searches should no longer get stuck queued even if maxConcurrentSearches is set to a low value.
Fixed
- Fix usecontent=orig with outputformat=json
- Fix metadata value frequency reading, which due to a bug with how YAML was handled would all be read back as 0.
- Fix an issue where `HitProperty.contextIndices` would seemingly change during a sort operation.
- Prevent NPE if no patt specified with `/hits` request.
- Fix `hitsProcessedAtLeast()` method not always blocking. It may not be clear from the name, but this method will wait for the specified amount of hits to be processed, or will return `false` if all hits were processed and there were fewer than that amount.
- Fix NPE for malformed sort string like `docid,`.
- Don't hardcode "word" as the main annotation.
- Fix errors when running tests in parallel.
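
The blocking contract described for `hitsProcessedAtLeast()` in the list above can be sketched with a plain Java monitor. This is an illustration of the described semantics only, not BlackLab's implementation: the class `HitCounterSketch` and the methods `hitProcessed()` and `finish()` are hypothetical.

```java
public class HitCounterSketch {
    private long processed = 0;   // hits processed so far
    private boolean done = false; // set once the search has no more hits

    /** Called by the (hypothetical) search thread after each processed hit. */
    public synchronized void hitProcessed() {
        processed++;
        notifyAll();
    }

    /** Called once when all hits have been processed. */
    public synchronized void finish() {
        done = true;
        notifyAll();
    }

    /**
     * Blocks until at least n hits were processed (returns true), or until
     * the search finished with fewer than n hits (returns false).
     */
    public synchronized boolean hitsProcessedAtLeast(long n) throws InterruptedException {
        while (processed < n && !done) {
            wait();
        }
        return processed >= n;
    }

    public static void main(String[] args) throws InterruptedException {
        HitCounterSketch hits = new HitCounterSketch();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < 3; i++) {
                hits.hitProcessed();
            }
            hits.finish();
        });
        producer.start();
        System.out.println(hits.hitsProcessedAtLeast(2)); // prints "true"
        System.out.println(hits.hitsProcessedAtLeast(5)); // prints "false"
        producer.join();
    }
}
```

The loop re-checks its condition after every wake-up, so the caller returns only when enough hits exist or the search is finished, which is the "always blocking" behaviour the fix refers to.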
Removed
- Support for previous BlackLab indexes (because Lucene 8 cannot read Lucene 5 indexes). You must reindex your data to use this version. If this is impractical, please keep using v2.3.0 for now. We would like to provide a conversion tool at some point.
- Support for obsolete content store and forward index files (cs types "utf8" and "utf8zip", fi version 3). These were all replaced with newer versions six years ago; older indexes will need to be re-indexed.
- Some deprecated settings. A warning will be shown if the setting is still found.
- Deprecated methods from `Indexer`, among others.
1. blacklab-core-3.0.0.zip (119.44 MB)
2. blacklab-server-3.0.0.war (116.25 MB)