v0.3.5
Release date: 2023-07-06 05:50:07
Latest NaiboWang/EasySpider release: v0.6.2 (2024-04-22 06:37:17)
If downloads are slow, consider using the download mirror for mainland China.
The Windows x64 build supports 64-bit Windows 10 and later. The Windows x86 build supports Windows 7 and later on both 32-bit and 64-bit systems, so users of 64-bit Windows 7 should also download the x86 build. Note that the Chrome browser bundled with the x86 build is pinned to version 109 and will not follow Chrome updates (to stay compatible with Windows 7). To collect data with the latest Chrome, run the x64 build on 64-bit Windows 10 or later.
The macOS build supports all chipsets, including Intel, M1, and M2 processors, but requires macOS 11.1 or later. For earlier macOS versions, download the v0.2.0 macOS build, or download the source code and compile it yourself; an example build procedure is described in this issue.
Similarly, the Linux build only supports Ubuntu 20.04 and later, Deepin, Debian, and their derivatives. To collect data on other Linux distributions, download the source code and compile it yourself.
Update Notes
- Speed: Collection speed is greatly improved in most scenarios.
- Variables: Anywhere JavaScript or system command code is written, as well as in the link pool for opening web pages, Field["parameter name"] can be used to refer to the most recently extracted value of that page parameter, which gives full variable support.
- Loop control: The "exit loop" option of a custom operation can be used anywhere inside a loop to exit it immediately, which adds Break functionality.
- Data extraction: Data inside <iframe> tags can be extracted.
- Task control: A running task can be paused and resumed by pressing and holding the p key.
- XPath debugging: XPath Helper can also be used to debug XPath during execution, in combination with the pause feature above.
- Data export and writing: Data can be exported to Excel/TXT files or written to a MySQL database, and field data types such as integer, decimal, or date can be specified; see the MySQL writing tutorial.
- Parameter handling: Input parameter values for a task call can be replaced with values read from an Excel file.
- Interface: The browser operation console can be resized by dragging its top-left corner.
- Data handling: Extracted fields can be set to not be saved (useful when a field is only needed as a variable input).
- Text input: In an input-text operation, <enter> or <ENTER> can be used to represent a hard return, that is, pressing Enter in the current text box after the text has been entered.
- Device simulation: Mobile browsers can be simulated.
- Cloudflare handling: For sites protected by Cloudflare, undetected_selenium (the uc library) can be used; see the video tutorial.
- XPath indexing: Added XPath hints that use last() to count from the end for the default index position.
- Wait time: The wait after an operation can be set to a random duration between 50% and 150% of the configured time.
- Source code included: The package ships with the Python source code so that advanced users can modify and debug task flows.
- Cookie handling: The advanced options of the "open webpage" operation support reading and modifying the current page's cookies.
- Click simulation: The element click method has been changed to genuinely simulate real-world mouse clicks.
- General parameter settings: These include the number of collected records after which data is written locally (default 10), the length of the data preview in the control bar (default 15), and others.
- File compression: Task files are now smaller.
- Name and location: The save name and location have changed.
- Flowchart updates: The flowchart is updated and saved automatically, with no need to click the OK button.
- Source code optimization: The source code has been optimized to make secondary development easier.
- Bug fixes: Error information is now printed when a system command fails; system command execution failures on macOS and Linux are fixed; URL format validation, incorrect index values for cumulatively incremented field names, and other bugs are fixed.
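As an illustration of the last() indexing mentioned in the notes above, here is a minimal sketch of XPath expressions that count from the end of a list. It uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath; the sample markup and variable names are made up for this example, and the hints EasySpider actually generates may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; not taken from EasySpider.
SAMPLE = "<ul><li>first</li><li>second</li><li>third</li></ul>"
root = ET.fromstring(SAMPLE)

# Index from the front: XPath positions start at 1.
first_item = root.find("li[1]")

# Index from the back: last() is the final position,
# last()-1 is the one before it.
last_item = root.find("li[last()]")
second_to_last = root.find("li[last()-1]")

print(first_item.text, second_to_last.text, last_item.text)
```

Counting from the end is useful when new items are prepended or the list length varies between pages, since the final element keeps the same expression.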
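The randomized wait described in the notes above (50% to 150% of the configured time) can be sketched as follows. This is only an illustration: the function name is hypothetical and the uniform distribution is an assumption, since the release notes do not say how EasySpider draws the random value.

```python
import random
from typing import Optional


def randomized_wait(base_seconds: float, rng: Optional[random.Random] = None) -> float:
    """Return a wait time between 50% and 150% of base_seconds.

    Assumption: the value is drawn uniformly; the release notes only
    state the 50%-150% range.
    """
    if base_seconds < 0:
        raise ValueError("base_seconds must be non-negative")
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.5 * base_seconds, 1.5 * base_seconds)


# Example: a configured wait of 2 seconds yields a value in [1.0, 3.0].
wait = randomized_wait(2.0)
print(f"waiting {wait:.2f} s")
```

Varying the delay around the configured value makes request timing less regular than a fixed wait.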
1. EasySpider_0.3.5_Linux_x64.tar.xz 239.8 MB
2. EasySpider_0.3.5_windows_x64.7z 240.91 MB
3. EasySpider_0.3.5_windows_x86.7z 199.42 MB