
v0.3.1

NaiboWang/EasySpider

Release published: 2023-05-24 03:48:29

Latest NaiboWang/EasySpider release: v0.6.2 (2024-04-22 06:37:17)

If the download is slow, consider using the download address for mainland China.

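We strongly recommend watching the videos that explain the new features.

The feature videos for the latest version have been uploaded to Bilibili. They are very useful and recommended viewing.

[Important] Custom condition checks using the return value of a JS command inside a loop item (Part 2)

How to execute your own JS code and system code (custom operations)

How to customize loops and condition checks (Part 1)

How to take screenshots of elements and web pages, plus a guide to command-line execution (headless mode)

OCR recognition of element content

Note: the .json files in the task folder for v0.3.1 are incompatible with all previous versions. Please redesign your tasks for v0.3.1.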

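Update notes

  1. Advanced operations (illustrated by a screenshot in the original release notes).
  2. Conditions and loop conditions can now also run a custom script. Whether the script's return value is truthy decides the condition or loop check, which greatly increases what a task can do. Loops gain a setting for breaking out via code, and custom operations can act on elements inside the loop.
  3. Several XPath expressions are generated at once for the user to choose from, and the XPath Helper extension is pre-installed for debugging XPath.
  4. Added extraction of an element's background image URL, the current page title, and the current page URL.
  5. Added saving of element screenshots. Use it to capture a single element or the whole page (works best together with headless mode).
  6. Added image downloading.
  7. Added OCR recognition of elements (you must first install the Tesseract library yourself: https://blog.csdn.net/u010454030/article/details/80515501).
  8. The return value of JavaScript code executed on an element can be extracted directly, enabling things such as regular-expression matching or reading an element's background color.
  9. Added switching of dropdown options, and extraction of the currently selected value and text of a dropdown.
  10. Greatly expanded the in-app tips and explanations to make the software easier to use (for example, how to handle iframe tags, what each option's parameters mean, and how to modify the XPath of loop items).
  11. When a task is executed, a hint on running tasks from the command line is now shown: https://github.com/NaiboWang/EasySpider/wiki/Argument-Instruction
  12. Added headless mode, that is, a configuration that runs without a browser window.
  13. Fixed Chinese paths not being recognized correctly in user-configured browser mode.
  14. Fixed a hang when a conditional branch had no unconditional branch.
  15. Fixed the input box freezing after a task was saved.
  16. The open-page and click-element operations have a new setting for the maximum page load wait time.
  17. Added a move-mouse-to-element operation.
  18. A prompt is shown when an element cannot be found.
  19. Fixed a page scrolling bug.
  20. The task name is initialized to the title of the first page entered.
  21. Added version update notifications.
  22. Added publisher information, as requested.
  23. Updated Chrome to version 113.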

Update Instructions

  1. Advanced operations:

  2. Custom scripts are also supported in conditions and loop conditions. Whether the script's return value is truthy decides the outcome of the condition or loop check, which greatly enhances the flexibility of tasks. Loops can now be exited with a break from code, and custom operations can manipulate elements within the loop.

  3. Multiple XPath expressions are generated simultaneously for the user to choose from, and the XPath Helper extension is pre-installed for XPath debugging.

  4. Added extraction of the background image URL of elements, the current page title, and the current page URL.

  5. Added the ability to save screenshots of elements or entire web pages. This feature works best in headless mode.

  6. Added image downloading.

  7. Added OCR recognition of elements. To use this feature, the Tesseract library must be installed first: https://tesseract-ocr.github.io/tessdoc/Installation.html

  8. The return value of JavaScript code executed on elements can be extracted directly, enabling functionality such as regular-expression matching and obtaining the background color of elements.

  9. Added the ability to switch dropdown options and extract the selected value and text of dropdown options.

  10. Significantly improved user guidance and explanations to make the software more user-friendly. This includes instructions on handling iframe tags, explanations of the parameters of each option, and instructions on modifying the XPath of loop items.

  11. Added instructions on how to execute tasks from the command line.

  12. Added headless mode configuration, allowing the software to run without a browser interface.

  13. Fixed an issue where Chinese paths were not recognized correctly in user-configured browser mode.

  14. Fixed an issue where the program would freeze when a conditional branch had no unconditional branch.

  15. Fixed an issue where the input box would freeze after saving a task.

  16. Added an option to set the maximum page load waiting time in the "Open Page" and "Click Element" operations.

  17. Added the ability to move the mouse to an element.

  18. A prompt is displayed when an element cannot be found.

  19. Fixed a webpage scrolling bug.

  20. The task name is initialized to the title of the first page visited.

  21. Added version update prompts.

  22. Added publisher information, as requested.

  23. Updated Chrome to version 113.
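
The notes above mention extracting the return value of JavaScript run on an element (for regular-expression matching or reading a background color) and using a script's truthy return value as a condition. As a rough sketch of the kind of logic such scripts might contain, here are three small helpers. All function names are hypothetical and not part of EasySpider, and how EasySpider passes the selected element into a script is not stated in these notes, so the helpers take plain strings.

```javascript
// Hypothetical helpers for illustration only; none of these names
// come from EasySpider itself.

// Pull the first decimal number out of a piece of text, e.g. a price.
function firstNumber(text) {
  const match = String(text).match(/\d+(?:\.\d+)?/);
  return match ? match[0] : "";
}

// Convert a computed-style color such as "rgb(255, 0, 0)" to "#ff0000".
// Any alpha component is ignored.
function rgbToHex(rgb) {
  const parts = String(rgb).match(/\d+/g);
  if (!parts || parts.length < 3) return "";
  return "#" + parts.slice(0, 3)
    .map((n) => Number(n).toString(16).padStart(2, "0"))
    .join("");
}

// Condition-style check: a truthy return value means "condition met".
function hasNextPage(text) {
  return /next page/i.test(String(text));
}

// In a browser, the inputs could come from a DOM element (assumed here
// to be available as `element`), for example:
//   firstNumber(element.innerText)
//   rgbToHex(getComputedStyle(element).backgroundColor)
```

Keeping the string handling in pure functions like these makes it possible to check the logic outside the browser before pasting the relevant expression into a task.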


1. Download_Link_Address_in_China_Mainland.txt (514 B)
