webspot automatic extraction test

Test demo

from webspot.constants.html_request_method import HTML_REQUEST_METHOD_REQUEST
from webspot.detect.detectors.plain_list import run_plain_list_detector

# Usage: run_plain_list_detector(<your URL>, HTML_REQUEST_METHOD_REQUEST)
t = run_plain_list_detector("https://cuiqingcai.com", HTML_REQUEST_METHOD_REQUEST)
print(t.results)

On Python 3.10 and 3.11, the following changes are needed:

--- a/webspot/detect/models/result.py
+++ b/webspot/detect/models/result.py
@@ -6,7 +6,7 @@ from webspot.detect.models.selector import Selector


 class Result(BaseModel):
-    name: Optional[str]
+    name: Optional[str] = ""
     selectors: Optional[Dict[str, Selector]]
     score: Optional[float]
     scores: Optional[Dict[str, Optional[float]]]

--------------------------------------------------------------------------------

--- a/webspot/detect/models/selector.py
+++ b/webspot/detect/models/selector.py
@@ -7,5 +7,5 @@ class Selector(BaseModel):
     name: str
     selector: str
     type: Optional[str]
-    attribute: Optional[str]
-    node_id: Optional[int]
+    attribute: Optional[str] = ""
+    node_id: Optional[int] = 0
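
Why the patch helps can be sketched with the standard library alone. The example below uses `dataclasses` as a stand-in for the `BaseModel` classes in the diff, and it rests on an assumption the post does not confirm: that the failure comes from `Optional[...]` fields without a default being treated as required (pydantic 2 behaves this way). The class names `SelectorBefore` and `SelectorAfter` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SelectorBefore:
    # Mirrors the unpatched fields: Optional type, but no default value.
    name: str
    selector: str
    attribute: Optional[str]
    node_id: Optional[int]


@dataclass
class SelectorAfter:
    # Mirrors the patched fields: same types, now with explicit defaults.
    name: str
    selector: str
    attribute: Optional[str] = ""
    node_id: Optional[int] = 0


def can_build(cls) -> bool:
    """Return True if cls can be constructed from name and selector alone."""
    try:
        cls(name="title", selector="a.title")  # hypothetical sample values
        return True
    except TypeError:
        # Raised when a field without a default is not supplied.
        return False


print(can_build(SelectorBefore))  # False
print(can_build(SelectorAfter))   # True
```

`Optional[str]` only says that `None` is an allowed value; it does not make the field optional to pass. Giving each field an explicit default, as the diff does, is what lets the object be built when the detector omits it.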