The Python part of the automatic translator
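I. Design
1. Qt grabs the clipboard contents or the mouse selection as the text to translate.
2. Python submits that text to Baidu Translate and retrieves the result.
3. Qt displays the translation.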
II. Implementation steps
Jupyter is already installed as the development environment.
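- First, analyze the Baidu Translate request interfaces
There are three candidate interfaces:
- The address bar
https://fanyi.baidu.com/translate?aldtype=16047&query=&keyfrom=baidu&smartresult=dict&lang=auto2zh#en/zh/world
Here world is the word being looked up.
- https://fanyi.baidu.com/sug
This endpoint backs the drop-down list of words that Baidu suggests automatically.
- https://fanyi.baidu.com/v2transapi
This is the actual Baidu Translate endpoint.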
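As a rough illustration of the first (address bar) interface, the sketch below builds such a URL with the standard library. The names build_translate_url, src and dst are hypothetical, and the sketch uses the shorter https://fanyi.baidu.com/translate#en/zh/ form that the wrapper class later in this post opens, on the assumption that the extra query parameters in the captured URL are optional.

```python
from urllib.parse import quote

BASE_URL = 'https://fanyi.baidu.com/translate'


def build_translate_url(text, src='en', dst='zh'):
    """Return an address-bar URL whose fragment is '<src>/<dst>/<text>'."""
    # The text to translate sits in the URL fragment, so percent-encode
    # every reserved character (spaces, slashes, quotes, ...).
    return '{0}#{1}/{2}/{3}'.format(BASE_URL, src, dst, quote(text, safe=''))


print(build_translate_url('world'))
print(build_translate_url('hello world'))
```

Building the URL is the easy part; as discussed in this post, the page behind it is rendered by JavaScript, so fetching it with a plain HTTP client would not yield the translation.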
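Each of the three has its pros and cons:
- The first has a simple URL and is easy to use, but it returns an entire web page from which the translation would have to be extracted. Because that page is rendered dynamically, the returned HTML does not actually contain the translation.
- The second is not a real translation. It only returns entries that Baidu considers similar to the input, which may not be what is wanted, and for a whole sentence it returns nothing.
- The third is the real translation endpoint, but it requires a detailed form payload, shown below. Most fields are straightforward, but sign is troublesome: it is generated dynamically by obfuscated JavaScript on the page, so we cannot easily reproduce it.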
from: en
to: zh
query: world
transtype: realtime
simple_means_flag: 3
sign: 335290.130699
token: fcd815f24ac02a1ddc7c485f38c8efe8
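To make the shape of that payload concrete, here is a minimal sketch that assembles the captured fields into a form body. The function name build_form is hypothetical, and the sign and token values are simply the ones copied from the capture above; they are presumably tied to that particular query and session, so this shows only the request format and is not expected to produce a working request.

```python
from urllib.parse import urlencode


def build_form(query, sign, token, src='en', dst='zh'):
    """Assemble the form fields seen in the captured v2transapi request."""
    return {
        'from': src,
        'to': dst,
        'query': query,
        'transtype': 'realtime',
        'simple_means_flag': '3',
        'sign': sign,    # generated by JavaScript on the page
        'token': token,  # value taken from the capture above
    }


form = build_form('world', '335290.130699', 'fcd815f24ac02a1ddc7c485f38c8efe8')
body = urlencode(form)  # application/x-www-form-urlencoded body
print(body)
```

Every field except sign can be filled in mechanically, which is why sign is the real obstacle to calling this endpoint directly.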
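All things considered, we give up on calling any of the three directly.
For dynamically rendered pages, Python offers a number of libraries that drive a real browser, such as Selenium.
- Using Selenium
First install selenium from the command line: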
pip install selenium -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
- Import the modules
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
- Wrapper class
class Translation:
    def __init__(self):
        self.options = webdriver.ChromeOptions()
        self.options.add_argument('headless')  # run Chrome in the background
        # self.options.add_experimental_option("excludeSwitches", ["ignore-certificate-errors"])
        # Disable image loading
        self.prefs = {"profile.managed_default_content_settings.images": 2}
        self.options.add_experimental_option("prefs", self.prefs)
        # executable_path works on Selenium 3; newer Selenium 4 releases take a Service object instead
        self.browser = webdriver.Chrome(executable_path='chromedriver.exe', options=self.options)
        self.load = False  # whether the translation page has been opened yet

    def __del__(self):
        self.browser.quit()  # close the window and stop chromedriver

    def translate(self, words):
        try:
            if not self.load:
                self.load = True
                self.browser.get('https://fanyi.baidu.com/translate#en/zh/')
            self.input = self.browser.find_element(By.ID, 'baidu_translate_input')
            self.input.clear()
            self.input.send_keys(words)
            self.button = self.browser.find_element(By.ID, 'translate-button')
            self.button.click()
            time.sleep(1)  # give the page time to render the result
            self.out = self.browser.find_element(By.CLASS_NAME, 'output-bd')
            return self.out.text
        except Exception:
            # A finally block that returns '' would override the result above,
            # so the empty string is returned only when something failed.
            # print("translate [ {0} ] error.".format(words))
            return ''
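The fixed time.sleep(1) in translate may be too short on a slow connection and needlessly long on a fast one. One possible alternative, sketched here in plain Python so that it runs without a browser, is to poll until the output element contains text or a timeout expires. The name wait_for_text and its parameters are hypothetical; Selenium's own WebDriverWait, already imported above, offers the same polling pattern.

```python
import time


def wait_for_text(read_text, timeout=10.0, interval=0.2):
    """Poll read_text() until it returns a non-empty string.

    Returns '' if nothing showed up before the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            text = read_text()
        except Exception:
            text = ''  # e.g. the element is not in the page yet
        if text:
            return text
        if time.monotonic() >= deadline:
            return ''
        time.sleep(interval)


# Hypothetical use inside translate(), in place of time.sleep(1):
#   text = wait_for_text(
#       lambda: self.browser.find_element(By.CLASS_NAME, 'output-bd').text)
```

One caveat with this approach: if the output element still holds the previous translation, the poll returns that stale text immediately, so the previous result would need to be cleared or compared first.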
- Test code
if __name__ == '__main__':
    words = [
        '0',
        'Taylor was nominated for a Primetime Emmy Award last year for portraying Minnie in the latest Mickey Mouse TV show rendition ',
        '"I really want whoever comes after us to be aware of the history and the tradition, and to love the characters as much as we do," Taylor said about herself and Allwine, according to Disney.',
        '"Minnie Mouse lost her voice with the passing of Russi Taylor," Bob Iger, Disney Chairman and CEO, wrote on Twitter.'
    ]
    print('---start-')
    translation = Translation()
    for w in words:
        print('----', w)
        result = translation.translate(w)
        print(result)
    del translation
- Test results
---start-
---- 0
---- Taylor was nominated for a Primetime Emmy Award last year for portraying Minnie in the latest Mickey Mouse TV show rendition
泰勒去年因在最新的米奇老鼠电视节目“表演”中饰演米妮而获得艾美奖的提名。
---- "I really want whoever comes after us to be aware of the history and the tradition, and to love the characters as much as we do," Taylor said about herself and Allwine, according to Disney.
“我真的希望任何一个追随我们的人都能意识到历史和传统,并且像我们一样热爱这些角色,”根据迪士尼的说法,泰勒在谈到自己和奥尔温时说。
---- "Minnie Mouse lost her voice with the passing of Russi Taylor," Bob Iger, Disney Chairman and CEO, wrote on Twitter.
迪斯尼董事长兼首席执行官鲍勃•伊格尔在Twitter上写道:“米妮•老鼠在路西•泰勒去世后失去了声音。”
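The first item in the list is a throwaway '0'. For reasons not yet understood, the page refreshes during the first translation and no result comes back, so a dummy first request is sent to absorb that failure.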
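Instead of sacrificing a dummy request, one option might be to retry whenever the result comes back empty. The sketch below is only an illustration: translate_with_retry and attempts are hypothetical names, and it assumes the translate callable returns an empty string when it fails to obtain a result.

```python
def translate_with_retry(translate, text, attempts=3):
    """Call translate(text) until it returns a non-empty result.

    Returns '' if every attempt came back empty.
    """
    for _ in range(attempts):
        result = translate(text)
        if result:
            return result
    return ''


# Hypothetical use with the wrapper class:
#   translation = Translation()
#   print(translate_with_retry(translation.translate, 'world'))
```

With this in place the first real sentence would simply be submitted a second time if the initial page refresh swallows it.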