Paper translation assistant: calling the clipboard and Google Translate from Python 3

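My English is poor and even reading papers is a struggle, so Google Translate and the Eudic dictionary (欧陆词典) are my good friends.
Copying a paragraph from a PDF into Google Translate is the thing I do most often.
But deleting the line breaks by hand every time is really annoying.
So I wrote a small tool in Python.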

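What it does: reads the copied text from the Windows clipboard, cleans up the formatting, calls the Google Translate API, and returns the result.
Environment: Windows 10, Python 3.5.2 |Anaconda 4.2.0 (64-bit)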
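
The format-processing step can be isolated as a pure function, which makes it easy to try out without touching the clipboard. Below is a minimal sketch of that idea; the helper name `tidy_paragraph` and the abbreviation list are assumptions for illustration and are not part of the original script.

```python
import re

# Abbreviations whose trailing period should not end a sentence.
# This list is an assumption for illustration; extend it as needed.
ABBREVIATIONS = ("e.g.", "i.e.", "Eq.", "Mr.")

def tidy_paragraph(text, placeholder="\0"):
    """Join hard-wrapped PDF lines, then put one sentence on each line."""
    # Undo the hard line breaks that copying from a PDF introduces.
    text = text.replace("\r", "").replace("\n", " ")
    # Temporarily hide the space after each abbreviation.
    for abbr in ABBREVIATIONS:
        text = text.replace(abbr + " ", abbr + placeholder)
    # Break after every remaining sentence-ending period.
    text = text.replace(". ", ".\n")
    # Restore the hidden spaces and collapse runs of spaces.
    text = text.replace(placeholder, " ")
    return re.sub(r" {2,}", " ", text).strip()

sample = "Deep nets are\nstrong, e.g. ResNet. They keep\ngoing deeper."
print(tidy_paragraph(sample))
```

The placeholder trick is the same one the script uses: the space after an abbreviation is swapped for a character that cannot occur in normal text, so the later ". " replacement skips it.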

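The main program is Clipboard.py; run the tool from here. It handles reading and writing the clipboard and the format processing.
First, watch the exception handling around the clipboard: once opened, it must be closed with cb.CloseClipboard(), otherwise copy and paste stop working elsewhere (if you find that copy and paste have stopped working, closing Python fixes it).
Second, watch the encoding. In Python 3 every str is Unicode, so when reading from the clipboard choose the format win32con.CF_UNICODETEXT, not win32con.CF_TEXT. The latter returns bytes, and converting those to str causes plenty of trouble.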
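
One way to make the "always close what you open" rule hard to forget is a context manager. The sketch below is only an illustration of that pattern: `opened_clipboard` and `FakeClipboard` are hypothetical names, and the fake object stands in for win32clipboard so that the example runs on any platform.

```python
from contextlib import contextmanager

@contextmanager
def opened_clipboard(cb):
    """Open the clipboard and guarantee it is closed again, even on error."""
    cb.OpenClipboard()
    try:
        yield cb
    finally:
        cb.CloseClipboard()

class FakeClipboard:
    """Stand-in for win32clipboard so the sketch runs without pywin32."""
    def __init__(self):
        self.is_open = False
    def OpenClipboard(self):
        self.is_open = True
    def CloseClipboard(self):
        self.is_open = False
    def GetClipboardData(self, fmt):
        # Mimic the error raised when the clipboard holds no text.
        raise TypeError("no text on the clipboard")

fake = FakeClipboard()
try:
    with opened_clipboard(fake) as cb:
        cb.GetClipboardData(13)  # 13 is the value of win32con.CF_UNICODETEXT
except TypeError:
    pass
print(fake.is_open)  # False: the clipboard was closed despite the error
```

On Windows the real module could presumably be passed in the same way, for example `with opened_clipboard(win32clipboard) as cb:`.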

# -*- coding: utf-8 -*-
"""
Created on Fri Oct 19 10:48:45 2018

@author: BigFly
"""
import win32clipboard as cb
import win32con
from translate import google_translate

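def gettext():
    """Return the clipboard text, or None if the clipboard holds no text."""
    cb.OpenClipboard()
    try:
        return cb.GetClipboardData(win32con.CF_UNICODETEXT)
    except TypeError:
        print("There is NO TEXT in the clipboard.")
        return None
    finally:
        # Always release the clipboard, or copy and paste break elsewhere.
        cb.CloseClipboard()

def settext(aString):
    """Replace the clipboard contents with aString."""
    cb.OpenClipboard()
    try:
        cb.EmptyClipboard()
        cb.SetClipboardData(win32con.CF_UNICODETEXT, aString)
    except Exception as err:
        print("Error in settext():", err)
    finally:
        cb.CloseClipboard()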
    
# Remove parenthesized citations such as "(He et al., 2016)"
def deletBracket(source,flags,pad_sym=chr(0)):
    code={"(":1, ")":-1}
    index = [i for i in range(len(source)) if source[i]=="(" or source[i]==")"]
    # match: current nesting depth; start: index of the outermost "("
    match,start=0,-1
    for i in index:
        match+= code[ source[i] ]
        if start<0 and match==1:
            start = i
        if match==0:
            # source[start:i+1] is one complete top-level (...) group
            content=source[start: i+1]
            # Blank the group out with same-length padding so later indices stay valid
            if any(flag in content for flag in flags):
                source=source.replace(content,pad_sym*len(content),1)
            start=-1
    return source.replace(pad_sym,"")
    
source= gettext()
if source:
    source= source.replace(chr(0),"")
    # Join hard-wrapped lines
    source=source.replace("\r","")
    source=source.replace("\n"," ")
    # Split into sentences, protecting abbreviations with a placeholder
    pad_sym=chr(0)
    source=source.replace("e.g. ","e.g."+pad_sym)
    source=source.replace("i.e. ","i.e."+pad_sym)
    source=source.replace("Eq. ","Eq."+pad_sym)
    source=source.replace("Mr. ","Mr."+pad_sym)
    
    source=source.replace(". ",". \r\n")
    source=source.replace(pad_sym," ")
    # Strip parenthesized citations
    source=deletBracket(source,["et al.", ", 201", ", 200", ", 199"],pad_sym)
    source=source.replace("  "," ")
    
    settext(source)
    print(source)
    print("[ %d ]"%(len(source)))
    print(google_translate(source))

'''

Our architectures
will have only one representation at one resolution besides
the pooling layers and the convolutional layers that initialize
the needed numbers of channels. Take the architecture in
Table 1 as an example. There are two processes for each
resolution. The first one is the transition process, which
computes the initial features with the dimensions of the next
resolution, then down samples it to 1=4 using a 2×2 average
pooling. A convolutional operation is needed here because
F is assumed to have the same input and output sizes. The
next process is using GUNN to update this feature space
gradually. Each channel will only be updated once, and all
channels will be updated after this process. Unlike most of
the previous networks, after this two processes, the feature
transformations at this resolution are complete. There will
be no more convolutional layers or blocks following this feature representation, i.e., one resolution, one representation.
Then, the network will compute the initial features for the
next resolution, or compute the final vector representation of
the entire image by a global average pooling. By designing
networks in this way, SUNN networks usually have about
20 layers before converting to GUNN-based networks.
'''
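
The citation-stripping step in deletBracket can also be written as a single pass with a nesting counter, which avoids the placeholder padding. This is an alternative sketch, not the original implementation; the name `strip_citations` and the default marker tuple are assumptions for illustration.

```python
def strip_citations(text, markers=("et al.", ", 201", ", 200", ", 199")):
    """Drop top-level (...) groups that contain any citation marker."""
    out = []
    depth = 0
    start = 0  # index where the current top-level group opened
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                group = text[start:i + 1]
                # Keep the group unless it looks like a citation.
                if not any(m in group for m in markers):
                    out.append(group)
        elif depth == 0:
            # Ordinary character (or a stray ")") outside any group.
            out.append(ch)
    if depth > 0:
        # Unbalanced "(": keep the unfinished tail untouched.
        out.append(text[start:])
    return "".join(out)

print(strip_citations("Networks go deeper (He et al., 2016) as in Fig. 2 (left)."))
```

Like the original, this leaves a double space where a citation was removed, so the later `replace("  "," ")` step is still needed.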

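The program that calls Google Translate is ready-made code I found online and modified slightly.
Original: https://blog.csdn.net/yingshukun/article/details/53470424

translate.py
I changed how the returned data is handled:
The parsed response is a list of length 9. Its element [0] holds the translation; the rest is alternative translations and other things we don't need.
That element is itself a list whose length is the number of lines or sentences plus 1; its last entry is the pinyin of the translation.
Joining the translated text from all entries except the last gives exactly what we want (in the code, the variable result already holds this inner list, hence result[:-1]).
This file can be run directly to test the translation.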
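
To make the response handling concrete, here is a small sketch run on a hand-made list. The data is a made-up stand-in whose shape follows the description above (translated segments first, pinyin entry last); it is not a captured response, and `join_translation` is a hypothetical helper name.

```python
def join_translation(inner):
    """Concatenate the translated segments, skipping the trailing pinyin entry."""
    return "".join(segment[0] for segment in inner[:-1])

# Hand-made stand-in for the inner list; a real response carries more fields per entry.
mock_inner = [
    ["一位老妇人养了一只猫。", "An old woman had a cat. "],
    ["猫很老了。", "The cat was very old."],
    [None, None, "Yī wèi lǎo fùrén yǎngle yī zhī māo. Māo hěn lǎole."],
]
print(join_translation(mock_inner))
```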

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 23 18:58:26 2018

@author: BigFly
"""

import requests  
from HandleJs import Py4Js    

js=Py4Js()

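def google_translate(content):
    if len(content) > 4891:
        print("The text exceeds the translation length limit!")
        return None
    param = {
        'client': 't', 'sl': 'en', 'tl': 'zh-CN', 'hl': 'zh-CN',
        'dt': ['at', 'bd', 'ex', 'ld', 'md', 'qca', 'rw', 'rm', 'ss', 't'],
        'ie': 'UTF-8', 'oe': 'UTF-8', 'clearbtn': 1, 'otf': 1, 'pc': 1,
        'srcrom': 0, 'ssel': 0, 'tsel': 0, 'kc': 2,
        'tk': js.getTk(content), 'q': content,
    }
    # Pass every query field through params so that requests handles the URL encoding.
    result = requests.get("http://translate.google.cn/translate_a/single", params=param).json()[0]
    # The response is JSON that parses into a nested list; the last entry is the pinyin, so skip it.
    return "".join([text[0] for text in result[:-1]])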

if __name__ == "__main__":    
    content = """An old woman had a cat. 
The cat was very old; she could not run quickly, and she could not bite, because she was so old. 
One day the old cat saw a mouse; she jumped and caught the mouse. 
But she could not bite it; so the mouse got out of her mouth and ran away, because the cat could not bite it.
Then the old woman became very angry because the cat had not killed the mouse. 
She began to hit the cat. The cat said, "Do not hit your old servant. 
I have worked for you for many years, and I would work for you still, but I am too old. 
Do not be unkind to the old, but remember what good work the old did when they were young."""
    print(google_translate(content))
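
Since google_translate refuses input longer than 4891 characters, a long passage could be split on the sentence line breaks that Clipboard.py already inserts. A minimal sketch of that idea; `split_for_translation` is a hypothetical helper, not part of the original files.

```python
def split_for_translation(text, limit=4891):
    """Group sentence lines into chunks that each stay within the limit.

    A single line longer than the limit is kept as its own oversized chunk.
    """
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit.
        if current and len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

demo = "aaaa\nbbbb\ncccc\n"
print(split_for_translation(demo, limit=10))
```

Each chunk could then be passed to google_translate in turn and the results joined.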

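HandleJs.py
This part uses JavaScript to generate the tk code. The tk value is computed from the text submitted for translation, so it presumably acts as a kind of checksum; I don't know the details.
Note that the execjs module is installed under the name PyExecJS: pip install PyExecJS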

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 23 18:57:54 2018

@author: BigFly
"""
import execjs
 
class Py4Js():
    def __init__(self):
        self.ctx = execjs.compile("""
        function TL(a) {
        var k = "";
        var b = 406644;
        var b1 = 3293161072;
        
        var jd = ".";
        var $b = "+-a^+6";
        var Zb = "+-3^+b+-f";
    
        for (var e = [], f = 0, g = 0; g < a.length; g++) {
            var m = a.charCodeAt(g);
            128 > m ? e[f++] = m : (2048 > m ? e[f++] = m >> 6 | 192 : (55296 == (m & 64512) && g + 1 < a.length && 56320 == (a.charCodeAt(g + 1) & 64512) ? (m = 65536 + ((m & 1023) << 10) + (a.charCodeAt(++g) & 1023),
            e[f++] = m >> 18 | 240,
            e[f++] = m >> 12 & 63 | 128) : e[f++] = m >> 12 | 224,
            e[f++] = m >> 6 & 63 | 128),
            e[f++] = m & 63 | 128)
        }
        a = b;
        for (f = 0; f < e.length; f++) a += e[f],
        a = RL(a, $b);
        a = RL(a, Zb);
        a ^= b1 || 0;
        0 > a && (a = (a & 2147483647) + 2147483648);
        a %= 1E6;
        return a.toString() + jd + (a ^ b)
    };
    function RL(a, b) {
        var t = "a";
        var Yb = "+";
        for (var c = 0; c < b.length - 2; c += 3) {
            var d = b.charAt(c + 2),
            d = d >= t ? d.charCodeAt(0) - 87 : Number(d),
            d = b.charAt(c + 1) == Yb ? a >>> d: a << d;
            a = b.charAt(c) == Yb ? a + d & 4294967295 : a ^ d
        }
        return a
    }
    """)
        
    def getTk(self,text):
        return self.ctx.call("TL",text)
    

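Demo:

Select the text in the PDF and copy it.

Run Clipboard.py, and both the English and the Chinese results come out. One sentence per line, with the parenthesized citations removed: nice and clean.

The formatted English is also put back on the clipboard, so it can be pasted directly elsewhere (this is handy for making PPT slides):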

Deep neural networks have become the state-of-the-art systems for image recognition as well as other vision tasks .
The architectures keep going deeper, e.g., from five convolutional layers to 1001 layers .
The benefit of deep architectures is their strong learning capacities because each new layer can potentially introduce more non-linearities and typically uses larger receptive fields .
In addition, adding certain types of layers will not harm the performance theoretically since they can just learn identity mapping.
This makes stacking up layers more appealing in the network designs.

Hmm, I still need to study English properly and not rely on this.

© Copyright belongs to the author. For reprints or content cooperation, please contact the author.