Performance

Often minor changes will fix any performance problems in well-designed code. while badly-designed code will require major rewriting.

7.1 A Bottleneck

When solving problems, it's important to ask the right question.

7.2 Timing and Profiling

Automate timing measurement

Most systems have a command to measure how long a program takes. On Unix, the command is called time:

% time slowprogram

real   7.0
user  6.2
sys    0.1
%

This runs the command and reports three numbers, all in seconds: "real" time, the elapsed time for the program to complete; "user" CPU time. time spent executing the user's program; and "system" CPU time, time spent within the operating system on the program's behalf. If your system has a similar command, use it; the numbers will be more informative, reliable, and easier to track than time measured with a stop-watch. And keep good notes.

C and C++ provide a standard routine, clock, that reports how much CPU time the program has consumed so far. It can be called before and after a function to measure CPU usage:

#include<time.h>
#include<stdio.h>
...
clock_t before;
double elapsed;

before=clock();
long_running_function();
elapsed=clock()-before;
printf("function used %.3f seconds\n",elapsed/CLOCKS_PER_SEC);

The scaling term, CLOCKS-PER-SEC, records the resolution of the timer as reported by clock. If the function takes only a small fraction of a second, run it in a loop. but be sure to compensate for loop overhead if that is significant:

before = clock(); 
for (i = 0; i < 1000; i++)
 short_running-function() ;
elapsed = (clock()-before) / (double)i;

In Java, functions in the Date class give wall clock time, which is an approximation to CPU time:

Date before = new Date(); 
long-running_function() ; 
Date after = new Date(); 
long elapsed = after.getTime() - before.getTime();

The return value of getTime is in milliseconds.

Use a profiler

Besides a reliable timing method, the most important tool for performance analysis is a system for generating profiles. A profile is a measurement of where a program spends its time. Some profiles list each function, the number of times it is called, and the fraction of execution time it consumes. Others show counts of how many times each statement was executed. Statements that are executed frequently contribute more to run-time, while statements that are never executed may indicate useless code or code that is not being tested adequately.

Profiling is an effective tool for finding hot spots in a program.The statistics in a profile can be only approximate. (considering sophistication of compilers and the complexity of caching and memory effects,and the fact that profiling a program affects its performance)

The way to use profiling is to identify the critical time-consuming parts of the program, improve them to the degree possible, and then measure again to see if a new hot spot has surfaced. Eventually, often after only one or two iterations. there is no obvious hot spot left.

Profiling is usually enabled with a special compiler flag or option. The program is run, and then an analysis tool shows the results. On Unix, the flag is usually -p and the tool is called prof:

% cc -p spamtest.c -o spamtest 
% spamtest
% prof spamtest
*Concentrate on the hot spots. *

When a single function is so overwhelmingly the bottleneck, there are only two ways to go: improve the function to use a better algorithm, or eliminate the function altogether by rewriting the surrounding program.

Draw a picture.

Pictures are especially good for presenting performance measurements. They can convey information about the effects of parameter changes, compare
algorithms and data structures, and sometimes point to unexpected behavior.

7.3 Strategies for Speed

Use a better algorithm or data structure
Enable compiler optimizations.

One zero-cost change that usually produces a reasonable improvement is to turn on whatever optimization the compiler provides.

programmers must enable the optimizer explicitly once they believe the program has been debugged.

One thing to be aware of is that the more aggressively the compiler optimizes, the more likely it is to introduce bugs into the compiled program. After enabling the optimizer, re-run your regression test suite. as you should for any other modification.

Tune the code

If the right algorithm is in place, if speed is still an issue the next thing to try is tuning the code: adjusting the details of loops and expressions to make things go faster.

Don't optimize what doesn't matter.

7.4 Tuning the Code

Collect common subexpressions.

If an expensive computation appears multiple times. do it in only one place and remember the result.

Replace expensive operations by cheap ones.

In olden times, this used to mean replacing multiplications by additions or shifts. but that rarely buys much now. Division and remainder are much slower than multiplication. however, so there may be improvement if a division can be replaced with multiplication by the inverse, or a remainder by a masking operation if the divisor is a power of two. Replacing array indexing by pointers in C or C++ might speed things up, although most compilers do this automatically.

*Unroll or eliminate loops. *

There is a certain overhead in setting up and running a loop. If the body of the loop isn't too long and doesn't iterate too many times. it can be more efficient to write out each iteration in sequence.

Cache frequently-used values.

Cached values don't have to be recomputed. Caching takes advantage of locality, It's best if the caching operation is invisible from outside, so that it doesn't affect the rest of the program except for making it run faster.

Write a special-purpose allocator.

Often the single hot spot in a program is memory allocation, which manifests itself as lots of calls on malloc or new. When most requests are for blocks of the same size, substantial speedups are possible by replacing calls to the general-purpose allocator by calls to a special-purpose one. The special-purpose allocator makes one call to malloc to fetch a big array of items, then hands them out one at a time as needed, a cheaper operation. Freed items are placed back in a free list so they can be reused quickly.

Some algorithms can use stack-based allocation, where a whole sequence of allocations is done, and then the entire set is freed at once. The allocator obtains one big chunk for itself and treats it as a stack, pushing allocated items on as needed and popping them all off in a single operation at the end. Some C libraries offer a function alloca for this kind of allocation, though it is not standard. It uses the local call stack as the source of memory, and frees all the items when the function that calls alloca returns. Buffer input and output. Buffering batches transaction.

Buffer input and output.

Buffering batches transactions so that frequent operations are done with as little overhead as possible, and the high-overhead operations are done only when necessary. The cost of an operation is thereby spread over multiple data values. The drawback is the need to flush output buffers to make data visible; in the worst case, information still in a buffer will be lost if a program crashes.

Handle special cases separately

By handling same-sized objects in separate code, special-purpose allocators reduce time and space overhead in the general allocator and incidentally reduce fragmentation.

Precompute result

Sometimes it is possible to make a program run faster by precomputing values so they are ready when they are needed.

Use approximate values

If accuracy isn't an issue, use lower-precision data types. On older or smaller machines, or machines that simulate floating point in software, single-precision floating-point arithmetic is often faster than double-precision, so use float instead of double to save time. Some modern graphics processors use a related trick. The IEEE floating-point standard requires **"graceful underflow" **as calculations approach the low end of representable values, but this is expensive to compute. For images, the feature is unnecessary, and it is faster and perfectly acceptable to truncate to zero. This not only saves time when the numbers underflow, it can simplify the hardware for all arithmetic. The use of integer sin and cos routines is another example of using approximate values.

Rewrite in a lower-level language

Lower-level languages tend to be more efficient, although at a cost in programmer time. Thus rewriting some critical part of a C++ or Java program in C or replacing an interpreted script by a program in a compiled language may make it run much faster.

Occasionally, one can get significant speedups with machine-dependent code.This is a last resort, not a step to be taken lightly, because it destroys portability and makes future maintenance and modifications much harder. Almost always, operations to be expressed in assembly language are relatively small functions that should be embedded in a library; memset and memmove, or graphics operations, are typical examples. The approach is to write the code as cleanly as possible in a high-level language and make sure it's correct by testing it as we described for memset in Chapter 6. This is your portable version, which will work everywhere, albeit slowly. When you move to a new environment, you can start with a version that is known to work. Now when you write an assembly-language version. test it exhaustively against the portable one. When bugs occur. non-portable code is always suspect: it's comforting to have a comparison implementation.

7.5 space efficiency

Save space by using the smallestpossible data type.

This might mean replacing int with short if the data will fit; this is a common technique for coordinates in 2-D graphics systems, since 16 bits are likely to handle any expected range of screen coordinates.

The logical extension of this approach is to encode information in a byte or even fewer bits, say a single bit where possible. Don't use C or C++ bitfields; they are highly non-portable and tend to generate voluminous and inefficient code. Instead, encapsulate the operations you want in functions that fetch and set individual bits within words or an array of words with shift and mask operations.

Don't store what you can easily recompute.

Changes like these are minor, however; they are analogous to code tuning. Major improvements are more likely to come from better data structures, perhaps coupled with algorithm changes.

Basic idea:** store the common value or values implicitly or in a compact form, and spend more time and space on the remaining values.** If the most common values are really common, this is a win.

Space efficiency concerns sometimes manifest themselves in the external representation of information as well, both conversion and storage. **In general, it is best to store information as text wherever feasible rather than in some binary representation. **Text is portable, easy to read, and amenable to processing by all kinds of tools; binary representations have none of these advantages. The argument in favor of binary is usually based on "speed," but this should be treated with some skepticism, since the disparity between text and binary forms may not be all that great.

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 213,711评论 6 493
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,079评论 3 387
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 159,194评论 0 349
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,089评论 1 286
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,197评论 6 385
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,306评论 1 292
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,338评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,119评论 0 269
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,541评论 1 306
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,846评论 2 328
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,014评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,694评论 4 337
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,322评论 3 318
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,026评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,257评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,863评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,895评论 2 351

推荐阅读更多精彩内容