绘制的牛顿分形
引言:
- matlab并行计算
- 复数操作
- 图像操作
0. 发端
最近看李忠先生写的数学入门读物《迭代、浑沌、分形》(感谢小白和利飞送我这本书,不过才开始看),读到关于复迭代的问题(第二章第一节),觉得有必要动手操作一下。书中论述的复函数的迭代序列的收敛情况属于牛顿分形,关于“牛顿分形”,参考文献[1]。
1. 函数my_cayley_para_func(),某区间的牛顿-拉弗森迭代
该函数用来计算给定区间内的的牛顿-拉弗森迭代,其中z为复数。因为该方程有一个实根,两个复根,所以,返回值格式为:
[区间内某点 该点最终趋近的根 迭代次数 根序号]
其中根序号在正常情况下为1,2,3中的某一个。
为了确定迭代次数,我们设定容许误差为err_tolerance=1e-3
,即当本次迭代在上一次迭代的1e-3邻域内时,认为迭代完成。
为了确定某复数是否在另一复数的邻域内,我们还需要计算两复数之间的距离,这里使用范数函数norm(z1, z2)
即可,很符合范数的定义哦。
由于我们已经确知方程的根的分布情况,所以我们直接计算迭代结果与实际根的距离,便可确定根序号。实际上可用本次迭代结果与上次迭代结果的距离进行判断。
% Cayley problem, parallel version
% for more detail, see "Diedai, Hundun,Fenxing" by Li Zhong, page 037
function z_rec = my_caylay_para_func(a, b)
err_tolerance=1e-3;
k=1;
for m=1:length(a)
for n=1:length(b)
z(1)= a(m) + b(n)*i;
% Newton-Raphson method to find roots of z^3-1=0
t = 2;
err = 100;
while err >= err_tolerance
z(t)= z(t-1) - (z(t-1)^3-1)/(3*z(t-1)^2);
%err = sqrt((real(z(t))-real(z(t-1)))^2 + (imag(z(t)) - imag(z(t-1)))^2)
err = norm(z(t-1)-z(t));
t = t+1;
end
err = norm(z(t-1)-1);
if (err<err_tolerance)
rootn=1;
else
err = norm(z(t-1) - exp(i*2*pi/3));
if (err<err_tolerance)
rootn=2;
else
err = norm(z(t-1) - exp(i*4*pi/3));
if (err<err_tolerance)
rootn=3;
else
rootn=4;
end
end
end
if imag(z(1))==0
z(1)=z(1)+1e-7*i;
end
z_rec(k,:) = [z(1) z(t-1) t-1 rootn];
k = k+1;
end
end
2. 串行my_cayley.m
该脚本调用my_cayley_para_func进行运算,并最终绘制牛顿分形,由于plot效率较低,所以注释掉了,我们只关注运算部分即可。
clear;
err_tolerance=1e-3;
k=1;
tic
a = -10:0.05:10;
b = -10:0.05:10;
z_rec = my_cayley_para_func(a,b);
toc
tic
% plot result
% hold on;
% [m, n]=size(z_rec);
% for k=1:m
% p=plot(z_rec(k,1), 'Marker', '.');
% if 256-z_rec(k,3)*10<0
% z_rec(k,3)=0;
% else
% z_rec(k,3)=(256-z_rec(k,3)*10)/256;
% end
% switch(z_rec(k,4));
% case 1
% set(p, 'Color', [z_rec(k,3) 0 0]);
% case 2
% set(p, 'Color', [0 z_rec(k,3) 0]);
% case 3
% set(p, 'Color', [0 0 z_rec(k,3)]);
% end
% end
%
% hold off;
toc
运行结果如下:
当步进为0.1的时候:
>> my_cayley
Elapsed time is 2.339106 seconds.
Elapsed time is 0.000001 seconds.
当步进为0.05的时候,运算规模增加4倍,运算时间增加14.6倍,不止4倍。为何?
>> my_cayley
Elapsed time is 33.681831 seconds.
Elapsed time is 0.000000 seconds.
效果如图:
这是第一版,由于直接把复数点传给plot画,当虚部为0的时候(即实轴上的点),plot会按照画实数的方法来画,故而在x=1处有一条垂直于实轴的红线。
这是修改后的样子,还是用plot画的,根据趋近于根的情况和迭代次数染色。
3. 并行my_cayley_para.m
开两个任务,并行处理。最终结果输出到png文件上,比plot快多了。按理说可以开四个任务,但是不知道为什么跑死了。
% Cayley problem, parallel version
% for more detail, see "Diedai, Hundun,Fenxing" by Li Zhong, page 037
clear;
step = 0.02
upper = 10
lower = -10
tic;
sched = findResource(); % default scheduler
job1 = createParallelJob(sched);
set(job1, 'MaximumNumberOfWorkers',1);
job2 = createParallelJob(sched);
set(job2, 'MaximumNumberOfWorkers',1);
a = lower:step:upper;
b = lower:step:0;
task1 = createTask(job1, @my_cayley_para_func, 1, {a,b});
a = lower:step:upper;
b = 0:step:upper;
task2 = createTask(job2, @my_cayley_para_func, 1, {a,b});
submit(job1);
submit(job2);
wait(job1);
wait(job2);
re1 = get(task1, 'OutputArguments');
re2 = get(task2, 'OutputArguments');
z_rec = [re1{1}; re2{1}];
toc
tic;
% we could of course compose image data in parallel
[m, n]=size(z_rec);
for k=1:m
if 255-z_rec(k,3)*10<0
z_rec(k,3)=0;
else
z_rec(k,3)=255-z_rec(k,3)*10;
end
x = real(z_rec(k,1))*(1/step);
y = imag(z_rec(k,1))*(1/step);
x = round(x+(upper/step))+1;
y = round(y+(upper/step))+1;
switch(z_rec(k,4));
case 1
imgX(x, y, :) = [z_rec(k,3) 0 0];
case 2
imgX(x, y, :) = [0 z_rec(k,3) 0];
case 3
imgX(x, y, :) = [0 0 z_rec(k,3)];
end
end
imgX = uint8(imgX);
imwrite(imgX, 'cayley_para.png', 'png');
toc
destroy(job1);
destroy(job2);
运行结果如下:
当步进为0.1时,比串行版本慢
>> my_cayley_para
step =
0.1000
upper =
10
lower =
-10
Elapsed time is 4.311944 seconds.
Elapsed time is 0.323080 seconds.
当步进0.05时,效果好的出奇。
>> my_cayley_para
step =
0.0500
upper =
10
lower =
-10
Elapsed time is 5.219964 seconds.
Elapsed time is 1.090936 seconds.
当步进为0.02时,运算规模增加25倍(相比0.1步进),运算时间增加319倍
>> my_cayley_para
step =
0.0200
upper =
10
lower =
-10
Elapsed time is 1378.127378 seconds.
Elapsed time is 14.683606 seconds.
效果如图:
4. 改进的并行my_cayley_para.m
由于发现运算规模的增大与计算时间的增长并非线性,怀疑matlab在某个运算规模的时候效率是最高的。
定义最佳运算规模点S,满足:
- 当运算规模s=S时,花费的运算时间为tS;
- 当运算规模s≠S时,花费的运算时间总有t>tS。
重新设计程序,令其按照某个规模对运算任务进行分配,代码如下:
% Cayley problem, parallel version
% for more detail, see "Diedai, Hundun,Fenxing" by Li Zhong, page 037
clear;
% arguments
step = 0.02
upper = 10
lower = -10
% inner arguments
xmax = 300;
ymax = 300;
Smax = xmax*ymax; % maximum task scale in point
Tmax = 2; % maximum number of tasks running in parallel
TaskClips = ceil(((upper-lower)/step)^2/Smax); % task clips
% TaskClipsPerCol = ceil((upper-lower)/step/400);
% TaskClipsPerRow = ceil((upper-lower)/step/400);
TaskClipFinished = 0; % finished task clips counter
TaskClipAt = 0; % task clip currently processed
TaskUsed = 0; % used task counter
NL = 0; % new line indicator
x = lower;
y = lower;
% initialization
sched = findResource(); % default scheduler
taskpool = cell(Tmax,6);
for i=1:Tmax
job(i) = createParallelJob(sched);
set(job(i), 'MaximumNumberOfWorkers',1);
taskpool{i,1}='ready';
end
% main loop
tic
while (TaskClipFinished < TaskClips)
% Phase1 : Check taskpool if there are ready tasks.
if (TaskUsed<Tmax) % task pool is ready to use
if (TaskClipAt<TaskClips) % not all task clips is being processed/have been processed.
for i=1:Tmax
if (strcmp(taskpool{i,1}, 'ready'))
TaskClipAt = TaskClipAt + 1;
if (TaskClipAt<=TaskClips)
if (NL == 1)
x = x + xmax*step;
y = lower;
NL = 0;
end
dx = x + xmax*step;
dy = y + ymax*step;
if (dy >= upper)
dy = upper;
NL = 1;
end
if (dx >= upper)
dx = upper;
end
taskpool{i,3} = x:step:dx;
taskpool{i,4} = y:step:dy;
taskpool{i,6} = TaskClipAt;
taskpool{i,2} = createTask(job(i), @my_cayley_para_func, 1, {taskpool{i,3},taskpool{i,4}});
submit(job(i));
taskpool{i,1}='used';
TaskUsed = TaskUsed +1;
[x dx y dy TaskClips TaskClipFinished TaskClipAt TaskUsed]
y = dy;
else
break;
end
end
end
end
end
% Phase2 : clean up finished tasks, get outputs.
TaskAvailable = 0;
for i=1:Tmax
if (strcmp(taskpool{i,1}, 'used'))
state = get(job(i), 'State');
%state = 'finished';
if (strcmp(state,'finished'))
TaskAvailable = TaskAvailable + 1;
TaskClipFinished = TaskClipFinished + 1;
% save results
re = get(taskpool{i,2}, 'OutputArguments');
z_rec = [re{1}];
[m, n]=size(z_rec);
for k=1:m
if 255-z_rec(k,3)*10<0
z_rec(k,3)=0;
else
z_rec(k,3)=255-z_rec(k,3)*10;
end
imgx = real(z_rec(k,1))*(1/step);
imgy = imag(z_rec(k,1))*(1/step);
imgx = round(imgx+(upper/step))+1;
imgy = round(imgy+(upper/step))+1;
switch(z_rec(k,4));
case 1
imgX(imgx, imgy, :) = [z_rec(k,3) 0 0];
case 2
imgX(imgx, imgy, :) = [0 z_rec(k,3) 0];
case 3
imgX(imgx, imgy, :) = [0 0 z_rec(k,3)];
end
end
% destroy and recreate job
destroy(job(i));
job(i) = createParallelJob(sched);
set(job(i), 'MaximumNumberOfWorkers',1);
taskpool{i,1}='ready';
end
end
end
TaskUsed = TaskUsed - TaskAvailable;
pause(0.5);
end
toc
imgX = uint8(imgX);
imwrite(imgX, 'cayley_para.png', 'png');
for i=1:Tmax
destroy(job(i));
end
步进0.02时,输出如下,将任务按照300*300的规模分成了12片。总耗时93.77秒,平均每片耗时7.814秒。
>> my_cayley_para
step =
0.0200
upper =
10
lower =
-10
ans =
-10 -4 -10 -4 12 0 1 1
ans =
-10 -4 -4 2 12 0 2 2
ans =
-10 -4 2 8 12 2 3 1
ans =
-10 -4 8 10 12 2 4 2
ans =
-4 2 -10 -4 12 3 5 2
ans =
-4 2 -4 2 12 4 6 2
ans =
-4 2 2 8 12 5 7 2
ans =
-4 2 8 10 12 6 8 2
ans =
2 8 -10 -4 12 7 9 2
ans =
2 8 -4 2 12 8 10 2
ans =
2 8 2 8 12 9 11 2
ans =
2 8 8 10 12 10 12 2
Elapsed time is 93.773797 seconds.
当步进为0.01时,任务分为45片,估计耗时351.65秒。实际耗时370.84秒。部分输出为:
>> my_cayley_para
step =
0.0100
upper =
10
lower =
-10
ans =
-10 -7 -10 -7 45 0 1 1
ans =
-10 -7 -7 -4 45 0 2 2
。。。
ans =
8 10 -7 -4 45 42 44 2
ans =
8 10 -4 -1 45 43 45 2
Elapsed time is 370.840813 seconds.
步进为0.01,任务规模250*250,开3个任务,分为64个任务分片,部分运行输出如下:
ans =
'FinishTask : ' [7.5000] [10] [10] [10] [64] [63] [64] [3]
ans =
'FinishTask : ' [7.5000] [10] [10] [10] [64] [64] [64] [1]
tElapsed =
265.7671
eff =
1.5051e+04
运行期截图,有意思的是,CPU负荷瞬时值可能超出100%
至于为什么matlab会存在最佳运算规模点,原因未知。
另外,虽然CPU号称是4核8线程,但是根据实际的经验,开出的任务数最好比实际物理核心数少一个。
补给:
当把下限取-0.01,上限取0.01,步长1e-5,N-R法容差1e-7,作出的图形如下所示,这是非常好的分形。
5. 需要新的实现
MATLAB在2014a之后,移除了旧的并行计算接口,替换如下:
-
findResource
替换为parcluster
; -
createParallelJob
替换为createJob
。
新发现,规模还需要考虑job数,即只要不超过一个特定数值,就可以开到4个核,以下是一些测试数据:
任务规模 | 时间(秒) | 备注 |
---|---|---|
100x100x2 | 583.209777 | - |
300x300x2 | 191.616231 | - |
200x200x3 | 182.200293 | - |
200x200x4 | 147.327081 | - |
250x250x4 | 195.529967 | 超过特定数值,效率下降 |
170x170x4 | 181.846846 | - |
170x170x7 | 158.055520 | 超过物理核数,任务会排队 |
225x225x4 | 143.546186 | - |
✅225x225x5 | 129.017329 | 利用排队减少pause()惩罚 |
230x230x5 | 135.159124 | - |
220x220x5 | 129.495626 | - |
220x220x5 | 619.636121 | -10:0.01:10 |