While browsing ruby_china today, I came across this tip:
When you are dealing with a large number of Strings, remember to use a Set: O(1) vs O(n).
platforms = Set.new %w[bash Chai D3JS Go Javascript Ruby]
foo if platforms.include? platform
So I decided to try it out.
The API documentation describes Set as follows:
Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.
Set is easy to use with Enumerable objects (implementing each). Most of the initializer methods and binary operators accept generic Enumerable objects besides sets and arrays. An Enumerable object can be converted to Set using the to_set method.
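To make the paragraph above concrete, here is a minimal sketch of to_set and of binary operators taking plain Enumerable operands. The variable names are made up for this illustration and do not come from the documentation.

```ruby
require 'set'

# Any Enumerable can be converted with to_set.
evens = (1..10).select(&:even?).to_set   # Array -> Set
small = (1..4).to_set                    # Range -> Set

# Binary operators accept generic Enumerables on the right-hand side,
# not only other sets.
union        = small | (3..6)        # Set | Range
intersection = evens & [2, 3, 4, 5]  # Set & Array
difference   = small - [1, 2]        # Set - Array

# Sets are unordered, so sort before printing for a stable result.
p union.to_a.sort         # => [1, 2, 3, 4, 5, 6]
p intersection.to_a.sort  # => [2, 4]
p difference.to_a.sort    # => [3, 4]
```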
Set uses Hash as storage, so you must note the following points:
- Equality of elements is determined according to Object#eql? and Object#hash.
- Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an unreliable state.
- When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.
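The first two points above are easy to trip over, so here is a small sketch of them. It only illustrates the behavior described in the documentation. The names in_array, in_set, pair and bag are invented for the example.

```ruby
require 'set'

# Array#include? compares with ==, while Set relies on hash and eql?.
# 1 == 1.0 is true, but 1.eql?(1.0) is false.
in_array = [1].include?(1.0)     # true
in_set   = Set[1].include?(1.0)  # false
distinct = Set[1, 1.0].size      # 2, both values are kept

# Mutating an element after it was added changes its hash value,
# which leaves the set in an unreliable state.
pair = [1]
bag  = Set[pair]
pair << 2

still_listed = bag.to_a.include?(pair)  # true: enumeration still yields it
old_lookup   = bag.include?([1])        # false: the stored element is now [1, 2]
new_lookup   = bag.include?(pair)       # unreliable: usually false, not guaranteed

p in_array, in_set, distinct, still_listed, old_lookup
```

If elements really have to change, removing them from the set before the change and adding them back afterwards avoids the problem.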
I verified this with a short benchmark:
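require 'benchmark'
require 'set'

# One million numeric strings, stored both as an Array and as a Set.
arr = (1..1000000).map { |e| e.to_s }
set = Set.new arr

# i is an Integer, so it never equals any of the stored Strings (a guaranteed miss).
# rand(1..1000000) keeps i inside the stored range; rand(1000000) could return 0.
i = rand(1..1000000)
# j is the String form of i, so it is guaranteed to be present (a hit).
j = i.to_s

Benchmark.bmbm(10) do |t|
  t.report("arr_not_include") { 10.times { arr.include? i } }
  t.report("set_not_include") { 10.times { set.include? i } }
  t.report("arr_include") { 10.times { arr.include? j } }
  t.report("set_include") { 10.times { set.include? j } }
end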
The results:
Rehearsal ---------------------------------------------------
arr_not_include   1.109000   0.000000   1.109000 (  1.105745)
set_not_include   0.000000   0.000000   0.000000 (  0.000014)
arr_include       0.079000   0.000000   0.079000 (  0.071470)
set_include       0.000000   0.000000   0.000000 (  0.000084)
------------------------------------------ total: 1.188000sec

                      user     system      total        real
arr_not_include   1.203000   0.000000   1.203000 (  1.211598)
set_not_include   0.000000   0.000000   0.000000 (  0.000010)
arr_include       0.062000   0.000000   0.062000 (  0.064577)
set_include       0.000000   0.000000   0.000000 (  0.000015)
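The difference is obvious: ten lookups of a missing element take about 1.2 seconds with the Array, but only about 10 microseconds with the Set. The Array hit is cheaper than the Array miss at least in part because include? stops at the first match, while a miss has to walk all one million elements.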
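To see the O(n) vs O(1) claim rather than a single data point, the lookup can be timed at several collection sizes. This is only a rough sketch: the sizes and the variable names are arbitrary choices for this example, and Benchmark.realtime measurements are noisy. One would expect the Array time to grow roughly with the size while the Set time stays roughly flat, but the actual numbers depend on the machine and the Ruby version.

```ruby
require 'benchmark'
require 'set'

# Hypothetical sizes picked for this sketch; adjust freely.
sizes = [1_000, 10_000, 100_000]

timings = sizes.map do |n|
  arr = (1..n).map(&:to_s)
  set = arr.to_set
  missing = "0"  # never present, because the range starts at 1

  # Time ten worst-case (miss) lookups against each collection.
  arr_time = Benchmark.realtime { 10.times { arr.include?(missing) } }
  set_time = Benchmark.realtime { 10.times { set.include?(missing) } }
  [n, arr_time, set_time]
end

timings.each do |n, arr_time, set_time|
  printf("%9d  array: %.6fs  set: %.6fs\n", n, arr_time, set_time)
end
```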
The difference between Set and Array is also discussed here.