Obrázok:Hash table average insertion time.png
Z Wikipédie
Tento súbor je zdieľaný zdroj a môže byť použitý v iných wiki.
Toto je súbor z Wikimedia Commons. Prosíme pozrite si jeho popisnú stránku . |
[edit] Summary
Shows the average number of cache misses expected when inserting into a hash table with various collision resolution mechanisms; on modern machines, this is a good estimate of actual clock time required. This seems to confirm the common heuristic that performance begins to degrade at about 80% table density. Created by Derrick Coetzee in Mathematica, Illustrator, and Photoshop.
It is based on a simulated model of a hash table where the hash function chooses indexes for each insertion uniformly at random. The parameters of the model were:
- A table size of 1,000 elements.
- An L1 cache line size of 16 words, as on the Pentium 4. L2 cache effects are not accounted for.
Because the linear probing values varied widely according to the random choices used to fill the table, I took the average value over 25 runs. The (rather inefficient) Mathematica code used to generate the table follows:
<<Statistics`DescriptiveStatistics`; f[tablesize_,points_,cachewords_]:= Module[{i,r,j,compares1,compares2,k,slots1,slots2}, slots1 = Table[0,{i,1,tablesize}]; slots2 = Table[0,{i,1,tablesize}]; Table[ For[i=0,i<Floor[Length[slots1]/(points+1)],i++, r=Random[Integer,{1,Length[slots1]}]; slots1[[r]]++]; For[i=0,i<Length[slots1]/(points+1),i++, r=Random[Integer,{1,Length[slots2]}]; For[j=r,slots2[[j]]>0,j=If[j\[Equal]Length[slots2],1,j+1]]; slots2[[j]]++]; compares2=0; For[i=1,i<=Length[slots2],i++, For[j=i,slots2[[j]]>0,j=If[j\[Equal]Length[slots2],1,j+1]]; compares2+= Ceiling[If[j\[GreaterEqual]i,j-i,j+Length[slots2]-i]/cachewords]]; {N[Apply[Plus,slots1]/Length[slots1]]+2, N[compares2/Length[slots2]]+1},{k,1,points}]]; t=Table[f[1000,49,16],{i,1,25}]; Export["Hash_table_average_insertion_time.eps", Show[Map[ListPlot[#,PlotJoined\[Rule]True,Frame\[Rule]True, FormatType\[Rule]TraditionalForm, FrameLabel\[Rule]{"Density of table", "Average cache misses per insertion"},Axes\[Rule]False]&, Table[{i/50,Mean[Table[t[[k,i,j]],{k,1,Length[t]}]]},{j,1,2},{i,1, Length[t[[1]]]}]]]]
You may be curious what happens in the case where no cache exists. In other words, how does the number of probes (number of reads, number of comparisons) rise as the table fills? The curve is similar in shape to the one above, but shifted left: it requires an average of 24 probes for an 80% full table, and you have to go down to a 50% full table for only 3 probes to be required on average. This suggests that in the absence of a cache, ideally your hash table should be about twice as large for probing as for chaining.
[edit] Licensing
I, the author of this work, hereby release it into the public domain. This applies worldwide. In case this is not legally possible: Afrikaans | Alemannisch | Aragonés | العربية | Български | Català | Česky | Dansk | Deutsch | Ελληνικά | English | Español | Esperanto | فارسی | Français | Galego | 한국어 | हिन्दी | Hrvatski | Ido | Bahasa Indonesia | Íslenska | Italiano | עברית | Latina | Lietuvių | Magyar | Bahasa Melayu | Nederlands | Norsk (bokmål) | Norsk (nynorsk) | 日本語 | Polski | Português | Ripoarish | Română | Русский | Slovenčina | Slovenščina | Српски | Svenska | ไทย | Türkçe | Українська | Tiếng Việt | Walon | 简体中文 | 繁體中文 | +/- |
Odkazy na obrázok
Na tento obrázok odkazujú nasledujúce články: