The Nsort executable and subroutine library have a long history of
record-breaking sort performance for the MinuteSort benchmark.
| Year | MinuteSort Record | Input/Output Drive(s) | Input Reads MB/sec | Output Writes MB/sec |
|---|---|---|---|---|
| 2006 | 40 GB | 128 Striped | 2587 | 1221 |
| 2004 | 34 GB | 112 Striped | 1399 | 1186 |
While the MinuteSort contest is now dominated by cluster-based systems,
Nsort's records remain the best single-node results
(see sortbenchmark.org).
Both of Nsort's records utilized large servers with expensive I/O subsystems.
However, current hardware trends have greatly reduced the price of
high performance servers and I/O subsystems.
Similar or better Nsort performance is possible today at a very modest
system cost.
But how can good performance be achieved with Nsort?
The performance of the Nsort executable or subroutine library is
dependent on many factors including:
- key types (the decimal key type is the slowest)
- average record size (larger records yield higher MB/sec)
- processor speed and number of processor cores
However, the two most prominent factors that limit Nsort performance are:
- Nsort memory size
- disk configuration (when considering the elapsed time to read the input data from disk, sort it, and write the output to disk)
To demonstrate the importance of memory size and disk configuration to Nsort performance,
tests were run using an input data set of 100,000,000 100-byte records
(10,000,000,000 bytes) as generated by the gensort program
used with the standard sort benchmarks.
The system configuration was as follows:
- Windows Server 2008 Datacenter
- Intel Core i7-980X Processor (6 cores, 12 hyperthreads)
- LSI SAS 9207-8i HBA
- Seagate Barracuda 1TB ST1000DM003 Hard Drives
- Samsung 840 PRO 256GB Solid State Drives
Various drive configurations were tried using either 1, 2 or 4 hard drives
or solid state drives for the sort input and output files, and temporary files.
(A separate system drive was used whose size and type was irrelevant to
these test results.)
All the sort data drives were connected via the LSI HBA.
These tests do not reflect the highest possible performance with Nsort
(more disks --> more better).
On the other hand,
these tests are optimistic for the number of disk drives used in several ways:
- Newly formatted disk volumes were used. Real world disk volumes with
fragmented free space would yield worse performance.
- These results were "tuned" in that various file transfer
sizes were tried with only the best result presented here.
Not all Nsort users try out multiple file transfer sizes.
- These April 2013 tests used hard drives and solid state drives with
excellent performance;
not all drives have comparable performance.
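The tuning described in the second item can be approximated outside Nsort with a small timing loop. The sketch below is not Nsort's mechanism and uses no Nsort option; it only times sequential reads of a file at several candidate transfer sizes using Python's unbuffered file I/O. The function names, the candidate sizes, the scratch file, and the reading of the "m" suffix as MiB are all illustrative assumptions. On a small or recently written file the figures mostly reflect the operating system's file cache rather than the drive.

```python
import os
import tempfile
import time


def time_sequential_read(path, transfer_size):
    """Read the whole file in transfer_size chunks; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 opens a raw file object, so Python adds no buffer of its own
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(transfer_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def best_transfer_size(path, candidates):
    """Return (best_size, rates), where rates maps each size to MB/sec."""
    rates = {}
    for size in candidates:
        nbytes, secs = time_sequential_read(path, size)
        rates[size] = nbytes / 1e6 / max(secs, 1e-9)
    return max(rates, key=rates.get), rates


if __name__ == "__main__":
    # Hypothetical scratch file; a real test would use a file far larger than RAM
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
    try:
        sizes = [1 << 20, 4 << 20, 8 << 20, 16 << 20]  # 1m, 4m, 8m, 16m (assumed MiB)
        best, rates = best_transfer_size(tmp.name, sizes)
        for size, rate in rates.items():
            print(f"{size >> 20}m: {rate:.0f} MB/sec")
        print(f"best: {best >> 20}m")
    finally:
        os.remove(tmp.name)
```

A single timing per size is noisy; repeating each measurement and keeping the median would give a steadier ranking.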
The logical drive type (i.e. number
of physical drives used and whether they were single, striped or mirrored)
used for the sort input and output files is listed for each test
in the first column.
If an input and output file transfer size was specified to Nsort,
it is listed in the next column.
Otherwise this field is left blank.
Any drive striping or mirroring was done with the
standard Windows Disk Management tool.
The logical drive type for the temporary file is listed in the next column
along with any specified transfer size.
Some of the tests were done using approximately 11.5 GB of Nsort memory.
In these cases no temporary file was necessary and the temp drive is listed as None.
In all other cases, Nsort was directed to use 2 GB of memory thereby forcing
it to use the specified temporary drive(s).
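The cost of falling back to a temporary file can be seen by counting the bytes each run has to move. The sketch below is simple bookkeeping, not Nsort code: a one-pass sort reads the input and writes the output, while a sort that uses a temp file additionally writes and then reads back the temporary data. It assumes, as a simplification, that the temporary file holds roughly as many bytes as the input; the function and key names are illustrative.

```python
DATA_BYTES = 100_000_000 * 100  # 100,000,000 records of 100 bytes


def io_traffic(data_bytes, uses_temp_file, temp_shares_io_drive=False):
    """Return bytes moved on the input/output volume and on the temp volume."""
    io_volume = 2 * data_bytes          # read input + write output
    temp_volume = 0
    if uses_temp_file:
        temp_bytes = 2 * data_bytes     # write temp + read temp (assumed ~input size)
        if temp_shares_io_drive:
            io_volume += temp_bytes     # all traffic lands on one volume
        else:
            temp_volume = temp_bytes
    return {"io_volume": io_volume, "temp_volume": temp_volume}


if __name__ == "__main__":
    cases = [
        ("one pass, no temp file", (False,)),
        ("temp file on separate drive(s)", (True, False)),
        ("temp file on the I/O drive(s)", (True, True)),
    ]
    for label, args in cases:
        t = io_traffic(DATA_BYTES, *args)
        print(f"{label}: I/O volume {t['io_volume'] / 1e9:.0f} GB, "
              f"temp volume {t['temp_volume'] / 1e9:.0f} GB")
```

Under these assumptions a volume shared by input, output and temp files carries 40 GB of traffic instead of 20 GB, which is one reason a shared volume can be expected to be slower.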
The resulting input file read speed, temp file write speed, temp file read
speed, output file write speed, and the elapsed seconds of the sort from the
Nsort statistics output are all given.
The last column contains a link that may be selected to get the
Nsort command line and statistics output for the test.
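As a rough cross-check, the reported rates and the elapsed time can be related with a little arithmetic. The sketch below rests on assumptions that are inferred rather than stated in the Nsort statistics: that a sort has an input phase and an output phase that run one after the other, that temp file traffic overlaps with those phases, and that MB means 10^6 bytes. The function name is illustrative.

```python
DATA_BYTES = 100_000_000 * 100  # size of the test data set in bytes


def estimate_elapsed_seconds(input_mb_per_sec, output_mb_per_sec,
                             data_bytes=DATA_BYTES):
    """Estimate elapsed time as input-phase time plus output-phase time."""
    data_mb = data_bytes / 1e6                 # assumes 1 MB = 10**6 bytes
    input_seconds = data_mb / input_mb_per_sec
    output_seconds = data_mb / output_mb_per_sec
    return input_seconds + output_seconds


if __name__ == "__main__":
    # Rates from the single hard drive, one-pass row (179 in, 188 out)
    print(round(estimate_elapsed_seconds(179, 188)))  # prints 109
```

For that row the estimate agrees with the reported 109 elapsed seconds. Other rows can be checked the same way, though small differences should be expected because the model ignores startup and any non-overlapped work.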
| Input/Output Drive(s) | I/O Trans Size | Temp Drive(s) | Temp Trans Size | Input Reads MB/sec | Temp Writes MB/sec | Temp Reads MB/sec | Output Writes MB/sec | Elapsed Seconds | Full Stats |
|---|---|---|---|---|---|---|---|---|---|
| Hard Drives - Seagate Barracuda 1TB ST1000DM003 | | | | | | | | | |
| 1 Single | | None, 1 Pass | | 179 | - | - | 188 | 109 | Get |
| 1 Single | 16m | Same as I/O | 4m | 89 | 88 | 85 | 85 | 233 | Get |
| 1 Single | 16m | 1 Single | 16m | 178 | 180 | 190 | 190 | 113 | Get |
| 2 Striped | | None, 1 Pass | | 337 | - | - | 371 | 57 | Get |
| 2 Striped | 16m | Same as I/O | 4m | 162 | 160 | 125 | 127 | 143 | Get |
| 2 Striped | 16m | 2 Striped | 4m | 307 | 308 | 214 | 217 | 80 | Get |
| 2 Striped | 16m | 2 Single | 16m | 320 | 312 | 308 | 312 | 67 | Get |
| 2 Mirrored | 8m | None, 1 Pass | | 191 | - | - | 188 | 106 | Get |
| 2 Mirrored | 16m | Same as I/O | 4m | 92 | 91 | 110 | 111 | 203 | Get |
| 2 Mirrored | 16m | 2 Mirrored | 4m | 172 | 170 | 191 | 190 | 114 | Get |
| 4 Striped | 16m | None, 1 Pass | | 650 | - | - | 650 | 31 | Get |
| 4 Striped | 16m | Same as I/O | 4m | 232 | 224 | 195 | 198 | 96 | Get |
| Solid State Drives - Samsung 840 PRO 256GB | | | | | | | | | |
| 1 Single | 8m | None, 1 Pass | | 563 | - | - | 532 | 37 | Get |
| 1 Single | 8m | Same as I/O | 16m | 257 | 253 | 273 | 275 | 77 | Get |
| 1 Single | 8m | 1 Single | 4m | 417 | 411 | 533 | 532 | 44 | Get |
| 2 Striped | 16m | None, 1 Pass | | 1073 | - | - | 998 | 20 | Get |
| 2 Striped | 8m | Same as I/O | 4m | 347 | 338 | 451 | 451 | 52 | Get |
| 2 Striped | 8m | 2 Striped | 16m | 679 | 668 | 977 | 978 | 26 | Get |
| 2 Striped | 8m | 2 Single | 4m | 961 | 963 | 980 | 979 | 21 | Get |
| 2 Mirrored | 16m | None, 1 Pass | | 563 | - | - | 532 | 37 | Get |
| 2 Mirrored | 16m | Same as I/O | 4m | 219 | 216 | 340 | 338 | 77 | Get |
| 2 Mirrored | 16m | 2 Mirrored | 4m | 383 | 377 | 539 | 532 | 46 | Get |
| 4 Striped | 16m | None, 1 Pass | | 2079 | - | - | 1953 | 10 | Get |
| 4 Striped | 16m | Same as I/O | 4m | 470 | 456 | 875 | 865 | 34 | Get |
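To put numbers on how well striping scales, the elapsed times of the one-pass rows above can be turned into speedup and efficiency figures. This is a sketch using values copied from the table; the dictionary layout and function name are illustrative.

```python
# Elapsed seconds for the one-pass ("None, 1 Pass") single and striped rows,
# keyed by the number of drives in the input/output volume
ONE_PASS_SECONDS = {
    "hard drives": {1: 109, 2: 57, 4: 31},
    "solid state drives": {1: 37, 2: 20, 4: 10},
}


def striping_scaling(seconds_by_drives):
    """Return {drives: (speedup, efficiency)} relative to a single drive."""
    base = seconds_by_drives[1]
    result = {}
    for drives, secs in sorted(seconds_by_drives.items()):
        speedup = base / secs
        result[drives] = (speedup, speedup / drives)  # efficiency 1.0 = linear
    return result


if __name__ == "__main__":
    for kind, times in ONE_PASS_SECONDS.items():
        for drives, (speedup, eff) in striping_scaling(times).items():
            print(f"{kind}, {drives} drive(s): {speedup:.2f}x speedup, "
                  f"{eff:.0%} of linear")
```

For the hard drives this works out to roughly 1.9x with two striped drives and 3.5x with four, so scaling is close to linear but falls slightly short of it.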
Several guidelines can be drawn from these tests.
- Using enough memory to avoid temporary file usage yields the best results.
- Striping disks together increases performance,
although not quite linearly.
- Disk mirroring does not hurt performance.
- If you have multiple disks available, how can they be configured to
maximize Nsort performance?
  - If you can give Nsort enough memory so that no temp disk is necessary,
  then stripe the disks together to form a volume for your sort input and output files.
  - If you can't give Nsort enough memory to avoid temp file usage,
  the following options are available in decreasing order of performance:
    - Stripe together half of your available disks to form the volume for
    input and output files. Use the other half of your disks as single-disk
    volumes for temp files.
    - Stripe together half of your available disks to form the volume for
    input and output files.
    Stripe together the other half of your disks for use as a temp volume.
    - Stripe together all your disks and use that volume for input, output and temp sort usage.
    While this yields the worst relative performance,
    there may be reasons other than Nsort performance to pick this option,
    such as the most flexible usage of overall disk storage space.
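The configuration advice above can be summarized as a small decision helper. This is only a sketch of the guidelines as written: the function names, parameters, and returned structure are hypothetical, only the first-choice layout is returned, and the handling of an odd number of disks (the extra disk goes to temp) is an arbitrary assumption not covered by the tests.

```python
def _volume(num_disks):
    """Label a volume the way the results table does."""
    return "1 single" if num_disks == 1 else f"{num_disks} striped"


def recommend_layout(num_disks, memory_avoids_temp_file):
    """Suggest a first-choice disk layout following the guidelines above."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    if memory_avoids_temp_file:
        # No temp file needed: put every disk into one input/output volume
        return {"io_volume": _volume(num_disks), "temp_volumes": []}
    if num_disks == 1:
        # Nothing to split: input, output and temp share the one disk
        return {"io_volume": "1 single", "temp_volumes": ["same as I/O"]}
    half = num_disks // 2
    # Half the disks striped for input/output, the rest as single-disk temp volumes
    return {
        "io_volume": _volume(half),
        "temp_volumes": ["1 single"] * (num_disks - half),
    }


if __name__ == "__main__":
    print(recommend_layout(4, memory_avoids_temp_file=True))
    print(recommend_layout(4, memory_avoids_temp_file=False))
```

With four disks and too little memory for a one-pass sort, the helper suggests a two-disk striped input/output volume plus two single-disk temp volumes, which corresponds to the "2 Striped" with "2 Single" rows in the results table.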