An external sorting algorithm is an algorithm that can handle massive amounts of information. Users utilize it when the data that needs sorting doesn’t fit into a computer’s primary memory (usually the random access memory [RAM]). In such a case, you must place the information in an external memory device (usually a hard disk drive [HDD]).
Think of it this way. We should drink at least eight glasses of water a day to stay healthy. Let’s say you use a 1-liter tumbler while at work, and your goal is to drink eight glasses in eight hours. You know that 1 liter is equal to four glasses. To reach your goal, you’ll need to refill your tumbler once while in the office to reach your goal. In this scenario, the tumbler represents the computer’s RAM, which can only “process” four glasses of water (or data) per batch, and the HDD is the water source, which can provide massive amounts of “data,” depending on your goal.
Other interesting terms…
Read More about a “External Sorting Algorithm”
External sorting algorithms typically use a hybrid sort-merge strategy, which allows a computer to sort data into chunks small enough to fit in the RAM. Each chunk is read, sorted, and written out to a temporary file. Once the entire mass gets sorted, the outputs get merged to form a single larger file.
Here’s an introductory video showing how external sorting algorithms work:
How Does an External Sorting Algorithm Work?
The following diagram shows how a simple external sorting algorithm works:
You can sort a vast mass of data made up of six blocks into three smaller chunks, that is if the computer can only handle two blocks at a time. Each time the two-block chunks are processed, they get stored in a temporary file. Once all of the smaller chunks are processed, they are merged back for storing as an entire massive data block in an external storage device like an HDD.
How Can Computers Run an External Sorting Algorithm Efficiently?
Jim Gray’s Sort Benchmark compared several external sorting algorithm implementations and identified the three ways discussed in more detail below as the most effective. Gray was a Turing Award recipient in 1998 “for his seminal contributions to database and transaction processing research and technical leadership in system implementation.” The Sort Benchmark, meanwhile, is a test performed on sorting algorithms to gauge each one’s effectiveness compared with an agreed-upon baseline. You can learn more about it and the various algorithms’ standing on the Sort Benchmark Home Page.
Using Parallelism
You can use several disk drives running in parallel to improve sequential read-and-write speed. That also saves on cost. In fact, the Sort Benchmark winner in the cost-centric Penny Sort category used six hard drives connected to a midrange machine. This kind of sorting software can use multiple threads to speed up sorting on a modern multicore computer. The program can use asynchronous input/output (I/O) so that one data chunk can be sorted or merged with another while another is read from or written to disk. In other cases, multiple machines connected by fast network links can sort data chunks simultaneously.
Increasing Hardware Speed
Another option is to use more RAM or fast external memory like solid-state drives (SSDs) instead of typical HDDs. Note that several factors can affect a hardware’s maximum sorting speed. Examples include central processing unit (CPU) speed and the number of cores, RAM access latency, I/O bandwidth, disk read/write speed, disk seek time, and others. Balancing the hardware to minimize bottlenecks is critical when designing efficient sorting systems.
Increasing Software Speed
In other cases, machines separate data into one of many bins depending on their values. Some compress inputs, intermediate files, and outputs to reduce the time spent on I/O.
Is There Such a Thing as an Internal Sorting Algorithm?
The short answer is yes. Using an internal sorting algorithm requires all of the data to be sorted in a computer’s RAM. But that is only possible for data sets that are small enough for the RAM to hold and process. In this case, the data doesn’t need to be separated into smaller chunks and merged back.
—
By now, we hope you know the answer to “What is an external sorting algorithm?” and how it works.