Sorting Data Structure
Arranging the data in ascending or descending order is known as sorting. Sorting is very important from the point of view of our practical life.
The best example of sorting can be phone numbers in our phones. If, they are not maintained in an alphabetical order we would not be able to search any number effectively.
Why is Sorting Important?
- Efficient Searching: Searching algorithms like Binary Search require sorted data.
- Better Readability: Sorted data is easier to understand and use.
- Data Optimization: Used in databases, file systems, and memory management.
- Improves Algorithm Performance: Many algorithms assume or perform better with sorted input.
Types of Sorting
Sorting algorithms can be broadly classified into Internal Sorting and External Sorting based on where the data to be sorted is stored during the sort operation.
Internal Sorting
Internal sorting is performed entirely in the main memory (RAM). All the data that needs to be sorted must fit into memory.
- Fast and efficient for small to medium-sized datasets.
- Relies on in-memory data structures like arrays and lists.
Common Internal Sorting Algorithms
- Bubble Sort
- Insertion Sort
- Selection Sort
- Merge Sort (in-memory)
- Quick Sort
- Heap Sort
Example: Sorting a list of 10,000 student records in a program that has enough RAM to store all of them.
Advantages:
- Faster due to high-speed memory access.
- Simpler implementation.
Disadvantages
- Not suitable for very large datasets that exceed RAM.
External Sorting
External sorting is used when the data is too large to fit into main memory and must be stored in external memory (like hard disks or SSDs) during sorting.
- Designed for handling massive data sets (e.g., terabytes of data).
- Involves reading/writing to disk in chunks (blocks or pages).
- Sorting is done in phases, often using merge-based techniques.
Common External Sorting Algorithm:
- External Merge Sort (also known as Multi-way Merge Sort)
How It Works (Example: External Merge Sort):
- Divide the data into chunks that fit into RAM.
- Sort each chunk using an internal sorting algorithm.
- Store the sorted chunks (runs) on disk.
- Merge these sorted runs into a single sorted file using a k-way merge algorithm.
Example: Sorting a 100 GB log file on a machine with only 8 GB of RAM.
Advantages
- Can handle huge data sets beyond RAM size.
- Efficient when combined with optimized disk I/O strategies.
Disadvantages
- Slower due to disk access times.
- More complex implementation.
Comparison Table
Feature | Internal Sorting | External Sorting |
---|---|---|
Memory Usage | Uses RAM only | Uses disk and RAM |
Dataset Size | Small to medium | Very large |
Speed | Faster | Slower due to I/O |
Algorithms Used | Quick, Merge, Heap | External Merge Sort |
Complexity | Simpler | More complex (I/O handling) |
Use Case | In-memory applications | Databases, Big Data |
Application of Sorting
- The sorting is useful in database applications for arranging the data in desired ORDER.
- In the dictionary like applications the data is arranged in sorted order.
- For searching the element from the list of elements the sorting is required
- For checking the uniqueness of the element the sorting is required.
- For finding the closest pair from the list of elements the sorting is required.
Sorting Category
Here's a comprehensive list of sorting algorithms, categorized by technique and characteristics.
I. Comparison-Based Sorting Algorithms
These algorithms determine the order of elements based on comparisons (e.g., a > b
, a < b
).
A. Simple (Elementary) Sorting Algorithms
- Bubble Sort
- Selection Sort
- Insertion Sort
- Gnome Sort
- Cocktail Shaker Sort (Bidirectional Bubble Sort)
- Comb Sort
- Odd-Even Sort (Brick Sort)
B. Efficient Comparison-Based Sorting Algorithms
- Merge Sort
- Quick Sort
- Heap Sort
- Shell Sort
- Tim Sort (Hybrid of Merge + Insertion Sort used in Python, Java)
- Intro Sort (Hybrid of Quick + Heap + Insertion, used in C++ STL)
- Tree Sort (Using Binary Search Tree)
- Patience Sort
II. Non-Comparison-Based Sorting Algorithms
These algorithms do not compare elements directly and usually work on integers or keys.
- Counting Sort
- Radix Sort
- Bucket Sort
- Pigeonhole Sort
- Flash Sort
- Postman Sort
III. Hybrid Sorting Algorithms
These combine multiple techniques to optimize performance.
- Tim Sort (Merge + Insertion)
- Intro Sort (Quick + Heap + Insertion)
- Block Sort
- Library Sort (Gapped Insertion Sort)
IV. Parallel and External Sorting Algorithms
Used for large datasets or parallel/distributed systems.
A. Parallel Sorting
- Bitonic Sort
- Odd-Even Merge Sort
- Parallel Merge Sort
- Sample Sort
B. External Sorting
- External Merge Sort
- Polyphase Merge Sort
- Balanced Multi-way Merge Sort
V. Specialized and Theoretical Sorting Algorithms
- Bead Sort (natural sorting, not efficient in practice)
- Pancake Sort (used in theoretical computer science)
- Stooge Sort (inefficient, educational)
- Bogo Sort (randomized, inefficient, educational)
- Sleep Sort (joke algorithm based on timing)
- Cycle Sort (in-place and minimal memory writes)
Popular Sorting Algorithms
- Bubble sort
- Insertion sort
- Radix Sort
- Quick sort
- Merge sort
- Heap sort
- Selection sort
- Shell Sort
Sorting deals with sorting the data stored in the memory, whereas external sorting deals with sorting the data stored in files.
In bubble sorting, consecutive adjacent pairs of elements in the array are compared with each other.
Insertion sort works by moving the current data element past the already sorted values and repeatedly interchanging it with the preceding value until it is in the correct place.
Selection sort works by finding the smallest value and placing it in the first position. It then finds the second smallest value and places it in the second position. This procedure is repeated until the whole array is sorted.
Merge sort is a sorting algorithm that uses the divide, conquer, and combine algorithmic paradigm. Divide means partitioning the n-element array to be sorted into two sub-arrays of n/2 elements in each sub-array.
Quick sort works by using a divide-and-conquer strategy. It selects a pivot element and rearranges the elements in such a way that all elements less than pivot appear before it and all elements greater than pivot appear after it.
Radix sort is a linear sorting algorithm that uses the concept of sorting names in alphabetical order.
Heap sort sorts an array in two phases. In the first phase, it builds a heap of the given array. In the second phase, the root element is deleted repeatedly and inserted into an array.
Shell sort is considered as an improvement over insertion sort, as it compares elements separated by a gap of several positions.
Time Complexity of all Sorting Techniques
Time complexity of various sorting algorithms.
S.No | Algorithm | Worst Case | Average Case | Best Case |
1 | Selection Sort | O (n 2) | O (n 2) | O (n 2) |
2 | Bubble Sort | O (n 2) | O (n 2) | O (n 2) |
3 | Insertion Sort | O (n 2) | O (n 2) | n – 1 |
4 | Quick Sort | O (n 2) | log2 n | log2 n |
5 | Heap Sort | O (n log n) | O (n log n) | O (n log n) |
6 | Merge Sort | O (n log n) | O (n log n) | O (n log n) |
7 | Radix Sort | O (n 2) | O (n log n) | O (n log n) |
8 | Shell sort | O(n log2n) | - | - |