Parallel processing has been one of computer science's greatest challenges and opportunities since its early days. Since its inception in 1995, Java has undergone a significant journey in the world of parallel programming to provide developers with ever-better tools. This story is a fascinating journey through threads, the executor framework, fork/join, parallel streams, CompletableFuture and the latest developments in Project Loom. In this blog post, we take a detailed look at the evolution of parallel processing in Java and the innovations that came with it.
- Locking mechanisms in thread programming in Java
- The early years: Threads and Runnable
- Java 5: The Executor Framework and java.util.concurrent
- Java 7: Fork/Join Framework
- Java 8: Parallel Streams
- Java 8: CompletableFuture
- Java 9 to Java 19: Improvements and Project Loom
- The Evolution of Parallel Processing in Java: Conclusion
- Looking into the future
Locking mechanisms in thread programming in Java#
To make parallel processing safe and efficient, Java provides various locking mechanisms that help coordinate access to shared resources and avoid data corruption. These mechanisms are critical to ensure data consistency and integrity, especially when multiple threads access the same resources simultaneously. The most critical locking mechanisms in Java are described below:
synchronized keyword#
The synchronized keyword is the most basic mechanism for synchronising threads in Java. It ensures that only one thread can access a synchronised method or code block at a time. This mechanism uses a so-called monitor, which is automatically locked and unlocked.
public synchronized void increment() {
    // Critical section
    counter++;
}

Advantages:
- Easy to use and integrate into code.
- Automatic acquisition and release of the monitor lock.
Disadvantages:
- Can lead to performance issues if many threads want to access the synchronised resource simultaneously.
- Little flexibility: no timeouts, try-acquire semantics or conditional locks.
ReentrantLock#
ReentrantLock is a class from the java.util.concurrent.locks package that provides greater flexibility than the synchronized keyword. As the name suggests, ReentrantLock is a “reentrant” lock, meaning that a thread that already holds the lock can reacquire it without getting into a deadlock.
private final ReentrantLock lock = new ReentrantLock();

public void increment() {
    lock.lock();
    try {
        // Critical section
        counter++;
    } finally {
        lock.unlock();
    }
}

Advantages:
- Provides more control over the locking process, e.g. the ability to acquire locks with timeouts (tryLock).
- Supports fair locking, which grants threads access in the order they requested it (enabled via the new ReentrantLock(true) constructor).
Disadvantages:
- Requires manual release of locks, increasing the risk of deadlocks if unlock is not performed correctly.
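The tryLock variant mentioned above can be sketched as follows; the class and method names here are illustrative, not from the original post:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TryLockDemo {
    private static final ReentrantLock lock = new ReentrantLock();
    private static int counter = 0;

    static boolean tryIncrement() {
        try {
            // Wait at most 100 ms for the lock instead of blocking indefinitely
            if (lock.tryLock(100, TimeUnit.MILLISECONDS)) {
                try {
                    counter++; // critical section
                    return true;
                } finally {
                    lock.unlock();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false; // lock not acquired within the timeout
    }

    public static void main(String[] args) {
        // Uncontended here, so the lock is acquired immediately
        System.out.println(tryIncrement()); // prints "true"
    }
}
```

Because tryLock can fail, the caller always has an escape path, which is one way to sidestep the indefinite waits that plain lock() can run into.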
ReadWriteLock#
ReadWriteLock is a specialised locking mechanism that distinguishes between read and write access. It consists of two locks: a read lock and a write lock. Multiple threads can hold the read lock simultaneously as long as no write operation is in progress. However, only one thread at a time can hold the write lock, keeping writes exclusive.
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

public void read() {
    lock.readLock().lock();
    try {
        // Read operation
    } finally {
        lock.readLock().unlock();
    }
}

public void write() {
    lock.writeLock().lock();
    try {
        // Write operation
    } finally {
        lock.writeLock().unlock();
    }
}

Advantages:
- Increases concurrency as multiple threads can read simultaneously as long as there are no writes.
- Reduces the likelihood of blocking read access.
Disadvantages:
- More complex to use than synchronized or ReentrantLock.
- Requires careful design to ensure no deadlocks or race conditions occur.
StampedLock#
StampedLock is another variant of ReadWriteLock introduced in Java 8. Unlike ReadWriteLock, StampedLock supports optimistic reads, which enable even higher concurrency. Every lock acquisition returns a stamp, which is later used to validate that the data read is still consistent.
private final StampedLock lock = new StampedLock();

public void read() {
    long stamp = lock.tryOptimisticRead();
    // Read operation (optimistically, without blocking)
    if (!lock.validate(stamp)) {
        // A write occurred in the meantime: fall back to a full read lock
        stamp = lock.readLock();
        try {
            // Read again
        } finally {
            lock.unlockRead(stamp);
        }
    }
}

Advantages:
- Provides optimistic read locks that enable high concurrency as long as no writes occur.
- Lower read operation overhead compared to traditional locks.
Disadvantages:
- Complex to use as developers need to ensure that the stamp is valid.
- Not reentrant, which may limit its use in some scenarios.
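For completeness, a write with StampedLock follows the same stamp-based pattern; the counter class below is a sketch, not code from the original post:

```java
import java.util.concurrent.locks.StampedLock;

class StampedCounter {
    private final StampedLock lock = new StampedLock();
    private int counter = 0;

    public void increment() {
        long stamp = lock.writeLock();   // exclusive write lock returns a stamp
        try {
            counter++;
        } finally {
            lock.unlockWrite(stamp);     // the stamp must be passed back on unlock
        }
    }

    public int get() {
        long stamp = lock.tryOptimisticRead(); // lock-free, optimistic read
        int value = counter;
        if (!lock.validate(stamp)) {           // a write intervened: fall back
            stamp = lock.readLock();
            try {
                value = counter;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return value;
    }
}
```

Note that the write path must not be entered recursively, since StampedLock is not reentrant.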
Semaphore#
Semaphore is another synchronisation mechanism that allows several threads to access a resource simultaneously. It is often used to control simultaneous access to limited resources.
private final Semaphore semaphore = new Semaphore(3);

public void accessResource() {
    try {
        semaphore.acquire();
        try {
            // Access the resource
        } finally {
            // Release only after a successful acquire; releasing without
            // having acquired would corrupt the permit count
            semaphore.release();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}

Advantages:
- Allows you to limit the number of threads that are allowed to access a resource at the same time.
- Flexible and applicable to various scenarios, e.g. to implement pooling mechanisms.
Disadvantages:
- Can become complex when using multiple semaphores in an application.
- Requires careful management to ensure that acquire and release are called correctly.
Summary of the locking mechanisms#
Java offers a variety of locking mechanisms, each suitable for different scenarios. The synchronized keyword is easy to use, but less flexible for complex scenarios. ReentrantLock and ReadWriteLock provide more control and enable higher parallelism, while StampedLock is suitable for particularly demanding read operations. Semaphore, on the other hand, is ideal for controlling concurrent access to limited resources. Choosing the proper mechanism depends on the application’s requirements, particularly regarding concurrency, resource contention, and maintainability.
The early years: Threads and Runnable#
Initially, Java’s concept of threads was the cornerstone of parallel processing. Java was developed as a platform-independent language intended for networked, distributed systems. With a focus on multithreading, Java offered a built-in Thread class and the Runnable interface, making it possible to execute multiple tasks simultaneously.
class MyTask implements Runnable {
    @Override
    public void run() {
        // Task running in parallel
        System.out.println("Hello from thread: " + Thread.currentThread().getName());
    }
}

public class Main {
    public static void main(String[] args) {
        Thread thread = new Thread(new MyTask());
        thread.start();
    }
}

In the early versions of Java, threads were the only option for parallel processing. They were backed by operating system threads, which made managing them expensive and resource-intensive. Developers had to handle details such as synchronising shared resources and avoiding deadlocks themselves, which made developing parallel applications challenging.
Using threads: advantages and disadvantages#
Advantages of threads
Simple modelling of parallel tasks: Threads allow developers to divide parallel tasks into separate units that can run concurrently.
Direct control: Threads give developers fine control over parallel execution, which is particularly useful when specific thread management requirements exist.
Good support from the operating system: Threads are supported directly by the operating system, meaning they can access all system resources and benefit from operating system optimisations.
Disadvantages of threads
Complexity of synchronisation: When multiple threads access shared resources, developers must use synchronisation mechanisms such as synchronized blocks or locks to avoid data corruption. Such mechanisms ensure that only one thread can access a critical resource at a time, ensuring data consistency. However, this often results in complicated and error-prone code, as developers must carefully ensure that all necessary sections are correctly synchronised. An incorrectly set or forgotten synchronisation block can lead to severe errors, such as race conditions, in which the program’s output depends on the timing of thread executions. Additionally, synchronisation mechanisms such as locks or synchronized blocks can incur a performance penalty, as threads often have to wait until a resource is released, which limits parallelism. These wait times can cause bottlenecks in more complex applications, particularly when multiple threads compete for different resources. Correctly applying synchronisation techniques therefore requires a deep understanding of thread interactions and careful design to ensure data integrity and maximise performance.
Resource intensive: Each thread requires space for its stack and additional resources for thread handling and context switching. These resource requirements can quickly add up with many threads and lead to system overload. In particular, the memory consumption of the thread stacks and the management of the threads by the operating system increase resource requirements. With a large number of threads, the frequency of context switches also rises, which can significantly reduce overall performance. This often makes threads inefficient and difficult to manage at large scale.
The danger of deadlocks: When using synchronisation mechanisms, there is a risk of deadlocks when two or more threads are blocked in a cyclic dependency on each other. Deadlocks often arise when multiple threads hold different locks in a mismatched order, each waiting for another thread to release the resource it needs. This leads to a situation where none of the threads can continue working because they are all waiting for the others to release resources. Deadlocks are often difficult to reproduce and debug because they only occur under certain runtime conditions. Strategies to avoid deadlocks include using timeout mechanisms, avoiding nested locks, and implementing lock ordering schemes to ensure all threads acquire locks in the same order.
Difficult scalability: Manual management of threads makes it difficult to scale applications, especially on systems with many processor cores. One of the main reasons for this is the challenge of determining the optimal number of threads to use system resources efficiently. Too many threads can cause system overload because management and context switching between threads consume significant CPU resources. On the other hand, too few threads can result in under-utilisation of available processor resources, degrading overall performance. Additionally, it is challenging to adapt thread management to the dynamic needs of an application, especially when the load is variable. Developers often have to resort to complex heuristics or dynamic thread pools to control the number of active threads, which significantly complicates the implementation and maintenance of the application. These challenges make it complicated to efficiently scale applications to modern multicore processors because the balance between parallelism and overhead is hard to strike.
Java 5: The Executor Framework and java.util.concurrent#
The release of Java 5 in 2004 introduced the java.util.concurrent package, intended to address many of the problems of early parallel programming in Java. The Executor framework enabled a higher level of abstraction over threads. Instead of starting and managing threads manually, developers could now rely on a task-based architecture and pass tasks to an ExecutorService.
The Executor framework introduced classes like ThreadPoolExecutor and ScheduledExecutorService, along with concurrency utilities such as Semaphore, CountDownLatch, and ConcurrentHashMap. This not only made thread management easier but also led to more efficient use of system resources.
ExecutorService executor = Executors.newFixedThreadPool(4);
executor.submit(() -> {
    System.out.println("Task is running in thread: " + Thread.currentThread().getName());
});
executor.shutdown();

The Executor Framework changed the way developers modelled parallel tasks. Instead of focusing on thread creation, they could define tasks and let the infrastructure handle execution.
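The synchronisation helpers mentioned above combine naturally with an ExecutorService. As a small sketch (the class and method names are illustrative), CountDownLatch lets the submitting thread wait until a fixed number of tasks have finished:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LatchDemo {
    static int runTasks(int n) {
        CountDownLatch latch = new CountDownLatch(n);
        AtomicInteger done = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            executor.submit(() -> {
                done.incrementAndGet(); // stands in for real work
                latch.countDown();      // signal that this task is finished
            });
        }
        try {
            latch.await();              // block until the count reaches zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdown();
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(4) + " tasks finished"); // prints "4 tasks finished"
    }
}
```

The latch replaces hand-rolled wait/notify logic: countDown() has release semantics, so everything a task did before counting down is visible after await() returns.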
Java 7: Fork/Join Framework#
Java 7 introduced the Fork/Join framework in 2011, explicitly designed for computationally intensive tasks that could be broken down into smaller subtasks. The fork/join framework provided a powerful recursion and divide-and-conquer infrastructure, allowing complex problems to be broken down into smaller, more manageable sub-problems. This division enabled the efficient use of modern multi-core processors.
The fork/join framework was particularly useful for problems that can be broken down into independent subproblems, such as calculating Fibonacci numbers, sorting large arrays (e.g. with merge sort), or processing large amounts of data in parallel. The central component of the framework is the ForkJoinPool, which manages the distribution of tasks between threads. The ForkJoinPool uses so-called work stealing, in which less busy threads take over work from busier ones. This ensures better load balancing and increases the efficiency of parallel processing.
Another advantage of the fork/join framework is the ability to handle recursive tasks efficiently. Developers can use classes like RecursiveTask or RecursiveAction to define tasks that either provide a return value (RecursiveTask) or do not require a return (RecursiveAction). The fork/join approach makes it possible to recursively split the tasks (fork) and then combine the results again (join).
// A concrete RecursiveTask that sums a range of numbers by splitting it in half
class SumTask extends RecursiveTask<Long> {
    private final long from, to;

    SumTask(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= 1_000) {
            // Base case: range is small enough to compute directly
            long sum = 0;
            for (long i = from; i <= to; i++) sum += i;
            return sum;
        }
        // Divide the task further
        long mid = (from + to) / 2;
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid + 1, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half here, then combine
    }
}

ForkJoinPool forkJoinPool = new ForkJoinPool();
long result = forkJoinPool.invoke(new SumTask(1, 1_000_000)); // 500000500000

The fork/join framework delivered significant performance improvements for CPU-intensive workloads, particularly for tasks easily broken down into smaller, independent pieces. It made it easier to write parallel algorithms without the developer worrying about thread distribution details. The ‘ForkJoinPool’ handles the distribution of tasks and uses work stealing to ensure that the processor resources are used optimally. This significantly increases performance compared to manual thread management, especially for computationally intensive and highly parallelisable tasks.
Java 8: Parallel Streams#
Java 8, released in 2014, marked a milestone in the evolution of Java, particularly with the introduction of lambda expressions, streams, and functional programming. These new features made the language more flexible and easier to use, especially for parallel operations. One of the most significant new features for parallel processing was Parallel Streams.
Parallel Streams allowed developers to effortlessly parallelise operations across collections without explicitly dealing with threads or synchronisation. This was achieved by integrating the fork/join framework behind the scenes. Parallel streams use the ForkJoinPool internally to distribute tasks efficiently across multiple processor cores. This approach is based on the divide-and-conquer design principle, in which an enormous task is broken down into smaller subtasks that can be executed in parallel.
The developer uses the parallelStream() method to convert a collection into a parallel stream. This results in the processing of the collection’s elements occurring simultaneously, with individual parts of the task being automatically distributed across the available CPU cores. In contrast to manual thread management, this approach offers a high level of abstraction and relieves the developer of the complex management of threads and synchronisation.
An example of using Parallel Streams is processing a list of numbers in parallel. Here, the operations applied to each element are carried out simultaneously, which can achieve a significant increase in performance on large data sets:
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
numbers.parallelStream().map(n -> n * n).forEach(System.out::println);

The design of Parallel Streams aims to simplify the development of parallel applications by using a declarative syntax that allows the developer to focus on the logic of data processing rather than on the details of thread management and task distribution. This higher abstraction makes parallel processing more accessible, which leads to better performance, especially in multi-core systems.
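Note that forEach on a parallel stream prints in nondeterministic order. When a single aggregated result is needed, a reduction is the typical pattern; a small sketch:

```java
import java.util.List;

class ParallelSumDemo {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        // Square each element in parallel, then reduce to one sum.
        // The result is deterministic even though the work is split across cores.
        int sumOfSquares = numbers.parallelStream()
                                  .mapToInt(n -> n * n)
                                  .sum();
        System.out.println(sumOfSquares); // prints 55
    }
}
```

Reductions such as sum(), reduce() and collect() are associative, which is exactly what lets the fork/join machinery underneath combine partial results safely.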
Java 8: CompletableFuture#
The CompletableFuture API was also introduced in Java 8 and significantly expanded the possibilities of asynchronous programming. CompletableFuture allows asynchronous tasks to be created, chained, and combined, making it a handy tool for developing event-driven applications. Methods like thenApply, thenAccept and thenCombine make it easy to define a sequence of asynchronous operations that are executed one after another or whose results are combined.
A CompletableFuture represents a future calculation and allows the different steps of the workflow to be defined declaratively. For example, an asynchronous calculation can be started, and the result can be processed further without worrying about explicitly managing the threads. This significantly simplifies the code and makes it more readable.
An example of using CompletableFuture shows how to execute multiple asynchronous operations one after the other:
CompletableFuture.supplyAsync(() -> "Hello")
    .thenApply(s -> s + " World")
    .thenAccept(System.out::println);

In this example, an asynchronous calculation is first started that returns the string “Hello”. The thenApply method then modifies the result by appending " World". Finally, thenAccept prints the result on the console. The entire process runs asynchronously, without the developer having to manage threads explicitly.
The architecture of CompletableFuture is based on the concept of so-called “completion stages”, which allow the creation of asynchronous pipelines. Each completion stage can either trigger a new calculation or process the result of a previous calculation. This enables the modelling of complex workflows, such as executing multiple tasks in parallel and then combining the results (thenCombine), or defining actions to be carried out in the event of an error (exceptionally).
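The error path works the same way as the success path. A minimal sketch of exceptionally, with a simulated failure (the exception message is illustrative):

```java
import java.util.concurrent.CompletableFuture;

class ExceptionallyDemo {
    public static void main(String[] args) {
        CompletableFuture<String> failing = CompletableFuture.supplyAsync(() -> {
            // Simulated failure inside the asynchronous task
            throw new IllegalStateException("service unavailable");
        });
        // exceptionally() maps the exception to a fallback value,
        // so the pipeline completes normally despite the failure
        String result = failing.exceptionally(ex -> "fallback").join();
        System.out.println(result); // prints "fallback"
    }
}
```

Stages after exceptionally() see the fallback value as if the original computation had succeeded.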
Another significant advantage of CompletableFuture is the ability to combine asynchronous tasks. For example, two independent asynchronous calculations can be performed in parallel, and their results can then be merged:
CompletableFuture<Integer> future1 = CompletableFuture.supplyAsync(() -> 10);
CompletableFuture<Integer> future2 = CompletableFuture.supplyAsync(() -> 20);
CompletableFuture<Integer> combinedFuture = future1.thenCombine(future2, (result1, result2) -> result1 + result2);
combinedFuture.thenAccept(result -> System.out.println("Combined Result: " + result));

In this example, two calculations are performed in parallel, and their results are combined. The thenCombine method adds the results of the two futures, and thenAccept prints the combined result.
This model allows complex workflows to be created without explicitly relying on threads or callbacks, making the code cleaner, more modular, and easier to maintain. CompletableFuture also provides methods such as allOf() and anyOf() that allow multiple futures to be monitored and processed simultaneously. This is particularly useful for scenarios where numerous independent tasks must be executed in parallel.
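Since allOf() returns a CompletableFuture<Void>, the individual results are read afterwards via join(). A sketch:

```java
import java.util.concurrent.CompletableFuture;

class AllOfDemo {
    public static void main(String[] args) {
        CompletableFuture<Integer> f1 = CompletableFuture.supplyAsync(() -> 10);
        CompletableFuture<Integer> f2 = CompletableFuture.supplyAsync(() -> 20);
        CompletableFuture<Integer> f3 = CompletableFuture.supplyAsync(() -> 30);

        // allOf completes when every future is done; it carries no combined
        // result, so the values are collected with join() once it completes
        CompletableFuture.allOf(f1, f2, f3).join();

        int total = f1.join() + f2.join() + f3.join();
        System.out.println("Total: " + total); // prints "Total: 60"
    }
}
```

anyOf() works analogously but completes as soon as the first of the given futures does.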
Overall, the CompletableFuture API makes asynchronous programming in Java much more accessible and allows developers to develop reactive and non-blocking applications with relatively little effort.
Java 9 to Java 19: Improvements and Project Loom#
After Java 8, the parallel programming models continuously improved. Java 9 brought improvements to the fork/join framework and introduced the ‘Flow’ API, which supported a reactive streaming model. Java 9 to 17 focused primarily on performance improvements, security, and the modularization of the JDK (Project Jigsaw).
However, one of the most significant innovations in recent times is Project Loom. Since Java 19, Virtual Threads have been available as a preview feature, with the potential to revolutionise parallel programming in Java. Virtual threads are lightweight threads that enable many concurrent tasks to run without the typical overhead of traditional operating system threads. While traditional threads are limited by resource costs (such as memory for stacks and context switches), virtual threads are designed to be much more resource-efficient. This means developers can create millions of virtual threads that work independently without overloading the system.
Virtual threads are handy for server-side applications that handle many concurrent connections, such as web servers or microservices. In traditional approaches, handling each request in its own thread often leads to scaling problems because hardware resources limit the number of possible threads. Virtual threads, on the other hand, allow each incoming request to be handled in its own virtual thread, significantly increasing parallelism.
Virtual threads work because they are efficiently managed by the Java runtime system rather than directly by the operating system like traditional threads. This significantly speeds up context switching and makes managing millions of threads realistic. Virtual threads are programmed like traditional threads, allowing existing code to be easily adapted to take advantage of this new technology.
A simple example of using virtual threads shows how to use an executor to execute a task in a virtual thread:
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    executor.submit(() -> {
        System.out.println("Running in a virtual thread");
    });
}

This example creates an executor that uses a new virtual thread for each task. The submit method starts the task in a virtual thread, which requires significantly fewer resources than a traditional thread.
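Besides the per-task executor, virtual threads can also be started directly via the Thread.ofVirtual() builder (a preview API in Java 19, finalised in Java 21); a minimal sketch:

```java
class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // Start a single virtual thread directly via the builder API
        Thread vt = Thread.ofVirtual().start(() ->
                System.out.println("virtual: " + Thread.currentThread().isVirtual()));
        vt.join(); // a virtual thread is joined like any other thread
    }
}
```

Because the builder returns an ordinary Thread, existing thread-based code can usually adopt virtual threads without structural changes.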
Project Loom can potentially make parallel programming in Java much more accessible by eliminating the need for developers to worry about thread scaling. Virtual threads are significantly more efficient and offer much higher parallelism without the programmer having to work with thread pools or complex synchronisation mechanisms explicitly. This increased scalability is particularly valuable in applications where concurrent operations must be dynamically scaled, as in many modern web applications and cloud environments. The introduction of virtual threads makes Java an even stronger choice for developing highly scalable, parallel applications by dramatically reducing the complexity of thread management.
The Evolution of Parallel Processing in Java: Conclusion#
The journey of parallel processing in Java reflects the language’s evolutionary nature. From the early days of threads, when developers had to rely on low-level APIs, to the highly abstract paradigms such as the Executor framework, Fork/Join, and Parallel Streams, Java has continually introduced improvements to make parallel application development easier.
With recent developments such as ‘CompletableFuture’ and Project Loom, Java can meet the needs of modern software development, especially in scalability and performance. Parallel processing in Java is now simpler, safer and more efficient than ever before - providing developers with powerful tools to exploit the full potential of modern multicore systems.
Looking into the future#
With Project Loom on the path to stability, we could see a further shift towards even simpler and more performant parallel processing techniques. Virtual threads will likely pave the way for new frameworks and libraries that benefit from this lightweight parallelism. Developers will continue to have access to the best parallel processing tools without having to worry about the complexities of thread management.
Java has proven that it can keep changing with the times as a parallel programming language—a language that rises to the challenges and adapts to the needs of modern developers. Java’s history of parallel processing is one of progress and continuous innovation. With the upcoming developments, it will remain exciting to see what new possibilities the future will bring us.