The performance of the I/O system has not kept pace with the increasing speed of processors. How has this disparity in performance affected your use of computers, and what techniques have you had to employ to adapt to this disparity?

Solution

The rate of improvement in microprocessor speed far exceeds that of DRAM memory. Several reasons lie at the root of this growing disparity:

The prime reason is the division of the semiconductor industry into separate microprocessor and memory fields. As a consequence, their technologies have headed in different directions: the former has optimized for speed, while the latter has optimized for capacity.

The result of these two approaches is an improvement rate of roughly 60%/year in microprocessor performance, while DRAM access time has been improving at less than 10%/year.

The performance gap grows exponentially. Although the disparity between microprocessor and memory speed is already a problem today, it will only widen over the next few years. This increasing processor-memory performance gap is now the primary obstacle to improved computer system performance.
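
To see how quickly the gap compounds, the short calculation below projects the two growth rates quoted above over a decade (the 60%/year and 10%/year figures come from this text; the ten-year horizon is an arbitrary example):

    # Projecting the processor-memory gap from the growth rates above.
    years = 10
    cpu = 1.60 ** years     # ~60%/year processor improvement
    dram = 1.10 ** years    # ~10%/year DRAM access-time improvement
    print(f"relative gap after {years} years: ~{cpu / dram:.0f}x")  # ~42x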

The performance of the processor-memory interface is characterized by two parameters: latency and bandwidth.

Latency is the time between the processor's initiation of a memory request and its completion. In fact, the problem of the increasing divergence between memory and processor speeds is fundamentally a latency problem.

Bandwidth is the rate at which information can be transferred to or from the memory system. Maximum performance would be achieved by zero latency and infinite bandwidth, which characterize the ideal memory system. There is a close and subtle relationship between bandwidth and latency that can be exploited to improve the performance of the memory hierarchy.
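
One standard way to quantify how a cache hides DRAM latency is the average memory access time (AMAT). The sketch below evaluates the formula for a few miss rates; all the timing figures are hypothetical, chosen only for illustration:

    # Average memory access time for a one-level cache.
    # AMAT = hit time + miss rate * miss penalty
    # All figures are hypothetical, for illustration only.

    def amat(hit_time_ns, miss_rate, miss_penalty_ns):
        return hit_time_ns + miss_rate * miss_penalty_ns

    CACHE_HIT_NS = 1.0       # assumed cache hit time
    DRAM_PENALTY_NS = 100.0  # assumed DRAM miss penalty

    for miss_rate in (0.01, 0.05, 0.20):
        print(f"miss rate {miss_rate:4.0%}: "
              f"AMAT = {amat(CACHE_HIT_NS, miss_rate, DRAM_PENALTY_NS):6.1f} ns")
    # Even a 5% miss rate makes the average access 6x slower than a hit,
    # which is why the latency gap dominates overall performance.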

Techniques employed to adapt to this disparity:

1. Improving Cache Performance by Exploiting Read-Write Disparity

We propose cache management techniques that increase the probability of cache hits for critical read requests, potentially at the cost of causing less critical write requests to miss. To accomplish this, we must distinguish between cache lines that will be read in the future and those that will not. Prior works sought to distinguish between cache lines that are reused and those that are not reused, with the intent of filtering lines that were unlikely to be reused. However, prior work did not distinguish between reuse due to critical read requests and less critical write requests, hence potentially leading to additional read misses on the critical path. The key contribution of this paper is the new idea of distinguishing between lines that are reused by reads versus those that are reused only by writes, to focus cache management policies on the more critical read lines. To our knowledge, this is the first work that uses whether or not a line is written to as an indicator of the future criticality of the cache line.

• We present data highlighting the criticality and locality differences between reads and writes within the cache (i.e., cache lines that are read versus cache lines that are only written).

• To exploit the disparity in read-write criticality, we propose Read-Write Partitioning (RWP), a mechanism that divides the last-level cache into two logical partitions for clean and dirty lines. RWP predicts the best partition sizes to increase the likelihood of future read hits. This could lead to allocating more lines in the clean partition or the dirty partition, depending on which partition serves more read requests (a minimal sketch follows this list).

• To show the potential for favoring lines that serve reads, we discuss a complex PC-based predictor, Read Reference Predictor (RRP). RRP uses the PC of the first reference to identify cache lines likely to service future reads, and therefore avoids allocating write-only lines and lines that are unlikely to get reused by reads.

• We show that our RWP mechanism is close in performance to the more complex RRP with only 5.4% of RRP's state overhead. We also show that RWP outperforms prior state-of-the-art cache management policies.
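
The sketch below is a minimal model of the partitioning idea, not the authors' implementation: one cache set is split into logical clean and dirty partitions, each managed with LRU, and the partition sizes are fixed constants standing in for the sizes RWP would predict at runtime. All class and variable names here are hypothetical.

    from collections import OrderedDict

    class RWPSet:
        """One cache set with logical clean/dirty partitions (LRU each).
        Illustrative sketch only; real RWP predicts partition sizes."""

        def __init__(self, clean_ways, dirty_ways):
            self.limit = {"clean": clean_ways, "dirty": dirty_ways}
            self.part = {"clean": OrderedDict(), "dirty": OrderedDict()}

        def _find(self, tag):
            # Return the partition currently holding the line, if any.
            for name, lines in self.part.items():
                if tag in lines:
                    return name
            return None

        def access(self, tag, is_write):
            """Return True on a hit. Written lines live in the dirty partition."""
            hit_in = self._find(tag)
            if hit_in is not None:
                self.part[hit_in].pop(tag)  # a write migrates a clean line to dirty
            target = "dirty" if (is_write or hit_in == "dirty") else "clean"
            lines = self.part[target]
            lines[tag] = True               # insert as most recently used
            if len(lines) > self.limit[target]:
                lines.popitem(last=False)   # evict LRU within this partition only
            return hit_in is not None

    # Favoring the clean partition protects read lines from write-only traffic.
    s = RWPSet(clean_ways=3, dirty_ways=1)
    for tag, wr in [("A", False), ("B", True), ("A", False), ("C", True)]:
        print(tag, "hit" if s.access(tag, wr) else "miss")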

2. Multi-core processors

Processing performance of computers is increased by using multi-core processors, which essentially plugs two or more individual processors (called cores in this context) into one integrated circuit. Ideally, a dual-core processor would be nearly twice as powerful as a single-core processor. In practice, the performance gain is far smaller, only about 50%, due to imperfect software algorithms and implementation.[61] Increasing the number of cores in a processor (i.e., dual-core, quad-core, etc.) increases the workload that can be handled. This means that the processor can now handle numerous asynchronous events, interrupts, etc., which can take a toll on the CPU when it is overwhelmed. These cores can be thought of as different floors in a processing plant, with each floor handling a different task. Sometimes, these cores will handle the same tasks as cores adjacent to them if a single core is not enough to handle the information.
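
The roughly 50% practical gain follows from Amdahl's law: the serial fraction of a program caps the speedup extra cores can deliver. The sketch below evaluates the formula for a few hypothetical serial fractions:

    # Amdahl's law: speedup(N) = 1 / (serial + (1 - serial) / N)
    # The serial fractions below are hypothetical examples.

    def speedup(serial_fraction, cores):
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    for serial in (0.0, 0.25, 0.50):
        print(f"serial part {serial:4.0%}: "
              f"2 cores -> {speedup(serial, 2):.2f}x, "
              f"4 cores -> {speedup(serial, 4):.2f}x")
    # With a 25% serial fraction, a dual core yields only ~1.6x,
    # consistent with the ~50% practical gain noted above.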

3. Stereo Vision Disparity Map Algorithms

In stereo vision disparity map processing, the number of calculations required increases with an increasing number of pixels per image. This phenomenon makes the matching problem computationally complex. The improvements to and reduction in computational complexity that have been achieved with recent advances in hardware technology have been beneficial for the advancement of research in the stereo vision field. Thus, the main motivation for hardware-based implementation is to achieve real-time processing. In real-time stereo vision applications, such as autonomous driving, 3D gaming, and autonomous robotic navigation, fast but accurate depth estimations are required. Additional processing hardware is therefore necessary to improve the processing speed.

In general, stereo vision disparity map algorithms can be classified into local or global approaches. A local approach is also known as an area-based or window-based approach, because the disparity computation at a given point (or pixel) depends only on the intensity values within a predefined support window. Thus, such a method considers only local information and therefore has a low computational complexity and a short run time.
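
To make the window-based idea concrete, here is a minimal sum-of-absolute-differences (SAD) block matcher for a rectified grayscale stereo pair. The window size and disparity range are assumed values, and no refinement or occlusion handling is attempted:

    import numpy as np

    def sad_disparity(left, right, max_disp=16, half_win=2):
        """Minimal local (window-based) matcher on rectified grayscale images.
        Cost grows with pixels x disparities x window area, illustrating why
        the computation becomes expensive as image resolution increases."""
        h, w = left.shape
        disp = np.zeros((h, w), dtype=np.int32)
        for y in range(half_win, h - half_win):
            for x in range(half_win + max_disp, w - half_win):
                patch = left[y - half_win:y + half_win + 1,
                             x - half_win:x + half_win + 1].astype(np.int32)
                best_cost, best_d = None, 0
                for d in range(max_disp):       # candidate disparities
                    cand = right[y - half_win:y + half_win + 1,
                                 x - d - half_win:x - d + half_win + 1].astype(np.int32)
                    cost = int(np.abs(patch - cand).sum())  # SAD over the window
                    if best_cost is None or cost < best_cost:
                        best_cost, best_d = cost, d
                disp[y, x] = best_d             # larger disparity = closer object
        return disp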

By contrast, a global method treats disparity assignment as the problem of minimizing a global energy function over all disparity values. Such a method is formulated as an energy minimization process with two terms in the objective function: a data term, which penalizes solutions that are inconsistent with the target data, and a smoothness term, which enforces the piecewise smoothness assumption between neighboring pixels.
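
Written out, a standard form of this energy (a generic formulation, not tied to any particular paper) is:

    E(d) = \sum_{p} C(p, d_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q)

where C(p, d_p) is the matching (data) cost of assigning disparity d_p to pixel p, N is the set of neighboring pixel pairs, V is the smoothness term penalizing disparity differences between neighbors, and the weight lambda balances the two terms.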

4. Parallel processing

To tackle both the capacity and performance problems, disk arrays and parallel I/O issues are being widely investigated. The basic objective of integrating multiple disks together is to create highly reliable mass storage systems. To improve I/O performance, storage units in disk arrays are accessed in parallel. Although using I/O devices in parallel increases the capacity and performance of storage systems, it does not reduce the probability of disk failure. Hence, integrated mechanisms for improving storage reliability, capacity, and performance have been proposed.
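
That trade-off can be quantified: striping data across N disks multiplies sequential bandwidth roughly by N, but without redundancy any single disk failure loses data, so the array's mean time to failure falls to about MTTF_disk / N. A small sketch under assumed per-disk figures:

    # Bandwidth vs. reliability when striping across N disks (no redundancy).
    # Per-disk MTTF and bandwidth figures below are hypothetical.

    DISK_MTTF_HOURS = 1_000_000   # assumed per-disk mean time to failure
    DISK_BW_MBPS = 200            # assumed per-disk sequential bandwidth

    for n in (1, 4, 16):
        array_bw = n * DISK_BW_MBPS        # requests serviced in parallel
        array_mttf = DISK_MTTF_HOURS / n   # any one failure loses the stripe
        print(f"{n:2d} disks: ~{array_bw:5d} MB/s, MTTF ~{array_mttf:>11,.0f} h")
    # This is why disk arrays pair parallelism with redundancy (e.g., parity),
    # as the integrated mechanisms mentioned above do.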

5. Other techniques

There are several techniques proposed and used in the design and development of high-performance storage systems. These include parallel I/O, caching, prefetching, smart and serverless file systems, adaptive techniques driven by storage access patterns, and rich I/O interfaces. Caching and prefetching mechanisms greatly enhance the performance of storage subsystems through improved cache hit ratios and reduced disk I/O access times. File systems have been modified to take advantage of the possibility of using several devices in parallel. Furthermore, these mechanisms have been enhanced to take advantage of resources located in different nodes in a cluster or storage network. A number of high-performance storage subsystems, such as Storage Area Networks (SAN) and Network-Attached Storage (NAS), provide efficient I/O interfaces with rich semantics.
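
As a simple illustration of why caching and prefetching help, the toy model below simulates a block cache with sequential read-ahead; the capacity, prefetch depth, and access trace are all assumed values:

    from collections import OrderedDict

    class PrefetchingCache:
        """Toy LRU block cache with sequential read-ahead (prefetching)."""

        def __init__(self, capacity=8, prefetch_depth=2):
            self.capacity = capacity
            self.depth = prefetch_depth
            self.blocks = OrderedDict()    # LRU order, oldest first

        def _install(self, block):
            self.blocks[block] = True
            self.blocks.move_to_end(block)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # evict least recently used

        def read(self, block):
            hit = block in self.blocks
            for b in range(block, block + 1 + self.depth):
                self._install(b)           # fetch the block and read ahead
            return hit

    cache = PrefetchingCache()
    trace = [0, 1, 2, 3, 10, 11, 12]       # mostly sequential block accesses
    hits = sum(cache.read(b) for b in trace)
    print(f"hit ratio: {hits}/{len(trace)}")   # read-ahead turns sequential
                                               # misses into hits (5/7 here)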
