How to determine if the database IO is slow

1. Introduction#

The problem of slow database response in a project is concluded to be a problem of poor IO performance at the operating system level. How can we make it more convincing? We generally measure IO performance using two main indicators. Response time: measured in microseconds, it measures the time required to complete an operation, which is generally collected and statistics by Oracle. Throughput: measured by the number of operations completed per unit of time. This is usually measured using tools in the operating system, such as iostat. This article mainly explains how to determine whether IO is slow from the perspective of Oracle, without going into detail about throughput and its measurement methods.

2. What is considered "slow" IO#

Slow IO is a subjective term that depends more on the user's expectations and the actual differences in the system and hardware. This difference is a subjective feeling. When quantified into specific performance indicators, in an enterprise-level platform, users will start to become sensitive when the response time of IO requests exceeds 10ms. However, response time is constantly changing. It may be due to data migration from the file system to shared storage, storage device anomalies, or peak IO performance caused by business growth. This requires multidimensional measurement of response time.

3. Response Time#

Hardware does not necessarily have the same response for each IO request. There will always be peaks and valleys. Therefore, using the average value is a common method to measure response time.
Note: In order to mitigate the problems caused by these peak/valley abnormal scenarios, the sample data size needs to be relatively large. The sample data size should be at least 1000 operations per hour, in order to provide more reliable and practical basis for decision-making.

Types of IO

Average response time is directly related to specific IO types:

Read or write

Single block or multi-block

Single block IO refers to reading one block at a time. For example, when a session waits for a single block IO, the typical wait event is "db file sequential read", indicating that it is waiting for the required block.
Multi-block read refers to reading multiple blocks at a time, ranging from 2 to 128 Oracle blocks, depending on the block size and operating system settings. Usually, a multi-block request has a capacity limit of 1MB. For example, when a session waits for a multi-block IO, the typical wait event is "db file scattered read", indicating that it is waiting for the required blocks.

Synchronous or asynchronous

Synchronous (blocking) operations wait for the hardware to complete physical IO and receive notifications after completion, allowing for proper management of operation success or failure (receiving results in the case of successful reads). When waiting for the result of a system call, the process execution is blocked.
For asynchronous (non-blocking) operations, once the IO request is passed to the hardware or placed in the operating system's queue (typically before physical IO starts), the system call returns immediately. The process execution is not blocked because it does not need to wait for the result of the system call. It can continue to execute and receive when the IO operation has a result.

Threshold for Response Time

The average time for a typical multi-block synchronous read of 64x 8k (total 512KB) should be around 20 milliseconds without IO slowing down. Small requests should be faster (10-20 milliseconds), and the time consumed by large requests should not exceed 25 milliseconds. Asynchronous operations should be at least as fast as synchronous operations, or even faster. Single block reads should be at least as fast as multi-block reads, or even faster. The wait time for "log file parallel write", "control file write", and "direct path writes" should not exceed 15 milliseconds. Measuring data file writes is not as simple as reads. DBWR writes blocks in batches ("db file parallel write") asynchronously, and there is currently no standard for write response time. If DBWR (multi-block or single-block, with or without IO slaves) is fast enough to clean up dirty blocks, other wait events and statistics will be revealed. As a rule, wait events that exceed the above wait event times should be analyzed in detail, especially when there is a significant change in time consumption compared to before.

Note: When the system is below these maximum thresholds, it does not mean that there are no other tuning methods.#

Response time varies depending on the system. For example, the following can be considered as normal average values:

Multi-block synchronous read time is 10 milliseconds.
Single block synchronous read time is 5 milliseconds.
'log file parallel write' time is 3 milliseconds.

The above is based on the premise that multi-block IO requires more IO subsystem resources than single-block IO. If you accept these suggestions, it is best to place the redo log on the fastest disk and avoid contention with other concurrent activities.

The following are the response time thresholds for various IO-related wait events:

Wait Event	R/W	Synchronous/ Asynchronous	Singleblock/ Multiblock	Elapsed Time
control file parallel write	Write	Asynchronous	Multi	< 15ms
control file sequential read	Read	Synchronous	Single	< 20 ms
db file parallel read	Read	Asynchronous	Multi	< 20 ms
db file scattered read	Read	Synchronous	Multi	< 20 ms
db file sequential read	Read	Synchronous	Single	< 20 ms
direct path read	Read	Asynchronous	Multi	< 20 ms
direct path read temp	Read	Asynchronous	Multi	< 20 ms
direct path write	Write	Asynchronous	Multi	< 15 ms
direct path write temp	Write	Asynchronous	Multi	< 15 ms
log file parallel write	Write	Asynchronous	Multi	< 15 ms

Ways to Determine Response Time

10046 trace file

When using level 8 or 12 in the 10046 trace, it includes information about the relevant wait events, and the response time is in the ela field, measured in microseconds.
WAIT #5: nam='cell single block physical read' ela= 672 cellhash#=2520626383 diskhash#=1377492511 bytes=16384 obj#=63 tim=1280416903276618
>> 672 microseconds = 0.672 ms
WAIT #5: nam='db file sequential read' ela= 1018 file#=2 block#=558091 blocks=1 obj#=0 tim=10191852599110
>> 1018 microseconds => 1.018 ms

System State Dump

For each system-level process, the wait information is included in the process information. It usually shows an active waiting for or waiting for completion, and the process is executing in the CPU while waiting for/last wait for.
The "waiting for" indicates that the process is in a waiting state. Before 11g, you can view the "seconds since wait started" field, which shows how long the process has been waiting. Starting from 11gR1, the "total" field shows the total waiting time. If "waiting for" shows that a process is waiting for an IO-related operation and "seconds since wait started" > 0, it indicates that there may be IO loss and the session is in a hang state. (Because it was mentioned earlier that the average acceptable time is 20 milliseconds, and any IO wait time exceeding 1 second needs attention).
"last wait for" is related to versions before 11g, indicating that the process is not waiting (for example, it is using the CPU). The wait time is recorded in the "wait_time" field. (In 11g, "wait_time" is replaced by "not in wait")

last wait for 'db file sequential read' blocking sess=0x0 seq=100 wait_time=2264 seconds since wait started=0 file#=45, block#=17a57, blocks=1
>> 2264 microseconds => 2.264 ms

"waited for" indicates that the session is not waiting. It is usually used in system-level traces after 11gR1. The "total" field indicates the total waiting time.
0: waited for 'db file sequential read' file#=9, block#=46526, blocks=1
wait_id=179 seq_num=180 snap_id=1
wait times: snap=0.007039 sec, exc=0.007039 sec, total=0.007039 sec
wait times: max=infinite
wait counts: calls=0 os=0
>> 0.007039 sec => 7.039 ms

Statspack and AWR reports

The average response time for foreground and background process wait events is reflected through Wait Avg (ms) (average read in milliseconds).

Tablespace IO
The average response time is reflected through Av Rd (ms) (average read in milliseconds).

The wait event histogram provides the distribution of write operation time that makes up these averages. It shows whether all write operations are close to the average or whether there are several peaks or valleys. Each column indicates the percentage distribution of wait event time between each bucket. For example, waits <16ms are greater than waits <8ms. As long as the largest percentage is within the range of <1ms to 16ms, the IO performance is generally acceptable.

4. Conclusion#

The goal of this article is not to troubleshoot the reasons for slow IO, but to explain how to find evidence to determine if IO is slow. If performance deteriorates, slow IO may become a potential cause of performance issues, and it needs to be analyzed from the perspective of the database on how to collect supporting evidence. If the potential cause is slow IO at the operating system level, the IO subsystem engineer needs to be involved in diagnosing and fixing the problem.