RK3576 with UFS Storage: In-depth Analysis of Performance Advantages and Read-Write Test Data

In the embedded storage field, UFS (Universal Flash Storage) is steadily gaining ground. UFS is a type of flash memory. Like eMMC, it combines NAND flash chips with a controller chip behind a standard interface in a standard package, forming a highly integrated storage chip. Because of its compact size, UFS is widely used in embedded devices such as mobile phones and tablets. And because UFS far outperforms eMMC, it is often used in high-end products.

Advantages of UFS

1. Faster response speed for multitasking

Devices using UFS 2.0 have a dedicated LVDS (Low-Voltage Differential Signaling) serial interface, which allows read and write operations to be carried out simultaneously. The CQ (Command Queue) dynamically allocates tasks without waiting for the previous process to end. It’s like a car getting on the highway, with multiple lanes allowing high-speed and smooth travel. In contrast, mobile phones using eMMC must perform read and write operations separately, and their commands are also packed. eMMC is already at a disadvantage in raw speed, and it is naturally slower still when multitasking. It is like traveling on an ordinary two-lane road with speed limits.
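As a rough illustration of the highway analogy above, the sketch below compares an idealized half-duplex link, where reads and writes take turns, with an idealized full-duplex link, where they overlap completely. The function names and the transfer times are hypothetical and are not measurements from any device.

```python
def half_duplex_time(read_s: float, write_s: float) -> float:
    """Reads and writes share one path, so their times add up."""
    return read_s + write_s


def full_duplex_time(read_s: float, write_s: float) -> float:
    """Reads and writes use separate paths and overlap fully (ideal case)."""
    return max(read_s, write_s)


if __name__ == "__main__":
    read_s, write_s = 2.0, 3.0  # hypothetical transfer times in seconds
    print("half duplex:", half_duplex_time(read_s, write_s), "s")
    print("full duplex:", full_duplex_time(read_s, write_s), "s")
```

Real devices fall somewhere between these two extremes, since the controller and the NAND itself also limit how much work can overlap.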

2. Low latency: UFS responds about three times faster

When loading large games and large files, UFS 2.0 takes less time. Loading a game takes one-third of the time it takes on eMMC 5.0. Correspondingly, when running games, mobile phones with UFS 2.0 have lower latency and smoother pictures.

3. Shorter loading time for photo thumbnails in the album

Take the mobile phone album as an example. Many people’s phones hold hundreds or even thousands of photos. When you open the photo thumbnails in the album, you can clearly see the loading process. This happens because the phone cannot read the photos from flash memory fast enough to keep up with the screen refresh. On a phone with fast flash storage, the pictures load smoothly as you scroll, while on a less capable phone you can clearly feel the lag during loading.

4. Faster speed and lower power consumption

Because the UFS chip is faster, it takes less time to complete the same task, and higher efficiency means lower power consumption. During active operation, UFS consumes 10% less power than eMMC, and it can save approximately 35% of power consumption in daily use.

UFS interface read-write performance test

The RK3576 CPU provides both a UFS 2.0 interface and an eMMC 5.1 interface.

The FET3576-C SoM also reserves a UFS interface.

The read-write tests below were run on the UFS flash memory of the OK3576-C, following Rockchip’s official document “Rockchip_Developer_Guide_UFS_CN_V1.3.0”.

Sequential write test

root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=write -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100write_1024k
seq_100write_1024k: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
... 
fio-3.34 
Starting 8 threads 
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [W(8)][96.0%][w=359MiB/s][w=359 IOPS][eta 00m:01s] 
seq_100write_1024k: (groupid=0, jobs=8): err= 0: pid=1296: Thu Jan 1 00:01:32 1970 
write: IOPS=332, BW=333MiB/s (349MB/s)(8192MiB/24631msec); 0 zone resets 
clat (msec): min=2, max=103, avg=23.55, stdev= 9.15 
lat (msec): min=2, max=104, avg=23.77, stdev= 9.15 
clat percentiles (msec): 
| 1.00th=[ 12], 5.00th=[ 14], 10.00th=[ 15], 20.00th=[ 16], 
| 30.00th=[ 18], 40.00th=[ 20], 50.00th=[ 22], 60.00th=[ 25], 
| 70.00th=[ 27], 80.00th=[ 31], 90.00th=[ 36], 95.00th=[ 41], 
| 99.00th=[ 53], 99.50th=[ 59], 99.90th=[ 68], 99.95th=[ 73], 
| 99.99th=[ 105] 
bw ( KiB/s): min=206590, max=432470, per=100.00%, avg=342387.68, stdev=7157.63, samples=385
iops : min= 196, max= 421, avg=331.98, stdev= 7.14, samples=385 
lat (msec) : 4=0.11%, 10=0.49%, 20=42.49%, 50=55.44%, 100=1.45% 
lat (msec) : 250=0.01% 
cpu : usr=1.12%, sys=1.83%, ctx=18228, majf=0, minf=0 
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
Run status group 0 (all jobs): 
WRITE: bw=333MiB/s (349MB/s), 333MiB/s-333MiB/s (349MB/s-349MB/s), io=8192MiB (8590MB), run=24631-24631msec
Disk stats (read/write): 
sda: ios=165/65464, merge=0/0, ticks=178/1074993, in_queue=1075171, util=99.64%

The output above shows that the sequential write speed is 333 MiB/s (349 MB/s).
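fio reports each bandwidth figure twice, once in binary units (MiB/s) and once in decimal units (MB/s). As a quick cross-check, the sketch below recomputes both from the amount of data moved and the runtime shown in the write summary (io=8192MiB, run=24631msec). The function name and constants are illustrative and are not part of fio.

```python
from __future__ import annotations

MIB = 1024 * 1024  # bytes in one mebibyte
MB = 1000 * 1000   # bytes in one megabyte


def bandwidth(io_mib: float, runtime_ms: float) -> tuple[float, float]:
    """Return (MiB/s, MB/s) for io_mib mebibytes moved in runtime_ms milliseconds."""
    seconds = runtime_ms / 1000.0
    mib_per_s = io_mib / seconds
    mb_per_s = mib_per_s * MIB / MB
    return mib_per_s, mb_per_s


if __name__ == "__main__":
    # Values from the write summary: io=8192MiB, run=24631msec
    mib_s, mb_s = bandwidth(8192, 24631)
    print(f"{mib_s:.1f} MiB/s = {mb_s:.1f} MB/s")
```

Rounded to whole numbers, the result matches the 333 MiB/s (349 MB/s) that fio prints for this run.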

Sequential read test

root@ok3576-buildroot:/# fio -filename=/dev/sda -direct=1 -iodepth 32 -thread -rw=read -bs=1024k -size=1G -numjobs=8 -runtime=180 -group_reporting -name=seq_100read_1024k
seq_100read_1024k: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=32
... 
fio-3.34 
Starting 8 threads 
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
Jobs: 8 (f=8): [R(8)][100.0%][r=756MiB/s][r=755 IOPS][eta 00m:00s] 
seq_100read_1024k: (groupid=0, jobs=8): err= 0: pid=1329: Thu Jan 1 00:08:54 1970 
read: IOPS=754, BW=755MiB/s (791MB/s)(8192MiB/10857msec) 
clat (usec): min=2331, max=16444, avg=10573.01, stdev=646.85 
lat (usec): min=2335, max=16447, avg=10575.10, stdev=646.84 
clat percentiles (usec): 
| 1.00th=[ 9896], 5.00th=[10159], 10.00th=[10159], 20.00th=[10290], 
| 30.00th=[10290], 40.00th=[10421], 50.00th=[10421], 60.00th=[10421], 
| 70.00th=[ 10552], 80.00th=[ 10683], 90.00th=[ 10945], 95.00th=[ 12518], 
| 99.00th=[ 13042], 99.50th=[ 13173], 99.90th=[ 13960], 99.95th=[ 15139], 
| 99.99th=[16450] 
bw ( KiB/s): min=762938, max=786629, per=100.00%, avg=772720.14, stdev=979.45, samples=168
iops : min= 740, max= 767, avg=749.19, stdev= 1.02, samples=168 
lat (msec) : 4=0.01%, 10=1.65%, 20=98.34% 
cpu : usr=0.37%, sys=3.81%, ctx=24750, majf=0, minf=2048 
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0 
latency : target=0, window=0, percentile=100.00%, depth=32 
Run status group 0 (all jobs): 
READ: bw=755MiB/s (791MB/s), 755MiB/s-755MiB/s (791MB/s-791MB/s), io=8192MiB (8590MB), run=10857-10857msec
Disk stats (read/write): 
sda: ios=64132/0, merge=0/0, ticks=544319/0, in_queue=544320, util=99.26%

The output above shows that the sequential read speed is 755 MiB/s (791 MB/s).
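In both runs fio notes that the queue depth is capped at 1 because the synchronous psync engine is selected, so the effective concurrency is the 8 threads, not 8 × 32. As a sanity check under that reading, Little's law (throughput ≈ requests in flight / average latency) should roughly reproduce the reported IOPS from the reported average latencies. This is a minimal sketch; the function name is illustrative, and the latency and IOPS values are copied from the two fio reports above.

```python
def expected_iops(outstanding: int, avg_latency_ms: float) -> float:
    """Little's law: throughput = requests in flight / average latency."""
    return outstanding / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # 8 threads, each with one request in flight (queue depth capped at 1).
    runs = {
        "write": (23.77, 332),    # avg "lat" in ms, IOPS reported by fio
        "read": (10.5751, 754),   # 10575.10 usec = 10.5751 ms
    }
    for name, (lat_ms, reported) in runs.items():
        estimate = expected_iops(8, lat_ms)
        print(f"{name}: estimated {estimate:.0f} IOPS, fio reported {reported}")
```

The estimates land within about 2% of the reported IOPS for both runs, which is consistent with each thread keeping a single 1 MiB request outstanding at a time.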

With the continuous development of embedded storage technology and the increasing richness of application scenarios, embedded storage has become indispensable in many fields such as smart homes, in-vehicle infotainment systems, and mobile devices. In the future, both eMMC and UFS will play irreplaceable roles in different application fields by virtue of their respective characteristics.



