FFmpeg: adding a watermark with QSV hardware acceleration and optimizing performance

I am adding a text watermark to a video with ffmpeg. I'm new to ffmpeg and am trying to optimize the performance of this operation.

My test setup has an i5-7500 with Intel HD 630 graphics. I tried the command below to add the watermark. If I do not set -hwaccel_output_format to yuv420p or nv12, it gives an error.

ffmpeg -threads 4 -hwaccel qsv -hwaccel_output_format yuv420p -i "input.mp4" -vf "drawtext=text='TEST':x=(W-tw)/2:y=(H-th)/2:fontfile=arial.ttf:fontsize=250:fontcolor=white@0.4:shadowcolor=black@0.4:shadowx=2:shadowy=2" -c:v h264_qsv "output.mp4"

When I run this command: CPU usage 53% / fps = 90-95 / GPU load (GPU-Z) = 35-38%
When I change it to -threads 1: CPU usage 35% / fps = 68-72 / GPU load (GPU-Z) = 28-30%

I found the -async_depth option on the Internet and tried it with a value of 5, but nothing changed, or maybe I used it wrong.
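
For reference, here is a minimal sketch of where such an option could go. It assumes -async_depth is read as an option of the h264_qsv encoder, so it follows -c:v h264_qsv. The FFMPEG and ASYNC_DEPTH variables and the run_watermark function are illustrative names only, and the drawtext string is shortened from the command above. By default the script only prints the command (dry run).

```shell
#!/bin/sh
# Sketch only: -async_depth placed as an encoder option after -c:v h264_qsv.
# FFMPEG defaults to "echo ffmpeg", so the command is printed, not executed.
# Set FFMPEG=ffmpeg in the environment to actually encode.
FFMPEG=${FFMPEG:-echo ffmpeg}
ASYNC_DEPTH=${ASYNC_DEPTH:-5}

run_watermark() {
  # input.mp4 / output.mp4 are placeholder file names.
  $FFMPEG -hwaccel qsv -hwaccel_output_format yuv420p -i "input.mp4" \
    -vf "drawtext=text='TEST':x=(W-tw)/2:y=(H-th)/2:fontfile=arial.ttf:fontsize=250:fontcolor=white@0.4" \
    -c:v h264_qsv -async_depth "$ASYNC_DEPTH" "output.mp4"
}

run_watermark
```

Running it with ASYNC_DEPTH=1, 4 and 8 and comparing fps and GPU load would show whether the value makes any difference on this hardware.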

How can I use more GPU and less CPU for this operation?