I have an application that accesses a resource from multiple threads. The resource is a device attached to USB. It's a simple command/response interface, and I use a small lock() block to ensure that the thread that sends a command is also the one that gets the response. My implementation uses the lock(obj) statement:
lock (threadLock)
{
    WriteLine(commandString);
    rawResponse = ReadLine();
}
When I access this from 3 threads as fast as possible (in a tight loop), the CPU usage is about 24% on a high-end computer. Due to the nature of the USB port, only about 1000 command/response operations are performed per second. I then implemented the lock mechanism described here: SimpleExclusiveLock. The code now looks similar to this (some try/catch code that releases the lock in case of an I/O exception has been removed):
Lock.Enter();
WriteLine(commandString);
rawResponse = ReadLine();
Lock.Exit();
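For completeness, the omitted exception handling just wraps the I/O in try/finally so the lock is always released; a sketch of the full pattern (same Lock, WriteLine and ReadLine as above):

Lock.Enter();
try
{
    WriteLine(commandString);   // send the command to the device
    rawResponse = ReadLine();   // read the matching response
}
finally
{
    Lock.Exit();                // release the lock even if the I/O throws
}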
Using this implementation the CPU usage drops to <1% with the same 3 thread test program while still getting the 1000 command/response operations per second.
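The 3-thread test program does nothing but hammer that critical section; a minimal sketch of what I mean (SendCommand and the command string are placeholder names for the locked WriteLine/ReadLine pair shown above):

// requires using System.Threading;
for (int i = 0; i < 3; i++)
{
    new Thread(() =>
    {
        while (true)
        {
            // the USB round trip takes about 1 ms and the lock serializes access,
            // so total throughput tops out around 1000 operations per second
            SendCommand("STATUS?");
        }
    }) { IsBackground = true }.Start();
}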
The question is: what, in this case, is the problem with using the built-in lock() statement? Have I accidentally stumbled upon a case where the lock() mechanism has exceptionally high overhead? The thread that enters the critical section holds the lock for only about 1 ms.
No, sorry, this is not possible. There is no scenario in which 3 threads, 2 of them blocked on the lock and 1 blocked on an I/O operation that takes a millisecond, can produce 24% CPU utilization. The linked article is perhaps interesting, but the .NET Monitor class does the exact same thing, including the CompareExchange() optimization and the wait queue.
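For reference, the compiler lowers a lock statement to exactly that kind of structure, Monitor.Enter/Exit wrapped in try/finally; your first snippet is roughly equivalent to:

// requires using System.Threading;
bool lockTaken = false;
try
{
    Monitor.Enter(threadLock, ref lockTaken);
    WriteLine(commandString);
    rawResponse = ReadLine();
}
finally
{
    if (lockTaken) Monitor.Exit(threadLock);   // released even if the I/O throws
}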
The only way you can get to 24% is through other code that runs in your program. The common cycle stealer is the UI thread, which you pummel a thousand times per second. Very easy to burn a core that way, and a classic mistake: human eyes can't read that fast. The likely extrapolation is that the test program you wrote for the second lock doesn't update the UI, and thus doesn't burn a core.
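To illustrate the kind of code that steals those cycles (logTextBox is a hypothetical control, just for the example), marshalling every response to the UI looks something like:

// runs ~1000 times per second; the UI thread has to wake up, append text and repaint each time
logTextBox.Invoke(new Action(() => logTextBox.AppendText(rawResponse + Environment.NewLine)));

Batching the output and refreshing the display from a timer a few times per second avoids burning that core.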
A profiler will of course tell you exactly where those cycles go. It should be your next step.