Semaphore performance issue #113411

eduardo-vp · 2025-03-12T06:21:03Z

Description

There's a performance issue with Semaphore in scenarios where threads acquire and release the semaphore with no work in between, as shown in the snippet below. The transaction rate is up to 3-4 times lower than in other runtimes.

using System;
using System.Diagnostics;
using System.Threading;

namespace MultiThreadedScaling
{
  internal class ScalingIssue
  {
    static ReaderWriterLockSlim slimRWlock = null;
    static Semaphore normalSemaphore = null;

    static int numThreads = 8;
    static int run_time = 30; // seconds

    static void Main(string[] args)
    {
      Console.WriteLine("Running test with " + numThreads + " threads for " + run_time + " seconds");

      slimRWlock = new ReaderWriterLockSlim();
      normalSemaphore = new Semaphore(numThreads, Int32.MaxValue);
      for (int i = 0; i < numThreads; i++)
      {
        Thread t1 = new Thread(new ParameterizedThreadStart(OpenCloseSimulation));
        t1.Name = i.ToString();
        t1.Start(i);
      }
    }

    static void OpenCloseSimulation(object state)
    {
      int index = (int)state;
      int numTxns = 0;
      var start_time = DateTime.UtcNow;
      while (DateTime.UtcNow - start_time < TimeSpan.FromSeconds(run_time))
      {
        OpenConnSimulation(index);
        CloseConnSimulation(index);
        numTxns += 1;
      }
      var end_time = DateTime.UtcNow;
      long totRunTime = (long)(end_time - start_time).TotalSeconds;
      Console.WriteLine("Thread " + Thread.CurrentThread.Name + " Transaction rate: commits= " + (numTxns / totRunTime));
    }

    static void OpenConnSimulation(int index)
    {
      normalSemaphore.WaitOne();
      DoWork();
      slimRWlock.EnterReadLock();
      DoWork();
      slimRWlock.ExitReadLock();
    }

    static void CloseConnSimulation(int index)
    {
      DoWork();
      normalSemaphore.Release();
    }

    internal static void DoWork()
    {
      return;
    }
  }
}

This scenario was tested with .NET 9 on x64, for both Windows and Linux. The machines had 16 vCPUs and the test was run with 8 threads for 30 seconds.

The transactions-per-second rate is ~320K on Windows, whereas on Linux it is ~130K. Other runtimes can achieve ~500K+ on both platforms.

Native AOT achieved results similar to the non-Native AOT versions.

dotnet-policy-service · 2025-03-12T06:21:36Z

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

hez2010 · 2025-03-12T06:30:53Z

You can try SemaphoreSlim instead of Semaphore. Semaphore wraps the system semaphore directly, so it carries the performance cost of the OS primitive.
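For illustration only, here is a minimal sketch of the same acquire/release pattern using SemaphoreSlim. The class name `SemaphoreSlimSketch` and the `Run` helper are hypothetical and not part of the original repro; the sketch uses a fixed iteration count instead of a timed loop and makes no claim about the throughput it would reach.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not from the original repro.
internal static class SemaphoreSlimSketch
{
    // Starts threadCount threads that each perform `iterations`
    // Wait/Release pairs with no work in between, then returns the
    // total number of completed pairs.
    public static long Run(int threadCount, int iterations)
    {
        // Same initial/maximum counts as the Semaphore in the repro.
        using var gate = new SemaphoreSlim(threadCount, int.MaxValue);
        long total = 0;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < iterations; n++)
                {
                    gate.Wait();     // managed fast path when a count is available
                    gate.Release();
                }
                Interlocked.Add(ref total, iterations);
            });
            threads[i].Start();
        }

        // Wait for all workers before the semaphore is disposed.
        foreach (var t in threads)
        {
            t.Join();
        }

        return Interlocked.Read(ref total);
    }
}
```

Calling `SemaphoreSlimSketch.Run(8, 1_000_000)` under a `Stopwatch` would give a rough number to compare against the Semaphore version on the same machine.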

mangod9 · 2025-03-12T06:44:07Z

Other runtimes are able to get better performance so certainly something we can investigate.

En3Tho · 2025-03-12T06:55:07Z

Is using the ReaderWriter lock necessary here? Is the issue scoped to Semaphore only, or to a combination of Semaphore and the ReaderWriter lock?

When you say other runtimes, which ones? Can they optimize out an empty acquire-release seeing that DoWork actually does nothing?

mangod9 · 2025-03-12T07:09:08Z

> Is using ReaderWriter lock necessary here? Is issue scoped to Semaphore only or a combination of Semaphore and ReaderWriter lock?

I don't believe RWLock makes a difference.

> When you say other runtimes, which ones? Can they optimize out an empty acquire-release seeing that DoWork actually does nothing?

Fair point, it's a possibility, but it seems unlikely that a compiler could make that determination and completely remove the acquire/release. Still worth looking into.

stephentoub · 2025-03-12T11:22:35Z

> Other runtimes are able to get better performance so certainly something we can investigate.

Using a semaphore that wraps the OS primitive? Or using their built-in semaphore type? Or using a hand-rolled semaphore?

neon-sunset · 2025-03-12T12:19:19Z

> Dont believe RWLock makes a difference.

ReaderWriterLockSlim is plenty fast, only slightly slower than a plain lock (which is itself fast, contended or not). Without looking at a profile, I would bet most of the time is spent waiting on the OS-level semaphore. SemaphoreSlim should be used instead. The naming is a bit unfortunate, but you can always wrap it in a struct, which can also provide better enter/exit semantics through IDisposable if needed.
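One way the struct wrapper could look, as a rough sketch: the names `SemaphoreGate`, `Enter`, and `Releaser` are made up for illustration and do not come from any library.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper around SemaphoreSlim; all names here are illustrative.
public readonly struct SemaphoreGate
{
    private readonly SemaphoreSlim _semaphore;

    public SemaphoreGate(int initialCount)
    {
        _semaphore = new SemaphoreSlim(initialCount);
    }

    // Number of slots currently available.
    public int CurrentCount => _semaphore.CurrentCount;

    // Blocks until a slot is available, then returns a token whose
    // Dispose releases the slot again.
    public Releaser Enter()
    {
        _semaphore.Wait();
        return new Releaser(_semaphore);
    }

    public readonly struct Releaser : IDisposable
    {
        private readonly SemaphoreSlim _semaphore;

        internal Releaser(SemaphoreSlim semaphore)
        {
            _semaphore = semaphore;
        }

        public void Dispose()
        {
            _semaphore.Release();
        }
    }
}
```

Usage would then be `using (gate.Enter()) { /* guarded work */ }`. Two caveats with this shape: `default(SemaphoreGate)` holds a null semaphore, and disposing a copied `Releaser` twice releases twice, so a real implementation may want guards for both.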

eduardo-vp added the tenet-performance Performance related issue label Mar 12, 2025

dotnet-issue-labeler bot added the area-System.Threading label Mar 12, 2025

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 12, 2025

mangod9 removed the untriaged New issue has not been triaged by the area owner label Mar 12, 2025

mangod9 added this to the 10.0.0 milestone Mar 12, 2025
