Semaphore performance issue #113411

Open
eduardo-vp opened this issue Mar 12, 2025 · 7 comments

@eduardo-vp
Member

Description

There's a performance issue with Semaphore in scenarios where threads acquire and release the semaphore with no work in between, as shown in the snippet below. The transaction rate is up to 3-4 times lower than in other runtimes.

using System;
using System.Diagnostics;
using System.Threading;

namespace MultiThreadedScaling
{
  internal class ScalingIssue
  {
    static ReaderWriterLockSlim slimRWlock = null;
    static Semaphore normalSemaphore = null;

    static int numThreads = 8;
    static int run_time = 30; // seconds

    static void Main(string[] args)
    {
      Console.WriteLine("Running test with " + numThreads + " threads for " + run_time + " seconds");

      slimRWlock = new ReaderWriterLockSlim();
      normalSemaphore = new Semaphore(numThreads, Int32.MaxValue);
      for (int i = 0; i < numThreads; i++)
      {
        Thread t1 = new Thread(new ParameterizedThreadStart(OpenCloseSimulation));
        t1.Name = i.ToString();
        t1.Start(i);
      }
    }

    static void OpenCloseSimulation(object state)
    {
      int index = (int)state;
      int numTxns = 0;
      var start_time = DateTime.UtcNow;
      while (DateTime.UtcNow - start_time < TimeSpan.FromSeconds(run_time))
      {
        OpenConnSimulation(index);
        CloseConnSimulation(index);
        numTxns += 1;
      }
      var end_time = DateTime.UtcNow;
      long totRunTime = (long)(end_time - start_time).TotalSeconds;
      Console.WriteLine("Thread " + Thread.CurrentThread.Name + " Transaction rate: commits= " + (numTxns / totRunTime));
    }

    static void OpenConnSimulation(int index)
    {
      normalSemaphore.WaitOne();
      DoWork();
      slimRWlock.EnterReadLock();
      DoWork();
      slimRWlock.ExitReadLock();
    }

    static void CloseConnSimulation(int index)
    {
      DoWork();
      normalSemaphore.Release();
    }

    internal static void DoWork()
    {
      return;
    }
  }
}

This scenario was tested with .NET 9 on x64, on both Windows and Linux. The machines had 16 vCPUs, and the test was run with 8 threads for 30 seconds.

The transaction rate on Windows is ~320K per second, whereas on Linux it is ~130K per second. Other runtimes can achieve ~500K+ on both platforms.

Native AOT builds achieved results similar to the non-Native AOT builds.

eduardo-vp added the tenet-performance label Mar 12, 2025
dotnet-policy-service bot added the untriaged label Mar 12, 2025
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@hez2010
Contributor

hez2010 commented Mar 12, 2025

You can try SemaphoreSlim instead of Semaphore. Semaphore wraps the system semaphore directly, so it inherits the performance characteristics of the OS primitive.
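
To make the suggestion concrete, here is a minimal sketch of the same acquire/release loop using SemaphoreSlim. It is not the original repro: the class name `SemaphoreSlimRepro`, the `Run` helper, and the 5-second duration are assumptions chosen for illustration, the ReaderWriterLockSlim is left out, and timing uses Stopwatch rather than DateTime.UtcNow, so the numbers are not directly comparable with the figures reported above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical harness for illustration; not the reporter's benchmark.
internal static class SemaphoreSlimRepro
{
    // Runs an empty acquire/release loop on `threadCount` threads for `duration`
    // and returns the total number of completed acquire/release pairs.
    public static long Run(int threadCount, TimeSpan duration)
    {
        using var semaphore = new SemaphoreSlim(threadCount, int.MaxValue);
        long total = 0;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                long local = 0; // per-thread counter, merged once at the end
                var clock = Stopwatch.StartNew();
                while (clock.Elapsed < duration)
                {
                    semaphore.Wait();
                    semaphore.Release();
                    local++;
                }
                Interlocked.Add(ref total, local);
            });
            threads[i].Start();
        }

        // All threads finish before the semaphore is disposed.
        foreach (var t in threads) t.Join();
        return total;
    }

    private static void Main()
    {
        var duration = TimeSpan.FromSeconds(5); // arbitrary, shorter than the 30 s in the issue
        long total = Run(8, duration);
        Console.WriteLine($"SemaphoreSlim: {total / duration.TotalSeconds:N0} acquire/release pairs per second");
    }
}
```

Swapping `SemaphoreSlim` for `Semaphore` (and `Wait` for `WaitOne`) in `Run` would give a like-for-like comparison under the same harness.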

@mangod9
Member

mangod9 commented Mar 12, 2025

Other runtimes are able to get better performance, so this is certainly something we can investigate.

mangod9 removed the untriaged label Mar 12, 2025
mangod9 added this to the 10.0.0 milestone Mar 12, 2025
@En3Tho
Contributor

En3Tho commented Mar 12, 2025

Is using the ReaderWriter lock necessary here? Is the issue scoped to Semaphore only, or to a combination of Semaphore and the ReaderWriter lock?

When you say other runtimes, which ones? Can they optimize out an empty acquire-release seeing that DoWork actually does nothing?

@mangod9
Member

mangod9 commented Mar 12, 2025

> Is using ReaderWriter lock necessary here? Is issue scoped to Semaphore only or a combination of Semaphore and ReaderWriter lock?

Don't believe RWLock makes a difference.

> When you say other runtimes, which ones? Can they optimize out an empty acquire-release seeing that DoWork actually does nothing?

Fair point. It's a possibility, but it seems unlikely that a compiler could make that determination and completely remove the acquire/release. Still, it's worth looking into.
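
One way to check the ReaderWriterLockSlim question empirically is to run the same loop with and without the read lock and compare the totals. The sketch below is an assumption-laden illustration, not the original repro: the `LockIsolation` class, the `Measure` helper, and the 5-second duration are all hypothetical, and it times with Stopwatch instead of DateTime.UtcNow.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical harness for illustration; not the reporter's benchmark.
internal static class LockIsolation
{
    // Returns the number of Semaphore acquire/release pairs completed by
    // `threadCount` threads in `duration`. When `includeReadLock` is true,
    // each iteration also enters and exits a ReaderWriterLockSlim read lock.
    public static long Measure(int threadCount, TimeSpan duration, bool includeReadLock)
    {
        using var semaphore = new Semaphore(threadCount, int.MaxValue);
        using var rwLock = new ReaderWriterLockSlim();
        long total = 0;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                long local = 0;
                var clock = Stopwatch.StartNew();
                while (clock.Elapsed < duration)
                {
                    semaphore.WaitOne();
                    if (includeReadLock)
                    {
                        rwLock.EnterReadLock();
                        rwLock.ExitReadLock();
                    }
                    semaphore.Release();
                    local++;
                }
                Interlocked.Add(ref total, local);
            });
            threads[i].Start();
        }

        // All threads finish before the semaphore and lock are disposed.
        foreach (var t in threads) t.Join();
        return total;
    }

    private static void Main()
    {
        var duration = TimeSpan.FromSeconds(5); // arbitrary duration for the sketch
        long withLock = Measure(8, duration, includeReadLock: true);
        long withoutLock = Measure(8, duration, includeReadLock: false);
        Console.WriteLine($"Semaphore + read lock: {withLock / duration.TotalSeconds:N0} per second");
        Console.WriteLine($"Semaphore only:        {withoutLock / duration.TotalSeconds:N0} per second");
    }
}
```

If the two rates come out close, that would support the view that the semaphore dominates the cost; a large gap would point at the read lock instead.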

@stephentoub
Member

stephentoub commented Mar 12, 2025

> Other runtimes are able to get better performance so certainly something we can investigate.

Using a semaphore that wraps the OS primitive? Or using their built-in semaphore type? Or using a hand-rolled semaphore?

@neon-sunset
Contributor

neon-sunset commented Mar 12, 2025

> Don't believe RWLock makes a difference.

ReaderWriterLockSlim is plenty fast, only slightly slower than a plain lock (which is itself fast, contended or not). Without looking at a profile, I'd bet most of the time is spent waiting on the OS-level semaphore. SemaphoreSlim should be used instead. The naming is a bit unfortunate, but you can always wrap it in a struct, which can also provide better enter/exit semantics through IDisposable if needed.
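
The struct wrapper mentioned above could look something like the following sketch. The `Enter` extension method and the `Releaser` struct are hypothetical names introduced for illustration; neither is part of the BCL.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the BCL.
internal static class SemaphoreSlimExtensions
{
    // Blocks until the semaphore is acquired, then returns a token
    // whose Dispose releases it again.
    public static Releaser Enter(this SemaphoreSlim semaphore)
    {
        semaphore.Wait();
        return new Releaser(semaphore);
    }

    public readonly struct Releaser : IDisposable
    {
        private readonly SemaphoreSlim _semaphore;

        internal Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;

        // Null check keeps default(Releaser).Dispose() harmless.
        public void Dispose() => _semaphore?.Release();
    }
}

internal static class Demo
{
    private static void Main()
    {
        var gate = new SemaphoreSlim(initialCount: 2, maxCount: 2);

        using (gate.Enter())
        {
            Console.WriteLine(gate.CurrentCount); // prints 1 while the slot is held
        }

        Console.WriteLine(gate.CurrentCount); // prints 2 after the using block releases it
    }
}
```

Because `Releaser` is a struct, acquiring it allocates nothing, but copies of the token share the same semaphore, so disposing more than one copy would release more than once. Keeping the token confined to a `using` statement avoids that.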

6 participants