Local Hugging Face Model Inference Server

Important

This learning sample is for educational purposes only and should not be used in any production use case. It is intended to make Semantic Kernel features more accessible for scenarios that do not require an OpenAI or Azure OpenAI endpoint.

This application provides an API service for interacting with models available through Hugging Face. The request bodies and responses are modeled after those of OpenAI and Azure OpenAI, which eases a later transition to more capable LLMs.
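Because the request bodies are modeled after OpenAI, a client call might look roughly like the sketch below. The route `/completions/<model>`, the body fields `prompt` and `max_tokens`, and the model id `gpt2` are assumptions made for illustration and are not confirmed by this README; the instruction page served at the root of the running service is the place to check the real request format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # address used by this sample


def build_completion_request(model, prompt, max_tokens=32, base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style completion request.

    ASSUMPTION: the route and the body fields are hypothetical and only
    mirror the general shape of an OpenAI completion request.
    """
    body = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions/{model}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600):
    """Send the request and decode the JSON response.

    The generous timeout allows for a model download on the first call.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the request instead of sending it, so this runs without a server.
    req = build_completion_request("gpt2", "Hello, my name is")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload and adjust it to the format the service actually documents before any network call is made.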

Building the Sample Container

docker image build -t hf_model_server .

This step can take several minutes while Docker downloads the image dependencies.

Running the Sample Container

docker run -p 5000:5000 -d hf_model_server

This runs the service at http://localhost:5000. Opening http://localhost:5000 in a browser displays instructions on how to construct requests to the service.
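Since the container starts in detached mode, it may not be ready to answer the moment `docker run` returns. A minimal sketch of a readiness check is shown below; it only polls the root page mentioned above, and the helper name `wait_for_service` and its retry parameters are illustrative choices, not part of this sample.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(base_url="http://localhost:5000", attempts=10, delay=2.0):
    """Poll the root page until it answers with HTTP 200.

    Returns True as soon as the page responds, or False after all attempts fail.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(base_url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the service may still be starting; try again
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("service reachable:", wait_for_service())
```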

Important

If the model has not been cached (for example, the first time it is called), the response can take some time because the model must be downloaded first. Using this service to generate images can also take a very long time, depending on your hardware.

Alternative: Bare-Metal

Alternatively, the service can be started on bare-metal. To do this, you will need to have Python 3.9 installed.

Before proceeding, it is highly recommended that you create a Python 3.9 virtual environment.

Example: python -m venv myvenv or python3 -m venv myvenv.

Make sure your environment is activated:

For Windows, run in PowerShell: ./myvenv/Scripts/Activate.

For Linux/macOS, run: source myvenv/bin/activate.
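Before installing dependencies, it can help to confirm that the active interpreter is the one you expect. The sketch below checks for Python 3.9 and for an active virtual environment using only the standard library; the function names are illustrative, and the virtual environment check assumes an environment created with `venv`.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.9, as this sample requires."""
    return tuple(version_info[:2]) == (3, 9)


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python 3.9:", is_supported_python())
    print("Virtual environment active:", in_virtualenv())
```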

Then, run pip install -r requirements.txt.

Once all the required dependencies have been installed, you can run the service using python inference_app.py. Opening http://localhost:5000 in a browser displays instructions on how to construct requests to the service.

