New markup: Change format a bit (wrap in HTML comment) (huggingface#820)
* Change format a bit (wrap in comment)

* Update README

* update today's blog to new format

cc @alaradirik @sayakpaul
julien-c authored Feb 3, 2023
1 parent 914c35f commit 62a253d
Showing 178 changed files with 355 additions and 365 deletions.
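
A sweeping mechanical change like this lends itself to scripting. Here is a rough sketch of how the placeholder-wrapping could be done in one pass (a hypothetical helper, not necessarily how this commit was produced; it assumes GNU sed and that each placeholder sits alone on its own line):

```bash
# Wrap bare {blog_metadata} and {authors} placeholders in HTML comments
# across every post; already-wrapped lines don't match the anchored pattern.
sed -i -E 's/^\{(blog_metadata|authors)\}$/<!-- {\1} -->/' *.md
```
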
4 changes: 2 additions & 2 deletions 1b-sentence-embeddings.md
@@ -7,8 +7,8 @@ authors:

# Train a Sentence Embedding Model with 1 Billion Training Pairs

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

**Sentence embedding** is a method that maps sentences to vectors of real numbers. Ideally, these vectors would capture the semantics of a sentence and be highly generic. Such representations could then be used for many downstream applications such as clustering, text mining, or question answering.

17 changes: 13 additions & 4 deletions README.md
@@ -7,7 +7,7 @@ This is the official repository of the [Hugging Face Blog](https://hf.co/blog).
2️⃣ Create an md (markdown) file, **use a short file name**.
For instance, if your title is "Introduction to Deep Reinforcement Learning", the md file name could be `intro-rl.md`. This is important because the **file name will be the blogpost's URL**.

-3️⃣ Create a new folder in `assets`. Use the same name as the name of the md file. Optionally you may add a numerical prefix to that folder, using the number that hasn't been used yet. But this is no longer required. i.e. the asset folder in this example will be `123_intro-rl` or `intro-rl`. This folder will contain **your thumbnail only**. The folder number is mostly for (rough) ordering purposes, so it's no big deal if two concurrent articles use the same number.
+3️⃣ Create a new folder in `assets`. Use the same name as the name of the md file. Optionally you may add a numerical prefix to that folder, using the number that hasn't been used yet. But this is no longer required. i.e. the asset folder in this example could be `123_intro-rl` or `intro-rl`. This folder will contain **your thumbnail only**. The folder number is mostly for (rough) ordering purposes, so it's no big deal if two concurrent articles use the same number.

For the rest of your files, create a mirrored folder in the HuggingFace Documentation Images [repo](https://huggingface.co/datasets/huggingface/documentation-images/tree/main/blog). This is to reduce bloat in the GitHub base repo when cloning and pulling.

@@ -29,10 +29,19 @@ authors:
# Train your first Decision Transformer
-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->
Your content here [...]
```

+The blog_metadata and authors HTML comments mark where the following UI elements will be inserted in the file:
+- "Published on [date]"
+- the "Update on GitHub" button
+- the avatars of the authors listed in `authors`
+
+⚠️ Please keep the blog_metadata and authors comments exactly equal to those strings, otherwise they won't be replaced.
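
For reference, a complete post header combining the front matter with the new markers might look like the sketch below (the title, thumbnail path, and author handle are illustrative placeholders, not prescribed values):

```
---
title: "Introduction to Deep Reinforcement Learning"
thumbnail: /blog/assets/intro-rl/thumbnail.png
authors:
- user: your-hf-username
---

# Introduction to Deep Reinforcement Learning

<!-- {blog_metadata} -->
<!-- {authors} -->

Your content here [...]
```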

5️⃣ Then, you can add your content. It's a Markdown system, so if you wrote your text in Notion, just press Ctrl+Shift+V to copy/paste it as Markdown.

6️⃣ Modify `_blog.yml` to add your blogpost.
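
As a sketch, an entry could look roughly like this (the exact field names are an assumption; copy a recent entry from `_blog.yml` itself to be sure):

```yaml
- local: intro-rl
  title: "Introduction to Deep Reinforcement Learning"
  author: your-hf-username
  thumbnail: /blog/assets/intro-rl/thumbnail.png
  date: February 3, 2023
  tags:
    - rl
```
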
@@ -41,7 +50,7 @@ authors:

8️⃣ The article will be **published automatically when you merge your pull request**.

-## How to get a responsive thumbnail?
+## How to get a nice responsive thumbnail?
1️⃣ Create a `1300x650` image

2️⃣ Use [this template](https://github.com/huggingface/blog/blob/main/assets/thumbnail-template.svg) and fill the content part.
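
If you already have artwork, one quick way to produce the 1300x650 canvas from step 1️⃣ is ImageMagick (an assumed helper invocation; any image editor works just as well):

```bash
# Fit the artwork onto the 1300x650 thumbnail canvas, cropping any overflow
convert artwork.png -resize 1300x650^ -gravity center -extent 1300x650 thumbnail.png
```
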
4 changes: 2 additions & 2 deletions accelerate-deepspeed.md
@@ -8,8 +8,8 @@ authors:

<h1>Accelerate Large Model Training using DeepSpeed</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

In this post we will look at how we can leverage the **[Accelerate](https://github.com/huggingface/accelerate)** library for training large models, which enables users to use the ZeRO features of **[DeepSpeed](https://www.deepspeed.ai)**.

4 changes: 2 additions & 2 deletions accelerate-large-models.md
@@ -7,8 +7,8 @@ authors:

# How 🤗 Accelerate runs very large models thanks to PyTorch

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## Load and run large models

4 changes: 2 additions & 2 deletions accelerate-library.md
@@ -7,8 +7,8 @@ authors:

# Introducing 🤗 Accelerate

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## 🤗 Accelerate

2 changes: 1 addition & 1 deletion accelerated-inference.md
@@ -5,7 +5,7 @@ thumbnail: /blog/assets/09_accelerated_inference/thumbnail.png

<h1>How we sped up transformer inference 100x for 🤗 API customers</h1>

-{blog_metadata}
+<!-- {blog_metadata} -->

🤗 Transformers has become the default library for data scientists all around the world to explore state of the art NLP models and build new NLP features. With over 5,000 pre-trained and fine-tuned models available, in over 250 languages, it is a rich playground, easily accessible whichever framework you are working in.

4 changes: 2 additions & 2 deletions accelerating-pytorch.md
@@ -8,8 +8,8 @@ authors:
# Accelerating PyTorch distributed fine-tuning with Intel technologies


-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

For all their amazing performance, state of the art deep learning models often take a long time to train. In order to speed up training jobs, engineering teams rely on distributed training, a divide-and-conquer technique where clustered servers each keep a copy of the model, train it on a subset of the training set, and exchange results to converge to a final model.

4 changes: 2 additions & 2 deletions ai-residency.md
@@ -7,8 +7,8 @@ authors:

# Announcing the 🤗 AI Research Residency Program 🎉 🎉 🎉

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->


The 🤗 Research Residency Program is a 9-month opportunity to launch or advance your career in machine learning research 🚀. The goal of the residency is to help you grow into an impactful AI researcher. Residents will work alongside Researchers from our Science Team. Together, you will pick a research problem and then develop new machine learning techniques to solve it in an open & collaborative way, with the hope of ultimately publishing your work and making it visible to a wide audience.
4 changes: 2 additions & 2 deletions ambassadors.md
@@ -7,8 +7,8 @@ authors:

# Student Ambassador Program’s call for applications is open!

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

As an open-source company democratizing machine learning, Hugging Face believes it is essential to **[teach](https://huggingface.co/blog/education)** open-source ML to people from all backgrounds worldwide. **We aim to teach machine learning to 5 million people by 2023**.

4 changes: 2 additions & 2 deletions annotated-diffusion.md
@@ -8,8 +8,8 @@ authors:

# The Annotated Diffusion Model

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions arxiv.md
@@ -9,8 +9,8 @@ authors:

# Hugging Face Machine Learning Demos on arXiv

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

We’re very excited to announce that Hugging Face has collaborated with arXiv to make papers more accessible, discoverable, and fun! Starting today, [Hugging Face Spaces](https://huggingface.co/spaces) is integrated with arXivLabs through a Demo tab that includes links to demos created by the community or the authors themselves. By going to the Demos tab of your favorite paper, you can find links to open-source demos and try them out immediately 🔥

4 changes: 2 additions & 2 deletions asr-chunking.md
@@ -7,8 +7,8 @@ authors:

# Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

```
Tl;dr: This post explains how to use the specificities of the Connectionist
4 changes: 2 additions & 2 deletions audio-datasets.md
@@ -7,8 +7,8 @@ authors:

# A Complete Guide to Audio Datasets

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<!--- Note to reviewer: comments and TODOs are included in this format. --->

4 changes: 2 additions & 2 deletions autonlp-prodigy.md
@@ -7,8 +7,8 @@ authors:

<h1>Active Learning with AutoNLP and Prodigy</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

Active learning in the context of Machine Learning is a process in which you iteratively add labeled data, retrain a model and serve it to the end user. It is an endless process and requires human interaction for labeling/creating the data. In this article, we will discuss how to use [AutoNLP](https://huggingface.co/autonlp) and [Prodigy](https://prodi.gy/) to build an active learning pipeline.

4 changes: 2 additions & 2 deletions autotrain-image-classification.md
@@ -7,8 +7,8 @@ authors:

# Image Classification with AutoTrain

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions bert-101.md
@@ -7,8 +7,8 @@ authors:
<html itemscope itemtype="https://schema.org/FAQPage">
<h1>BERT 101 🤗 State Of The Art NLP Model Explained</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions bert-cpu-scaling-part-1.md
@@ -19,8 +19,8 @@ authors:
}
</style>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

# Scaling up BERT-like model Inference on modern CPU - Part 1

4 changes: 2 additions & 2 deletions bert-cpu-scaling-part-2.md
@@ -9,8 +9,8 @@ authors:

# Scaling up BERT-like model Inference on modern CPU - Part 2

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions bert-inferentia-sagemaker.md
@@ -7,8 +7,8 @@ authors:

<h1>Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions big-bird.md
@@ -7,8 +7,8 @@ authors:

# Understanding BigBird's Block Sparse Attention

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## Introduction

4 changes: 2 additions & 2 deletions bloom-inference-optimization.md
@@ -6,8 +6,8 @@ authors:
---

<h1>Optimization story: Bloom inference</h1>
-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

This article gives you the behind-the-scenes of how we made the efficient inference server that powers [BLOOM](https://huggingface.co/bigscience/bloom).
4 changes: 2 additions & 2 deletions bloom-inference-pytorch-scripts.md
@@ -8,8 +8,8 @@ authors:

<h1>Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

This article shows how to get incredibly fast per-token throughput when generating with the 176B parameter [BLOOM model](https://huggingface.co/bigscience/bloom).

4 changes: 2 additions & 2 deletions bloom-megatron-deepspeed.md
@@ -7,8 +7,8 @@ authors:

<h1>The Technology Behind BLOOM Training</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->



4 changes: 2 additions & 2 deletions bloom.md
@@ -18,8 +18,8 @@ authors:
</style>
<h1>🌸 Introducing The World's Largest Open Multilingual Language Model: BLOOM 🌸</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->
</head>
<body>
<a href="https://huggingface.co/bigscience/bloom"><img style="middle" width="950" src="/blog/assets/86_bloom/thumbnail-2.png"></a>
4 changes: 2 additions & 2 deletions carbon-emissions-on-the-hub.md
@@ -9,8 +9,8 @@ authors:

<h1> CO2 Emissions and the 🤗 Hub: Leading the Charge </h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## What are CO2 Emissions and why are they important?

4 changes: 2 additions & 2 deletions clipseg-zero-shot.md
@@ -9,8 +9,8 @@ authors:

# Zero-shot image segmentation with CLIPSeg

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions codeparrot.md
@@ -7,8 +7,8 @@ authors:

<h1>Training CodeParrot 🦜 from Scratch</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->


In this blog post we'll take a look at what it takes to build the technology behind [GitHub Copilot](https://copilot.github.com/), an application that provides suggestions to programmers as they code. In this step-by-step guide, we'll learn how to train a large GPT-2 model called CodeParrot 🦜, entirely from scratch. CodeParrot can auto-complete your Python code - give it a spin [here](https://huggingface.co/spaces/lvwerra/codeparrot-generation). Let's get to building it from scratch!
4 changes: 2 additions & 2 deletions collaborative-training.md
@@ -9,8 +9,8 @@ authors:

# Deep Learning over the Internet: Training Language Models Collaboratively

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<small>
With the additional help of Quentin Lhoest and Sylvain Lesage.
4 changes: 2 additions & 2 deletions constrained-beam-search.md
@@ -8,8 +8,8 @@ authors:

# Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<a target="_blank" href="https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
4 changes: 2 additions & 2 deletions convert-transformers-to-onnx.md
@@ -6,8 +6,8 @@ authors:
---
# Convert Transformers to ONNX with Hugging Face Optimum

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

Hundreds of Transformers experiments and models are uploaded to the [Hugging Face Hub](https://huggingface.co/) every single day. Machine learning engineers and students conducting those experiments use a variety of frameworks like PyTorch, TensorFlow/Keras, or others. These models are already used by thousands of companies and form the foundation of AI-powered products.

2 changes: 1 addition & 1 deletion course-launch-event.md
@@ -7,7 +7,7 @@ authors:

# Course Launch Community Event

-{authors}
+<!-- {authors} -->

We are excited to share that after a lot of work from the Hugging Face team, part 2 of the [Hugging Face Course](https://hf.co/course) will be released on November 15th! Part 1 focused on teaching you how to use a pretrained model, fine-tune it on a text classification task, then upload the result to the [Model Hub](https://hf.co/models). Part 2 will focus on all the other common NLP tasks: token classification, language modeling (causal and masked), translation, summarization and question answering. It will also take a deeper dive into the whole Hugging Face ecosystem, in particular [🤗 Datasets](https://github.com/huggingface/datasets) and [🤗 Tokenizers](https://github.com/huggingface/tokenizers).

4 changes: 2 additions & 2 deletions cv_state.md
@@ -7,8 +7,8 @@ authors:

# The State of Computer Vision at Hugging Face 🤗

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

At Hugging Face, we pride ourselves on democratizing the field of artificial intelligence together with the community. As a part of that mission, we began focusing our efforts on computer vision over the last year. What started as a [PR for having Vision Transformers (ViT) in 🤗 Transformers](https://github.com/huggingface/transformers/pull/10950) has now grown into something much bigger – 8 core vision tasks, over 3000 models, and over 100 datasets on the Hugging Face Hub.

4 changes: 2 additions & 2 deletions data-measurements-tool.md
@@ -9,8 +9,8 @@ authors:

# Introducing the 🤗 Data Measurements Tool: an Interactive Tool for Looking at Datasets

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->



4 changes: 2 additions & 2 deletions datasets-docs-update.md
@@ -7,8 +7,8 @@ authors:

# Introducing new audio and vision documentation in 🤗 Datasets

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

Open and reproducible datasets are essential for advancing good machine learning. At the same time, datasets have grown tremendously in size as rocket fuel for large language models. In 2020, Hugging Face launched 🤗 Datasets, a library dedicated to:
