New markup: Change format a bit (wrap in HTML comment) (huggingface#820)
* Change format a bit (wrap in comment)

* Update README

* update today's blog to new format

cc @alaradirik @sayakpaul
julien-c authored Feb 3, 2023
1 parent 914c35f commit 62a253d
Showing 178 changed files with 355 additions and 365 deletions.
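
A sweeping mechanical change like this lends itself to scripting. Here is a rough sketch of how the placeholder-wrapping could be done in one pass (a hypothetical helper, not necessarily how this commit was produced; it assumes GNU sed and that each placeholder sits alone on its own line):

```bash
# Wrap bare {blog_metadata} and {authors} placeholders in HTML comments
# across every post; already-wrapped lines don't match the anchored pattern.
sed -i -E 's/^\{(blog_metadata|authors)\}$/<!-- {\1} -->/' *.md
```
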
4 changes: 2 additions & 2 deletions 1b-sentence-embeddings.md
@@ -7,8 +7,8 @@ authors:

# Train a Sentence Embedding Model with 1 Billion Training Pairs

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

**Sentence embedding** is a method that maps sentences to vectors of real numbers. Ideally, these vectors would capture the semantics of a sentence and be highly generic. Such representations could then be used for many downstream applications such as clustering, text mining, or question answering.

17 changes: 13 additions & 4 deletions README.md
@@ -7,7 +7,7 @@ This is the official repository of the [Hugging Face Blog](https://hf.co/blog).
2️⃣ Create an md (markdown) file, **use a short file name**.
For instance, if your title is "Introduction to Deep Reinforcement Learning", the md file name could be `intro-rl.md`. This is important because the **file name will be the blogpost's URL**.

-3️⃣ Create a new folder in `assets`. Use the same name as the name of the md file. Optionally you may add a numerical prefix to that folder, using the number that hasn't been used yet. But this is no longer required. i.e. the asset folder in this example will be `123_intro-rl` or `intro-rl`. This folder will contain **your thumbnail only**. The folder number is mostly for (rough) ordering purposes, so it's no big deal if two concurrent articles use the same number.
+3️⃣ Create a new folder in `assets`. Use the same name as the name of the md file. Optionally you may add a numerical prefix to that folder, using the number that hasn't been used yet. But this is no longer required. i.e. the asset folder in this example could be `123_intro-rl` or `intro-rl`. This folder will contain **your thumbnail only**. The folder number is mostly for (rough) ordering purposes, so it's no big deal if two concurrent articles use the same number.

For the rest of your files, create a mirrored folder in the HuggingFace Documentation Images [repo](https://huggingface.co/datasets/huggingface/documentation-images/tree/main/blog). This is to reduce bloat in the GitHub base repo when cloning and pulling.

@@ -29,10 +29,19 @@ authors:
# Train your first Decision Transformer
-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->
Your content here [...]
```

+The blog_metadata and authors HTML comments mark where the following UI elements will be inserted in the file:
+- "Published on [date]"
+- the "Update on GitHub" button
+- the avatars of the authors listed in `authors`
+
+⚠️ Please keep the blog_metadata and authors comments exactly equal to those strings, otherwise they won't be replaced.
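
For reference, a complete post header combining the front matter with the new markers might look like the sketch below (the title, thumbnail path, and author handle are illustrative placeholders, not prescribed values):

```
---
title: "Introduction to Deep Reinforcement Learning"
thumbnail: /blog/assets/intro-rl/thumbnail.png
authors:
- user: your-hf-username
---

# Introduction to Deep Reinforcement Learning

<!-- {blog_metadata} -->
<!-- {authors} -->

Your content here [...]
```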

5️⃣ Then, you can add your content. It's a Markdown system, so if you wrote your text in Notion, just press Ctrl+Shift+V to copy/paste it as Markdown.

6️⃣ Modify `_blog.yml` to add your blogpost.
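
As a sketch, an entry could look roughly like this (the exact field names are an assumption; copy a recent entry from `_blog.yml` itself to be sure):

```yaml
- local: intro-rl
  title: "Introduction to Deep Reinforcement Learning"
  author: your-hf-username
  thumbnail: /blog/assets/intro-rl/thumbnail.png
  date: February 3, 2023
  tags:
    - rl
```
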
@@ -41,7 +50,7 @@ authors:

8️⃣ The article will be **published automatically when you merge your pull request**.

-## How to get a responsive thumbnail?
+## How to get a nice responsive thumbnail?
1️⃣ Create a `1300x650` image

2️⃣ Use [this template](https://github.com/huggingface/blog/blob/main/assets/thumbnail-template.svg) and fill the content part.
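
If you already have artwork, one quick way to produce the 1300x650 canvas from step 1️⃣ is ImageMagick (an assumed helper invocation; any image editor works just as well):

```bash
# Fit the artwork onto the 1300x650 thumbnail canvas, cropping any overflow
convert artwork.png -resize 1300x650^ -gravity center -extent 1300x650 thumbnail.png
```
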
4 changes: 2 additions & 2 deletions accelerate-deepspeed.md
@@ -8,8 +8,8 @@ authors:

<h1>Accelerate Large Model Training using DeepSpeed</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

In this post we will look at how we can leverage the **[Accelerate](https://github.com/huggingface/accelerate)** library for training large models, which enables users to use the ZeRO features of **[DeepSpeed](https://www.deepspeed.ai)**.

4 changes: 2 additions & 2 deletions accelerate-large-models.md
@@ -7,8 +7,8 @@ authors:

# How 🤗 Accelerate runs very large models thanks to PyTorch

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## Load and run large models

4 changes: 2 additions & 2 deletions accelerate-library.md
@@ -7,8 +7,8 @@ authors:

# Introducing 🤗 Accelerate

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## 🤗 Accelerate

2 changes: 1 addition & 1 deletion accelerated-inference.md
@@ -5,7 +5,7 @@ thumbnail: /blog/assets/09_accelerated_inference/thumbnail.png

<h1>How we sped up transformer inference 100x for 🤗 API customers</h1>

-{blog_metadata}
+<!-- {blog_metadata} -->

🤗 Transformers has become the default library for data scientists all around the world to explore state of the art NLP models and build new NLP features. With over 5,000 pre-trained and fine-tuned models available, in over 250 languages, it is a rich playground, easily accessible whichever framework you are working in.

4 changes: 2 additions & 2 deletions accelerating-pytorch.md
@@ -8,8 +8,8 @@ authors:
# Accelerating PyTorch distributed fine-tuning with Intel technologies


-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

For all their amazing performance, state of the art deep learning models often take a long time to train. In order to speed up training jobs, engineering teams rely on distributed training, a divide-and-conquer technique where clustered servers each keep a copy of the model, train it on a subset of the training set, and exchange results to converge to a final model.

4 changes: 2 additions & 2 deletions ai-residency.md
@@ -7,8 +7,8 @@ authors:

# Announcing the 🤗 AI Research Residency Program 🎉 🎉 🎉

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->


The 🤗 Research Residency Program is a 9-month opportunity to launch or advance your career in machine learning research 🚀. The goal of the residency is to help you grow into an impactful AI researcher. Residents will work alongside Researchers from our Science Team. Together, you will pick a research problem and then develop new machine learning techniques to solve it in an open & collaborative way, with the hope of ultimately publishing your work and making it visible to a wide audience.
4 changes: 2 additions & 2 deletions ambassadors.md
@@ -7,8 +7,8 @@ authors:

# Student Ambassador Program’s call for applications is open!

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

As an open-source company democratizing machine learning, Hugging Face believes it is essential to **[teach](https://huggingface.co/blog/education)** open-source ML to people from all backgrounds worldwide. **We aim to teach machine learning to 5 million people by 2023**.

4 changes: 2 additions & 2 deletions annotated-diffusion.md
@@ -8,8 +8,8 @@ authors:

# The Annotated Diffusion Model

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions arxiv.md
@@ -9,8 +9,8 @@ authors:

# Hugging Face Machine Learning Demos on arXiv

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

We’re very excited to announce that Hugging Face has collaborated with arXiv to make papers more accessible, discoverable, and fun! Starting today, [Hugging Face Spaces](https://huggingface.co/spaces) is integrated with arXivLabs through a Demo tab that includes links to demos created by the community or the authors themselves. By going to the Demos tab of your favorite paper, you can find links to open-source demos and try them out immediately 🔥

4 changes: 2 additions & 2 deletions asr-chunking.md
@@ -7,8 +7,8 @@ authors:

# Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

```
Tl;dr: This post explains how to use the specificities of the Connectionist
4 changes: 2 additions & 2 deletions audio-datasets.md
@@ -7,8 +7,8 @@ authors:

# A Complete Guide to Audio Datasets

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<!--- Note to reviewer: comments and TODOs are included in this format. --->

4 changes: 2 additions & 2 deletions autonlp-prodigy.md
@@ -7,8 +7,8 @@ authors:

<h1>Active Learning with AutoNLP and Prodigy</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

Active learning in the context of Machine Learning is a process in which you iteratively add labeled data, retrain a model and serve it to the end user. It is an endless process and requires human interaction for labeling/creating the data. In this article, we will discuss how to use [AutoNLP](https://huggingface.co/autonlp) and [Prodigy](https://prodi.gy/) to build an active learning pipeline.

4 changes: 2 additions & 2 deletions autotrain-image-classification.md
@@ -7,8 +7,8 @@ authors:

# Image Classification with AutoTrain

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions bert-101.md
@@ -7,8 +7,8 @@ authors:
<html itemscope itemtype="https://schema.org/FAQPage">
<h1>BERT 101 🤗 State Of The Art NLP Model Explained</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions bert-cpu-scaling-part-1.md
@@ -19,8 +19,8 @@ authors:
}
</style>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

# Scaling up BERT-like model Inference on modern CPU - Part 1

4 changes: 2 additions & 2 deletions bert-cpu-scaling-part-2.md
@@ -9,8 +9,8 @@ authors:

# Scaling up BERT-like model Inference on modern CPU - Part 2

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions bert-inferentia-sagemaker.md
@@ -7,8 +7,8 @@ authors:

<h1>Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions big-bird.md
@@ -7,8 +7,8 @@ authors:

# Understanding BigBird's Block Sparse Attention

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## Introduction

4 changes: 2 additions & 2 deletions bloom-inference-optimization.md
@@ -6,8 +6,8 @@ authors:
---

<h1>Optimization story: Bloom inference</h1>
-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

This article gives you the behind-the-scenes of how we made the efficient inference server that powers [BLOOM](https://huggingface.co/bigscience/bloom).
4 changes: 2 additions & 2 deletions bloom-inference-pytorch-scripts.md
@@ -8,8 +8,8 @@ authors:

<h1>Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

This article shows how to get incredibly fast per-token throughput when generating with the 176B parameter [BLOOM model](https://huggingface.co/bigscience/bloom).

4 changes: 2 additions & 2 deletions bloom-megatron-deepspeed.md
@@ -7,8 +7,8 @@ authors:

<h1>The Technology Behind BLOOM Training</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->



4 changes: 2 additions & 2 deletions bloom.md
@@ -18,8 +18,8 @@ authors:
</style>
<h1>🌸 Introducing The World's Largest Open Multilingual Language Model: BLOOM 🌸</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->
</head>
<body>
<a href="https://huggingface.co/bigscience/bloom"><img style="middle" width="950" src="/blog/assets/86_bloom/thumbnail-2.png"></a>
4 changes: 2 additions & 2 deletions carbon-emissions-on-the-hub.md
@@ -9,8 +9,8 @@ authors:

<h1> CO2 Emissions and the 🤗 Hub: Leading the Charge </h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

## What are CO2 Emissions and why are they important?

4 changes: 2 additions & 2 deletions clipseg-zero-shot.md
@@ -9,8 +9,8 @@ authors:

# Zero-shot image segmentation with CLIPSeg

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

4 changes: 2 additions & 2 deletions codeparrot.md
@@ -7,8 +7,8 @@ authors:

<h1>Training CodeParrot 🦜 from Scratch</h1>

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->


In this blog post we'll take a look at what it takes to build the technology behind [GitHub Copilot](https://copilot.github.com/), an application that provides suggestions to programmers as they code. In this step-by-step guide, we'll learn how to train a large GPT-2 model called CodeParrot 🦜, entirely from scratch. CodeParrot can auto-complete your Python code - give it a spin [here](https://huggingface.co/spaces/lvwerra/codeparrot-generation). Let's get to building it from scratch!
4 changes: 2 additions & 2 deletions collaborative-training.md
@@ -9,8 +9,8 @@ authors:

# Deep Learning over the Internet: Training Language Models Collaboratively

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<small>
With the additional help of Quentin Lhoest and Sylvain Lesage.
4 changes: 2 additions & 2 deletions constrained-beam-search.md
@@ -8,8 +8,8 @@ authors:

# Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

<a target="_blank" href="https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
4 changes: 2 additions & 2 deletions convert-transformers-to-onnx.md
@@ -6,8 +6,8 @@ authors:
---
# Convert Transformers to ONNX with Hugging Face Optimum

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

Hundreds of Transformers experiments and models are uploaded to the [Hugging Face Hub](https://huggingface.co/) every single day. Machine learning engineers and students conducting those experiments use a variety of frameworks like PyTorch, TensorFlow/Keras, or others. These models are already used by thousands of companies and form the foundation of AI-powered products.

2 changes: 1 addition & 1 deletion course-launch-event.md
@@ -7,7 +7,7 @@ authors:

# Course Launch Community Event

-{authors}
+<!-- {authors} -->

We are excited to share that after a lot of work from the Hugging Face team, part 2 of the [Hugging Face Course](https://hf.co/course) will be released on November 15th! Part 1 focused on teaching you how to use a pretrained model, fine-tune it on a text classification task, then upload the result to the [Model Hub](https://hf.co/models). Part 2 will focus on all the other common NLP tasks: token classification, language modeling (causal and masked), translation, summarization and question answering. It will also take a deeper dive into the whole Hugging Face ecosystem, in particular [🤗 Datasets](https://github.com/huggingface/datasets) and [🤗 Tokenizers](https://github.com/huggingface/tokenizers).

4 changes: 2 additions & 2 deletions cv_state.md
@@ -7,8 +7,8 @@ authors:

# The State of Computer Vision at Hugging Face 🤗

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

At Hugging Face, we pride ourselves on democratizing the field of artificial intelligence together with the community. As a part of that mission, we began focusing our efforts on computer vision over the last year. What started as a [PR for having Vision Transformers (ViT) in 🤗 Transformers](https://github.com/huggingface/transformers/pull/10950) has now grown into something much bigger – 8 core vision tasks, over 3000 models, and over 100 datasets on the Hugging Face Hub.

4 changes: 2 additions & 2 deletions data-measurements-tool.md
@@ -9,8 +9,8 @@ authors:

# Introducing the 🤗 Data Measurements Tool: an Interactive Tool for Looking at Datasets

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->



4 changes: 2 additions & 2 deletions datasets-docs-update.md
@@ -7,8 +7,8 @@ authors:

# Introducing new audio and vision documentation in 🤗 Datasets

-{blog_metadata}
-{authors}
+<!-- {blog_metadata} -->
+<!-- {authors} -->

Open and reproducible datasets are essential for advancing good machine learning. At the same time, datasets have grown tremendously in size as rocket fuel for large language models. In 2020, Hugging Face launched 🤗 Datasets, a library dedicated to:
