Welcome to the world of Code Intelligence!
Code Intelligence is an exciting field focused on automating code completion and generation. The ultimate objective is to develop intelligent models capable of generating code based on specific requirements. This repository serves as a comprehensive collection of the latest research and advancements in this domain.
- OctoPack: Instruction Tuning Code Large Language Models (paper, github, open-source)
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback (paper, closed-source)
- CodeGen2.5: Small, but mighty (github, blog, open-source)
- Phi-1: Textbooks Are All You Need (paper, 2023, closed-source)
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct (github, paper, 2023, open-source)
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation (github, paper, 2023, open-source)
- StarCoder: May the source be with you! (github, paper, 2023, open-source)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages (github, paper, 2023, open-source)
- Replit-code-v1-3b (github, twitter, 2023, open-source)
- GPT-4 (paper, 2023, closed-source)
- SantaCoder: don't reach for the stars! (github, paper, 2022, open-source)
- CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X (github, paper, 2022, open-source)
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis (github, paper, 2022, open-source)
- Codex: Evaluating Large Language Models Trained on Code (2021, closed-source)
- Tuning Models of Code with Compiler-Generated Reinforcement Learning Feedback (paper, 2023)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (github, paper, 2022)
- Demystifying GPT Self-Repair for Code Generation (paper, 2023)
- Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation (paper, 2023)
- Teaching Large Language Models to Self-Debug (paper, 2023)
- CodeT: Code Generation with Generated Tests (github, paper, 2022)
- APPS: Python Code Generation from programming-contest problems. (Execution-Based)
- HumanEval: Python Code Completion (Execution-Based)
- MBPP: Python Code Completion (Execution-Based)
- MultiPL-E: Multi-language Code Completion (Execution-Based)
- HumanEval-Plus: The same problems as HumanEval, but with many more test cases. (Execution-Based)
- HumanEvalPack: Extends HumanEval to bug fixing and code explanation. (Execution-Based)
- CodeXGLUE (Text-Code): A large benchmark for code generation. (BLEU-Based)
- Concode: Java Code Completion. (BLEU-Based)
- ClassEval: Python Class-level Code Completion. (Execution-Based)
- DS-1000: Python data science code completion and insertion. (Execution-Based)
- CoNaLa: Statement-level Python Code Generation. (BLEU-Based)
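The execution-based benchmarks above typically report pass@k. A minimal sketch of the unbiased pass@k estimator introduced in the Codex paper (the function name here is illustrative):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper.

    n: total samples generated per problem
    c: samples that pass all of the problem's tests
    k: evaluation budget
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable running product
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))
```

For k = 1 this reduces to c / n, e.g. `pass_at_k(10, 3, 1)` is 0.3, matching the single-sample pass rates reported on HumanEval.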
- bigcode/Stack: The pre-training data of StarCoder.
- CodeAlpaca: 20K instruction-following examples generated by text-davinci-003.
- LeetCode-Solution-Python: Solutions and explanations for most LeetCode problems.
| Model | HumanEval Pass@1 |
| --- | --- |
| **w/o SFT** | |
| CodeGen-16B-Multi | 18.3 |
| CodeGen-16B-Mono | 29.3 |
| CodeGen2.5-7B-Multi | 28.4 |
| CodeGen2.5-7B-Mono | 33.4 |
| CodeGeeX-13B | 22.9 |
| Replit-code-v1-3B | 17.1 |
| LLaMA-13B | 15.8 |
| LLaMA-33B | 21.7 |
| LLaMA-65B | 23.7 |
| StarCoderBase-15B | 30.1 |
| StarCoder-15B | 33.6 |
| **w/ SFT** | |
| InstructCodeT5+ | 35.0 |
| CodeGen2.5-7B-instruct | 36.2 |
| OctoCoder-15B | 45.8 |
| WizardLM-30B 1.0 | 37.8 |
| WizardCoder-15B 1.0 | 57.3 |
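The HumanEval Pass@1 numbers above come from execution-based scoring: a generated completion counts only if it passes the problem's hidden tests. A minimal, unsandboxed sketch of such a check, assuming HumanEval's convention of a `check(candidate)` test function (real harnesses run this in an isolated process with timeouts and resource limits):

```python
def passes_tests(candidate_src: str, test_src: str, entry_point: str) -> bool:
    """Return True if the generated solution passes the benchmark's tests.

    WARNING: exec() runs untrusted model output; real evaluation harnesses
    sandbox this in a separate process with timeouts and resource limits.
    """
    env: dict = {}
    try:
        exec(candidate_src, env)        # define the candidate function
        exec(test_src, env)             # define check(candidate)
        env["check"](env[entry_point])  # raises AssertionError on failure
        return True
    except Exception:
        return False

# Toy problem illustrating the HumanEval test format
solution = "def add(a, b):\n    return a + b\n"
tests = "def check(candidate):\n    assert candidate(1, 2) == 3\n"
passes_tests(solution, tests, "add")  # True for this toy problem
```

A run of this check over n samples per problem yields the pass counts that the pass@k estimator aggregates into the scores in the table.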