In this repository, you'll find a curated selection of recent research papers, articles, and implementations from leading experts in the field of Code Intelligence.

ChiYeungLaw/Awsome-Code-Intelligence


Code Intelligence

💻 Welcome to the world of Code Intelligence!

Code Intelligence is an exciting field focused on automating code completion and generation. The ultimate objective is to develop intelligent models capable of generating code based on specific requirements. This repository serves as a comprehensive collection of the latest research and advancements in this domain.

🎆 Foundation Models for Code

  • OctoPack: Instruction Tuning Code Large Language Models (paper, github, open-source ⭕)
  • PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback (paper, closed-source ❌)
  • CodeGen2.5: Small, but mighty (github, blog, open-source ⭕)
  • Phi-1: Textbooks Are All You Need (paper, 2023, closed-source ❌)
  • 👑 WizardCoder: Empowering Code Large Language Models with Evol-Instruct (github, paper, 2023, open-source ⭕)
  • CodeT5+: Open Code Large Language Models for Code Understanding and Generation (github, paper, 2023, open-source ⭕)
  • StarCoder: May the source be with you! (github, paper, 2023, open-source ⭕)
  • CodeGen2: Lessons for Training LLMs on Programming and Natural Languages (github, paper, 2023, open-source ⭕)
  • Replit-code-v1-3b (github, twitter, 2023, open-source ⭕)
  • GPT-4 (paper, 2023, closed-source ❌)
  • SantaCoder: don't reach for the stars! (github, paper, 2022, open-source ⭕)
  • CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X (github, paper, 2022, open-source ⭕)
  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis (github, paper, 2022, open-source ⭕)
  • Codex: Evaluating Large Language Models Trained on Code (2021, closed-source ❌)

🔨 Training Methods for Code LLMs

  • Tuning Models of Code with Compiler-Generated Reinforcement Learning Feedback (paper, 2023)
  • CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (github, paper, 2022)

🔧 Prompt Engineering for Code LLMs

  • Demystifying GPT Self-Repair for Code Generation (paper, 2023)
  • Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation (paper, 2023)
  • Teaching Large Language Models to Self-Debug (paper, 2023)
  • CodeT: Code Generation with Generated Tests (github, paper, 2022)

📋 Benchmarks

Function-level Code Generation

  • APPS (Execution-Based)
  • HumanEval: Python Code Completion (Execution-Based)
  • MBPP: Python Code Completion (Execution-Based)
  • MultiPL-E: Multi-language Code Completion (Execution-Based)
  • HumanEval-Plus: Same problems as HumanEval, but with many more test cases. (Execution-Based)
  • HumanEvalPack: Extends HumanEval to bug fixing and code explanation. (Execution-Based)
  • CodeXGLUE (Text-Code): A Large Benchmark for Code Generation. (BLEU-Based)
  • Concode: Java Code Completion. (BLEU-Based)

Class-level Code Generation

  • ClassEval: Python Class-level Code Completion. (Execution-Based)

Statement-level Code Generation

  • DS-1000: Python data science code completion and insertion. (Execution-Based)
  • CoNaLA: Statement-level Python Code Generation. (BLEU-Based)
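The "Execution-Based" label above means a generated solution is scored by running it against test cases, rather than by text overlap with a reference as in BLEU-based scoring. A minimal sketch of that idea follows. The function name `passes_tests` and its parameters are hypothetical and not taken from any of the listed benchmarks, and the code runs the candidate without any isolation, whereas real evaluation harnesses typically sandbox it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate: str, tests: str, timeout: float = 10.0) -> bool:
    """Return True if `tests` (plain assert statements) succeed against `candidate`.

    Hypothetical helper for illustration only: the candidate source and the
    tests are written to one temporary script and run in a fresh interpreter.
    WARNING: this executes untrusted code without any sandboxing.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_test.py"
        script.write_text(candidate + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A solution that hangs counts as a failure.
            return False
        # A failed assertion or any other exception gives a non-zero exit code.
        return proc.returncode == 0
```

For example, calling `passes_tests` with a correct `add` function and the tests `assert add(2, 3) == 5` returns `True`, while a version that subtracts instead of adding returns `False`.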

📑 Code-Related Data

📈 Leaderboard on HumanEval for Open-Source Models

| Model | HumanEval Pass@1 (%) |
|---|---|
| 🎃 w/o SFT 🎃 | |
| CodeGen-16B-Multi | 18.3 |
| CodeGen-16B-Mono | 29.3 |
| CodeGen2.5-7B-Multi | 28.4 |
| CodeGen2.5-7B-Mono | 33.4 |
| CodeGeeX-13B | 22.9 |
| Replit-code-v1-3B | 17.1 |
| LLaMA-13B | 15.8 |
| LLaMA-33B | 21.7 |
| LLaMA-65B | 23.7 |
| StarCoderBase-15B | 30.1 |
| StarCoder-15B | 33.6 |
| 🎃 w/ SFT 🎃 | |
| InstructCodeT5+ | 35.0 |
| CodeGen2.5-7B-instruct | 36.2 |
| OctoCoder-15B | 45.8 |
| WizardLM-30B 1.0 | 37.8 |
| 👑 WizardCoder-15B 1.0 | 57.3 |
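
Pass@1 is the k = 1 case of the pass@k metric introduced with HumanEval in the Codex paper: the probability that at least one of k sampled solutions passes all tests for a problem. The sketch below uses that paper's unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. The function names and the `results` input format are illustrative assumptions. How each model in the table was actually sampled (for example, greedy decoding versus multiple samples) is not stated here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated, c: number that passed, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, as a percentage.

    `results` holds one (n, c) pair per problem (illustrative format).
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per problem (n = 1, k = 1), the estimate reduces to the fraction of problems whose one sample passes.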
