Papers and books to look at when starting AGI 📚

spacetrip-1004/AGI-Papers



🌟 AGI-Papers 🌟

LLM · NLP
Text2All · All2All
Multi-modal · Multi-task


Let's explore a wide range of the latest LLM-related papers. 🙇‍♂️🙇‍♀️ by Stargazers

AGI

Large language models (LLMs) have achieved remarkable progress in various natural language processing tasks with emergent abilities. However, they face inherent limitations, such as an inability to access up-to-date information, utilize external tools, or perform precise mathematical reasoning. In this paper, we introduce Chameleon, a plug-and-play compositional reasoning framework that augments LLMs to help address these challenges. Chameleon synthesizes programs to compose various tools, including LLM models, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. Built on top of an LLM as a natural language planner, Chameleon infers the appropriate sequence of tools to compose and execute in order to generate a final response. We showcase the adaptability and effectiveness of Chameleon on two tasks: ScienceQA and TabMWP. Notably, Chameleon with GPT-4 achieves an 86.54% accuracy on ScienceQA, significantly improving upon the best published few-shot model by 11.37%; using GPT-4 as the underlying LLM, Chameleon achieves a 17.8% increase over the state-of-the-art model, leading to a 98.78% overall accuracy on TabMWP. Further studies suggest that using GPT-4 as a planner exhibits more consistent and rational tool selection and is able to infer potential constraints given the instructions, compared to other LLMs like ChatGPT.
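The planner-then-execute idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tool names, the shared-context dictionary, and `toy_planner` (a fixed stand-in for the LLM planner) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]
Tool = Callable[[Context], Context]

def knowledge_retrieval(ctx: Context) -> Context:
    # Toy tool: pretend to look up background facts for the question.
    return {**ctx, "knowledge": f"facts relevant to '{ctx['question']}'"}

def solution_generator(ctx: Context) -> Context:
    # Toy tool: pretend to reason over whatever earlier tools produced.
    return {**ctx, "solution": f"reasoning based on {ctx.get('knowledge', 'no knowledge')}"}

def answer_generator(ctx: Context) -> Context:
    # Toy tool: pretend to extract a final answer from the solution.
    return {**ctx, "answer": f"answer extracted from: {ctx['solution']}"}

# Hypothetical tool inventory; a real system would register many more modules.
TOOLS: Dict[str, Tool] = {
    "knowledge_retrieval": knowledge_retrieval,
    "solution_generator": solution_generator,
    "answer_generator": answer_generator,
}

def toy_planner(question: str) -> List[str]:
    # Stand-in for the LLM planner: returns tool names in execution order.
    return ["knowledge_retrieval", "solution_generator", "answer_generator"]

def run_plan(question: str,
             planner: Callable[[str], List[str]] = toy_planner,
             tools: Dict[str, Tool] = TOOLS) -> Context:
    # Execute the planned tools in sequence, threading a shared context through them.
    ctx: Context = {"question": question}
    for name in planner(question):
        if name not in tools:
            raise KeyError(f"planner requested unknown tool: {name}")
        ctx = tools[name](ctx)
    return ctx
```

Swapping `toy_planner` for a function that prompts an LLM with the tool descriptions would give the "natural language planner" role described above; the executor loop stays the same.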

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
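As a rough sketch of the "store experiences, then retrieve them dynamically" part of this architecture, the snippet below keeps a list of natural-language memories and ranks them for a query. The scoring formula (keyword overlap plus exponential recency decay plus an importance weight) and every name here are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Memory:
    text: str
    timestamp: int     # simulation step at which the memory was recorded
    importance: float  # 0..1, assumed to be assigned by some external rater

@dataclass
class MemoryStream:
    memories: List[Memory] = field(default_factory=list)

    def observe(self, text: str, timestamp: int, importance: float = 0.5) -> None:
        # Append a new natural-language record of an experience.
        self.memories.append(Memory(text, timestamp, importance))

    def retrieve(self, query: str, now: int, k: int = 2, decay: float = 0.9) -> List[Memory]:
        # Rank memories by relevance + recency + importance and return the top k.
        query_words = set(query.lower().split())

        def score(m: Memory) -> float:
            words = set(m.text.lower().split())
            relevance = len(query_words & words) / max(len(query_words), 1)
            recency = decay ** (now - m.timestamp)
            return relevance + recency + m.importance

        return sorted(self.memories, key=score, reverse=True)[:k]

stream = MemoryStream()
stream.observe("an agent is planning a valentine's day party", timestamp=1, importance=0.9)
stream.observe("ate breakfast", timestamp=2, importance=0.1)
stream.observe("the party starts at 5pm at the cafe", timestamp=3, importance=0.8)
top = stream.retrieve("when is the party", now=4)
```

The retrieved memories would then be placed into the language model's prompt when the agent plans its next action; reflections could be stored in the same stream as ordinary memories.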

Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.
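The trial-and-error loop described here can be sketched as follows. This is a toy illustration under stated assumptions: `act` and `reflect` are hypothetical callables standing in for the agent's policy and its self-reflection step, and the success check is supplied by the stub rather than by a real environment.

```python
from typing import Callable, List, Tuple

def reflexion_loop(
    act: Callable[[List[str]], Tuple[bool, str]],
    reflect: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[bool, int, List[str]]:
    # Retry a task, carrying forward verbal reflections on each failed attempt.
    reflections: List[str] = []
    for trial in range(1, max_trials + 1):
        success, trace = act(reflections)
        if success:
            return True, trial, reflections
        # Store a lesson from the failed trace so the next attempt can use it.
        reflections.append(reflect(trace))
    return False, max_trials, reflections

def toy_act(reflections: List[str]) -> Tuple[bool, str]:
    # Stub agent: succeeds only once a past reflection suggests the drawer.
    if any("check the drawer" in r for r in reflections):
        return True, "opened the drawer and found the key"
    return False, "searched the shelf and found nothing"

def toy_reflect(trace: str) -> str:
    # Stub self-reflection: turn a failed trace into advice for the next trial.
    return f"Last attempt failed ({trace}); next time check the drawer."

ok, trials, notes = reflexion_loop(toy_act, toy_reflect)
```

No model weights are updated in this loop; the only thing that changes between trials is the list of reflections passed back to the agent, which matches the abstract's point about avoiding fine-tuning.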

Like people, LLMs do not always generate the best text for a given generation problem on their first try (e.g., summaries, answers, explanations). Just as people then refine their text, we introduce SELF-REFINE, a framework for similarly improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an output using an LLM, then allow the same model to provide multi-aspect feedback for its own output; finally, the same model refines its previously generated output given its own feedback. Unlike earlier work, our iterative refinement framework does not require supervised training data or reinforcement learning, and works with a single LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math reasoning, demonstrating that our approach outperforms direct generation. In all tasks, outputs generated with SELF-REFINE are preferred by humans and by automated metrics over those generated directly with GPT-3.5 and GPT-4, improving on average by absolute 20% across tasks.
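The generate, feedback, refine cycle can be written as a short loop. The sketch below is only an illustration: the three callables are hypothetical stand-ins for prompts to the same LLM, and the convention that `feedback` returns `None` when nothing is left to fix is an assumption of this example.

```python
from typing import Callable, Optional

def self_refine(
    task: str,
    generate: Callable[[str], str],
    feedback: Callable[[str, str], Optional[str]],
    refine: Callable[[str, str, str], str],
    max_iters: int = 4,
) -> str:
    # Produce an initial output, then alternate feedback and refinement.
    output = generate(task)
    for _ in range(max_iters):
        fb = feedback(task, output)
        if fb is None:  # assumed stop signal: no further feedback
            break
        output = refine(task, output, fb)
    return output

def toy_generate(task: str) -> str:
    return "the cat sat"

def toy_feedback(task: str, output: str) -> Optional[str]:
    # Stub critic: checks two surface properties, one at a time.
    if not output[0].isupper():
        return "capitalize"
    if not output.endswith("."):
        return "add period"
    return None

def toy_refine(task: str, output: str, fb: str) -> str:
    # Stub reviser: applies the single fix named in the feedback.
    if fb == "capitalize":
        return output[0].upper() + output[1:]
    if fb == "add period":
        return output + "."
    return output

final = self_refine("write a sentence", toy_generate, toy_feedback, toy_refine)
```

In a real setting all three roles would be played by one model with different prompts, which is why no supervised training data or reinforcement learning is needed.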

Solving complicated AI tasks with different domains and modalities is a key step toward advanced artificial intelligence. While there are abundant AI models available for different domains and modalities, they cannot handle complicated AI tasks. Considering large language models (LLMs) have exhibited exceptional ability in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks and language could be a generic interface to empower this. Based on this philosophy, we present HuggingGPT, a framework that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and abundant AI models in Hugging Face, HuggingGPT is able to cover numerous sophisticated AI tasks in different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards advanced artificial intelligence.
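The four stages named in this abstract (task planning, model selection, task execution, response summarization) can be sketched as a small pipeline. Everything below is a stand-in: the model IDs and descriptions are made up, and the planning and summarization functions are fixed stubs where HuggingGPT would call ChatGPT.

```python
from typing import Dict, List, Tuple

# Hypothetical model descriptions, keyed by made-up model IDs.
MODEL_CARDS: Dict[str, str] = {
    "toy/image-captioner": "image-to-text: describes the content of an image",
    "toy/translator-en-fr": "translation: translates English text to French",
}

def plan_tasks(request: str) -> List[str]:
    # Stage 1 (stub): an LLM would parse the request into ordered subtasks.
    return ["image-to-text", "translation"]

def select_model(task: str, cards: Dict[str, str]) -> str:
    # Stage 2: pick the first model whose description matches the task type.
    for model_id, description in cards.items():
        if description.startswith(task + ":"):
            return model_id
    raise LookupError(f"no model available for task: {task}")

def execute(model_id: str, payload: str) -> str:
    # Stage 3 (stub): a real system would run inference with the selected model.
    return f"<{model_id} output for {payload!r}>"

def summarize(request: str, results: List[Tuple[str, str, str]]) -> str:
    # Stage 4 (stub): an LLM would turn the execution results into a response.
    steps = "; ".join(f"{task} via {model_id}" for task, model_id, _ in results)
    return f"Request {request!r} handled with: {steps}"

def handle(request: str) -> str:
    results: List[Tuple[str, str, str]] = []
    payload = request
    for task in plan_tasks(request):
        model_id = select_model(task, MODEL_CARDS)
        payload = execute(model_id, payload)  # feed each output to the next subtask
        results.append((task, model_id, payload))
    return summarize(request, results)

response = handle("Describe photo.jpg in French")
```

The sketch chains subtasks linearly for brevity; a planner that emits dependencies between subtasks would need a small scheduler in place of the simple `for` loop.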

Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

Internal company information such as product data, logistics data, HR policies, and accounting standards must stay inside the company, and questions and answers about it must also remain confidential. With language models offered through external clouds, there is no technical means of controlling the risk that internal information leaks, so the only option is to install and run the language model in-house. LiOn is a lightweight large-scale language model that can be installed and used on-premises, offering an alternative that keeps company information secure while letting employees use it safely. Below is one example, in which LiOn provides counseling about a conflict with coworkers. Beyond this, LiOn can support employees' work 24/7 by offering a variety of solutions to the many situations that can arise within a company.

Read 2023

Paper to read

before 2023

MLLMArxivTalk

A study group on the latest MLLM research. Sessions are normally held in the afternoon or evening. We learn from a variety of materials, including papers, lectures, code, news, and blogs.

MLLM, LLM, NLG, Dialogue, Reinforcement learning, Distillation, Efficient, Sentence similarity, multiple tasks, multimodal, Stable diffusion, TTS, Text-To-Video, All-To-All, space, life, intelligence, ethics, regulation, law, aging, medicine, investment, development, infrastructure, design, management, ETC...

Top-tier talent, including C-level executives at promising startups, leading researchers in Korea and abroad, current students and graduates of top universities and graduate schools in Korea and abroad, eminent scholars, and professors, study the latest papers and lectures and carry out projects together.

By default, we meet every Wednesday at 7:30 PM. No preparation is required: up to 20 minutes of reading the paper, then up to 40 minutes of discussion. Each session covers 1 to 10 papers, lectures, or similar items; so far it has always been 3. Topics and papers are chosen freely. We plan to produce top-tier conference papers and projects.

Additional study sessions take place almost every day, including weekends. You are welcome to join only when the topic interests you or you are available, and to arrive or leave partway through. All rules are negotiable. In-person meetups are also planned. Participation is voluntary.

Study rules

  1. Using English only is not allowed. Korean is the main language; use English for technical terms.
  2. Study at least 2 papers per week; those who can, 10 or more.
  3. Read the paper on the spot for 3 to 20 minutes, then discuss for 5 to 30 minutes.
  4. Once the one-hour session ends, you may leave right away. Joining for 10 minutes or less whenever you like is also fine. Sessions run freely; two hours every day is possible too.
  5. Recognize that everyone here excels at something. Everyone is accomplished, so ask many questions and share information often.
  6. At a minimum, do what you committed to do. Saying you will do something and then not doing it burdens others.
  7. Sessions are recorded by default and shared internally.
  8. Do not keep information to yourself; share it so that everyone knows.
  9. If you leave the study group for personal reasons, post a farewell in the self-introduction section.
  10. Paste in good rules from other organizations.
  11. If you judge that it helps the team, ignore all of the rules above and act.
  12. To be added.

Basic knowledge

mathematics: Mathematics for Machine Learning
machine learning: Pattern Recognition and Machine Learning
Transformer: Getting Started with Google BERT
Hugging Face: Natural Language Processing with Transformers
