A curated list of papers that may be of interest to Software Engineering students or professionals.
- Von Neumann's First Computer Program. Knuth (1970).
- The Education of a Computer. Hopper (1952).
- Recursive Programming. Dijkstra (1960).
- Go To Statement Considered Harmful. Dijkstra (1968).
- Program development by stepwise refinement. Wirth (1971).
- The paradigms of programming. Floyd (1979).
- Computing Machinery and Intelligence. Turing (1950).
- Some Moral and Technical Consequences of Automation. Wiener (1960).
- Steps towards Artificial Intelligence. Minsky (1960).
- ELIZA—a computer program for the study of natural language communication between man and machine. Weizenbaum (1966).
- A Theory of the Learnable. Valiant (1984).
- A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952).
- A Universal Algorithm for Sequential Data Compression. Ziv, Lempel (1977).
- On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Kruskal (1956).
- A Note on Two Problems in Connexion with Graphs. Dijkstra (1959).
- Reducibility among Combinatorial Problems. Karp (1972).
- Big Omicron and big Omega and big Theta. Knuth (1976).
- Amortized Computational Complexity. Tarjan (1985).
- The Ubiquitous B-Tree. Comer (1979).
- Space/Time Trade-offs in Hash Coding with Allowable Errors. Bloom (1970).
- Ordered hash tables. Amble, Knuth (1974).
- Making data structures persistent. Driscoll et al (1986).
- Engineering a Sort Function. Bentley, McIlroy (1993).
- Quicksort. Hoare (1962).
- Computer Programming as an Art. Knuth (1974).
- The Humble Programmer. Dijkstra (1972).
- The Emperor’s Old Clothes. Hoare (1981).
- Literate Programming. Knuth (1984).
- Programming as Theory Building. Naur (1985).
- Programming with Abstract Data Types. Liskov, Zilles (1974).
- A Design Methodology for Reliable Software Systems. Liskov (1972).
- A Theory of Type Polymorphism in Programming. Milner (1978).
- On understanding types, data abstraction, and polymorphism. Cardelli, Wegner (1985).
- SELF: The Power of Simplicity. Ungar, Smith (1991).
- Why Functional Programming Matters. Hughes (1990).
- Recursive Functions of Symbolic Expressions and Their Computation by Machine. McCarthy (1960).
- Can Programming Be Liberated from the von Neumann Style?. Backus (1978).
- The Art of the Interpreter. Steele, Sussman (1978).
- The Semantic Elegance of Applicative Languages. Turner (1981).
- QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. Claessen, Hughes (2000).
- On the Criteria To Be Used in Decomposing Systems into Modules. Parnas (1971).
- Information Distribution Aspects of Design Methodology. Parnas (1972).
- Designing Software for Ease of Extension and Contraction. Parnas (1979).
- The Modular Structure of Complex Systems. Parnas, Clements, Weiss (1984).
- Toward higher-level abstractions for software systems. Shaw (1990).
- Foundations for the Study of Software Architecture. Perry, Wolf (1992).
- Software Aging. Parnas (1994).
- Software Engineering: An Unconsummated Marriage. Parnas (1997).
- The Mythical Man-Month. Brooks (1975).
- How do committees invent?. Conway (1968).
- Managing the Development of Large Software Systems. Royce (1970).
- Lisp: Good news, bad news, how to win big. Gabriel (1991).
- The Cathedral and the Bazaar. Raymond (1998).
- No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987).
- Out of the Tar Pit. Moseley, Marks (2006).
- Communicating sequential processes. Hoare (1976).
- Solution of a Problem in Concurrent Programming Control. Dijkstra (1965).
- Monitors: An operating system structuring concept. Hoare (1974).
- On the Duality of Operating System Structures. Lauer, Needham (1978).
- Software Transactional Memory. Shavit, Touitou (1997).
- The UNIX Time-Sharing System. Ritchie, Thompson (1974).
- An Experimental Time-Sharing System. Corbató, Merwin Daggett, Daley (1962).
- The Structure of the "THE"-Multiprogramming System. Dijkstra (1968).
- Reflections on Trusting Trust. Thompson (1984).
- The Design and Implementation of a Log-Structured File System. Rosenblum, Ousterhout (1991).
- A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974).
- Ethernet: Distributed packet switching for local computer networks. Metcalfe, Boggs (1978).
- An algorithm for distributed computation of a Spanning Tree in an Extended LAN. Perlman (1985).
- New Directions in Cryptography. Diffie, Hellman (1976).
- A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Rivest, Shamir, Adleman (1978).
- How To Share A Secret. Shamir (1979).
- A Certified Digital Signature. Merkle (1979).
- K-Anonymity: A Model For Protecting Privacy. Sweeney (2002).
- Hints for Computer System Design. Lampson (1983).
- End-To-End Arguments in System Design. Saltzer, Reed, Clark (1984).
- A Note on Distributed Computing. Waldo et al (1994).
- Rules of Thumb in Data Engineering. Gray, Shenoy (1999).
- The Network is Reliable. Bailis, Kingsbury (2014).
- Thinking Methodically about Performance. Gregg (2012).
- Performance Anti-Patterns. Smaalders (2006).
- Thinking Clearly about Performance. Millsap (2010).
- A Relational Model of Data for Large Shared Data Banks. Codd (1970).
- Granularity of Locks and Degrees of Consistency in a Shared Data Base. Gray et al (1975).
- System R: Relational Approach to Database Management. Astrahan et al. (1976).
- Access Path Selection in a Relational Database Management System. Selinger et al (1979).
- The Transaction Concept: Virtues and Limitations. Gray (1981).
- The design of POSTGRES. Stonebraker, Rowe (1986).
- Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978).
- Self-stabilizing systems in spite of distributed control. Dijkstra (1974).
- The Byzantine Generals Problem. Lamport, Shostak, Pease (1982).
- Impossibility of Distributed Consensus with One Faulty Process. Fischer, Lynch, Paterson (1985).
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Schneider (1990).
- How to Build a Highly Available System Using Consensus. Lampson (1996).
- Paxos made simple. Lamport (2001).
- In Search of an Understandable Consensus Algorithm. Ongaro, Ousterhout (2014).
- CAP Twelve Years Later: How the "Rules" Have Changed. Brewer (2012).
- Epidemic Algorithms for Replicated Database Maintenance. Demers et al (1987).
- The Dangers of Replication. Gray et al (1996).
- Harvest, Yield, and Scalable Tolerant Systems. Fox, Brewer (1999).
- Building on Quicksand. Helland, Campbell (2009).
- Life Beyond Distributed Transactions: An apostate's opinion. Helland (2016).
- The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998).
- World-Wide Web: The Information Universe. Berners-Lee et al (1992).
- The PageRank Citation Ranking: Bringing Order to the Web. Page, Brin, Motwani (1999).
- Dynamo: Amazon’s Highly Available Key-value Store. DeCandia et al (2007).
- The Google File System. Ghemawat, Gobioff, Leung (2003).
- MapReduce: Simplified Data Processing on Large Clusters. Dean, Ghemawat (2004).
- Bigtable: A Distributed Storage System for Structured Data. Chang et al (2006).
- ZooKeeper: wait-free coordination for internet scale systems. Hunt et al (2010).
- Kafka: a Distributed Messaging System for Log Processing. Kreps, Narkhede, Rao (2011).
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. Verbitski et al (2017).
- On Designing and Deploying Internet Scale Services. Hamilton (2007).
- Ironies of automation. Bainbridge (1983).
- How Complex Systems Fail. Cook (2000).
- Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. Patterson et al (2002).
- Crash-Only Software. Candea, Fox (2003).
- Nines are Not Enough: Meaningful Metrics for Clouds. Mogul, Wilkes (2019).
- Bitcoin: A Peer-to-Peer Electronic Cash System. Nakamoto (2008).
- Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. Buterin (2014).
Show full list in chronological order
- Computing Machinery and Intelligence. Turing (1950)
- A Method for the Construction of Minimum-Redundancy Codes. Huffman (1952)
- The Education of a Computer. Hopper (1952)
- On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Kruskal (1956)
- A Note on Two Problems in Connexion with Graphs. Dijkstra (1959)
- Recursive Programming. Dijkstra (1960)
- Some Moral and Technical Consequences of Automation. Wiener (1960)
- Steps towards Artificial Intelligence. Minsky (1960)
- Recursive Functions of Symbolic Expressions and Their Computation by Machine. McCarthy (1960)
- Quicksort. Hoare (1962)
- An Experimental Time-Sharing System. Corbató, Merwin Daggett, Daley (1962)
- Solution of a Problem in Concurrent Programming Control. Dijkstra (1965)
- ELIZA—a computer program for the study of natural language communication between man and machine. Weizenbaum (1966)
- Go To Statement Considered Harmful. Dijkstra (1968)
- How do committees invent?. Conway (1968)
- The Structure of the "THE"-Multiprogramming System. Dijkstra (1968)
- Von Neumann's First Computer Program. Knuth (1970)
- A Relational Model of Data for Large Shared Data Banks. Codd (1970)
- Space/Time Trade-offs in Hash Coding with Allowable Errors. Bloom (1970)
- Managing the Development of Large Software Systems. Royce (1970)
- On the Criteria To Be Used in Decomposing Systems into Modules. Parnas (1971)
- Program development by stepwise refinement. Wirth (1971)
- Reducibility among Combinatorial Problems. Karp (1972)
- The Humble Programmer. Dijkstra (1972)
- A Design Methodology for Reliable Software Systems. Liskov (1972)
- Information Distribution Aspects of Design Methodology. Parnas (1972)
- Computer Programming as an Art. Knuth (1974)
- Programming with Abstract Data Types. Liskov, Zilles (1974)
- The UNIX Time-Sharing System. Ritchie, Thompson (1974)
- A Protocol for Packet Network Intercommunication. Cerf, Kahn (1974)
- Ordered hash tables. Amble, Knuth (1974)
- Monitors: An operating system structuring concept. Hoare (1974)
- Self-stabilizing systems in spite of distributed control. Dijkstra (1974)
- The Mythical Man-Month. Brooks (1975)
- Granularity of Locks and Degrees of Consistency in a Shared Data Base. Gray et al (1975)
- Communicating sequential processes. Hoare (1976)
- New Directions in Cryptography. Diffie, Hellman (1976)
- Big Omicron and big Omega and big Theta. Knuth (1976)
- System R: Relational Approach to Database Management. Astrahan et al. (1976)
- A Universal Algorithm for Sequential Data Compression. Ziv, Lempel (1977)
- Time, Clocks, and the Ordering of Events in a Distributed System. Lamport (1978)
- A Theory of Type Polymorphism in Programming. Milner (1978)
- Can Programming Be Liberated from the von Neumann Style?. Backus (1978)
- The Art of the Interpreter. Steele, Sussman (1978)
- On the Duality of Operating System Structures. Lauer, Needham (1978)
- Ethernet: Distributed packet switching for local computer networks. Metcalfe, Boggs (1978)
- A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Rivest, Shamir, Adleman (1978)
- The Ubiquitous B-Tree. Comer (1979)
- The paradigms of programming. Floyd (1979)
- Designing Software for Ease of Extension and Contraction. Parnas (1979)
- How To Share A Secret. Shamir (1979)
- A Certified Digital Signature. Merkle (1979)
- Access Path Selection in a Relational Database Management System. Selinger et al (1979)
- The Emperor’s Old Clothes. Hoare (1981)
- The Semantic Elegance of Applicative Languages. Turner (1981)
- The Transaction Concept: Virtues and Limitations. Gray (1981)
- The Byzantine Generals Problem. Lamport, Shostak, Pease (1982)
- Hints for Computer System Design. Lampson (1983)
- Ironies of automation. Bainbridge (1983)
- A Theory of the Learnable. Valiant (1984)
- Literate Programming. Knuth (1984)
- The Modular Structure of Complex Systems. Parnas, Clements, Weiss (1984)
- Reflections on Trusting Trust. Thompson (1984)
- End-To-End Arguments in System Design. Saltzer, Reed, Clark (1984)
- Amortized Computational Complexity. Tarjan (1985)
- Programming as Theory Building. Naur (1985)
- On understanding types, data abstraction, and polymorphism. Cardelli, Wegner (1985)
- An algorithm for distributed computation of a Spanning Tree in an Extended LAN. Perlman (1985)
- Impossibility of Distributed Consensus with One Faulty Process. Fischer, Lynch, Paterson (1985)
- Making data structures persistent. Driscoll et al (1986)
- The design of POSTGRES. Stonebraker, Rowe (1986)
- No Silver Bullet: Essence and Accidents of Software Engineering. Brooks (1987)
- Epidemic Algorithms for Replicated Database Maintenance. Demers et al (1987)
- Why Functional Programming Matters. Hughes (1990)
- Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Schneider (1990)
- Toward higher-level abstractions for software systems. Shaw (1990)
- SELF: The Power of Simplicity. Ungar, Smith (1991)
- Lisp: Good news, bad news, how to win big. Gabriel (1991)
- The Design and Implementation of a Log-Structured File System. Rosenblum, Ousterhout (1991)
- Foundations for the Study of Software Architecture. Perry, Wolf (1992)
- World-Wide Web: The Information Universe. Berners-Lee et al (1992)
- Engineering a Sort Function. Bentley, McIlroy (1993)
- Software Aging. Parnas (1994)
- A Note on Distributed Computing. Waldo et al (1994)
- How to Build a Highly Available System Using Consensus. Lampson (1996)
- The Dangers of Replication. Gray et al (1996)
- Software Engineering: An Unconsummated Marriage. Parnas (1997)
- Software Transactional Memory. Shavit, Touitou (1997)
- The anatomy of a large-scale hypertextual Web search engine. Brin, Page (1998)
- The Cathedral and the Bazaar. Raymond (1998)
- Rules of Thumb in Data Engineering. Gray, Shenoy (1999)
- Harvest, Yield, and Scalable Tolerant Systems. Fox, Brewer (1999)
- The PageRank Citation Ranking: Bringing Order to the Web. Page, Brin, Motwani (1999)
- QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. Claessen, Hughes (2000)
- How Complex Systems Fail. Cook (2000)
- Paxos made simple. Lamport (2001)
- K-Anonymity: A Model For Protecting Privacy. Sweeney (2002)
- Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. Patterson et al (2002)
- The Google File System. Ghemawat, Gobioff, Leung (2003)
- Crash-Only Software. Candea, Fox (2003)
- MapReduce: Simplified Data Processing on Large Clusters. Dean, Ghemawat (2004)
- Out of the Tar Pit. Moseley, Marks (2006)
- Performance Anti-Patterns. Smaalders (2006)
- Bigtable: A Distributed Storage System for Structured Data. Chang et al (2006)
- Dynamo: Amazon’s Highly Available Key-value Store. DeCandia et al (2007)
- On Designing and Deploying Internet Scale Services. Hamilton (2007)
- Bitcoin: A Peer-to-Peer Electronic Cash System. Nakamoto (2008)
- Building on Quicksand. Helland, Campbell (2009)
- Thinking Clearly about Performance. Millsap (2010)
- ZooKeeper: wait-free coordination for internet scale systems. Hunt et al (2010)
- Kafka: a Distributed Messaging System for Log Processing. Kreps, Narkhede, Rao (2011)
- Thinking Methodically about Performance. Gregg (2012)
- CAP Twelve Years Later: How the "Rules" Have Changed. Brewer (2012)
- The Network is Reliable. Bailis, Kingsbury (2014)
- In Search of an Understandable Consensus Algorithm. Ongaro, Ousterhout (2014)
- Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform. Buterin (2014)
- Life Beyond Distributed Transactions: An apostate's opinion. Helland (2016)
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. Verbitski et al (2017)
- Nines are Not Enough: Meaningful Metrics for Clouds. Mogul, Wilkes (2019)
This list was inspired by (and draws from) the Papers We Love project and the book Ideas That Created the Future by Harry R. Lewis. I also found the morning paper blog to be an extremely useful resource. For distributed systems, I used Distributed systems for fun and profit; for relational databases, I used the Red Book.
A few interesting resources about reading papers from Papers We Love and elsewhere:
- Should I read papers?
- How to Read an Academic Article
- How to Read a Paper. Keshav (2007).
- Efficient Reading of Papers in Science and Technology. Hanson (1999).
- On ICSE’s “Most Influential Papers”. Parnas (1995).
- The list should stay short. Let's say no more than 30 papers.
- The idea is not to include every interesting paper that I come across but rather to keep a representative list that's possible to read from start to finish with a similar level of effort as reading a technical book from cover to cover.
- I tried to include one paper per major topic and author. Along the way I found many noteworthy alternatives and related or follow-up papers that I wanted to keep track of, so I included them as sublist items (some of these sublists are currently longer than they should be).
- The papers shouldn't be too long. For the same reasons as the previous item, I try to avoid papers longer than 20 or 30 pages.
- They should be self-contained and readable enough to be approachable by the casual technical reader.
- They should be freely available online.
- Although historical relevance was taken into account, I omitted seminal papers in the cases where I found them hard to approach, when the main subject of the paper wasn't the thing that made them influential, etc.
- That being said, where possible I preferred the original paper on each subject over modern updates or summary papers.
- I tended to prefer topics that I can relate to my professional practice, typically papers that originated in industry or that describe innovations that later saw wide adoption.
- Similarly, I tended to skip more theoretical papers, those focusing on mathematical foundations for Computer Science, electronic aspects of hardware, etc.
- UI/UX and modern Machine Learning are missing because I'm not familiar enough with those areas to find relevant papers that aren't overly specific. Suggestions are welcome.
Disclaimer: I'm not a frequent paper reader, so I made this list as a sort of roadmap for myself. I haven't read all of the papers in the list yet; as I do, I may find that some don't meet the described criteria after all and remove them, or decide to add new ones.
And, yes, this repository is a way to procrastinate on the actual reading after I finished making the list.