My personal roadmap to become a great programmer.
Inspired on Sijin Joseph's Programmer Competency Matrix and John Washam's Coding Interview University.
Some things like communication and experience will be trained within the method: writing blogs and delibarated practices.
My study annotations will be found in my personal blog.
- Data structures
- Algorithms
- Advanced algorithms
- System architecture
- Computer networks
- Automated testing
- System Design, Scalability, Data Handling
- Problem decomposition
- Systems decomposition
- Database
- Security
You should have knowledge of advanced data structures like B-trees, binomial and fibonacci heaps, AVL/Red Black trees, Splay Trees, Skip Lists, tries etc.
- Arrays
- Linked Lists
- Stack
- Queue
- Hash table
- Trees - Notes & Background
- Binary search trees: BSTs
- Heap / Priority Queue / Binary Heap
- AVL trees
- Splay trees
- 2-3 search trees
- 2-3-4 Trees (aka 2-4 trees)
- B-Trees
- Red/black trees
- N-ary (K-ary, M-ary) trees
- Tries
- Note there are different kinds of tries. Some have prefixes, some don't, and some use string instead of bits to track the path.
- I read through code, but will not implement.
- Sedgewick - Tries (3 videos)
- Notes on Data Structures and Programming Techniques
- Short course videos:
- The Trie: A Neglected Data Structure
- TopCoder - Using Tries
- Stanford Lecture (real world use case) (video)
- MIT, Advanced Data Structures, Strings (can get pretty obscure about halfway through)
You should be able to recognize and code dynamic programming solutions, good knowledge of graph algorithms, good knowledge of numerical computation algorithms, able to identify NP problems etc.
- Binary search
- Bitwise operations
- Sorting Algorithm Stability
- Bubble Sort
- Analyzing Bubble Sort
- Insertion Sort
- Merge Sort
- Quicksort
- Selection Sort
- Heap sort
- Radix Sort
- Sorting in Linear Time (video)
- Graphs
- There are three basic ways to represent a graph in memory:
- objects and pointers
- matrix
- adjacency list
- Familiarize yourself with each representation and its pros & cons
- BFS and DFS - know their computational complexity, their tradeoffs, and how to implement them in real code
- Check for cycle (needed for topological sort, since we'll check for cycle before starting)
- Topological sort
- Count connected components in a graph
- List strongly connected components (Cliques)
- Check for bipartite graph
- There are three basic ways to represent a graph in memory:
- Dijkstra's algorithm
- A* algorithm
- Recursion
- When it is appropriate to use it
- How is tail recursion better than not?
- String searching & manipulations
- Compression
- Compressor Head videos
- (optional) Google Developers Live: GZIP is not enough!
- MP3
- MPEG
- Linear Programming
- Simplex
You should understands the entire programming stack, hardware (CPU + Memory + Cache + Interrupts + microcode), binary code, assembly, static and dynamic linking, compilation, interpretation, JIT compilation, garbage collection, heap, stack, memory addressing and others.
-
Compilers
-
Write Great Code: Volume 1: Understanding the Machine
- Numeric Representation
- Binary Arithmetic and Bit Operations
- Floating-Point Representation
- Character Representation
- Memory Organization and Access
- Composite Data Types and Memory Objects
- CPU Architecture
- Instruction Set Architecture
- Memory Architecture and Organization
-
Caches
-
Processes and Threads
- Computer Science 162 - Operating Systems (25 videos):
- for processes and threads see videos 1-11
- Operating Systems and System Programming (video)
- What Is The Difference Between A Process And A Thread?
- Covers:
- Processes, Threads, Concurrency issues
- difference between processes and threads
- processes
- threads
- locks
- mutexes
- semaphores
- monitors
- how they work
- deadlock
- livelock
- CPU activity, interrupts, context switching
- Modern concurrency constructs with multicore processors
- Paging, segmentation and virtual memory (video)
- Interrupts (video)
- Scheduling (video)
- Process resource needs (memory: code, static storage, stack, heap, and also file descriptors, i/o)
- Thread resource needs (shares above (minus stack) with other threads in the same process but each has its own pc, stack counter, registers, and stack)
- Forking is really copy on write (read-only) until the new process writes to memory, then it does a full copy.
- Context switching
- How context switching is initiated by the operating system and underlying hardware
- Processes, Threads, Concurrency issues
- threads in C++ (series - 10 videos)
- concurrency in Python (videos):
- Computer Science 162 - Operating Systems (25 videos):
-
Floating Point Numbers
-
Unicode
-
Endianness
- Big And Little Endian
- Big Endian Vs Little Endian (video)
- Big And Little Endian Inside/Out (video)
- Very technical talk for kernel devs. Don't worry if most is over your head.
- The first half is enough.
-
Scheduling
- in an OS, how it works
- can be gleaned from Operating System videos
-
Implement system routines
- understand what lies beneath the programming APIs you use
- can you implement them?
-
Unix command line tools
- Khan Academy
- UDP and TCP: Comparison of Transport Protocols
- TCP/IP and the OSI Model Explained!
- Packet Transmission across the Internet. Networking & TCP/IP tutorial.
- HTTP
- SSL and HTTPS
- SSL/TLS
- HTTP 2.0
- Video Series (21 videos)
- Subnetting Demystified - Part 5 CIDR Notation
- Sockets:
Knowledge of distributed VCS systems. Has tried out Bzr/Mercurial/Darcs/Git
Can setup a script to build the system and also documentation, installers, generate release notes and tag the code in source control
Understands and is able to setup automated functional, load/performance and UI tests
- How unit testing works
- What are mock objects
- What is integration testing
- What is dependency injection
- Agile Software Testing with James Bach (video)
- Open Lecture by James Bach on Software Testing (video)
- Steve Freeman - Test-Driven Development (that’s not what we meant) (video)
- TDD is dead. Long live testing.
- Is TDD dead? (video)
- Video series (152 videos) - not all are needed (video)
- Test-Driven Web Development with Python
- Dependency injection:
- How to write tests
- You can expect system design questions if you have 4+ years of experience.
- Scalability and System Design are very large topics with many topics and resources, since there is a lot to consider when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this.
- Considerations:
- scalability
- Distill large data sets to single values
- Transform one data set to another
- Handling obscenely large amounts of data
- system design
- features sets
- interfaces
- class hierarchies
- designing a system under certain constraints
- simplicity and robustness
- tradeoffs
- performance analysis and optimization
- scalability
- START HERE: The System Design Primer
- System Design from HiredInTech
- How Do I Prepare To Answer Design Questions In A Technical Inverview?
- 8 Things You Need to Know Before a System Design Interview
- Algorithm design
- Database Normalization - 1NF, 2NF, 3NF and 4NF (video)
- System Design Interview - There are a lot of resources in this one. Look through the articles and examples. I put some of them below.
- How to ace a systems design interview
- Numbers Everyone Should Know
- How long does it take to make a context switch?
- Transactions Across Datacenters (video)
- A plain English introduction to CAP Theorem
- Paxos Consensus algorithm:
- Consistent Hashing
- NoSQL Patterns
- Scalability:
- Great overview (video)
- Short series:
- Scalable Web Architecture and Distributed Systems
- Fallacies of Distributed Computing Explained
- Pragmatic Programming Techniques
- Jeff Dean - Building Software Systems At Google and Lessons Learned (video)
- Introduction to Architecting Systems for Scale
- Scaling mobile games to a global audience using App Engine and Cloud Datastore (video)
- How Google Does Planet-Scale Engineering for Planet-Scale Infra (video)
- The Importance of Algorithms
- Sharding
- Scale at Facebook (2009)
- Scale at Facebook (2012), "Building for a Billion Users" (video)
- Engineering for the Long Game - Astrid Atkinson Keynote(video)
- 7 Years Of YouTube Scalability Lessons In 30 Minutes
- How PayPal Scaled To Billions Of Transactions Daily Using Just 8VMs
- How to Remove Duplicates in Large Datasets
- A look inside Etsy's scale and engineering culture with Jon Cowie (video)
- What Led Amazon to its Own Microservices Architecture
- To Compress Or Not To Compress, That Was Uber's Question
- Asyncio Tarantool Queue, Get In The Queue
- When Should Approximate Query Processing Be Used?
- Google's Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture
- Spanner
- Egnyte Architecture: Lessons Learned In Building And Scaling A Multi Petabyte Distributed System
- Machine Learning Driven Programming: A New Programming For A New World
- The Image Optimization Technology That Serves Millions Of Requests Per Day
- A Patreon Architecture Short
- Tinder: How Does One Of The Largest Recommendation Engines Decide Who You'll See Next?
- Design Of A Modern Cache
- Live Video Streaming At Facebook Scale
- A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS
- How Does The Use Of Docker Effect Latency?
- Does AMP Counter An Existential Threat To Google?
- A 360 Degree View Of The Entire Netflix Stack
- Latency Is Everywhere And It Costs You Sales - How To Crush It
- Serverless (very long, just need the gist)
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies
- Cinchcast Architecture - Producing 1,500 Hours Of Audio Every Day
- Justin.Tv's Live Video Broadcasting Architecture
- Playfish's Social Gaming Architecture - 50 Million Monthly Users And Growing
- TripAdvisor Architecture - 40M Visitors, 200M Dynamic Page Views, 30TB Data
- PlentyOfFish Architecture
- Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day
- ESPN's Architecture At Scale - Operating At 100,000 Duh Nuh Nuhs Per Second
- See "Messaging, Serialization, and Queueing Systems" way below for info on some of the technologies that can glue services together
- Twitter:
- For even more, see "Mining Massive Datasets" video series in the Video Series section.
- Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
- review: The System Design Primer
- System Design from HiredInTech
- cheat sheet
- flow:
- Understand the problem and scope:
- define the use cases, with interviewer's help
- suggest additional features
- remove items that interviewer deems out of scope
- assume high availability is required, add as a use case
- Think about constraints:
- ask how many requests per month
- ask how many requests per second (they may volunteer it or make you do the math)
- estimate reads vs. writes percentage
- keep 80/20 rule in mind when estimating
- how much data written per second
- total storage required over 5 years
- how much data read per second
- Abstract design:
- layers (service, data, caching)
- infrastructure: load balancing, messaging
- rough overview of any key algorithm that drives the service
- consider bottlenecks and determine solutions
- Understand the problem and scope:
Use of appropriate data structures and algorithms and comes up with generic/object-oriented code that encapsulate aspects of the problem that are subject to change.
- Dynamic Programming
- NP, NP-Complete and Approximation Algorithms
- Know about the most famous classes of NP-complete problems, such as traveling salesman and the knapsack problem.
- Know what NP-complete means.
- Greedy Algs
Able to visualize and design complex systems with multiple product lines and integrations with external systems. Also should be able to design operations support systems like monitoring, reporting, fail overs etc.
- Design patterns
- Describe all the 24 patterns in the GOF book and a working knowledge in the patterns of POSA books
- Design principles
- Know the SOLID principles and have a good understanding of the component principles
- Methods
- XP
- Scrum
- Lean
- Kanban
- Waterfall
- Structured Analysis
- Structured Desing
- UML
- DFDs
- Structure charts
- Petri Nets
- State Transition Diagrams
- Decision tables
- Object oriented programming
Code organization at a physical level closely matches design and looking at file names and folder distribution provides insights into design
Physical layout of source tree matches logical hierarchy and organization. The directory names and organization provide insights into the design of the system.
Able to suggest better alternatives and flows to given requirements based on experience
Can do basic database administration, performance optimization, index optimization, write advanced select queries, able to replace cursor usage with relational sql, understands how data is stored internally, understands how indexes are stored internally, understands how databases can be mirrored, replicated etc. Understands how the two phase commit works.
- Understand the benefits of relational data, e.g. SQL.
- Learn about NoSQL databases, e.g. MongoDB.
- Understand which would be better in certain situations.
- Know how to connect a database with your chosen back-end language (e.g. Node.js + MongoDB).
- Understand the benefits of in-memory data stores like Redis or memcached.
- Web storage to store sessions, cookies, and cached data in the browser.
- Scaling databases, ACID, and ORM (all optional).
- Computer Security
-
Combinatorics (n choose k) & Probability
-
Information theory (videos)
- Khan Academy
- more about Markov processes:
- See more in MIT 6.050J Information and Entropy series below.
-
Parity & Hamming Code (videos)
- Intro
- Parity
- Hamming Code:
- Error Checking
-
Entropy
- make sure to watch information theory videos first
- Information Theory, Claude Shannon, Entropy, Redundancy, Data Compression & Bits (video)
To be categorized.
- Cryptography
- Computer Intelligence
- Reading all from end to end with full comprehension will likely take more time than you have. I recommend being selective on papers and their sections.
- Love classic papers?
- 1978: Communicating Sequential Processes
- 2003: The Google File System
- replaced by Colossus in 2012
- 2004: MapReduce: Simplified Data Processing on Large Clusters
- mostly replaced by Cloud Dataflow?
- 2006: Bigtable: A Distributed Storage System for Structured Data
- 2006: The Chubby Lock Service for Loosely-Coupled Distributed Systems
- 2007: Dynamo: Amazon’s Highly Available Key-value Store
- The Dynamo paper kicked off the NoSQL revolution
- 2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)
- 2010: Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- 2010: Dremel: Interactive Analysis of Web-Scale Datasets
- 2012: Google's Colossus
- paper not available
- 2012: AddressSanitizer: A Fast Address Sanity Checker:
- 2013: Spanner: Google’s Globally-Distributed Database:
- 2014: Machine Learning: The High-Interest Credit Card of Technical Debt
- 2015: Continuous Pipelines at Google
- 2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
- 2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- 2015: How Developers Search for Code: A Case Study
- 2016: Borg, Omega, and Kubernetes