Consider a game of chess, where each of two players starts with 16 pieces, each deterministically assigned a position on a checkerboard. The initial state is always the same. As each player inputs their actions, moving and trading pieces, the system state mutates. A few moves into the game, there is a good chance we'll be facing a game state we haven't ever experienced before. Computer program state is a lot like a game of chess, except there's more nuance in the way of user input, and an infinitude of possible board positions and state permutations.
In the world of web development, a human decides to open a new tab in their favorite web browser and then googles for "cat in a pickle gifs". The browser allocates a new process through a system call to the operating system, which shifts some bits around on the physical hardware that lies inside the human's computer. Before the HTTP request hits the network, we need to hit DNS servers, engaging in the elaborate process of casting `google.com` into an IP address. The browser then checks whether there's a ServiceWorker installed, and, assuming there isn't one, the request finally takes the default route of querying Google's servers for "cat in a pickle gifs". Naturally, Google receives this request at one of the front-end edges of its public network, in charge of balancing the load and routing requests to healthy back-end services. The query goes through a variety of analyzers that attempt to break it down to its semantic roots, stripping it down to its essential keywords in an attempt to better match relevant results. As the search engine figures out the 10 most relevant results for "cat pickle gif" out of billions of pages in its index -- which was of course primed by a different system that's also part of the whole -- Google pulls down a highly targeted piece of relevant advertisement about cat gifs, matching what they believe is the demographic the human making the query belongs to, thanks to a sophisticated ad network. Google also figures out whether the user is authenticated through a session cookie in the HTTP headers, and the search results page starts being constructed and streamed to the human, who now appears impatient and fidgety. As the first bits of HTML begin streaming down the wire, the search engine produces its results and hands them back to the front-end servers, which include them in the HTML stream that's sent back to the human. The web browser has been working hard at this too, parsing the incomplete pieces of HTML that have been streaming down the wire as best it can, even daring to launch other, equally mind-boggling, requests for HTTP resources presumed to be JavaScript, CSS, font, and image files as the HTML continues to stream down the wire. As the first few chunks of HTML are converted into a DOM tree, the browser would finally be able to begin rendering bits and pieces of the page on the screen, if it weren't still waiting on those equally mind-boggling CSS and font requests. As the CSS stylesheets and fonts are transmitted, the browser begins modeling the CSSOM, getting a more complete picture of how to turn the HTML and CSS plain text chunks provided by Google's servers into a graphical representation that the human finds pleasant. Browser extensions get a chance to meddle with the content, removing the highly targeted piece of relevant advertisement about cat gifs before I even realize Google hoped I wouldn't block ads this time around. A few seconds have passed by since I first decided to search for "cat in a pickle gifs". Needless to say, thousands of others brought similarly inane requests to the same systems during this time.
Not only does this example demonstrate the marvelous machinery and infrastructure that fuels even our most flippant daily computing experiences, but it also illustrates how abundantly hopeless it is to make sense of a system as a whole, let alone its comprehensive state at any given point in time. After all, where do we draw the boundaries? Within the code we wrote? The code that powers our customers' computers? Their hardware? The code that powers our servers? Its hardware? The internet as a whole? The power grid?
Incidental state can occur when we have a piece of data that's used in several parts of an application, and which is derived from other pieces of data. When the original piece of data is updated, it wouldn't be hard to inadvertently leave the derived pieces of data in their current state, making them stale in comparison with the updated original pieces of data. As an example, consider a piece of user input in Markdown and the HTML representation derived from that piece of Markdown. If the piece of Markdown is updated but the previously compiled pieces of HTML are not, then different parts of the system might display different bits of HTML out of what was apparently the same single Markdown source.
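As a minimal sketch of that failure mode (the `compileMarkdown` function and the field names here are made up for illustration), consider a derived `descriptionHtml` field persisted next to its Markdown source, and an update path that only touches the source:

[source,javascript]
----
// stand-in for whatever Markdown compiler the application actually uses
const compileMarkdown = markdown => `<p>${ markdown }</p>`

const article = {
  description: 'Cats are _cute_',
  descriptionHtml: compileMarkdown('Cats are _cute_')
}

// an update path modifies the Markdown source...
article.description = 'Cats are _sneaky_'
// ...but forgets the derived field, so any view reading descriptionHtml
// still renders the old copy, out of sync with the Markdown source
----

Recomputing `descriptionHtml` from `description` whenever it's needed, or at least in the same code path that updates the source, keeps both representations in sync.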
When we persist derived state, we're putting the original and the derived data at risk of falling out of sync. This isn't just the case when dealing with persistence layers; it can occur in other scenarios as well. When dealing with caching layers, cached content may become stale when the underlying original piece of content is updated but we forget to invalidate the copies derived from it. Database denormalization is another common occurrence of this problem, whereby creating derived state can result in synchronization problems and stale byproducts of the original data.
This lack of synchronization is often observed in discussion forum software, where user profiles are denormalized into comment objects in an effort to save a database roundtrip. When users later update their profile, however, their old comments preserve a stale avatar, signature, or display name. To avoid this kind of issue, we should always consider recomputing derived state from its roots. Even though doing so won't always be possible, performant, or even practical, encouraging this kind of thinking across a development team will, if anything, increase awareness about the subtle intricacies of denormalized state.
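The following is a hedged sketch of both shapes, where the object fields and the in-memory `users` map are hypothetical stand-ins for whatever storage the forum actually uses. The denormalized comment carries copies of profile data that go stale, whereas the normalized comment stores only a reference and recomputes the derived bits on demand.

[source,javascript]
----
// denormalized: profile data is copied into the comment at write time,
// so it silently goes stale whenever the user updates their profile
const staleComment = {
  id: 17,
  text: 'Modules are neat!',
  authorName: 'Alice',
  authorAvatar: 'alice-2017.png'
}

// normalized alternative: persist only a reference to the root data
const users = new Map([
  [23, { name: 'Alice', avatar: 'alice-2019.png' }]
])
const comment = { id: 17, text: 'Modules are neat!', authorId: 23 }

function presentComment(comment) {
  const author = users.get(comment.authorId)
  // derived state is recomputed from its root on demand, so it can't go stale
  return { ...comment, authorName: author.name, authorAvatar: author.avatar }
}
----

The trade-off is the extra lookup the denormalization was meant to avoid, which is exactly the kind of cost we should weigh consciously.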
As long as we're aware of the risks of data denormalization, we can then indulge in it. A parallel could be drawn to performance optimization: attempting to optimize a program based on microbenchmarks, instead of being driven by actual data, will most likely result in wasted developer time. Furthermore, just like caches and other intermediate representations of data, performance optimization can lead to bugs and code that's ultimately harder to maintain, which is why neither should be embarked upon lightly, unless there's a business case where performance is hurting the bottom line.

==== 4.3.3 Containing State
State is inevitable. As we discussed in section 4.3.1, though, the full picture hardly affects our ability to maintain small parts of that state tree. In the local case -- each of the interrelated but ultimately separate pieces of code we work with in our day to day -- all that matters are the inputs we receive and the outputs we produce. That said, generating a large amount of output where we could instead emit a single piece of information is undesirable.
When all intermediate state is contained inside a component instead of being leaked to others, we reduce the friction involved in interacting with our component or function. The more we condense state into its smallest possible representation for output purposes, the better contained our functions will become. Incidentally, we're also making the interface easier to consume: since there's less state to draw from, there are fewer ways of consuming that state. This reduces the number of possible use cases, but by favoring composability over serving every possible need, we're making each piece of functionality, when evaluated on its own, simpler.
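As a small illustration (the `getMovieStats` function and its output shape are invented for this example), a helper can keep every intermediate value local and expose only the condensed summary it chooses to support:

[source,javascript]
----
function getMovieStats(movies) {
  // intermediate state: never leaves the function
  const profits = movies.map(movie => movie.gross - movie.budget)
  const total = profits.reduce((sum, profit) => sum + profit, 0)
  // condensed output: the only state consumers can rely on
  return {
    total,
    average: total / movies.length
  }
}

const stats = getMovieStats([
  { gross: 35, budget: 10 },
  { gross: 20, budget: 14 }
])
// => { total: 31, average: 15.5 }
----

Consumers can't couple themselves to `profits` or any other intermediate value, which leaves us free to change how the summary is computed without breaking them.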
One other case where we may incidentally increase complexity is whenever we modify the property values of an input. This type of operation should be made extremely explicit, so as not to cause confusion, and avoided where possible. If we think of functions as the equation between the inputs we receive and the outputs we produce, then side-effects are ill-advised. Mutating the input within the body of a function is one example of a side-effect, which can be a source of bugs and confusion, particularly due to the difficulty in tracking down the source of these mutations.
It is not uncommon to observe functions that modify an input parameter and then return that parameter. This is often the case with `Array#map` callbacks, where the developer wants to change a property or two on each object in a list, but also to preserve the original objects as the elements in the collection, as shown in the following example.

[source,javascript]
----
movies.map(movie => {
  movie.profit = movie.gross - movie.budget
  return movie
})
----
In these cases it might be best to avoid using `Array#map` altogether, using `Array#forEach` or `for..of` instead, as shown next.

[source,javascript]
----
for (const movie of movies) {
  movie.profit = movie.gross - movie.budget
}
----
Neither `Array#forEach` nor `for..of` allows for chaining, assuming we wanted to filter the `movies` by a criterion such as "profit is greater than $15M": they're plain loops that don't produce any output. This is a good problem to have, however, because it explicitly separates data mutations at the `movie` item level, where we're adding a `profit` property to each item in `movies`, from transformations at the `movies` level, where we want to produce an entirely new collection consisting only of successful movies.

[source,javascript]
----
for (const movie of movies) {
  movie.profit = movie.gross - movie.budget
}
const successfulMovies = movies.filter(
  movie => movie.profit > 15
)
----
Relying on immutability is an alternative that involves neither plain loops nor breakage-prone side-effects.

==== 4.3.4 Leveraging Immutability
The following example takes advantage of the object spread operator to copy every property of `movie` into a new object, and then adds a `profit` property to it. Here we're creating a new collection, made up of new `movie` objects.

[source,javascript]
----
const movieModels = movies.map(movie => ({
  ...movie,
  profit: movie.gross - movie.budget
}))
const successfulMovies = movieModels.filter(
  movie => movie.profit > 15
)
----
Thanks to us making fresh copies of the objects we're working with, we've preserved the `movies` collection intact. If we now assume that `movies` was an input to our function, we could say that modifying any movie in that collection would've made our function impure, since it'd have the side-effect of unexpectedly altering the input.
By introducing immutability, we've kept the function pure. That means its output depends only on its inputs, and that we don't create any side-effects such as changing the inputs themselves. This in turn guarantees that the function is idempotent: calling it repeatedly with the same input always produces the same result, given that the output depends solely on the inputs and there are no side-effects. In contrast, the idempotence property would've been brought into question if we had tainted the input by adding a `profit` field to every movie.
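To make the contrast concrete, here's a minimal sketch (the function names are made up for illustration): the first version mutates its input as a side-effect, while the second leaves the input untouched and derives a fresh object every time.

[source,javascript]
----
// impure: reaches into its input and mutates it, a side-effect that
// callers of the function may not expect
function addProfit(movie) {
  movie.profit = movie.gross - movie.budget
  return movie
}

// pure: the output depends only on the input, which is left untouched
function withProfit(movie) {
  return { ...movie, profit: movie.gross - movie.budget }
}

const movie = { gross: 35, budget: 10 }
withProfit(movie) // => { gross: 35, budget: 10, profit: 25 }
withProfit(movie) // => same result, and movie is still { gross: 35, budget: 10 }
addProfit(movie)  // => same shape, but now movie itself has been tainted
----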
Large amounts of intermediate state, or logic that permutes data back and forth between different shapes, may be a signal that we've picked poor representations of our data. When the right data structures are identified, we'll notice there's a lot less transformation, mapping, and looping involved in turning our inputs into the outputs we need to produce. In section 4.4 we'll dive deeper into data structures.

=== 4.4 Data Structures are King
.. a section on how data structures make or break an application, and why data-driven is better than state or logic driven