This is a guide to JavaScript & Node.js reliability from A to Z. It summarizes dozens of the best blog posts, books, and tools the market has to offer.
Advanced topics such as testing in production, mutation testing, property-based testing, and many other strategic and professional tools are covered. Reading every word of this guide is likely to take your testing skills well above average.
Start by understanding the testing practices that form the foundation of every application tier: frontend/UI, backend, CI, or maybe all of them?
- A JavaScript & Node.js consultant
- 📗 Testing Node.js & JavaScript From A To Z - My comprehensive online course with more than 10 hours of video, 14 test types and more than 40 best practices
- Follow me on Twitter
- 🇨🇳Chinese - courtesy of Yves yao
- 🇰🇷Korean - courtesy of Rain Byun
- 🇵🇱Polish - courtesy of Michal Biesiada
- 🇪🇸Spanish - courtesy of Miguel G. Sanguino
- Want to translate to your own language? Please open an issue 💜
Advice that inspires all the others (1 special bullet)
The foundation - structuring tests (12 bullets)
Writing backend and Microservices tests efficiently (8 bullets)
Writing tests for web UI including component and E2E tests (11 bullets)
Watching the watchman - measuring test quality (4 bullets)
Guidelines for CI in the JavaScript world (9 bullets)
✅ Do: Test code is not production code. Design it to be dead simple, short, abstraction-free, flat, delightful to work with, lean. One should look at a test and get the intent instantly.
Our minds are already full with the main production code; we have no headspace for additional complexity. If we try to squeeze yet another challenging piece of code into our poor brain, it will slow the team down, which works against the very reason we test. In practice, this is why teams abandon testing.
Tests are an opportunity for something else: a friendly, smiley assistant that is delightful to work with and delivers great value for a small investment. Science tells us that we have two brain systems: system 1 is used for effortless activities like driving a car on an empty road, and system 2 is meant for complex and conscious operations like solving a math equation. Design your tests for system 1: looking at test code should feel as easy as modifying an HTML document, not like solving 2X(17 × 24).
This can be achieved by selectively cherry-picking techniques, tools, and test targets that are cost-effective and provide great ROI. Test only as much as needed and strive to keep it nimble; sometimes it is even worth dropping some tests and trading reliability for agility and simplicity.
Most of the advice below is derived from this principle.
✅ Do: A test report should tell whether the current application revision satisfies the requirements, in terms that people who are not necessarily familiar with the code can follow: the tester, the DevOps engineer, and the future you two years from now. This is best achieved when tests speak at the requirements level and include 3 parts:
(1) What is being tested? For example, the ProductsService.addNewProduct method
(2) Under what circumstances and scenario? For example, no price is passed to the method
(3) What is the expected result? For example, the new product is not approved
❌ Otherwise: A deployment just failed, and a test named "Add product" failed. Does this tell you what exactly is malfunctioning?
👇 Note: Every bullet has code examples and sometimes an illustration. Click to expand.
✏ Code Examples
//1. unit under test
describe('Products Service', function() {
describe('Add new product', function() {
//2. scenario and 3. expectation
it('When no price is specified, then the product status is pending approval', ()=> {
const newProduct = new ProductService().add(...);
expect(newProduct.status).to.equal('pendingApproval');
});
});
});
© Credits & read-more
1. Roy Osherove - Naming standards for unit tests
✅ Do: Structure your tests with 3 well-separated sections: Arrange, Act & Assert (AAA). Following this structure guarantees that the reader spends no brain-CPU on understanding the test.
1st A - Arrange: All the setup code needed to bring the system to the scenario the test aims to simulate. This might include instantiating the unit under test, adding DB records, mocking/stubbing objects, and any other preparation code.
2nd A - Act: Execute the unit under test. Usually 1 line of code.
3rd A - Assert: Ensure that the received value satisfies the expectation. Usually 1 line of code.
❌ Otherwise: Not only do you spend hours understanding the main code, but what should have been the simplest part of the day (testing) stretches your brain.
✏ Code Examples
👏 Doing It Right Example: A test structured with the AAA pattern
describe("Customer classifier", () => {
  test("When customer spent more than 500$, should be classified as premium", () => {
    //Arrange
    const customerToClassify = { spent: 505, joined: new Date(), id: 1 };
    //Sinon stubs define a canned return value with returns()
    const DBStub = sinon.stub(dataAccess, "getCustomer").returns({ id: 1, classification: "regular" });
    //Act
    const receivedClassification = customerClassifier.classifyCustomer(customerToClassify);
    //Assert
    expect(receivedClassification).toMatch("premium");
  });
});
👎 Anti-Pattern Example: No separation, one bulk of code that is harder to interpret
test("Should be classified as premium", () => {
  const customerToClassify = { spent: 505, joined: new Date(), id: 1 };
  const DBStub = sinon.stub(dataAccess, "getCustomer").returns({ id: 1, classification: "regular" });
  const receivedClassification = customerClassifier.classifyCustomer(customerToClassify);
  expect(receivedClassification).toMatch("premium");
});
✅ Do: Coding your tests in a declarative style lets the reader get the intent instantly without spending a single brain-CPU cycle. When you write imperative code packed with conditional logic, the reader is forced to spend more brain-CPU cycles. In that case, code the expectation in a human-like language, declarative BDD style, using expect or should and not custom code. If Chai & Jest don't include the desired assertion and it is highly repeatable, consider extending Jest matchers (Jest) or writing a custom Chai plugin.
❌ Otherwise: The team will write fewer tests and mark the annoying ones with .skip()
✏ Code Examples
👎 Anti-Pattern Example: The reader must dig through long, imperative code to get the story behind the test
test("When asking for an admin, ensure only ordered admins in results", () => {
  //assuming we've added here two admins "admin1", "admin2" and "user1"
  const allAdmins = getUsers({ adminOnly: true });
  let admin1Found = false,
    admin2Found = false;
  allAdmins.forEach(aSingleUser => {
    if (aSingleUser === "user1") {
      assert.notEqual(aSingleUser, "user1", "A user was found and not admin");
    }
    if (aSingleUser === "admin1") {
      admin1Found = true;
    }
    if (aSingleUser === "admin2") {
      admin2Found = true;
    }
  });
  if (!admin1Found || !admin2Found) {
    throw new Error("Not all admins were returned");
  }
});
👏 Doing It Right Example: Skimming through the following declarative test is a breeze
it("When asking for an admin, ensure only ordered admins in results", () => {
  //assuming we've added here two admins
  const allAdmins = getUsers({ adminOnly: true });
  expect(allAdmins)
    .to.include.ordered.members(["admin1", "admin2"])
    .but.not.include.ordered.members(["user1"]);
});
✅ Do: Testing the internals brings huge overhead for almost nothing. If your code/API delivers the right results, should you really invest hours in testing HOW it works internally and then maintain these fragile tests? Whenever a public behavior is checked, the private implementation is also implicitly tested, and your tests will break only if there is a real problem (e.g. wrong output). This approach is also referred to as behavioral testing. On the other hand, if you test the internals (white-box approach), your focus shifts from planning the component outcome to nitty-gritty details, and your test might break because of minor code changes even though the results are fine. This significantly increases the maintenance burden.
❌ Otherwise: Your tests behave like the boy who cried wolf: they shout false positives (e.g. a test fails because a private variable name was changed). Unsurprisingly, people will soon start to ignore the CI notifications until some day a real bug gets ignored...
✏ Code Examples
👎 Anti-Pattern Example: A test case that tests the internals for no good reason
class ProductService {
  //this method is only used internally
  //Changing this name will make the tests fail
  calculateVATAdd(priceWithoutVAT) {
    return { finalPrice: priceWithoutVAT * 1.2 };
    //Changing the result format or key name above will make the tests fail
  }
  //public method
  getPrice(productId) {
    const desiredProduct = DB.getProduct(productId);
    const finalPrice = this.calculateVATAdd(desiredProduct.price).finalPrice;
    return finalPrice;
  }
}
it("White-box test: When the internal methods get 0 vat, it return 0 response", async () => {
  //There's no requirement to allow users to calculate the VAT, only show the final price. Nevertheless we falsely insist here to test the class internals
  expect(new ProductService().calculateVATAdd(0).finalPrice).to.equal(0);
});
✅ Do: Test doubles are a necessary evil because they are coupled to the application internals, yet some provide immense value (read here a reminder about test doubles: mocks vs stubs vs spies).
Before using test doubles, ask a very simple question: do I use it to test functionality that appears, or could appear, in the requirements? If not, it's a white-box testing smell.
For example, if you want to test that your app behaves reasonably when the payment service is down, you might stub the payment service and make it return 'No Response' to ensure that the unit under test returns the right value. This checks our application behavior/response/outcome under certain scenarios. You might also use a spy to assert that an email was sent when that service was down; this is again a behavioral check that is likely to appear in the requirements ("Send an email if the payment couldn't be saved"). On the flip side, if you mock the payment service and ensure that it was called with the right JavaScript types, then your test is focused on internal things that have nothing to do with the application functionality and are likely to change frequently.
❌ Otherwise: Any refactoring of code requires searching for all the mocks in the code and updating them accordingly. Tests become a burden rather than a helpful friend.
✏ Code Examples
👎 Anti-Pattern Example: Mocks focus on the internals
it("When a valid product is about to be deleted, ensure data access DAL was called once, with the right product and right config", async () => {
//Assume we already added a product
const dataAccessMock = sinon.mock(DAL);
//hmmm BAD: testing the internals is actually our main goal here, not just a side-effect
dataAccessMock
.expects("deleteProduct")
.once()
.withArgs(DBConfig, theProductWeJustAdded, true, false);
new ProductService().deletePrice(theProductWeJustAdded);
dataAccessMock.verify();
});
👏 Doing It Right Example: Spies are focused on testing the requirements, but as a side effect they unavoidably touch the internals
it("When a valid product is about to be deleted, ensure an email is sent", async () => {
//Assume we already added here a product
const spy = sinon.spy(Emailer.prototype, "sendEmail");
new ProductService().deletePrice(theProductWeJustAdded);
//hmmm OK: we deal with internals? Yes, but as a side effect of testing the requirements (sending an email)
expect(spy.calledOnce).to.be.true;
});
✅ Do: Production bugs are often revealed under some very specific and surprising input; the more realistic the test input is, the greater the chances of catching bugs early. Use dedicated libraries like Faker to generate pseudo-real data that resembles the variety and form of production data. For example, such libraries can generate realistic phone numbers, usernames, credit card numbers, company names, and even 'lorem ipsum' text. You may also create tests (on top of unit tests, not as a replacement) that randomize faker data to stretch your unit under test, or even import real data from your production environment. Want to take it to the next level? See the next bullet (property-based testing).
❌ Otherwise: Your development tests will falsely show green when you use synthetic input like "Foo", but production might turn red when a hacker passes in a nasty string like "@3e2ddsf . ## ’1 fdsfds. fds432 AAAA"
✏ Code Examples
const addProduct = (name, price) => {
const productNameRegexNoSpace = /^\S*$/; //no white-space allowed
if (!productNameRegexNoSpace.test(name)) return false; //this path never reached due to dull input
//some logic here
return true;
};
test("Wrong: When adding new product with valid properties, get successful confirmation", async () => {
//The string "Foo" which is used in all tests never triggers a false result
const addProductResult = addProduct("Foo", 5);
expect(addProductResult).toBe(true);
//False positive: the operation succeeded because we never tried a long
//product name that includes spaces
});
it("Better: When adding new valid product, get successful confirmation", async () => {
const addProductResult = addProduct(faker.commerce.productName(), faker.random.number());
//Generated random input: {'Sleek Cotton Computer', 85481}
expect(addProductResult).to.be.true;
//Test failed, the random input triggered some path we never planned for.
//We discovered a bug early!
});
✅ Do: Typically we choose a few input samples for each test. Even when the input format resembles real-world data (see the bullet 'Don't use foo'), we cover only a few input combinations (method('', true, 1), method("string", false, 0)). In production, however, an API that is called with 5 parameters can be invoked with thousands of different permutations, and one of them might bring our process down (see Fuzz Testing). What if you could write a single test that automatically sends 1000 permutations of different inputs and catches the ones for which our code fails to return the right response? Property-based testing is a technique that does exactly that: by sending many different input combinations to the unit under test, it increases the chance of finding a bug. For example, given a method addNewProduct(id, name, isDiscount), the supporting libraries will call this method with many combinations of (number, string, boolean) like (1, "iPhone", false), (2, "Galaxy", true). You can run property-based testing with your favorite test runner (Mocha, Jest, etc.) using libraries like js-verify or testcheck (much better documentation). Update: Nicolas Dubien suggests in the comments below to check out fast-check, which seems to offer some additional features and is also actively maintained.
❌ Otherwise: Unconsciously, you choose test inputs that cover only code paths that work well. Unfortunately, this decreases the efficiency of testing as a vehicle to expose bugs.
✏ Code Examples
import fc from "fast-check";
describe("Product service", () => {
describe("Adding new", () => {
//this will run 100 times with different random properties
it("Add new product with random yet valid properties, always successful", () =>
fc.assert(
fc.property(fc.integer(), fc.string(), (id, name) => {
expect(addNewProduct(id, name).status).toEqual("approved");
})
));
});
});
✅ Do: When there is a need for snapshot testing, use only short and focused snapshots (i.e. 3-7 lines) that are included as part of the test (Inline Snapshot) and not in external files. Following this guideline keeps your tests self-explanatory and less fragile.
On the other hand, 'classic snapshots' tutorials and tools encourage storing big files (e.g. component rendering markup, API JSON results) on some external medium and comparing the received result with the saved version on every test run. This can, for example, implicitly couple our test to 1000 lines with 3000 data values that the test writer never read or reasoned about. Why is this wrong? By doing so, there are 1000 reasons for your test to fail: a single changed line is enough to invalidate the snapshot, and this is likely to happen a lot. How often? For every space, comment, or minor CSS/HTML change. On top of that, the test name gives no clue about the failure, since it only checks that 1000 lines didn't change, and it encourages the test writer to accept as the desired truth a long document they could not inspect and verify. All of these are symptoms of an obscure and eager test that is not focused and aims to achieve too much.
It's worth noting that there are a few cases where long and external snapshots are acceptable: when asserting on schema and not on data (extracting values and focusing on fields) or when the received document rarely changes.
❌ Otherwise: A UI test fails. The code seems right and the screen renders perfect pixels, so what happened? Your snapshot test just found a difference between the original document and the currently received one: a single space character was added to the markup...
✏ Code Examples
👎 Anti-Pattern Example: Coupling our test to 2000 unseen lines of markup
it("TestJavaScript.com is rendered correctly", () => {
//Arrange
//Act
const receivedPage = renderer
.create(<DisplayPage page="http://www.testjavascript.com"> Test JavaScript </DisplayPage>)
.toJSON();
//Assert
expect(receivedPage).toMatchSnapshot();
//We now implicitly maintain a 2000 lines long document
//every additional line break or comment - will break this test
});
👏 Doing It Right Example: Expectations are visible and focused
it("When visiting TestJavaScript.com home page, a menu is displayed", () => {
//Arrange
//Act
const receivedPage = renderer
.create(<DisplayPage page="http://www.testjavascript.com"> Test JavaScript </DisplayPage>)
.toJSON();
//Assert
const menu = receivedPage.content.menu;
expect(menu).toMatchInlineSnapshot(`
<ul>
<li>Home</li>
<li> About </li>
<li> Contact </li>
</ul>
`);
});
✅ Do: Going by the golden rule (bullet 0), each test should add and act on its own set of DB rows to prevent coupling and make the test flow easy to reason about. In reality, this is often violated by testers who seed the DB with data before running the tests (also known as a 'test fixture') for the sake of performance. While performance is indeed a valid concern, it can be mitigated (see the "Component testing" bullet); test complexity, however, is a much more painful sorrow that should govern other considerations most of the time. Practically, make each test case explicitly add the DB records it needs and act only on those records. If test performance becomes a critical concern, a balanced compromise might be to seed data only for the tests that do not mutate data (e.g. queries).
❌ Otherwise: A few tests fail, a deployment is aborted, and our team is about to spend precious time. Do we have a bug? Let's investigate. Oh no, it seems that two tests were mutating the same seed data.
✏ Code Examples
👎 Anti-Pattern Example: Tests are not independent and rely on a global hook to feed global DB data
before(async () => {
//adding sites and admins data to our DB. Where is the data? outside. At some external json or migration framework
await DB.AddSeedDataFromJson('seed.json');
});
it("When updating site name, get successful confirmation", async () => {
//I know that site name "portal" exists - I saw it in the seed files
const siteToUpdate = await SiteService.getSiteByName("Portal");
const updateNameResult = await SiteService.changeName(siteToUpdate, "newName");
expect(updateNameResult).to.be.true;
});
it("When querying by site name, get the right site", async () => {
//I know that site name "portal" exists - I saw it in the seed files
const siteToCheck = await SiteService.getSiteByName("Portal");
expect(siteToCheck.name).to.be.equal("Portal"); //Failure! The previous test change the name :[
});
👏 Doing It Right Example: We can stay within the test, each test acts on its own set of data
it("When updating site name, get successful confirmation", async () => {
//test is adding a fresh new records and acting on the records only
const siteUnderTest = await SiteService.addSite({
name: "siteForUpdateTest"
});
const updateNameResult = await SiteService.changeName(siteUnderTest, "newName");
expect(updateNameResult).to.be.true;
});
✅ Do: When trying to assert that some input triggers an error, it might look right to use try-catch-finally and assert that the catch clause was entered. The result is an awkward and verbose test case (example below) that hides the simple test intent and the result expectations.
A more elegant alternative is the dedicated one-line Chai assertion: expect(method).to.throw (or in Jest: expect(method).toThrow()). It's absolutely mandatory to also ensure that the exception contains a property that tells the error type; otherwise, given just a generic error, the application won't be able to do much beyond showing a disappointing message to the user.
❌ Otherwise: It will be challenging to infer from the test reports (e.g. CI reports) what went wrong.
✏ Code Examples
👎 Anti-Pattern Example: A long test case that tries to assert the existence of an error with try-catch
it("When no product name, it throws error 400", async () => {
  let errorWeExpectFor = null;
  try {
    const result = await addNewProduct({});
  } catch (error) {
    expect(error.code).to.equal("InvalidInput");
    errorWeExpectFor = error;
  }
  expect(errorWeExpectFor).not.to.be.null;
  //if this assertion fails, the tests results/reports will only show
  //that some value is null, there won't be a word about a missing Exception
});
👏 Doing It Right Example: A human-readable expectation that can be understood easily, maybe even by QA or a technical PM
it("When no product name, it throws error 400", async () => {
await expect(addNewProduct({}))
.to.eventually.throw(AppError)
.with.property("code", "InvalidInput");
});
✅ Do: Different tests must run in different scenarios: quick smoke and IO-less tests should run when a developer saves or commits a file, full end-to-end tests usually run when a new pull request is submitted, etc. This can be achieved by tagging tests with keywords like #cold #api #sanity so you can grep with your testing harness and invoke the desired subset. For example, this is how you would invoke only the sanity test group with Mocha: mocha --grep 'sanity'
❌ Otherwise: Running all the tests, including tests that perform dozens of DB queries, any time a developer makes a small change can be extremely slow and keeps developers away from running tests
✏ Code Examples
👏 Doing It Right Example: Tagging tests as ‘#cold-test’ allows the test runner to execute only fast tests (Cold===quick tests that are doing no IO and can be executed frequently even as the developer is typing)
//this test is fast (no DB) and we're tagging it correspondingly
//now the user/CI can run it frequently
describe("Order service", function() {
describe("Add new order #cold-test #sanity", function() {
test("Scenario - no currency was supplied. Expectation - Use the default currency #sanity", function() {
//code logic here
});
});
});
✅ Do: Apply some structure to your test suite so an occasional visitor could easily understand the requirements (tests are the best documentation) and the various scenarios that are being tested. A common method for this is by placing at least 2 'describe' blocks above your tests: the 1st is for the name of the unit under test and the 2nd for additional level of categorization like the scenario or custom categories (see code examples and print screen below). Doing so will also greatly improve the test reports: The reader will easily infer the tests categories, delve into the desired section and correlate failing tests. In addition, it will get much easier for a developer to navigate through the code of a suite with many tests. There are multiple alternative structures for test suite that you may consider like given-when-then and RITE
❌ Otherwise: When looking at a report with a flat and long list of tests, the reader has to skim through long texts to work out the major scenarios and see what the failing tests have in common. Consider the following case: when 7/100 tests fail, looking at a flat list demands reading the text of each failing test to see how they relate to each other. In a hierarchical report, however, all of them could be under the same flow or category, and the reader will quickly infer what, or at least where, the root failure cause is
✏ Code Examples
👏 Doing It Right Example: Structuring suite with the name of unit under test and scenarios will lead to the convenient report that is shown below
// Unit under test
describe("Transfer service", () => {
//Scenario
describe("When no credit", () => {
//Expectation
test("Then the response status should decline", () => {});
//Expectation
test("Then it should send email to admin", () => {});
});
});
👎 Anti-pattern Example: A flat list of tests will make it harder for the reader to identify the user stories and correlate failing tests
test("Then the response status should decline", () => {});
test("Then it should send email", () => {});
test("Then there should not be a new transfer record", () => {});
✅ Do: This post is focused on testing advice that is related to, or at least can be exemplified with, Node.js. This bullet, however, groups a few well-known tips that are not Node-related
Learn and practice TDD principles — they are extremely valuable for many but don’t get intimidated if they don’t fit your style, you’re not the only one. Consider writing the tests before the code in a red-green-refactor style, ensure each test checks exactly one thing, when you find a bug — before fixing write a test that will detect this bug in the future, let each test fail at least once before turning green, start a module by writing a quick and simplistic code that satisfies the test - then refactor gradually and take it to a production grade level, avoid any dependency on the environment (paths, OS, etc)
❌ Otherwise: You‘ll miss pearls of wisdom that were collected for decades
✅ Do: The testing pyramid, though more than 10 years old, is a great and relevant model that suggests three testing types and influences most developers’ testing strategy. At the same time, more than a handful of shiny new testing techniques have emerged and are hiding in the shadows of the testing pyramid. Given all the dramatic changes that we’ve seen in the last 10 years (Microservices, cloud, serverless), is it even possible that one quite-old model will suit all types of applications? Shouldn’t the testing world consider welcoming new testing techniques?
Don’t get me wrong, in 2019 the testing pyramid, TDD and unit tests are still powerful techniques and are probably the best match for many applications. But like any other model, despite its usefulness, it must be wrong sometimes. For example, consider an IoT application that ingests many events into a message bus like Kafka/RabbitMQ, which then flow into some data warehouse and are eventually queried by some analytics UI. Should we really spend 50% of our testing budget on writing unit tests for an application that is integration-centric and has almost no logic? As the diversity of application types increases (bots, crypto, Alexa skills), so do the chances of finding scenarios where the testing pyramid is not the best match.
It’s time to enrich your testing portfolio and become familiar with more testing types (the next bullets suggest a few ideas), keep models like the testing pyramid in mind but also match testing types to the real-world problems you’re facing (‘Hey, our API is broken, let’s write consumer-driven contract testing!’), and diversify your tests like an investor who builds a portfolio based on risk analysis: assess where problems might arise and match prevention measures to mitigate those potential risks
A word of caution: the TDD argument in the software world takes a typical false-dichotomy face, some preach to use it everywhere, others think it’s the devil. Everyone who speaks in absolutes is wrong :]
❌ Otherwise: You’re going to miss some tools with amazing ROI, some like Fuzz, lint, and mutation can provide value in 10 minutes
✏ Code Examples
👏 Doing It Right Example: Cindy Sridharan suggests a rich testing portfolio in her amazing post ‘Testing Microservices, the sane way’
✅ Do: Each unit test covers a tiny portion of the application and it’s expensive to cover the whole, whereas end-to-end testing easily covers a lot of ground but is flaky and slower. Why not apply a balanced approach and write tests that are bigger than unit tests but smaller than end-to-end tests? Component testing is the unsung hero of the testing world. It provides the best of both worlds: reasonable performance and the possibility to apply TDD patterns, plus realistic and great coverage.
Component tests focus on the Microservice ‘unit’, they work against the API, don’t mock anything which belongs to the Microservice itself (e.g. real DB, or at least the in-memory version of that DB) but stub anything that is external like calls to other Microservices. By doing so, we test what we deploy, approach the app from outwards to inwards and gain great confidence in a reasonable amount of time.
❌ Otherwise: You may spend long days on writing unit tests to find out that you got only 20% system coverage
✏ Code Examples
✅ Do: So your Microservice has multiple clients, and you run multiple versions of the service for compatibility reasons (keeping everyone happy). Then you change some field and ‘boom!’, some important client who relies on this field is angry. This is the Catch-22 of the integration world: it’s very challenging for the server side to consider all the clients’ expectations, while the clients can’t perform any testing because the server controls the release dates. Consumer-driven contracts and the PACT framework were born to formalize this process with a very disruptive approach: instead of the server defining its own test plan, the client defines the tests of the… server! PACT can record the client expectations and put them in a shared location, the “broker”, so the server can pull the expectations and run them on every build using the PACT library to detect broken contracts, that is, client expectations that are not met. By doing so, all the server-client API mismatches are caught early during build/CI, which might save you a great deal of frustration
❌ Otherwise: The alternatives are exhausting manual testing or deployment fear
✅ Do: Many avoid Middleware testing because middlewares represent a small portion of the system and require a live Express server. Both reasons are wrong: middlewares are small but affect all or most of the requests, and they can be tested easily as pure functions that receive {req,res} JS objects. To test a middleware function, just invoke it and spy (using Sinon, for example) on the interaction with the {req,res} objects to ensure the function performed the right action. The library node-mocks-http takes it even further and creates the {req,res} objects along with spying on their behavior. For example, it can assert whether the http status that was set on the res object matches the expectation (see the example below)
❌ Otherwise: A bug in Express middleware === a bug in all or most requests
✏ Code Examples
👏Doing It Right Example: Testing middleware in isolation without issuing network calls and waking-up the entire Express machine
//the middleware we want to test
const unitUnderTest = require("./middleware");
const httpMocks = require("node-mocks-http");
//Jest syntax, equivalent to describe() & it() in Mocha
test("A request without authentication header, should return http status 403", () => {
const request = httpMocks.createRequest({
method: "GET",
url: "/user/42",
headers: {
authentication: ""
}
});
const response = httpMocks.createResponse();
unitUnderTest(request, response);
expect(response.statusCode).toBe(403);
});
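The same idea also works without a mocking library. A minimal sketch, assuming a hypothetical `requireAuthentication` middleware and a hand-written fake `res` object (`createFakeResponse` is an illustrative helper, not part of Express or node-mocks-http) that records what the middleware did:

```javascript
// Hypothetical middleware under test: rejects requests without an authentication header
function requireAuthentication(req, res, next) {
  if (!req.headers.authentication) {
    res.status(403).end();
    return;
  }
  next();
}

// Hand-written fake that records the interaction, standing in for a spy library
function createFakeResponse() {
  const fake = {
    statusCode: 200,
    ended: false,
    status(code) {
      fake.statusCode = code;
      return fake; // allow chaining, like the Express response does
    },
    end() {
      fake.ended = true;
    },
  };
  return fake;
}

// Case 1: no authentication header, the middleware should answer 403 and never call next()
const rejectedResponse = createFakeResponse();
let nextCalledForRejected = false;
requireAuthentication({ headers: { authentication: "" } }, rejectedResponse, () => {
  nextCalledForRejected = true;
});

// Case 2: header present, the middleware should hand over to next()
const acceptedResponse = createFakeResponse();
let nextCalledForAccepted = false;
requireAuthentication({ headers: { authentication: "some-token" } }, acceptedResponse, () => {
  nextCalledForAccepted = true;
});

console.log(rejectedResponse.statusCode, nextCalledForRejected);
console.log(acceptedResponse.statusCode, nextCalledForAccepted);
```

No server is started and no network call is issued; the middleware is exercised as a plain function.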
✅ Do: Static analysis tools give you objective ways to improve code quality and keep your code maintainable. You can add static analysis tools to your CI build so that it aborts when they find code smells. Their main selling points over plain linting are the ability to inspect quality in the context of multiple files (e.g. detect duplications), perform advanced analysis (e.g. code complexity) and follow the history and progress of code issues. Two examples of tools you can use are SonarQube (4,900+ stars) and Code Climate (2,000+ stars)
Credit: Keith Holliday
❌ Otherwise: With poor code quality, bugs and performance will always be an issue that no shiny new library or state of the art features can fix
✏ Code Examples
✅ Do: Weirdly, most software testing is about logic & data only, but some of the worst things that happen (and are really hard to mitigate) are infrastructural issues. For example, did you ever test what happens when your process memory is overloaded, or when the server/process dies, or whether your monitoring system notices when the API becomes 50% slower? To test and mitigate these types of bad things, chaos engineering was born at Netflix. It aims to provide awareness, frameworks and tools for testing our app's resiliency to chaotic issues. For example, one of its famous tools, the chaos monkey, randomly kills servers to ensure that our service can still serve users and does not rely on a single server (there is also a Kubernetes version, kube-monkey, that kills pods). All these tools work on the hosting/platform level, but what if you wish to test and generate pure Node chaos, like checking how your Node process copes with uncaught errors, unhandled promise rejections, v8 memory overloaded with the max allowed of 1.7GB, or whether your UX remains satisfactory when the event loop gets blocked often? To address this I’ve written node-chaos (alpha), which provides all sorts of Node-related chaotic acts
❌ Otherwise: No escape here, Murphy’s law will hit your production without mercy
✏ Code Examples
✅ Do: Going by the golden rule (bullet 0), each test should add and act on its own set of DB rows to prevent coupling and make the test flow easy to reason about. In reality, this is often violated by testers who seed the DB with data before running the tests (also known as ‘test fixture’) for the sake of performance improvement. While performance is indeed a valid concern, it can be mitigated (see “Component testing” bullet); test complexity, however, is a much more painful sorrow that should govern other considerations most of the time. Practically, make each test case explicitly add the DB records it needs and act only on those records. If performance becomes a critical concern, a balanced compromise might come in the form of seeding only the suite of tests that do not mutate data (e.g. queries)
❌ Otherwise: A few tests fail, a deployment is aborted, and our team is now going to spend precious time. Do we have a bug? Let’s investigate. Oh no, it seems that two tests were mutating the same seed data
✏ Code Examples
👎 Anti-Pattern Example: tests are not independent and rely on some global hook to feed global DB data
before(async () => {
//adding sites and admins data to our DB. Where is the data? outside. At some external json or migration framework
await DB.AddSeedDataFromJson('seed.json');
});
it("When updating site name, get successful confirmation", async () => {
//I know that site name "portal" exists - I saw it in the seed files
const siteToUpdate = await SiteService.getSiteByName("Portal");
const updateNameResult = await SiteService.changeName(siteToUpdate, "newName");
expect(updateNameResult).to.be(true);
});
it("When querying by site name, get the right site", async () => {
//I know that site name "portal" exists - I saw it in the seed files
const siteToCheck = await SiteService.getSiteByName("Portal");
expect(siteToCheck.name).to.be.equal("Portal"); //Failure! The previous test changed the name :[
});
👏 Doing It Right Example: each test acts on its own set of data
it("When updating site name, get successful confirmation", async () => {
//test is adding fresh new records and acting on those records only
const siteUnderTest = await SiteService.addSite({
name: "siteForUpdateTest"
});
const updateNameResult = await SiteService.changeName(siteUnderTest, "newName");
expect(updateNameResult).to.be(true);
});
✅ Do: When focusing on testing component logic, UI details become a noise that should be extracted, so your tests can focus on pure data. Practically, extract the desired data from the markup in an abstract way that is not too coupled to the graphic implementation, assert only on pure data (vs HTML/CSS graphic details) and disable animations that slow down. You might get tempted to avoid rendering and test only the back part of the UI (e.g. services, actions, store) but this will result in fictional tests that don't resemble the reality and won't reveal cases where the right data doesn't even arrive in the UI
❌ Otherwise: The pure calculated data of your test might be ready in 10ms, but then the whole test will last 500ms (100 tests = 1 min) due to some fancy and irrelevant animation
✏ Code Examples
👏 Doing It Right Example: extracting the data from the UI and asserting on pure data
test("When users-list is flagged to show only VIP, should display only VIP members", () => {
// Arrange
const allUsers = [{ id: 1, name: "Yoni Goldberg", vip: false }, { id: 2, name: "John Doe", vip: true }];
// Act
const { getAllByTestId } = render(<UsersList users={allUsers} showOnlyVIP={true} />);
// Assert - Extract the data from the UI first
const allRenderedUsers = getAllByTestId("user").map(uiElement => uiElement.textContent);
const allRealVIPUsers = allUsers.filter(user => user.vip).map(user => user.name);
expect(allRenderedUsers).toEqual(allRealVIPUsers); //compare data with data, no UI here
});
👎 Anti-Pattern Example: the assertion mixes UI details and data
test("When flagging to show only VIP, should display only VIP members", () => {
// Arrange
const allUsers = [{ id: 1, name: "Yoni Goldberg", vip: false }, { id: 2, name: "John Doe", vip: true }];
// Act
const { getAllByTestId } = render(<UsersList users={allUsers} showOnlyVIP={true} />);
// Assert - Mix UI & data in assertion
expect(getAllByTestId("user")).toEqual('[<li data-testid="user">John Doe</li>]');
});
✅ Do: Query HTML elements based on attributes that are likely to survive graphic changes, like form labels, rather than on CSS selectors. If the designated element doesn't have such attributes, create a dedicated test attribute like 'test-id-submit-button'. Going this route not only ensures that your functional/logic tests never break because of look & feel changes, it also makes it clear to the entire team that this element and attribute are used by tests and shouldn't get removed
❌ Otherwise: You want to test the login functionality that spans many components, logic and services, everything is set up perfectly - stubs, spies, Ajax calls are isolated. All seems perfect. Then the test fails because the designer changed the div CSS class from 'thick-border' to 'thin-border'
✏ Code Examples
// the markup code (part of React component)
<h3>
<Badge pill className="fixed_badge" variant="dark">
{/* note the attribute data-testid */}
<span data-testid="errorsLabel">{value}</span>
</Badge>
</h3>
// this example is using react-testing-library
test("Whenever no data is passed to metric, show 0 as default", () => {
// Arrange
const metricValue = undefined;
// Act
const { getByTestId } = render(<DashboardMetric value={metricValue} />);
// Assert - getByTestId returns a DOM element, so read its textContent
expect(getByTestId("errorsLabel").textContent).toBe("0");
});
<!-- the markup code (part of React component) -->
<span id="metric" className="d-flex-column">{value}</span>
<!-- what if the designer changes the class? -->
// this example is using enzyme
test("Whenever no data is passed, error metric shows zero", () => {
// ...
expect(wrapper.find("[className='d-flex-column']").text()).toBe("0");
});
✅ Do: Whenever reasonably sized, test your component from the outside like your users do: fully render the UI, act on it and assert that the rendered UI behaves as expected. Avoid all sorts of mocking, partial and shallow rendering. That approach might result in untrapped bugs due to lack of detail, and it makes maintenance harder as the tests mess with the internals (see bullet 'Favour blackbox testing'). If one of the child components is significantly slowing down the tests (e.g. animation) or complicating the setup, consider explicitly replacing it with a fake
With all that said, a word of caution is in order: this technique works for small/medium components that pack a reasonable size of child components. Fully rendering a component with too many children will make it hard to reason about test failures (root cause analysis) and might get too slow. In such cases, write only a few tests against that fat parent component and more tests against its children
❌ Otherwise: When poking into a component's internals by invoking its private methods and checking its inner state, you would have to refactor all the tests whenever you refactor the component's implementation. Do you really have the capacity for this level of maintenance?
✏ Code Examples
class Calendar extends React.Component {
static defaultProps = { showFilters: false };
render() {
return (
<div>
A filters panel with a button to hide/show filters
<FiltersPanel showFilter={this.props.showFilters} title="Choose Filters" />
</div>
);
}
}
//Examples use React & Enzyme
test("Realistic approach: When clicked to show filters, filters are displayed", () => {
// Arrange
const wrapper = mount(<Calendar showFilters={false} />);
// Act
wrapper.find("button").simulate("click");
// Assert
expect(wrapper.text().includes("Choose Filter")).toBe(true);
// This is how the user will approach this element: by text
});
test("Shallow/mocked approach: When clicked to show filters, filters are displayed", () => {
// Arrange
const wrapper = shallow(<Calendar showFilters={false} title="Choose Filter" />);
// Act
wrapper
.find("filtersPanel")
.instance()
.showFilters();
// Tap into the internals, bypass the UI and invoke a method. White-box approach
// Assert
expect(wrapper.find("Filter").props()).toEqual({ title: "Choose Filter" });
// what if we change the prop name or don't pass anything relevant?
});
✅ Do: In many cases, the completion time of the unit under test is just unknown (e.g. an animation suspends element appearance). In that case, avoid sleeping (e.g. setTimeout) and prefer the more deterministic methods that most platforms provide. Some libraries allow awaiting operations (e.g. Cypress cy.request('url')), others provide an API for waiting, like the @testing-library/dom method wait(expect(element)). Sometimes a more elegant way is to stub the slow resource, an API for example, so that once the response moment becomes deterministic the component can be explicitly re-rendered. When depending upon some external component that sleeps, it might be useful to hurry up the clock. Sleeping is a pattern to avoid because it forces your test to be either slow or risky (when waiting for too short a period). Whenever sleeping and polling are inevitable and there's no support from the testing framework, some npm libraries like wait-for-expect can help with a semi-deterministic solution
❌ Otherwise: When sleeping for a long time, tests will be an order of magnitude slower. When sleeping for a short time, tests will fail whenever the unit under test doesn't respond in a timely fashion. So it boils down to a trade-off between flakiness and bad performance
✏ Code Examples
// using Cypress
cy.get("#show-products").click(); // navigate
cy.wait("@products"); // wait for route to appear
// this line will get executed only when the route is ready
// @testing-library/dom
test("movie title appears", async () => {
// element is initially not present...
// wait for appearance
await wait(() => {
expect(getByText("the lion king")).toBeInTheDocument();
});
// wait for appearance and return the element
const movie = await waitForElement(() => getByText("the lion king"));
});
test("movie title appears", async () => {
// element is initially not present...
// custom wait logic (caution: simplistic, no timeout)
const interval = setInterval(() => {
const found = getByText("the lion king");
if (found) {
clearInterval(interval);
expect(getByText("the lion king")).toBeInTheDocument();
}
}, 100);
// wait for appearance and return the element
const movie = await waitForElement(() => getByText("the lion king"));
});
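When the framework offers no waiting API, the polling idea behind helpers such as wait-for-expect can be sketched roughly as below. This is an illustration only, not that library's implementation: `waitForExpectation`, its option names and the simulated `movieTitle` value are assumptions made for the example. Unlike the simplistic interval above, it gives up after a timeout and surfaces the last failure.

```javascript
// Retries the assertion until it stops throwing or the timeout elapses
function waitForExpectation(assertion, { timeoutMs = 1000, intervalMs = 20 } = {}) {
  const startedAt = Date.now();
  return new Promise((resolve, reject) => {
    const attempt = () => {
      try {
        resolve(assertion());
      } catch (error) {
        if (Date.now() - startedAt >= timeoutMs) {
          reject(error); // give up and surface the last assertion failure
          return;
        }
        setTimeout(attempt, intervalMs);
      }
    };
    attempt();
  });
}

// Simulated unit under test: the value shows up after an unknown delay
let movieTitle;
setTimeout(() => {
  movieTitle = "the lion king";
}, 50);

const titleAppeared = waitForExpectation(() => {
  if (movieTitle !== "the lion king") throw new Error("title not rendered yet");
  return movieTitle;
});

titleAppeared.then((title) => console.log(`found: ${title}`));
```

The test finishes as soon as the expectation holds, so it is neither as slow as a long sleep nor as flaky as a short one.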
✅ Do: Apply some active monitor that ensures the page load under a real network is optimized. This includes any UX concern like slow page load or an un-minified bundle. The inspection tools market is not short of options: basic tools like pingdom, AWS CloudWatch and gcp StackDriver can be easily configured to watch whether the server is alive and responds within a reasonable SLA. This only scratches the surface of what might go wrong, hence it's preferable to opt for tools that specialize in frontend (e.g. lighthouse, pagespeed) and perform richer analysis. The focus should be on symptoms, metrics that directly affect the UX, like page load time, meaningful paint and time until the page gets interactive (TTI). On top of that, one may also watch for technical causes like ensuring the content is compressed, time to first byte, optimized images, reasonable DOM size, SSL and many others. It's advisable to have these rich monitors during development, as part of the CI and, most importantly, 24x7 over the production servers/CDN
❌ Otherwise: It must be disappointing to realize that after such great care for crafting a UI, 100% functional tests passing and sophisticated bundling - the UX is horrible and slow due to CDN misconfiguration
✅ Do: When coding your mainstream tests (not E2E tests), avoid involving any resource that is beyond your responsibility and control, like a backend API, and use stubs instead (i.e. test doubles). Practically, instead of real network calls to APIs, use some test double library (like Sinon, Test doubles, etc) for stubbing the API response. The main benefit is preventing flakiness: testing or staging APIs are by definition not highly stable and from time to time will fail your tests although YOUR component behaves just fine (the production env was not meant for testing and it usually throttles requests). Doing this also allows simulating various API behaviors that should drive your component behavior, such as when no data was found or when the API throws an error. Last but not least, network calls will greatly slow down the tests
❌ Otherwise: The average test runs no longer than a few ms while a typical API call lasts 100ms or more, which makes each test ~20x slower
✏ Code Examples
// unit under test
import React, { useState, useEffect } from "react";
import axios from "axios";
export default function ProductsList() {
const [products, setProducts] = useState(false);
const fetchProducts = async () => {
// axios resolves with a response object, the payload sits under .data
const response = await axios.get("api/products");
setProducts(response.data);
};
useEffect(() => {
fetchProducts();
}, []);
return products ? <div>{products}</div> : <div data-testid="no-products-message">No products</div>;
}
// test
test("When no products exist, show the appropriate message", () => {
// Arrange
nock("api")
.get(`/products`)
.reply(404);
// Act
const { getByTestId } = render(<ProductsList />);
// Assert
expect(getByTestId("no-products-message")).toBeTruthy();
});
✅ Do: Although E2E (end-to-end) usually means UI-only testing with a real browser (See bullet 3.6), for others it means tests that stretch across the entire system including the real backend. The latter type of test is highly valuable as it covers integration bugs between frontend and backend that might happen due to a wrong understanding of the exchange schema. Such tests are also an efficient method to discover backend-to-backend integration issues (e.g. Microservice A sends the wrong message to Microservice B) and even to detect deployment failures, since there are no backend frameworks for E2E testing that are as friendly and mature as UI frameworks like Cypress and Puppeteer. The downside of such tests is the high cost of configuring an environment with so many components, and mostly their brittleness: given 50 microservices, if even one fails then the entire E2E just failed. For that reason, we should use this technique sparingly and probably have 1-10 of those and no more. That said, even a small number of E2E tests is likely to catch the type of issues they are targeted for: deployment & integration faults. It's advisable to run those over a production-like staging environment
❌ Otherwise: The UI team might invest much in testing its functionality only to realize very late that the payload returned by the backend (the data schema the UI has to work with) is very different than expected
✅ Do: In E2E tests that involve a real backend and rely on a valid user token for API calls, it doesn't pay off to isolate the test to a level where a user is created and logged in on every request. Instead, log in only once before the test execution starts (i.e. before-all hook), save the token in some local storage and reuse it across requests. This seems to violate one of the core testing principles: keep the test autonomous without resource coupling. While this is a valid worry, in E2E tests performance is a key concern and creating 1-3 API requests before starting each individual test might lead to horrible execution time. Reusing credentials doesn't mean the tests have to act on the same user records. If relying on user records (e.g. test user payments history), then make sure to generate those records as part of the test and avoid sharing their existence with other tests. Also remember that the backend can be faked: if your tests are focused on the frontend it might be better to isolate it and stub the backend API (see bullet 3.6).
❌ Otherwise: Given 200 test cases and assuming login=100ms = 20 seconds only for logging-in again and again
✏ Code Examples
let authenticationToken;
// happens before ALL tests run
before(() => {
cy.request('POST', 'http://localhost:3000/login', {
username: Cypress.env('username'),
password: Cypress.env('password'),
})
.its('body')
.then((responseFromLogin) => {
authenticationToken = responseFromLogin.token;
})
})
// happens before EACH test
beforeEach(() => {
cy.visit('/home', {
onBeforeLoad (win) {
win.localStorage.setItem('token', JSON.stringify(authenticationToken))
},
})
})
✅ Do: For production monitoring and development-time sanity checks, run a single E2E test that visits all/most of the site pages and ensures none of them breaks. This type of test brings a great return on investment as it's very easy to write and maintain, but it can detect any kind of failure including functional, network and deployment issues. Other styles of smoke and sanity checking are not as reliable and exhaustive: some ops teams just ping the home page (production), and some developers run many integration tests which don't discover packaging and browser issues. It goes without saying that the smoke test doesn't replace functional tests; it just aims to serve as a quick smoke detector
❌ Otherwise: Everything might seem perfect, all tests pass, production health-check is also positive but the Payment component had some packaging issue and only the /Payment route is not rendering
✏ Code Examples
it("When doing smoke testing over all pages, should load them all successfully", () => {
// exemplified using Cypress but can be implemented easily
// using any E2E suite
cy.visit("https://mysite.com/home");
cy.contains("Home");
cy.visit("https://mysite.com/Login");
cy.contains("Login");
cy.visit("https://mysite.com/About");
cy.contains("About");
});
✅ Do: Besides increasing app reliability, tests bring another attractive opportunity to the table: they can serve as live app documentation. Since tests inherently speak a less technical, product/UX language, with the right tools they can serve as a communication artifact that greatly aligns all the peers: developers and their customers. For example, some frameworks allow expressing the flow and expectations (i.e. the test plan) using a human-readable language so any stakeholder, including product managers, can read, approve and collaborate on the tests, which just became the live requirements document. This technique is also referred to as 'acceptance testing' as it allows the customer to define their acceptance criteria in plain language. This is BDD (behavior-driven development) in its purest form. One of the popular frameworks that enable this is Cucumber, which has a JavaScript flavor, see example below. Another similar yet different opportunity, StoryBook, allows exposing UI components as a graphic catalog where one can walk through the various states of each component (e.g. render a grid w/o filters, render that grid with multiple rows or with none, etc), see how it looks, and how to trigger that state. This can also appeal to product folks but mostly serves as live documentation for developers who consume those components.
❌ Otherwise: After investing top resources on testing, it's just a pity not to leverage this investment and win great value
✏ Code Examples
# this is how one can describe tests using cucumber: plain language that allows anyone to understand and collaborate
Feature: Twitter new tweet
I want to tweet something in Twitter
@focus
Scenario: Tweeting from the home page
Given I open Twitter home
When I click on "New tweet" button
And I type "Hello followers!" in the textbox
And I click on "Submit" button
Then I see message "Tweet saved"
✅ Do: Set up automated tools to capture UI screenshots when changes are presented and detect visual issues like content overlapping or breaking. This ensures that not only the right data is prepared but also that the user can conveniently see it. This technique is not widely adopted; our testing mindset leans toward functional tests, but it's the visuals that the user experiences, and with so many device types it's very easy to overlook some nasty UI bug. Some free tools can provide the basics: generate and save screenshots for the inspection of human eyes. While this approach might be sufficient for small apps, it's flawed like any other manual testing that demands human labor anytime something changes. On the other hand, it's quite challenging to detect UI issues automatically due to the lack of a clear definition. This is where the field of 'Visual Regression' chimes in and solves this puzzle by comparing the old UI with the latest changes and detecting differences. Some OSS/free tools can provide some of this functionality (e.g. wraith, PhantomCSS) but might demand significant setup time. The commercial line of tools (e.g. Applitools, Percy.io) takes it a step further by smoothing the installation and packing advanced features like a management UI, alerting, smart capturing that eliminates 'visual noise' (e.g. ads, animations) and even root cause analysis of the DOM/CSS changes that led to the issue
❌ Otherwise: How good is a content page that displays great content (100% tests passed) and loads instantly, but half of the content area is hidden?
✏ Code Examples
# Add as many domains as necessary. Key will act as a label
domains:
  english: "http://www.mysite.com"
# Type screen widths below, here are a couple of examples
screen_widths:
  - 600
  - 768
  - 1024
  - 1280
# Type page URL paths below, here are a couple of examples
paths:
  about:
    path: /about
    selector: '.about'
  subscribe:
    selector: '.subscribe'
    path: /subscribe
import * as todoPage from "../page-objects/todo-page";
describe("visual validation", () => {
before(() => todoPage.navigate());
beforeEach(() => cy.eyesOpen({ appName: "TAU TodoMVC" }));
afterEach(() => cy.eyesClose());
it("should look good", () => {
cy.eyesCheckWindow("empty todo list");
todoPage.addTodo("Clean room");
todoPage.addTodo("Learn javascript");
cy.eyesCheckWindow("two todos");
todoPage.toggleTodo(0);
cy.eyesCheckWindow("mark as completed");
});
});
✅ Do: The purpose of testing is to get enough confidence for moving fast; obviously, the more code is tested the more confident the team can be. Coverage is a measure of how many code lines (and branches, statements, etc) are being reached by the tests. So how much is enough? 10–30% is obviously too low to get any sense about the build correctness; on the other side, 100% is very expensive and might shift your focus from the critical paths to the exotic corners of the code. The long answer is that it depends on many factors like the type of application: if you’re building the next generation of Airbus A380 then 100% is a must, for a cartoon pictures website 50% might be too much. Although most testing enthusiasts claim that the right coverage threshold is contextual, most of them also mention the number 80% as a rule of thumb (Fowler: “in the upper 80s or 90s”) that presumably should satisfy most applications.
Implementation tips: You may want to configure your continuous integration (CI) to have a coverage threshold (Jest link) and stop a build that doesn’t meet this standard (it’s also possible to configure a threshold per component, see code example below). On top of this, consider detecting a build coverage decrease (when newly committed code has less coverage). This will push developers to raise or at least preserve the amount of tested code. All that said, coverage is only one measure, a quantitative one, that is not enough to tell the robustness of your testing. And it can also be fooled, as illustrated in the next bullets
❌ Otherwise: Confidence and numbers go hand in hand, without really knowing that you tested most of the system — there will also be some fear and fear will slow you down
✏ Code Examples
✅ Do: Some issues sneak just under the radar and are really hard to find using traditional tools. These are not really bugs but more surprising application behaviors that might have a severe impact. For example, often some code areas are never or rarely invoked: you thought that the ‘PricingCalculator’ class always sets the product price, but it turns out it is actually never invoked although we have 10000 products in the DB and many sales… Code coverage reports help you realize whether the application behaves the way you believe it does. Other than that, they can also highlight which types of code are not tested: being informed that 80% of the code is tested doesn’t tell whether the critical parts are covered. Generating reports is easy: just run your app in production or during testing with coverage tracking and then see colorful reports that highlight how frequently each code area is invoked. If you take the time to glance at this data, you might find some gotchas
❌ Otherwise: If you don’t know which parts of your code are left un-tested, you don’t know where the issues might come from
✏ Code Examples
Based on a real-world scenario where we tracked our application usage in QA and found interesting login patterns (Hint: the number of login failures is disproportionate, something is clearly wrong. Eventually it turned out that some frontend bug kept hitting the backend login API)
✅ Do: The traditional coverage metric often lies: it may show you 100% code coverage while none of your functions, not even one, returns the right response. How come? It simply measures which lines of code the tests visited, but it doesn’t check whether the tests actually tested anything, i.e. asserted for the right response. Like someone who’s traveling for business and showing his passport stamps: this doesn’t prove any work was done, only that he visited a few airports and hotels.
Mutation-based testing is here to help by measuring the amount of code that was actually TESTED not just VISITED. Stryker is a JavaScript library for mutation testing and the implementation is really neat:
(1) it intentionally changes the code and “plants bugs”. For example the code newOrder.price===0 becomes newOrder.price!=0. These “bugs” are called mutations
(2) it runs the tests, if all succeed then we have a problem — the tests didn’t serve their purpose of discovering bugs, the mutations are so-called survived. If the tests failed, then great, the mutations were killed.
Knowing that all or most of the mutations were killed gives much higher confidence than traditional coverage and the setup time is similar
❌ Otherwise: You’ll be fooled to believe that 85% coverage means your test will detect bugs in 85% of your code
✏ Code Examples
function addNewOrder(newOrder) {
logger.log(`Adding new order ${newOrder}`);
DB.save(newOrder);
Mailer.sendMail(newOrder.assignee, `A new order was placed ${newOrder}`);
return { approved: true };
}
it("Test addNewOrder, don't use such test names", () => {
addNewOrder({ assignee: "assignee@example.com", price: 120 });
}); //Triggers 100% code coverage, but it doesn't check anything
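The difference between a surviving and a killed mutation can be shown with a hand-made mutant. This is only a conceptual sketch, not how Stryker is invoked: `isFreeOrder`, its mutant and both mini "tests" are hypothetical, and a real tool generates and runs the mutants for you.

```javascript
// Original code and a mutant similar to what a mutation tool might plant
const isFreeOrder = (newOrder) => newOrder.price === 0;
const isFreeOrderMutant = (newOrder) => newOrder.price !== 0;

// A test that only visits the code: it never asserts, so it cannot fail
const weakTest = (unitUnderTest) => {
  unitUnderTest({ price: 120 });
  return true; // always "passes"
};

// A test that asserts on the returned value
const strongTest = (unitUnderTest) =>
  unitUnderTest({ price: 0 }) === true && unitUnderTest({ price: 120 }) === false;

// A mutation survives when the test still passes against the mutated code
const mutantSurvives = (test) => test(isFreeOrderMutant);

console.log(`weak test, mutant survived: ${mutantSurvives(weakTest)}`);
console.log(`strong test, mutant survived: ${mutantSurvives(strongTest)}`);
```

Both tests give the same line coverage on the original code, yet only the asserting one notices the planted bug.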
✅ Do: A set of ESLint plugins was built specifically for inspecting test code patterns and discovering issues. For example, eslint-plugin-mocha will warn when a test is written at the global level (not a child of a describe() statement) or when tests are skipped, which might lead to a false belief that all tests are passing. Similarly, eslint-plugin-jest can, for example, warn when a test has no assertions at all (not checking anything)
❌ Otherwise: Seeing 90% code coverage and 100% green tests will make your face wear a big smile only until you realize that many tests aren’t asserting for anything and many test suites were just skipped. Hopefully, you didn’t deploy anything based on this false observation
✏ Code Examples
describe("Too short description", () => {
const userToken = userService.getDefaultToken() // *error:no-setup-in-describe, use hooks (sparingly) instead
it("Some description", () => {});//* error: valid-test-description. Must include the word "Should" + at least 5 words
});
it.skip("Test name", () => {// *error:no-skipped-tests, error:no-global-tests. Put tests only under describe or suite
expect("somevalue"); // error:no-assert
});
it("Test name", () => { // *error:no-identical-title. Assign unique titles to tests
});
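A possible configuration that switches on some of the rules shown above, sketched as an `.eslintrc.js` override for test files. The file glob is an assumed naming convention, and the rule names are taken from the example above, so they may differ between eslint-plugin-mocha versions:

```javascript
// .eslintrc.js (sketch): apply test-specific rules only to test files
const eslintConfig = {
  overrides: [
    {
      files: ["**/*.test.js"], // assumed naming convention for test files
      plugins: ["mocha"],
      rules: {
        "mocha/no-global-tests": "error",
        "mocha/no-skipped-tests": "error",
        "mocha/no-setup-in-describe": "error",
        "mocha/no-identical-title": "error",
      },
    },
  ],
};

module.exports = eslintConfig;
```

Projects that use Jest can take the same approach with eslint-plugin-jest, for example its `jest/expect-expect` and `jest/no-disabled-tests` rules.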
✅ Do: Linters are a free lunch: with a 5 min setup you get for free an auto-pilot guarding your code and catching significant issues as you type. Gone are the days when linting was about cosmetics (no semi-colons!). Nowadays, linters can catch severe issues like errors that are not thrown correctly and lose information. On top of your basic set of rules (like ESLint standard or Airbnb style), consider including some specialized linters: eslint-plugin-chai-expect can discover tests without assertions, eslint-plugin-promise can discover promises with no resolve (your code will never continue), eslint-plugin-security can discover eager regex expressions that might get used for DOS attacks, and eslint-plugin-you-dont-need-lodash-underscore is capable of alarming when the code uses utility library methods that are part of the V8 core methods like Lodash._map(…)
❌ Otherwise: Consider a rainy day where your production keeps crashing but the logs don’t display the error stack trace. What happened? Your code mistakenly threw a non-error object and the stack trace was lost, a good reason for banging your head against a brick wall. A 5 min linter setup could detect this TYPO and save your day
✏ Code Examples
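The lost stack trace is easy to reproduce. A small sketch with hypothetical function names, showing what is caught when a string is thrown instead of an Error object (ESLint's core `no-throw-literal` rule flags the first function):

```javascript
// Throwing a string: the caught value is a plain string with no stack trace
function failWithString() {
  throw "Payment failed";
}

// Throwing an Error object: the stack trace travels with the error
function failWithError() {
  throw new Error("Payment failed");
}

// Helper that returns whatever the given function throws
function captureThrown(fn) {
  try {
    fn();
  } catch (thrown) {
    return thrown;
  }
  return undefined;
}

const thrownString = captureThrown(failWithString);
const thrownError = captureThrown(failWithError);

console.log(typeof thrownString.stack); // no stack on a string
console.log(typeof thrownError.stack); // the stack is available as a string
```

A logger that prints `error.stack` has nothing useful to show in the first case, which matches the crashing-production scenario above.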
✅ Do: Using a CI with shiny quality inspections like testing, linting, vulnerabilities check, etc? Help developers run this pipeline locally as well to solicit instant feedback and shorten the feedback loop. Why? An efficient testing process consists of many iterative loops: (1) try-outs -> (2) feedback -> (3) refactor. The faster the feedback is, the more improvement iterations a developer can perform per module and perfect the results. On the flip side, when the feedback is late to come, fewer improvement iterations can be packed into a single day, and the team might already have moved on to another topic/task/module and might not be up for refining that module.
Practically, some CI vendors (Example: CircleCI local CLI) allow running the pipeline locally. Some commercial tools like wallaby provide highly valuable testing insights while the developer is still prototyping (no affiliation). Alternatively, you may just add an npm script to package.json that runs all the quality commands (e.g. test, lint, vulnerabilities). Use tools like concurrently for parallelization and a non-zero exit code if one of the tools fails. Now the developer just needs to invoke one command, e.g. ‘npm run quality’, to get instant feedback. Consider also aborting a commit if the quality check failed using a githook (husky can help)
❌ Otherwise: When the quality results arrive the day after the code, testing doesn’t become a fluent part of development but rather an after-the-fact formal artifact
✏ Code Examples
👏 Doing It Right Example: npm scripts that perform code quality inspection, all are run in parallel on demand or when a developer is trying to push new code
"scripts": {
  "inspect:sanity-testing": "mocha **/**--test.js --grep \"sanity\"",
  "inspect:lint": "eslint .",
  "inspect:vulnerabilities": "npm audit",
  "inspect:license": "license-checker --failOn GPLv2",
  "inspect:complexity": "plato .",
  "inspect:all": "concurrently -c \"bgBlue.bold,bgMagenta.bold,yellow\" \"npm:inspect:sanity-testing\" \"npm:inspect:lint\" \"npm:inspect:vulnerabilities\" \"npm:inspect:license\""
},
"husky": {
  "hooks": {
    "pre-commit": "npm run inspect:all",
    "pre-push": "npm run inspect:all"
  }
}
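For teams that prefer not to add a dependency, the same idea (run every inspection in parallel and fail if any of them fails) can be sketched with Node's built-in child_process module. This is only an illustration, not how concurrently works internally; the function names are hypothetical, and the command strings in the usage comment assume matching scripts exist in package.json.

```javascript
const { spawn } = require("child_process");

// Runs one shell command and resolves with its exit code (never rejects)
function runCommand(command) {
  return new Promise((resolve) => {
    const child = spawn(command, { shell: true, stdio: "inherit" });
    child.on("error", () => resolve({ command, code: 1 }));
    child.on("close", (code) => resolve({ command, code: code === null ? 1 : code }));
  });
}

// Returns the commands that finished with a non-zero exit code
function failedCommands(results) {
  return results.filter((result) => result.code !== 0).map((result) => result.command);
}

// Starts all commands at once and reports which ones failed
async function runQualityChecks(commands) {
  const results = await Promise.all(commands.map(runCommand));
  return failedCommands(results);
}

// Usage sketch (assumes these npm scripts exist):
// runQualityChecks(["npm run inspect:lint", "npm run inspect:vulnerabilities"])
//   .then((failed) => { process.exitCode = failed.length === 0 ? 0 : 1; });
```

Setting process.exitCode rather than calling process.exit lets pending output flush while still signalling failure to a Git hook or CI step.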
✅ Do: End-to-end (e2e) testing is the main challenge of every CI pipeline: creating an identical ephemeral production mirror on the fly with all the related cloud services can be tedious and expensive. Finding the best compromise is your game: Docker-compose allows crafting an isolated dockerized environment with identical containers using a single plain text file, but the backing technology (e.g. networking, deployment model) is different from real-world production. You may combine it with ‘AWS Local’ to work with a stub of the real AWS services. If you went serverless, multiple frameworks like serverless and AWS SAM allow the local invocation of FaaS code.
The huge Kubernetes ecosystem is yet to formalize a standard, convenient tool for local and CI mirroring, though many new tools are launched frequently. One approach is running a ‘minimized Kubernetes’ using tools like Minikube and MicroK8s, which resemble the real thing but come with less overhead. Another approach is testing over a remote ‘real Kubernetes’: some CI providers (e.g. Codefresh) have native integration with Kubernetes environments and make it easy to run the CI pipeline over the real thing, while others allow custom scripting against a remote Kubernetes.
❌ Otherwise: Using different technologies for production and testing demands maintaining two deployment models and keeps the developers and the ops team separated
✏ Code Examples
👏 Example: a CI pipeline that generates a Kubernetes cluster on the fly (Credit: Dynamic-environments Kubernetes)
deploy:
  stage: deploy
  image: registry.gitlab.com/gitlab-examples/kubernetes-deploy
  script:
    - ./configureCluster.sh $KUBE_CA_PEM_FILE $KUBE_URL $KUBE_TOKEN
    - kubectl create ns $NAMESPACE
    - kubectl create secret -n $NAMESPACE docker-registry gitlab-registry --docker-server="$CI_REGISTRY" --docker-username="$CI_REGISTRY_USER" --docker-password="$CI_REGISTRY_PASSWORD" --docker-email="$GITLAB_USER_EMAIL"
    - mkdir .generated
    - echo "$CI_BUILD_REF_NAME-$CI_BUILD_REF"
    - sed -e "s/TAG/$CI_BUILD_REF_NAME-$CI_BUILD_REF/g" templates/deals.yaml | tee ".generated/deals.yaml"
    - kubectl apply --namespace $NAMESPACE -f .generated/deals.yaml
    - kubectl apply --namespace $NAMESPACE -f templates/my-sock-shop.yaml
  environment:
    name: test-for-ci
✅ Do: When done right, testing is your 24/7 friend providing almost instant feedback. In practice, executing 500 CPU-bound unit tests on a single thread can take too long. Luckily, modern test runners and CI platforms (like Jest, AVA and Mocha extensions) can parallelize the tests across multiple processes and achieve a significant improvement in feedback time. Some CI vendors also parallelize tests across containers (!), which shortens the feedback loop even further. Whether locally over multiple processes or over some cloud CLI using multiple machines, parallelizing demands keeping the tests autonomous, as each might run in a different process
❌ Otherwise: Getting test results an hour after pushing new code, when you are already coding the next feature, is a great recipe for making testing less relevant
✏ Code Examples
👏 Doing It Right Example: Mocha parallel & Jest easily outrun the traditional Mocha thanks to testing parallelization (Credit: JavaScript Test-Runners Benchmark)
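Keeping tests autonomous mostly means that no test relies on records or global state created by another test, since parallel workers give no ordering guarantees. A minimal sketch, assuming a hypothetical buildUniqueOrder test helper (not part of any test runner): each test builds its own uniquely identified data, so tests running in different processes cannot collide.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical test-data factory: every call returns a record with its own identifiers
function buildUniqueOrder(overrides = {}) {
  const uniqueSuffix = randomUUID();
  return {
    id: uniqueSuffix,
    customerEmail: `customer-${uniqueSuffix}@example.com`,
    total: 100,
    ...overrides,
  };
}

// Two tests (possibly in two different worker processes) each create their own order
const orderForTestA = buildUniqueOrder({ total: 250 });
const orderForTestB = buildUniqueOrder();

console.log(orderForTestA.id !== orderForTestB.id); // true
```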
✅ Do: Licensing and plagiarism issues are probably not your main concern right now, but why not tick this box as well in 10 minutes? A bunch of npm packages like license check and plagiarism check (commercial with a free plan) can easily be baked into your CI pipeline to inspect for sorrows like dependencies with restrictive licenses or code that was copy-pasted from Stack Overflow and apparently violates some copyrights
❌ Otherwise: Unintentionally, developers might use packages with inappropriate licenses or copy-paste commercial code and run into legal issues
✏ Code Examples
# install license-checker in your CI environment or also locally
npm install -g license-checker
# ask it to scan all licenses and fail with a non-zero exit code if it finds an unauthorized license. The CI system should catch this failure and stop the build
license-checker --summary --failOn BSD
✅ Do: Even the most reputable dependencies such as Express have known vulnerabilities. This can be easily tamed using community tools such as npm audit, or commercial tools like snyk (which also offers a free community version). Both can be invoked from your CI on every build
❌ Otherwise: Keeping your code clean from vulnerabilities without dedicated tools will require you to constantly follow online publications about new threats. Quite tedious
✅ Do: Yarn's and npm's recent introduction of lock files (e.g. package-lock.json) brought a serious challenge (the road to hell is paved with good intentions): by default, packages no longer get updates. Even a team running many fresh deployments with ‘npm install’ & ‘npm update’ won’t get any new updates. This leads to subpar dependency versions at best, or to vulnerable code at worst. Teams now rely on developers' goodwill and memory to manually update the package.json or use tools like ncu manually. A more reliable way is to automate the process of getting the most reliable dependency versions. Though there are no silver bullet solutions yet, there are two possible automation roads:
(1) CI can fail builds that have obsolete dependencies, using tools like ‘npm outdated’ or ‘npm-check-updates (ncu)’. Doing so will force developers to update dependencies.
(2) Use commercial tools that scan the code and automatically send pull requests with updated dependencies. One interesting remaining question is what the dependency update policy should be: updating on every patch generates too much overhead, while updating right when a major is released might point to an unstable version (many packages are found vulnerable in the very first days after being released, see the eslint-scope incident).
An efficient update policy may allow some ‘vesting period’: let the code lag behind the @latest for some time and versions before considering the local copy obsolete (e.g. local version is 1.3.1 and repository version is 1.3.8)
❌ Otherwise: Your production will run packages that have been explicitly tagged by their author as risky
✏ Code Examples
👏 Example: ncu can be used manually or within a CI pipeline to detect to what extent the code lags behind the latest versions
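The ‘vesting period’ idea can also be expressed as a small policy function. The sketch below is an assumption-laden illustration, not a feature of ncu or npm: findVestedUpdate, the release list shape and the 14-day default are all hypothetical, and only plain major.minor.patch versions are handled. It picks the newest release that is both newer than the local copy and has been public for at least the vesting period.

```javascript
// Compares two "major.minor.patch" strings; pre-release tags are not handled in this sketch
function compareVersions(a, b) {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (partsA[i] !== partsB[i]) return partsA[i] - partsB[i];
  }
  return 0;
}

// Hypothetical policy: only suggest releases that have "vested" for vestingDays
function findVestedUpdate(localVersion, releases, { vestingDays = 14, now = new Date() } = {}) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const vested = releases
    .filter((release) => compareVersions(release.version, localVersion) > 0)
    .filter((release) => (now - new Date(release.publishedAt)) / msPerDay >= vestingDays)
    .sort((a, b) => compareVersions(b.version, a.version));
  return vested.length > 0 ? vested[0].version : null;
}

// Made-up release dates, for illustration only
const releases = [
  { version: "1.3.8", publishedAt: "2024-01-01" },
  { version: "1.3.9", publishedAt: "2024-01-28" },
];
const suggestion = findVestedUpdate("1.3.1", releases, { now: new Date("2024-01-30") });
console.log(suggestion); // "1.3.8" (1.3.9 is only two days old)
```

A CI step could fail the build only when such a function returns a version, which avoids chasing every fresh patch while still flagging a copy that has lagged for too long.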
✅ Do: This post is focused on testing advice that is related to, or at least can be exemplified with, Node.js. This bullet, however, groups a few well-known tips that are not Node-related
- Use a declarative syntax. This is the only option for most vendors, but older versions of Jenkins allow using code or UI
- Opt for a vendor that has native Docker support
- Fail early, run your fastest tests first. Create a ‘Smoke testing’ step/milestone that groups multiple fast inspections (e.g. linting, unit tests) and provides snappy feedback to the code committer
- Make it easy to skim through all build artifacts including test reports, coverage reports, mutation reports, logs, etc.
- Create multiple pipelines/jobs for each event, reuse steps between them. For example, configure a job for feature branch commits and a different one for master PR. Let each reuse logic using shared steps (most vendors provide some mechanism for code reuse)
- Never embed secrets in a job declaration, grab them from a secret store or from the job’s configuration
- Explicitly bump version in a release build or at least ensure the developer did so
- Build only once and perform all the inspections over the single build artifact (e.g. Docker image)
- Test in an ephemeral environment that doesn’t drift state between builds. Caching node_modules might be the only exception
❌ Otherwise: You’ll miss years of wisdom
✅ Do: Quality checking is about serendipity: the more ground you cover, the luckier you get in detecting issues early. When developing reusable packages or running a multi-customer production with various configurations and Node versions, the CI must run the pipeline of tests over all the permutations of configurations. For example, assuming we use MySQL for some customers and Postgres for others, some CI vendors support a feature called ‘Matrix’ which allows running the test suite against all permutations of MySQL, Postgres and multiple Node versions like 8, 9 and 10. This is done using configuration only, without any additional effort (assuming you have testing or any other quality checks). Other CIs that don’t support Matrix might have extensions or tweaks to allow that
❌ Otherwise: So after doing all that hard work of writing tests, are we going to let bugs sneak in only because of configuration issues?
✏ Code Examples
👏 Example: Using Travis (CI vendor) build definition to run the same test over multiple Node versions
language: node_js
node_js:
  - "7"
  - "6"
  - "5"
  - "4"
install:
  - npm install
script:
  - npm run test
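Conceptually, a ‘Matrix’ feature expands every combination of the configured axes into a separate job. A small sketch of that expansion, where expandMatrix and the axis names are hypothetical and not any CI vendor's API:

```javascript
// Builds the cartesian product of all axes: one object per CI job
function expandMatrix(axes) {
  return Object.entries(axes).reduce(
    (combinations, [name, values]) =>
      combinations.flatMap((combination) =>
        values.map((value) => ({ ...combination, [name]: value }))
      ),
    [{}]
  );
}

const jobs = expandMatrix({
  database: ["mysql", "postgres"],
  node: ["8", "9", "10"],
});

console.log(jobs.length); // 6
console.log(jobs[0]); // { database: 'mysql', node: '8' }
```

Two databases times three Node versions already yields six jobs, which is why adding an axis is worth weighing against total build time.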
Role: Writer
About: I'm an independent consultant who works with Fortune 500 companies and garage startups on polishing their JS & Node.js applications. More than any other topic, I'm fascinated by and aim to master the art of testing. I'm also the author of Node.js Best Practices
📗 Online Course: Liked this guide and wish to take your testing skills to the extreme? Consider visiting my comprehensive course Testing Node.js & JavaScript From A To Z
Follow:
Role: Tech reviewer and advisor
Took care to revise, improve, lint and polish all the texts
About: full-stack web engineer, Node.js & GraphQL enthusiast
Role: Concept, design and great advice
About: A savvy frontend developer, CSS expert and emojis freak
Role: Helps keep this project running, and reviews security related practices
About: Loves working on Node.js projects and web application security.
Thanks goes to these wonderful people who have contributed to this repository!