Added more faker funcs for things like email address, ip address and others
pkdone committed May 1, 2022
1 parent 8361c4d commit 4e37384
Showing 5 changed files with 179 additions and 13 deletions.
28 changes: 27 additions & 1 deletion README.md
@@ -52,6 +52,9 @@ The _mongo-mangler_ tool also provides a set of [library functions](lib/masksFak
The _[fake_accounts](examples/pipeline_example_fake_accounts.js)_ example pipeline provided in this project shows how to generate fictitious bank account records using the supplied _faker_ library. Below is the list of _faking_ functions the library provides for use in your custom pipelines, with a description of each:

```javascript
// Generate a random date between a start point of millis after 01-Jan-1970 and a maximum set of milliseconds after that date
fakeDateMillisFromEpoch(startMillis, maxMillis)

// Generate a random date between now and a maximum number of milliseconds from now
fakeDateAfterNow(maxMillisFromNow)

@@ -73,6 +76,9 @@ fakeDecimal()
// Generate a decimal number with up to a specified number of significant places (e.g. '3' places -> 736.274473638742)
fakeDecimalSignificantPlaces(maxSignificantPlaces)

// Generate a currency amount with just 2 decimal places and up to a specified number of significant places (e.g. '3' places -> 736.27)
fakeMoneyAmountDecimal(maxSignificantPlaces)

// Generate a True or False value randomly
fakeBoolean()

@@ -89,14 +95,29 @@ fakeValueFromListWeighted(listOfValues)
fakeListOfSubDocs(numSumDocs, listOfValues)

// Generate string composed of the same character repeated the specified number of times
fakeNChars(char, amount)
fakeNSameChars(char, amount)

// Generate string composed of random English alphabet uppercase characters repeated the specified number of times
fakeNAnyUpperChars(amount)

// Generate string composed of random English alphabet lowercase characters repeated the specified number of times
fakeNAnyLowerChars(amount)

// Generate a typical first name from an internal pre-defined list of common first names
fakeFirstName()

// Generate a typical last name from an internal pre-defined list of common last names
fakeLastName()

// Generate a typical first name and last name from an internal pre-defined list of names
fakeFirstAndLastName()

// Generate a random email address with random chars for the email id @ one of a few fixed .com domains
fakeEmailAddress()

// Generate a random IPv4 address in text format of 'xxx.xxx.xxx.xxx'
fakeIPAddress()

// Generate a typical street name from an internal pre-defined list of common street names
fakeStreetName()

@@ -108,6 +129,10 @@ fakeCountryName()

// Generate a random US-style zipcode/postcode (e.g. 10144)
fakeZipCode()

// Generate a typical company name from an internal pre-defined list of common company names
fakeCompanyName()

```
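As a rough sketch of how these functions compose: each one returns a plain aggregation-expression object, so it can be assigned to a field inside a `$set` stage. The example below copies `fakeDateMillisFromEpoch` from `lib/masksFakesGeneraters.js` as shown in this commit; the `buildFakeStage` helper and the `openedDate` field name are hypothetical and exist only for illustration.

```javascript
// Copied from lib/masksFakesGeneraters.js as shown in this commit
function fakeDateMillisFromEpoch(startMillis, maxMillis) {
  return {
    "$toDate": {"$add": [{"$dateFromString": {"dateString": "1970-01-01"}}, startMillis, {"$multiply": [{"$rand": {}}, maxMillis]}]}
  };
}

// Hypothetical helper (not part of the library): wrap generated expressions in a $set stage
function buildFakeStage() {
  const oneYearMillis = 365 * 24 * 60 * 60 * 1000;
  return {
    "$set": {
      // Hypothetical field name: a random date within one year after 01-Jan-2000 (UTC)
      "openedDate": fakeDateMillisFromEpoch(946684800000, oneYearMillis),
    },
  };
}

// The stage is ordinary data, so it can be inspected before being sent to the database
const pipeline = [buildFakeStage()];
console.log(JSON.stringify(pipeline, null, 2));
```

Because the functions only build expression objects, nothing random happens in the client; the `$rand` operator is evaluated by the database for each document.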

### Masking Library
@@ -141,6 +166,7 @@ maskAlterValueFromList(currentValue, percentSameValue, otherValuesList)

// Change on average a given percentage of the list members values to a random value from the provided alternative list
maskAlterListFromList(currentList, percentSameValues, otherValuesList)

```

Note that, for data masking, even though the pipeline irreversibly obfuscates fields, the masked data can still be useful for analytics. A pipeline can mask most fields by varying the original values by a small, bounded random percentage (e.g. vary a credit card's expiry date or transaction amount by +/- 10%), rather than replacing them with completely random new values. In such cases, if the input data set is sufficiently large, the minor variances tend to even out. For the fields that are only varied slightly, analysts can derive similar trends and patterns from analysing the masked data as they would from the original data. See the _Mask Sensitive Fields_ chapter of the _[Practical MongoDB Aggregations](https://www.practical-mongodb-aggregations.com/)_ book for more information.
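To illustrate the idea of varying a value by a bounded percentage, here is a minimal sketch for numeric fields. `varyByPercent` and `varyByPercentLocally` are hypothetical names and are not part of the masking library; the first builds an aggregation expression, and the second repeats the same arithmetic in plain JavaScript so the effect on an average can be seen without a database.

```javascript
// Hypothetical helper (not part of the library): vary a numeric expression by up to +/- maxPercent
function varyByPercent(valueExpr, maxPercent) {
  return {
    "$multiply": [
      valueExpr,
      // Random factor between 1 - maxPercent/100 and 1 + maxPercent/100
      {"$add": [1, {"$multiply": [{"$subtract": [{"$multiply": [{"$rand": {}}, 2]}, 1]}, maxPercent / 100]}]},
    ],
  };
}

// Plain JavaScript analogue of the same arithmetic (rand is injectable for testing)
function varyByPercentLocally(value, maxPercent, rand = Math.random) {
  return value * (1 + (rand() * 2 - 1) * (maxPercent / 100));
}

// Mask many identical amounts and compare the masked mean with the original value
const amounts = Array.from({length: 100000}, () => 250);
const masked = amounts.map((amount) => varyByPercentLocally(amount, 10));
const mean = masked.reduce((sum, amount) => sum + amount, 0) / masked.length;

console.log(JSON.stringify(varyByPercent("$amount", 10)));
console.log(`mean of masked amounts: ${mean.toFixed(2)}`);
```

Each individual masked amount stays within 10% of the original, while the mean over a large sample lands close to the original value, which is the property the note above relies on.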
66 changes: 62 additions & 4 deletions lib/masksFakesGeneraters.js
@@ -1,3 +1,11 @@
// Generate a random date between a start point of millis after 01-Jan-1970 and a maximum set of milliseconds after that date
function fakeDateMillisFromEpoch(startMillis, maxMillis) {
return {
"$toDate": {"$add": [{"$dateFromString": {"dateString": "1970-01-01"}}, startMillis, {"$multiply": [{"$rand": {}}, maxMillis]}]}
};
}


// Generate a random date between now and a maximum number of milliseconds from now
function fakeDateAfterNow(maxMillisFromNow) {
return {
@@ -76,6 +84,14 @@ function fakeDecimalSignificantPlaces(maxSignificantPlaces) {
}


// Generate a currency amount with just 2 decimal places and up to a specified number of significant places (e.g. '3' places -> 736.27)
function fakeMoneyAmountDecimal(maxSignificantPlaces) {
return {
"$round": [fakeDecimalSignificantPlaces(maxSignificantPlaces), 2]
};
}


// Generate a True or False value randomly
function fakeBoolean() {
return {
@@ -181,7 +197,7 @@ function fakeListOfSubDocs(numSumDocs, listOfValues) {


// Generate string composed of the same character repeated the specified number of times
function fakeNChars(char, amount) {
function fakeNSameChars(char, amount) {
return {
"$reduce": {
"input": {"$range": [0, amount]},
@@ -192,6 +208,24 @@ function fakeNChars(char, amount) {
}


// Generate string composed of random English alphabet uppercase characters repeated the specified number of times
function fakeNAnyUpperChars(amount) {
return {
"$reduce": {
"input": {"$range": [0, amount]},
"initialValue": "",
"in": {"$concat": ["$$value", fakeValueFromList(["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"])]}
}
};
}


// Generate string composed of random English alphabet lowercase characters repeated the specified number of times
function fakeNAnyLowerChars(amount) {
return {"$toLower": fakeNAnyUpperChars(amount)};
}


// Generate a typical first name from an internal pre-defined list of common first names
function fakeFirstName() {
return fakeValueFromList(["Maria", "Nushi", "Mohammed", "Jose", "Muhammad", "Mohamed", "Wei", "Yan", "John", "David", "Li", "Abdul", "Ana", "Ying", "Michael", "Juan", "Anna", "Mary", "Daniel", "Luis", "Elena", "Marie", "Ibrahim", "Peter", "Sarah", "Xin", "Lin", "Olga"]);
@@ -204,6 +238,24 @@ function fakeLastName() {
}


// Generate a typical first name and last name from an internal pre-defined list of names
function fakeFirstAndLastName() {
return {"$concat": [fakeFirstName(), " ", fakeLastName()]};
}


// Generate a random email address with random chars for the email id @ one of a few fixed .com domains
function fakeEmailAddress() {
return {"$concat": [fakeNAnyLowerChars(6), "@", fakeValueFromList(["mymail.com", "fastmail.com", "acmemail.com"])]};
}


// Generate a random IPv4 address in text format of 'xxx.xxx.xxx.xxx'
function fakeIPAddress() {
return {"$concat": [{"$toString": fakeNumberBounded(0, 255)}, ".", {"$toString": fakeNumberBounded(0, 255)}, ".", {"$toString": fakeNumberBounded(0, 255)}, ".", {"$toString": fakeNumberBounded(0, 255)}]};
}
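One detail worth checking when using `fakeNumberBounded(0, 255)` for IP octets: the Python listing in this commit builds the value as `minNumber + floor(rand * (maxNumber - minNumber))`. Assuming `$rand` behaves like `Math.random()` and yields values in [0, 1), and assuming the JavaScript version uses the same expression, the upper bound itself would not be produced. The plain JavaScript sketch below mirrors that arithmetic outside the database; `boundedLikeExpression` and `boundedInclusive` are hypothetical names, and the inclusive variant is one possible adjustment rather than the library's code.

```javascript
// Mirrors: minNumber + floor(rand * (maxNumber - minNumber)), with rand assumed to be in [0, 1)
function boundedLikeExpression(minNumber, maxNumber, rand = Math.random) {
  return minNumber + Math.floor(rand() * (maxNumber - minNumber));
}

// Possible adjustment (an assumption, not the library's code): widen the range by one
function boundedInclusive(minNumber, maxNumber, rand = Math.random) {
  return minNumber + Math.floor(rand() * (maxNumber - minNumber + 1));
}

// Largest double below 1, standing in for the highest value rand could return
const almostOne = 1 - Number.EPSILON;

console.log(boundedLikeExpression(0, 255, () => almostOne));
console.log(boundedInclusive(0, 255, () => almostOne));
```

If the full 0 to 255 range matters for generated addresses, the same `+ 1` adjustment could be expressed in the aggregation expression with an extra `$add`.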


// Generate a typical street name from an internal pre-defined list of common street names
function fakeStreetName() {
return fakeValueFromList(["Station Road", "Park Road", "Church Street", "Victoria Road", "Green Lane", "Manor Road", "Rue de l'Église", "Grande Rue", "Rue du Moulin", "Rue du Château", "Hauptstraße", "Dorfstraße", "Schulstraße", "Bahnhofstraße", "Fourth Street", "Park Street", "Kerkstraat"]);
@@ -228,6 +280,12 @@ function fakeZipCode() {
}


// Generate a typical company name from an internal pre-defined list of common company names
function fakeCompanyName() {
return fakeValueFromList(["Wonka Industries", "Acme Corp.", "Stark Industries", "Gekko & Co", "Wayne Enterprises", "Cyberdyne Systems", "Genco Pura Olive Oil Company", "Bubba Gump", "Olivia Pope & Associates", "Krusty Krab", "Sterling Cooper", "Soylent", "Hooli", "Good Burger", "Globex Corporation", "Initech", "Umbrella Corporation", "Vehement Capital Partners", "Massive Dynamic"]);
}


// Replace the first specified number of characters in a field's value with 'x's
function maskReplaceFirstPart(strOrNum, amount) {
return {
@@ -243,7 +301,7 @@ function maskReplaceFirstPart(strOrNum, amount) {
"remainder": {"$subtract": [{"$strLenCP": {"$toString": "$$text"}}, "$$amountOrZero"]},
},
"in": {
"$concat": [fakeNChars("x", {"$min": ["$$amountOrZero", "$$length"]}), {"$substrCP": ["$$text", "$$amountOrZero", {"$max": ["$$remainder", 0]}]}]
"$concat": [fakeNSameChars("x", {"$min": ["$$amountOrZero", "$$length"]}), {"$substrCP": ["$$text", "$$amountOrZero", {"$max": ["$$remainder", 0]}]}]
}
}
}
@@ -267,7 +325,7 @@ function maskReplaceLastPart(strOrNum, amount) {
"remainder": {"$subtract": [{"$strLenCP": "$$text"}, "$$amountOrZero"]},
},
"in": {
"$concat": [{"$substrCP": ["$$text", 0, {"$max": ["$$remainder", 0]}]}, fakeNChars("x", {"$min": ["$$amountOrZero", "$$length"]})]
"$concat": [{"$substrCP": ["$$text", 0, {"$max": ["$$remainder", 0]}]}, fakeNSameChars("x", {"$min": ["$$amountOrZero", "$$length"]})]
}
}
}
@@ -278,7 +336,7 @@ function maskReplaceLastPart(strOrNum, amount) {

// Replace all the characters in a field's value with 'x's
function maskReplaceAll(strOrNum) {
return fakeNChars("x", {"$strLenCP": {"$toString": strOrNum}})
return fakeNSameChars("x", {"$strLenCP": {"$toString": strOrNum}})
}
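To make the `$concat` / `$substrCP` arithmetic in the masking functions easier to follow, here is a plain JavaScript analogue of what the visible parts of `maskReplaceFirstPart` and `maskReplaceLastPart` appear to compute. This is a sketch under assumptions: the function names are hypothetical, and `amountOrZero` is assumed (from its name) to clamp a negative amount to zero, since that part of the expression is not shown in this diff.

```javascript
// Hypothetical local analogue of maskReplaceFirstPart: 'x' repeated, then the untouched remainder
function maskFirstPartLocally(strOrNum, amount) {
  const chars = Array.from(String(strOrNum));     // split by code point, like $strLenCP / $substrCP
  const amountOrZero = Math.max(amount, 0);       // assumption: negative amounts are treated as zero
  const maskedCount = Math.min(amountOrZero, chars.length);
  return "x".repeat(maskedCount) + chars.slice(amountOrZero).join("");
}

// Hypothetical local analogue of maskReplaceLastPart: the untouched start, then 'x' repeated
function maskLastPartLocally(strOrNum, amount) {
  const chars = Array.from(String(strOrNum));
  const amountOrZero = Math.max(amount, 0);       // assumption: negative amounts are treated as zero
  const remainder = Math.max(chars.length - amountOrZero, 0);
  return chars.slice(0, remainder).join("") + "x".repeat(Math.min(amountOrZero, chars.length));
}

console.log(maskFirstPartLocally("1234567890", 4));
console.log(maskLastPartLocally("1234567890", 4));
console.log(maskFirstPartLocally("abc", 10));
```

The `$min` and `$max` guards in the original expressions play the same role as `Math.min` and `Math.max` here: they keep the masked length from exceeding the text length and keep the remainder from going negative.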


70 changes: 64 additions & 6 deletions lib/masksFakesGeneraters.py
@@ -1,3 +1,11 @@
# Generate a random date between a start point of millis after 01-Jan-1970 and a maximum set of milliseconds after that date
def fakeDateMillisFromEpoch(startMillis, maxMillis):
return {
"$toDate": {"$add": [{"$dateFromString": {"dateString": "1970-01-01"}}, startMillis, {"$multiply": [{"$rand": {}}, maxMillis]}]}
};



# Generate a random date between now and a maximum number of milliseconds from now
def fakeDateAfterNow(maxMillisFromNow):
return {
@@ -6,7 +14,7 @@ def fakeDateAfterNow(maxMillisFromNow):



# Generate a random date between a maximum number of milliseconds before now and now
# Generate a random date between a maximum number of milliseconds before now
def fakeDateBeforeNow(maxMillisBeforeNow):
return {
"$toDate": {"$subtract": ["$$NOW", {"$multiply": [{"$rand": {}}, maxMillisBeforeNow]}]}
@@ -29,7 +37,7 @@ def fakeNumber(numberOfDigits):



# Generate a while number between a given minimum and maximum number (inclusive)
# Generate a whole number between a given minimum and maximum number (inclusive)
def fakeNumberBounded(minNumber, maxNumber):
return {
"$toLong": {"$add": [minNumber, {"$floor": {"$multiply": [{"$rand": {}}, {"$subtract": [maxNumber, minNumber]}]}}]}
@@ -76,6 +84,14 @@ def fakeDecimalSignificantPlaces(maxSignificantPlaces):



# Generate a currency amount with just 2 decimal places and up to a specified number of significant places (e.g. '3' places -> 736.27)
def fakeMoneyAmountDecimal(maxSignificantPlaces):
return {
"$round": [fakeDecimalSignificantPlaces(maxSignificantPlaces), 2]
};



# Generate a True or False value randomly
def fakeBoolean():
return {
@@ -181,7 +197,7 @@ def fakeListOfSubDocs(numSumDocs, listOfValues):


# Generate string composed of the same character repeated the specified number of times
def fakeNChars(char, amount):
def fakeNSameChars(char, amount):
return {
"$reduce": {
"input": {"$range": [0, amount]},
@@ -192,6 +208,24 @@ def fakeNChars(char, amount):



# Generate string composed of random English alphabet uppercase characters repeated the specified number of times
def fakeNAnyUpperChars(amount):
return {
"$reduce": {
"input": {"$range": [0, amount]},
"initialValue": "",
"in": {"$concat": ["$$value", fakeValueFromList(["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"])]}
}
};



# Generate string composed of random English alphabet lowercase characters repeated the specified number of times
def fakeNAnyLowerChars(amount):
return {"$toLower": fakeNAnyUpperChars(amount)};



# Generate a typical first name from an internal pre-defined list of common first names
def fakeFirstName():
return fakeValueFromList(["Maria", "Nushi", "Mohammed", "Jose", "Muhammad", "Mohamed", "Wei", "Yan", "John", "David", "Li", "Abdul", "Ana", "Ying", "Michael", "Juan", "Anna", "Mary", "Daniel", "Luis", "Elena", "Marie", "Ibrahim", "Peter", "Sarah", "Xin", "Lin", "Olga"]);
@@ -204,6 +238,24 @@ def fakeLastName():



# Generate a typical first name and last name from an internal pre-defined list of names
def fakeFirstAndLastName():
return {"$concat": [fakeFirstName(), " ", fakeLastName()]};



# Generate a random email address with random chars for the email id @ one of a few fixed .com domains
def fakeEmailAddress():
return {"$concat": [fakeNAnyLowerChars(6), "@", fakeValueFromList(["mymail.com", "fastmail.com", "acmemail.com"])]};



# Generate a random IPv4 address in text format of 'xxx.xxx.xxx.xxx'
def fakeIPAddress():
return {"$concat": [{"$toString": fakeNumberBounded(0, 255)}, ".", {"$toString": fakeNumberBounded(0, 255)}, ".", {"$toString": fakeNumberBounded(0, 255)}, ".", {"$toString": fakeNumberBounded(0, 255)}]};



# Generate a typical street name from an internal pre-defined list of common street names
def fakeStreetName():
return fakeValueFromList(["Station Road", "Park Road", "Church Street", "Victoria Road", "Green Lane", "Manor Road", "Rue de l'Église", "Grande Rue", "Rue du Moulin", "Rue du Château", "Hauptstraße", "Dorfstraße", "Schulstraße", "Bahnhofstraße", "Fourth Street", "Park Street", "Kerkstraat"]);
@@ -228,6 +280,12 @@ def fakeZipCode():



# Generate a typical company name from an internal pre-defined list of common company names
def fakeCompanyName():
return fakeValueFromList(["Wonka Industries", "Acme Corp.", "Stark Industries", "Gekko & Co", "Wayne Enterprises", "Cyberdyne Systems", "Genco Pura Olive Oil Company", "Bubba Gump", "Olivia Pope & Associates", "Krusty Krab", "Sterling Cooper", "Soylent", "Hooli", "Good Burger", "Globex Corporation", "Initech", "Umbrella Corporation", "Vehement Capital Partners", "Massive Dynamic"]);



# Replace the first specified number of characters in a field's value with 'x's
def maskReplaceFirstPart(strOrNum, amount):
return {
@@ -243,7 +301,7 @@ def maskReplaceFirstPart(strOrNum, amount):
"remainder": {"$subtract": [{"$strLenCP": {"$toString": "$$text"}}, "$$amountOrZero"]},
},
"in": {
"$concat": [fakeNChars("x", {"$min": ["$$amountOrZero", "$$length"]}), {"$substrCP": ["$$text", "$$amountOrZero", {"$max": ["$$remainder", 0]}]}]
"$concat": [fakeNSameChars("x", {"$min": ["$$amountOrZero", "$$length"]}), {"$substrCP": ["$$text", "$$amountOrZero", {"$max": ["$$remainder", 0]}]}]
}
}
}
@@ -267,7 +325,7 @@ def maskReplaceLastPart(strOrNum, amount):
"remainder": {"$subtract": [{"$strLenCP": "$$text"}, "$$amountOrZero"]},
},
"in": {
"$concat": [{"$substrCP": ["$$text", 0, {"$max": ["$$remainder", 0]}]}, fakeNChars("x", {"$min": ["$$amountOrZero", "$$length"]})]
"$concat": [{"$substrCP": ["$$text", 0, {"$max": ["$$remainder", 0]}]}, fakeNSameChars("x", {"$min": ["$$amountOrZero", "$$length"]})]
}
}
}
@@ -278,7 +336,7 @@ def maskReplaceLastPart(strOrNum, amount):

# Replace all the characters in a field's value with 'x's
def maskReplaceAll(strOrNum):
return fakeNChars("x", {"$strLenCP": {"$toString": strOrNum}})
return fakeNSameChars("x", {"$strLenCP": {"$toString": strOrNum}})



26 changes: 25 additions & 1 deletion lib/masksFakesGeneraters_docs_md
@@ -1,4 +1,7 @@
```javascript
// Generate a random date between a start point of millis after 01-Jan-1970 and a maximum set of milliseconds after that date
fakeDateMillisFromEpoch(startMillis, maxMillis)

// Generate a random date between now and a maximum number of milliseconds from now
fakeDateAfterNow(maxMillisFromNow)

@@ -20,6 +23,9 @@ fakeDecimal()
// Generate a decimal number with up to a specified number of significant places (e.g. '3' places -> 736.274473638742)
fakeDecimalSignificantPlaces(maxSignificantPlaces)

// Generate a currency amount with just 2 decimal places and up to a specified number of significant places (e.g. '3' places -> 736.27)
fakeMoneyAmountDecimal(maxSignificantPlaces)

// Generate a True or False value randomly
fakeBoolean()

@@ -36,14 +42,29 @@ fakeValueFromListWeighted(listOfValues)
fakeListOfSubDocs(numSumDocs, listOfValues)

// Generate string composed of the same character repeated the specified number of times
fakeNChars(char, amount)
fakeNSameChars(char, amount)

// Generate string composed of random English alphabet uppercase characters repeated the specified number of times
fakeNAnyUpperChars(amount)

// Generate string composed of random English alphabet lowercase characters repeated the specified number of times
fakeNAnyLowerChars(amount)

// Generate a typical first name from an internal pre-defined list of common first names
fakeFirstName()

// Generate a typical last name from an internal pre-defined list of common last names
fakeLastName()

// Generate a typical first name and last name from an internal pre-defined list of names
fakeFirstAndLastName()

// Generate a random email address with random chars for the email id @ one of a few fixed .com domains
fakeEmailAddress()

// Generate a random IPv4 address in text format of 'xxx.xxx.xxx.xxx'
fakeIPAddress()

// Generate a typical street name from an internal pre-defined list of common street names
fakeStreetName()

@@ -56,6 +77,9 @@ fakeCountryName()
// Generate a random US-style zipcode/postcode (e.g. 10144)
fakeZipCode()

// Generate a typical company name from an internal pre-defined list of common company names
fakeCompanyName()

// Replace the first specified number of characters in a field's value with 'x's
maskReplaceFirstPart(strOrNum, amount)
