<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title><![CDATA[Blog]]></title>
<description><![CDATA[Blog]]></description>
<link>./</link>
<lastBuildDate>Tue, 26 May 2020 22:23:02 +0300</lastBuildDate>
<item>
<title><![CDATA[Leetcode: Check If a String Can Break Another String]]></title>
<description><![CDATA[
<p>
In this post we’re going to solve a medium Leetcode problem called “<a href="https://leetcode.com/problems/check-if-a-string-can-break-another-string/">Check If a String Can Break Another String</a>”, number 1433. There’s nothing special about it; I just clicked a random link and ended up there.
</p>
<div id="outline-container-orgaf85b60" class="outline-2">
<h2 id="orgaf85b60">Problem Statement</h2>
<div class="outline-text-2" id="text-orgaf85b60">
<p>
Given two strings \(s1\) and \(s2\) of the same size, check whether some permutation of \(s1\) can break some permutation of \(s2\), or vice versa (i.e. some permutation of \(s2\) can break some permutation of \(s1\)).
</p>
<p>
A string \(x\) can break string \(y\) (both of size \(n\)) if \(x_i \ge y_i\) (in alphabetical order) for all \(i\) between \(0\) and \(n-1\).
</p>
<p>
Both strings have the same length \(n\), where \(1 \le n \le 10^5\), and consist of lowercase English letters.
</p>
</div>
</div>
<div id="outline-container-org8c5a8d1" class="outline-2">
<h2 id="org8c5a8d1">Solutions</h2>
<div class="outline-text-2" id="text-org8c5a8d1">
<p>
Below I try to explain how one could arrive at the final solution step by step. In fact, that’s not how it happened: after reading the description, it took me less than a minute to come up with an \(O(n \log n)\) solution.
</p>
</div>
<div id="outline-container-orgd9b07d8" class="outline-3">
<h3 id="orgd9b07d8">Brute-force search</h3>
<div class="outline-text-3" id="text-orgd9b07d8">
<p>
For <a href="https://en.wikipedia.org/wiki/Brute-force_search">exhaustive search</a> we need to generate all permutations of both strings and check if there’s a pair of permutations \((A, B)\), where \(A\) is a permutation of string \(s1\) and \(B\) is a permutation of string \(s2\), such that \(A\) breaks \(B\) or \(B\) breaks \(A\).
</p>
<p>
First, let’s define a function that checks whether one string breaks another:
</p>
<div class="org-src-container">
<pre class="src src-python">def breaks(A, B) -> bool:
    # A breaks B if A[i] >= B[i] at every position i
    return all(a >= b for a, b in zip(A, B))
</pre>
</div>
<p>
The worst-case <a href="https://en.wikipedia.org/wiki/Time_complexity">time complexity</a> of this function is \(\Theta(n)\).
</p>
<p>
Python has the <a href="https://docs.python.org/3/library/itertools.html#itertools.permutations"><code>itertools.permutations</code></a> function, which makes the brute-force solution very simple:
</p>
<div class="org-src-container">
<pre class="src src-python">from itertools import permutations

def check_if_can_break(s1: str, s2: str) -> bool:
    # Try every pair (A, B) of permutations of s1 and s2
    return any(breaks(A, B) or breaks(B, A)
               for A in permutations(s1)
               for B in permutations(s2))
</pre>
</div>
<p>
<code>permutations(s1)</code> returns an iterator that yields tuples (e.g. <code>('x', 'y', 'z')</code>), not strings. That doesn’t matter here, because the <code>breaks</code> function works with any iterables.
</p>
<p>
What is the time complexity of this solution? To answer this question, we need to know how many permutations <code>permutations</code> generates. I sometimes forget the answer, but it’s easy to derive. For instance, to generate all permutations of the string <code>"xyz"</code>, we can put any of 3 different characters in the first position: <code>x</code>, <code>y</code>, or <code>z</code>. Once the first position is taken, we are left with 2 characters (for example, <code>y</code> and <code>z</code> if we chose <code>x</code>). We put <code>y</code> or <code>z</code> in the second position, and only 1 character is left for the last one. So the total number of permutations of <code>"xyz"</code> is \(3 \cdot 2 \cdot 1 = 6\), i.e. \(3!\), and in general a string of length \(n\) has \(n!\) permutations.
</p>
<p>
Now we can find the complexity of the solution. We have two nested loops, each running for \(n!\) iterations, and the inner loop makes a \(\Theta(n)\) call to the <code>breaks</code> function. Thus, the complexity is \(O(n! \cdot n! \cdot n) = O((n!)^2 n)\). For \(n = 10^5\), the algorithm would take about \(10^5! \cdot 10^5! \cdot 10^5 \approx 8 \cdot 10^{913151}\) operations to finish. That’s a lot.
</p>
<p>
We know that all strings consist of lowercase English letters, so there are at most 26 different characters in our strings.
</p>
<p>
How many distinct permutations does the string <code>"free"</code> have? The formula gives \(4! = 24\), but in fact there are only 12: <code>{"free", "fere", "feer", "rfee", "refe", "reef", "efre", "erfe", "eref", "eefr", "eerf", "efer"}</code>. The \(n!\) formula holds only if all characters in the string are unique. The general formula is \(\frac{n!}{m_1! m_2! \dots m_{26}!}\), where \(m_i\) is the number of occurrences of the \(i\)-th letter in the string. See <a href="https://en.wikipedia.org/wiki/Binomial_coefficient#Generalization_to_multinomials">Multinomial coefficients</a> for details. For the string <code>"free"</code> this gives \(\frac{4!}{2!} = 12\).
</p>
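<p>
Both counts are easy to sanity-check in Python. The sketch below uses <code>math.factorial</code>, <code>math.prod</code> (Python 3.8+) and <code>collections.Counter</code>; <code>distinct_permutations_count</code> is a hypothetical helper name, not a library function.
</p>
<div class="org-src-container">
<pre class="src src-python">
from collections import Counter
from itertools import permutations
from math import factorial, prod

def distinct_permutations_count(s: str) -> int:
    # Multinomial coefficient: n! / (m_1! m_2! ... m_k!)
    denominator = prod(factorial(m) for m in Counter(s).values())
    return factorial(len(s)) // denominator

# All characters unique: exactly n! permutations
assert len(list(permutations("xyz"))) == factorial(3) == 6
# Repeated characters: permutations() still yields 4! tuples, but only 12 are distinct
assert len(set(permutations("free"))) == distinct_permutations_count("free") == 12
</pre>
</div>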
<p>
What is the worst case, i.e. the smallest denominator? It occurs when all letters appear equally often, i.e. \(m_i = \lfloor\frac{n}{26}\rfloor\).
</p>
<p>
Then the complexity of our solution could potentially be lowered to (double-check me here)
</p>
<p>
\[O\left(\left(\frac{n!}{\left(\lfloor\frac{n}{26}\rfloor!\right)^{26}}\right)^2 n\right) .\]
</p>
<p>
For \(n = 10^5\) that gives around \(1.5 \cdot 10^{282920}\) operations. Quite an improvement! But it’s still more operations than there have been nanoseconds since the birth of the Universe.
</p>
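<p>
These numbers are far too large to compute directly, but their base-10 logarithms are not: \(\log_{10} n! = \ln\Gamma(n+1) / \ln 10\), and <code>math.lgamma</code> computes \(\ln\Gamma\). A rough sketch for reproducing both estimates, assuming the same \(\lfloor n/26 \rfloor\) simplification as the formula above; the function names are made up for illustration.
</p>
<div class="org-src-container">
<pre class="src src-python">
from math import lgamma, log, log10

def log10_factorial(n: int) -> float:
    # log10(n!) = ln(Gamma(n + 1)) / ln(10)
    return lgamma(n + 1) / log(10)

def log10_bruteforce_ops(n: int) -> float:
    # log10((n!)^2 * n): all pairs of permutations, O(n) check for each pair
    return 2 * log10_factorial(n) + log10(n)

def log10_multiset_ops(n: int) -> float:
    # log10((n! / (floor(n/26)!)^26)^2 * n): only distinct permutations
    k = n // 26
    return 2 * (log10_factorial(n) - 26 * log10_factorial(k)) + log10(n)

def scientific(log_value: float) -> str:
    # Turn a base-10 logarithm into "m * 10^e" notation
    exponent = int(log_value)
    return f"{10 ** (log_value - exponent):.1f} * 10^{exponent}"

print(scientific(log10_bruteforce_ops(10**5)))
print(scientific(log10_multiset_ops(10**5)))
</pre>
</div>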
<p>
To reach this complexity, we need to generate only distinct permutations. There are algorithms for that:
</p>
<ul class="org-ul">
<li><a href="https://link.springer.com/chapter/10.1007/3-540-46632-0_25">An \(O(1)\) Time Algorithm for Generating Multiset Permutations</a> by Tadao Takaoka (<a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.77.6275&rep=rep1&type=pdf">PDF</a>)</li>
<li>Loopless Generation of Multiset Permutations by Prefix Shifts by Aaron Williams (<a href="https://pdfs.semanticscholar.org/3dbb/b8ded50b1e78bbe2b569704faca959588682.pdf">PDF</a>)</li>
</ul>
<p>
I found these references on <a href="https://stackoverflow.com/questions/19676109/how-to-generate-all-the-permutations-of-a-multiset/">StackOverflow</a>. There are also simpler but slower approaches. In any case, brute-force search is too slow, even with a <code>unique_permutations</code> function. We need another approach.
</p>
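<p>
For illustration, here is one of the simpler (and slower) approaches, written as the hypothetical <code>unique_permutations</code> function: fill positions one by one, trying each <i>distinct</i> remaining letter only once per position, so repeated letters never produce duplicate permutations. This is a plain recursive sketch, not one of the loopless algorithms from the papers above.
</p>
<div class="org-src-container">
<pre class="src src-python">
from collections import Counter
from typing import Iterator, Tuple

def unique_permutations(s: str) -> Iterator[Tuple[str, ...]]:
    """Yield each distinct permutation of s exactly once (recursive sketch)."""
    counts = Counter(s)
    prefix = []

    def backtrack() -> Iterator[Tuple[str, ...]]:
        if len(prefix) == len(s):
            yield tuple(prefix)
            return
        for c in sorted(counts):  # each distinct letter at most once per position
            if counts[c]:
                counts[c] -= 1
                prefix.append(c)
                yield from backtrack()
                prefix.pop()
                counts[c] += 1

    yield from backtrack()

print(sum(1 for _ in unique_permutations("free")))  # 12
</pre>
</div>
<p>
It still yields every distinct permutation, so it only removes the duplicates; the number of candidates remains astronomically large for \(n = 10^5\).
</p>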
</div>
</div>
<div id="outline-container-org63019d6" class="outline-3">
<h3 id="org63019d6">Still brute-force</h3>
<div class="outline-text-3" id="text-org63019d6">
<p>
Note that if a permutation \(a_1 a_2 a_3\) breaks a permutation \(b_2 b_1 b_3\), then \(a_2 a_3 a_1\) also breaks \(b_1 b_3 b_2\): applying the same reordering to both strings keeps the same pairs \((a_1, b_2)\), \((a_2, b_1)\), \((a_3, b_3)\), only in different positions. In other words, we don’t really need to enumerate permutations of <i>both</i> strings. We can fix one permutation of one string and check all permutations of the other.
</p>
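<p>
The observation is easy to check empirically. The sketch below applies the same random reordering (via <code>random.shuffle</code>) to both strings and confirms that the result of <code>breaks</code> never changes; <code>reorder</code> is a hypothetical helper, and <code>breaks</code> is repeated so the snippet runs on its own.
</p>
<div class="org-src-container">
<pre class="src src-python">
import random

def breaks(A, B) -> bool:
    return all(a >= b for a, b in zip(A, B))

def reorder(s: str, order: list) -> str:
    # Position i of the result receives s[order[i]]
    return ''.join(s[i] for i in order)

for x, y in [("cab", "baa"), ("abc", "xya")]:
    for _ in range(100):
        order = list(range(len(x)))
        random.shuffle(order)
        # The same reordering on both strings keeps every pair (x[i], y[i]) together
        assert breaks(x, y) == breaks(reorder(x, order), reorder(y, order))
</pre>
</div>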
<div class="org-src-container">
<pre class="src src-python">from itertools import permutations

def check_if_can_break(s1: str, s2: str) -> bool:
    # Keep s1 fixed and try every permutation B of s2
    return any(breaks(s1, B) or breaks(B, s1)
               for B in permutations(s2))
</pre>
</div>
<p>
This brings the complexity down to \(O(n! \cdot n)\), which is still very bad. But it leads us to a better solution: what if, instead of iterating over all permutations of <code>s2</code>, we try to build a suitable permutation of it directly?
</p>
</div>
</div>
<div id="outline-container-orge3900d9" class="outline-3">
<h3 id="orge3900d9">Linear complexity</h3>
<div class="outline-text-3" id="text-orge3900d9">
<p>
Let’s take two strings, <code>s1 = "abc"</code> and <code>s2 = "xya"</code>.
</p>
<p>
First, let’s take <code>s1</code> as is. For its first character, <code>a</code>, we need to find a character in <code>s2</code> that <code>a</code> is greater than or equal to. The only choice is <code>a</code>. That leaves us with <code>x</code> and <code>y</code>, so no permutation of <code>s2</code> can be broken by <code>s1</code>: <code>b</code> is less than both <code>x</code> and <code>y</code>, and there are no other characters left in <code>s2</code>.
</p>
<p>
Now let’s swap the strings and use <code>s2</code> to break <code>s1</code>. <code>x</code> is greater than every character of <code>s1</code>, so we could put any of them in the first position of the permutation of <code>s1</code>. But let’s be frugal and take the one closest to <code>x</code>, that is <code>c</code>. Then for <code>y</code> we take <code>b</code>, which leaves <code>a</code> for the last position. And we can see that <code>"xya"</code> breaks <code>"cba"</code>. Being frugal is safe: if some valid pairing gives the current character a smaller partner, we can swap that partner with the closest one, and the pairing stays valid.
</p>
<div class="org-src-container">
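<pre class="src src-python">from typing import Optional

def breakable(A: str, B: str) -> Optional[str]:
    """Return a permutation of B that A breaks, or None if there is none."""
    C = [0] * 26  # C[i]: remaining count of the i-th letter in B
    L = []
    for b in B:
        C[ord(b) - ord('a')] += 1
    for a in A:
        # Take the largest remaining letter of B that is not greater than a
        for i in range(ord(a) - ord('a'), -1, -1):
            if C[i]:
                L.append(chr(ord('a') + i))
                C[i] -= 1
                break
        else:
            return None  # every remaining letter of B is greater than a
    return ''.join(L)

def check_if_can_break(s1: str, s2: str) -> bool:
    return breakable(s1, s2) is not None or breakable(s2, s1) is not None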
</pre>
</div>
<p>
The complexity of this solution is \(O(26n)\): for each of the \(n\) characters of <code>A</code> we scan at most 26 counters.
</p>
<p>
[TODO Eliminate 26 constant. Use RB-tree?]
</p>
<p>
Now, as we noted, if we have a pair of strings \((a, b)\), we can apply the same permutation to both of them without affecting the \(\text{breaks}\) relationship: if \(a \text{ breaks } b\), then \(a' \text{ breaks } b'\), where \(a'\) and \(b'\) are the permuted strings.
</p>
<p>
Let’s sort string \(a\). Then for the first position in the desired permutation of string \(b\) we need a character that is less than or equal to the smallest character of \(a\); for the second position, one that is less than or equal to the second smallest character of \(a\), and so on. Being frugal as before, we always use the smallest remaining character of \(b\), which means we simply sort \(b\) as well. So it’s enough to sort both strings and check whether one of them breaks the other, and that’s it.
</p>
<div class="org-src-container">
<pre class="src src-python">def check_if_can_break(s1: str, s2: str) -> bool:
    # Sorted orders give the best possible pairing of characters
    A, B = sorted(s1), sorted(s2)
    return breaks(A, B) or breaks(B, A)
</pre>
</div>
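<p>
Since the argument is informal, it’s worth checking it against brute force on small inputs. This sketch compares the fixed-permutation brute-force solution with the sort-based one on every pair of equal-length strings over the alphabet <code>"abc"</code> of length 1 to 4; the alphabet and the length bound are arbitrary choices that keep the run short.
</p>
<div class="org-src-container">
<pre class="src src-python">
from itertools import permutations, product

def breaks(A, B) -> bool:
    return all(a >= b for a, b in zip(A, B))

def brute_force(s1: str, s2: str) -> bool:
    # Fix s1 and try every permutation of s2 (the O(n! n) version)
    return any(breaks(s1, B) or breaks(B, s1) for B in permutations(s2))

def by_sorting(s1: str, s2: str) -> bool:
    A, B = sorted(s1), sorted(s2)
    return breaks(A, B) or breaks(B, A)

for n in range(1, 5):
    words = [''.join(w) for w in product("abc", repeat=n)]
    for s1 in words:
        for s2 in words:
            assert brute_force(s1, s2) == by_sorting(s1, s2), (s1, s2)
print("all pairs agree")
</pre>
</div>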
<p>
This function’s complexity is just \(\Theta(2n \log{n} + 2n) = \Theta(n \log{n})\).
</p>
<p>
Note that \(\log_2 100000 \approx 16.6 < 26\), so, ignoring constant factors, this solution should be faster than the previous one within the given limits.
</p>
<p>
The last solution leads us to another idea. How can we apply the same logic without sorting? Or better, how could we sort our strings in \(O(n)\)? Remember that we still have only 26 different characters, which allows us to use <a href="https://en.wikipedia.org/wiki/Counting_sort">counting sort</a>:
</p>
<div class="org-src-container">
<pre class="src src-python">from typing import List

def count(s: str) -> List[int]:
    # C[i] is the number of occurrences of the i-th lowercase letter in s
    C, a = [0] * 26, ord('a')
    for c in s:
        C[ord(c) - a] += 1
    return C
</pre>
</div>
<p>
This function returns a list of 26 integers, one for each letter of the English alphabet. For instance, for the word <code>"free"</code> it returns <code>[0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]</code>: there are two <code>e</code>’s in the string, one <code>f</code>, and one <code>r</code>.
</p>
<p>
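We also need a new version of the <code>breaks</code> function that works with these lists. String \(A\) breaks string \(B\) exactly when, for every letter, \(A\) has no more characters up to and including that letter than \(B\) does: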
</p>
<div class="org-src-container">
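<pre class="src src-python">from typing import List

def breaks(A: List[int], B: List[int]) -> bool:
    # A breaks B iff, for every letter, A has no more letters up to it than B,
    # i.e. the running sum of B - A over the alphabet never becomes negative
    s = 0
    for a, b in zip(A, B):
        s += b - a
        if s < 0:
            return False
    return True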
</pre>
</div>
<p>
And now the solution is basically the same as before:
</p>
<div class="org-src-container">
<pre class="src src-python">def check_if_can_break(s1: str, s2: str) -> bool:
    C1, C2 = count(s1), count(s2)
    return breaks(C1, C2) or breaks(C2, C1)
</pre>
</div>
<p>
We could also use <a href="https://docs.python.org/3/library/collections.html#collections.Counter"><code>collections.Counter</code></a> to count the letters of our strings:
</p>
<div class="org-src-container">
<pre class="src src-python">from collections import Counter
from string import ascii_lowercase

def breaks(A: Counter, B: Counter) -> bool:
    # Running sum of B - A over the alphabet must never become negative;
    # a Counter returns 0 for letters it has not seen
    s = 0
    for c in ascii_lowercase:
        s += B[c] - A[c]
        if s < 0:
            return False
    return True

def check_if_can_break(s1: str, s2: str) -> bool:
    C1, C2 = Counter(s1), Counter(s2)
    return breaks(C1, C2) or breaks(C2, C1)
</pre>
</div>
</div>
</div>
<div id="outline-container-org00e3f50" class="outline-3">
<h3 id="org00e3f50"><span class="todo DOING">DOING</span> Benchmarks</h3>
<div class="outline-text-3" id="text-org00e3f50">
<p>
Before we can benchmark our solutions, we need to generate test cases. We could scrape them from Leetcode (that is possible: you may have seen fast solutions on Leetcode that are basically an if-else tree looking up answers to predetermined inputs), but coming up with good test cases ourselves is a very important skill.
</p>
<p>
Generally, you need to check all corner cases, as well as empty and large inputs.
</p>
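<p>
As a starting point before reaching for a library, here is a minimal sketch of a test-case generator using only the standard <code>random</code> module. Brute force is hopeless for large \(n\), so the sort-based and counting-based solutions are checked against each other instead; the helper names and the particular corner cases are illustrative choices, not Leetcode’s test suite.
</p>
<div class="org-src-container">
<pre class="src src-python">
import random
from string import ascii_lowercase

def by_sorting(s1: str, s2: str) -> bool:
    A, B = sorted(s1), sorted(s2)
    return (all(a >= b for a, b in zip(A, B))
            or all(b >= a for a, b in zip(A, B)))

def by_counting(s1: str, s2: str) -> bool:
    C1, C2 = [0] * 26, [0] * 26
    for c in s1:
        C1[ord(c) - ord('a')] += 1
    for c in s2:
        C2[ord(c) - ord('a')] += 1

    def breaks(A, B) -> bool:
        s = 0
        for a, b in zip(A, B):
            s += b - a  # prefix count of B minus prefix count of A
            if s < 0:
                return False
        return True

    return breaks(C1, C2) or breaks(C2, C1)

def random_case(n: int, alphabet: str = ascii_lowercase) -> tuple:
    # Two random strings of the same length n
    return (''.join(random.choices(alphabet, k=n)),
            ''.join(random.choices(alphabet, k=n)))

corner_cases = [
    ("a", "a"),                  # smallest input, equal strings
    ("a", "z"),                  # single characters
    ("abc", "xya"),              # the example from this post
    ("z" * 10**5, "a" * 10**5),  # largest input, extreme letters
]
random_cases = [random_case(random.randint(1, 1000), "abc") for _ in range(200)]
random_cases += [random_case(10**5) for _ in range(3)]

for s1, s2 in corner_cases + random_cases:
    assert by_sorting(s1, s2) == by_counting(s1, s2), (s1[:20], s2[:20])
</pre>
</div>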
<p>
Use <a href="https://hypothesis.readthedocs.io/en/latest/">Hypothesis</a> to generate test cases?
</p>
<p>
Use <a href="https://seaborn.pydata.org/">seaborn</a> to plot benchmark results?
</p>
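<p>
For the timing itself, a sketch using the standard <code>timeit</code> module, with plotting left out. The solutions are re-declared so the snippet runs on its own; the input sizes and the repetition count are arbitrary choices, and the printed timings will depend on the machine.
</p>
<div class="org-src-container">
<pre class="src src-python">
import random
import timeit
from collections import Counter
from string import ascii_lowercase

def by_sorting(s1: str, s2: str) -> bool:
    A, B = sorted(s1), sorted(s2)
    return (all(a >= b for a, b in zip(A, B))
            or all(b >= a for a, b in zip(A, B)))

def by_counter(s1: str, s2: str) -> bool:
    C1, C2 = Counter(s1), Counter(s2)

    def breaks(A: Counter, B: Counter) -> bool:
        s = 0
        for c in ascii_lowercase:
            s += B[c] - A[c]
            if s < 0:
                return False
        return True

    return breaks(C1, C2) or breaks(C2, C1)

def benchmark(n: int, number: int = 10) -> dict:
    # Average seconds per call of each solution on one random input of size n
    s1 = ''.join(random.choices(ascii_lowercase, k=n))
    s2 = ''.join(random.choices(ascii_lowercase, k=n))
    return {f.__name__: timeit.timeit(lambda: f(s1, s2), number=number) / number
            for f in (by_sorting, by_counter)}

for n in (10**3, 10**4, 10**5):
    print(n, benchmark(n))
</pre>
</div>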
</div>
</div>
</div>
<div class="taglist"><a href="./tags.html">Tags</a>: <a href="./tag-leetcode.html">leetcode</a> <a href="./tag-python.html">python</a> <a href="./tag-permutations.html">permutations</a> </div>]]></description>
<category><![CDATA[leetcode]]></category>
<category><![CDATA[python]]></category>
<category><![CDATA[permutations]]></category>
<link>./leetcode-check-if-a-string-can-break-another-string.html</link>
<pubDate>Tue, 12 May 2020 22:28:00 +0300</pubDate>
</item>
</channel>
</rss>