Commit 1d6dcf8

divide and conquer with multiprocessing
1 parent 869ea2f commit 1d6dcf8

1 file changed

+284
-0
lines changed

ipython_nbs/essentials/divide-and-conquer-algorithm-intro.ipynb

Lines changed: 284 additions & 0 deletions
@@ -192,6 +192,290 @@
192192
" print(binary_search(lst=lst, item=k))"
193193
]
194194
},
195+
{
196+
"cell_type": "markdown",
197+
"metadata": {},
198+
"source": [
199+
"## Example 2 -- Finding the Majority Element"
200+
]
201+
},
202+
{
203+
"cell_type": "markdown",
204+
"metadata": {},
205+
"source": [
206+
"\"Finding the Majority Element\" is a problem where we want to find an element in an array of positive integers with length *n* that occurs more than *n/2* times in that array. For example, if we have an array $a = [1, 2, 3, 3, 3]$, $3$ would be the majority element. In another array, $b = [1, 2, 3, 3]$, there exists no majority element, since the most frequent element, $3$, occurs only $2$ times, and $2$ is not greater than $n / 2 = 2$.\n",
207+
"\n",
208+
"Let's start with a simple implementation where we count how often each unique element occurs in the array. Then, we return the element that meets the criterion \"$\\text{occurrences } > n / 2$\"; if no such element exists, we return -1 in its place. Note that the function returns a tuple of three items, (element, number_occurrences, count_dictionary), which we will use later ..."
209+
]
210+
},
211+
{
212+
"cell_type": "code",
213+
"execution_count": 7,
214+
"metadata": {
215+
"collapsed": false
216+
},
217+
"outputs": [
218+
{
219+
"name": "stdout",
220+
"output_type": "stream",
221+
"text": [
222+
"[] -> -1\n",
223+
"[1, 2, 3, 4, 4, 5] -> -1\n",
224+
"[1, 2, 4, 4, 4, 5] -> -1\n",
225+
"[4, 2, 4, 4, 4, 5] -> 4\n",
226+
"[5, 4, 4, 4, 2, 4] -> 4\n",
227+
"[2, 3, 9, 2, 2] -> 2\n",
228+
"[2, 2, 9, 3, 2] -> 2\n",
229+
"[0, 0, 2, 2, 2] -> 2\n",
230+
"[2, 2, 2, 0, 0] -> 2\n"
231+
]
232+
}
233+
],
234+
"source": [
235+
"def majority_ele_lin(lst):\n",
236+
"    cnt = {}\n",
237+
"    for ele in lst:\n",
238+
"        if ele not in cnt:\n",
239+
"            cnt[ele] = 1\n",
240+
"        else:\n",
241+
"            cnt[ele] += 1\n",
242+
"    for ele, c in cnt.items():\n",
243+
"        if c > (len(lst) // 2):\n",
244+
"            return (ele, c, cnt)\n",
245+
"    return (-1, -1, cnt)\n",
246+
"\n",
247+
"###################################################\n",
248+
"\n",
249+
"lst0 = []\n",
250+
"print(lst0, '->', majority_ele_lin(lst=lst0)[0])\n",
251+
"\n",
252+
"lst1 = [1, 2, 3, 4, 4, 5]\n",
253+
"print(lst1, '->', majority_ele_lin(lst=lst1)[0])\n",
254+
"\n",
255+
"lst2 = [1, 2, 4, 4, 4, 5]\n",
256+
"print(lst2, '->', majority_ele_lin(lst=lst2)[0])\n",
257+
"\n",
258+
"lst3 = [4, 2, 4, 4, 4, 5]\n",
259+
"print(lst3, '->', majority_ele_lin(lst=lst3)[0])\n",
260+
"print(lst3[::-1], '->', majority_ele_lin(lst=lst3[::-1])[0])\n",
261+
"\n",
262+
"lst4 = [2, 3, 9, 2, 2]\n",
263+
"print(lst4, '->',majority_ele_lin(lst=lst4)[0])\n",
264+
"print(lst4[::-1], '->', majority_ele_lin(lst=lst4[::-1])[0])\n",
265+
"\n",
266+
"lst5 = [0, 0, 2, 2, 2]\n",
267+
"print(lst5, '->',majority_ele_lin(lst=lst5)[0])\n",
268+
"print(lst5[::-1], '->', majority_ele_lin(lst=lst5[::-1])[0])"
269+
]
270+
},
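As a point of comparison for the counting approach above, the following is a minimal sketch that uses `collections.Counter` from the standard library. The function name `majority_ele_counter` is a hypothetical helper, not part of the notebook, and unlike `majority_ele_lin` it returns only the element (or -1), not the count dictionary.

```python
from collections import Counter


def majority_ele_counter(lst):
    """Return the majority element of lst, or -1 if there is none."""
    if not lst:
        return -1
    # most_common(1) yields the single (element, count) pair with the highest count
    ele, c = Counter(lst).most_common(1)[0]
    return ele if c > len(lst) // 2 else -1


if __name__ == "__main__":
    for lst in ([], [1, 2, 4, 4, 4, 5], [4, 2, 4, 4, 4, 5], [0, 0, 2, 2, 2]):
        print(lst, '->', majority_ele_counter(lst))
```

Only the most frequent element can satisfy the "more than *n/2*" criterion, so checking that one candidate is sufficient.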
271+
{
272+
"cell_type": "markdown",
273+
"metadata": {},
274+
"source": [
275+
"Now, \"finding the majority element\" is a nice task for a Divide and Conquer algorithm. Here, we use the fact that if a list has a majority element, that element is also the majority element of at least one of the two halves we get when we split the list in the middle.\n",
276+
"\n",
277+
"More concretely, what we do is:\n",
278+
"\n",
279+
"1. Split the array into 2 halves\n",
280+
"2. Run the majority element search on each of the two halves\n",
281+
"3. Combine the 2 subresults\n",
282+
"    1. Neither of the 2 sub-arrays has a majority element; thus, the combined list can't have a majority element, and we return -1.\n",
283+
"    2. The right sub-array has a majority element, whereas the left sub-array doesn't. Now, we need to take the count of this \"right\" majority element, add the number of times it occurs in the left sub-array, and check if the combined count satisfies the \"$\\text{occurrences} > \\frac{n}{2}$\" criterion.\n",
284+
"    3. Same as above but with \"left\" and \"right\" sub-array swapped in the comparison.\n",
285+
"    4. Both sub-arrays have a majority element. Compute the combined count of each of the elements as before and check whether one of these elements satisfies the \"$\\text{occurrences} > \\frac{n}{2}$\" criterion."
286+
]
287+
},
288+
{
289+
"cell_type": "code",
290+
"execution_count": 8,
291+
"metadata": {
292+
"collapsed": false
293+
},
294+
"outputs": [
295+
{
296+
"name": "stdout",
297+
"output_type": "stream",
298+
"text": [
299+
"[] -> -1\n",
300+
"[1, 2, 3, 4, 4, 5] -> -1\n",
301+
"[1, 2, 4, 4, 4, 5] -> -1\n",
302+
"[4, 2, 4, 4, 4, 5] -> 4\n",
303+
"[5, 4, 4, 4, 2, 4] -> 4\n",
304+
"[2, 3, 9, 2, 2] -> 2\n",
305+
"[2, 2, 9, 3, 2] -> 2\n",
306+
"[0, 0, 2, 2, 2] -> 2\n",
307+
"[2, 2, 2, 0, 0] -> 2\n"
308+
]
309+
}
310+
],
311+
"source": [
312+
"def majority_ele_dac(lst):\n",
313+
"\n",
314+
"    n = len(lst)\n",
315+
"    left = lst[:n // 2]\n",
316+
"    right = lst[n // 2:]\n",
317+
"\n",
318+
"    l_maj = majority_ele_lin(left)\n",
319+
"    r_maj = majority_ele_lin(right)\n",
320+
"\n",
321+
"    # case 3A: neither half has a majority element\n",
322+
"    if l_maj[0] == -1 and r_maj[0] == -1:\n",
323+
"        return -1\n",
324+
"\n",
325+
"    # case 3B: only the right half has a majority element\n",
326+
"    elif l_maj[0] == -1 and r_maj[0] > -1:\n",
327+
"        cnt = r_maj[1]\n",
328+
"        if r_maj[0] in l_maj[2]:\n",
329+
"            cnt += l_maj[2][r_maj[0]]\n",
330+
"        if cnt > n // 2:\n",
331+
"            return r_maj[0]\n",
332+
"\n",
333+
"    # case 3C: only the left half has a majority element\n",
334+
"    elif r_maj[0] == -1 and l_maj[0] > -1:\n",
335+
"        cnt = l_maj[1]\n",
336+
"        if l_maj[0] in r_maj[2]:\n",
337+
"            cnt += r_maj[2][l_maj[0]]\n",
338+
"        if cnt > n // 2:\n",
339+
"            return l_maj[0]\n",
340+
"\n",
341+
"    # case 3D: both halves have a majority element\n",
342+
"    else:\n",
343+
"        c1, c2 = l_maj[1], r_maj[1]\n",
344+
"        if l_maj[0] in r_maj[2]:\n",
345+
"            c1 = l_maj[1] + r_maj[2][l_maj[0]]\n",
346+
"        if r_maj[0] in l_maj[2]:\n",
347+
"            c2 = r_maj[1] + l_maj[2][r_maj[0]]\n",
348+
"        m, maj = max((c1, l_maj[0]), (c2, r_maj[0]))  # best combined count with its element\n",
349+
"        if m > n // 2:\n",
350+
"            return maj  # return the element, not its count\n",
351+
"    return -1\n",
352+
"\n",
353+
"###################################################\n",
354+
"\n",
355+
"lst0 = []\n",
356+
"print(lst0, '->', majority_ele_dac(lst=lst0))\n",
357+
"\n",
358+
"lst1 = [1, 2, 3, 4, 4, 5]\n",
359+
"print(lst1, '->', majority_ele_dac(lst=lst1))\n",
360+
"\n",
361+
"lst2 = [1, 2, 4, 4, 4, 5]\n",
362+
"print(lst2, '->', majority_ele_dac(lst=lst2))\n",
363+
"\n",
364+
"lst3 = [4, 2, 4, 4, 4, 5]\n",
365+
"print(lst3, '->', majority_ele_dac(lst=lst3))\n",
366+
"print(lst3[::-1], '->', majority_ele_dac(lst=lst3[::-1]))\n",
367+
"\n",
368+
"lst4 = [2, 3, 9, 2, 2]\n",
369+
"print(lst4, '->',majority_ele_dac(lst=lst4))\n",
370+
"print(lst4[::-1], '->', majority_ele_dac(lst=lst4[::-1]))\n",
371+
"\n",
372+
"lst5 = [0, 0, 2, 2, 2]\n",
373+
"print(lst5, '->',majority_ele_dac(lst=lst5))\n",
374+
"print(lst5[::-1], '->', majority_ele_dac(lst=lst5[::-1]))"
375+
]
376+
},
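The function above splits the list only once and runs the linear search on each half. For comparison, a fully recursive variant of the same idea could look like the sketch below. The name `majority_ele_rec` is a hypothetical helper, not part of the notebook, and the sketch assumes that -1 never occurs as a list element, so it can serve as the "no majority" marker.

```python
def majority_ele_rec(lst):
    """Recursive divide and conquer: return the majority element or -1."""
    n = len(lst)
    if n == 0:
        return -1
    if n == 1:
        # A single element is trivially the majority of its own list
        return lst[0]

    # Divide: solve the problem on both halves
    left = majority_ele_rec(lst[:n // 2])
    right = majority_ele_rec(lst[n // 2:])

    # Conquer: only the two sub-results can be the majority of the whole list,
    # so we count each candidate once over the full list
    for cand in {left, right}:
        if cand != -1 and lst.count(cand) > n // 2:
            return cand
    return -1


if __name__ == "__main__":
    for lst in ([], [1, 2, 4, 4, 4, 5], [4, 2, 4, 4, 4, 5], [0, 0, 2, 2, 2]):
        print(lst, '->', majority_ele_rec(lst))
```

Each recursion level does a linear amount of counting work and there are about $\log_2 n$ levels, so this sketch runs in $O(n \log n)$ time, compared to the single linear pass of the dictionary-based count.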
377+
{
378+
"cell_type": "markdown",
379+
"metadata": {},
380+
"source": [
381+
"#### Adding multiprocessing"
382+
]
383+
},
384+
{
385+
"cell_type": "markdown",
386+
"metadata": {},
387+
"source": [
388+
"Our Divide and Conquer approach above is actually a good candidate for multiprocessing, since we can parallelize the majority element search in the two sublists. So, let's make a simple modification and use Python's `multiprocessing` module for that. Here, we use the `apply_async` method from the `Pool` class, which submits a task without blocking and immediately returns an `AsyncResult` object (in contrast to the `apply` method, which blocks until the result is ready). The two tasks may finish in either order, but since we call `get()` on the result objects in the order in which we submitted them, `l_maj` always belongs to the left sublist and `r_maj` to the right sublist in the variable assignment `l_maj, r_maj = [p.get() for p in results]`. Note that the function expects a `Pool` instance named `pool` to exist before it is called."
389+
]
390+
},
391+
{
392+
"cell_type": "code",
393+
"execution_count": 9,
394+
"metadata": {
395+
"collapsed": false
396+
},
397+
"outputs": [
398+
{
399+
"name": "stdout",
400+
"output_type": "stream",
401+
"text": [
402+
"[] -> -1\n",
403+
"[1, 2, 3, 4, 4, 5] -> -1\n",
404+
"[1, 2, 4, 4, 4, 5] -> -1\n",
405+
"[4, 2, 4, 4, 4, 5] -> 4\n",
406+
"[5, 4, 4, 4, 2, 4] -> 4\n",
407+
"[2, 3, 9, 2, 2] -> 2\n",
408+
"[2, 2, 9, 3, 2] -> 2\n",
409+
"[0, 0, 2, 2, 2] -> 2\n",
410+
"[2, 2, 2, 0, 0] -> 2\n"
411+
]
412+
}
413+
],
414+
"source": [
415+
"import multiprocessing as mp\n",
416+
"\n",
417+
"def majority_ele_dac_mp(lst):\n",
418+
"\n",
419+
"    n = len(lst)\n",
420+
"    left = lst[:n // 2]\n",
421+
"    right = lst[n // 2:]\n",
422+
"\n",
423+
"    results = [pool.apply_async(majority_ele_lin, args=(x,))\n",
424+
"               for x in (left, right)]  # a list, so both tasks are submitted before get()\n",
425+
"    l_maj, r_maj = [p.get() for p in results]\n",
426+
"\n",
427+
"    if l_maj[0] == -1 and r_maj[0] == -1:\n",
428+
"        return -1\n",
429+
"\n",
430+
"    elif l_maj[0] == -1 and r_maj[0] > -1:\n",
431+
"        cnt = r_maj[1]\n",
432+
"        if r_maj[0] in l_maj[2]:\n",
433+
"            cnt += l_maj[2][r_maj[0]]\n",
434+
"        if cnt > n // 2:\n",
435+
"            return r_maj[0]\n",
436+
"\n",
437+
"    elif r_maj[0] == -1 and l_maj[0] > -1:\n",
438+
"        cnt = l_maj[1]\n",
439+
"        if l_maj[0] in r_maj[2]:\n",
440+
"            cnt += r_maj[2][l_maj[0]]\n",
441+
"        if cnt > n // 2:\n",
442+
"            return l_maj[0]\n",
443+
"\n",
444+
"    else:\n",
445+
"        c1, c2 = l_maj[1], r_maj[1]\n",
446+
"        if l_maj[0] in r_maj[2]:\n",
447+
"            c1 = l_maj[1] + r_maj[2][l_maj[0]]\n",
448+
"        if r_maj[0] in l_maj[2]:\n",
449+
"            c2 = r_maj[1] + l_maj[2][r_maj[0]]\n",
450+
"        m, maj = max((c1, l_maj[0]), (c2, r_maj[0]))  # best combined count with its element\n",
451+
"        if m > n // 2:\n",
452+
"            return maj  # return the element, not its count\n",
453+
"    return -1\n",
454+
"\n",
455+
"###################################################\n",
456+
"pool = mp.Pool(processes=2)  # worker pool that majority_ele_dac_mp looks up by name\n",
457+
"lst0 = []\n",
458+
"print(lst0, '->', majority_ele_dac_mp(lst=lst0))\n",
459+
"\n",
460+
"lst1 = [1, 2, 3, 4, 4, 5]\n",
461+
"print(lst1, '->', majority_ele_dac_mp(lst=lst1))\n",
462+
"\n",
463+
"lst2 = [1, 2, 4, 4, 4, 5]\n",
464+
"print(lst2, '->', majority_ele_dac_mp(lst=lst2))\n",
465+
"\n",
466+
"lst3 = [4, 2, 4, 4, 4, 5]\n",
467+
"print(lst3, '->', majority_ele_dac_mp(lst=lst3))\n",
468+
"print(lst3[::-1], '->', majority_ele_dac_mp(lst=lst3[::-1]))\n",
469+
"\n",
470+
"lst4 = [2, 3, 9, 2, 2]\n",
471+
"print(lst4, '->', majority_ele_dac_mp(lst=lst4))\n",
472+
"print(lst4[::-1], '->', majority_ele_dac_mp(lst=lst4[::-1]))\n",
473+
"\n",
474+
"lst5 = [0, 0, 2, 2, 2]\n",
475+
"print(lst5, '->', majority_ele_dac_mp(lst=lst5))\n",
476+
"print(lst5[::-1], '->', majority_ele_dac_mp(lst=lst5[::-1]))"
477+
]
478+
},
195479
{
196480
"cell_type": "markdown",
197481
"metadata": {},
