192 | 192 | " print(binary_search(lst=lst, item=k))"
193 | 193 | ]
194 | 194 | },
| 195 | + { |
| 196 | + "cell_type": "markdown", |
| 197 | + "metadata": {}, |
| 198 | + "source": [ |
| 199 | + "## Example 2 -- Finding the Majority Element" |
| 200 | + ] |
| 201 | + }, |
| 202 | + { |
| 203 | + "cell_type": "markdown", |
| 204 | + "metadata": {}, |
| 205 | + "source": [ |
| 206 | + "\"Finding the Majority Element\" is a problem where we want to find an element in an array of non-negative integers with length *n* that occurs more than *n/2* times in that array. For example, in the array $a = [1, 2, 3, 3, 3]$, $3$ is the majority element, since it occurs $3 > 5/2$ times. In the array $b = [1, 2, 3, 3]$, on the other hand, there is no majority element: the most frequent element, $3$, occurs only $2$ times, which is not greater than $n / 2 = 2$.\n", |
| 207 | + "\n", |
| 208 | + "Let's start with a simple implementation where we count how often each unique element occurs in the array. Then, we return the element that meets the criterion \"$\\text{occurrences} > n / 2$\", and if no such element exists, we return -1 (which is why the array must not contain negative numbers). Note that we return a tuple of three items, (element, number_of_occurrences, count_dictionary), which we will need later for the Divide and Conquer approach." |
| 209 | + ] |
| 210 | + }, |
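To make the definition concrete, here is a minimal sketch of a hypothetical helper `is_majority` (not part of the original notebook) that checks the criterion for a single candidate on the arrays $a$ and $b$ from the paragraph above. Its comment also spells out why the integer comparison `c > (len(lst) // 2)` used in the notebook code is equivalent to $c > n/2$ for integer counts.

```python
def is_majority(x, lst):
    """Return True if x occurs more than len(lst) / 2 times in lst."""
    n = len(lst)
    count = lst.count(x)
    # For an integer count, count > n // 2 is equivalent to count > n / 2:
    # for even n both bounds are equal, and for odd n, n / 2 == n // 2 + 0.5,
    # which an integer exceeds exactly when it exceeds n // 2.
    return count > n // 2


a = [1, 2, 3, 3, 3]
b = [1, 2, 3, 3]
print(is_majority(3, a))  # True: 3 occurs 3 times, and 3 > 5 / 2
print(is_majority(3, b))  # False: 3 occurs 2 times, and 2 is not > 4 / 2
```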
| 211 | + { |
| 212 | + "cell_type": "code", |
| 213 | + "execution_count": 7, |
| 214 | + "metadata": { |
| 215 | + "collapsed": false |
| 216 | + }, |
| 217 | + "outputs": [ |
| 218 | + { |
| 219 | + "name": "stdout", |
| 220 | + "output_type": "stream", |
| 221 | + "text": [ |
| 222 | + "[] -> -1\n", |
| 223 | + "[1, 2, 3, 4, 4, 5] -> -1\n", |
| 224 | + "[1, 2, 4, 4, 4, 5] -> -1\n", |
| 225 | + "[4, 2, 4, 4, 4, 5] -> 4\n", |
| 226 | + "[5, 4, 4, 4, 2, 4] -> 4\n", |
| 227 | + "[2, 3, 9, 2, 2] -> 2\n", |
| 228 | + "[2, 2, 9, 3, 2] -> 2\n", |
| 229 | + "[0, 0, 2, 2, 2] -> 2\n", |
| 230 | + "[2, 2, 2, 0, 0] -> 2\n" |
| 231 | + ] |
| 232 | + } |
| 233 | + ], |
| 234 | + "source": [ |
| 235 | + "def majority_ele_lin(lst):\n", |
| 236 | + "    cnt = {}  # maps each element to its number of occurrences\n", |
| 237 | + "    for ele in lst:\n", |
| 238 | + "        if ele not in cnt:\n", |
| 239 | + "            cnt[ele] = 1\n", |
| 240 | + "        else:\n", |
| 241 | + "            cnt[ele] += 1\n", |
| 242 | + "    for ele, c in cnt.items():\n", |
| 243 | + "        if c > (len(lst) // 2):  # more than n/2 occurrences\n", |
| 244 | + "            return (ele, c, cnt)\n", |
| 245 | + "    return (-1, -1, cnt)  # no majority element found\n", |
| 246 | + "\n", |
| 247 | + "###################################################\n", |
| 248 | + "\n", |
| 249 | + "lst0 = []\n", |
| 250 | + "print(lst0, '->', majority_ele_lin(lst=lst0)[0])\n", |
| 251 | + "\n", |
| 252 | + "lst1 = [1, 2, 3, 4, 4, 5]\n", |
| 253 | + "print(lst1, '->', majority_ele_lin(lst=lst1)[0])\n", |
| 254 | + "\n", |
| 255 | + "lst2 = [1, 2, 4, 4, 4, 5]\n", |
| 256 | + "print(lst2, '->', majority_ele_lin(lst=lst2)[0])\n", |
| 257 | + "\n", |
| 258 | + "lst3 = [4, 2, 4, 4, 4, 5]\n", |
| 259 | + "print(lst3, '->', majority_ele_lin(lst=lst3)[0])\n", |
| 260 | + "print(lst3[::-1], '->', majority_ele_lin(lst=lst3[::-1])[0])\n", |
| 261 | + "\n", |
| 262 | + "lst4 = [2, 3, 9, 2, 2]\n", |
| 263 | + "print(lst4, '->', majority_ele_lin(lst=lst4)[0])\n", |
| 264 | + "print(lst4[::-1], '->', majority_ele_lin(lst=lst4[::-1])[0])\n", |
| 265 | + "\n", |
| 266 | + "lst5 = [0, 0, 2, 2, 2]\n", |
| 267 | + "print(lst5, '->', majority_ele_lin(lst=lst5)[0])\n", |
| 268 | + "print(lst5[::-1], '->', majority_ele_lin(lst=lst5[::-1])[0])" |
| 269 | + ] |
| 270 | + }, |
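The counting loop in `majority_ele_lin` can also be written with `collections.Counter` from the standard library. The sketch below uses the hypothetical name `majority_ele_counter` and is an alternative idiom, not the notebook's implementation; it keeps the same return convention `(element, count, count_dictionary)` and the same -1 sentinel, and, like the original, runs in O(n) time.

```python
from collections import Counter


def majority_ele_counter(lst):
    """Return (element, count, counts) like majority_ele_lin, or (-1, -1, counts)."""
    counts = Counter(lst)  # maps each element to its number of occurrences
    if counts:
        # most_common(1) returns [(element, count)] for the most frequent element
        ele, c = counts.most_common(1)[0]
        if c > len(lst) // 2:
            return (ele, c, counts)
    return (-1, -1, counts)


for lst in ([], [1, 2, 3, 4, 4, 5], [4, 2, 4, 4, 4, 5], [0, 0, 2, 2, 2]):
    print(lst, '->', majority_ele_counter(lst)[0])
```

Because `Counter` is a `dict` subclass, the membership tests and lookups used in the combine step of the Divide and Conquer code (e.g. `r_maj[0] in l_maj[2]` and `l_maj[2][r_maj[0]]`) would work on its result as well.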
| 271 | + { |
| 272 | + "cell_type": "markdown", |
| 273 | + "metadata": {}, |
| 274 | + "source": [ |
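| 275 | + "Now, \"finding the majority element\" is a nice task for a Divide and Conquer algorithm. Here, we use the fact that if a list has a majority element, it must also be the majority element of at least one of its two halves when we split it in the middle: if an element were a majority in neither half, it would occur at most $|L|/2 + |R|/2 = n/2$ times in total, where $|L|$ and $|R|$ are the lengths of the two halves.\n", |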
| 276 | + "\n", |
| 277 | + "More concretely, what we do is:\n", |
| 278 | + "\n", |
| 279 | + "1. Split the array into 2 halves\n", |
| 280 | + "2. Run the majority element search (here, the linear `majority_ele_lin`) on each of the two halves\n", |
| 281 | + "3. Combine the 2 subresults\n", |
| 282 | + "    1. Neither of the 2 sub-arrays has a majority element; thus, the combined list can't have one either, and we return -1.\n", |
| 283 | + "    2. The right sub-array has a majority element, whereas the left sub-array doesn't. Now, we need to take the count of this \"right\" majority element, add the number of times it occurs in the left sub-array, and check whether the combined count satisfies the \"$\\text{occurrences} > \\frac{n}{2}$\" criterion.\n", |
| 284 | + "    3. Same as above, but with the \"left\" and \"right\" sub-arrays swapped.\n", |
| 285 | + "    4. Both sub-arrays have a majority element. Compute the combined count of each of the two elements as before and check whether one of them satisfies the \"$\\text{occurrences} > \\frac{n}{2}$\" criterion." |
| 286 | + ] |
| 287 | + }, |
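The halving argument in the paragraph above can also be checked empirically. The sketch below uses hypothetical helpers `majority` and `check_split_property` (not from the notebook) that count directly with `list.count`, independently of the notebook's functions, and assert the property on many random lists; it is a sanity check, not a proof. Note that the converse does not hold: `[1, 1, 2, 2]` has no majority element even though its left half `[1, 1]` does, which is why the combine step has to recount.

```python
import random


def majority(lst):
    """Return the element occurring more than len(lst) / 2 times, or None."""
    for x in set(lst):
        if lst.count(x) > len(lst) // 2:
            return x
    return None


def check_split_property(trials=10_000, seed=0):
    rng = random.Random(seed)  # fixed seed so the check is reproducible
    for _ in range(trials):
        lst = [rng.randint(0, 3) for _ in range(rng.randint(1, 12))]
        x = majority(lst)
        if x is None:
            continue
        left, right = lst[:len(lst) // 2], lst[len(lst) // 2:]
        # a majority element of lst must be the majority of at least one half
        assert majority(left) == x or majority(right) == x, lst
    return True


print(check_split_property())  # True
```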
| 288 | + { |
| 289 | + "cell_type": "code", |
| 290 | + "execution_count": 8, |
| 291 | + "metadata": { |
| 292 | + "collapsed": false |
| 293 | + }, |
| 294 | + "outputs": [ |
| 295 | + { |
| 296 | + "name": "stdout", |
| 297 | + "output_type": "stream", |
| 298 | + "text": [ |
| 299 | + "[] -> -1\n", |
| 300 | + "[1, 2, 3, 4, 4, 5] -> -1\n", |
| 301 | + "[1, 2, 4, 4, 4, 5] -> -1\n", |
| 302 | + "[4, 2, 4, 4, 4, 5] -> 4\n", |
| 303 | + "[5, 4, 4, 4, 2, 4] -> 4\n", |
| 304 | + "[2, 3, 9, 2, 2] -> 2\n", |
| 305 | + "[2, 2, 9, 3, 2] -> 2\n", |
| 306 | + "[0, 0, 2, 2, 2] -> 2\n", |
| 307 | + "[2, 2, 2, 0, 0] -> 2\n" |
| 308 | + ] |
| 309 | + } |
| 310 | + ], |
| 311 | + "source": [ |
| 312 | + "def majority_ele_dac(lst):\n", |
| 313 | + "    # split the list once and search each half with the linear method\n", |
| 314 | + "    n = len(lst)\n", |
| 315 | + "    left = lst[:n // 2]\n", |
| 316 | + "    right = lst[n // 2:]\n", |
| 317 | + "    \n", |
| 318 | + "    l_maj = majority_ele_lin(left)\n", |
| 319 | + "    r_maj = majority_ele_lin(right)\n", |
| 320 | + "    \n", |
| 321 | + "    # case 3A: neither half has a majority element\n", |
| 322 | + "    if l_maj[0] == -1 and r_maj[0] == -1:\n", |
| 323 | + "        return -1\n", |
| 324 | + "    \n", |
| 325 | + "    # case 3B: only the right half has a majority element\n", |
| 326 | + "    elif l_maj[0] == -1 and r_maj[0] > -1:\n", |
| 327 | + "        cnt = r_maj[1]\n", |
| 328 | + "        if r_maj[0] in l_maj[2]:\n", |
| 329 | + "            cnt += l_maj[2][r_maj[0]]\n", |
| 330 | + "        if cnt > n // 2:\n", |
| 331 | + "            return r_maj[0]\n", |
| 332 | + "    \n", |
| 333 | + "    # case 3C: only the left half has a majority element\n", |
| 334 | + "    elif r_maj[0] == -1 and l_maj[0] > -1:\n", |
| 335 | + "        cnt = l_maj[1]\n", |
| 336 | + "        if l_maj[0] in r_maj[2]:\n", |
| 337 | + "            cnt += r_maj[2][l_maj[0]]\n", |
| 338 | + "        if cnt > n // 2:\n", |
| 339 | + "            return l_maj[0]\n", |
| 340 | + "    \n", |
| 341 | + "    # case 3D: both halves have a majority element\n", |
| 342 | + "    else:\n", |
| 343 | + "        c1, c2 = l_maj[1], r_maj[1]\n", |
| 344 | + "        if l_maj[0] in r_maj[2]:\n", |
| 345 | + "            c1 = l_maj[1] + r_maj[2][l_maj[0]]\n", |
| 346 | + "        if r_maj[0] in l_maj[2]:\n", |
| 347 | + "            c2 = r_maj[1] + l_maj[2][r_maj[0]]\n", |
| 348 | + "        ele, m = (l_maj[0], c1) if c1 >= c2 else (r_maj[0], c2)\n", |
| 349 | + "        if m > n // 2:\n", |
| 350 | + "            return ele  # return the element, not its count\n", |
| 351 | + "    return -1\n", |
| 352 | + "\n", |
| 353 | + "###################################################\n", |
| 354 | + "\n", |
| 355 | + "lst0 = []\n", |
| 356 | + "print(lst0, '->', majority_ele_dac(lst=lst0))\n", |
| 357 | + "\n", |
| 358 | + "lst1 = [1, 2, 3, 4, 4, 5]\n", |
| 359 | + "print(lst1, '->', majority_ele_dac(lst=lst1))\n", |
| 360 | + "\n", |
| 361 | + "lst2 = [1, 2, 4, 4, 4, 5]\n", |
| 362 | + "print(lst2, '->', majority_ele_dac(lst=lst2))\n", |
| 363 | + "\n", |
| 364 | + "lst3 = [4, 2, 4, 4, 4, 5]\n", |
| 365 | + "print(lst3, '->', majority_ele_dac(lst=lst3))\n", |
| 366 | + "print(lst3[::-1], '->', majority_ele_dac(lst=lst3[::-1]))\n", |
| 367 | + "\n", |
| 368 | + "lst4 = [2, 3, 9, 2, 2]\n", |
| 369 | + "print(lst4, '->', majority_ele_dac(lst=lst4))\n", |
| 370 | + "print(lst4[::-1], '->', majority_ele_dac(lst=lst4[::-1]))\n", |
| 371 | + "\n", |
| 372 | + "lst5 = [0, 0, 2, 2, 2]\n", |
| 373 | + "print(lst5, '->', majority_ele_dac(lst=lst5))\n", |
| 374 | + "print(lst5[::-1], '->', majority_ele_dac(lst=lst5[::-1]))" |
| 375 | + ] |
| 376 | + }, |
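Note that `majority_ele_dac` divides the list only once and then hands each half to the linear counting search. A fully recursive variant, a common textbook formulation rather than the notebook's own code, keeps splitting until it reaches single elements and, when combining, recounts each candidate in the current range. Below is a minimal sketch under the hypothetical name `majority_ele_rec`; it uses the same -1 sentinel (so it also assumes non-negative elements). Every recursion level does O(n) counting work over O(log n) levels, so it runs in O(n log n) time.

```python
def majority_ele_rec(lst, lo=0, hi=None):
    """Recursive divide-and-conquer majority search on lst[lo:hi]; -1 if none."""
    if hi is None:
        hi = len(lst)
    if hi - lo == 0:
        return -1
    if hi - lo == 1:
        return lst[lo]  # a single element is its own majority
    mid = (lo + hi) // 2
    l_maj = majority_ele_rec(lst, lo, mid)
    r_maj = majority_ele_rec(lst, mid, hi)
    if l_maj == r_maj:
        # same result in both halves (a majority in each, or -1 in each)
        return l_maj
    # The halves disagree: recount each candidate in the current range.
    half = (hi - lo) // 2
    for cand in (l_maj, r_maj):
        if cand != -1 and lst[lo:hi].count(cand) > half:
            return cand
    return -1


for lst in ([], [1, 2, 4, 4, 4, 5], [2, 2, 9, 3, 2], [0, 0, 2, 2, 2]):
    print(lst, '->', majority_ele_rec(lst))
```

Compared with `majority_ele_lin`, this version needs no dictionary, but it is asymptotically slower than the O(n) counting approach.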
| 377 | + { |
| 378 | + "cell_type": "markdown", |
| 379 | + "metadata": {}, |
| 380 | + "source": [ |
| 381 | + "#### Adding multiprocessing" |
| 382 | + ] |
| 383 | + }, |
| 384 | + { |
| 385 | + "cell_type": "markdown", |
| 386 | + "metadata": {}, |
| 387 | + "source": [ |
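| 388 | + "Our Divide and Conquer approach above is a good candidate for multiprocessing, since the majority element searches in the two sub-lists are independent of each other and can run in parallel. So, let's make a simple modification and use Python's `multiprocessing` module for that. Here, we use the `apply_async` method of the `Pool` class, which, in contrast to the `apply` method, does not block until the result is ready but immediately returns an `AsyncResult` object. For the two searches to actually overlap, both tasks have to be submitted before the first `.get()` call. Since `.get()` is called in the same order in which the tasks were submitted, `l_maj` and `r_maj` in `l_maj, r_maj = [p.get() for p in results]` refer to the left and right sub-lists, respectively (although, for our implementation, swapping them wouldn't make a difference). Note that on platforms that use the `spawn` start method (Windows, and macOS since Python 3.8), the worker processes may not be able to find functions defined interactively in a notebook; in that case, `majority_ele_lin` needs to be moved into a separate module and imported." |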
| 389 | + ] |
| 390 | + }, |
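To illustrate that reading `.get()` in submission order keeps results paired with their inputs, here is a small sketch using `multiprocessing.pool.ThreadPool`, which offers the same `apply_async`/`get` interface as `Pool` but runs threads, so nothing needs to be pickled and it runs inside a notebook on any platform. The function `slow_identity` and its delays are hypothetical stand-ins chosen so that the first task finishes last.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_identity(x, delay):
    time.sleep(delay)  # simulate work of different durations
    return x


with ThreadPool(processes=2) as pool:
    # Submit both tasks first; apply_async returns an AsyncResult immediately.
    results = [pool.apply_async(slow_identity, args=(name, delay))
               for name, delay in (('left', 0.2), ('right', 0.0))]
    # get() blocks until each result is ready; we read them in submission order.
    first, second = [r.get() for r in results]

print(first, second)  # left right
```

Had the tasks been created in a generator expression instead of a list comprehension, each `apply_async` call would only run after the previous `.get()` had returned, so the two tasks would execute one after the other.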
| 391 | + { |
| 392 | + "cell_type": "code", |
| 393 | + "execution_count": 9, |
| 394 | + "metadata": { |
| 395 | + "collapsed": false |
| 396 | + }, |
| 397 | + "outputs": [ |
| 398 | + { |
| 399 | + "name": "stdout", |
| 400 | + "output_type": "stream", |
| 401 | + "text": [ |
| 402 | + "[] -> -1\n", |
| 403 | + "[1, 2, 3, 4, 4, 5] -> -1\n", |
| 404 | + "[1, 2, 4, 4, 4, 5] -> -1\n", |
| 405 | + "[4, 2, 4, 4, 4, 5] -> 4\n", |
| 406 | + "[5, 4, 4, 4, 2, 4] -> 4\n", |
| 407 | + "[2, 3, 9, 2, 2] -> 2\n", |
| 408 | + "[2, 2, 9, 3, 2] -> 2\n", |
| 409 | + "[0, 0, 2, 2, 2] -> 2\n", |
| 410 | + "[2, 2, 2, 0, 0] -> 2\n" |
| 411 | + ] |
| 412 | + } |
| 413 | + ], |
| 414 | + "source": [ |
| 415 | + "import multiprocessing as mp\n", |
| 416 | + "\n", |
| 417 | + "def majority_ele_dac_mp(lst):\n", |
| 418 | + "    \n", |
| 419 | + "    n = len(lst)\n", |
| 420 | + "    left = lst[:n // 2]\n", |
| 421 | + "    right = lst[n // 2:]\n", |
| 422 | + "    # submit both halves before calling get(), so that they are searched concurrently\n", |
| 423 | + "    with mp.Pool(processes=2) as pool:\n", |
| 424 | + "        results = [pool.apply_async(majority_ele_lin, args=(x,)) for x in (left, right)]\n", |
| 425 | + "        l_maj, r_maj = [p.get() for p in results]\n", |
| 426 | + "    \n", |
| 427 | + "    if l_maj[0] == -1 and r_maj[0] == -1:\n", |
| 428 | + "        return -1\n", |
| 429 | + "    \n", |
| 430 | + "    elif l_maj[0] == -1 and r_maj[0] > -1:\n", |
| 431 | + "        cnt = r_maj[1]\n", |
| 432 | + "        if r_maj[0] in l_maj[2]:\n", |
| 433 | + "            cnt += l_maj[2][r_maj[0]]\n", |
| 434 | + "        if cnt > n // 2:\n", |
| 435 | + "            return r_maj[0]\n", |
| 436 | + "    \n", |
| 437 | + "    elif r_maj[0] == -1 and l_maj[0] > -1:\n", |
| 438 | + "        cnt = l_maj[1]\n", |
| 439 | + "        if l_maj[0] in r_maj[2]:\n", |
| 440 | + "            cnt += r_maj[2][l_maj[0]]\n", |
| 441 | + "        if cnt > n // 2:\n", |
| 442 | + "            return l_maj[0]\n", |
| 443 | + "    \n", |
| 444 | + "    else:\n", |
| 445 | + "        c1, c2 = l_maj[1], r_maj[1]\n", |
| 446 | + "        if l_maj[0] in r_maj[2]:\n", |
| 447 | + "            c1 = l_maj[1] + r_maj[2][l_maj[0]]\n", |
| 448 | + "        if r_maj[0] in l_maj[2]:\n", |
| 449 | + "            c2 = r_maj[1] + l_maj[2][r_maj[0]]\n", |
| 450 | + "        ele, m = (l_maj[0], c1) if c1 >= c2 else (r_maj[0], c2)\n", |
| 451 | + "        if m > n // 2:\n", |
| 452 | + "            return ele  # return the element, not its count\n", |
| 453 | + "    return -1\n", |
| 454 | + "\n", |
| 455 | + "###################################################\n", |
| 456 | + "\n", |
| 457 | + "lst0 = []\n", |
| 458 | + "print(lst0, '->', majority_ele_dac_mp(lst=lst0))\n", |
| 459 | + "\n", |
| 460 | + "lst1 = [1, 2, 3, 4, 4, 5]\n", |
| 461 | + "print(lst1, '->', majority_ele_dac_mp(lst=lst1))\n", |
| 462 | + "\n", |
| 463 | + "lst2 = [1, 2, 4, 4, 4, 5]\n", |
| 464 | + "print(lst2, '->', majority_ele_dac_mp(lst=lst2))\n", |
| 465 | + "\n", |
| 466 | + "lst3 = [4, 2, 4, 4, 4, 5]\n", |
| 467 | + "print(lst3, '->', majority_ele_dac_mp(lst=lst3))\n", |
| 468 | + "print(lst3[::-1], '->', majority_ele_dac_mp(lst=lst3[::-1]))\n", |
| 469 | + "\n", |
| 470 | + "lst4 = [2, 3, 9, 2, 2]\n", |
| 471 | + "print(lst4, '->', majority_ele_dac_mp(lst=lst4))\n", |
| 472 | + "print(lst4[::-1], '->', majority_ele_dac_mp(lst=lst4[::-1]))\n", |
| 473 | + "\n", |
| 474 | + "lst5 = [0, 0, 2, 2, 2]\n", |
| 475 | + "print(lst5, '->', majority_ele_dac_mp(lst=lst5))\n", |
| 476 | + "print(lst5[::-1], '->', majority_ele_dac_mp(lst=lst5[::-1]))" |
| 477 | + ] |
| 478 | + }, |
195 | 479 | {
196 | 480 | "cell_type": "markdown",
197 | 481 | "metadata": {},