{ "cells": [ { "cell_type": "markdown", "id": "94d6cef4", "metadata": {}, "source": [ "# 3. Memory-Optimized Multiprocess Example \n", "\n", "\n", "This notebook demonstrates how to run a heavy model with a large data set efficiently.\n", "The demonstration uses the following techniques:\n", "\n", "* Multiprocessing using [ipyparallel]\n", "* Memory-optimized runs introduced in [modelx v0.19.0](https://docs.modelx.io/en/latest/releases/relnotes_v0_19_0.html)\n", "\n", "**Multiprocessing by ipyparallel**\n", "\n", "Multiprocessing is a technique to make use of multiple CPU cores on a machine or on multiple machines\n", "by splitting a task into multiple processes.\n", "The processes run in parallel on the CPU cores, reducing the total run time required to finish the task.\n", "The [ipyparallel] package makes it easy to perform multiprocessing.\n", "[ipyparallel] provides the following features that are required for multiprocessing:\n", "\n", "\n", "* Handling both synchronous and asynchronous communications.\n", "* Fast sending and receiving of numpy and pandas objects to and from engines.\n", "* Communicating with remote machines in the same way as with the localhost.\n", "\n", "**Memory-optimized run**\n", "\n", "A memory-optimized run is a feature introduced in modelx v0.19. \n", "Running a heavy model with a large data set consumes a lot of memory. \n", "This feature makes it possible to run the model using less memory.\n", "This feature requires two runs. 
The first run generates a list of *actions*,\n", "and the model should be run with only a small set of data for it.\n", "The second run is performed with the entire data set to get the desired output.\n", "For more about memory-optimized runs, see the [modelx documentation](https://docs.modelx.io/en/latest/reference/generated/modelx.core.model.Model.execute_actions.html).\n", "\n", "\n", "**Steps in this notebook**\n", "\n", "The demonstration in this notebook involves the following steps:\n", "\n", "* A table of 100,000 model points is loaded.\n", "* A sample model is loaded in this Python process.\n", "* The model is run to generate *actions*. The actions are saved within the model.\n", "* 10 IPython engines are started, and a block of 10,000 model points is sent to each engine.\n", "* Each engine loads the saved model and runs it with its 10,000 model points by executing the actions saved in the model.\n", "* This process receives the results from all the engines and concatenates them.\n", "\n", "\n", "\n", "[ipyparallel]: https://ipyparallel.readthedocs.io/en/latest/\n", "[modelx]: https://docs.modelx.io\n", "\n", "
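The block-splitting step listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `block_bounds` and its parameters are hypothetical and are not part of modelx or ipyparallel, and the sketch only computes contiguous row ranges without sending anything to an engine.

```python
def block_bounds(n_points, n_engines):
    '''Return a list of (start, stop) row ranges, one per engine.

    Hypothetical helper for illustration only; not a modelx or ipyparallel API.
    '''
    if n_engines <= 0:
        raise ValueError('n_engines must be positive')
    base, extra = divmod(n_points, n_engines)
    bounds = []
    start = 0
    for i in range(n_engines):
        # When the split is uneven, the first blocks take one extra row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# 100,000 model points over 10 engines gives 10 blocks of 10,000 rows each.
bounds = block_bounds(100_000, 10)
print(bounds[0], bounds[-1])  # (0, 10000) (90000, 100000)
```

With a pandas DataFrame of model points, each range could then be applied as `.iloc[start:stop]` to obtain the block intended for one engine.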
\n", "\n", "**Note:**\n", "\n", "The techniques above were discussed before in the following blog posts on https://modelx.io.\n", " \n", "\n", "* [_Running modelx in parallel using ipyparallel_](https://modelx.io/blog/2022/03/13/prallel-computing-with-ipyparallel/)\n", "* [_Running a heavy model while saving memory_](https://modelx.io/blog/2022/03/26/running-model-while-saving-memory/)\n", "\n", "
\n", "\n", "## Prerequisites\n", "\n", "To run this notebook, the following packages are required.\n", "\n", "* [modelx] 0.19.1 or newer\n", "* [ipyparallel] 8.2.0 or newer\n", "* `numpy`, `pandas`, `openpyxl`\n", "\n", "If any of the packages above is missing or outdated, install or update it using the `pip` or `conda` command, depending on your Python environment.\n", "\n", "As of March 12, 2022, ipyparallel 8.2.0 is available on [conda-forge](https://anaconda.org/conda-forge/ipyparallel/), but not in [the anaconda package](https://docs.anaconda.com/anaconda/packages/py3.9_win-64/).\n", "So if you're using Anaconda, update ipyparallel by running `conda install -c conda-forge ipyparallel=8.2.0`.\n", "\n", "Running this notebook consumes 8 to 9 GB of memory, so make sure your machine has enough memory available for the run.\n", "\n", "\n", "## Sample Model and Model Points\n", "\n", "This notebook uses the `CashValue_ME` model in the `savings` library with 100,000 model points generated by a notebook, \n", "`generate_100K_model_points.ipynb`, included in the library. \n", "Run that notebook to create a model point file named `model_point_table_100K.xlsx`.\n", "\n", "\n", "This notebook demonstrates how to run modelx models in parallel using \n", "[ipyparallel].\n", "\n", "This example:\n", "\n", "* launches 10 IPython engines,\n", "* loads 100,000 model points,\n", "* sends a block of 10,000 model points to each engine,\n", "* runs modelx with the 10,000 model points on each engine and\n", "* gets the results from all the engines and concatenates them.\n", "\n", "\n", "[lifelib]: https://lifelib.io\n", "[modelx]: http://docs.modelx.io\n", " " ] }, { "cell_type": "markdown", "id": "9f658fa9", "metadata": {}, "source": [ "Read the entire model point table into a DataFrame in this process. The table has 100,000 model points. Pass `index_col=0` to `pd.read_excel` to use the first column as the DataFrame index. 
" ] }, { "cell_type": "code", "execution_count": 1, "id": "2e920326", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spec_id</th><th>age_at_entry</th><th>sex</th><th>policy_term</th><th>policy_count</th><th>sum_assured</th><th>duration_mth</th><th>premium_pp</th><th>av_pp_init</th></tr>\n",
"<tr><th>policy_id</th><th colspan=9></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>A</td><td>47</td><td>M</td><td>20</td><td>22</td><td>804000</td><td>0</td><td>804000</td><td>0</td></tr>\n",
"<tr><th>2</th><td>C</td><td>29</td><td>F</td><td>9999</td><td>75</td><td>519000</td><td>0</td><td>900</td><td>0</td></tr>\n",
"<tr><th>3</th><td>A</td><td>51</td><td>F</td><td>10</td><td>5</td><td>409000</td><td>0</td><td>409000</td><td>0</td></tr>\n",
"<tr><th>4</th><td>B</td><td>32</td><td>M</td><td>15</td><td>60</td><td>128000</td><td>0</td><td>128000</td><td>0</td></tr>\n",
"<tr><th>5</th><td>D</td><td>28</td><td>M</td><td>9999</td><td>45</td><td>698000</td><td>0</td><td>1200</td><td>0</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>99996</th><td>A</td><td>21</td><td>M</td><td>10</td><td>34</td><td>152000</td><td>0</td><td>152000</td><td>0</td></tr>\n",
"<tr><th>99997</th><td>D</td><td>24</td><td>F</td><td>9999</td><td>53</td><td>928000</td><td>0</td><td>1400</td><td>0</td></tr>\n",
"<tr><th>99998</th><td>B</td><td>46</td><td>F</td><td>15</td><td>72</td><td>662000</td><td>0</td><td>662000</td><td>0</td></tr>\n",
"<tr><th>99999</th><td>A</td><td>46</td><td>M</td><td>15</td><td>36</td><td>583000</td><td>0</td><td>583000</td><td>0</td></tr>\n",
"<tr><th>100000</th><td>B</td><td>35</td><td>M</td><td>15</td><td>3</td><td>638000</td><td>0</td><td>638000</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>100000 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " spec_id age_at_entry sex policy_term policy_count sum_assured \\\n", "policy_id \n", "1 A 47 M 20 22 804000 \n", "2 C 29 F 9999 75 519000 \n", "3 A 51 F 10 5 409000 \n", "4 B 32 M 15 60 128000 \n", "5 D 28 M 9999 45 698000 \n", "... ... ... .. ... ... ... \n", "99996 A 21 M 10 34 152000 \n", "99997 D 24 F 9999 53 928000 \n", "99998 B 46 F 15 72 662000 \n", "99999 A 46 M 15 36 583000 \n", "100000 B 35 M 15 3 638000 \n", "\n", " duration_mth premium_pp av_pp_init \n", "policy_id \n", "1 0 804000 0 \n", "2 0 900 0 \n", "3 0 409000 0 \n", "4 0 128000 0 \n", "5 0 1200 0 \n", "... ... ... ... \n", "99996 0 152000 0 \n", "99997 0 1400 0 \n", "99998 0 662000 0 \n", "99999 0 583000 0 \n", "100000 0 638000 0 \n", "\n", "[100000 rows x 9 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "model_point_all = pd.read_excel('model_point_table_100K.xlsx', index_col=0)\n", "model_point_all" ] }, { "cell_type": "markdown", "id": "0c6dab36", "metadata": {}, "source": [ "Next, read the sample model to generate actions." ] }, { "cell_type": "code", "execution_count": 2, "id": "8f9b45d0", "metadata": {}, "outputs": [], "source": [ "import modelx as mx\n", "model = mx.read_model('CashValue_ME')" ] }, { "cell_type": "markdown", "id": "a084a2af", "metadata": {}, "source": [ "`model_point_table` in the `Projection` space holds the model point table. By default, `CashValue_ME` has 4 model points, so we use this table for generate actions." ] }, { "cell_type": "code", "execution_count": 3, "id": "feba3d73", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>spec_id</th><th>age_at_entry</th><th>sex</th><th>policy_term</th><th>policy_count</th><th>sum_assured</th><th>duration_mth</th><th>premium_pp</th><th>av_pp_init</th><th>accum_prem_init_pp</th></tr>\n",
"<tr><th>poind_id</th><th colspan=10></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>A</td><td>20</td><td>M</td><td>10</td><td>100</td><td>500000</td><td>0</td><td>500000</td><td>0</td><td>0</td></tr>\n",
"<tr><th>2</th><td>B</td><td>50</td><td>M</td><td>20</td><td>100</td><td>500000</td><td>0</td><td>500000</td><td>0</td><td>0</td></tr>\n",
"<tr><th>3</th><td>C</td><td>20</td><td>M</td><td>9999</td><td>100</td><td>500000</td><td>0</td><td>1000</td><td>0</td><td>0</td></tr>\n",
"<tr><th>4</th><td>D</td><td>50</td><td>M</td><td>9999</td><td>100</td><td>500000</td><td>0</td><td>1000</td><td>0</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " spec_id age_at_entry sex policy_term policy_count sum_assured \\\n", "poind_id \n", "1 A 20 M 10 100 500000 \n", "2 B 50 M 20 100 500000 \n", "3 C 20 M 9999 100 500000 \n", "4 D 50 M 9999 100 500000 \n", "\n", " duration_mth premium_pp av_pp_init accum_prem_init_pp \n", "poind_id \n", "1 0 500000 0 0 \n", "2 0 500000 0 0 \n", "3 0 1000 0 0 \n", "4 0 1000 0 0 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.Projection.model_point_table" ] }, { "cell_type": "markdown", "id": "e8b94479", "metadata": {}, "source": [ "## Generate Actions" ] }, { "cell_type": "markdown", "id": "ef8f9c6b", "metadata": {}, "source": [ "In this example, we want to get the value of `result_pv` in the end, so we give the node of `result_pv` as a target parameter to `generate_actions`.\n", "Then `generate_actions` method of the model returns a list of actions to instruct a memory-optimized run that calculates and preserves the value of `result_pv`.\n", "\n", "`node` method on the `result_pv` Cells creates an ItemNode object representing `result_pv` with no arguments. Since `result_pv` does not have any parameter, this is the only node associated with `result_pv`. Note that `generate_actions` takes a list of target nodes, so \n", "a list is created to enclose the `result_pv` node and passed to `generate_actions` below.\n", "\n", "\n", "The returned list of actions are assigned to a Reference named `actions` in the `Projection` space, and the model is saved as 'CashValue_ME_with_actions' to be later loaded by IPython engines.\n", "\n", "The model used for generating the actions are not needed anymore, so it is closed here." 
] }, { "cell_type": "code", "execution_count": 4, "id": "e7cc86b4", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "UserWarning: call stack trace activated\n", "UserWarning: call stack trace deactivated\n" ] } ], "source": [ "model.Projection.actions = model.generate_actions([model.Projection.result_pv.node()])\n", "model.write('CashValue_ME_with_actions')\n", "model.close()" ] }, { "cell_type": "markdown", "id": "4beae058", "metadata": {}, "source": [ "## Start IPython engines" ] }, { "cell_type": "markdown", "id": "78521710", "metadata": {}, "source": [ "Start an `ipyparallel` cluster with 10 engines. `rc` is a `Client` object. To learn more about what is done in the cell below, consult the [ipyparallel] documentation.\n", "\n", "[ipyparallel]: https://ipyparallel.readthedocs.io/en/latest/" ] }, { "cell_type": "code", "execution_count": 5, "id": "cc029c10", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting 10 engines with \n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "039b5bb0327c49a89870559309d7fe8f", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/10 [00:00" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "code = \"\"\"\n", "import modelx as mx\n", "m = mx.read_model('CashValue_ME_with_actions')\n", "m.Projection.model_point_table = model_point\n", "m.execute_actions(m.Projection.actions)\n", "result = m.Projection.result_pv()\n", "\"\"\"\n", "rc[:].execute(code, block=True)" ] }, { "cell_type": "markdown", "id": "b2599f1b", "metadata": {}, "source": [ "## Get the Results\n", "\n", "Get the results back from all the engines and concatenate them into one DataFrame." ] }, { "cell_type": "code", "execution_count": 8, "id": "073cd836", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Premiums</th><th>Death</th><th>Surrender</th><th>Maturity</th><th>Expenses</th><th>Commissions</th><th>Investment Income</th><th>Change in AV</th><th>Net Cashflow</th></tr>\n",
"<tr><th>policy_id</th><th colspan=9></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>1.768800e+07</td><td>4.207421e+05</td><td>7.167661e+06</td><td>8.447648e+06</td><td>2.549999e+05</td><td>8.844000e+05</td><td>5.002695e+06</td><td>2.777545e+06</td><td>2.737698e+06</td></tr>\n",
"<tr><th>2</th><td>1.728737e+07</td><td>5.399209e+06</td><td>8.199806e+06</td><td>0.000000e+00</td><td>1.372393e+06</td><td>8.643683e+05</td><td>8.088528e+06</td><td>6.038786e+06</td><td>3.501331e+06</td></tr>\n",
"<tr><th>3</th><td>2.045000e+06</td><td>2.352243e+04</td><td>6.103083e+05</td><td>1.164191e+06</td><td>4.394721e+04</td><td>1.022500e+05</td><td>2.460592e+05</td><td>1.528495e+05</td><td>1.939911e+05</td></tr>\n",
"<tr><th>4</th><td>7.680000e+06</td><td>5.362278e+04</td><td>2.844819e+06</td><td>4.554170e+06</td><td>6.191257e+05</td><td>3.840000e+05</td><td>1.803562e+06</td><td>1.027055e+06</td><td>7.688294e+02</td></tr>\n",
"<tr><th>5</th><td>1.388815e+07</td><td>4.410248e+06</td><td>7.023923e+06</td><td>0.000000e+00</td><td>8.270160e+05</td><td>6.944077e+05</td><td>6.925307e+06</td><td>5.242075e+06</td><td>2.615791e+06</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>99996</th><td>5.168000e+06</td><td>1.433198e+04</td><td>1.546823e+06</td><td>2.983362e+06</td><td>2.994351e+05</td><td>2.584000e+05</td><td>6.263479e+05</td><td>3.887630e+05</td><td>3.032331e+05</td></tr>\n",
"<tr><th>99997</th><td>1.936913e+07</td><td>5.442479e+06</td><td>1.005014e+07</td><td>0.000000e+00</td><td>9.893691e+05</td><td>9.684563e+05</td><td>9.870065e+06</td><td>7.846641e+06</td><td>3.942107e+06</td></tr>\n",
"<tr><th>99998</th><td>4.766400e+07</td><td>7.247140e+05</td><td>1.760871e+07</td><td>2.791406e+07</td><td>7.413578e+05</td><td>2.383200e+06</td><td>1.113241e+07</td><td>6.340720e+06</td><td>3.083640e+06</td></tr>\n",
"<tr><th>99999</th><td>2.098800e+07</td><td>3.009029e+05</td><td>7.454509e+06</td><td>1.104981e+07</td><td>3.706789e+05</td><td>1.049400e+06</td><td>4.407965e+06</td><td>2.510836e+06</td><td>2.659825e+06</td></tr>\n",
"<tr><th>100000</th><td>1.914000e+06</td><td>1.542257e+04</td><td>7.087322e+05</td><td>1.133148e+06</td><td>3.094749e+04</td><td>9.570000e+04</td><td>4.491574e+05</td><td>2.557838e+05</td><td>1.234236e+05</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>100000 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " Premiums Death Surrender Maturity \\\n", "policy_id \n", "1 1.768800e+07 4.207421e+05 7.167661e+06 8.447648e+06 \n", "2 1.728737e+07 5.399209e+06 8.199806e+06 0.000000e+00 \n", "3 2.045000e+06 2.352243e+04 6.103083e+05 1.164191e+06 \n", "4 7.680000e+06 5.362278e+04 2.844819e+06 4.554170e+06 \n", "5 1.388815e+07 4.410248e+06 7.023923e+06 0.000000e+00 \n", "... ... ... ... ... \n", "99996 5.168000e+06 1.433198e+04 1.546823e+06 2.983362e+06 \n", "99997 1.936913e+07 5.442479e+06 1.005014e+07 0.000000e+00 \n", "99998 4.766400e+07 7.247140e+05 1.760871e+07 2.791406e+07 \n", "99999 2.098800e+07 3.009029e+05 7.454509e+06 1.104981e+07 \n", "100000 1.914000e+06 1.542257e+04 7.087322e+05 1.133148e+06 \n", "\n", " Expenses Commissions Investment Income Change in AV \\\n", "policy_id \n", "1 2.549999e+05 8.844000e+05 5.002695e+06 2.777545e+06 \n", "2 1.372393e+06 8.643683e+05 8.088528e+06 6.038786e+06 \n", "3 4.394721e+04 1.022500e+05 2.460592e+05 1.528495e+05 \n", "4 6.191257e+05 3.840000e+05 1.803562e+06 1.027055e+06 \n", "5 8.270160e+05 6.944077e+05 6.925307e+06 5.242075e+06 \n", "... ... ... ... ... \n", "99996 2.994351e+05 2.584000e+05 6.263479e+05 3.887630e+05 \n", "99997 9.893691e+05 9.684563e+05 9.870065e+06 7.846641e+06 \n", "99998 7.413578e+05 2.383200e+06 1.113241e+07 6.340720e+06 \n", "99999 3.706789e+05 1.049400e+06 4.407965e+06 2.510836e+06 \n", "100000 3.094749e+04 9.570000e+04 4.491574e+05 2.557838e+05 \n", "\n", " Net Cashflow \n", "policy_id \n", "1 2.737698e+06 \n", "2 3.501331e+06 \n", "3 1.939911e+05 \n", "4 7.688294e+02 \n", "5 2.615791e+06 \n", "... ... 
\n", "99996 3.032331e+05 \n", "99997 3.942107e+06 \n", "99998 3.083640e+06 \n", "99999 2.659825e+06 \n", "100000 1.234236e+05 \n", "\n", "[100000 rows x 9 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = pd.concat(rc[:]['result']) \n", "result" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 5 }