Generating seriatim model points for cluster analysis example
This notebook is modified from generate_model_points_with_duration.ipynb in the basiclife library and generates the seriatim policies for the example performed by the cluster_model_points.ipynb notebook. The modifications are:
- `policy_count` is set to 1 for all the model points.
- `duration_mth` is modified to be positive, i.e. all model points are existing policies.
Columns:
- `policy_id`: Model point identifier (the table index).
- `age_at_entry`: Issue age. The samples are distributed uniformly from 20 to 59.
- `sex`: “M” or “F” to indicate the policyholder’s sex. Not used.
- `policy_term`: Policy term in years. The samples are evenly distributed among 10, 15 and 20.
- `policy_count`: The number of policies. Set to 1 for all model points.
- `sum_assured`: Sum assured. The samples are uniformly distributed from 10,000 to 1,000,000 and rounded to the nearest 1,000.
- `duration_mth`: Months elapsed from issue until t=0. Uniformly distributed from 1 to 12 times `policy_term` - 1.
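The range stated for `duration_mth` follows from the sampling formula used in the code cell of this notebook. The following is a minimal sketch that reproduces only that formula for the three policy terms and checks the bounds empirically. The seed and the sample size are arbitrary choices for this illustration and are not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for this sketch only

policy_term = np.array([10, 15, 20])   # the three policy terms in years
max_duration = policy_term * 12 - 1    # largest allowed duration in months

# rng.random() returns values in [0, 1), so floor(max_duration * u) lies in
# 0 .. max_duration - 1; adding 1 shifts the range to 1 .. max_duration.
u = rng.random(size=(100_000, 3))      # one column of draws per policy term
duration_mth = np.floor(max_duration * u).astype(int) + 1

print("smallest sampled duration per term:", duration_mth.min(axis=0))
print("largest sampled duration per term: ", duration_mth.max(axis=0))
print("upper bound per term:              ", max_duration)
```

Because the smallest possible value is 1, no sampled policy is at its issue date, and because the largest is one month short of the full term, no sampled policy has already matured.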
Number of model points:
10000
[35]:
import numpy as np
from numpy.random import default_rng # Requires NumPy 1.17 or newer
rng = default_rng(12345)
# Number of Model Points
MPCount = 10000
# Issue Age (Integer): 20 - 59 year old
age_at_entry = rng.integers(low=20, high=60, size=MPCount)
# Sex (Char): draw a random index into Sex for each model point
Sex = [
    "M",
    "F"
]
sex = np.fromiter(map(lambda i: Sex[i], rng.integers(low=0, high=len(Sex), size=MPCount)), np.dtype('<U1'))
# Policy Term (Integer): 10, 15, 20
policy_term = rng.integers(low=0, high=3, size=MPCount) * 5 + 10
# Sum Assured (Float): 10000 - 1000000
sum_assured = np.round((1000000 - 10000) * rng.random(size=MPCount) + 10000, -3)
# Duration in month (Int): 0 < Duration(mth) < Policy Term in month
duration_mth = np.floor((policy_term * 12 - 1) * rng.random(size=MPCount)).astype(int) + 1
# Policy Count (Integer): 1 for every model point
# (the scalar is broadcast to all rows when the DataFrame is built)
policy_count = 1
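The ranges in the comments above depend on two NumPy details: `Generator.integers` excludes `high` by default, and `np.round` with a negative `decimals` rounds to the left of the decimal point. A minimal sketch, using an arbitrary seed and sample size that are not part of the notebook, makes both visible:

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed, for this sketch only
n = 50_000

# `high` is exclusive by default, so this yields integers from 20 to 59.
age_at_entry = rng.integers(low=20, high=60, size=n)

# Draws 0, 1 or 2, then maps them to 10, 15 or 20 years.
policy_term = rng.integers(low=0, high=3, size=n) * 5 + 10

# decimals=-3 rounds each value to the nearest thousand.
sum_assured = np.round((1_000_000 - 10_000) * rng.random(size=n) + 10_000, -3)

print("age range:", age_at_entry.min(), age_at_entry.max())
print("distinct policy terms:", np.unique(policy_term))
print("sum assured range:", sum_assured.min(), sum_assured.max())
```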
[36]:
import pandas as pd
attrs = [
    "age_at_entry",
    "sex",
    "policy_term",
    "policy_count",
    "sum_assured",
    "duration_mth"
]
data = [
    age_at_entry,
    sex,
    policy_term,
    policy_count,
    sum_assured,
    duration_mth
]
model_point_table = pd.DataFrame(dict(zip(attrs, data)), index=range(1, MPCount+1))
model_point_table.index.name = "policy_id"
model_point_table
[36]:
policy_id | age_at_entry | sex | policy_term | policy_count | sum_assured | duration_mth |
---|---|---|---|---|---|---|
1 | 47 | M | 10 | 1 | 622000.0 | 28 |
2 | 29 | M | 20 | 1 | 752000.0 | 213 |
3 | 51 | F | 10 | 1 | 799000.0 | 39 |
4 | 32 | F | 20 | 1 | 422000.0 | 140 |
5 | 28 | M | 15 | 1 | 605000.0 | 76 |
... | ... | ... | ... | ... | ... | ... |
9996 | 47 | M | 20 | 1 | 827000.0 | 168 |
9997 | 30 | M | 15 | 1 | 826000.0 | 169 |
9998 | 45 | F | 20 | 1 | 783000.0 | 158 |
9999 | 39 | M | 20 | 1 | 302000.0 | 41 |
10000 | 22 | F | 15 | 1 | 576000.0 | 167 |
10000 rows × 6 columns
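A table like the one above can be checked against the documented column ranges. The sketch below is one possible way to do that. `check_model_points` is a hypothetical helper introduced here for illustration, not a function from the notebook or from any library, and the sample rows are the first five rows of the output above.

```python
import numpy as np
import pandas as pd


def check_model_points(table: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every row is within the documented ranges."""
    max_duration = table["policy_term"] * 12 - 1  # per-row upper bound in months
    checks = [
        table["age_at_entry"].between(20, 59),
        table["sex"].isin(["M", "F"]),
        table["policy_term"].isin([10, 15, 20]),
        table["policy_count"] == 1,
        table["sum_assured"].between(10_000, 1_000_000),
        table["duration_mth"].between(1, max_duration),
    ]
    return bool(np.all([c.all() for c in checks]))


# First five rows of the table shown above.
sample = pd.DataFrame(
    {
        "age_at_entry": [47, 29, 51, 32, 28],
        "sex": ["M", "M", "F", "F", "M"],
        "policy_term": [10, 20, 10, 20, 15],
        "policy_count": 1,  # a scalar is broadcast to every row
        "sum_assured": [622000.0, 752000.0, 799000.0, 422000.0, 605000.0],
        "duration_mth": [28, 213, 39, 140, 76],
    },
    index=pd.RangeIndex(1, 6, name="policy_id"),
)

print(check_model_points(sample))                         # rows within range
print(check_model_points(sample.assign(duration_mth=0)))  # duration of 0 is rejected
```

The same helper could be applied to the full `model_point_table`, assuming it has the six columns listed earlier.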