Generating seriatim model points for cluster analysis example

This notebook is adapted from generate_model_points_with_duration.ipynb in the basiclife library. It generates the seriatim policies used in the cluster analysis example in the cluster_model_points.ipynb notebook. The modifications are:

policy_count is set to 1 for all the model points.
duration_mth is modified to be positive, i.e. all model points are existing policies.

Columns:

policy_id: Policy identifier, used as the table index and running from 1 to 10000.
age_at_entry: Issue age. The samples are distributed uniformly from 20 to 59.
sex: “M” or “F” to indicate policy holder’s sex. Not used.
policy_term: Policy term in years. The samples are drawn uniformly from 10, 15 and 20.
policy_count: The number of policies. Always 1, so each row represents a single policy.
sum_assured: Sum assured. The samples are uniformly distributed from 10,000 to 1,000,000 and rounded to the nearest 1,000.
duration_mth: Months elapsed from issue until t=0. Integers uniformly distributed from 1 to (12 * policy_term) - 1, so every policy is in force at t=0.
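The range stated for duration_mth follows from the formula in the code cell. Writing T for policy_term and U for the value returned by rng.random(), which is uniform on [0, 1), a short derivation:

```latex
d = \lfloor (12T - 1)\,U \rfloor + 1, \qquad U \sim \mathrm{Uniform}[0, 1)

0 \le (12T - 1)\,U < 12T - 1
\;\Longrightarrow\;
\lfloor (12T - 1)\,U \rfloor \in \{0, 1, \dots, 12T - 2\}
\;\Longrightarrow\;
d \in \{1, 2, \dots, 12T - 1\}

P(d = k) = \frac{1}{12T - 1}, \qquad k = 1, \dots, 12T - 1
```

Because 12T - 1 is an integer, each of the 12T - 1 durations is equally likely, and a duration of 0 (new business) or 12T (matured) cannot occur.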

Number of model points:

10000


import numpy as np
from numpy.random import default_rng  # Requires NumPy 1.17 or newer

rng = default_rng(12345)

# Number of Model Points
MPCount = 10000

# Issue Age (Integer): 20 - 59 year old
age_at_entry = rng.integers(low=20, high=60, size=MPCount)

# Sex (Char)
Sex = [
    "M",
    "F"
]

# Map random indices 0/1 to "M"/"F" (same draws as before, dtype '<U1')
sex = np.array(Sex)[rng.integers(low=0, high=len(Sex), size=MPCount)]

# Policy Term (Integer): 10, 15, 20
policy_term = rng.integers(low=0, high=3, size=MPCount) * 5 + 10


# Sum Assured (Float): 10000 - 1000000
sum_assured = np.round((1000000 - 10000) * rng.random(size=MPCount) + 10000, -3)

# Duration in month (Int): 0 < Duration(mth) < Policy Term in month
duration_mth = np.floor((policy_term * 12 - 1) * rng.random(size=MPCount)).astype(int) + 1

# Policy Count (Integer): 1
policy_count = 1
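As a quick sanity check on the cell above, the sketch below re-creates the same draws in the same order with seed 12345 and asserts that each column stays within the ranges listed in the column descriptions. The sex draw is kept only so that the later draws line up with the original cell; the variable name sex_idx is an assumption of this sketch.

```python
import numpy as np
from numpy.random import default_rng

# Re-create the draws in the same order as the cell above (seed 12345),
# so the checks below apply to the same arrays.
rng = default_rng(12345)
MPCount = 10000
age_at_entry = rng.integers(low=20, high=60, size=MPCount)
sex_idx = rng.integers(low=0, high=2, size=MPCount)  # consumed only to keep draws aligned
policy_term = rng.integers(low=0, high=3, size=MPCount) * 5 + 10
sum_assured = np.round((1000000 - 10000) * rng.random(size=MPCount) + 10000, -3)
duration_mth = np.floor((policy_term * 12 - 1) * rng.random(size=MPCount)).astype(int) + 1

# Issue ages are integers in [20, 59] (high=60 is exclusive)
assert age_at_entry.min() >= 20 and age_at_entry.max() <= 59
# Only the three policy terms 10, 15 and 20 are generated
assert set(np.unique(policy_term)) <= {10, 15, 20}
# Sum assured lies in [10,000, 1,000,000] and is a multiple of 1,000
assert sum_assured.min() >= 10000 and sum_assured.max() <= 1000000
assert np.all(sum_assured % 1000 == 0)
# Every policy is in force: 1 <= duration_mth <= 12 * policy_term - 1
assert np.all(duration_mth >= 1)
assert np.all(duration_mth <= policy_term * 12 - 1)
print("all range checks passed")
```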


import pandas as pd

attrs = [
    "age_at_entry",
    "sex",
    "policy_term",
    "policy_count",
    "sum_assured",
    "duration_mth"
]

data = [
    age_at_entry,
    sex,
    policy_term,
    policy_count,
    sum_assured,
    duration_mth
]

model_point_table = pd.DataFrame(dict(zip(attrs, data)), index=range(1, MPCount+1))
model_point_table.index.name = "policy_id"
model_point_table


	age_at_entry	sex	policy_term	policy_count	sum_assured	duration_mth
policy_id
1	47	M	10	1	622000.0	28
2	29	M	20	1	752000.0	213
3	51	F	10	1	799000.0	39
4	32	F	20	1	422000.0	140
5	28	M	15	1	605000.0	76
...	...	...	...	...	...	...
9996	47	M	20	1	827000.0	168
9997	30	M	15	1	826000.0	169
9998	45	F	20	1	783000.0	158
9999	39	M	20	1	302000.0	41
10000	22	F	15	1	576000.0	167

10000 rows × 6 columns
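In the cell that builds model_point_table, policy_count is a plain scalar while the other columns are arrays; pandas broadcasts the scalar to every row, which is why the policy_count column shows 1 throughout. A minimal sketch of that behaviour, using three rows whose values are copied from the output above (the name table is illustrative only):

```python
import numpy as np
import pandas as pd

# Three sample rows taken from the output above (policy_id 1, 2 and 5)
policy_term = np.array([10, 20, 15])
duration_mth = np.array([28, 213, 76])
policy_count = 1  # scalar, as in the notebook

table = pd.DataFrame(
    {"policy_term": policy_term, "policy_count": policy_count, "duration_mth": duration_mth},
    index=range(1, 4),  # 1-based index, like range(1, MPCount+1)
)
table.index.name = "policy_id"

# The scalar policy_count has been broadcast to every row
print(table)
```

The same broadcasting applies to the full 10000-row table, so no array of ones needs to be created explicitly.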
