Generating model points with duration

This notebook is modified from generate_model_points.ipynb and generates sample model points for the BasicTerm_SE and BasicTerm_ME models using random numbers. The model points have the duration_mth attribute, which indicates how many months have elapsed from the issue of each model point to time 0. Negative duration_mth values indicate future new business.
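
To make the sign convention concrete, the following minimal sketch splits a few duration_mth values into business already issued at time 0 and future new business. The sample values and the names in_force and months_until_issue are hypothetical, chosen for illustration only; they are not part of the notebook or the models.

```python
import numpy as np

# Hypothetical duration_mth values, for illustration only
duration_mth = np.array([-36, -1, 0, 15, 120])

# Non-negative durations: model points issued at or before t=0
in_force = duration_mth >= 0

# Negative durations: future new business; the absolute value is
# the number of months from t=0 until issue
months_until_issue = np.where(duration_mth < 0, -duration_mth, 0)

print(in_force.tolist())            # [False, False, True, True, True]
print(months_until_issue.tolist())  # [36, 1, 0, 0, 0]
```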

Columns:

  • point_id: Model point identifier. Note that the code below names the table index policy_id.

  • age_at_entry: Issue age. The samples are distributed uniformly from 20 to 59.

  • sex: “M” or “F” to indicate the policyholder’s sex. Not used by default.

  • policy_term: Policy term in years. The samples are evenly distributed among 10, 15 and 20.

  • policy_count: The number of policies. Uniformly distributed from 0 to 100.

  • sum_assured: Sum assured. The samples are uniformly distributed from 10,000 to 1,000,000.

  • duration_mth: Months elapsed from issue until t=0. Negative values indicate future new business. Uniformly distributed from -36 to 12 times policy_term.

Number of model points:

  • 10000
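
The ranges listed above can be checked directly against the generating expressions. The sketch below re-draws a smaller sample with the same formulas as the code cell that follows and asserts the stated bounds. The seed and the sample size n are arbitrary choices for illustration, so the values drawn here differ from the notebook's sample.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(0)   # arbitrary seed, for illustration only
n = 1000               # smaller than the notebook's 10000, to keep the check quick

age_at_entry = rng.integers(low=20, high=60, size=n)   # high is exclusive
policy_term = rng.integers(low=0, high=3, size=n) * 5 + 10
sum_assured = np.round((1000000 - 10000) * rng.random(size=n) + 10000, -3)
duration_mth = np.rint((policy_term + 3) * 12 * rng.random(size=n) - 36).astype(int)

# Issue ages fall in 20..59
assert age_at_entry.min() >= 20 and age_at_entry.max() <= 59
# Policy terms take only the values 10, 15 and 20
assert set(np.unique(policy_term).tolist()) <= {10, 15, 20}
# Sums assured stay within 10,000..1,000,000 after rounding to thousands
assert sum_assured.min() >= 10000 and sum_assured.max() <= 1000000
# Durations lie between -36 and the policy term expressed in months
assert (duration_mth >= -36).all() and (duration_mth <= 12 * policy_term).all()
```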

[68]:
import numpy as np
from numpy.random import default_rng  # Requires NumPy 1.17 or newer

rng = default_rng(12345)

# Number of Model Points
MPCount = 10000

# Issue Age (Integer): 20 - 59 years old (high is exclusive)
age_at_entry = rng.integers(low=20, high=60, size=MPCount)

# Sex (Char)
Sex = [
    "M",
    "F"
]

# Draw a random index into Sex for each model point and look it up
sex = np.array(Sex)[rng.integers(low=0, high=len(Sex), size=MPCount)]

# Policy Term (Integer): 10, 15, 20
policy_term = rng.integers(low=0, high=3, size=MPCount) * 5 + 10


# Sum Assured (Float): 10000 - 1000000
sum_assured = np.round((1000000 - 10000) * rng.random(size=MPCount) + 10000, -3)

# Duration in months (Int): -36 <= Duration(mth) <= Policy Term in months
duration_mth = np.rint((policy_term + 3) * 12 * rng.random(size=MPCount) - 36).astype(int)

# Policy Count (Integer): 0 - 100
policy_count = np.rint(100 * rng.random(size=MPCount)).astype(int)
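
One detail of the rounding approach used for policy_count: np.rint(100 * rng.random()) maps only a half-width interval to 0 and to 100, so those two values come up about half as often as the values 1 to 99. The sketch below illustrates this with an arbitrary seed and sample size. It also shows rng.integers with endpoint=True as one possible alternative when exactly equal probabilities are wanted. That alternative is a suggestion only, not what the notebook does, and switching to it would change the generated sample.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(0)   # arbitrary seed, for illustration only
n = 1_000_000          # large sample so the frequencies are stable

# Rounding approach, as in the cell above
rounded = np.rint(100 * rng.random(size=n)).astype(int)
counts = np.bincount(rounded, minlength=101)

# Frequency of each endpoint relative to the average interior value
interior_mean = counts[1:100].mean()
ratio_low = counts[0] / interior_mean
ratio_high = counts[100] / interior_mean
print(ratio_low, ratio_high)   # both roughly 0.5

# Possible alternative: draw integers directly, both endpoints included
direct = rng.integers(low=0, high=100, size=n, endpoint=True)
direct_counts = np.bincount(direct, minlength=101)
ratio_direct = direct_counts[0] / direct_counts[1:100].mean()
print(ratio_direct)            # roughly 1.0
```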
[69]:
import pandas as pd

attrs = [
    "age_at_entry",
    "sex",
    "policy_term",
    "policy_count",
    "sum_assured",
    "duration_mth"
]

data = [
    age_at_entry,
    sex,
    policy_term,
    policy_count,
    sum_assured,
    duration_mth
]

model_point_table = pd.DataFrame(dict(zip(attrs, data)), index=range(1, MPCount+1))
model_point_table.index.name = "policy_id"
model_point_table
[69]:
           age_at_entry  sex  policy_term  policy_count  sum_assured  duration_mth
policy_id
1                    47    M           10            86     622000.0             1
2                    29    M           20            56     752000.0           210
3                    51    F           10            83     799000.0            15
4                    32    F           20            72     422000.0           125
5                    28    M           15            99     605000.0            55
...                 ...  ...          ...           ...          ...           ...
9996                 47    M           20            25     827000.0           157
9997                 30    M           15            81     826000.0           168
9998                 45    F           20            10     783000.0           146
9999                 39    M           20             9     302000.0            11
10000                22    F           15            18     576000.0           166

10000 rows × 6 columns

[70]:
model_point_table.to_excel("model_point_table.xlsx")