Generating model points for ASL¶

This notebook is modified from generate_model_points.ipynb and generates the sample model points for the BasicTermASL_ME model using random numbers.

Columns:

point_id: Model point identifier
age_at_entry: Issue age. The samples are distributed uniformly from 20 to 59.
sex: “M” or “F” to indicate policy holder’s sex. Not used by default.
policy_term: Policy term in years. The samples are evenly distriubted among 10, 15 and 20.
policy_count: The number of policies. Uniformly distributed from 0 to 100.
sum_assured: Sum assured. The samples are uniformly distributed from 10,000 to 1,000,000.
issue_date: Issue date. Dates are pandas.Timestamp objects.
payment_freq: Premium payment frequency as the number of payments in a year.
payment_term: Payment term in years. Set equal to or shorter than the policy term.

Number of model points:

10000

[1]:

import numpy as np
from numpy.random import default_rng  # Requires NumPy 1.17 or newer
import pandas as pd

rng = default_rng(12345)

# Number of Model Points
point_size = 10000

# Issue Age (Integer): 20 - 59 year old
age_at_entry = rng.integers(low=20, high=60, size=point_size)

# Sex (Char)
Sex = [
    "M",
    "F"
]

sex = np.fromiter(map(lambda i: Sex[i], rng.integers(low=0, high=len(Sex), size=point_size)), np.dtype('<U1'))

# Policy Term (Integer): 10, 15, 20
policy_term = rng.integers(low=0, high=3, size=point_size) * 5 + 10


# Sum Assured (Float): 10000 - 1000000
sum_assured = np.round((1000000 - 10000) * rng.random(size=point_size) + 10000, -3)

# Issue Date:
# For existing policies, issue dates are set so that the policies are in-force on 2022/1/1.
# For new business policie,issue dates are within 3 years from 2022/1/1.

dur_offset = (policy_term + 3) * 12 * rng.random(size=point_size) - 36
mth_offset = (dur_offset // 1).astype(int)
day_offset = 1 - (dur_offset - mth_offset)
issue_mth = pd.Series([pd.Period('2022-01', 'M')] * point_size) -1 - mth_offset
day_offset = (np.ceil(issue_mth.dt.days_in_month * day_offset)).astype(int)
issue_date = (issue_mth.dt.to_timestamp().dt.to_period('D') -1 + day_offset).dt.to_timestamp().to_numpy()

# Policy Count (Integer):
policy_count = np.rint(100 * rng.random(size=point_size)).astype(int)

# Payment Frequency
payment_freq = rng.choice([1, 2, 12], size=point_size)

# Premium Payment Term
short_paidup = pd.Series(rng.choice([True, False], size=point_size))
polterm = pd.Series(policy_term)
payment_term = polterm.mask(short_paidup & (polterm==10), 5).mask(short_paidup & (polterm>=15), 10).to_numpy()

[2]:

import pandas as pd

attrs = [
    "age_at_entry",
    "sex",
    "policy_term",
    "policy_count",
    "sum_assured",
    "issue_date",
    "payment_freq",
    "payment_term"
]

data = [
    age_at_entry,
    sex,
    policy_term,
    policy_count,
    sum_assured,
    issue_date,
    payment_freq,
    payment_term
]

model_point_table = pd.DataFrame(dict(zip(attrs, data)), index=range(1, point_size+1))
model_point_table.index.name = "policy_id"
model_point_table

[2]:

	age_at_entry	sex	policy_term	policy_count	sum_assured	issue_date	payment_freq	payment_term
policy_id
1	47	M	10	86	622000.0	2021-12-15	1	5
2	29	M	20	56	752000.0	2004-07-02	2	20
3	51	F	10	83	799000.0	2020-10-02	12	10
4	32	F	20	72	422000.0	2011-08-05	1	10
5	28	M	15	99	605000.0	2017-05-22	2	10
...	...	...	...	...	...	...	...	...
9996	47	M	20	25	827000.0	2008-12-01	1	10
9997	30	M	15	81	826000.0	2008-01-13	1	15
9998	45	F	20	10	783000.0	2009-11-07	2	10
9999	39	M	20	9	302000.0	2021-01-22	12	10
10000	22	F	15	18	576000.0	2008-03-16	2	10

10000 rows × 8 columns

[3]:

model_point_table.to_excel("model_point_table.xlsx")

Generating model points with duration

Creating premium table