Generating model points for ASL¶
This notebook is modified from generate_model_points.ipynb and generates the sample model points for the BasicTermASL_ME
model using random numbers.
Columns:
point_id
: Model point identifierage_at_entry
: Issue age. The samples are distributed uniformly from 20 to 59.sex
: “M” or “F” to indicate policy holder’s sex. Not used by default.policy_term
: Policy term in years. The samples are evenly distriubted among 10, 15 and 20.policy_count
: The number of policies. Uniformly distributed from 0 to 100.sum_assured
: Sum assured. The samples are uniformly distributed from 10,000 to 1,000,000.issue_date
: Issue date. Dates are pandas.Timestamp objects.payment_freq
: Premium payment frequency as the number of payments in a year.payment_term
: Payment term in years. Set equal to or shorter than the policy term.
Number of model points:
10000
[1]:
import numpy as np
from numpy.random import default_rng # Requires NumPy 1.17 or newer
import pandas as pd
rng = default_rng(12345)
# Number of Model Points
point_size = 10000
# Issue Age (Integer): 20 - 59 year old
age_at_entry = rng.integers(low=20, high=60, size=point_size)
# Sex (Char)
Sex = [
"M",
"F"
]
sex = np.fromiter(map(lambda i: Sex[i], rng.integers(low=0, high=len(Sex), size=point_size)), np.dtype('<U1'))
# Policy Term (Integer): 10, 15, 20
policy_term = rng.integers(low=0, high=3, size=point_size) * 5 + 10
# Sum Assured (Float): 10000 - 1000000
sum_assured = np.round((1000000 - 10000) * rng.random(size=point_size) + 10000, -3)
# Issue Date:
# For existing policies, issue dates are set so that the policies are in-force on 2022/1/1.
# For new business policie,issue dates are within 3 years from 2022/1/1.
dur_offset = (policy_term + 3) * 12 * rng.random(size=point_size) - 36
mth_offset = (dur_offset // 1).astype(int)
day_offset = 1 - (dur_offset - mth_offset)
issue_mth = pd.Series([pd.Period('2022-01', 'M')] * point_size) -1 - mth_offset
day_offset = (np.ceil(issue_mth.dt.days_in_month * day_offset)).astype(int)
issue_date = (issue_mth.dt.to_timestamp().dt.to_period('D') -1 + day_offset).dt.to_timestamp().to_numpy()
# Policy Count (Integer):
policy_count = np.rint(100 * rng.random(size=point_size)).astype(int)
# Payment Frequency
payment_freq = rng.choice([1, 2, 12], size=point_size)
# Premium Payment Term
short_paidup = pd.Series(rng.choice([True, False], size=point_size))
polterm = pd.Series(policy_term)
payment_term = polterm.mask(short_paidup & (polterm==10), 5).mask(short_paidup & (polterm>=15), 10).to_numpy()
[2]:
import pandas as pd
attrs = [
"age_at_entry",
"sex",
"policy_term",
"policy_count",
"sum_assured",
"issue_date",
"payment_freq",
"payment_term"
]
data = [
age_at_entry,
sex,
policy_term,
policy_count,
sum_assured,
issue_date,
payment_freq,
payment_term
]
model_point_table = pd.DataFrame(dict(zip(attrs, data)), index=range(1, point_size+1))
model_point_table.index.name = "policy_id"
model_point_table
[2]:
age_at_entry | sex | policy_term | policy_count | sum_assured | issue_date | payment_freq | payment_term | |
---|---|---|---|---|---|---|---|---|
policy_id | ||||||||
1 | 47 | M | 10 | 86 | 622000.0 | 2021-12-15 | 1 | 5 |
2 | 29 | M | 20 | 56 | 752000.0 | 2004-07-02 | 2 | 20 |
3 | 51 | F | 10 | 83 | 799000.0 | 2020-10-02 | 12 | 10 |
4 | 32 | F | 20 | 72 | 422000.0 | 2011-08-05 | 1 | 10 |
5 | 28 | M | 15 | 99 | 605000.0 | 2017-05-22 | 2 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
9996 | 47 | M | 20 | 25 | 827000.0 | 2008-12-01 | 1 | 10 |
9997 | 30 | M | 15 | 81 | 826000.0 | 2008-01-13 | 1 | 15 |
9998 | 45 | F | 20 | 10 | 783000.0 | 2009-11-07 | 2 | 10 |
9999 | 39 | M | 20 | 9 | 302000.0 | 2021-01-22 | 12 | 10 |
10000 | 22 | F | 15 | 18 | 576000.0 | 2008-03-16 | 2 | 10 |
10000 rows × 8 columns
[3]:
model_point_table.to_excel("model_point_table.xlsx")