Tokenizer usage examples
The Tokenizer class maps arbitrary inputs to integer class labels.
[1]:
from nfp.preprocessing import Tokenizer
[2]:
tokenizer = Tokenizer()
tokenizer.train = True
Classes 0 and 1 are reserved for the <MASK> and missing labels, respectively, so the first item seen is assigned class 2.
[3]:
[tokenizer(item) for item in ['A', 'B', 'C', 'A']]
[3]:
[2, 3, 4, 2]
When train is set to False, unknown items are assigned the missing label (1).
[4]:
tokenizer.train = False
[tokenizer(item) for item in ['A', 'D']]
[4]:
[2, 1]
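The behavior shown in the cells above can be reproduced with a small stand-in class. This is only a sketch for illustration: SketchTokenizer and its internals are hypothetical and are not taken from nfp's source. It assumes items are hashable so they can be used as dictionary keys.

```python
class SketchTokenizer:
    """Hypothetical stand-in that mimics the outputs shown above.

    This is NOT nfp's implementation, only an illustration of the idea.
    """

    MASK = 0     # reserved for <MASK>
    MISSING = 1  # reserved for items not seen during training

    def __init__(self):
        self.train = True
        self._labels = {}      # item -> integer label
        self.num_classes = 1   # highest label handed out so far

    def __call__(self, item):
        # Known items always return their stored label.
        if item in self._labels:
            return self._labels[item]
        # Unknown items fall back to the missing label outside training.
        if not self.train:
            return self.MISSING
        # In training mode, assign the next free label.
        self.num_classes += 1
        self._labels[item] = self.num_classes
        return self.num_classes


sketch = SketchTokenizer()
train_labels = [sketch(item) for item in ['A', 'B', 'C', 'A']]

sketch.train = False
test_labels = [sketch(item) for item in ['A', 'D']]

print(train_labels, test_labels, sketch.num_classes)
```

Freezing the vocabulary with train = False before tokenizing validation or test data keeps unseen items from receiving new labels that a trained model has no weights for.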
The total number of seen classes is available from the num_classes property, which is useful for sizing and initializing embedding layer weights. In this example it equals 4, the highest label assigned so far.
[5]:
tokenizer.num_classes
[5]:
4
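As a rough illustration of how this value relates to an embedding table, the sketch below builds a plain-Python lookup table. The embedding width, the random initialization, and the zeroed <MASK> row are all assumptions made for the example. Whether a given embedding layer expects num_classes or num_classes + 1 as its input size is not stated here; the sketch only shows that, since labels run from 0 to 4, a table indexed directly by label needs 5 rows.

```python
import random

num_classes = 4    # value reported by the tokenizer above
embedding_dim = 3  # hypothetical embedding width

random.seed(0)

# Labels run from 0 to num_classes inclusive, so indexing the table
# directly by label requires num_classes + 1 rows.
table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(num_classes + 1)
]

# Assumption: keep the <MASK> row (label 0) at zero.
table[0] = [0.0] * embedding_dim

# Look up vectors for the labels produced in the training example.
labels = [2, 3, 4, 2]
vectors = [table[label] for label in labels]

print(len(table), len(vectors))
```

Both occurrences of 'A' (label 2) resolve to the same row, which is what lets a model share one learned vector per class.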