Tokenizer usage examples
The Tokenizer class transforms arbitrary inputs into integer classes.
[1]:
from nfp.preprocessing import Tokenizer
[2]:
tokenizer = Tokenizer()
tokenizer.train = True
Classes 0 and 1 are reserved for the <MASK> and missing labels, respectively, so the first item seen is assigned class 2.
[3]:
[tokenizer(item) for item in ['A', 'B', 'C', 'A']]
[3]:
[2, 3, 4, 2]
When train is set to False, unknown items are assigned the missing label (class 1).
[4]:
tokenizer.train = False
[tokenizer(item) for item in ['A', 'D']]
[4]:
[2, 1]
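The behaviour shown so far can be mimicked with a small dictionary-backed class. The sketch below is not nfp's implementation: `SketchTokenizer` and its attributes are hypothetical names chosen for illustration, and it assumes only the conventions described above (0 for the mask, 1 for missing items, new items numbered from 2 while `train` is True).

```python
class SketchTokenizer:
    """Hypothetical stand-in that mirrors the behaviour described above."""

    def __init__(self):
        # Class 0 is never assigned here (reserved for the mask);
        # class 1 marks items that were not seen during training.
        self.missing_label = 1
        self._classes = {}
        self.train = True

    def __call__(self, item):
        if item in self._classes:
            return self._classes[item]
        if self.train:
            # The first unseen item gets 2, the next 3, and so on.
            label = len(self._classes) + 2
            self._classes[item] = label
            return label
        # Vocabulary is frozen: fall back to the missing label.
        return self.missing_label


sketch = SketchTokenizer()
train_labels = [sketch(item) for item in ['A', 'B', 'C', 'A']]

sketch.train = False
test_labels = [sketch(item) for item in ['A', 'D']]

print(train_labels, test_labels)
```

Freezing the vocabulary before processing held-out data keeps the class indices stable, so items that first appear at test time cannot shift the meaning of existing indices.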
The total number of seen classes is available from the num_classes property, which is useful for initializing embedding layer weights.
[5]:
tokenizer.num_classes
[5]:
4
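As a rough illustration of the embedding use case, the sketch below builds a plain-Python lookup table. It assumes, based on the outputs above, that the reported value 4 is also the largest class index, so the table needs one extra row to cover index 0. `embedding_dim` and `table` are hypothetical names, zeroing the mask row is an assumed convention, and a real model would use its framework's embedding layer instead.

```python
import random

num_classes = 4      # value reported by the tokenizer above
embedding_dim = 3    # hypothetical embedding width

random.seed(0)

# One row per index 0..num_classes: mask, missing, then 'A', 'B', 'C'.
table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in range(num_classes + 1)
]

# Assumed convention: the mask class maps to an all-zero vector.
table[0] = [0.0] * embedding_dim

# Look up the vectors for the token sequence produced earlier.
tokens = [2, 3, 4, 2]
vectors = [table[token] for token in tokens]
```

Both occurrences of 'A' (class 2) resolve to the same row, which is what lets a model share learned weights across repeated items.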