Format of a PFA document
A PFA document is a JSON document with additional constraints. The JSON content describes algorithms, data types, model parameters, and other aspects of the scoring engine. Some structures have no effect on the scoring procedure and are only intended for archival purposes.
The outermost structure of a PFA document is a JSON object, a collection of key-value pairs surrounded by curly brackets. We refer to these as the top-level fields: the field name (key) determines the allowed content (value). Some of those contents are recursively defined, so a PFA document may have arbitrary depth.
The following are all of the allowed top-level fields.
Name | Default | Constraints | Description |
---|---|---|---|
input |
required | Avro type | Input data type. Input data that do not conform to this schema should be rejected before sending it to PFA. |
output |
required | Avro type | Output data type. A valid PFA file will always produce data of this type (or an exception). |
begin |
[] | expressions | Algorithm to be executed before evaluating any data. See tutorial. |
action |
required | expressions | Algorithm to be executed for each input datum. See tutorial. |
end |
[] | expressions | Algorithm to be executed after evaluating all data (if such a time exists). See tutorial. |
fcns |
JSON object of functions | Named user-defined functions that can be called in any expression (including other functions). | |
cells |
JSON object of cells | Persistent storage containing model parameters, intermediate results, etc. See tutorial. | |
pools |
JSON object of pools | Like cells, but items can be created and destroyed at runtime and concurrent access extends only to a pool item, not the entire pool. See tutorial. | |
method |
"map" | "map", "emit", or "fold" | How results from the scoring engine will be served to the environment. See tutorial. |
zero |
required when method is "fold" | JSON matching type output | The starting value for a "fold" tally. See tutorial. |
merge |
required when method is "fold" | expressions | Algorithm to combine two partial tallies. |
randseed |
optional | integer | Global seed used to generate all random numbers. Multiple scoring engines derived from the same PFA file have different seeds generated from the global one. |
name |
none | name | An optional name for the scoring engine. |
version |
optional | integer | Sequential version number for the model. |
doc |
optional | string | Documentation string for archival purposes. |
metadata |
JSON object of strings | Computer-readable documentation for archival purposes. | |
options |
JSON object; value types depend on option name | Initialization or runtime options to customize implementation (e.g. optimization switches). May be overridden or ignored by PFA consumer. |
Example 1¶
The smallest possible PFA document (by number of bytes) is
1 2 3 4 5 |
|
Example 2¶
The following example uses every kind of top-level field except pools, zero, and merge (which are similar to other fields and/or are only applicable to “fold” engines).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
Source