PFA Document Structure

A PFA document is a JSON/YAML document with additional constraints. The JSON/YAML content describes algorithms, data types, model parameters, and other aspects of the scoring engine. Some structures have no effect on the scoring procedure and are only intended for archival purposes.

Read the Full Specification here

In [1]:
from titus.genpy import PFAEngine

input, output and action

YAML

In [2]:
pfa_yml = """
input: int
output: int
action: input
"""

engine, = PFAEngine.fromYaml(pfa_yml)
engine.action(1)
Out[2]:
1

JSON

In [3]:
pfa_json = """
{
    "input": "int",
    "output": "int",
    "action": "input"
}
"""

engine, = PFAEngine.fromJson(pfa_json)
engine.action(1)
Out[3]:
1
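
Both cells above use only the three fields named in this section's heading. The sketch below parses the JSON form with the standard library and reports which of those fields are missing. The helper name missing_required_fields is hypothetical and not part of titus, and treating exactly these three fields as mandatory is an assumption drawn from the heading.

```python
import json

# Minimal PFA document, copied from the JSON cell above.
pfa_json = """
{
    "input": "int",
    "output": "int",
    "action": "input"
}
"""

# Assumption: these are the top-level fields every document must carry.
REQUIRED_FIELDS = ("input", "output", "action")


def missing_required_fields(document_text):
    """Return the required top-level fields absent from a JSON PFA document."""
    document = json.loads(document_text)
    return [field for field in REQUIRED_FIELDS if field not in document]


print(missing_required_fields(pfa_json))            # []
print(missing_required_fields('{"input": "int"}'))  # ['output', 'action']
```

A check like this only inspects the top level; it does not validate types or expressions, which PFAEngine.fromJson does when it builds the engine.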

method, zero and merge

PFA supports three methods:

  • map
  • emit
  • fold

1. Map

The map method is simply a mathematical function: one input yields one output.

YAML

In [4]:
pfa_yml = """
input: double
output: double
method: map
action:
  - {m.sqrt: input}"""

engine, = PFAEngine.fromYaml(pfa_yml)
print(engine.action(2.0))
1.4142135623730951

JSON

In [5]:
pfa_json = """
{
    "input": "double",
    "output": "double",
    "method": "map",
    "action": [
        {"m.sqrt": "input"}
    ]
}
"""
engine, = PFAEngine.fromJson(pfa_json)
print(engine.action(2.0))
1.4142135623730951
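
As a plain-Python analogy (not titus code), a map engine behaves like a pure function applied to each record of a stream: the number of outputs always equals the number of inputs. The function name sqrt_action is hypothetical.

```python
import math


def sqrt_action(x):
    # Stand-in for the {"m.sqrt": "input"} action above.
    return math.sqrt(x)


inputs = [1.0, 2.0, 4.0]
outputs = [sqrt_action(x) for x in inputs]

print(outputs)                      # [1.0, 1.4142135623730951, 2.0]
print(len(outputs) == len(inputs))  # True
```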

2. Emit

Of the three types of PFA scoring engine (map, emit, and fold), emit requires special attention in scoring. Map and fold engines yield results as the return value of engine.action (and fold does so cumulatively), but emit engines always return None.

The only way to get results from an emit engine is to supply a callback function, which the examples below assign to engine.emit.

YAML

In [6]:
engine, = PFAEngine.fromYaml('''
input: double
output: double
method: emit
action:
    - if:
        ==: [{"%": [input, 2]}, 0]
      then:
        - emit: input
        - emit: {/: [input, 2]}
''')
 
def newEmit(x):
    print("output:", x)

engine.emit = newEmit
 
for x in range(1, 6):
    print("input:", x)
    engine.action(x)
input: 1
input: 2
output: 2.0
output: 1.0
input: 3
input: 4
output: 4.0
output: 2.0
input: 5

JSON

In [7]:
engine, = PFAEngine.fromJson('''
{
    "input": "double",
    "output": "double",
    "method": "emit",
    "action": [{
        "if": {
            "==": [{
                "%": ["input", 2]
            }, 0]
        },
        "then": [{
            "emit": "input"
        }, {
            "emit": {
                "/": ["input", 2]
            }
        }]
    }]
}
''')

def newEmit(x):
    print("output:", x)

engine.emit = newEmit
 
for x in range(1, 6):
    print("input:", x)
    engine.action(x)
input: 1
input: 2
output: 2.0
output: 1.0
input: 3
input: 4
output: 4.0
output: 2.0
input: 5
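
The callback does not have to print. The sketch below models the same even-number logic in plain Python and collects the emitted values in a list instead. EmitEngineSketch is a hypothetical stand-in written for this illustration, not a titus class; with a real engine the equivalent step would be assigning a list's append method to engine.emit.

```python
class EmitEngineSketch:
    """Plain-Python stand-in for an emit engine (hypothetical, not titus)."""

    def __init__(self):
        # In this sketch the default callback silently discards results.
        self.emit = lambda value: None

    def action(self, x):
        # Mirrors the PFA action above: even inputs emit twice, odd inputs emit nothing.
        if x % 2 == 0:
            self.emit(float(x))
            self.emit(x / 2)
        # No return statement, so every call evaluates to None.


collected = []
engine_sketch = EmitEngineSketch()
engine_sketch.emit = collected.append  # collect results instead of printing them

returned = [engine_sketch.action(x) for x in range(1, 6)]
print(returned)   # [None, None, None, None, None]
print(collected)  # [2.0, 1.0, 4.0, 2.0]
```

One input can therefore produce zero, one, or several outputs, which is what distinguishes emit from map.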

3. Fold

The fold method is for aggregation. Rather than waiting until the end of the (potentially infinite) dataset, folding engines return a partial result with each call. The previous partial result is available to the next action as the symbol tally, which starts at the value given in zero. If you are only interested in the total, ignore all but the last output.

In [8]:
engine, = PFAEngine.fromYaml('''
input: double
output: double
method: fold
zero: 0
action:
  - {"-": [input, tally]}
merge:
  - {"+": [tallyOne, tallyTwo]}
''')

print(engine.action(1))  # 1 - 0 = 1; tally is now 1
print(engine.action(2))  # 2 - 1 = 1; tally is now 1
print(engine.action(3))  # 3 - 1 = 2; tally is now 2
print(engine.action(4))  # 4 - 2 = 2; tally is now 2
print(engine.action(5))  # 5 - 2 = 3; tally is now 3
1.0
1.0
2.0
2.0
3.0
In [9]:
engine, = PFAEngine.fromYaml('''
input: int
output: string
method: fold
zero: ""
action:
  - {s.concat: [tally, {s.int: input}]}
merge:
  - {s.concat: [tallyOne, tallyTwo]}
''')

print(engine.action(1))
print(engine.action(2))
print(engine.action(3))
print(engine.action(4))
print(engine.action(5))
1
12
123
1234
12345

The zero and merge sections are required for fold engines, and must not be present in map or emit engines.
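
To make the roles of zero, tally and merge concrete, here is a plain-Python model of a fold engine. FoldEngineSketch and merge_with are hypothetical names invented for this illustration, not titus API. The cells above never invoke merge, so the second half of the sketch rests on an assumption: that merge exists to combine tallies accumulated by separate engine instances.

```python
class FoldEngineSketch:
    """Plain-Python stand-in for a fold engine (hypothetical, not titus)."""

    def __init__(self, zero, action, merge):
        self.tally = zero       # 'zero' is the starting value of the tally
        self._action = action   # action(input, tally) -> new tally
        self._merge = merge     # merge(tallyOne, tallyTwo) -> combined tally

    def action(self, x):
        # Each call returns the partial result and stores it as the next tally.
        self.tally = self._action(x, self.tally)
        return self.tally

    def merge_with(self, other):
        # Combine two independently accumulated tallies.
        return self._merge(self.tally, other.tally)


# Same arithmetic as the first fold cell: action is input - tally, zero is 0.
diff = FoldEngineSketch(0.0, lambda x, tally: x - tally, lambda a, b: a + b)
partials = [diff.action(x) for x in [1, 2, 3, 4, 5]]
print(partials)  # [1.0, 1.0, 2.0, 2.0, 3.0]

# A running sum split across two engines, then merged with +.
add = lambda x, tally: x + tally
left = FoldEngineSketch(0, add, add)
right = FoldEngineSketch(0, add, add)
for x in [1, 2, 3]:
    left.action(x)
for x in [4, 5]:
    right.action(x)
merged_total = left.merge_with(right)
print(merged_total)  # 15
```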

begin, end, fcns, randseed

In [10]:
pfa = """
{
    "input": "string",
    "output": {"type": "array", "items": "string"},
    "cells": {
       "accumulate": {"type": {"type": "array", "items": "string"},
                      "init": []}},
    "method": "map",
    "begin":
       {"log": {"rand.gaussian": [0.0, 1.0]}},
    "action":
       {"cell": "accumulate",
        "to": {"fcn": "u.addone", "fill": {"newitem": "input"}}},
    "end":
       {"log": {"rand.choice": {"cell": "accumulate"}}},
    "fcns":
       {"addone":
         {"params": [{"old": {"type": "array", "items": "string"}},
                     {"newitem": "string"}],
          "ret": {"type": "array", "items": "string"},
          "do": {"a.append": ["old", "newitem"]}}},
    "randseed": 12345,
    "name": "ExampleScoringEngine",
    "version": 1,
    "doc": "Doesn't do much.",
    "metadata": {"does": "notmuch"},
    "options": {"timeout": 1000}
}
"""

engine, = PFAEngine.fromJson(pfa)
engine.action("abc")
Out[10]:
['abc']

Fibonacci in PFA (Recursion)

In [11]:
pfa = """
{
    "input": "int",
    "output": "int",
    "method": "map",
    "action": [{"u.fib": ["input"]}],
    "fcns": 
    {
        "fib": 
        {
            "params": [{"n": "int"}], 
            "ret": "int", 
            "do":
            {
              "cond":[ 
              {"if": {"==": ["n", 0]}, "then": 0},
              {"if": {"==": ["n", 1]}, "then": 1}],
              "else": {"+": [
                {"u.fib": [{"-": ["n", 1]}]},
                {"u.fib": [{"-": ["n", 2]}]}
              ]}
            }
        }
    }
}
"""

engine, = PFAEngine.fromJson(pfa)
In [12]:
engine.action(12)
Out[12]:
144

Fibonacci in PFA (Loops)

In [13]:
pfa = """
{
"input": "int",
"output": "int",
"method": "map",
"action": [{"u.fib": ["input"]}],
"fcns": {
    "fib": {
        "params": [{"n": "int"}], 
        "ret": "int", 
        "do": [
                {"let": {"now": 0, "next": 1}},
                {"for": {"i": "n"}, 
                "while": {">": ["i", 1]}, 
                "step": {"i": {"-": ["i", 1]}},
                "do": 
                    [
                    {"let": {"tmp": {"+": ["now", "next"]}}},
                    {"set": {"now": "next",
                            "next": "tmp"}}
                    ]
                 },
                {"if": {"==": ["n", 0]}, "then": 0, "else": "next"}
            ]
        }
    }
}
"""

engine, = PFAEngine.fromJson(pfa)
In [14]:
engine.action(12)
Out[14]:
144
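
For readers more familiar with Python than with PFA's JSON syntax, the two user-defined functions above correspond roughly to the following plain-Python versions. This is a reading aid rather than titus code; the names fib_recursive and fib_loop are chosen for this sketch, and nxt replaces next only to avoid shadowing a Python built-in.

```python
def fib_recursive(n):
    # Mirrors the "cond"/"else" form: two base cases, otherwise recurse.
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_loop(n):
    # Mirrors the "for"/"while"/"step" form: i starts at n and counts down while i > 1.
    now, nxt = 0, 1
    i = n
    while i > 1:
        tmp = now + nxt
        now, nxt = nxt, tmp
        i -= 1
    # Same final expression as the PFA version: 0 for n == 0, otherwise 'next'.
    return 0 if n == 0 else nxt


print(fib_recursive(12), fib_loop(12))  # 144 144
```

The loop version does linear work, while the recursive version recomputes the same subproblems many times.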

begin and end

In some cases, you may want to perform special actions at the beginning and end of a data stream.

PFA has begin and end routines for this purpose.

The begin and end routines do not accept input and do not return output; they only manipulate persistent storage.

In [15]:
engine.begin()
engine.end()
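
The lifecycle described in this section can be modelled in plain Python as follows. LifecycleSketch is a hypothetical class written for this illustration, not part of titus. Its cells dictionary stands in for PFA persistent storage and its log list stands in for PFA log output.

```python
class LifecycleSketch:
    """Plain-Python stand-in for begin/action/end (hypothetical, not titus)."""

    def __init__(self):
        self.cells = {"accumulate": []}  # persistent storage shared by all routines
        self.log = []                    # stands in for PFA's log output

    def begin(self):
        # Takes no input and returns nothing; only records that the stream started.
        self.log.append("begin")

    def action(self, item):
        # Stores a new list in the cell and returns the updated contents.
        self.cells["accumulate"] = self.cells["accumulate"] + [item]
        return self.cells["accumulate"]

    def end(self):
        # Takes no input and returns nothing; reads persistent storage one last time.
        self.log.append("end: %d items" % len(self.cells["accumulate"]))


sketch = LifecycleSketch()
begin_result = sketch.begin()
outputs = [sketch.action(s) for s in ["abc", "def"]]
end_result = sketch.end()

print(begin_result, end_result)  # None None
print(outputs)                   # [['abc'], ['abc', 'def']]
print(sketch.log)                # ['begin', 'end: 2 items']
```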

Locator Marks & Names

Any JSON object in a PFA document may include "@" as a string-valued field. This string is used to provide a line number from the original source file so that errors can be traced back to their source.
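
As an illustration, the sketch below walks a parsed document and lists every locator mark together with the position of the object that carries it. The "@" values shown are made up for this example, and find_locators is a hypothetical helper, not part of titus.

```python
import json

# Illustrative fragment: the "@" strings are invented for this example.
annotated = json.loads("""
{
    "@": "line 1",
    "input": "double",
    "output": "double",
    "action": [{"@": "line 4", "m.sqrt": "input"}]
}
""")


def find_locators(node, path="$"):
    """Yield (path, mark) pairs for every "@" field in a parsed document."""
    if isinstance(node, dict):
        if "@" in node:
            yield path, node["@"]
        for key, value in node.items():
            if key != "@":
                yield from find_locators(value, path + "." + key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_locators(item, "%s[%d]" % (path, index))


locators = list(find_locators(annotated))
print(locators)  # [('$', 'line 1'), ('$.action[0]', 'line 4')]
```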

Following Avro convention, PFA identifier names start with [A-Za-z_] and subsequently contain only [A-Za-z0-9_].
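
The naming rule translates directly into a regular expression. The helper is_valid_name below is hypothetical and checks a single bare name such as fib or tally; how dotted references such as u.fib are treated is outside the scope of this sketch.

```python
import re

# First character from [A-Za-z_], every later character from [A-Za-z0-9_].
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    """Return True if the whole string matches the identifier rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["tally", "_x1", "1abc", "a-b", ""]:
    print(repr(candidate), is_valid_name(candidate))
```

Using fullmatch rather than match matters here: match would accept "a-b" because its first character alone satisfies the pattern.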