PFA Document Structure

A PFA document is a JSON/YAML document with additional constraints. The JSON/YAML content describes algorithms, data types, model parameters, and other aspects of the scoring engine. Some structures have no effect on the scoring procedure and are only intended for archival purposes.

Read the Full Specification here

In [1]:

from titus.genpy import PFAEngine

input, output and action

YAML

In [2]:

pfa_yml = """
input: int
output: int
action: input
"""

engine, = PFAEngine.fromYaml(pfa_yml)
engine.action(1)

Out[2]:

1

JSON

In [3]:

pfa_json = """
{
    "input": "int",
    "output": "int",
    "action": "input"
}
"""

engine, = PFAEngine.fromJson(pfa_json)
engine.action(1)

Out[3]:

1

method, zero and merge

A PFA engine uses one of three methods:

map
emit
fold

1. Map

The map method is simply a mathematical function: one input yields one output.

YAML

In [4]:

pfa_yml = """
input: double
output: double
method: map
action:
  - {m.sqrt: input}"""

engine, = PFAEngine.fromYaml(pfa_yml)
print(engine.action(2.0))

1.4142135623730951

JSON

In [5]:

pfa_json = """
{
    "input": "double",
    "output": "double",
    "method": "map",
    "action": [
        {"m.sqrt": "input"}
    ]
}
"""
engine, = PFAEngine.fromJson(pfa_json)
print(engine.action(2.0))

1.4142135623730951

2. Emit

Of the three types of PFA scoring engine (map, emit, and fold), emit requires special attention in scoring. Map and fold engines yield results as the return value of the action function (fold does so cumulatively), but emit engines always return None.

The only way to get results from an emit engine is to assign a callback function to engine.emit; the engine calls it once for every emitted value.

YAML

In [6]:

engine, = PFAEngine.fromYaml('''
input: double
output: double
method: emit
action:
    - if:
        ==: [{"%": [input, 2]}, 0]
      then:
        - emit: input
        - emit: {/: [input, 2]}
''')
 
def newEmit(x):
    print("output:", x)

engine.emit = newEmit
 
for x in range(1, 6):
    print("input:", x)
    engine.action(x)

input: 1
input: 2
output: 2.0
output: 1.0
input: 3
input: 4
output: 4.0
output: 2.0
input: 5

JSON

In [7]:

engine, = PFAEngine.fromJson('''
{
    "input": "double",
    "output": "double",
    "method": "emit",
    "action": [{
        "if": {
            "==": [{
                "%": ["input", 2]
            }, 0]
        },
        "then": [{
            "emit": "input"
        }, {
            "emit": {
                "/": ["input", 2]
            }
        }]
    }]
}
''')

def newEmit(x):
    print("output:", x)

engine.emit = newEmit
 
for x in range(1, 6):
    print("input:", x)
    engine.action(x)

input: 1
input: 2
output: 2.0
output: 1.0
input: 3
input: 4
output: 4.0
output: 2.0
input: 5
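The callback mechanism can also be sketched without Titus. The class below is a hypothetical plain-Python stand-in (EmitSketch is not part of Titus or PFA) that hard-codes the same rule as the documents above: even inputs emit the value and then half of it, odd inputs emit nothing. It illustrates why collecting results requires swapping in a callback, here the append method of a list.

```python
from typing import Callable, List


class EmitSketch:
    """Hypothetical stand-in for an emit-method engine; not the Titus API."""

    def __init__(self) -> None:
        # The default callback discards values, so action() seems to do nothing.
        self.emit: Callable[[float], None] = lambda value: None

    def action(self, x: float) -> None:
        # Same rule as the PFA documents above: only even inputs emit.
        if x % 2 == 0:
            self.emit(float(x))
            self.emit(x / 2)
        return None  # emit engines never return a result directly


collected: List[float] = []
sketch = EmitSketch()
sketch.emit = collected.append  # gather emitted values instead of printing

for x in range(1, 6):
    sketch.action(x)

print(collected)  # [2.0, 1.0, 4.0, 2.0]
```

Assigning a list's append method keeps the outputs available for later processing, whereas a printing callback only displays them.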

3. Fold

The fold method is for aggregation. Rather than waiting until the end of the (potentially infinite) dataset, fold engines return a partial result with each call. The previous partial result is available to the next action as the symbol tally. If you are only interested in the total, ignore all but the last output.

In [8]:

engine, = PFAEngine.fromYaml('''
input: double
output: double
method: fold
zero: 0
action:
  - {"-": [input, tally]}
merge:
  - {"+": [tallyOne, tallyTwo]}
''')

print(engine.action(1))  # 1 - 0 = 1; tally is now 1
print(engine.action(2))  # 2 - 1 = 1; tally is now 1
print(engine.action(3))  # 3 - 1 = 2; tally is now 2
print(engine.action(4))  # 4 - 2 = 2; tally is now 2
print(engine.action(5))  # 5 - 2 = 3; tally is now 3

1.0
1.0
2.0
2.0
3.0
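To make the roles of zero, action, and merge concrete, here is a minimal plain-Python model of the fold engine above. FoldSketch and make_engine are hypothetical names used only for illustration; the sketch assumes that merge combines the tallies of two independently run engines, as the symbols tallyOne and tallyTwo suggest.

```python
class FoldSketch:
    """Hypothetical plain-Python model of a fold engine; not the Titus API."""

    def __init__(self, zero, action, merge):
        self.tally = zero          # "zero" is the starting tally
        self._action = action      # computes the next tally from (input, tally)
        self._merge = merge        # combines two tallies

    def action(self, x):
        # Each call returns the new partial result and stores it as the tally.
        self.tally = self._action(x, self.tally)
        return self.tally

    def merge(self, other_tally):
        self.tally = self._merge(self.tally, other_tally)
        return self.tally


def make_engine():
    # Same rules as the YAML above: action is input - tally, merge is a sum.
    return FoldSketch(0.0, lambda x, tally: x - tally, lambda a, b: a + b)


engine_sketch = make_engine()
outputs = [engine_sketch.action(float(x)) for x in range(1, 6)]
print(outputs)  # [1.0, 1.0, 2.0, 2.0, 3.0]

# Split the same stream across two engines, then merge their tallies.
left, right = make_engine(), make_engine()
for x in (1.0, 2.0):
    left.action(x)
for x in (3.0, 4.0, 5.0):
    right.action(x)
merged = left.merge(right.tally)
print(merged)  # 5.0
```

In this sketch the merged value (5.0) differs from the single-stream result (3.0) because subtracting the tally is order-dependent; a merge only reproduces the single-stream total when action and merge are chosen to be compatible, as with a plain sum.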

In [9]:

engine, = PFAEngine.fromYaml('''
input: int
output: string
method: fold
zero: ""
action:
  - {s.concat: [tally, {s.int: input}]}
merge:
  - {s.concat: [tallyOne, tallyTwo]}
''')

print(engine.action(1))
print(engine.action(2))
print(engine.action(3))
print(engine.action(4))
print(engine.action(5))

The zero and merge sections are required for fold engines, and must not be present in map or emit engines.
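The rule above can be checked on a parsed document before it is handed to an engine. The helper below is a hypothetical sketch (check_method_sections is not a Titus function) that works on a plain dict loaded with the standard json module. It assumes that a document without a method field behaves as map, as in the first example of this notebook.

```python
import json


def check_method_sections(doc):
    """Return a list of problems with zero/merge for the document's method."""
    # Assumption: a missing "method" is treated as "map".
    method = doc.get("method", "map")
    present = [key for key in ("zero", "merge") if key in doc]
    missing = [key for key in ("zero", "merge") if key not in doc]
    if method == "fold":
        return [f"fold engine is missing '{key}'" for key in missing]
    return [f"{method} engine must not define '{key}'" for key in present]


fold_doc = json.loads("""
{"input": "double", "output": "double", "method": "fold",
 "zero": 0, "action": {"+": ["input", "tally"]}}
""")
map_doc = json.loads("""
{"input": "int", "output": "int", "method": "map",
 "zero": 0, "action": "input"}
""")

print(check_method_sections(fold_doc))  # ["fold engine is missing 'merge'"]
print(check_method_sections(map_doc))   # ["map engine must not define 'zero'"]
```

This only inspects top-level keys; it does not replace the validation that the engine itself performs when the document is loaded.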

begin, end, fcns, randseed

In [10]:

pfa = """
{
    "input": "string",
    "output": {"type": "array", "items": "string"},
    "cells": {
       "accumulate": {"type": {"type": "array", "items": "string"},
                      "init": []}},
    "method": "map",
    "begin":
       {"log": {"rand.gaussian": [0.0, 1.0]}},
    "action":
       {"cell": "accumulate",
        "to": {"fcn": "u.addone", "fill": {"newitem": "input"}}},
    "end":
       {"log": {"rand.choice": {"cell": "accumulate"}}},
    "fcns":
       {"addone":
         {"params": [{"old": {"type": "array", "items": "string"}},
                     {"newitem": "string"}],
          "ret": {"type": "array", "items": "string"},
          "do": {"a.append": ["old", "newitem"]}}},
    "randseed": 12345,
    "name": "ExampleScoringEngine",
    "version": 1,
    "doc": "Doesn't do much.",
    "metadata": {"does": "notmuch"},
    "options": {"timeout": 1000}
}
"""

engine, = PFAEngine.fromJson(pfa)
engine.action("abc")

Out[10]:

['abc']

Fibonacci in PFA (Recursion)

In [11]:

pfa = """
{
    "input": "int",
    "output": "int",
    "method": "map",
    "action": [{"u.fib": ["input"]}],
    "fcns": 
    {
        "fib": 
        {
            "params": [{"n": "int"}], 
            "ret": "int", 
            "do":
            {
              "cond":[ 
              {"if": {"==": ["n", 0]}, "then": 0},
              {"if": {"==": ["n", 1]}, "then": 1}],
              "else": {"+": [
                {"u.fib": [{"-": ["n", 1]}]},
                {"u.fib": [{"-": ["n", 2]}]}
              ]}
            }
        }
    }
}
"""

engine, = PFAEngine.fromJson(pfa)

In [12]:

engine.action(12)

Out[12]:

144

Fibonacci in PFA (Loops)

In [13]:

pfa = """
{
"input": "int",
"output": "int",
"method": "map",
"action": [{"u.fib": ["input"]}],
"fcns": {
    "fib": {
        "params": [{"n": "int"}], 
        "ret": "int", 
        "do": [
                {"let": {"now": 0, "next": 1}},
                {"for": {"i": "n"}, 
                "while": {">": ["i", 1]}, 
                "step": {"i": {"-": ["i", 1]}},
                "do": 
                    [
                    {"let": {"tmp": {"+": ["now", "next"]}}},
                    {"set": {"now": "next",
                            "next": "tmp"}}
                    ]
                 },
                {"if": {"==": ["n", 0]}, "then": 0, "else": "next"}
            ]
        }
    }
}
"""

engine, = PFAEngine.fromJson(pfa)
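The for/while/step form can be hard to read in JSON, so here is a line-by-line plain-Python mirror of the u.fib function defined above. It is only a reading aid, not something Titus generates; the name fib_loop and the variable nxt (used instead of next, which is a Python builtin) are choices made for this sketch.

```python
def fib_loop(n: int) -> int:
    """Plain-Python mirror of the loop-based PFA function."""
    now, nxt = 0, 1          # {"let": {"now": 0, "next": 1}}
    i = n                    # {"for": {"i": "n"}, ...
    while i > 1:             #  "while": {">": ["i", 1]}
        tmp = now + nxt      #  {"let": {"tmp": {"+": ["now", "next"]}}}
        now, nxt = nxt, tmp  #  {"set": {"now": "next", "next": "tmp"}}
        i -= 1               #  "step": {"i": {"-": ["i", 1]}}
    # {"if": {"==": ["n", 0]}, "then": 0, "else": "next"}
    return 0 if n == 0 else nxt


first_values = [fib_loop(n) for n in range(8)]
print(first_values)   # [0, 1, 1, 2, 3, 5, 8, 13]
print(fib_loop(12))   # 144
```

The loop runs n - 1 times, which is why the final conditional is needed: for n equal to 0 the loop never runs and next would otherwise be returned as 1.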

In [14]:

engine.action(12)

Out[14]:

144

begin and end

In some cases, you may want to perform special actions at the beginning and end of a data stream.

PFA has begin and end routines for this purpose.

The begin and end routines do not accept input and do not return output; they only manipulate persistent storage.

In [15]:

engine.begin()
engine.end()

Locator Marks & Names

Any JSON object in a PFA document may include "@" as a string-valued field. This string is used to provide a line number from the original source file so that errors can be traced back to their source.
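As an illustration, the sketch below walks a parsed document and lists every locator mark together with its position in the JSON tree. The helper collect_locators and the mark strings ("example.pfa line 1" and so on) are made up for this example; the text above does not prescribe a format for the string.

```python
import json


def collect_locators(node, path="$"):
    """Return (json_path, mark) pairs for every object carrying an "@" field."""
    found = []
    if isinstance(node, dict):
        if isinstance(node.get("@"), str):
            found.append((path, node["@"]))
        for key, value in node.items():
            if key != "@":
                found.extend(collect_locators(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(collect_locators(item, f"{path}[{index}]"))
    return found


# Hypothetical document; the "@" values are invented for illustration.
doc = json.loads("""
{
    "@": "example.pfa line 1",
    "input": "double",
    "output": "double",
    "action": [{"@": "example.pfa line 4", "m.sqrt": "input"}]
}
""")

locators = collect_locators(doc)
for json_path, mark in locators:
    print(json_path, "->", mark)
# $ -> example.pfa line 1
# $.action[0] -> example.pfa line 4
```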

Following Avro convention, names of PFA identifiers start with a character from [A-Za-z_] and subsequently contain only characters from [A-Za-z0-9_].
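The naming rule translates directly into a regular expression. The following is a minimal sketch using Python's standard re module; is_valid_name is a hypothetical helper, not part of Titus.

```python
import re

# First character: letter or underscore; remaining characters: letters, digits, underscore.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Check a candidate identifier against the naming rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["tally", "_tmp1", "1st", "new-item", ""]:
    print(repr(candidate), is_valid_name(candidate))
# 'tally' True
# '_tmp1' True
# '1st' False
# 'new-item' False
# '' False
```

Using fullmatch rather than match ensures that trailing invalid characters, such as the hyphen in new-item, are rejected.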