Discussion:
[stack] peg: a lazy, non-deterministic concatenative language
John Nowak
2012-04-16 14:47:27 UTC
Permalink
https://github.com/HackerFoo/peg

Not mine.

- jn
William Tanksley, Jr
2012-04-17 12:26:43 UTC
Permalink
Post by John Nowak
https://github.com/HackerFoo/peg
Not mine.
Amazing... This is creative. He's parsing right to left, and executing
only what's needed to evaluate the last word. In cases where
evaluation is ambiguous, he chooses "nondeterministically" (I'm not
sure what that means). If execution fails, he backtracks to a previous
ambiguous point. Words are provided to force failures.

The language uses stack manipulation words, but I'm not sure that it's
actually a stack. Anyone have any discussion on that point?
Post by John Nowak
- jn
-Wm
Don Groves
2012-04-17 12:36:20 UTC
Permalink
Agreed. Peg is the most interesting variant I've seen in quite a while.
I haven't looked deeply yet but it seems to have an Icon flavor to it.

Of course, a concatenative language doesn't have to use a stack,
it's just most convenient.
--
don
William Tanksley, Jr
2012-04-17 14:48:21 UTC
Permalink
Post by Don Groves
Of course, a concatenative language doesn't have to use a stack,
it's just most convenient.
"Convenient" is kind of vague. A stack is the only parameter-passing
data structure we know of that's computationally complete/Turing
equivalent and doesn't require application. You can also make a
concatenative language using a single integer accumulator, for
example, and it's simple to implement and understand (but it's
ridiculously weak). Since a stack is the only one we know about it's
of course the most convenient one we know about... But what ones don't
we know about?
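
For concreteness, here's a minimal sketch in Haskell of the
single-accumulator idea (my own toy, not any existing language, and
the word names are made up): every word denotes a function Int -> Int,
a program is a list of words, and concatenating programs composes
their denotations.

    -- A toy concatenative language over a single integer accumulator.
    import Data.List (foldl')

    type Word' = Int -> Int        -- every word transforms the accumulator

    inc, double, neg :: Word'      -- a tiny, made-up dictionary
    inc    = (+ 1)
    double = (* 2)
    neg    = negate

    run :: [Word'] -> Int -> Int   -- run a program left to right
    run ws x = foldl' (\acc w -> w acc) x ws

    -- Concatenation is composition:  run (p ++ q) == run q . run p
    -- run [inc, double, neg] 3  ==  negate ((3 + 1) * 2)  ==  -8

It's obviously weak (every word sees only the one value), which is
exactly the point.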

Here's another question: is peg _formally_ concatenative? That is,
given two valid programs, is their concatenation also a valid program?
Are the semantics of the resulting program the composition of the
semantics of the original two?

I think the answer is "no". Consider two programs, "a1" and "a2", both of which
deterministically execute something like "False assert" or an
undefined word (which means that they always halt). The concatenation
"a1 a2" will perform (deterministically) the semantics of a2, but
never a1.

This doesn't make "peg" a bad language... But it does suggest my
"formal" rule for testing concatenativity wants a weaker variant.
Another example of the need for a weakening is words that perform
control flow alteration, like "RETURN" and "call/cc", or Forth's R>
and EXIT.

For "peg" itself, it's easy to see that concatenativity DOES hold for
programs where the parse "continues" past the beginning of the
program. For Forth, concatenativity holds when certain words in the
dictionary are not used. Rice's theorem will tangle us up for both
languages, but in general it's easy to see when a Forth program is
concatenative. Parsing a peg program MIGHT blow up to O(2^n), but I
think it'll always terminate.
Post by Don Groves
don
-Wm
Stephen De Gabrielle
2012-04-17 15:46:08 UTC
Permalink
Post by William Tanksley, Jr
Post by Don Groves
Of course, a concatenative language doesn't have to use a stack,
it's just most convenient.
"Convenient" is kind of vague. A stack is the only parameter-passing
data structure we know of that's computationally complete/Turing
equivalent and doesn't require application. You can also make a
concatenative language using a single integer accumulator, for
example, and it's simple to implement and understand (but it's
ridiculously weak). Since a stack is the only one we know about it's
of course the most convenient one we know about... But what ones don't
we know about?
Your comment and the \/ branch operator made me wonder about the
possibility of a concatenative language that operated on a 'graph-structured
stack'? (and if such a thing would be useful)
--
--
Stephen De Gabrielle
***@acm.org
Telephone +44 (0)20 85670911
Mobile +44 (0)79 85189045
http://www.degabrielle.name/stephen
----
Professor: Oh God! I clicked without reading!
Cubert: And I slightly modified something I own!
Professor: We're monsters!


William Tanksley, Jr
2012-04-19 05:49:57 UTC
Permalink
Post by Stephen De Gabrielle
Your comment and the \/ branch operator made me wonder about the
possibility of a concatenative language that operated on a 'graph-structured
stack'? (and if such a thing would be useful)
That's a good question. You'd have to provide more structure than a
simple "graph", since a graph has no orientation. Obviously you've got
a root and at least one current position... But how do branches
happen, and how do mergers happen?
Post by Stephen De Gabrielle
Post by William Tanksley, Jr
Here's another question: is peg _formally_ concatenative? That is,
given two valid programs, is their concatenation also a valid program?
Are the semantics of the resulting program the composition of the
semantics of the original two?
I think the answer is "no". Consider two programs, "a1" and "a2", both of which
deterministically execute something like "False assert" or an
undefined word (which means that they always halt). The concatenation
"a1 a2" will perform (deterministically) the semantics of a2, but
never a1.
Since hackerfoo just joined the group, we may be soon receiving an
answer! I look forward to it.

-Wm
dustin.deweese
2012-04-19 08:53:39 UTC
Permalink
Post by William Tanksley, Jr
Post by Don Groves
Of course, a concatenative language doesn't have to use a stack,
it's just most convenient.
"Convenient" is kind of vague. A stack is the only parameter-passing
data structure we know of that's computationally complete/Turing
equivalent and doesn't require application. You can also make a
concatenative language using a single integer accumulator, for
example, and it's simple to implement and understand (but it's
ridiculously weak). Since a stack is the only one we know about it's
of course the most convenient one we know about... But what ones don't
we know about?
The stack works essentially as a sort of caching structure. Whichever caching strategy you choose determines the data structure.

Last in, first out = queue
Highest/lowest weighted out = priority queue
First in, first out = stack
There can be only one = a single item (accumulator)

I can't think of any other strategies that have a single unambiguous "top" element. FIFO seems to be the only reasonable caching strategy, though.
Post by William Tanksley, Jr
Here's another question: is peg _formally_ concatenative? That is,
given two valid programs, is their concatenation also a valid program?
Are the semantics of the resulting program the composition of the
semantics of the original two?
I think the answer is "no". Consider two programs, "a1" and "a2", both of which
deterministically execute something like "False assert" or an
undefined word (which means that they always halt). The concatenation
"a1 a2" will perform (deterministically) the semantics of a2, but
never a1.
Peg is a pure functional language, so it doesn't matter if a1 is executed if a2 yields no results; the execution of a1 has no observable effects. The exception is if a1 performs I/O. Unfortunately, I can't see any way to fix this, because the Peg interpreter could never perform any I/O for fear of an unseen 'False assert' being concatenated to the right.

Peg is, however, formally concatenative with some modifications:

1. The expression to the right must be fully evaluated before evaluation of a sub-expression containing an IO token.
2. Concatenation must be performed on the Cartesian products of the results of evaluation of the sub-expressions
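
For what it's worth, rule 2 can be modelled in Haskell (my own sketch
of the idea, not how the interpreter is written): let a program denote
a function from a stack to the list of all result stacks it can
produce; composing two programs then runs the second on every result
of the first, which for independent choices is exactly the Cartesian
product of the two result sets.

    type Stack   = [Int]
    type Program = Stack -> [Stack]    -- the empty list means "no results"

    compose :: Program -> Program -> Program
    compose p q s = [ s'' | s' <- p s, s'' <- q s' ]

    choose :: Int -> Int -> Program    -- a crude stand-in for \/
    choose a b s = [a : s, b : s]

    -- compose (choose 1 2) (choose 10 20) [] ==
    --   [[10,1],[20,1],[10,2],[20,2]]        -- all four combinations
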
William Tanksley, Jr
2012-04-20 17:21:38 UTC
Permalink
Post by dustin.deweese
Post by William Tanksley, Jr
Post by Don Groves
Of course, a concatenative language doesn't have to use a stack,
it's just most convenient.
"Convenient" is kind of vague. A stack is the only parameter-passing
data structure we know of that's computationally complete/Turing
equivalent and doesn't require application. You can also make a
concatenative language using a single integer accumulator, for
example, and it's simple to implement and understand (but it's
ridiculously weak). Since a stack is the only one we know about it's
of course the most convenient one we know about... But what ones don't
we know about?
The stack works essentially as a sort of caching structure.  Whichever caching strategy you choose determines the data structure.
Last in, first out = queue
Highest/lowest weighted out = priority queue
First in, first out = stack
There can be only one = a single item (accumulator)
I can't think of any other strategies that have a single unambiguous "top" element.  FIFO seems to be the only reasonable caching strategy, though.
That's a clever approach. Although I liked it at first glance, I think
it's not quite complete. We know concatenative languages that use an
accumulator and ones that use a stack; we also know ones that use
multiples of each (for example, ANS Forth can have a separate floating
point stack, and Machine Forth uses a memory fetch register; note that
the return stack in Forth does NOT count, since accessing that makes
the program NOT concatenative). The problem is that the choice between
stack and accumulator changes the language in a profound way. So if it
were possible to arrange a queue in some way instead of a stack, what
would the resulting language look like? We have no semantics for such
a language (yet).

I'm also thinking about the statement that there should be only one
unambiguous "top" element. This is essentially right, of course,
though when there are two top elements they can be disambiguated by
the word that needs the data (as with ANS Forth's floating point
vocabulary, whose words may pull from a distinct stack). There are
more details, of course, since some words accept more than one
parameter, so at the time a word is called the structure must provide
ALL of its arguments to it unambiguously (not just the top item).

A stack meets that requirement by imposing a total ordering of all of
its contents at all times. I think it's fair to say that an
accumulator does as well (consider a language over matrices or quantum
mechanical operators instead of over the integers). If the only
possible data structure must impose a total ordering, then a stack is
the only possibility. BUT... does peg impose a total ordering? Doesn't
the nondeterminism hint that there's something other than a total
ordering?

Actually, because a peg program must be parsed backward, it occurs to
me to take a lesson from Stevan Apter and make the language's parse
structure explicit. In order to be parsed backward, it would make
sense to describe the start of a parse as pushing the REVERSE of the
initial program onto a stack (which plays the same role as Forth's
return stack and xy's queue).
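
To make that concrete, here's a toy sketch (mine, and certainly not
how the peg interpreter actually works) for a tiny postfix arithmetic
fragment: reverse the program, then pop words off the reversed list,
demanding each argument by evaluating more of the program to its left.

    data Token = Lit Int | Plus | Times

    -- eval works on the reversed program: the head is the rightmost word.
    eval :: [Token] -> Maybe (Int, [Token])
    eval (Lit n : rest) = Just (n, rest)
    eval (Plus  : rest) = do (x, r1) <- eval rest
                             (y, r2) <- eval r1
                             Just (x + y, r2)
    eval (Times : rest) = do (x, r1) <- eval rest
                             (y, r2) <- eval r1
                             Just (x * y, r2)
    eval []             = Nothing      -- ran off the left edge

    run :: [Token] -> Maybe Int
    run prog = fmap fst (eval (reverse prog))

    -- run [Lit 1, Lit 2, Plus] == Just 3     (i.e. "1 2 +")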

I'd better send this -- otherwise I'll never finish writing it.
Undeniably peg is stack-based, but there's a lot more going on in
terms of data structures and algorithms than just the usual
return-stack/data-stack pair, and I think the additional stuff is
interesting.
Post by dustin.deweese
Post by William Tanksley, Jr
Here's another question: is peg _formally_ concatenative? That is,
given two valid programs, is their concatenation also a valid program?
Are the semantics of the resulting program the composition of the
semantics of the original two?
I think the answer is "no". Consider two programs, "a1" and "a2", both of which
deterministically execute something like "False assert" or an
undefined word (which means that they always halt). The concatenation
"a1 a2" will perform (deterministically) the semantics of a2, but
never a1.
Peg is a pure functional language, so it doesn't matter if a1 is executed if a2
yields no results; the execution of a1 has no observable effects.
Oh, so I actually completely misunderstood. "False assert" doesn't
halt the PROGRAM; rather, it halts the PARSE; it actually sends the
program into a "never terminates" state. That makes perfect sense, and
it seems quite clever.
Post by dustin.deweese
The exception is if a1 performs I/O. Unfortunately, I can't see any way to fix
this, because the Peg interpreter could never perform any I/O for fear of an
unseen 'False assert' being concatenated to the right.
Then let's just note that performing IO renders undefined results in
the presence of nonterminating programs. Good ol' "undefined results".
Post by dustin.deweese
1. The expression to the right must be fully evaluated before evaluation of a sub-expression containing an IO token.
That seems like a nice rule.
Post by dustin.deweese
2. Concatenation must be performed on the Cartesian products of the results of evaluation of the sub-expressions
You MIGHT mean composition rather than concatenation. Is that the case?

If so, you've discovered a new subvariant of a "concatenative
language". The usual rule is "the syntax of concatenation implies the
semantics of composition", but for your language the rule is "the
syntax of concatenation implies the semantics of composition of all
Cartesian products".

I like it.

Now, Enchilada did something similar, but IIRC it didn't do it with
mere concatenation (van Dalen?); I don't know enough about Ripple to
say anything, but it either did this or allowed some explicit
operations to cause it (fortytwo, are you listening?).

Either way, I just learned something new.

-Wm
dustin.deweese
2012-04-22 21:09:07 UTC
Permalink
Post by William Tanksley, Jr
A stack meets that requirement by imposing a total ordering of all of
its contents at all times. I think it's fair to say that an
accumulator does as well (consider a language over matrices or quantum
mechanical operators instead of over the integers). If the only
possible data structure must impose a total ordering, then a stack is
the only possibility. BUT... does peg impose a total ordering? Doesn't
the nondeterminism hint that there's something other than a total
ordering?
The stack for any particular execution path has a total ordering. You could use non-determinism to make the stack appear only partially-ordered, though, with an expression such as:

1 2 3 [] [swap] \/ $
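
One way to model that expression outside of Peg is with Haskell's list
monad (my own sketch of the idea, not how Peg evaluates it); writing
the stack head-first, so 3 is on top:

    type Stack = [Int]

    swap :: Stack -> Stack
    swap (x:y:rest) = y : x : rest
    swap s          = s

    results :: [Stack]
    results = do quote <- [id, swap]       -- the choice  [] \/ [swap]
                 return (quote [3, 2, 1])  -- $ applies it to the stack 1 2 3

    -- results == [[3,2,1],[2,3,1]]   two outcomes, so no single total order
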
Post by William Tanksley, Jr
Post by dustin.deweese
2. Concatenation must be performed on the Cartesian products of the results of evaluation of the sub-expressions
You MIGHT mean composition rather than concatenation. Is that the case?
The second rule wasn't worded very well. I tried to make it more precise:

2. The composition of two sub-expressions is equivalent to concatenating each of the Cartesian products of the results of evaluating the sub-expressions individually.

See this wiki page (https://github.com/HackerFoo/peg/wiki/Properties-of-Peg) for an example, as well as other information on the properties of Peg you might find interesting.
Joshua Shinavier
2012-04-23 04:24:58 UTC
Permalink
Post by William Tanksley, Jr
You MIGHT mean composition rather than concatenation. Is that the case?
If so, you've discovered a new subvariant of a "concatenative
language". The usual rule is "the syntax of concatenation implies the
semantics of composition", but for your language the rule is "the
syntax of concatenation implies the semantics of composition of all
Cartesian products".
I like it.
Now, Enchilada did something similar, but IIRC it didn't do it with
mere concatenation (van Dalen?); I don't know enough about Ripple to
say anything, but it either did this or allowed some explicit
operations to cause it (fortytwo, are you listening?).
Present, just lazy :-)

That's an interesting property, but it's not true of Ripple, and lazy
evaluation is the reason it's not true. Suppose we split a Ripple program
arbitrarily and evaluate the fragments, e.g.

4 sqrt. 10 add.

to

4 sqrt.

and

10 add.

They're both valid programs, yet the right-hand-side program has no
solutions while the complete program has two solutions ([8] and [12], or
(8) and (12) in Ripple syntax). Any time the right-hand-side does have a
solution, e.g. if the program had been (4 sqrt. 10), which is already fully
reduced, the left-hand-side will never be reduced in the complete program,
so it doesn't matter what its solutions are.

If Ripple used an eager evaluation strategy instead, the "cartesian
composition" rule would more or less hold.
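
Just to pin down the arithmetic in that example with a quick sketch,
treating sqrt as two-valued (this models only the numbers, not
Ripple's lazy evaluation order, which is the actual point here):

    sqrts :: Double -> [Double]
    sqrts x = let r = sqrt x in [r, -r]

    solutions :: [Double]
    solutions = do r <- sqrts 4        -- 2.0 and -2.0
                   return (r + 10)     -- "10 add."

    -- solutions == [12.0, 8.0]; the fragment "10 add." on its own has
    -- nothing to consume, so by itself it yields no solutions.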


Best,

Josh
Post by William Tanksley, Jr
Either way, I just learned something new.
-Wm
eas lab
2012-04-25 09:54:24 UTC
Permalink
Post by dustin.deweese
Last in, first out = queue
First in, first out = stack
Didn't anybody pick up this 'typo'?
Post by dustin.deweese
I can't think of any other strategies that have a single unambiguous
"top" element.
Think rather in terms of the 'most recent' element.

] We know concatenative languages that use an accumulator and ones that
] use a stack; we also know ones that use multiples of each (for example,
] ANS Forth can have a separate floating point stack, and Machine Forth
] uses a memory fetch register; note that the return stack in Forth does
] NOT count, since accessing that makes the program NOT concatenative).
I disagree: it's the concept of LIFO/most-recent, and not the implementation
details, that counts.

Consider the following real-life task:
papers are 'stacked' in their order of creation;
we need to present them to the emperor in their order of creation;
so we could sequentially 'pop' them to another 'pile';
and then 'pop & present' from that 'pile'.

NB. I distinguish between the stack and the pile;
where the pile is stack2 of pop(stack).

If some of the papers may not be handled by the 'presenter',
they could be piled on a separate/floating-point-pile.

But then we've destroyed/lost the <sequence markers>,
by bifurcating the stack/pile.

I'm guessing that in FORTH, when/if you ADD a floating-point facility,
you must have a mechanism to keep the 2nd [or more] stack 'synchronised'.
So effectively/conceptually there's only ONE stack. Or?

== Chris Glur.
dustin.deweese
2012-04-25 11:12:52 UTC
Permalink
Post by eas lab
Post by dustin.deweese
Last in, first out = queue
First in, first out = stack
Didn't anybody pick up this 'typo'?
This is why I prefer a pencil to a pen.

Oh well. Point and laugh.
Post by eas lab
Post by dustin.deweese
I can't think of any other strategies that have a single unambiguous
"top" element.
Think rather in terms of the 'most recent' element.
] We know concatenative languages that use an accumulator and ones that
] use a stack; we also know ones that use multiples of each (for example,
] ANS Forth can have a separate floating point stack, and Machine Forth
] uses a memory fetch register; note that the return stack in Forth does
] NOT count, since accessing that makes the program NOT concatenative).
I disagree: it's the concept of LIFO/most-recent, and not the implementation
details, that counts.
papers are 'stacked' in their order of creation;
we need to present them to the emperor in their order of creation;
so we could sequentially 'pop' them to another 'pile';
and than 'pop & present' from that 'pile'.
NB. I distinguish between the stack and the pile;
where the pile is stack2 of pop(stack).
If some of the papers may not be handled by the 'presenter',
they could be piled on a separate/floating-point-pile.
But then we've destroyed/lost the <sequence markers>,
by bifurcating the stack/pile.
I'm guessing that in FORTH, when/if you ADD a floating-point facility,
you must have a mechanism to keep the 2nd [or more] stack 'synchronised'.
So effectively/conceptually there's only ONE stack. Or?
== Chris Glur.
I considered having separate stacks for each type in Peg. Each argument to a word would be taken from the appropriate stack depending on the type required. Then, values could be wrapped in semantic types to minimize stack manipulation.

For instance a word 'greeting' could require a 'name' and a 'title', both of which are simply wrappers for strings, and the greeting would return "Hello [title] [name]!" regardless of the order of arguments. It would just use the title and name on the top of their respective stacks.

I decided against this, because it would complicate many other things.
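
A rough Haskell sketch of that rejected design, just for illustration
(my own modelling; nothing like this exists in Peg):

    newtype Name  = Name String
    newtype Title = Title String

    data Stacks = Stacks { names :: [Name], titles :: [Title] }

    -- 'greeting' pulls a Title and a Name from their own stacks, so the
    -- order in which the caller pushed them doesn't matter.
    greeting :: Stacks -> Maybe (String, Stacks)
    greeting (Stacks (Name n : ns) (Title t : ts)) =
      Just ("Hello " ++ t ++ " " ++ n ++ "!", Stacks ns ts)
    greeting _ = Nothing    -- not enough arguments of the right types

    -- greeting (Stacks [Name "Turing"] [Title "Dr."])
    --   == Just ("Hello Dr. Turing!", Stacks [] [])
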
dustin.deweese
2012-04-25 11:38:10 UTC
Permalink
Post by Joshua Shinavier
That's an interesting property, but it's not true of Ripple, and lazy
evaluation is the reason it's not true. Suppose we split a Ripple program
arbitrarily and evaluate the fragments, e.g.
4 sqrt. 10 add.
to
4 sqrt.
and
10 add.
They're both valid programs, yet the right-hand-side program has no
solutions while the complete program has two solutions ([8] and [12], or
(8) and (12) in Ripple syntax). Any time the right-hand-side does have a
solution, e.g. if the program had been (4 sqrt. 10), which is already fully
reduced, the left-hand-side will never be reduced in the complete program,
so it doesn't matter what its solutions are.
If Ripple used an eager evaluation strategy instead, the "cartesian
composition" rule would more or less hold.
Peg also uses lazy evaluation, but the pureness of the language means that:

1 2 +
3
7 4 -

are all equivalent and indistinguishable, whether evaluated or not, because observing them requires evaluation.

Because expressions need to terminate first in order to use the Cartesian composition rule, non-termination does not pose a problem either.

Also, in Peg, an expression will not reduce, rather than yielding no solutions, if an argument to a word is missing, i.e. the expression is a partially applied function. So,

10 + --> 10 +

which is different from

"ten" + --> no (no results)

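A tiny sketch of that three-way distinction (again my own modelling
for illustration, not Peg's evaluator): reducing a word against its
available arguments can produce a value, get stuck as a partial
application, or fail with no results.

    data Val = I Int | S String
      deriving Show

    data Outcome = Reduced Val    -- "1 2 +"    reduces to 3
                 | Stuck [Val]    -- "10 +"     stays as a partial application
                 | NoResults      -- "ten" +    fails: no results
      deriving Show

    plus :: [Val] -> Outcome      -- arguments, top of the stack first
    plus (I x : I y : _) = Reduced (I (x + y))
    plus [I n]           = Stuck [I n]
    plus []              = Stuck []
    plus _               = NoResults        -- wrong argument type
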
John Nowak
2012-04-17 12:48:39 UTC
Permalink
Reddit thread (with comments) here:

http://www.reddit.com/r/haskell/comments/sc858/peg_a_lazy_nondeterministic_concatenative/