Parsing in Prolog without cut?

Spread the love

Question Description

I found this nice snippet for parsing lisp in Prolog (from here):

ws --> [W], { code_type(W, space) }, ws.
ws --> [].

parse(String, Expr) :- phrase(expressions(Expr), String).

expressions([E|Es]) -->
    ws, expression(E), ws,
    !, % single solution: longest input match
expressions([]) --> [].

% A number N is represented as n(N), a symbol S as s(S).

expression(s(A))         --> symbol(Cs), { atom_codes(A, Cs) }.
expression(n(N))         --> number(Cs), { number_codes(N, Cs) }.
expression(List)         --> "(", expressions(List), ")".
expression([s(quote),Q]) --> "'", expression(Q).

number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

symbol([A|As]) -->
    { memberchk(A, "+/-*><=") ; code_type(A, alpha) },

symbolr([A|As]) -->
    { memberchk(A, "+/-*><=") ; code_type(A, alnum) },
symbolr([]) --> [].

However expressions uses a cut. I’m assuming this is for efficiency. Is it possible to write this code so that it works efficiently without cut?

Would also be in interested answers that involve Mercury’s soft-cut / committed choice.

Practice As Follows

The cut is not used for efficiency, but to commit to the first solution (see the comment next to the !/0: “single solution: longest input match”). If you comment out the !/0, you get for example:

?- parse("abc", E).
E = [s(abc)] ;
E = [s(ab), s(c)] ;
E = [s(a), s(bc)] ;
E = [s(a), s(b), s(c)] ;

It is clear that only the first solution, consisting of the longest sequence of characters that form a token, is desired in such cases. Given the example above, I therefore disagree with “false”: expression//1 is ambiguous, because number//1 and symbolr//1 are. In Mercury, you could use the determinism declaration cc_nondet to commit to a solution, if any.