Erlang Notes: body-recursive list-building

Consider a reduced example from Erlang's Tail Recursion is Not a Silver Bullet, linked from the Efficiency Guide:

-module(mapbench).
-compile(export_all).

tail_map(F, L) ->
    tail_map(F, L, []).

tail_map(_, [], Acc) -> lists:reverse(Acc);
tail_map(F, [H|T], Acc) -> tail_map(F, T, [F(H)|Acc]).

body_map(_, []) -> [];
body_map(F, [H|T]) -> [F(H) | body_map(F, T)].

A common expectation - from Scheme, and Erlang, and any language with tail-call optimization - is that tail_map should be preferred. Tail-recursive code that doesn't build lists generally wins.

But consider erlperf and a tiny 'memused' escript from dev environment:

$ for cmd in memused erlperf; do for x in 10 100 100_000 1_000_000; do $cmd "bench:tail_map(fun(X) -> X+1 end, lists:seq(1,$x))."  "bench:body_map(fun(X) -> X+1 end, lists:seq(1,$x))."; echo; done done
       193 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,10)).
        40 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,10)).

       733 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100)).
       400 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,100)).

    617885 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100_000)).
    417752 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,100_000)).

   6017885 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).
   4017752 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).

Code                                                        ||        QPS       Time   Rel
bench:body_map(fun(X) -> X+1 end, lists:seq(1,10)).          1   13648 Ki      73 ns  100%
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,10)).          1   12689 Ki      78 ns   93%

Code                                                         ||        QPS       Time   Rel
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100)).          1    1684 Ki     594 ns  100%
bench:body_map(fun(X) -> X+1 end, lists:seq(1,100)).          1    1217 Ki     821 ns   72%

Code                                                             ||        QPS       Time   Rel
bench:body_map(fun(X) -> X+1 end, lists:seq(1,100_000)).          1        887    1128 us  100%
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100_000)).          1        518    1932 us   58%

Code                                                               ||        QPS       Time   Rel
bench:body_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).          1         53   18756 us  100%
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).          1         50   20007 us   94%

The tail-recursive version always creates more garbage, starting at 4.8x as much when building a tiny list, and going down to about 1.5x as much garbage for much longer lists.

The tail-recursive version has a sweet spot where it's faster than the body recursive version, but it's usually slower.