Consider a reduced example from "Erlang's Tail Recursion is Not a Silver Bullet", which is linked from the Efficiency Guide:
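-module(bench).
-compile(export_all).

%% Tail-recursive map: accumulate results in reverse order, then reverse once.
tail_map(F, L) ->
    tail_map(F, L, []).

tail_map(_, [], Acc) -> lists:reverse(Acc);
tail_map(F, [H|T], Acc) -> tail_map(F, T, [F(H)|Acc]).

%% Body-recursive map: build the result list as the calls return.
body_map(_, []) -> [];
body_map(F, [H|T]) -> [F(H) | body_map(F, T)].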
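Before comparing their costs, it can help to confirm that the two functions really are interchangeable. The following is only a minimal sketch: the module name mapcheck and the function agree/0 are made up for illustration, and the two map functions are copied in so the module stands alone.

```erlang
-module(mapcheck).
-export([agree/0, tail_map/2, body_map/2]).

tail_map(F, L) -> tail_map(F, L, []).

tail_map(_, [], Acc) -> lists:reverse(Acc);
tail_map(F, [H|T], Acc) -> tail_map(F, T, [F(H)|Acc]).

body_map(_, []) -> [];
body_map(F, [H|T]) -> [F(H) | body_map(F, T)].

%% Returns true when both versions match lists:map/2 on a few sample inputs.
agree() ->
    Inc = fun(X) -> X + 1 end,
    Inputs = [[], [1], lists:seq(1, 10), lists:seq(1, 1000)],
    lists:all(fun(L) ->
                  Expected = lists:map(Inc, L),
                  tail_map(Inc, L) =:= Expected andalso
                      body_map(Inc, L) =:= Expected
              end, Inputs).
```

Calling mapcheck:agree() returns true, so any difference between the two is a matter of cost, not of result.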
A common expectation - from Scheme, and Erlang, and any language with tail-call optimization - is that tail_map should be preferred. Tail-recursive code that doesn't build lists generally wins.
But consider what erlperf and a tiny 'memused' escript from my dev environment report:
$ for cmd in memused erlperf; do for x in 10 100 100_000 1_000_000; do $cmd "bench:tail_map(fun(X) -> X+1 end, lists:seq(1,$x))." "bench:body_map(fun(X) -> X+1 end, lists:seq(1,$x))."; echo; done done
193 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,10)).
40 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,10)).
733 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100)).
400 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,100)).
617885 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100_000)).
417752 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,100_000)).
6017885 - bench:tail_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).
4017752 - bench:body_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).
Code                                                 ||       QPS      Time   Rel
bench:body_map(fun(X) -> X+1 end, lists:seq(1,10)).   1  13648 Ki     73 ns  100%
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,10)).   1  12689 Ki     78 ns   93%

Code                                                  ||       QPS      Time   Rel
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100)).   1   1684 Ki    594 ns  100%
bench:body_map(fun(X) -> X+1 end, lists:seq(1,100)).   1   1217 Ki    821 ns   72%

Code                                                      ||       QPS      Time   Rel
bench:body_map(fun(X) -> X+1 end, lists:seq(1,100_000)).   1       887   1128 us  100%
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,100_000)).   1       518   1932 us   58%

Code                                                        ||       QPS      Time   Rel
bench:body_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).   1        53  18756 us  100%
bench:tail_map(fun(X) -> X+1 end, lists:seq(1,1_000_000)).   1        50  20007 us   94%
The tail-recursive version always creates more garbage: about 4.8x as much for the 10-element list (193 against 40), falling to about 1.5x as much for the 100,000- and 1,000,000-element lists.
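The garbage ratios can be recomputed directly from the memused figures. A small sketch, with the module name ratios and its function names chosen here purely for illustration:

```erlang
-module(ratios).
-export([garbage_ratios/0]).

%% {ListLength, TailMapFigure, BodyMapFigure}, copied from the memused output.
measurements() ->
    [{10, 193, 40},
     {100, 733, 400},
     {100000, 617885, 417752},
     {1000000, 6017885, 4017752}].

%% Ratio of the tail_map figure to the body_map figure for each list length.
garbage_ratios() ->
    [{N, Tail / Body} || {N, Tail, Body} <- measurements()].
```

ratios:garbage_ratios() gives roughly 4.8, 1.8, 1.5 and 1.5 for the four list lengths, so the gap narrows quickly and then settles near 1.5x.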
The tail-recursive version has a sweet spot where it's faster than the body-recursive version (at 100 elements here, 594 ns against 821 ns), but at the other three sizes it's slower.
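Where that sweet spot falls will depend on the machine and the OTP release, so it is worth measuring locally. If erlperf isn't at hand, a rough comparison can be sketched with timer:tc/1 from the standard library. The module name maptime, the helper names, and the choice of 100 runs are all assumptions made for this sketch. Unlike the commands above, it builds the input list once, outside the timed section, and it makes no attempt at warm-up or statistical care, so treat its numbers as indicative only.

```erlang
-module(maptime).
-export([compare/1]).

tail_map(F, L) -> tail_map(F, L, []).

tail_map(_, [], Acc) -> lists:reverse(Acc);
tail_map(F, [H|T], Acc) -> tail_map(F, T, [F(H)|Acc]).

body_map(_, []) -> [];
body_map(F, [H|T]) -> [F(H) | body_map(F, T)].

%% Call Fun N times, discarding the results.
repeat(_, 0) -> ok;
repeat(Fun, N) -> Fun(), repeat(Fun, N - 1).

%% Average microseconds per call of Fun over Runs calls.
%% timer:tc/1 returns {ElapsedMicroseconds, Result}.
avg_us(Fun, Runs) ->
    {Total, ok} = timer:tc(fun() -> repeat(Fun, Runs) end),
    Total / Runs.

%% Average time per call for both versions on a list of N integers.
compare(N) ->
    Inc = fun(X) -> X + 1 end,
    L = lists:seq(1, N),
    [{tail_map, avg_us(fun() -> tail_map(Inc, L) end, 100)},
     {body_map, avg_us(fun() -> body_map(Inc, L) end, 100)}].
```

Running maptime:compare/1 for a few sizes, for example 10, 100 and 100000, shows whether a sweet spot appears on a given setup.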