Scalaz（35）－ Free ：運算－Trampoline，say NO to StackOverflowError

在前面幾次討論中我們介紹了Free是個產生Monad的最基本結構。它的原理是把一段程式（AST）一連串的運算指令（ADT）轉化成數據結構存放在記憶體里，這個過程是個獨立的功能描述過程。然後另一個獨立運算過程的Interpreter會遍歷（traverse）AST結構，讀取結構里的運算指令，實際運行指令 ...

在前面幾次討論中我們介紹了Free是個產生Monad的最基本結構。它的原理是把一段程式（AST）一連串的運算指令（ADT）轉化成數據結構存放在記憶體里，這個過程是個獨立的功能描述過程。然後另一個獨立運算過程的Interpreter會遍歷（traverse）AST結構，讀取結構里的運算指令，實際運行指令。這裡的重點是把一連串運算結構化（reify）延遲運行，具體實現方式是把Monad的連續運算方法flatMap轉化成一串Suspend結構（case class），把運算過程轉化成創建（construct）Suspend過程。flatMap的表現形式是這樣的：flatMap(a => flatMap(b => flatMap(c => ....)))，這是是明顯的遞歸演算法，很容易產生堆棧溢出異常（StackOverflow Exception）,無法保證程式的安全運行，如果不能有效解決則FP編程不可行。Free正是解決這個問題的有效方法，因為它把Monad的遞歸演算法flatMap轉化成了一個創建數據結構實例的過程。每創建一個Suspend，立即完成一個運算。我們先用個例子來證明Monad flatMap的遞歸演算法問題：

 1 ef zipIndex[A](xa: List[A]): List[(Int,A)] =
 2   xa.foldLeft(State.state[Int,List[(Int,A)]](List()))(
 3     (acc,a) => for {
 4       xn <- acc
 5       s <- get[Int]
 6       _ <- put[Int](s+1)
 7     } yield ((s,a) :: xn)
 8   ).eval(1).reverse                               //> zipIndex: [A](xa: List[A])List[(Int, A)]
 9 
10 zipIndex(1 |-> 10)                                //> res6: List[(Int, Int)] = List((1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9), (10,10))
11 zipIndex(1 |-> 10000)                             //> java.lang.StackOverflowError

在這個例子里我們使用了State Monad。我們知道for-comprehension就是flatMap鏈條，是一種遞歸演算法，所以當zipIndex針對大List時就產生了StackOverflowError。

我們提到過用Trampoline可以heap換stack，以遍曆數據結構代替遞歸運算來實現運行安全。那麼什麼是Trampoline呢？

1 sealed trait Trampoline[+A] 
2 case class Done[A](a: A) extends Trampoline[A]
3 case class More[A](k: () => Trampoline[A]) extends Trampoline[A]

Trampoline就是一種數據結構。它有兩種狀態：Done(a: A)代表結構記憶體放了一個A值，More(k: ()=>Trampoline[A])代表結構記憶體放的是一個Function0函數，就是一個運算Trampoline[A]。

我們先試個遞歸演算法例子：

 1 def isEven(xa: List[Int]): Boolean =
 2   xa match {
 3     case Nil => true
 4     case h :: t => isOdd(t)
 5   }                                               //> isEven: (xa: List[Int])Boolean
 6 def isOdd(xa: List[Int]): Boolean =
 7   xa match {
 8     case Nil => false
 9     case h :: t => isEven(t)
10   }                                               //> isOdd: (xa: List[Int])Boolean
11 
12 isOdd(0 |-> 100)                                  //> res0: Boolean = true
13 isEven(0 |-> 10000)                               //> java.lang.StackOverflowError

可以看到isEven和isOdd這兩個函數相互遞歸調用，最終用大點的List就產生了StackOverflowError。

現在重新調整一下函數isEven和isOdd的返回結構類型：從Boolean換成Trampoline，意思是從返回一個結果值變成返回一個數據結構：

 1 def even(xa: List[Int]): Trampoline[Boolean] =
 2   xa match {
 3     case Nil => Done(true)
 4     case h :: t => More(() => odd(t))
 5   }                                               //> even: (xa: List[Int])Exercises.trampoline.Trampoline[Boolean]
 6 def odd(xa:  List[Int]): Trampoline[Boolean] =
 7   xa match {
 8     case Nil => Done(false)
 9     case h :: t => More(() => even(t))
10   }                                               //> odd: (xa: List[Int])Exercises.trampoline.Trampoline[Boolean]
11   
12 even(1 |-> 123001)                                //> res0: Exercises.trampoline.Trampoline[Boolean] = More(<function0>)

現在我們獲得了一個在heap上存放了123001個元素的數據結構More(<function0>)。這是一個在記憶體heap上存放的過程，並沒有任何實質運算。
現在我們需要一個方法來遍歷這個返回的結構，逐個運行結構中的function0：

1 sealed trait Trampoline[+A] {
2   final def runT: A =
3     this match {
4       case Done(a) => a
5       case More(k) => k().runT
6     }
7 }
8 
9 even(1 |-> 123001).runT                           //> res0: Boolean = false

由於這個runT是個尾遞歸（Tail Call Elimination TCE）演算法，所以沒有出現StackOverflowError。

實際上scalaz也提供了Trampoline類型：scalaz/Free.scala

  /** A computation that can be stepped through, suspended, and paused */
  type Trampoline[A] = Free[Function0, A]
...
object Trampoline extends TrampolineInstances {

  def done[A](a: A): Trampoline[A] =
    Free.Return[Function0,A](a)

  def delay[A](a: => A): Trampoline[A] =
    suspend(done(a))

  def suspend[A](a: => Trampoline[A]): Trampoline[A] =
    Free.Suspend[Function0, A](() => a)
}

Trampoline就是Free[S,A]的一種特例。S == Function0，或者說Trampoline就是Free針對Function0生成的Monad，因為我們可以用Free.Return和Free.Suspend來實現Done和More。我們可以把scalaz的Trampoline用在even,odd函數里：

 1 import scalaz.Free.Trampoline
 2 def even(xa: List[Int]): Trampoline[Boolean] =
 3   xa match {
 4     case Nil => Trampoline.done(true)
 5     case h :: t => Trampoline.suspend(odd(t))
 6   }                                               //> even: (xa: List[Int])scalaz.Free.Trampoline[Boolean]
 7 def odd(xa:  List[Int]): Trampoline[Boolean] =
 8   xa match {
 9     case Nil => Trampoline.done(false)
10     case h :: t => Trampoline.suspend(even(t))
11   }                                               //> odd: (xa: List[Int])scalaz.Free.Trampoline[Boolean]
12   
13 even(1 |-> 123001).run                            //> res0: Boolean = false

語法同我們自定義的Trampoline差不多。

那麼我們能不能把Trampoline用在上面的哪個zipIndex函數里來解決StackOverflowError問題呢？zipIndex里造成問題的Monad是個State Monad，我們可以用State.lift把State[S,A升格成StateT[Trampoline,S,A]。先看看這個lift函數：scalaz/StateT.scala

 def lift[M[_]: Applicative]: IndexedStateT[({type λ[α]=M[F[α]]})#λ, S1, S2, A] = new IndexedStateT[({type λ[α]=M[F[α]]})#λ, S1, S2, A] {
    def apply(initial: S1): M[F[(S2, A)]] = Applicative[M].point(self(initial))
  }

這個函數的功能等於是：State.lift[Trampoline] >>> StateT[Tarmpoline,S,A]。先看另一個簡單例子：

1 def incr: State[Int, Int] =  State {s => (s+1, s)}//> incr: => scalaz.State[Int,Int]
2 incr.replicateM(10000).eval(0) take 10            //> java.lang.StackOverflowError
3 
4 import scalaz.Free.Trampoline
5 incr.lift[Trampoline].replicateM(100000).eval(0).run.take(10)
6                                                   //> res0: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

上面這個例子也使用了State Monad：函數incr返回的是State，這時用replicateM(10000).eval(0)對重覆對10000個State進行運算時產生了StackOverflowError。我們跟著用lift把incr返回類型變成StateT[Trampoline,S,A]，這時replicateM（10000).eval(0)的作用就是進行結構轉化了(State.apply:Trampoline[(S,A)])，再用Trampoline.run作為Interpreter遍歷結構進行運算。用lift升格Trampoline後解決了StackOverflowError。

我們試著調整一下zipIndex函數：

 1 def safeZipIndex[A](xa: List[A]): List[(Int,A)] =
 2   (xa.foldLeft(State.state[Int,List[(Int,A)]](List()))(
 3     (acc,a) => for {
 4       xn <- acc
 5       s <- get[Int]
 6       _ <- put(s + 1)
 7     } yield (s,a) :: xn
 8   ).lift[Trampoline]).eval(1).run.reverse         //> safeZipIndex: [A](xa: List[A])List[(Int, A)]
 9   
10   safeZipIndex(1 |-> 1000).take(10)               //> res2: List[(Int, Int)] = List((1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9), (10,10))

錶面上來看結果好像是正確的。試試大一點的List：

1  safeZipIndex(1 |-> 10000).take(10)   //> java.lang.StackOverflowError
2                                       //| at scalaz.IndexedStateT$$anonfun$flatMap$1.apply(StateT.scala:62)
3                                       //| at scalaz.IndexedStateT$$anon$10.apply(StateT.scala:95)
4                                       //| at scalaz.IndexedStateT$$anonfun$flatMap$1.apply(StateT.scala:62)
5 ...
6

還是StackOverflowError，看錯誤提示是State.flatMap造成的。看來遲點還是按照incr的原理把foldLeft運算階段結果拆分開來分析才行。

以上我們證明瞭Trampoline可以把連續運算轉化成創建數據結構，以heap記憶體換stack，能保證遞歸演算法運行的安全。因為Trampoline是Free的一個特例，所以Free的Interpreter也就可以保證遞歸演算法安全運行。現在可以得出這樣的結論：FP就是Monadic Programming，就是用Monad來編程，我們應該儘量用Free來生成Monad，用Free進行編程以保證FP程式的可靠性。