{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Filtrage par motifs\n", "\n", "Le filtrage par motifs (ou _pattern matching_) est une technique permettant de choisir le comportement d'un morceau de code en fonction d'un paramètre.\n", "C'est une fonctionnalité assez puissante qui simplifie considérablement le code.\n", "Des langages comme le C ou le C++ ne disposant pas d'un tel outil doivent faire appel à des expédients plus ou moins élégants : `switch`, `if`s imbriqués, méthodes virtuelles... \n", "Concrètement, on définit un nombre fini de _motifs_, et on associe à chacun un morceau de code à effectuer." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "120\n" ] }, { "data": { "text/plain": [ "Name: scala.MatchError\n", "Message: -5 (of class java.lang.Integer)\n", "StackTrace: $line10.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$factorial$1.apply$mcII$sp(:22)\n", "$line12.$read$$iwC$$iwC$$iwC$$iwC.(:21)\n", "$line12.$read$$iwC$$iwC$$iwC.(:26)\n", "$line12.$read$$iwC$$iwC.(:28)\n", "$line12.$read$$iwC.(:30)\n", "$line12.$read.(:32)\n", "$line12.$read$.(:36)\n", "$line12.$read$.()\n", "$line12.$eval$.(:7)\n", "$line12.$eval$.()\n", "$line12.$eval.$print()\n", "sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n", "sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n", "sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n", "java.lang.reflect.Method.invoke(Method.java:498)\n", "org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)\n", "org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)\n", "org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)\n", "org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)\n", "org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)\n", "org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:361)\n", "org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:356)\n", "org.apache.toree.global.StreamState$.withStreams(StreamState.scala:81)\n", "org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:355)\n", "org.apache.toree.kernel.interpreter.scala.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:355)\n", "org.apache.toree.utils.TaskManager$$anonfun$add$2$$anon$1.run(TaskManager.scala:140)\n", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n", "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n", "java.lang.Thread.run(Thread.java:745)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// simple example of pattern matching\n", "// the matching is done on the argument of the function, which needs not to be explicitly named\n", "def factorial: Int => Int = {\n", " case 0 => 1\n", " case n if n > 0 => n * factorial(n-1)\n", "}\n", "println(factorial(5)) // should be 120\n", "println(factorial(-5)) // error, negative integers cannot be matched" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "L'exemple ci-dessus illustre la syntaxe du pattern-matching et son utilisation possible pour la définition de fonction.\n", "La succession des cas rend le code beaucoup plus lisible que s'il avait été écrit avec un `if`. \n", "Le pattern-matching commence à prendre son sens lorsque le nombre de cas croît." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n", "6\n" ] } ], "source": [ "// another simple example of pattern matching\n", "def calc(op: String, x: Int, y: Int) = op match {\n", " case \"+\" => x+y\n", " case \"-\" => x-y\n", " case \"*\" => x*y\n", " case \"/\" => x/y\n", "}\n", "println(calc(\"+\", 2, 3))\n", "println(calc(\"*\", 2, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mais les exemples ci-dessus ne font du filtrage que sur la valeur.\n", "Le filtrage par motifs permet également de filtrer par types, voire de combiner les deux." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n", "2\n", "3\n" ] } ], "source": [ "// pattern matching on types\n", "def foo: Any => Int = {\n", " case y: Int if y < 0 => -y\n", " case y: Int => y+1\n", " case s: String => s.length()\n", "}\n", "println(foo(5)) // 6\n", "println(foo(-2)) // 2\n", "println(foo(\"bar\")) // 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On peut aussi utiliser le filtrage par motifs sur plusieurs variables en même temps." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7\n" ] } ], "source": [ "// pattern matching on tuples\n", "def add: (Int, Int) => Int = {\n", " case (x,y) => x+y\n", "}\n", "println(add(2,5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Enfin, les motifs peuvent utiliser des jokers. Ceci est particulièrement utile pour définir des cas par défaut." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n", "error\n", "-1\n" ] } ], "source": [ "def choose: (Int, Int) => Int = {\n", " case (_,0) => 1\n", " case (n,k) if n == k => 1\n", " case (n,k) if n > 0 && k > 0 && k < n => choose(n-1,k-1)+choose(n-1,k)\n", " case _ => println(\"error\"); -1\n", "}\n", "\n", "println(choose(4,2))\n", "println(choose(2,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variables dans les motifs\n", "\n", "Dans un filtrage par motif, les variables utilisées dans la description du motif sont _fraîches_, même si elles portent le même nom qu'une autre variable utilisée précédemment. \n", "Elles servent à nommer les sous-éléments intéressants du motif." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "true\n", "true\n" ] } ], "source": [ "class Match(val x: Int) {\n", " // x is a member of class Match\n", " \n", " // this function is intended to return true if its argument has the same value as x\n", " def equal: Int => Boolean = {\n", " // here, x is a fresh variable, local to this function, and has nothing to do with Match.x\n", " case x => true\n", " case _ => false\n", " }\n", "}\n", "\n", "val m = new Match(2)\n", "println(m.equal(2)) // should be true\n", "println(m.equal(3)) // we wish it to be false" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dans l'exemple ci-dessus, deux variables différentes portent le même nom `x`. \n", "On peut cependant faire référence à des variables existantes dans les motifs, en encadrant le nom de la variable par des accents graves." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "true\n", "false\n" ] } ], "source": [ "class Match(val x: Int) {\n", " // x is a member of class Match\n", " \n", " // this function is intended to return true if its argument has the same value as x\n", " def equal: Int => Boolean = {\n", " // here, `x` indicates that we refer to the existing x, not a fresh variable\n", " case `x` => true\n", " case _ => false\n", " }\n", "}\n", "\n", "val m = new Match(2)\n", "println(m.equal(2)) // should be true\n", "println(m.equal(3)) // we wish it to be false" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Les variables dont le nom commencent par une majuscule sont toujours considérées comme étant entre accents graves dans les motifs.\n", "Cela permet d'allèger la syntaxe pour des variables qu'on souhaite utiliser fréquemment dans des motifs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Case classes\n", "\n", "Les _case classes_ sont des classes avec les particularités suivantes (qui sont intimement liées) :\n", "- elles sont immutables ;\n", "- elles sont comparables par _égalité structurelle_ ;\n", "- elles sont décomposables par filtrage de motifs.\n", "\n", "Les case classes permettent l'implémentation de types de données algébriques, et fournissent de puissantes possibilités de filtrage de motifs. \n", "Cela permet d'écrire en quelques lignes des fonctionnalités qui nécessiteraient l'utilisation de visiteurs dans d'autres languages." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Les listes sont implémentées à l'aide de deux case classes :\n", "- `Nil`, qui n'a pas de paramètres, représentant la liste vide ;\n", "- `Cons(h, t)` où `h` est un élément et `t` une liste.\n", "Afin d'alléger la notation, `Cons(h, t)` peut aussi s'écrire `h :: t`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8\n", "List(1, 0, 3, 2, 5, 4)\n" ] } ], "source": [ "def listLength: List[Int] => Int = {\n", " case Nil => 0\n", " case _ :: t => 1+listLength(t)\n", "}\n", "println(listLength(List(0,1,1,2,3,4,5,5)))\n", "\n", "def switch: List[Int] => List[Int] = {\n", " case Nil => Nil\n", " // patterns can be arbitrarily complex\n", " case x :: y :: t => y :: x :: switch(t)\n", "}\n", "println(switch(List(0,1,2,3,4,5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Les case classes sont souvent utilisées comme feuilles d'une arborescence d'héritage." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "// WARNING: do not run this cell more than once\n", "// it may confuse the kernel with multiply defined classes\n", "class Expr\n", "case class Plus(lhs: Expr, rhs: Expr) extends Expr\n", "case class Constant(n: Int) extends Expr" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def evalExpr(e: Expr): Int = e match {\n", " case Plus(lhs, rhs) => evalExpr(lhs) + evalExpr(rhs)\n", " case Constant(n) => n\n", "}\n", "\n", "evalExpr(Plus(Constant(3), Constant(5)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracteurs\n", "\n", "Il est aussi possible d'équiper n'importe quelle classe (ou singleton) de méthodes permettant de l'utiliser dans des motifs, même si c'est pas une case class. \n", "Pour cela, le filtrage de motifs s'appuie sur la méthode `unapply`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "13\n" ] } ], "source": [ "object Twice {\n", " def unapply(x: Int) = \n", " if (x % 2 == 0) Some(x/2) else None\n", "}\n", "\n", "val x = 26\n", "x match {\n", " case Twice(n) => println(n)\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "De manière générale, le type de retour de la méthode `unapply` est `Option[T]`, où `T` est le type de la variable qui apparaîtra dans le motif. \n", "Le type `Option[T]` a deux constructeurs : `None` (sans arguments) et `Some` qui a un argument. \n", "Dans le cas de `unapply`, `None` est utilisé pour indiquer que le motif ne correspond pas, et `Some(t)` indique que `t` est la valeur correspondante au motif.\n", "\n", "Dans l'exemple ci-dessus, `n` est de type `Int`, la méthode `unapply` retourne un objet de type `Option[Int]`. \n", "La méthode `unapply` peut aussi retourner un tuple, de type `Option[(T1, T2, ... Tn)]`.\n", "\n", "Dans le cas où le motif n'utilise pas de variables, le type de retour de `unapply` peut être `Boolean`." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9720 = 2**3 * 3**5 * 5**1\n" ] }, { "data": { "text/plain": [ "Name: Syntax Error.\n", "Message: \n", "StackTrace: " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// a pair of integers (x,y) is encoded on a single integer as 2**x * 3**y\n", "object Pair {\n", " def apply(x: Int, y: Int): Int = Math.pow(2, x).toInt * Math.pow(3, y).toInt\n", " \n", " def unapply(pp: Int) = {\n", " var p = pp\n", " var (x,y) = (0,0)\n", " while (p%2 == 0) { x += 1; p = p/2 }\n", " while (p%3 == 0) { y += 1; p = p/3 }\n", " if (p != 1) None else Some(x,y)\n", " }\n", "}\n", "// a triple of integers (x,y,z) is encoded on a single integer as 2**x * 3**y * 5**z\n", "object Triple {\n", " def apply(x: Int, y: Int, z: Int): Int = Math.pow(2, x).toInt * Math.pow(3, y).toInt * Math.pow(5, z).toInt\n", "\n", " def unapply(pp: Int) = {\n", " var p = pp\n", " var (x,y,z) = (0,0,0)\n", " while (p%2 == 0) { x +=1; p = p/2 }\n", " while (p%3 == 0) { y +=1; p = p/3 }\n", " while (p%5 == 0) { z +=1; p = p/5 }\n", " if (p != 1) None else Some(x,y,z)\n", " }\n", "}\n", "\n", "// NB: A(args) is a shortcut for A.apply(args)\n", "val p = Triple(3, 5, 1)\n", "p match {\n", " case Pair(x,y) => println(p.toString() + \" = 2**\" + x.toString() + \" * 3**\" + y.toString())\n", " case Triple(x,y,z) => println(p.toString() + \" = 2**\" + x.toString() + \" * 3**\" + y.toString() + \" * 5**\" + z.toString())\n", "}\n", "\n", "/*\n", "p match {\n", " case Pair(x,y) => f(x,y)\n", " case Triple(x,y,z) => g(x,y,z)\n", "}\n", "\n", "is similar to\n", "\n", "Pair.unapply(p) match {\n", " case Some(x,y) => f(x,y)\n", " case None => Triple.unapply(p) match {\n", " case Some(x,y,z) => g(x,y,z)\n", " }\n", "}\n", "*/" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 is odd\n" ] } ], "source": [ "object Even {\n", " def unapply(x: Int) = x % 2 == 0\n", "}\n", "\n", "3 match {\n", " // no variable in the pattern, unapply returns a Boolean\n", " case Even() => println(\"3 is even\")\n", " case _ => println(\"3 is odd\")\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercice\n", "\n", "Implémentez des ensembles de `Int` sous forme d'arbres binaires de recherche balancés. \n", "L'arbre vide est noté `Nil`. \n", "Un arbre non-vide est constitué d'une étiquette `label` de type `Int` et de deux arbres (ses fils), `left` et `right`.\n", "Pour tout noeud dans l'arbre, les étiquettes du fils `left` doivent être toutes inférieures (strictement) à `label`, et les étiquettes du fils `right` doivent être toutes supérieures (strictement) à `label`. \n", "De plus, l'arbre et tous ses sous-arbres doivent être équilibrés : la différence de hauteur des deux fils doit être inférieure ou égale à 2." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Name: Syntax Error.\n", "Message: \n", "StackTrace: " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "// WARNING: do not run this cell more than once\n", "// it may confuse the kernel with multiply defined classes\n", "// if it happens, you can restart the kernel\n", "abstract class BinTree\n", "case class Leaf extends BinTree\n", "case class Node(label: Int, left: BinTree, right: BinTree) extends BinTree\n", "\n", "def height(t: BinTree): Int = t match {\n", " case Leaf() => 0\n", " case Node(_, l, r) => 1 + math.max(height(l), height(r))\n", "}\n", "\n", "// Size of the set\n", "def cardinal(t: BinTree): Int\n", "// Test whether e is in t\n", "def isin(t: BinTree, e: Int): Boolean\n", "// Insert e in t and return the resulting set\n", "def insert(t: BinTree, e: Int): BinTree\n", "// Remove e from t and return the resulting set\n", "// If e is not in t, then t is not modified\n", "def erase(t: BinTree, e: Int): BinTree\n", "// Set operations: union, intersection, set difference\n", "def union(t1: BinTree, t2: BinTree): BinTree\n", "def inter(t1: BinTree, t2: BinTree): BinTree\n", "def diff(t1: BinTree, t2: BinTree): BinTree\n", "// From t, retrieve the elements that satisfy predicate p\n", "def filter(t: BinTree, p: Int => Boolean): BinTree\n", "// Apply sequentially f to every elements of t\n", "def apply(t: BinTree, f: Int => Unit): Unit\n", "// Return the image of t by f\n", "def map(t: BinTree, f: Int => Int): BinTree\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Apache Toree - Scala", "language": "scala", "name": "apache_toree_scala" }, "language_info": { "name": "scala", "version": "2.10.4" } }, "nbformat": 4, "nbformat_minor": 2 }