DSL/Delimiter Directed Translation

Delimiter Directed Translation

One-line summary

Translate source text by breaking it into chunks (usually lines) and then parsing each chunk.

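Summary

How it works

Split the input text into small chunks using some delimiter, most commonly the end-of-line character. Splitting by line is simple to program.

When a logical line becomes too long and you want to spread it across several lines in the editor, mark each line to be continued with a specific character (such as a backslash).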
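
The splitting and continuation rules above can be sketched in Java as follows. This is a minimal sketch, not code from the book: LineChunker, chunks and flush are hypothetical names, a trailing backslash is assumed as the continuation marker, and blank lines are assumed to be ignorable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; the names are not from the source.
class LineChunker {
    // Splits the input on line endings and joins any line that ends with a
    // backslash to the line that follows it. Blank lines are skipped.
    static List<String> chunks(String input) {
        List<String> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String raw : input.split("\\R")) {
            String line = raw.trim();
            if (line.endsWith("\\")) {
                // Drop the continuation marker and keep accumulating.
                pending.append(line.substring(0, line.length() - 1).trim()).append(' ');
            } else {
                pending.append(line);
                flush(pending, result);
            }
        }
        flush(pending, result); // covers input that ends on a continuation marker
        return result;
    }

    // Emits the accumulated text as one chunk unless it is empty.
    private static void flush(StringBuilder pending, List<String> result) {
        String chunk = pending.toString().trim();
        if (!chunk.isEmpty()) result.add(chunk);
        pending.setLength(0);
    }
}
```

Each returned chunk is one logical line, so the later parsing steps never need to know that continuation markers existed.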

As for how lines are processed, the simplest case is when every line is autonomous and has the same format. Here is an example of reward points for frequent hotel guests.

score 300 for 3 nights at Bree
score 200 for 2 nights at Dol Amroth
score 150 for 2 nights at Orthanc

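Because every line has the same format, the same function can process each one. Embedded Translation is the most common way to handle the result; building an AST with Tree Construction is rarely seen with this technique.

How you extract the required information depends on the programming language, but regular expressions are recommended. If the expression gets complicated, split it into several sub-expressions and manage them separately.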
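
As an illustration of the advice to build the expression from smaller parts, here is a sketch for the "score ... for ... nights at ..." lines. ScoreLineParser and its Score result type are assumptions made for this example, and IllegalArgumentException stands in for whatever recognition error a real parser would raise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for lines such as "score 300 for 3 nights at Bree".
class ScoreLineParser {
    // Sub-expressions are kept separate so each one stays readable.
    private static final String POINTS = "(?<points>\\d+)";
    private static final String NIGHTS = "(?<nights>\\d+)\\s+nights?";
    private static final String HOTEL = "(?<hotel>.+)";
    private static final Pattern LINE = Pattern.compile(
        "^score\\s+" + POINTS + "\\s+for\\s+" + NIGHTS + "\\s+at\\s+" + HOTEL + "$");

    // Minimal result holder; a real parser would populate a semantic model.
    static final class Score {
        final int points;
        final int nights;
        final String hotel;
        Score(int points, int nights, String hotel) {
            this.points = points;
            this.nights = nights;
            this.hotel = hotel;
        }
    }

    static Score parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) throw new IllegalArgumentException("unrecognized line: " + line);
        return new Score(Integer.parseInt(m.group("points")),
                         Integer.parseInt(m.group("nights")),
                         m.group("hotel"));
    }
}
```

Named groups keep the extraction code independent of the order in which the sub-expressions are concatenated.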

An example where each line has a different format (a DSL describing a page on a newspaper site):

border grey
headline "Musical Cambridge"
filter by date in this week
show concerts in Cambridge

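Each line is still autonomous, so in this case a conditional decides which kind of line it is and which parsing routine to run.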

if       (isBorder())    parseBorder();
else if  (isHeadline())  parseHeadline();
else if  (isFilter())    parseFilter();
else if  (isShow())      parseShow();
else throw new RecognitionException(input);
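
The conditional dispatch can be fleshed out into something runnable. The following is only a sketch under assumptions: PageScriptParser is a hypothetical class, the kind of line is decided by its first keyword, results are collected as plain strings rather than a real model, and IllegalArgumentException stands in for RecognitionException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the newspaper page DSL.
class PageScriptParser {
    // Plain strings keep the sketch small; a real parser would build a model.
    final List<String> actions = new ArrayList<>();

    // Decides the kind of line from its first keyword, then handles it.
    void parseLine(String line) {
        String text = line.trim();
        if (isKind(text, "border"))        actions.add("border=" + rest(text));
        else if (isKind(text, "headline")) actions.add("headline=" + rest(text).replace("\"", ""));
        else if (isKind(text, "filter"))   actions.add("filter=" + rest(text));
        else if (isKind(text, "show"))     actions.add("show=" + rest(text));
        else throw new IllegalArgumentException("unrecognized line: " + line);
    }

    private static boolean isKind(String text, String keyword) {
        return text.startsWith(keyword + " ");
    }

    // Everything after the leading keyword.
    private static String rest(String text) {
        return text.substring(text.indexOf(' ') + 1).trim();
    }
}
```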

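Isomorphic (same-form) and polymorphic (multiple-form) lines can also be combined into a hybrid: every line shares one broad structure that divides it into clauses, while each clause can take different forms. Here is the hotel points example again.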

300 for stay 3 nights at Bree
150 per day for stay 2 nights at Bree
50 for spa treatment at Dol Amroth
60 for stay 1 night at Orthanc or Helm's Deep or Dunharrow
1 per dollar for dinner at Bree

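These lines can all be described by the following format, so in a broad sense they are isomorphic.

reward-clause for activity-clause at location-clause

First run a top-level step that identifies the clauses, then process each clause with the routine that matches it. This structure can be related to the kind of grammar used in Syntax Directed Translation.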

To handle a DSL whose lines are not autonomous (for example, the state machine from the introduction), the parser has to keep track of state.

events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
end
...
commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
end

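The events section and the commands section have the same structure, but they should be handled separately.

In a case like this, it is a good idea to have a separate group of parsers for each state of the parse. When the top-level line parser detects events, it switches the current line parser to the event parser. (This is the State pattern.)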
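
The switching of line parsers can be sketched as follows. This is a simplified illustration rather than the book's code: SectionParser and its members are hypothetical names, only the events and commands sections are handled, and both are stored as simple name-to-code maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser that swaps its current line parser per section.
class SectionParser {
    // One line parser per parsing state (State pattern).
    private interface LineParser { void parse(String line); }

    final Map<String, String> events = new LinkedHashMap<>();
    final Map<String, String> commands = new LinkedHashMap<>();

    private final LineParser topLevel = this::parseTopLevel;
    private LineParser current = topLevel;

    void parse(String script) {
        for (String raw : script.split("\\R")) {
            String line = raw.trim();
            if (!line.isEmpty()) current.parse(line);
        }
    }

    // The top level only recognizes section headers and switches parsers.
    private void parseTopLevel(String line) {
        if (line.equals("events"))        current = l -> parseBlock(l, events);
        else if (line.equals("commands")) current = l -> parseBlock(l, commands);
        else throw new IllegalArgumentException("unrecognized line: " + line);
    }

    // Both sections share a structure but fill different tables.
    private void parseBlock(String line, Map<String, String> target) {
        String[] words = line.split("\\s+");
        if (line.equals("end"))     current = topLevel;
        else if (words.length == 2) target.put(words[0], words[1]);
        else throw new IllegalArgumentException("unrecognized line: " + line);
    }
}
```

Because the section body parser is parameterized by its target table, the shared structure is written once while events and commands still end up in separate places.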

The awkward part of using Delimiter Directed Translation is whitespace. Consider a line such as:

property = value

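Should whitespace around the = operator be optional, required, or forbidden? Making it optional complicates the regular expressions, while requiring or forbidding it makes the DSL harder to use.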
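
To make the trade-off concrete, here is a sketch of the "optional whitespace" choice. PropertyLine is a hypothetical name, and the key is assumed to consist of word characters only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "property = value" lines.
class PropertyLine {
    // The \s* on both sides of '=' is what makes the whitespace optional.
    private static final Pattern OPTIONAL_WS =
        Pattern.compile("^\\s*(?<key>\\w+)\\s*=\\s*(?<value>.*?)\\s*$");

    // Returns {key, value}, or throws if the line is not an assignment.
    static String[] parse(String line) {
        Matcher m = OPTIONAL_WS.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("unrecognized line: " + line);
        return new String[] { m.group("key"), m.group("value") };
    }
}
```

With this pattern, "property=value", "property = value" and a version padded with extra spaces all yield the same key and value. The cost is the extra \s* noise in every expression that has to tolerate whitespace.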

Repeating the sequence "match a pattern, then run the processing for that pattern" suggests that a framework could do the job. In fact, this mechanism is what lexer generators use.
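
The "match, then act" loop can be captured in a small table of rules. The sketch below only illustrates the idea and makes no claim about how any particular lexer generator is implemented; RuleTable, on and dispatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical mini-framework: a list of (pattern, action) rules.
class RuleTable {
    private static final class Rule {
        final Pattern pattern;
        final Consumer<Matcher> action;
        Rule(Pattern pattern, Consumer<Matcher> action) {
            this.pattern = pattern;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Registers an action to run when the regex matches a whole line.
    RuleTable on(String regex, Consumer<Matcher> action) {
        rules.add(new Rule(Pattern.compile(regex), action));
        return this;
    }

    // Runs the action of the first rule whose pattern matches the line.
    void dispatch(String line) {
        String text = line.trim();
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(text);
            if (m.matches()) {
                rule.action.accept(m);
                return;
            }
        }
        throw new IllegalArgumentException("unrecognized line: " + line);
    }
}
```

Rules are tried in registration order, so more specific patterns should be registered before more general ones.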

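When to use it

The great strength of Delimiter Directed Translation is its simplicity. The main alternative is Syntax Directed Translation, which carries some learning cost.

Delimiter Directed Translation becomes difficult quickly as the language grows more complex, so it works well for languages without much nested context.

For a simple language with only a single nested context, Delimiter Directed Translation is strongly recommended. If you are not working with a team unfamiliar with the technique, and the team is prepared to learn it, Syntax Directed Translation is also a good choice.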

Example: Frequent customer points (C#)

300 for stay 3 nights at Bree
150 per day for stay 2 nights at Bree
50 for spa treatment at Dol Amroth
60 for stay 1 night at Orthanc or Helm's Deep or Dunharrow
1 per dollar for dinner at Bree

class Activity...
   public string Type { get;  set; }
   public int Amount { get;  set; }
   public int Revenue { get;  set; }
   public string Location { get; set; }

The location specification only checks the hotel name.

class LocationSpecification...
    private readonly IList<Hotel> hotels = new List<Hotel>();
    public LocationSpecification(params String[] names) {
      foreach (string n in names)
        hotels.Add(Repository.HotelNamed(n));
    }
    public bool IsSatisfiedBy(Activity a) {
      Hotel hotel = Repository.HotelNamed(a.Location);
      return hotels.Contains(hotel);
    }

 abstract class ActivitySpecification {
   public abstract bool isSatisfiedBy(Activity a);
 }

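There are two kinds of activity specification. The first checks whether the guest stayed at least a minimum number of nights (for example: stay 3 nights).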

  class MinimumNightStayActivitySpec : ActivitySpecification {
    private readonly int minimumNumberOfNights;
    public MinimumNightStayActivitySpec(int numberOfNights) {
      this.minimumNumberOfNights = numberOfNights;
    }
    public override bool isSatisfiedBy(Activity a) {
      return a.Type == "stay" && a.Amount >= minimumNumberOfNights;
    }
  }

The second checks the type of activity (for example: spa treatment).

  class TypeActivitySpec : ActivitySpecification {
    private readonly string type;
    public TypeActivitySpec(string type) {
      this.type = type;
    }
    public override bool isSatisfiedBy(Activity a) {
      return a.Type == type;
    }
  }

Reward classes

 class Reward {
   protected int points;
   public Reward(int points) { this.points = points; }
   virtual public int Score (Activity activity) {
     return points;  
   }
 }
 class RewardPerDay : Reward {
   public RewardPerDay(int points) : base(points) {}

   public override int Score(Activity activity) {
     if (activity.Type != "stay")
       throw new ArgumentException("can only use per day scores on stays");
     return activity.Amount * points;
   }
 }
 class RewardPerDollar : Reward {
   public RewardPerDollar(int points) : base(points) {}

   public override int Score(Activity activity) {
     return activity.Revenue * points;
   }
 }

Parser

The basic approach is to read and process one line at a time.

class OfferScriptParser...
   readonly TextReader input;
   readonly List<Offer> result = new List<Offer>();
   public OfferScriptParser(TextReader input) {
     this.input = input;
   }
   public List<Offer> Run() {
     string line;
     while ((line = input.ReadLine()) != null) {
       line = appendContinuingLine(line);
       parseLine(line);
     }
     return result;
   }

Here "&" at the end of a line is used as the continuation character.

class OfferScriptParser...
   private string appendContinuingLine(string line) {
     if (IsContinuingLine(line)) {
       var first = Regex.Replace(line, @"&\s*$", "");
       var next = input.ReadLine();
       if (null == next) throw new RecognitionException(line);
       return first.Trim() + " " + appendContinuingLine(next);
     } 
     else return line.Trim();
   }
   private bool IsContinuingLine(string line) {
     return Regex.IsMatch(line, @"&\s*$");
   }

Comments (introduced by #) are removed before a line is parsed.

class OfferScriptParser...
   private void parseLine(string line) {
     line = removeComment(line);
     if (IsEmpty(line)) return;
     result.Add(new OfferLineParser().Parse(line.Trim()));
   }
   private bool IsEmpty(string line) {
     return Regex.IsMatch(line, @"^\s*$");
   }
   private string removeComment(string line) {
     return Regex.Replace(line, @"#.*", "");
   }

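The actual parse method. The regular expression is split into parts. (Writing it as one big expression, as an experiment, proved unmanageable.)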

class OfferLineParser...
   public Offer Parse(string line) {
     var result = new Offer();
     const string rewardRegexp = @"(?<reward>.*)";
     const string activityRegexp = @"(?<activity>.*)";
     const string locationRegexp = @"(?<location>.*)";

     var source = rewardRegexp + keywordToken("for") + 
       activityRegexp + keywordToken("at") + locationRegexp;

     var m = new Regex(source).Match(line);
     if (!m.Success) throw new RecognitionException(line);
     result.Reward = parseReward(m.Groups["reward"].Value);
     result.Location = parseLocation(m.Groups["location"].Value);
     result.Activity = parseActivity(m.Groups["activity"].Value);
     return result;
   }
   private String keywordToken(String keyword) {
     return @"\s+" + keyword + @"\s+";
   }

Each clause is then parsed separately. Start with the location specification, and note that more than one location may be given.

class OfferLineParser...
   private LocationSpecification parseLocation(string input) {
     if (Regex.IsMatch(input, @"\bor\b"))
       return parseMultipleHotels(input);
     else
       return new LocationSpecification(input);
   }
   private LocationSpecification parseMultipleHotels(string input) {
     String[] hotelNames = Regex.Split(input, @"\s+or\s+");
     return new LocationSpecification(hotelNames);
   }

The activity specification.

class OfferLineParser...
   private ActivitySpecification parseActivity(string input) {
     if (input.StartsWith("stay"))
       return parseStayActivity(input);
     else return new TypeActivitySpec(input);
   }

parseStayActivity extracts the number of nights. Once the number has been extracted, it hands it to MinimumNightStayActivitySpec.

class OfferLineParser...
   private ActivitySpecification parseStayActivity(string input) {
     const string stayKeyword = @"^stay\s+";
     const string nightsKeyword = @"\s+nights?$";
     const string amount = @"(?<amount>\d+)";

     const string source = stayKeyword + amount + nightsKeyword;

     var m = Regex.Match(input, source);
     if (!m.Success) throw new RecognitionException(input);
     return new MinimumNightStayActivitySpec(Int32.Parse(m.Groups["amount"].Value));
   }

Finally, parsing the reward.

class OfferLineParser...
   private Reward parseReward(string input) {
     if (Regex.IsMatch(input, @"^\d+$"))
       return new Reward(Int32.Parse(input));
     else if (Regex.IsMatch(input, @"^\d+ per day$"))
       return new RewardPerDay(Int32.Parse(extractDigits(input)));
     else if (Regex.IsMatch(input, @"^\d+ per dollar$"))
       return new RewardPerDollar(Int32.Parse(extractDigits(input)));
     else throw new RecognitionException(input);
   }
   private string extractDigits(string input) {
     return Regex.Match(input, @"^\d+").Value;
   }

Example: Parsing non-autonomous lines with Miss Grant's controller (Java)

events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
end
resetEvents
  doorOpened
end
commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
end

state idle
  actions unlockDoor lockPanel
  doorClosed => active
end

state waitingForLight
  lightOn => unlockedPanel
end
state waitingForDraw
  drawOpened => unlockedPanel
end
state unlockedPanel
  actions unlockPanel lockDoor
  panelClosed => idle
end

...

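The question here is how to divide responsibility between the state machine parser and the line parsers (as with the State pattern). This example pushes as much responsibility as possible into the line parsers (the distributed approach). The alternative is to put the responsibility in the state machine parser, so that each line parser only extracts the needed information from the text (the centralized approach).

The distributed approach: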

class CommandLineParser...
  void doParse() {
    if (hasOnlyWord("end")) returnToTopLevel();
    else if (words().length == 2)
      context.registerCommand(new Command(words(0), words(1)));
    else failToRecognizeLine();
  }

class LineParser...
  protected void returnToTopLevel() {
    context.setLineParser(new TopLevelLineParser(context));
  }

class StateMachineParser...
  void registerCommand(Command c) {
    commands.put(c.getName(), c);
  }
  private Map<String, Command> commands = new HashMap<String, Command>();
  Command getCommand(String word) {
    return commands.get(word);
  }

The centralized approach, with the behavior moved into the state machine parser:

class CommandLineParser...
  void doParse() {
    if (hasOnlyWord("end"))
      context.handleEndCommand();
    else if (words().length == 2)
      context.handleCommand(words(0), words(1));
    else failToRecognizeLine();
  }

class StateMachineParser...
  void handleCommand(String name, String code) {
    Command command = new Command(name, code);
    commands.put(command.getName(), command);
  }
  public void handleEndCommand() {
    lineParser = new TopLevelLineParser(this);
  }

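The drawback of the distributed approach is that the state machine parser acts as a Symbol Table, so the line parsers have to reach into it repeatedly to pull data out. The drawback of the centralized approach is that a lot of logic ends up in the state machine parser, which becomes a bigger problem for a large language. Each approach has its strengths and weaknesses, and no perfect method has been found.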

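Feedback for Fowler

The chapter is long.

Notes from the summarizer

The translation is incomplete. Apologies.
The technique is basically brute force with heavy use of regular expressions. Still, it may be easy to get started with.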
