SQL Server のグラフで最短経路と取ってみる

SQL Server のグラフを色々試していくシリーズ。

最初は最短経路です。

SQL Server のグラフには未だ最短経路をサクッと取れるものは用意されていません。
次のステップで追加される予定だそうです。
Ignite - Graph extensions in Microsoft SQL Server 2017 and Azure SQL Database
スライドの P-38 より

What’s Next?
• Future Improvements
  • Shortest Path and Arbitrary length traversals
    • Find friends 1 -5 hops away

というわけで、ゴリゴリ書いてきます。グラフと比較出来るようにグラフ無しのテーブルの例も挙げておきます。

今回の例は阪急電車の一部の駅を抜き出した路線図です。
今のところ、グラフデータの可視化が難点なのでNeo4jで可視化してます。
SSMS か Operation Studio どっちでも良いですけど、グラフっぽく見えるようになると良いですねー。

f:id:odashinsuke:20171224113942p:plain
ノードが駅でエッジが路線です。
駅には駅名を、路線には、距離と分を持っています。

create table G_駅 (
    駅名 nvarchar(10) not null
) as node
create table G_路線 (
    距離 decimal(3, 1), 
    分 int
) as edge

insert into G_駅 values 
(N'梅田'), (N'十三'), (N'西宮北口'), (N'神戸三宮'), (N'宝塚'), (N'高槻市'), (N'河原町')

declare @e1 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'梅田')
declare @e2 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'十三')
declare @e3 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'西宮北口')
declare @e4 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'神戸三宮')
declare @e5 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'宝塚')
declare @e6 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'高槻市')
declare @e7 nvarchar(max) = (select $node_id from G_駅 where 駅名 = N'河原町')

insert into G_路線 values
(@e1, @e2, 2.4, 3), 
(@e2, @e1, 2.4, 3), 
(@e2, @e3, 13.2, 2), 
(@e3, @e2, 13.2, 2),  
(@e2, @e5, 22.1, 31),
(@e5, @e2, 22.1, 31),
(@e2, @e6, 20.6, 18),
(@e6, @e2, 20.6, 18),
(@e3, @e4, 16.7, 14),
(@e4, @e3, 16.7, 14),
(@e3, @e5, 7.7, 14),
(@e5, @e3, 7.7, 14),
(@e6, @e7, 24.7, 22),
(@e7, @e6, 24.7, 22)

ついでに、通常のテーブルの方も作ってしまいましょう。

create table 路線 (
    発駅 nvarchar(10) not null, 
    着駅 nvarchar(10) not null, 
    距離 decimal(3, 1) not null,
    分 int not null
)
insert into 路線 values
(N'梅田', N'十三', 2.4, 3), 
(N'十三', N'梅田', 2.4, 3), 
(N'十三', N'西宮北口', 13.2, 2), 
(N'西宮北口', N'十三', 13.2, 2),  
(N'十三', N'宝塚', 22.1, 31),
(N'宝塚', N'十三', 22.1, 31),
(N'十三', N'高槻市', 20.6, 18),
(N'高槻市', N'十三', 20.6, 18),
(N'西宮北口', N'神戸三宮', 16.7, 14),
(N'神戸三宮', N'西宮北口', 16.7, 14),
(N'西宮北口', N'宝塚', 7.7, 14),
(N'宝塚', N'西宮北口', 7.7, 14),
(N'高槻市', N'河原町', 24.7, 22),
(N'河原町', N'高槻市', 24.7, 22)

こっちは路線に、発駅、着駅、距離、分と持たしています。

今回の例では、梅田から宝塚に行くためのルートを取得します。
この 2パターンありますが、

梅田 => 十三 => 宝塚
梅田 => 十三 => 西宮北口 => 宝塚

途中の経路が少ないのは、梅田 => 十三 => 宝塚パターン。
距離の合計では短くなるのは梅田 => 十三 => 西宮北口 => 宝塚パターンです。

最初に、経路が短いパターンを取得します。
まずはグラフを使ったテーブルから

with cte as (
  select S.駅名 as 発駅, E.駅名 as 着駅, cast(R.距離 as decimal(10, 1)) as 距離, R.分, cast(concat(S.駅名, '-', E.駅名) as nvarchar(max)) as ルート, 1 as cnt
  from G_駅 S, G_路線 R, G_駅 E
  where match (S - (R) -> E) and S.駅名 = N'梅田'
  union all
  select cte.着駅, E.駅名, cast(cte.距離 + R.距離 as decimal(10, 1)), cte.分 + R.分, cast(concat(cte.ルート, '-', E.駅名) as nvarchar(max)), cte.cnt + 1
  from cte, G_駅 S, G_路線 R, G_駅 E
  where cte.着駅 = S.駅名 and match (S - (R) -> E) and charindex(E.駅名, cte.ルート) = 0
)
select top(1)
  cte.ルート
  , cte.距離
  , cte.分
  , cte.cnt  
from
  cte
where
  cte.着駅 = N'宝塚'
order by cte.cnt

f:id:odashinsuke:20171224124435p:plain
match の部分がミソですが、再帰クエリ内で前のデータを直接 match を使えない*1ので、ひと手間余分な感じがします。
通常のテーブルも同じ感じです。

with cte as (
  select 発駅, 着駅, cast(距離 as decimal(10, 1)) as 距離, 分, cast(concat(発駅, '-', 着駅) as nvarchar(max)) as ルート, 1 as cnt from 路線 where 発駅 = N'梅田'
  union all
  select 路線.発駅, 路線.着駅, cast(cte.距離 + 路線.距離 as decimal(10, 1)), cte.分 + 路線.分, cast(concat(cte.ルート, '-', 路線.着駅) as nvarchar(max)), cte.cnt + 1
  from cte inner join 路線 on cte.着駅 = 路線.発駅 and charindex(路線.着駅, cte.ルート) = 0
)
select top (1)
  cte.ルート
  , cte.距離
  , cte.分
  , cte.cnt
from
  cte
where cte.着駅 = N'宝塚'
order by cte.cnt

では、距離が最短の経路を取る場合はどうでしょうか？
これも実は殆ど変わらずで、再帰クエリ内で距離を加算して order by + top(1) で終わりです。

with cte as (
  select S.駅名 as 発駅, E.駅名 as 着駅, cast(R.距離 as decimal(10, 1)) as 距離, R.分, cast(concat(S.駅名, '-', E.駅名) as nvarchar(max)) as ルート, 1 as cnt
  from G_駅 S, G_路線 R, G_駅 E
  where match (S - (R) -> E) and S.駅名 = N'梅田'
  union all
  select cte.着駅, E.駅名, cast(cte.距離 + R.距離 as decimal(10, 1)), cte.分 + R.分, cast(concat(cte.ルート, '-', E.駅名) as nvarchar(max)), cte.cnt + 1
  from cte, G_駅 S, G_路線 R, G_駅 E
  where cte.着駅 = S.駅名 and match (S - (R) -> E) and charindex(E.駅名, cte.ルート) = 0
)
select top(1)
  cte.ルート
  , cte.距離
  , cte.分
  , cte.cnt  
from
  cte
where
  cte.着駅 = N'宝塚'
order by cte.距離

f:id:odashinsuke:20171224124711p:plain
通常のテーブルの方もクエリ貼っときます。

with cte as (
  select 発駅, 着駅, cast(距離 as decimal(10, 1)) as 距離, 分, cast(concat(発駅, '-', 着駅) as nvarchar(max)) as ルート, 1 as cnt from 路線 where 発駅 = N'梅田'
  union all
  select 路線.発駅, 路線.着駅, cast(cte.距離 + 路線.距離 as decimal(10, 1)), cte.分 + 路線.分, cast(concat(cte.ルート, '-', 路線.着駅) as nvarchar(max)), cte.cnt + 1
  from cte inner join 路線 on cte.着駅 = 路線.発駅 and charindex(路線.着駅, cte.ルート) = 0
)
select top (1)
  cte.ルート
  , cte.距離
  , cte.分
  , cte.cnt
from
  cte
where cte.着駅 = N'宝塚'
order by cte.距離

正直あまり面白くない結果ですねー、通常のテーブルとクエリそんな変わんないし。
再帰使わないとちょっと変わるかなー？って感じです。
一応試せる環境は用意しましたので、興味ある方はクエリ書いて試してください。
TSQL Runner Q20180123

一応 Neo4j の cypher も貼っときます。
Neo4j だと "shortestPath" や、任意の数のhopを表す "*" があるのでシンプルに見えます。
Neo4j は試せる環境公開出来ないので、お手元の環境でどぞ。

データ

create (s1:L_Station {name:"梅田"}), 
  (s2:L_Station {name:"十三"}), 
  (s3:L_Station {name:"西宮北口"}), 
  (s4:L_Station {name:"神戸三宮"}), 
  (s5:L_Station {name:"宝塚"}), 
  (s6:L_Station {name:"高槻市"}), 
  (s7:L_Station {name:"河原町"}),
  (s1)-[:Line {distance:2.4, minute:3}]->(s2),
  (s2)-[:Line {distance:2.4, minute:3}]->(s1),
  (s2)-[:Line {distance:13.2, minute:9}]->(s3),
  (s3)-[:Line {distance:13.2, minute:9}]->(s2),
  (s2)-[:Line {distance:22.1, minute:31}]->(s5),
  (s5)-[:Line {distance:22.1, minute:31}]->(s2),
  (s2)-[:Line {distance:20.6, minute:18}]->(s6),
  (s6)-[:Line {distance:20.6, minute:18}]->(s2),
  (s3)-[:Line {distance:16.7, minute:14}]->(s4),
  (s4)-[:Line {distance:16.7, minute:14}]->(s3),
  (s3)-[:Line {distance:7.7, minute:14}]->(s5),
  (s5)-[:Line {distance:7.7, minute:14}]->(s3),
  (s6)-[:Line {distance:24.7, minute:22}]->(s7),
  (s7)-[:Line {distance:24.7, minute:22}]->(s6)
return s1, s2, s3, s4, s5, s6, s7

最短経路

match (start:L_Station {name:"梅田"}), 
  (end:L_Station{name:"宝塚"}), 
  path = shortestPath((start)-[:Line*]-(end))
return path

距離での最短

match(start:L_Station {name:"梅田"}), 
  (end:L_Station{name:"宝塚"}), 
  path = (start)-[:Line*]-(end) 
return path, 
  reduce(distance = 0, edge in relationships(path) | distance + edge.distance) AS totalDistance 
order by totalDistance asc 
limit 1

*1:構文上 Node テーブルの必要がある