# Graph Data Modeling

## Tip: Make the implicit explicit

  - Fill in the missing links in the graph
  - You could run this type of query once a day during a quiet period
  - On bigger graphs we'd run it in batches to avoid loading the whole database into memory
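A minimal sketch of the batched approach. It assumes Neo4j 4.4 or later, which added CALL { ... } IN TRANSACTIONS, and the Member/Group/Topic model used further down this page. It infers INTERESTED_IN links and commits after every 1000 members instead of building one huge transaction. The batch size and the "> 3" threshold are illustrative.

<code>
// Must run as an auto-commit transaction (prefix with :auto in Neo4j Browser)
MATCH (m:Member)
CALL {
  WITH m
  // Count how many of m's groups carry each topic
  MATCH (m)-[:MEMBER_OF]->(:Group)-[:HAS_TOPIC]->(topic)
  WITH m, topic, count(*) AS times
  WHERE times > 3
  // Make the implicit interest explicit
  MERGE (m)-[:INTERESTED_IN]->(topic)
} IN TRANSACTIONS OF 1000 ROWS
</code>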


{{youtube>78r0MgH0u0w}}

{{ https://i.imgur.com/4yh0NZv.jpg }}

Common nouns => Labels

  - user => :User
  - email => :Email

Verbs that take an object => Relationships

  - sent => SENT
  - wrote => WROTE

Proper nouns => Nodes with properties

  - Ian => ({name: 'Ian'})
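Applying the three rules to a made-up sentence, "Ian wrote an email and sent it", gives something like this (the sentence is an illustration, not taken from the image):

<code>
// Proper noun => node with properties; common nouns => labels; verbs => relationships
CREATE (ian:User {name: 'Ian'})-[:WROTE]->(email:Email),
       (ian)-[:SENT]->(email)
</code>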

## Attributes: Property or Relationship?

### Use Relationships When...

You need to specify the weight, strength, or some other quality of the relationship:

  - Friendship strength
  - Proficiency in a skill
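A sketch of both cases, assuming hypothetical :Person and :Skill labels and strength/level property names. Once the skill is a node, the HAS_SKILL relationship can carry the proficiency, just as FRIEND_OF carries the friendship strength.

<code>
CREATE (alice:Person {name: 'Alice'}), (bob:Person {name: 'Bob'}),
       (neo:Skill {name: 'Neo4j'})
// The quality of each connection lives on the relationship itself
CREATE (alice)-[:FRIEND_OF {strength: 0.8}]->(bob)
CREATE (alice)-[:HAS_SKILL {level: 'expert'}]->(neo)
CREATE (bob)-[:HAS_SKILL {level: 'beginner'}]->(neo)
</code>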

### AND/OR

Attribute value comprises a complex value type:

  - Address (first line, second line, zip code, etc)
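A sketch of an address modeled as its own node rather than as several loose properties on the user. The LIVES_AT relationship and the field names are hypothetical.

<code>
// The address becomes a node, so its parts stay together and can be shared or queried
CREATE (u:User {name: 'Ian'})-[:LIVES_AT]->(:Address {
  line1: '1 Example Street',
  line2: 'Flat 2',
  zip: 'AB1 2CD'
})
</code>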

### AND/OR

Attribute values are interconnected:

  - Taxonomy of skills
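When skills form a taxonomy, the hierarchy itself becomes traversable. This sketch assumes :Skill nodes linked by a hypothetical IS_A relationship pointing at the broader skill, plus the HAS_SKILL relationship from the earlier example.

<code>
// Anyone with a skill that is "Graph databases" or, via the hierarchy, a kind of it
MATCH (p:Person)-[:HAS_SKILL]->(:Skill)-[:IS_A*0..]->(:Skill {name: 'Graph databases'})
RETURN DISTINCT p.name
</code>

The *0.. bound includes zero hops, so people holding the broad skill itself are matched too.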

### Modeling Skills as Nodes

{{ https://i.imgur.com/saTLlqh.jpg }}

## Common Graph Structures

### Rich Context, Multiple Dimensions

{{ https://i.imgur.com/KJhEgB8.jpg }}


### Trap: Verbing

  - Be as simple as possible
  - But beware verbing
    - Language habit: noun => verb
      - Send an email => EMAIL
      - Search Google => GOOGLE

### Example: [:EMAILED] to (:Email)

{{ https://i.imgur.com/hYHlDVf.jpg }}

{{ https://i.imgur.com/VODmArB.jpg }}
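The two shapes side by side, as a sketch with hypothetical users Ian and Jim:

<code>
// Verbing: the email only exists as a relationship between two users
//   (:User)-[:EMAILED {content: '...'}]->(:User)
// Noun: the email is a node that both sender and recipient connect to
CREATE (ian:User {name: 'Ian'}), (jim:User {name: 'Jim'})
CREATE (ian)-[:SENT]->(:Email {content: 'Hi Jim'})-[:TO]->(jim)
</code>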

### Considerations

  - An intermediate node provides flexibility
    - It allows more than two nodes to be connected in a single context
  - But it can be overkill, and the extra hop it adds to every traversal has an impact on performance
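A sketch of the flexibility an intermediate node buys: one email reaching several people in different roles. It assumes users named Ian, Jim and Alice already exist; CC is a hypothetical relationship type.

<code>
// The Email node connects more than two users in a single context
MATCH (ian:User {name: 'Ian'}), (jim:User {name: 'Jim'}), (alice:User {name: 'Alice'})
CREATE (ian)-[:SENT]->(email:Email {content: 'Project update'})
CREATE (email)-[:TO]->(jim)
CREATE (email)-[:CC]->(alice)
</code>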

## Linked List

  - Entities are linked in a sequence
  - You need to traverse the sequence
  - You may need to identify the beginning or end (first/last, earliest/latest, etc.)
  - Examples
    - Event stream
    - Episodes of a TV series
    - Job history
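A minimal sketch using the TV-series example, with a hypothetical :Episode label and NEXT relationship. The second statement walks the whole sequence, finding the head (no incoming NEXT) and the tail (no outgoing NEXT).

<code>
// Build a small list of episodes joined by NEXT
CREATE (:Episode {number: 1})-[:NEXT]->(:Episode {number: 2})-[:NEXT]->(:Episode {number: 3});

// Traverse from head to tail and return the episodes in order
MATCH path = (firstEp:Episode)-[:NEXT*]->(lastEp:Episode)
WHERE NOT ()-[:NEXT]->(firstEp) AND NOT (lastEp)-[:NEXT]->()
RETURN [e IN nodes(path) | e.number] AS episodeOrder;
</code>

Finding the ends this way has to check every Episode node, which is the cost that pointers to the head and tail avoid.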

### Linked List

{{ https://i.imgur.com/GH9hvAS.jpg }}

### Interleaved Linked Lists

{{ https://i.imgur.com/cJZt6pY.jpg }}
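One reading of "interleaved" (an assumption, since the diagram is not reproduced here): a single event stream in which every event links to the next event overall and also to the next event by the same user, so two lists share the same nodes. All labels and relationship types are hypothetical.

<code>
// NEXT links every event in global order; NEXT_FOR_USER links one user's events
CREATE (ian:User {name: 'Ian'}), (jim:User {name: 'Jim'})
CREATE (e1:Event {at: 1000})-[:NEXT]->(e2:Event {at: 2000})-[:NEXT]->(e3:Event {at: 3000})
CREATE (ian)-[:PERFORMED]->(e1), (jim)-[:PERFORMED]->(e2), (ian)-[:PERFORMED]->(e3)
CREATE (e1)-[:NEXT_FOR_USER]->(e3)
</code>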

### Pointers to Head and Tail

{{ https://i.imgur.com/7VPNcqh.jpg }}
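A sketch of head and tail pointers, assuming a hypothetical :Series node with FIRST and LAST relationships and a $title parameter. Reading either end is a single hop, and appending only touches the old tail and the LAST pointer.

<code>
// Constant-time access to both ends of the list
MATCH (s:Series {title: $title})-[:FIRST]->(firstEp:Episode),
      (s)-[:LAST]->(lastEp:Episode)
RETURN firstEp.number, lastEp.number;

// Append: link the new episode after the old tail, then move the LAST pointer
MATCH (s:Series {title: $title})-[oldLast:LAST]->(tail:Episode)
CREATE (tail)-[:NEXT]->(newEp:Episode {number: tail.number + 1})
CREATE (s)-[:LAST]->(newEp)
DELETE oldLast;
</code>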

## Versioning Graphs

  - Time-based
    - Universal versioning scheme
    - Discrete, continuous sequence
      - Millis since the epoch
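Cypher can produce and decode these values directly. A small sketch, using the timestamp from the query further down this page:

<code>
// Current time as millis since the epoch
RETURN timestamp() AS now,
       // Convert a stored version back to a readable instant
       // (1391558400000 is 2014-02-05T00:00:00Z)
       datetime({epochMillis: 1391558400000}) AS asDateTime
</code>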

### Separate Structure from State

  - Structure
    - Identity nodes
      - Placeholders
    - Timestamped identity relationships
      - i.e., normal domain relationships

  - State
    - State nodes
      - Snapshot of entity state
    - Timestamped state relationships
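A sketch of recording a change without deleting history, using the Product/STATE/ProductState names from the query below. It assumes an open-ended version stores the largest 64-bit integer in "to", and that $now and $newPrice are supplied as parameters; both are assumptions, not part of the original model.

<code>
// Find the currently valid state of product 1
MATCH (p:Product {product_id: 1})-[current:STATE]->(old:ProductState)
WHERE current.to = 9223372036854775807
// Close the current state relationship...
SET current.to = $now
// ...and attach a new snapshot that is valid from $now onwards
CREATE (p)-[:STATE {from: $now, to: 9223372036854775807}]->
       (:ProductState {name: old.name, price: $newPrice})
</code>

Nothing is deleted: the old ProductState stays reachable for queries that ask about earlier points in time.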


### Return Results

{{ https://i.imgur.com/jkQI7DX.jpg }}

<code>
MATCH (s:Shop{shop_id:1})-[r1:SELLS]->(p:Product)
WHERE (r1.from <= 1391558400000 AND r1.to > 1391558400000)
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE (r2.from <= 1391558400000 AND r2.to > 1391558400000)
RETURN p.product_id AS productId,
       ps.name AS product,
       ps.price AS price
ORDER BY price DESC
</code>

### Considerations

  - Purely additive
    - No deletions
    - Store file locality for node and relationship properties
  - Creates a lot more data
    - Nodes and relationships
  - Queries will be more complex
  - Some queries will be slower
    - Because they have to search more of the graph

## Refactoring

Definition

  - Restructure the graph without changing its informational semantics

Reasons

  - Improve design
  - Enhance performance
  - Accommodate new functionality
  - Enable iterative and incremental development of data model

## Data Migrations

  - Execute in repeatable order
  - Backup database
  - Execute in batches
    - Unbounded results will generate large transactions and may trigger Out of Memory exceptions
  - Apply migrations to test data to ensure existing functionality doesn't break
  - Ensure application can accommodate old and new structures if performing against live data
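A sketch of a read query that tolerates both structures while the EMAILED-to-Email migration below is only partly applied. The $name parameter is hypothetical.

<code>
// Some emails may still be EMAILED relationships, others Email nodes:
// read both shapes and combine the results
MATCH (a:User {name: $name})-[:EMAILED]->(b:User)
RETURN b.name AS recipient
UNION
MATCH (a:User {name: $name})-[:SENT]->(:Email)-[:TO]->(b:User)
RETURN b.name AS recipient
</code>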

## Extract Node From Relationship

Problem

  - You've modeled something as a relationship (with properties), but now need to connect it to more than two things

Solution

  - Extract relationship into a new node (and two new relationships)
  - Copy old relationship properties onto new node
  - Delete old relationship

<code>
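// Small batch size for demonstration; raise it for real data
// and re-run until numberDeleted is 0
MATCH (a:User)-[r:EMAILED]->(b:User)
WITH a, r, b LIMIT 2
// Extract the relationship into a new node, copying all of its properties
CREATE (email:Email)
SET email = properties(r)
MERGE (a)-[:SENT]->(email)-[:TO]->(b)
DELETE r
RETURN count(*) AS numberDeleted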
</code>


## Find groups similar to the Neo4j London User Group

<code>
MATCH (group:Group {name:"Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
RETURN otherGroup.name,
       COUNT(topic) AS topicsInCommon,
       COLLECT(topic.name) as topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
</code>

## Exclude groups I'm a member of

<code>
MATCH (group:Group {name:"Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT ((:Member {name:"Mark Needham"})-[:MEMBER_OF]->(otherGroup))
RETURN otherGroup.name,
       COUNT(topic) AS topicsInCommon,
       COLLECT(topic.name) as topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
</code>

## What is Jonny interested in?

<code>
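// Infer interests for every member: a topic counts as an interest
// when it appears in more than 3 of the member's groups
MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3

MERGE (m)-[:INTERESTED_IN]->(topic)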
</code>
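The query above writes INTERESTED_IN for every member at once. To answer the question for one person, a read query like this sketch would do, where $name is a hypothetical parameter holding the member's name:

<code>
// Topics inferred for a single member
MATCH (:Member {name: $name})-[:INTERESTED_IN]->(topic)
RETURN topic.name
ORDER BY topic.name
</code>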

## Facts can become nodes

{{ https://i.imgur.com/9nCU8zF.jpg }}

### Refactor to facts

<code>
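// Turn each MEMBER_OF relationship into a Membership node (a "fact")
MATCH (member:Member)-[rel:MEMBER_OF]->(group)

// Deterministic id, so the query can be re-run without creating duplicates
MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined

MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)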
</code>


<code>
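// Order each member's memberships by join date
MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)
WITH member, membership ORDER BY member.id, membership.joined

// Collect them into one ordered list per member
WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0, SIZE(memberships) - 2) AS idx

// Link each membership to the one that follows it
WITH memberships[idx] AS m1, memberships[idx+1] AS m2
MERGE (m1)-[:NEXT]->(m2)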
</code>

## Find next group people join

<code>
MATCH (group:Group {name:"Neo4j"})<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
      (membership)<-[:HAS_MEMBERSHIP]-(member:Member)-[:HAS_MEMBERSHIP]->(nextMembership),
      (nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
</code>


## Docs

[[Test Driven Data Modeling]]
[[TigerGraph]]

## Refs

- https://www.youtube.com/watch?v=78r0MgH0u0w
- http://graphdatabases.com