# Graph Data Modeling
## Tip: Make the implicit explicit
- Fill in the missing links in the graph
- You could run this type of query once a day during a quiet period
- On bigger graphs we'd run it in batches to avoid loading the whole database into memory
{{youtube>78r0MgH0u0w}}
{{ https://i.imgur.com/4yh0NZv.jpg }}
Common nouns => Labels
- user => :User
- email => :Email
Verbs that take an object => Relationships
- sent => SENT
- wrote => WROTE
Proper nouns => Nodes with properties
- Ian => ({name: 'Ian'})
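
As a rough illustration of the mapping above, the sentence "Ian sent an email" could be written as the following Cypher. The `subject` property and its value are assumptions added for the example; only `:User`, `:Email`, `SENT`, and `name: 'Ian'` come from the lists above.

```cypher
// Proper noun "Ian"    -> node with a name property
// Common nouns         -> labels :User and :Email
// Verb with an object  -> relationship type SENT
CREATE (ian:User {name: 'Ian'})-[:SENT]->(email:Email {subject: 'Hello'})  // subject is hypothetical
RETURN ian, email
```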
## Attributes: Property or Relationship?
### Use Relationships When...
You need to specify the weight, strength, or some other quality of the relationship:
- Friendship strength
- Proficiency in a skill
### AND/OR
Attribute value comprises a complex value type:
- Address (first line, second line, zip code, etc)
### AND/OR
Attribute values are interconnected:
- Taxonomy of skills
### Modeling Skills as Nodes
{{ https://i.imgur.com/saTLlqh.jpg }}
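
A minimal sketch of skills modeled as nodes, covering two of the cases listed above: a quality stored on the relationship (proficiency) and interconnected attribute values (a small taxonomy). All labels, relationship types, property names, and values here are illustrative assumptions and may not match the diagram exactly.

```cypher
// Hypothetical person and skills
MERGE (ian:Person {name: 'Ian'})
MERGE (cypher:Skill {name: 'Cypher'})
MERGE (graphs:Skill {name: 'Graph Databases'})
// Interconnected attribute values: one skill is a sub-skill of another
MERGE (cypher)-[:SUBSKILL_OF]->(graphs)
// Quality of the relationship: proficiency lives on HAS_SKILL, not on the person
MERGE (ian)-[r:HAS_SKILL]->(cypher)
SET r.proficiency = 4
RETURN ian.name, cypher.name, r.proficiency
```

With the skill stored as a plain property on the person, neither the taxonomy nor the per-skill proficiency would have anywhere natural to live.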
## Common Graph Structures
### Rich Context, Multiple Dimensions
{{ https://i.imgur.com/KJhEgB8.jpg }}
### Trap: Verbing
- Be as simple as possible
- But beware verbing
- Language habit: verb => noun
  - Send an email => EMAIL
  - Search Google => GOOGLE
### Example: [:EMAILED] to (:Email)
{{ https://i.imgur.com/hYHlDVf.jpg }}
{{ https://i.imgur.com/VODmArB.jpg }}
### Considerations
- An intermediate node provides flexibility
- It allows more than two nodes to be connected in a single context
- But it can be overkill, and will have an impact on performance
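
To make the trade-off concrete, here is a sketch of an intermediate `:Email` node connecting three users in one context, something a single `EMAILED` relationship between two users cannot express. The user names, the `CC` relationship type, and the `content` value are assumptions made for the example.

```cypher
// Hypothetical users
MERGE (ian:User {name: 'Ian'})
MERGE (jim:User {name: 'Jim'})
MERGE (alistair:User {name: 'Alistair'})
// The email itself becomes a node, so any number of users can attach to it
CREATE (email:Email {content: 'Hi'})
CREATE (ian)-[:SENT]->(email),
       (email)-[:TO]->(jim),
       (email)-[:CC]->(alistair)
RETURN email
```

The cost is an extra hop on every traversal from sender to recipient, which is the performance impact noted above.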
## Linked List
- Entities are linked in a sequence
- You need to traverse the sequence
- You may need to identify the beginning or end (first/last, earliest/latest, etc.)
- Examples
  - Event stream
  - Episodes of a TV series
  - Job history
### Linked List
{{ https://i.imgur.com/GH9hvAS.jpg }}
### Interleaved Linked Lists
{{ https://i.imgur.com/cJZt6pY.jpg }}
### Pointers to Head and Tail
{{ https://i.imgur.com/7VPNcqh.jpg }}
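
A sketch of how head and tail pointers might be used, taking the TV series example from the list above. The `:Series` and `:Episode` labels, the `FIRST`, `LAST`, and `NEXT` relationship types, and the property names are assumptions for illustration and may differ from the diagrams.

```cypher
// Read the list in sequence, starting at the head pointer.
MATCH (series:Series {name: 'Example Series'})-[:FIRST]->(head:Episode)
MATCH path = (head)-[:NEXT*0..]->(tail:Episode)<-[:LAST]-(series)
UNWIND nodes(path) AS episode
RETURN episode.title AS title;

// Append a new episode and move the tail pointer to it.
MATCH (series:Series {name: 'Example Series'})-[oldLast:LAST]->(tail:Episode)
CREATE (tail)-[:NEXT]->(newTail:Episode {title: 'New Episode'})
CREATE (series)-[:LAST]->(newTail)
DELETE oldLast;
```

Keeping a `LAST` pointer means an append does not need to walk the whole list to find the end.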
## Versioning Graphs
- Time-based
- Universal versioning schema
- Discrete, continuous sequence
- Millis since the epoch
### Separate Structure from State
- Structure
  - Identity nodes
    - Placeholders
  - Timestamped identity relationships
    - i.e. normal domain relationships
- State
  - State nodes
    - Snapshot of entity state
  - Timestamped state relationships
### Return Results
{{ https://i.imgur.com/jkQI7DX.jpg }}
MATCH (s:Shop{shop_id:1})-[r1:SELLS]->(p:Product)
WHERE (r1.from <= 1391558400000 AND r1.to > 1391558400000)
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE (r2.from <= 1391558400000 AND r2.to > 1391558400000)
RETURN p.product_id AS productId,
ps.name AS product,
ps.price AS price
ORDER BY price DESC
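
The read query above assumes each `STATE` relationship carries `from` and `to` timestamps. A price change under that scheme might be recorded as sketched below. This is an assumption-laden example: it supposes the current state is marked with a sentinel "end of time" value in `to` (here the largest 64-bit integer), and the product id and new price are made up.

```cypher
// Find the current state: the one whose validity is still open-ended.
MATCH (p:Product {product_id: 1})-[cur:STATE]->(old:ProductState)
WHERE cur.to = 9223372036854775807            // assumed end-of-time sentinel
// Close the current state at the change timestamp (millis since the epoch)...
SET cur.to = 1391558400000
// ...and add a new state node instead of overwriting the old one.
CREATE (p)-[:STATE {from: 1391558400000, to: 9223372036854775807}]->
       (:ProductState {name: old.name, price: 2.50})   // hypothetical new price
```

Nothing is deleted, so a query for any earlier timestamp still returns the old price.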
### Considerations
- Purely additive
- No deletions
- Store file locality for node and relationship properties
- Creates a lot more data
  - Nodes and relationships
- Queries will be more complex
- Some queries will be slower
  - Because they have to search more of the graph
## Refactoring
Definition
- Restructure graph without changing informational semantics
Reasons
- Improve design
- Enhance performance
- Accommodate new functionality
- Enable iterative and incremental development of data model
## Data Migrations
- Execute in repeatable order
- Backup database
- Execute in batches
  - Unbounded results will generate large transactions and may trigger Out of Memory exceptions
- Apply migrations to test data to ensure existing functionality doesn't break
- Ensure the application can accommodate both old and new structures if the migration runs against live data
## Extract Node From Relationship
Problem
- You've modeled something as a relationship (with properties), but now need to connect it to more than two things
Solution
- Extract relationship into a new node (and two new relationships)
- Copy old relationship properties onto new node
- Delete old relationship
MATCH (a:User)-[r:EMAILED]->(b:User)
WITH a, r, b LIMIT 2
CREATE (email:Email{content:r.content})
MERGE (a)-[:SENT]->(email)-[:TO]->(b)
DELETE r
RETURN count(r) AS numberDeleted
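
Because the refactoring above uses `LIMIT` to handle a small batch per run, it would typically be re-run until `numberDeleted` comes back as 0. One way to check progress between runs, using only the labels and relationship type from that query, might be:

```cypher
// Count the EMAILED relationships that still need to be extracted.
MATCH (:User)-[r:EMAILED]->(:User)
RETURN count(r) AS remaining
```

When `remaining` reaches 0, every `EMAILED` relationship has been replaced by an `:Email` node.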
## Find groups similar to Neo4j
MATCH (group:Group {name:"Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
COLLECT(topic.name) as topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
## Exclude groups I'm a member of
MATCH (group:Group {name:"Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT ((:Member {name:"Mark Needham"})-[:MEMBER_OF]->(otherGroup))
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
COLLECT(topic.name) as topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
## What is Jonny interested in?
MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3
MERGE (m)-[:INTERESTED_IN]->(topic)
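
On a bigger graph, the same inference could be run in batches rather than in one large transaction. The sketch below assumes a Neo4j version that supports `CALL { ... } IN TRANSACTIONS` (4.4 or later) and an arbitrary batch size of 100 members. It has to run in an implicit (auto-commit) transaction, for example prefixed with `:auto` in Neo4j Browser.

```cypher
// Process members in batches of 100, committing after each batch.
MATCH (m:Member)
CALL {
  WITH m
  MATCH (m)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
  WITH m, topic, COUNT(*) AS times
  WHERE times > 3
  MERGE (m)-[:INTERESTED_IN]->(topic)
} IN TRANSACTIONS OF 100 ROWS
```

Since the write uses `MERGE`, re-running the query (for example once a day) does not create duplicate `INTERESTED_IN` relationships.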
## Facts can become nodes
{{ https://i.imgur.com/9nCU8zF.jpg }}
### Refactors to facts
// Step 1: turn each MEMBER_OF relationship into a Membership node
MATCH (member:Member)-[rel:MEMBER_OF]->(group)
MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group);

// Step 2: link each member's memberships into a list ordered by join date
MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)
WITH member, membership ORDER BY member.id, membership.joined
WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0,SIZE(memberships) - 2) as idx
WITH memberships[idx] AS m1, memberships[idx+1] AS m2
MERGE (m1)-[:NEXT]->(m2)
## Find next group people join
MATCH (group:Group {name:"Neo4j"})<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
(membership)<-[:HAS_MEMBERSHIP]-(member:Member)-[:HAS_MEMBERSHIP]->(nextMembership),
(nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
## Docs
[[Test Driven Data Modeling]]
[[TigerGraph]]
## Refs
- https://www.youtube.com/watch?v=78r0MgH0u0w
- http://graphdatabases.com