Graph Data Modeling
Tip: Make the implicit explicit
- Fill in the missing links in the graph
- You could run this type of query once a day during a quiet period
- On bigger graphs we'd run it in batches to avoid loading the whole database into memory
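A minimal in-memory sketch of a batched "fill in the missing links" job, in plain Python rather than Neo4j. The inference rule (people who worked at the same company get a WORKED_WITH link) and all names are hypothetical. Each call writes at most `batch_size` links, and the job repeats until a batch creates nothing.

```python
from itertools import combinations

# Hypothetical data: person -> companies they worked at.
worked_at = {
    "alice": {"acme"},
    "bob": {"acme", "globex"},
    "carol": {"globex"},
    "dave": {"initech"},
}
worked_with: set[frozenset[str]] = set()  # explicit, undirected WORKED_WITH links


def missing_links():
    """Yield pairs that share a company but have no explicit link yet."""
    for a, b in combinations(sorted(worked_at), 2):
        if worked_at[a] & worked_at[b] and frozenset((a, b)) not in worked_with:
            yield frozenset((a, b))


def fill_batch(batch_size: int) -> int:
    """Create at most batch_size missing links; return how many were created."""
    created = 0
    for pair in missing_links():
        if created == batch_size:
            break
        worked_with.add(pair)
        created += 1
    return created


# Re-run small batches until nothing is left, as a nightly job might.
while fill_batch(batch_size=1):
    pass
```

Bounding each batch keeps every write small, which is the point of batching on bigger graphs.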
Common nouns ⇒ Labels
- user ⇒ :User
- email ⇒ :Email
Verbs that take an object ⇒ Relationships
- sent ⇒ SENT
- wrote ⇒ WROTE
Proper noun ⇒ Node with properties
- Ian ⇒ ({name: 'Ian'})
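A minimal Python sketch of the mapping above, for the sentences "Ian sent an email" and "Ian wrote an email". The `Node`/`Relationship` classes are illustrative stand-ins, not a Neo4j API: common nouns become labels, verbs become relationship types, and the proper noun becomes a property value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]                                # common nouns => labels
    properties: dict = field(default_factory=dict)  # proper nouns => property values


@dataclass
class Relationship:
    type: str                                       # verbs that take an object => types
    start: Node
    end: Node


def pattern(rel: Relationship) -> str:
    """Render a relationship as a Cypher-like pattern string."""
    def node(n: Node) -> str:
        labels = "".join(f":{label}" for label in sorted(n.labels))
        props = ", ".join(f"{k}: {v!r}" for k, v in n.properties.items())
        return f"({labels} {{{props}}})" if props else f"({labels})"
    return f"{node(rel.start)}-[:{rel.type}]->{node(rel.end)}"


ian = Node({"User"}, {"name": "Ian"})
email = Node({"Email"})
sent = Relationship("SENT", ian, email)
wrote = Relationship("WROTE", ian, email)

print(pattern(sent))  # (:User {name: 'Ian'})-[:SENT]->(:Email)
```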
Attributes: Property or Relationship?
Use Relationships When...
You need to specify the weight, strength, or some other quality of the relationship:
- Friendship strength
- Proficiency in a skill
AND/OR
Attribute value comprises a complex value type:
- Address (first line, second line, zip code, etc.)
AND/OR
Attribute values are interconnected:
- Taxonomy of skills
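A plain-Python sketch of the three cases above, assuming skills are modeled as nodes: the proficiency level lives on the HAS_SKILL relationship, and the skill taxonomy is a chain of hypothetical PARENT relationships. All names and levels are illustrative.

```python
# Hypothetical skill taxonomy: (skill)-[:PARENT]->(broader skill)
parent = {
    "Python": "Programming",
    "Cypher": "Query Languages",
    "Query Languages": "Programming",
}

# (person)-[:HAS_SKILL {level}]->(skill): the quality sits on the relationship
has_skill = [
    ("Ian", "Python", {"level": 3}),
    ("Jonny", "Cypher", {"level": 5}),
    ("Mark", "Gardening", {"level": 2}),
]


def ancestors(skill: str):
    """Walk up the PARENT chain from a skill."""
    while skill in parent:
        skill = parent[skill]
        yield skill


def experts_in(category: str, min_level: int) -> list[str]:
    """People at or above min_level in the category or any skill beneath it."""
    return sorted(
        person
        for person, skill, props in has_skill
        if props["level"] >= min_level
        and (skill == category or category in ancestors(skill))
    )
```

Because skills are interconnected nodes, a query for "Programming" also finds Cypher experts, which a plain string property could not do.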
Modeling Skills as Nodes
Common Graph Structures
Rich Context, Multiple Dimensions
Trap: Verbing
- Be as simple as possible
- But beware verbing
- Language habit: noun ⇒ verb
- Send an email ⇒ EMAIL
- Search Google ⇒ GOOGLE
Example: replace [:EMAILED] with an (:Email) node
Considerations
- An intermediate node provides flexibility
- It allows more than two nodes to be connected in a single context
- But it can be overkill, and it costs performance: each traversal takes an extra hop through the intermediate node
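A plain-Python sketch contrasting a direct relationship with an intermediate node. The Email node joins a sender, two recipients and a CC in one context; "Alice" and the CC relationship type are hypothetical additions for illustration.

```python
# Direct relationship: it can only ever join two users.
emailed = [("Ian", "Jonny", {"content": "Meetup tonight?"})]  # (a)-[:EMAILED]->(b)

# Intermediate node: one Email node joins a sender, several recipients and a CC.
emails = {"e1": {"content": "Meetup tonight?"}}
rels = [
    ("Ian", "SENT", "e1"),
    ("e1", "TO", "Jonny"),
    ("e1", "TO", "Mark"),
    ("e1", "CC", "Alice"),
]


def sender(email_id: str) -> str:
    return next(start for start, t, end in rels if t == "SENT" and end == email_id)


def recipients(email_id: str, rel_type: str = "TO") -> list[str]:
    return sorted(end for start, t, end in rels if start == email_id and t == rel_type)


def has_emailed(a: str, b: str) -> bool:
    """Sender to recipient is now two hops: (a)-[:SENT]->(email)-[:TO]->(b)."""
    return any(sender(e) == a and b in recipients(e) for e in emails)
```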
Linked List
- Entities are linked in a sequence
- You need to traverse the sequence
- You may need to identify the beginning or end (first/last, earliest/latest, etc.)
- Examples
- Event stream
- Episodes of a TV series
- Job history
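A plain-Python sketch of a linked list of TV episodes joined by NEXT relationships (episode names are hypothetical). The first episode is the one with no incoming NEXT; traversal follows NEXT until there is no outgoing relationship, which marks the last episode.

```python
# (episode)-[:NEXT]->(episode), stored as start -> end; insertion order is irrelevant.
next_rel = {"S01E02": "S01E03", "S01E01": "S01E02", "S01E03": "S01E04"}


def head(next_rel: dict[str, str]) -> str:
    """First episode: the only node with no incoming NEXT."""
    targets = set(next_rel.values())
    (first,) = [n for n in next_rel if n not in targets]
    return first


def traverse(start: str, next_rel: dict[str, str]):
    """Follow NEXT from start until a node has no outgoing NEXT."""
    node = start
    while node is not None:
        yield node
        node = next_rel.get(node)


episodes = list(traverse(head(next_rel), next_rel))
last = episodes[-1]  # the node with no outgoing NEXT
```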
Linked List
Interleaved Linked Lists
Pointers to Head and Tail
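A plain-Python sketch of head and tail pointers, assuming a container node (here a series) with hypothetical FIRST and LAST relationships into its NEXT-linked list. Appending only moves the LAST pointer, so it does not need to traverse the list.

```python
class Series:
    """Container node with FIRST/LAST pointers into a NEXT-linked list."""

    def __init__(self) -> None:
        self.first = None          # (series)-[:FIRST]->(episode)
        self.last = None           # (series)-[:LAST]->(episode)
        self.next: dict = {}       # (episode)-[:NEXT]->(episode)

    def append(self, episode: str) -> None:
        if self.last is None:
            self.first = episode            # empty list: head and tail coincide
        else:
            self.next[self.last] = episode  # link the old tail to the new episode
        self.last = episode                 # move LAST; no traversal needed

    def episodes(self):
        node = self.first
        while node is not None:
            yield node
            node = self.next.get(node)


series = Series()
for episode in ["S01E01", "S01E02", "S01E03"]:
    series.append(episode)
```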
Versioning Graphs
- Time-based
- Universal versioning schema
- Discrete, continuous sequence
- Millis since the epoch
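A small Python sketch for converting between dates and milliseconds since the Unix epoch, the timestamp format used in the versioning query in this section. Integer `timedelta` division avoids floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Milliseconds since 1970-01-01T00:00:00Z for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_millis(ms: int) -> datetime:
    return EPOCH + timedelta(milliseconds=ms)


print(to_millis(datetime(2014, 2, 5, tzinfo=timezone.utc)))  # 1391558400000
```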
Separate Structure from State
- Structure
- Identity nodes
- Placeholders
- Timestamped identity relationships
- i.e., normal domain relationships
- State
- State nodes
- Snapshot of entity state
- Timestamped state relationships
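A plain-Python sketch of separating structure from state, assuming each STATE relationship carries `from`/`to` timestamps in milliseconds and that a hypothetical sentinel `to` value means "still current". Recording a new state adds a snapshot node and a relationship; the only in-place change is closing the old relationship's `to`, so nothing is deleted.

```python
END_OF_TIME = 2**63 - 1  # hypothetical sentinel 'to' value meaning "still current"

products = {1: {"product_id": 1}}             # identity nodes: placeholders, no changing state
states: dict[int, dict] = {}                  # state nodes: snapshots of entity state
state_rels: list[tuple[int, int, dict]] = []  # (product_id, state_id, {"from", "to"})


def set_state(product_id: int, snapshot: dict, at_ms: int) -> int:
    """Record a new state for a product without deleting anything."""
    if product_id not in products:
        raise KeyError(product_id)
    for pid, _sid, props in state_rels:
        if pid == product_id and props["to"] == END_OF_TIME:
            props["to"] = at_ms               # close the current interval
    state_id = len(states)
    states[state_id] = dict(snapshot)         # new snapshot node
    state_rels.append((product_id, state_id, {"from": at_ms, "to": END_OF_TIME}))
    return state_id


set_state(1, {"name": "Widget", "price": 10}, at_ms=1_000)
set_state(1, {"name": "Widget", "price": 12}, at_ms=2_000)
```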
Return Results
// Products sold by shop 1, with their state, as of 1391558400000 ms (2014-02-05T00:00:00Z)
MATCH (s:Shop {shop_id: 1})-[r1:SELLS]->(p:Product)
WHERE r1.from <= 1391558400000 AND r1.to > 1391558400000
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE r2.from <= 1391558400000 AND r2.to > 1391558400000
RETURN p.product_id AS productId, ps.name AS product, ps.price AS price
ORDER BY price DESC
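The point-in-time query above can be mirrored in plain Python to show its logic: keep only SELLS and STATE relationships whose `[from, to)` interval contains the query time, then sort by price, descending. The shops, products, prices and the open-ended `to` value are hypothetical.

```python
OPEN = 2**63 - 1   # hypothetical 'to' value for relationships that are still current
T = 1391558400000  # 2014-02-05T00:00:00Z

# (shop_id, product_id, from, to) for (s:Shop)-[:SELLS]->(p:Product)
sells = [
    (1, 101, 0, OPEN),
    (1, 102, 0, 1391000000000),   # no longer sold at T
    (1, 103, 0, OPEN),
]
# (product_id, name, price, from, to) for (p)-[:STATE]->(ps:ProductState)
state = [
    (101, "Mug", 8, 0, 1391500000000),
    (101, "Mug", 9, 1391500000000, OPEN),
    (102, "Cap", 15, 0, OPEN),
    (103, "Tee", 20, 0, OPEN),
]


def valid_at(frm: int, to: int, t: int) -> bool:
    return frm <= t < to          # same test as r.from <= t AND r.to > t


def products_at(shop_id: int, t: int) -> list[dict]:
    rows = [
        {"productId": pid, "product": name, "price": price}
        for sid, pid, s_from, s_to in sells
        if sid == shop_id and valid_at(s_from, s_to, t)
        for p2, name, price, st_from, st_to in state
        if p2 == pid and valid_at(st_from, st_to, t)
    ]
    return sorted(rows, key=lambda r: r["price"], reverse=True)  # ORDER BY price DESC
```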
Considerations
- Purely additive
- No deletions
- Store file locality for node and relationship properties
- Creates a lot more data
- Nodes and relationships
- Queries will be more complex
- Some queries will be slower
- Because they have to search more of the graph
Refactoring
Definition
- Restructure graph without changing informational semantics
Reasons
- Improve design
- Enhance performance
- Accommodate new functionality
- Enable iterative and incremental development of data model
Data Migrations
- Execute in repeatable order
- Backup database
- Execute in batches
- Unbounded results will generate large transactions and may trigger Out of Memory exceptions
- Apply migrations to test data to ensure existing functionality doesn't break
- Ensure application can accommodate old and new structures if performing against live data
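A plain-Python sketch of the migration practices above, run against a hypothetical in-memory graph: migrations run in a fixed order, each is recorded once applied so re-running is safe, and each works in small batches rather than one unbounded transaction. Both migrations are invented examples.

```python
from typing import Callable


def rename_since_to_joined(graph: dict, batch_size: int) -> int:
    """Hypothetical migration 001: move 'since' to 'joined', one batch at a time."""
    batch = [m for m in graph["memberships"] if "since" in m][:batch_size]
    for m in batch:
        m["joined"] = m.pop("since")
    return len(batch)


def default_joined(graph: dict, batch_size: int) -> int:
    """Hypothetical migration 002: give memberships without a date joined = None."""
    batch = [m for m in graph["memberships"] if "joined" not in m][:batch_size]
    for m in batch:
        m["joined"] = None
    return len(batch)


# Applied in this fixed order on every environment.
MIGRATIONS: list[tuple[str, Callable[[dict, int], int]]] = [
    ("001_rename_since_to_joined", rename_since_to_joined),
    ("002_default_joined", default_joined),
]


def run_migrations(graph: dict, batch_size: int = 2) -> None:
    for name, migrate_batch in MIGRATIONS:
        if name in graph["applied"]:
            continue                              # already applied: skip on re-run
        while migrate_batch(graph, batch_size):   # many small batches, not one huge one
            pass
        graph["applied"].append(name)


test_graph = {"memberships": [{"since": 5}, {"since": 7}, {"since": 9}, {}], "applied": []}
run_migrations(test_graph)
run_migrations(test_graph)  # second run changes nothing
```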
Extract Node From Relationship
Problem
- You've modeled something as a relationship (with properties), but now need to connect it to more than two things
Solution
- Extract relationship into a new node (and two new relationships)
- Copy old relationship properties onto new node
- Delete old relationship
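// Process 2 relationships per run; re-run until numberDeleted is 0
MATCH (a:User)-[r:EMAILED]->(b:User)
WITH a, r, b LIMIT 2
// Extract the relationship into a new Email node, copying its content property
CREATE (email:Email {content: r.content})
CREATE (a)-[:SENT]->(email)-[:TO]->(b)
// Remove the old relationship
DELETE r
RETURN count(r) AS numberDeleted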
Find groups similar to the Neo4j London User Group
MATCH (group:Group {name: "Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
Exclude groups I'm a member of
MATCH (group:Group {name: "Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT ((:Member {name: "Mark Needham"})-[:MEMBER_OF]->(otherGroup))
RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
What is Jonny interested in?
// A member is interested in a topic shared by more than 3 of their groups
MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3
MERGE (m)-[:INTERESTED_IN]->(topic)
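The inference query above can be mirrored in plain Python: count how many of a member's groups carry each topic, and record an INTERESTED_IN link when the count exceeds 3. The group ids and topics are hypothetical; a set plays the role of MERGE, so adding the same link twice has no effect.

```python
from collections import Counter

member_of = {"Jonny": ["g1", "g2", "g3", "g4"], "Mark": ["g1"]}
has_topic = {
    "g1": ["Neo4j", "NoSQL"],
    "g2": ["Neo4j"],
    "g3": ["Neo4j", "NoSQL"],
    "g4": ["Neo4j", "Java"],
}


def infer_interests(member_of: dict, has_topic: dict, threshold: int = 3) -> set:
    interested_in = set()
    for member, groups in member_of.items():
        # COUNT(*) per (member, topic) over all of the member's groups
        times = Counter(topic for g in groups for topic in has_topic[g])
        for topic, n in times.items():
            if n > threshold:                       # WHERE times > 3
                interested_in.add((member, topic))  # set insert is idempotent, like MERGE
    return interested_in
```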
Facts can become nodes
Refactors to facts
// Turn each MEMBER_OF relationship into a Membership node
MATCH (member:Member)-[rel:MEMBER_OF]->(group)
MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)
// Chain each member's memberships in join order with NEXT relationships
MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)
WITH member, membership ORDER BY member.id, membership.joined
WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0, SIZE(memberships) - 2) AS idx
WITH memberships[idx] AS m1, memberships[idx + 1] AS m2
MERGE (m1)-[:NEXT]->(m2)
Find the next group people join
MATCH (group:Group {name: "Neo4j"})<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
      (membership)<-[:HAS_MEMBERSHIP]-(member:Member)-[:HAS_MEMBERSHIP]->(nextMembership),
      (nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
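The two membership queries above (building the NEXT chain, then counting which group people join next) can be mirrored in plain Python with hypothetical membership facts. Cypher's `RANGE(0, SIZE(memberships) - 2)` is inclusive, so it covers the same indices as Python's `range(len(ms) - 1)`.

```python
from collections import Counter

# Hypothetical (member, group, joined) facts.
memberships = [
    ("ann", "Neo4j", 1), ("ann", "Data Science", 2), ("ann", "Python", 3),
    ("bob", "Neo4j", 5), ("bob", "Data Science", 9),
    ("cat", "Neo4j", 4), ("cat", "Python", 6),
]


def next_pairs(memberships: list) -> list:
    """Link each member's consecutive memberships in join order (the NEXT chain)."""
    by_member: dict = {}
    for m in sorted(memberships, key=lambda m: (m[0], m[2])):  # ORDER BY member, joined
        by_member.setdefault(m[0], []).append(m)                # COLLECT per member
    pairs = []
    for ms in by_member.values():
        for idx in range(len(ms) - 1):                          # UNWIND RANGE(0, SIZE - 2)
            pairs.append((ms[idx], ms[idx + 1]))
    return pairs


def next_groups(memberships: list, group: str) -> list:
    """Count the groups people join right after the given group, most common first."""
    times = Counter(m2[1] for m1, m2 in next_pairs(memberships) if m1[1] == group)
    return times.most_common()
```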