# Graph Data Modeling

## Tip: Make the implicit explicit

- Fill in the missing links in the graph
- You could run this type of query once a day during a quiet period
- On bigger graphs we'd run it in batches to avoid loading the whole database into memory

{{youtube>78r0MgH0u0w}}

{{ https://i.imgur.com/4yh0NZv.jpg }}

Common nouns => Labels

- user => :User
- email => :Email

Verbs that take an object => Relationships

- sent => SENT
- wrote => WROTE

Proper noun => Node with properties

- Ian => ({name: 'Ian'})

## Attributes: Property or Relationship?

### Use Relationships When...

You need to specify the weight, strength, or some other quality of the relationship:

- Friendship strength
- Proficiency in a skill

### AND/OR

Attribute value comprises a complex value type:

- Address (first line, second line, zip code, etc.)

### AND/OR

Attribute values are interconnected:

- Taxonomy of skills

### Modeling Skills as Nodes

{{ https://i.imgur.com/saTLlqh.jpg }}

## Common Graph Structures

### Rich Context, Multiple Dimensions

{{ https://i.imgur.com/KJhEgB8.jpg }}

### Trap: Verbing

- Be as simple as possible
- But beware verbing
- Language habit: a noun becomes a verb
  - Send an email => EMAIL
  - Search Google => GOOGLE

### Example: [:EMAILED] to (:Email)

{{ https://i.imgur.com/hYHlDVf.jpg }}

{{ https://i.imgur.com/VODmArB.jpg }}

### Considerations

- An intermediate node provides flexibility
- It allows more than two nodes to be connected in a single context
- But it can be overkill, and will have an impact on performance

## Linked List

- Entities are linked in a sequence
- You need to traverse the sequence
- You may need to identify the beginning or end (first/last, earliest/latest, etc.)
- Examples
  - Event stream
  - Episodes of a TV series
  - Job history

### Linked List

{{ https://i.imgur.com/GH9hvAS.jpg }}

### Interleaved Linked Lists

{{ https://i.imgur.com/cJZt6pY.jpg }}

### Pointers to Head and Tail

{{ https://i.imgur.com/7VPNcqh.jpg }}

## Versioning Graphs

- Time-based
- Universal versioning schema
- Discrete, continuous sequence
  - Millis since the epoch

### Separate Structure from State

- Structure
  - Identity nodes
    - Placeholders
  - Timestamped identity relationships
    - i.e. normal domain relationships
- State
  - State nodes
    - Snapshot of entity state
  - Timestamped state relationships

### Return Results

{{ https://i.imgur.com/jkQI7DX.jpg }}

    // 1391558400000 is 2014-02-05T00:00:00Z in millis since the epoch.
    // A relationship is valid at time t when from <= t < to.
    MATCH (s:Shop {shop_id: 1})-[r1:SELLS]->(p:Product)
    WHERE (r1.from <= 1391558400000 AND r1.to > 1391558400000)
    MATCH (p)-[r2:STATE]->(ps:ProductState)
    WHERE (r2.from <= 1391558400000 AND r2.to > 1391558400000)
    RETURN p.product_id AS productId, ps.name AS product, ps.price AS price
    ORDER BY price DESC

### Considerations

- Purely additive
  - No deletions
- Store file locality for node and relationship properties
- Creates a lot more data
  - Nodes and relationships
- Queries will be more complex
- Some queries will be slower
  - Because they have to search more of the graph

## Refactoring

Definition

- Restructure the graph without changing its informational semantics

Reasons

- Improve design
- Enhance performance
- Accommodate new functionality
- Enable iterative and incremental development of the data model

## Data Migrations

- Execute in a repeatable order
- Back up the database
- Execute in batches
  - Unbounded results will generate large transactions and may trigger Out of Memory exceptions
- Apply migrations to test data to ensure existing functionality doesn't break
- Ensure the application can accommodate old and new structures if migrating live data

## Extract Node From Relationship

Problem

- You've modeled something as a relationship (with properties), but now need to connect it to more than two things

Solution

- Extract
  relationship into a new node (and two new relationships)
- Copy old relationship properties onto new node
- Delete old relationship

The query below processes two relationships per run (`LIMIT 2`):

    MATCH (a:User)-[r:EMAILED]->(b:User)
    WITH a, r, b LIMIT 2
    // Copy the relationship's properties onto a new Email node
    CREATE (email:Email {content: r.content})
    MERGE (a)-[:SENT]->(email)-[:TO]->(b)
    DELETE r
    RETURN count(r) AS numberDeleted

## Find similar groups to Neo4j

    MATCH (group:Group {name: "Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
    RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics
    ORDER BY topicsInCommon DESC, otherGroup.name
    LIMIT 10

## Exclude groups I'm a member of

    MATCH (group:Group {name: "Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
    WHERE NOT ((:Member {name: "Mark Needham"})-[:MEMBER_OF]->(otherGroup))
    RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics
    ORDER BY topicsInCommon DESC, otherGroup.name
    LIMIT 10

## What is Jonny interested in?

    // A member is treated as interested in a topic shared by more than 3 of their groups
    MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
    WITH m, topic, COUNT(*) AS times
    WHERE times > 3
    MERGE (m)-[:INTERESTED_IN]->(topic)

## Facts can become nodes

{{ https://i.imgur.com/9nCU8zF.jpg }}

### Refactors to facts

    // 1. Create a Membership node for each MEMBER_OF relationship
    MATCH (member:Member)-[rel:MEMBER_OF]->(group)
    MERGE (membership:Membership {id: member.id + "_" + group.id})
    SET membership.joined = rel.joined
    MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
    MERGE (membership)-[:OF_GROUP]->(group);

    // 2. Link each member's memberships into a list ordered by join date
    MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)
    WITH member, membership
    ORDER BY member.id, membership.joined
    WITH member, COLLECT(membership) AS memberships
    UNWIND RANGE(0, SIZE(memberships) - 2) AS idx
    WITH memberships[idx] AS m1, memberships[idx + 1] AS m2
    MERGE (m1)-[:NEXT]->(m2)

## Find next group people join

    MATCH (group:Group {name: "Neo4j"})<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
          (membership)<-[:HAS_MEMBERSHIP]-(member:Member)-[:HAS_MEMBERSHIP]->(nextMembership),
          (nextMembership)-[:OF_GROUP]->(nextGroup)
    RETURN nextGroup.name, COUNT(*)
    AS times
    ORDER BY times DESC

## Docs

[[Test Driven Data Modeling]]
[[TigerGraph]]

## Refs

- https://www.youtube.com/watch?v=78r0MgH0u0w
- http://graphdatabases.com
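
## Sketch: Running a Refactoring in Batches

The advice under Data Migrations to execute in batches, together with the `LIMIT 2` and `numberDeleted` in the Extract Node From Relationship query, suggests a loop: rerun the bounded query until a run deletes nothing. The Python below is only an in-memory simulation of that loop, not Neo4j code. The `Graph` class, `migrate_batch`, `migrate_all`, and the sample users are hypothetical names invented for this sketch, and relationships are plain tuples rather than database records.

```python
from itertools import count


class Graph:
    """Toy in-memory graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}   # node id -> dict with "label" plus properties
        self.rels = []    # (start id, type, end id, properties) tuples
        self._ids = count(1)

    def add_node(self, label, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = {"label": label, **props}
        return node_id


def migrate_batch(graph, limit):
    """Turn up to `limit` EMAILED relationships into Email nodes.

    Mirrors one run of the bounded query and returns the number of
    relationships deleted (the query's `numberDeleted`).
    """
    batch = [rel for rel in graph.rels if rel[1] == "EMAILED"][:limit]
    for rel in batch:
        start, _, end, props = rel
        # Copy the old relationship's property onto the new node.
        email = graph.add_node("Email", content=props.get("content"))
        graph.rels.append((start, "SENT", email, {}))
        graph.rels.append((email, "TO", end, {}))
        graph.rels.remove(rel)  # delete the old relationship
    return len(batch)


def migrate_all(graph, batch_size=2):
    """Rerun the bounded migration until a run deletes nothing."""
    total = 0
    while True:
        deleted = migrate_batch(graph, batch_size)
        if deleted == 0:
            return total
        total += deleted


# Sample data (made up): five emails from one user to another.
graph = Graph()
alice = graph.add_node("User", name="Alice")
bob = graph.add_node("User", name="Bob")
for i in range(5):
    graph.rels.append((alice, "EMAILED", bob, {"content": f"message {i}"}))

total_migrated = migrate_all(graph, batch_size=2)
print(total_migrated)  # 5
```

Each call to `migrate_batch` touches at most `batch_size` relationships, so no single step has to hold the whole graph. Against a real database the same shape would apply, with one transaction per run of the bounded query.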