Graph Data Modeling

  • Fill in the missing links in the graph
  • You could run this type of query once a day during a quiet period
  • On bigger graphs we'd run it in batches to avoid loading the whole database into memory

Common nouns ⇒ Labels

  • user ⇒ :User
  • email ⇒ :Email

Verbs that take an object ⇒ Relationships

  • sent ⇒ SENT
  • wrote ⇒ WROTE

Proper noun ⇒ Node with properties

  • Ian ⇒ ({name: 'Ian'})
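As a rough sketch, the sentence "Ian sent an email" might map onto the rules above as follows. The `subject` property and its value are made up for illustration.

```cypher
// Proper noun "Ian" becomes a node with a name property,
// labeled with the common noun "user"
CREATE (ian:User {name: 'Ian'})
// Common noun "email" becomes a label; the subject is a hypothetical value
CREATE (email:Email {subject: 'Hello'})
// The verb "sent" becomes a relationship from the sender to the email
CREATE (ian)-[:SENT]->(email)
```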

You need to specify the weight, strength, or some other quality of the relationship:

  • Friendship strength
  • Proficiency in a skill
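A minimal sketch of a quality stored on the relationship itself. The `FRIEND_OF` type, the `strength` property, the user names, and the 0 to 1 scale are all assumptions, not part of the original model.

```cypher
// Assumes two :User nodes with these hypothetical names already exist
MATCH (ian:User {name: 'Ian'}), (alice:User {name: 'Alice'})
MERGE (ian)-[f:FRIEND_OF]->(alice)
// Hypothetical scale from 0 (weak) to 1 (strong)
SET f.strength = 0.8
```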

Attribute value comprises a complex value type:

  • Address (first line, second line, zip code, etc)

Attribute values are interconnected:

  • Taxonomy of skills
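The two cases above could be sketched like this, with the attribute pulled out into its own node. Every label, relationship type, and property value here (`Address`, `LIVES_AT`, `Skill`, `SUBCATEGORY_OF`, `HAS_SKILL`, and so on) is an illustrative assumption.

```cypher
// Complex value type: the address becomes its own node
CREATE (ian:User {name: 'Ian'})
CREATE (home:Address {firstLine: '1 Example Street', zipCode: '00000'})
CREATE (ian)-[:LIVES_AT]->(home)

// Interconnected values: skills form a small taxonomy
CREATE (programming:Skill {name: 'Programming'})
CREATE (java:Skill {name: 'Java'})
CREATE (java)-[:SUBCATEGORY_OF]->(programming)
// Proficiency is a quality of the relationship, so it lives on it
CREATE (ian)-[:HAS_SKILL {proficiency: 3}]->(java)
```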

  • Be as simple as possible
  • But beware verbing
    • Language habit: noun ⇒ verb
      • Send an email ⇒ EMAIL
      • Search Google ⇒ GOOGLE
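One way to avoid the verbed relationship is to keep the email as a node of its own. The sketch below assumes hypothetical users and a `CC` relationship type; `SENT`, `TO`, and the `content` property follow the names used in the refactoring query later on this page.

```cypher
// Hypothetical users; MERGE avoids duplicating them if they already exist
MERGE (ian:User {name: 'Ian'})
MERGE (alice:User {name: 'Alice'})
MERGE (bob:User {name: 'Bob'})
// The email is a node, so it can carry content and link to several users
CREATE (email:Email {content: 'Hi there'})
CREATE (ian)-[:SENT]->(email)
CREATE (email)-[:TO]->(alice)
CREATE (email)-[:CC]->(bob)
```

With a single `EMAIL` relationship between two users there would be nowhere to attach the third person.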

  • An intermediate node provides flexibility
    • It allows more than two nodes to be connected in a single context
  • But it can be overkill, and will have an impact on performance
  • Entities are linked in a sequence
  • You need to traverse the sequence
  • You may need to identify the beginning or end (first/last, earliest/latest, etc.)
  • Examples
    • Event stream
    • Episodes of a TV series
    • Job history
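A sequence such as the episodes of a TV series could be sketched as a linked list. The `Series` and `Episode` labels, the `FIRST`, `LAST`, and `NEXT` relationship types, and the sample values are assumptions chosen for illustration.

```cypher
// Build a three-episode sequence; the series name and numbers are made up
CREATE (series:Series {name: 'Example Series'})
CREATE (e1:Episode {number: 1})
CREATE (e2:Episode {number: 2})
CREATE (e3:Episode {number: 3})
CREATE (e1)-[:NEXT]->(e2)
CREATE (e2)-[:NEXT]->(e3)
// Pointers to both ends of the sequence
CREATE (series)-[:FIRST]->(e1)
CREATE (series)-[:LAST]->(e3);

// Walk the sequence from the first episode onward
MATCH (:Series {name: 'Example Series'})-[:FIRST]->(first)
MATCH path = (first)-[:NEXT*0..]->(episode)
RETURN episode.number AS number
ORDER BY length(path);
```

The `FIRST` and `LAST` pointers make it possible to find either end without traversing the whole chain.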

  • Time-based
    • Universal versioning schema
    • Discrete, continuous sequence
      • Millis since the epoch
  • Structure
    • Identity nodes
      • Placeholders
    • Timestamped identity relationships
      • i.e. normal domain relationships
  • State
    • State nodes
      • Snapshot of entity state
    • Timestamped state relationships
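A small sketch of data laid out this way, using the labels, relationship types, and `from`/`to` properties that appear in the query in this section. The product name, the price, and the use of 9223372036854775807 (the largest 64-bit integer) as a "still current" end marker are assumptions.

```cypher
// Identity nodes: placeholders that carry only an identifier
CREATE (shop:Shop {shop_id: 1})
CREATE (product:Product {product_id: 1})
// State node: a snapshot of the product's properties (values are made up)
CREATE (state:ProductState {name: 'Cheese', price: 1.00})
// Timestamped identity relationship, valid from 2014-01-01T00:00:00Z
// (millis since the epoch); the "to" value is an assumed open-ended marker
CREATE (shop)-[:SELLS {from: 1388534400000, to: 9223372036854775807}]->(product)
// Timestamped state relationship over the same interval
CREATE (product)-[:STATE {from: 1388534400000, to: 9223372036854775807}]->(state)
```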

MATCH (s:Shop{shop_id:1})-[r1:SELLS]->(p:Product)
WHERE (r1.from <= 1391558400000 AND r1.to > 1391558400000)
MATCH (p)-[r2:STATE]->(ps:ProductState)
WHERE (r2.from <= 1391558400000 AND r2.to > 1391558400000)
RETURN p.product_id AS productId,
       ps.name AS product,
       ps.price AS price
ORDER BY price DESC

  • Purely additive
    • No deletions
    • Store file locality for node and relationship properties
  • Creates a lot more data
    • Nodes and relationships
  • Queries will be more complex
  • Some queries will be slower
    • Because they have to search more of the graph
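To illustrate the additive style, a price change might be recorded by closing the current state relationship and attaching a new snapshot, rather than overwriting or deleting anything. This is only a sketch: `$now`, `$name`, and `$price` are assumed query parameters, and 9223372036854775807 is an assumed marker for "still current".

```cypher
// Find the state relationship that is still open
MATCH (product:Product {product_id: 1})-[current:STATE]->(:ProductState)
WHERE current.to = 9223372036854775807
// Close the current state instead of deleting it
SET current.to = $now
// Attach a new snapshot that is valid from $now onward
CREATE (product)-[:STATE {from: $now, to: 9223372036854775807}]->
       (:ProductState {name: $name, price: $price})
```

Each change adds one node and one relationship, which is where the extra data volume comes from.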

Definition

  • Restructure graph without changing informational semantics

Reasons

  • Improve design
  • Enhance performance
  • Accommodate new functionality
  • Enable iterative and incremental development of data model
  • Execute in repeatable order
  • Backup database
  • Execute in batches
    • Unbounded results will generate large transactions and may trigger Out of Memory exceptions
  • Apply migrations to test data to ensure existing functionality doesn't break
  • Ensure application can accommodate old and new structures if performing against live data

Problem

  • You've modeled something as a relationship (with properties), but now need to connect it to more than two things

Solution

  • Extract relationship into a new node (and two new relationships)
  • Copy old relationship properties onto new node
  • Delete old relationship

MATCH (a:User)-[r:EMAILED]->(b:User)
WITH a, r, b LIMIT 2
CREATE (email:Email{content:r.content})
MERGE (a)-[:SENT]->(email)-[:TO]->(b)
DELETE r
RETURN count(r) AS numberDeleted
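Because the statement above converts only a limited batch per run (`LIMIT 2` here), it would presumably be rerun until `numberDeleted` comes back as 0. A simple check for how much work is left could look like this:

```cypher
// Counts the EMAILED relationships that have not been converted yet
MATCH (:User)-[r:EMAILED]->(:User)
RETURN count(r) AS remaining
```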

MATCH (group:Group {name:"Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
RETURN otherGroup.name,
       COUNT(topic) AS topicsInCommon,
       COLLECT(topic.name) as topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10

MATCH (group:Group {name:"Neo4j - London User Group"})-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT ((:Member {name:"Mark Needham"})-[:MEMBER_OF]->(otherGroup))
RETURN otherGroup.name,
       COUNT(topic) AS topicsInCommon,
       COLLECT(topic.name) as topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10

MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3

MERGE (m)-[:INTERESTED_IN]->(topic)
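On a bigger graph, the same inference could be run one slice of members at a time. This is a sketch, not a tuned recipe: `$skip` and `$batchSize` are assumed parameters, and it assumes every member has an `id` property to give the slices a stable order.

```cypher
// $skip and $batchSize are assumed parameters, e.g. 0 and 1000 for the first run
MATCH (m:Member)
WITH m ORDER BY m.id SKIP $skip LIMIT $batchSize
// Same inference as above, restricted to this slice of members
MATCH (m)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3
MERGE (m)-[:INTERESTED_IN]->(topic)
```

Batching by member rather than by matched row keeps each member's topic counts complete within a batch.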

MATCH (member:Member)-[rel:MEMBER_OF]->(group)

MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined

MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)

MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)

WITH member, membership ORDER BY member.id, membership.joined

WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0,SIZE(memberships) - 2) as idx

WITH memberships[idx] AS m1, memberships[idx+1] AS m2
MERGE (m1)-[:NEXT]->(m2)

MATCH (group:Group {name:"Neo4j"})<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
      (membership)<-[:HAS_MEMBERSHIP]-(member:Member)-[:HAS_MEMBERSHIP]->(nextMembership),
      (nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC

  • Last modified: 2021/07/07 09:38
  • Author: 127.0.0.1