OPEN

Items a user is likely to like = statistically significant items

Chi-squared (viewed more than expected) - categorical
Cross-entropy / KL divergence (distribution differs substantially) - continuous
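For the continuous case, one way to measure "the distribution differs substantially" is the KL divergence between a user's distribution and a reference distribution. The sketch below is a minimal illustration, assuming the values have already been binned into a discrete distribution (for truly continuous variables the sum becomes an integral). The function names and the `eps` floor are hypothetical, not taken from the notes. It also shows how cross-entropy relates: H(P, Q) = H(P) + D_KL(P || Q).

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for two discrete distributions.

    p: the user's distribution (e.g. share of views per bin)
    q: the reference distribution (e.g. all users' share per bin)
    eps: hypothetical floor for q_i so an empty bin does not divide by zero
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i = 0 contribute 0 by convention
            total += p_i * math.log(p_i / max(q_i, eps))
    return total

def cross_entropy(p, q, eps=1e-12):
    """H(P, Q) = -sum_i p_i * log(q_i), which equals H(P) + D_KL(P || Q)."""
    return -sum(p_i * math.log(max(q_i, eps)) for p_i, q_i in zip(p, q) if p_i > 0)
```

For example, a user whose views split `[0.7, 0.2, 0.1]` across three bins, compared with an overall split of `[0.4, 0.4, 0.2]`, gives a KL divergence of about 0.184; identical distributions give 0.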

$$ \chi^{2} = \sum_{i=1}^{k}\frac{(x_i - m_i)^2}{m_i} $$
x_i = observed count, m_i = expected (predicted) count

Categorical Variable

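Uses the difference between the expected count of items the user would consume and the observed count of items the user actually consumed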

Since the signal is "viewed more than expected", focus on the relative change rather than the absolute count
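A minimal sketch of the chi-squared idea for categorical data. The notes do not say how the expected count m_i is predicted, so this assumes m_i = (user's total views) x (category's share of views across all users); that choice and the function names are hypothetical. Each term (x_i - m_i)^2 / m_i divides the squared gap by the expected count, so the same raw gap weighs more in a category the user would rarely be expected to view. Only categories with x_i > m_i are kept, since the goal is "viewed more than expected".

```python
def chi_squared_terms(observed, global_counts):
    """Per-category chi-squared terms.

    observed: dict category -> this user's view count (x_i)
    global_counts: dict category -> view count across all users
    Returns dict category -> (x_i, m_i, (x_i - m_i)^2 / m_i).
    """
    user_total = sum(observed.values())
    global_total = sum(global_counts.values())
    terms = {}
    for cat, g in global_counts.items():
        if g == 0:
            continue  # expected count would be 0, so the term is undefined
        m = user_total * g / global_total  # expected count m_i (assumption)
        x = observed.get(cat, 0)           # observed count x_i
        terms[cat] = (x, m, (x - m) ** 2 / m)
    return terms

def chi_squared(observed, global_counts):
    """The full statistic: sum of the per-category terms."""
    return sum(t for _, _, t in chi_squared_terms(observed, global_counts).values())

def over_consumed(observed, global_counts):
    """Categories viewed more than expected, ranked by their chi-squared term."""
    terms = chi_squared_terms(observed, global_counts)
    hits = [(cat, t) for cat, (x, m, t) in terms.items() if x > m]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)
```

With `observed = {"sports": 30, "politics": 5, "tech": 15}` and `global_counts = {"sports": 200, "politics": 500, "tech": 300}`, the expected counts are 10, 25 and 15, the statistic is 56, and only `sports` is flagged as over-consumed. Politics also deviates (term 16), but in the "less than expected" direction, so it is filtered out.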

Co-occurrence: pay attention to events that occur together

PMI (Pointwise Mutual Information)
$$ PMI(A, B) = \log\frac{P(A,B)}{P(A)P(B)} $$
An information measure that accounts for both how often A and B occur together and how likely each event is on its own
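A minimal sketch of estimating PMI from co-occurrence counts. It assumes that two items "co-occur" when they appear in the same session (or user history) and that probabilities are estimated as counts divided by the number of sessions; both are assumptions, and `pmi_table` is a hypothetical name. A positive PMI means the pair appears together more often than independence would predict; a negative one means less often.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(sessions):
    """PMI(A, B) = log(P(A,B) / (P(A) P(B))) estimated from sessions.

    sessions: list of sets of item ids consumed in the same session.
    Returns dict (a, b) -> PMI with a < b, for every pair seen together.
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for s in sessions:
        items = sorted(set(s))                      # dedupe, fix pair order
        item_counts.update(items)                   # sessions containing each item
        pair_counts.update(combinations(items, 2))  # sessions containing each pair
    pmi = {}
    for (a, b), c_ab in pair_counts.items():
        p_ab = c_ab / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi
```

For the sessions `[{"a", "b"}, {"a", "b"}, {"a", "c"}, {"c", "d"}, {"d"}]`, the pair `("a", "b")` scores log(5/3), about 0.51, while `("a", "c")` scores log(5/6), which is negative: `a` and `c` appear together less often than their individual popularity would suggest.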

Recommend based on the user's document consumption patterns
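One way to turn co-occurrence scores into recommendations from a user's consumption history is sketched below. The scoring rule (sum the positive PMI between each unseen document and every document the user has already consumed) is a hypothetical choice for illustration, not necessarily what the notes describe; `recommend` and the example PMI values are likewise made up.

```python
from collections import defaultdict

def recommend(history, pmi, k=3):
    """Rank unseen documents by summed positive PMI with the user's history.

    history: set of document ids the user has consumed
    pmi: dict (doc_a, doc_b) -> PMI score, pair keys in either order
    k: number of documents to return
    """
    scores = defaultdict(float)
    for (a, b), score in pmi.items():
        if score <= 0:
            continue  # skip pairs that co-occur no more than chance
        if a in history and b not in history:
            scores[b] += score
        elif b in history and a not in history:
            scores[a] += score
    # Highest score first; ties broken by document id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:k]]
```

With `history = {"d1", "d2"}` and `pmi = {("d1", "d2"): 0.9, ("d1", "d3"): 0.4, ("d2", "d3"): 0.3, ("d2", "d4"): -0.5, ("d5", "d1"): 0.2}`, `d3` is linked to both consumed documents and ranks first, `d5` follows, and `d4` is dropped because its only link has negative PMI.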

Recommendation satisfaction

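Language & tool integration
Unify the languages and tools used for data analysis and recommendation modeling so that models can be deployed directly to the production service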

Airflow (scheduler)
Hive (data warehouse)
Spark (big data processing engine)
Hadoop YARN (distributed cluster resource manager)
Slider (an application to deploy existing distributed applications on a YARN cluster)
OpenTSDB/Grafana (a scalable, distributed monitoring system)
DOT (distributed incremental search engine)
DDK/Cana (event-driven near-realtime serverless compute solution)
C3 (PaaS, Hadoop cluster)
Cuve (PaaS, HBase, Kafka)
nBase-ARC (PaaS)