H好菇凉666: A Ten-Thousand-Word Discussion of Embedding Techniques (Part 10)

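5. Summary
Focusing on the currently popular embedding techniques, this article has systematically reviewed embedding methods for various types of data: traditional matrix-factorization methods (such as SVD), text embedding methods (such as Word2vec and FastText), and graph embedding methods (such as DeepWalk and GraphSAGE). In recommender systems, these methods can be applied flexibly to build abstract representations of different data types. For example, one can construct item lists from user behavior and vectorize the items with a text-based method, or build a user-item relation graph and vectorize both users and items with a graph-based method. In practice, different vectorization methods can produce quite different embeddings, so the algorithm should be chosen according to the specific business need. To mine homophily between users, Node2vec is worth trying; to incorporate item side information, consider GraphSAGE for embedding the nodes of the graph. As with the "alchemy" of deep learning, mastering the various embedding techniques takes repeated trial and error in concrete application scenarios to build up experience. Finally, with our company anniversary coming up, here's wishing all of us "alchemists" a happy grind!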
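As a rough illustration of the two data-preparation routes mentioned in the summary, the sketch below turns a toy behavior log into (a) per-user item lists with skip-gram training pairs, as a Word2vec-style model would consume them, and (b) a user-item bipartite graph with a uniform DeepWalk-style random walk. The event format `(user, item, timestamp)` and every function name here are assumptions made for this example, not something defined in the article; a real pipeline would pass the resulting sequences or walks to an actual embedding trainer.

```python
import random
from collections import defaultdict


def build_item_sequences(events):
    """Group (user, item, timestamp) events into per-user item lists ordered by time."""
    by_user = defaultdict(list)
    for user, item, ts in events:
        by_user[user].append((ts, item))
    return {u: [item for _, item in sorted(pairs)] for u, pairs in by_user.items()}


def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs within a symmetric window, Word2vec skip-gram style."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def build_bipartite_graph(events):
    """Build an undirected user-item graph; nodes are tagged ("user", id) or ("item", id)."""
    graph = defaultdict(set)
    for user, item, _ in events:
        graph[("user", user)].add(("item", item))
        graph[("item", item)].add(("user", user))
    # Sorted neighbor lists keep walks reproducible for a fixed seed.
    return {node: sorted(neighbors) for node, neighbors in graph.items()}


def random_walk(graph, start, length, rng):
    """Uniform random walk of at most `length` nodes (the DeepWalk sampling step)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


# Toy behavior log (hypothetical data).
events = [
    ("u1", "A", 1), ("u1", "B", 2), ("u1", "C", 3),
    ("u2", "B", 1), ("u2", "C", 2),
]

sequences = build_item_sequences(events)
pairs = skipgram_pairs(sequences["u1"], window=1)
graph = build_bipartite_graph(events)
walk = random_walk(graph, ("user", "u1"), length=5, rng=random.Random(0))
```

The walks alternate between user and item nodes because the graph is bipartite, so feeding them to a skip-gram model places users and items in the same vector space. A Node2vec-style variant would replace the uniform `rng.choice` with a biased transition; that is not shown here.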
References
Simon Funk. "Netflix Update: Try This at Home." http://www.sifter.org/~simon/journal/20061211.html. 2006.
Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008.
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
Pennington, Jeffrey, et al. "GloVe: Global vectors for word representation." Conference on empirical methods in natural language processing. 2014.
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018): 12.
Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Perozzi, Bryan, et al. "DeepWalk: Online learning of social representations." ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.
Grover, Aditya, et al. "node2vec: Scalable feature learning for networks." Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 2016.
Dong, Yuxiao, et al. "metapath2vec: Scalable representation learning for heterogeneous networks." Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 2017.
Taylor, Wilson L. "Cloze procedure: A new tool for measuring readability." Journalism Bulletin 30.4 (1953): 415-433.
Hammond, David K., Pierre Vandergheynst, and Rémi Gribonval. "Wavelets on graphs via spectral graph theory." Applied and Computational Harmonic Analysis 30.2 (2011): 129-150.
Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).