机器之心从word2vec开始,说下GPT庞大的家族系谱( 十 )
[4] Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin (2017). Attention Is All You NeedCoRR, abs/1706.03762.
[5] Zihang Dai and Zhilin Yang and Yiming Yang and Jaime G. Carbonell and Quoc V. Le and Ruslan Salakhutdinov (2019). Transformer-XL: Attentive Language Models Beyond a Fixed-Length ContextCoRR, abs/1901.02860.
[6] P. J. Liu, M. Saleh, E. Pot, B. Goodrich, R. Sepassi, L. Kaiser, and N. Shazeer. Generating wikipedia by summarizing long sequences. ICLR, 2018.
[7] Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
[8] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
[9] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, & Dario Amodei. (2020). Language Models are Few-Shot Learners.
[10]Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingCoRR, abs/1810.04805.
[11]Zhilin Yang and Zihang Dai and Yiming Yang and Jaime G. Carbonell and Ruslan Salakhutdinov and Quoc V. Le (2019). XLNet: Generalized Autoregressive Pretraining for Language UnderstandingCoRR, abs/1906.08237.
[12] attention 机制及 self-attention(transformer). Accessed at: https://blog.csdn.net/Enjoy_endless/article/details/88679989
[13] Attention 机制详解(一)——Seq2Seq 中的 Attention. Accessed at: https://zhuanlan.zhihu.com/p/47063917
[14]一文看懂 Attention(本质原理 + 3 大优点 + 5 大类型.Accessed at:https://medium.com/@pkqiang49/%E4%B8%80%E6%96%87%E7%9C%8B%E6%87%82-attention-%E6%9C%AC%E8%B4%A8%E5%8E%9F%E7%90%86-3%E5%A4%A7%E4%BC%98%E7%82%B9-5%E5%A4%A7%E7%B1%BB%E5%9E%8B-e4fbe4b6d030
[15]The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning).Accessed at:http://jalammar.github.io/illustrated-bert/
[16] Yang, Zhilin, et al. "Xlnet: Generalized autoregressive pretraining for language understanding." Advances in neural information processing systems. 2019.
[17] Dai, Zihang, et al. "Transformer-xl: Attentive language models beyond a fixed-length context." arXiv preprint arXiv:1901.02860 (2019).
[18] NLP——GPT 对比 GPT-2. Accessed at: https://zhuanlan.zhihu.com/p/96791725
[19] 深度学习:前沿技术 - GPT 1 & 2. Accessed at: http://www.bdpt.net/cn/2019/10/08/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%EF%BC%9A%E5%89%8D%E6%B2%BF%E6%8A%80%E6%9C%AF-gpt-1-2/
- 狼|日本居民区摆出红眼机器“魔鬼狼”,为防熊出没
- 葡萄|到底要不要去葡萄皮?果酒机器:酿葡萄酒
- 穿越火线|CF:除2018圣诞和水晶之心,这些稀有的尼泊尔军刀期待也能送永久!
- 机器之心旷视物流,一个AI独角兽的B面
- 人工智能领军企业达观数据推出新一代RPA智能办公机器人集群
- 当人工智能遇上服务机器人 机器人被赋予了人类的“灵魂”
- 王国之心|switch游戏日报:怪猎崛起新情报!NS版王国之心有望了?
- 智能机器人和机器人对战乒乓球,你玩过吗?要不来试试?
- 海外网|为防熊出没,日本居民区摆出红眼机器“魔鬼狼”[图]
- 读芯术七个关键因素:如何选择出最佳机器学习算法?