Abstract
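This system is developed in Python. Its core technologies are the Python web framework Django and the MySQL database. Its main functions are information collection, information processing, public opinion analysis, and public opinion display. A web crawler implements information collection and processing: it calls the Sina API under the OAuth2 authorization mechanism and writes the crawled microblog posts and comments into the database. Public opinion analysis comprises sensitive topic identification and orientation analysis. It is implemented by string matching against predefined sensitive, opposition, and support word lists. Public opinion display is built on the Django framework. It shows microblog topics with their related comments, together with the results of sensitive topic identification and orientation analysis. The user management interface also provides filtering, querying, and automatic sorting, which make the system convenient to operate.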
Keywords: Python; network public opinion; microblog
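The abstract says sensitive topic identification and orientation analysis work by string matching against sensitive, opposition, and support word lists. The minimal Python sketch below illustrates that idea under stated assumptions. The word lists, the `analyze` function, and the rule "compare support hits with opposition hits" are hypothetical placeholders, not the system's actual lexicons or scoring.

```python
from dataclasses import dataclass, field

# Hypothetical example lexicons; the real system loads its own word lists.
SENSITIVE_WORDS = ("strike", "protest")
OPPOSE_WORDS = ("oppose", "against", "angry")
SUPPORT_WORDS = ("support", "agree", "great")


@dataclass
class OpinionResult:
    sensitive_hits: list = field(default_factory=list)
    oppose_count: int = 0
    support_count: int = 0

    @property
    def is_sensitive(self) -> bool:
        # A post is flagged as sensitive if any sensitive word occurs in it.
        return bool(self.sensitive_hits)

    @property
    def orientation(self) -> str:
        # Assumed rule: whichever lexicon has more hits decides the orientation.
        if self.support_count > self.oppose_count:
            return "support"
        if self.oppose_count > self.support_count:
            return "oppose"
        return "neutral"


def count_hits(text: str, words) -> int:
    """Count non-overlapping substring occurrences of every lexicon word."""
    return sum(text.count(w) for w in words)


def analyze(text: str) -> OpinionResult:
    """Plain string matching, case-insensitive; no segmentation or negation handling."""
    lowered = text.lower()
    return OpinionResult(
        sensitive_hits=[w for w in SENSITIVE_WORDS if w in lowered],
        oppose_count=count_hits(lowered, OPPOSE_WORDS),
        support_count=count_hits(lowered, SUPPORT_WORDS),
    )


if __name__ == "__main__":
    r = analyze("Students protest against the fee increase")
    print(r.is_sensitive, r.orientation)  # True oppose
```

Pure substring matching is fast and easy to extend by editing the word lists. Its limitations are that it ignores negation ("do not support" still counts as support) and can match a word embedded in a longer one.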
Student network public opinion refers to the evaluations and social attitudes that college students express, within the university network space, about intermediary social events around them as those events occur, develop, and change. Tracking these dynamics promptly and actively guiding student public opinion is an important way to maintain campus stability and security. In the online environment, the main sources of public opinion information are news commentary, BBS forums, blogs, and microblogs. Microblogging makes communication easy and offers rich resources. It has therefore become a mainstream channel through which the public obtains information, and it is the third-largest source of Internet public opinion in China after news and BBS. Analysis of microblog content is therefore highly authoritative and timely. For this reason, this system focuses its information collection and public opinion analysis on Sina Weibo.
This system is developed in Python. Its core technologies are the Python web framework Django and the MySQL database. Its main functions are information collection, information processing, public opinion analysis, and public opinion display. For information collection and processing, a web crawler calls the Sina API using OAuth2 authorization and writes microblog posts and their comments into the database. Public opinion analysis covers sensitive topic identification and orientation analysis. It defines sensitive, opposition, and support word lists and detects them in text by string matching. Public opinion display, based on the Django framework, shows microblog topics with their related comments together with the results of sensitive topic identification and orientation analysis. In addition, the user management interface provides filtering, querying, and automatic sorting to make the system convenient to use.
Key words: Python; network public opinion; microblog
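The crawler's first step is OAuth2 authorization against the Sina API. The sketch below shows the standard OAuth2 authorization-code flow using only the Python standard library. It rests on several assumptions:

- The endpoint URLs are taken from the Sina Weibo Open Platform documentation and should be checked against the current docs.
- `client_id`, `client_secret`, and the redirect URI are placeholders for the values issued when an application is registered.
- All function names are hypothetical, not the system's own code.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request, urlopen

# Assumed Sina Weibo OAuth2 endpoints; verify against the current platform docs.
AUTHORIZE_URL = "https://api.weibo.com/oauth2/authorize"
ACCESS_TOKEN_URL = "https://api.weibo.com/oauth2/access_token"


def build_authorize_url(client_id: str, redirect_uri: str) -> str:
    """Step 1: URL the user opens to grant access; the server redirects back with ?code=..."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
    })
    return f"{AUTHORIZE_URL}?{query}"


def extract_code(callback_url: str) -> str:
    """Step 2: read the authorization code from the redirect (callback) URL."""
    return parse_qs(urlparse(callback_url).query)["code"][0]


def build_token_request(client_id: str, client_secret: str,
                        redirect_uri: str, code: str) -> tuple[str, bytes]:
    """Step 3: form-encoded body used to exchange the code for an access token."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode("ascii")
    return ACCESS_TOKEN_URL, body


def exchange_code(client_id: str, client_secret: str,
                  redirect_uri: str, code: str) -> dict:
    """Step 4 (network): POST the form and parse the JSON token response."""
    url, body = build_token_request(client_id, client_secret, redirect_uri, code)
    with urlopen(Request(url, data=body, method="POST")) as resp:
        return json.load(resp)
```

Calling `exchange_code` requires a registered application and network access. Per the OAuth2 specification, a successful response carries an `access_token` field, which the crawler then passes to subsequent API calls.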
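Contents
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Status in China and Abroad
1.2.1 Related Definitions and Concepts
1.2.2 State of Public Opinion Analysis Technology Abroad
1.2.3 State of Public Opinion Analysis Technology in China
1.3 Organization of This Thesis
Chapter 2 Related Technologies
2.1 Overview of the Development Environment and Platform
2.2 Introduction to Python
2.3 Regular Expressions
2.4 Introduction to MySQL
2.4.1 The MySQL-python Library
2.4.2 Database Configuration and Connection
2.5 Introduction to Django
Chapter 3 Overall Design of the Network Public Opinion Detection System
3.1 System Functional Requirements
3.2 System Architecture Design
3.3 Database Design
Chapter 4 Design and Technical Analysis of Key Modules
4.1 Information Collection and Processing
4.1.1 Sina Weibo Authentication and Authorization
4.1.2 Weibo API
4.1.3 OAuth Authorization and Code Retrieval
4.1.4 Importing Related Functions
4.1.5 Information Crawling: Web Crawler
4.2 Public Opinion Analysis
4.2.1 Word Segmentation
4.2.2 Sensitive Topic Identification
4.2.3 Orientation Analysis
4.3 Public Opinion Display
4.3.1 Interface Demonstration
4.3.2 Function Demonstration
Chapter 5 Conclusion and Outlook
5.1 Summary
5.2 Outlook
Acknowledgements
References
Appendix: Auto OAuth2.py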
Chapter 1 Introduction
1.1 Research Background and Significance