349 lines
11 KiB
Plaintext
349 lines
11 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"#VN.DATAYES - Welcome!\n",
|
||
"***\n",
|
||
"##1. Preface\n",
|
||
"\n",
|
||
"###1.1\n",
|
||
"vn.datayes是一个从属于vnpy的开源历史数据模块;使用通联数据API以及MongoDB进行数据的下载和存储管理。项目目前与将来主要解决\\准备解决以下问题:\n",
|
||
"\n",
|
||
"* 从通联数据等API高效地爬取、更新、清洗历史数据。\n",
|
||
"* 基于MongoDB的数据库管理、快速查询、转换输出格式;支持自定义符合需求的行情历史数据库。\n",
|
||
"* 基于Python.Matplotlib或R.ggplot2,快速绘制K线图等可视化对象。\n",
|
||
"\n",
|
||
"项目目前主要包括了通联API开发者试用方案中大部分的市场行情日线数据(股票、期货、期权、指数、基金等),以及部分基本面数据。数据下载与更新主要采用多线程设计,测试效率如下:\n",
|
||
"\n",
|
||
"| 数据集举例 | 数据集容量 | 下载时间估计 |\n",
|
||
"| :-------------: | :-------------: | :-------------: |\n",
|
||
"| 股票日线数据,2800个交易代码,2013年1月1日至2015年8月1日 | 2800个collection,约500条/each | 7-10分钟 |\n",
|
||
"| 股票分钟线数据,2个交易代码,2013年1月1日至2015年8月1日 | 2个collection,约20万条/each | 1-2分钟 |\n",
|
||
"| 股票日线数据更新任务,2800个交易代码,2015年8月1日至2015年8月15日 | 2800个collection,约10条/each | 1-2分钟 |\n",
|
||
"\n",
|
||
"vn.datayes基于MongoDB数据库,通过一个json配置文件简化数据库的初始化、设置、动态更新过程。较为精细的数据库操作仍需编写脚本进行。若对MongoDB与pymongo不熟悉,推荐使用Robomongo等窗口化查看工具作为辅助。\n",
|
||
"\n",
|
||
"###1.2 主要依赖:\n",
|
||
"pymongo, pandas, requests, json\n",
|
||
"###1.3 开发测试环境:\n",
|
||
"Mac OS X 10.10; Windows 7 || Anaconda.Python 2.7"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"* * *\n",
|
||
"##2. Get Started\n",
|
||
"###2.1 准备\n",
|
||
"\n",
|
||
"* 下载并安装MongoDB: https://www.mongodb.org/downloads\n",
|
||
"* 获取API token,以通联数据为例。\n",
|
||
"\n",
|
||
"![fig1](figs/fig1.png)\n",
|
||
"\n",
|
||
"* 更新pymongo至3.0以上版本; 更新requests等包。 \n",
|
||
"```\n",
|
||
"~$ pip install pymongo --upgrade\n",
|
||
"~$ pip install requests --upgrade\n",
|
||
"```\n",
|
||
"\n",
|
||
"* [ ! 注意,本模块需要pymongo3.0新加入的部分方法,使用vnpy本体所用的2.7版本对应方法将无法正常插入数据。依赖冲突的问题会尽快被解决,目前推荐制作一个virtual environment来单独运行这个模块;或者暴力切换pymongo的版本:]\n",
|
||
"```\n",
|
||
"~$ pip install pymongo==3.0.3 # this module.\n",
|
||
"~$ pip install pymongo==2.7.2 # pymongo 2.7.\n",
|
||
"```\n",
|
||
"\n",
|
||
"* 启动MongoDB\n",
|
||
"```\n",
|
||
"~$ mongod\n",
|
||
"```\n",
|
||
"\n",
|
||
"\n",
|
||
"###2.2 数据库初始化与下载\n",
|
||
"* **api.Config** 对象包含了向API进行数据请求所需的信息,我们需要一个用户token来初始化这个对象。"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {
|
||
"collapsed": false
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"{'domain': 'api.wmcloud.com/data',\n",
|
||
" 'header': {'Authorization': 'Bearer 7c2e59e212dbff90ffd6b382c7afb57bc987a99307d382b058af6748f591d723',\n",
|
||
" 'Connection': 'keep-alive'},\n",
|
||
" 'ssl': False,\n",
|
||
" 'version': 'v1'}"
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"from storage import *\n",
|
||
"\n",
|
||
"myConfig = Config(head=\"Zed's Config\", \n",
|
||
" token='7c2e59e212dbff90ffd6b382c7afb57bc987a99307d382b058af6748f591d723')\n",
|
||
"myConfig.body"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"* * *\n",
|
||
"* **storage.DBConfig** 对象包含了数据库配置。我们需要自己编写一个json字典来填充这个对象。举例来说,我们希望下载股票日线数据和指数日线数据,数据库名称为DATAYES_EQUITY_D1和DATAYES_INDEX_D1,index为日期“date”。那么json字典是这样的:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {
|
||
"collapsed": false
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"{'client': MongoClient('localhost', 27017),\n",
|
||
" 'dbNames': ['EQU_M1', 'EQU_D1', 'FUT_D1', 'OPT_D1', 'FUD_D1', 'IDX_D1'],\n",
|
||
" 'dbs': {'EQU_D1': {'collNames': 'equTicker',\n",
|
||
" 'index': 'date',\n",
|
||
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_EQUITY_D1')},\n",
|
||
" 'EQU_M1': {'collNames': 'secID',\n",
|
||
" 'index': 'dateTime',\n",
|
||
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_EQUITY_M1')},\n",
|
||
" 'FUD_D1': {'collNames': 'fudTicker',\n",
|
||
" 'index': 'date',\n",
|
||
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_FUND_D1')},\n",
|
||
" 'FUT_D1': {'collNames': 'futTicker',\n",
|
||
" 'index': 'date',\n",
|
||
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_FUTURE_D1')},\n",
|
||
" 'IDX_D1': {'collNames': 'idxTicker',\n",
|
||
" 'index': 'date',\n",
|
||
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_INDEX_D1')},\n",
|
||
" 'OPT_D1': {'collNames': 'optTicker',\n",
|
||
" 'index': 'date',\n",
|
||
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_OPTION_D1')}}}"
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"client = pymongo.MongoClient() # pymongo.connection object.\n",
|
||
"\n",
|
||
"body = {\n",
|
||
" 'client': client, # connection object.\n",
|
||
" 'dbs': {\n",
|
||
" 'EQU_D1': { # in-python alias: 'EQU_D1'\n",
|
||
" 'self': client['DATAYES_EQUITY_D1'], # pymongo.database[name] object.\n",
|
||
" 'index': 'date', # index name.\n",
|
||
" 'collNames': 'equTicker' # what are collection names consist of.\n",
|
||
" },\n",
|
||
" 'IDX_D1': { # Another database\n",
|
||
" 'self': client['DATAYES_INDEX_D1'],\n",
|
||
" 'index': 'date',\n",
|
||
" 'collNames': 'idxTicker'\n",
|
||
" }\n",
|
||
" },\n",
|
||
" 'dbNames': ['EQU_D1','IDX_D1'] # List of alias.\n",
|
||
"}\n",
|
||
"\n",
|
||
"myDbConfig_ = DBConfig(body=body)\n",
|
||
"\n",
|
||
"# 这看上去有些麻烦;不想这么做的话可以直接使用DBConfig的默认构造函数。\n",
|
||
"\n",
|
||
"myDbConfig = DBConfig()\n",
|
||
"\n",
|
||
"myDbConfig.body"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"* * *\n",
|
||
"* **api.PyApi**是向网络数据源进行请求的主要对象。**storage.MongodController**是进行数据库管理的对象。当我们完成了配置对象的构造,即可初始化PyApi与MongodController。**MongodController._get_coll_names()** 和**MongodController._ensure_index()** 是数据库初始化所调用的方法,为了模块开发的方便,它们暂时没有被放进构造函数中自动执行。"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {
|
||
"collapsed": false
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[MONGOD]: Collection names gotten.\n",
|
||
"[MONGOD]: MongoDB index set.\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"1"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"myApi = PyApi(myConfig) # construct PyApi object.\n",
|
||
"mc = MongodController(api=myApi, config=myDbConfig) # construct MongodController object, \n",
|
||
" # on the top of PyApi.\n",
|
||
"mc._get_coll_names() # get names of collections.\n",
|
||
"mc._ensure_index() # ensure collection indices."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"![fig2](figs/fig2.png)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"* 使用**MongodController.download#()**方法进行下载。"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"mc.download_index_D1('20150101','20150801')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"![fig3](figs/fig3.png)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"###2.3 数据库更新\n",
|
||
"* 使用**MongodController.update#()**方法进行更新。脚本会自动寻找数据库中的最后一日并更新至最新交易日。"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {
|
||
"collapsed": false
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"datetime.datetime(2015, 8, 17, 10, 49, 21, 37758)"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"from datetime import datetime\n",
|
||
"datetime.now()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"mc.update_index_D1()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"![fig4](figs/fig4.png)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"###2.4 Mac OS或Linux下的下载与更新\n",
|
||
"模块中包含了一些shell脚本,方面在linux-like os下的数据下载、更新。\n",
|
||
"```\n",
|
||
"~$ cd path/of/vn/datayes\n",
|
||
"~$ chmod +x prepare.sh\n",
|
||
"~$ ./prepare.sh\n",
|
||
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"![fig5](figs/fig5.png)\n",
|
||
"![fig6](figs/fig6.png)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 2",
|
||
"language": "python",
|
||
"name": "python2"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 2
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython2",
|
||
"version": "2.7.10"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 0
|
||
}
|