vnpy/api/vn.datayes/static/tutorial.ipynb

349 lines
11 KiB
Plaintext
Raw Normal View History

2016-07-02 03:12:44 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#VN.DATAYES - Welcome!\n",
"***\n",
"##1. Preface\n",
"\n",
"###1.1\n",
"vn.datayes是一个从属于vnpy的开源历史数据模块使用通联数据API以及MongoDB进行数据的下载和存储管理。项目目前与将来主要解决\\准备解决以下问题:\n",
"\n",
"* 从通联数据等API高效地爬取、更新、清洗历史数据。\n",
"* 基于MongoDB的数据库管理、快速查询、转换输出格式支持自定义符合需求的行情历史数据库。\n",
"* 基于Python.Matplotlib或R.ggplot2快速绘制K线图等可视化对象。\n",
"\n",
"项目目前主要包括了通联API开发者试用方案中大部分的市场行情日线数据股票、期货、期权、指数、基金等以及部分基本面数据。数据下载与更新主要采用多线程设计测试效率如下\n",
"\n",
"| 数据集举例 | 数据集容量 | 下载时间估计 |\n",
"| :-------------: | :-------------: | :-------------: |\n",
"| 股票日线数据2800个交易代码2013年1月1日至2015年8月1日 | 2800个collection约500条/each | 7-10分钟 |\n",
"| 股票分钟线数据2个交易代码2013年1月1日至2015年8月1日 | 2个collection约20万条/each | 1-2分钟 |\n",
"| 股票日线数据更新任务2800个交易代码2015年8月1日至2015年8月15日 | 2800个collection约10条/each | 1-2分钟 |\n",
"\n",
"vn.datayes基于MongoDB数据库通过一个json配置文件简化数据库的初始化、设置、动态更新过程。较为精细的数据库操作仍需编写脚本进行。若对MongoDB与pymongo不熟悉推荐使用Robomongo等窗口化查看工具作为辅助。\n",
"\n",
"###1.2 主要依赖:\n",
"pymongo, pandas, requests, json\n",
"###1.3 开发测试环境:\n",
"Mac OS X 10.10; Windows 7 || Anaconda.Python 2.7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* * *\n",
"##2. Get Started\n",
"###2.1 准备\n",
"\n",
"* 下载并安装MongoDB: https://www.mongodb.org/downloads\n",
"* 获取API token以通联数据为例。\n",
"\n",
"![fig1](figs/fig1.png)\n",
"\n",
"* 更新pymongo至3.0以上版本; 更新requests等包。 \n",
"```\n",
"~$ pip install pymongo --upgrade\n",
"~$ pip install requests --upgrade\n",
"```\n",
"\n",
"* [ ! 注意本模块需要pymongo3.0新加入的部分方法使用vnpy本体所用的2.7版本对应方法将无法正常插入数据。依赖冲突的问题会尽快被解决目前推荐制作一个virtual environment来单独运行这个模块或者暴力切换pymongo的版本]\n",
"```\n",
"~$ pip install pymongo==3.0.3 # this module.\n",
"~$ pip install pymongo==2.7.2 # pymongo 2.7.\n",
"```\n",
"\n",
"* 启动MongoDB\n",
"```\n",
"~$ mongod\n",
"```\n",
"\n",
"\n",
"###2.2 数据库初始化与下载\n",
"* **api.Config** 对象包含了向API进行数据请求所需的信息我们需要一个用户token来初始化这个对象。"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'domain': 'api.wmcloud.com/data',\n",
" 'header': {'Authorization': 'Bearer 7c2e59e212dbff90ffd6b382c7afb57bc987a99307d382b058af6748f591d723',\n",
" 'Connection': 'keep-alive'},\n",
" 'ssl': False,\n",
" 'version': 'v1'}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from storage import *\n",
"\n",
"myConfig = Config(head=\"Zed's Config\", \n",
" token='7c2e59e212dbff90ffd6b382c7afb57bc987a99307d382b058af6748f591d723')\n",
"myConfig.body"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* * *\n",
"* **storage.DBConfig** 对象包含了数据库配置。我们需要自己编写一个json字典来填充这个对象。举例来说我们希望下载股票日线数据和指数日线数据数据库名称为DATAYES_EQUITY_D1和DATAYES_INDEX_D1index为日期“date”。那么json字典是这样的"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'client': MongoClient('localhost', 27017),\n",
" 'dbNames': ['EQU_M1', 'EQU_D1', 'FUT_D1', 'OPT_D1', 'FUD_D1', 'IDX_D1'],\n",
" 'dbs': {'EQU_D1': {'collNames': 'equTicker',\n",
" 'index': 'date',\n",
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_EQUITY_D1')},\n",
" 'EQU_M1': {'collNames': 'secID',\n",
" 'index': 'dateTime',\n",
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_EQUITY_M1')},\n",
" 'FUD_D1': {'collNames': 'fudTicker',\n",
" 'index': 'date',\n",
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_FUND_D1')},\n",
" 'FUT_D1': {'collNames': 'futTicker',\n",
" 'index': 'date',\n",
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_FUTURE_D1')},\n",
" 'IDX_D1': {'collNames': 'idxTicker',\n",
" 'index': 'date',\n",
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_INDEX_D1')},\n",
" 'OPT_D1': {'collNames': 'optTicker',\n",
" 'index': 'date',\n",
" 'self': Database(MongoClient('localhost', 27017), u'DATAYES_OPTION_D1')}}}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client = pymongo.MongoClient() # pymongo.connection object.\n",
"\n",
"body = {\n",
" 'client': client, # connection object.\n",
" 'dbs': {\n",
" 'EQU_D1': { # in-python alias: 'EQU_D1'\n",
" 'self': client['DATAYES_EQUITY_D1'], # pymongo.database[name] object.\n",
" 'index': 'date', # index name.\n",
" 'collNames': 'equTicker' # what are collection names consist of.\n",
" },\n",
" 'IDX_D1': { # Another database\n",
" 'self': client['DATAYES_INDEX_D1'],\n",
" 'index': 'date',\n",
" 'collNames': 'idxTicker'\n",
" }\n",
" },\n",
" 'dbNames': ['EQU_D1','IDX_D1'] # List of alias.\n",
"}\n",
"\n",
"myDbConfig_ = DBConfig(body=body)\n",
"\n",
"# 这看上去有些麻烦不想这么做的话可以直接使用DBConfig的默认构造函数。\n",
"\n",
"myDbConfig = DBConfig()\n",
"\n",
"myDbConfig.body"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* * *\n",
"* **api.PyApi**是向网络数据源进行请求的主要对象。**storage.MongodController**是进行数据库管理的对象。当我们完成了配置对象的构造即可初始化PyApi与MongodController。**MongodController._get_coll_names()** 和**MongodController._ensure_index()** 是数据库初始化所调用的方法,为了模块开发的方便,它们暂时没有被放进构造函数中自动执行。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[MONGOD]: Collection names gotten.\n",
"[MONGOD]: MongoDB index set.\n"
]
},
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myApi = PyApi(myConfig) # construct PyApi object.\n",
"mc = MongodController(api=myApi, config=myDbConfig) # construct MongodController object, \n",
" # on the top of PyApi.\n",
"mc._get_coll_names() # get names of collections.\n",
"mc._ensure_index() # ensure collection indices."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![fig2](figs/fig2.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* 使用**MongodController.download#()**方法进行下载。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mc.download_index_D1('20150101','20150801')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![fig3](figs/fig3.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###2.3 数据库更新\n",
"* 使用**MongodController.update#()**方法进行更新。脚本会自动寻找数据库中的最后一日并更新至最新交易日。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"datetime.datetime(2015, 8, 17, 10, 49, 21, 37758)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datetime import datetime\n",
"datetime.now()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mc.update_index_D1()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![fig4](figs/fig4.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###2.4 Mac OS或Linux下的下载与更新\n",
"模块中包含了一些shell脚本方面在linux-like os下的数据下载、更新。\n",
"```\n",
"~$ cd path/of/vn/datayes\n",
"~$ chmod +x prepare.sh\n",
"~$ ./prepare.sh\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![fig5](figs/fig5.png)\n",
"![fig6](figs/fig6.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}