| 
									
										
										
										
											2018-01-25 17:01:29 +08:00
										 |  |  |  | # DataX
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、SQL Server、Oracle、PostgreSQL、HDFS、Hive、HBase、OTS、ODPS 等各种异构数据源之间高效的数据同步功能。 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | # Features
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | DataX本身作为数据同步框架,将不同数据源的同步抽象为从源头数据源读取数据的Reader插件,以及向目标端写入数据的Writer插件,理论上DataX框架可以支持任意数据源类型的数据同步工作。同时DataX插件体系作为一套生态系统, 每接入一套新数据源该新加入的数据源即可实现和现有的数据源互通。 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | # System Requirements
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | - Linux | 
					
						
							| 
									
										
										
										
											2018-06-07 16:30:06 +08:00
										 |  |  |  | - [JDK(1.8以上,推荐1.8) ](http://www.oracle.com/technetwork/cn/java/javase/downloads/index.html)  | 
					
						
							| 
									
										
										
										
											2021-11-19 16:38:55 +08:00
										 |  |  |  | - [Python(2或3都可以) ](https://www.python.org/downloads/) | 
					
						
							| 
									
										
										
										
											2018-01-25 17:01:29 +08:00
										 |  |  |  | - [Apache Maven 3.x](https://maven.apache.org/download.cgi) (Compile DataX) | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | # Quick Start
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | * 工具部署 | 
					
						
							|  |  |  |  |    | 
					
						
							| 
									
										
										
										
											2022-10-26 15:22:40 +08:00
										 |  |  |  |   * 方法一、直接下载DataX工具包:[DataX下载地址](https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202210/datax.tar.gz) | 
					
						
							| 
									
										
										
										
											2018-01-25 17:01:29 +08:00
										 |  |  |  |      | 
					
						
							|  |  |  |  |     下载后解压至本地某个目录,进入bin目录,即可运行同步作业: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     $ cd  {YOUR_DATAX_HOME}/bin | 
					
						
							|  |  |  |  |     $ python datax.py {YOUR_JOB.json} | 
					
						
							|  |  |  |  |     ``` | 
					
						
							| 
									
										
										
										
											2018-02-05 21:43:21 +08:00
										 |  |  |  |     自检脚本: | 
					
						
							| 
									
										
										
										
											2018-02-05 21:45:15 +08:00
										 |  |  |  |     python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json | 
					
						
							| 
									
										
										
										
											2018-01-25 17:01:29 +08:00
										 |  |  |  |   * 方法二、下载DataX源码,自己编译:[DataX源码](https://github.com/alibaba/DataX) | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     (1)、下载DataX源码: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     $ git clone git@github.com:alibaba/DataX.git | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     (2)、通过maven打包: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     $ cd  {DataX_source_code_home} | 
					
						
							|  |  |  |  |     $ mvn -U clean package assembly:assembly -Dmaven.test.skip=true | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     打包成功,日志显示如下: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ```  | 
					
						
							|  |  |  |  |     [INFO] BUILD SUCCESS | 
					
						
							|  |  |  |  |     [INFO] ----------------------------------------------------------------- | 
					
						
							|  |  |  |  |     [INFO] Total time: 08:12 min | 
					
						
							|  |  |  |  |     [INFO] Finished at: 2015-12-13T16:26:48+08:00 | 
					
						
							|  |  |  |  |     [INFO] Final Memory: 133M/960M | 
					
						
							|  |  |  |  |     [INFO] ----------------------------------------------------------------- | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     打包成功后的DataX包位于 {DataX_source_code_home}/target/datax/datax/ ,结构如下: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     $ cd  {DataX_source_code_home} | 
					
						
							|  |  |  |  |     $ ls ./target/datax/datax/ | 
					
						
							|  |  |  |  |     bin		conf		job		lib		log		log_perf	plugin		script		tmp | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | * 配置示例:从stream读取数据并打印到控制台 | 
					
						
							|  |  |  |  |    | 
					
						
							| 
									
										
										
										
											2019-11-14 10:30:24 +08:00
										 |  |  |  |   * 第一步、创建作业的配置文件(json格式) | 
					
						
							| 
									
										
										
										
											2018-01-25 17:01:29 +08:00
										 |  |  |  |      | 
					
						
							|  |  |  |  |     可以通过命令查看配置模板: python datax.py -r {YOUR_READER} -w {YOUR_WRITER} | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     $ cd  {YOUR_DATAX_HOME}/bin | 
					
						
							|  |  |  |  |     $  python datax.py -r streamreader -w streamwriter | 
					
						
							|  |  |  |  |     DataX (UNKNOWN_DATAX_VERSION), From Alibaba ! | 
					
						
							|  |  |  |  |     Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved. | 
					
						
							|  |  |  |  |     Please refer to the streamreader document: | 
					
						
							|  |  |  |  |         https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md  | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     Please refer to the streamwriter document: | 
					
						
							|  |  |  |  |          https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md  | 
					
						
							|  |  |  |  |       | 
					
						
							|  |  |  |  |     Please save the following configuration as a json file and  use | 
					
						
							|  |  |  |  |          python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json  | 
					
						
							|  |  |  |  |     to run the job. | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     { | 
					
						
							|  |  |  |  |         "job": { | 
					
						
							|  |  |  |  |             "content": [ | 
					
						
							|  |  |  |  |                 { | 
					
						
							|  |  |  |  |                     "reader": { | 
					
						
							|  |  |  |  |                         "name": "streamreader",  | 
					
						
							|  |  |  |  |                         "parameter": { | 
					
						
							|  |  |  |  |                             "column": [],  | 
					
						
							|  |  |  |  |                             "sliceRecordCount": "" | 
					
						
							|  |  |  |  |                         } | 
					
						
							|  |  |  |  |                     },  | 
					
						
							|  |  |  |  |                     "writer": { | 
					
						
							|  |  |  |  |                         "name": "streamwriter",  | 
					
						
							|  |  |  |  |                         "parameter": { | 
					
						
							|  |  |  |  |                             "encoding": "",  | 
					
						
							|  |  |  |  |                             "print": true | 
					
						
							|  |  |  |  |                         } | 
					
						
							|  |  |  |  |                     } | 
					
						
							|  |  |  |  |                 } | 
					
						
							|  |  |  |  |             ],  | 
					
						
							|  |  |  |  |             "setting": { | 
					
						
							|  |  |  |  |                 "speed": { | 
					
						
							|  |  |  |  |                     "channel": "" | 
					
						
							|  |  |  |  |                 } | 
					
						
							|  |  |  |  |             } | 
					
						
							|  |  |  |  |         } | 
					
						
							|  |  |  |  |     } | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     根据模板配置json如下: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` json | 
					
						
							|  |  |  |  |     #stream2stream.json | 
					
						
							|  |  |  |  |     { | 
					
						
							|  |  |  |  |       "job": { | 
					
						
							|  |  |  |  |         "content": [ | 
					
						
							|  |  |  |  |           { | 
					
						
							|  |  |  |  |             "reader": { | 
					
						
							|  |  |  |  |               "name": "streamreader", | 
					
						
							|  |  |  |  |               "parameter": { | 
					
						
							|  |  |  |  |                 "sliceRecordCount": 10, | 
					
						
							|  |  |  |  |                 "column": [ | 
					
						
							|  |  |  |  |                   { | 
					
						
							|  |  |  |  |                     "type": "long", | 
					
						
							|  |  |  |  |                     "value": "10" | 
					
						
							|  |  |  |  |                   }, | 
					
						
							|  |  |  |  |                   { | 
					
						
							|  |  |  |  |                     "type": "string", | 
					
						
							|  |  |  |  |                     "value": "hello,你好,世界-DataX" | 
					
						
							|  |  |  |  |                   } | 
					
						
							|  |  |  |  |                 ] | 
					
						
							|  |  |  |  |               } | 
					
						
							|  |  |  |  |             }, | 
					
						
							|  |  |  |  |             "writer": { | 
					
						
							|  |  |  |  |               "name": "streamwriter", | 
					
						
							|  |  |  |  |               "parameter": { | 
					
						
							|  |  |  |  |                 "encoding": "UTF-8", | 
					
						
							|  |  |  |  |                 "print": true | 
					
						
							|  |  |  |  |               } | 
					
						
							|  |  |  |  |             } | 
					
						
							|  |  |  |  |           } | 
					
						
							|  |  |  |  |         ], | 
					
						
							|  |  |  |  |         "setting": { | 
					
						
							|  |  |  |  |           "speed": { | 
					
						
							|  |  |  |  |             "channel": 5 | 
					
						
							|  |  |  |  |            } | 
					
						
							|  |  |  |  |         } | 
					
						
							|  |  |  |  |       } | 
					
						
							|  |  |  |  |     } | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |   * 第二步:启动DataX | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     $ cd {YOUR_DATAX_DIR_BIN} | 
					
						
							|  |  |  |  |     $ python datax.py ./stream2stream.json  | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     同步结束,显示日志如下: | 
					
						
							|  |  |  |  |      | 
					
						
							|  |  |  |  |     ``` shell | 
					
						
							|  |  |  |  |     ... | 
					
						
							|  |  |  |  |     2015-12-17 11:20:25.263 [job-0] INFO  JobContainer -  | 
					
						
							|  |  |  |  |     任务启动时刻                    : 2015-12-17 11:20:15 | 
					
						
							|  |  |  |  |     任务结束时刻                    : 2015-12-17 11:20:25 | 
					
						
							|  |  |  |  |     任务总计耗时                    :                 10s | 
					
						
							|  |  |  |  |     任务平均流量                    :              205B/s | 
					
						
							|  |  |  |  |     记录写入速度                    :              5rec/s | 
					
						
							|  |  |  |  |     读出记录总数                    :                  50 | 
					
						
							|  |  |  |  |     读写失败总数                    :                   0 | 
					
						
							|  |  |  |  |     ``` | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | # Contact us
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | Google Groups: [DataX-user](https://github.com/alibaba/DataX) | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | 
 | 
					
						
							|  |  |  |  | 
 |