embulk-output-documentdb

Azure DocumentDB output plugin for Embulk

Download .zip Download .tar.gz View on GitHub

Azure DocumentDB output plugin for Embulk

embulk-output-documentdb is an embulk output plugin that dumps records to Azure DocumentDB. Embulk is a open-source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services. See Embulk documentation for details.

Overview

  • Plugin type: output
  • Load all or nothing: no
  • Resume supported: no
  • Cleanup supported: yes

Installation

$ gem install embulk-output-documentdb

Configuration

DocumentDB

To use Microsoft Azure DocumentDB, you must create a DocumentDB database account using either the Azure portal, Azure Resource Manager templates, or Azure command-line interface (CLI). In addition, you must have a database and a collection to which embulk-output-documentdb writes event-stream out. Here are instructions:

Embulk Configuration (config.yml)

out:
  type: documentdb
  docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
  docdb_account_key: EMwUa3EzsAtJ1qYfzxo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
  docdb_database: myembulkdb
  docdb_collection: myembulkcoll
  auto_create_database: true
  auto_create_collection: true
  partitioned_collection: false
  key_column: id
  • docdb_endpoint (required) - Azure DocumentDB Account endpoint URI
  • docdb_account_key (required) - Azure DocumentDB Account key (master key). You must NOT set a read-only key
  • docdb_database (required) - DocumentDB database nameb
  • docdb_collection (required) - DocumentDB collection name
  • auto_create_database (optional) - Default:true. By default, DocumentDB database named docdb_database will be automatically created if it does not exist
  • auto_create_collection (optional) - Default:true. By default, DocumentDB collection named docdb_collection will be automatically created if it does not exist
  • partitioned_collection (optional) - Default:false. Set true if you want to create and/or store records to partitioned collection. Set false for single-partition collection
  • partition_key (optional) - Default:nil. Partition key must be specified for paritioned collection (partitioned_collection set to be true)
  • offer_throughput (optional) - Default:10100. Throughput for the collection expressed in units of 100 request units per second. This is only effective when you newly create a partitioned collection (ie. Both auto_create_collection and partitioned_collection are set to be true )
  • key_column (required) - Column name to be inserted to DocumentDB as primary key. If it's not named "id", the column name is converted into "id" (string).

Configuration examples

Here are two types of the plugin configurations example - single-parition collection and partitioned collection.

(1) Single-Partition Collection Case

out:
  type: documentdb
  docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
  docdb_account_key: EMwUa3EzsAtJ1qYfzxo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
  docdb_database: myembulkdb
  docdb_collection: myembulkcoll
  auto_create_database: true
  auto_create_collection: true
  partitioned_collection: false
  key_column: id

(2) Partitioned Collection Case

  type: documentdb
  docdb_endpoint: https://yoichikademo0.documents.azure.com:443/
  docdb_account_key: EMwUa3EzsAtJ1qYfzxo9nQ3KudofsXNm3xLh1SLffKkUHMFl80OZRZIVu4lxdKRKxkgVAj0c2mv9BZSyMN7tdg==
  docdb_database: myembulkdb
  docdb_collection: myembulkcoll
  auto_create_database: true
  auto_create_collection: true
  partitioned_collection: true
  partition_key: account
  offer_throughput: 10100
  key_column: id

Build, Install, and Run

$ rake

$ embulk gem install pkg/embulk-output-documentdb-0.1.0.gem

$ embulk preview config.yml

$ embulk run config.yml

Change log

Links

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yokawasa/embulk-output-documentdb.

Copyright

Copyright Copyright (c) 2016- Yoichi Kawasaki
License MIT