Creating a New CloudQuery Source Plugin in Java

This guide will help you develop a new source or destination plugin for CloudQuery in Java. CloudQuery's modular architecture means that a source plugin can be used to fetch data from any third-party API, and then be combined with a destination plugin to insert data into any supported destination. We will cover the basics of how to get started, and then dive into some more advanced topics. We will also cover how to release your plugin for use by the wider CloudQuery community.

This guide assumes that you are somewhat familiar with CloudQuery. If you are not, we recommend starting by reading the Quickstart guide and playing around with the CloudQuery CLI a bit first.

Though you by no means need to be an expert, you will also need some familiarity with Java. If you are new to Java, we recommend starting with the W3Schools Java tutorials (opens in a new tab).

Core Concepts

Before we dive in, let's quickly cover some core concepts of CloudQuery plugins, so that they're familiar when we see our first example.

Syncs

A sync is the process that gets kicked off when a user runs cloudquery sync. A sync is responsible for fetching data from a third-party API and inserting it into the destination (database, data lake, stream, etc.). When you write a source plugin for CloudQuery, you will only need to implement the part that interfaces with the third-party API. The rest of the sync process, such as delivering to the destination database, is handled by the CloudQuery SDK.

Tables and Services

A table is the term CloudQuery uses for a collection of related data. In most databases it directly maps to an actual database table, but in some destinations it could be stored as a file, stream or other medium. A table is defined by a name, a list of columns, and a resolver function. The resolver function is responsible for fetching data from the third-party API and sending it to CloudQuery. We will look at examples of this soon!

Every table will typically have its own .java file in the plugin resources package.

Resolvers

Resolvers are functions associated with a table that get called when it's time to populate data for that table. There are two types of resolvers:

Table resolvers

Table resolvers are responsible for fetching data from the third-party API. In Java, a table resolver is a class that implements the TableResolver (opens in a new tab) interface.

Column resolvers

Column resolvers (opens in a new tab) are responsible for mapping data from the third-party API into the columns of the table. In most cases, you will not need to implement this, as the SDK will automatically map data from the class passed in by the table resolver to the columns of the table. But in some cases, you may need to implement a custom column resolver to fetch additional data or do custom transformations.

Apache Arrow Java SDK

Like other CloudQuery SDK the Java SDK uses [Apache Arrow] as part of the underlying CloudQuery type system.

Creating Your First Plugin

As the Java SDK is still new, we don't yet have a plugin scaffold generator. For now, we recommend starting by copying the code from an existing Java plugin (opens in a new tab).

Before running the plugin locally, you will need to first install Gradle by following the installation instructions (opens in a new tab).

In addition, you will need to setup a classic personal access token with read:packages scope to authenticate to GitHub Packages. See Authenticating to GitHub Packages (opens in a new tab) for more details. The username associated with the token and the token itself should be set as environment variables as follows:

export GITHUB_ACTOR=<your-github-username>
export GITHUB_TOKEN=<personal-access-token>

Once you have Gradle installed and have set up authentication to GitHub Packages, you can run the following command to build the plugin project:

gradle build

Testing the Plugin

There are two options for running a plugin as a developer before it is released: as a gRPC server, or as a Docker container. We will briefly summarize both options here, or you can read about them in more detail in Running Locally.

Run the Plugin as a gRPC Server

This mode is especially useful for setting breakpoints your code for debugging, as you can run it in server mode from your IDE and attach a debugger to it. To run the plugin as a gRPC server, you can run the following command in the root of the plugin directory:

gradle run --args serve

This will start a gRPC server on port 7777. You can then create a config file that sets the registry and path properties to point to this server. For example:

config.yaml

kind: source
spec:
  name: "my-plugin"
  registry: "grpc"
  path: "localhost:7777"
  version: "v1.0.0"
  tables:
    ["*"]
  destinations:
    - "sqlite"
---
kind: destination
spec:
  name: sqlite
  path: cloudquery/sqlite
  registry: cloudquery
  version: "v2.9.4"
  spec:
    connection_string: ./db.sql

With the above configuration, we can now run cloudquery sync as normal:

cloudquery sync config.yaml

Note that when running a source plugin as a gRPC server, errors with the source plugin will be printed to the console running the gRPC server, not to the CloudQuery log like usual.

Run the Plugin as a Docker Container

You can also build a Docker container for the plugin, and then either run it directly as a gRPC server or via the docker registry in a config file. See an example Docker file for a Java plugin here (opens in a new tab).

We need to first build the image:

docker build -t my-plugin:latest --build-arg GITHUB_ACTOR=<your-github-username> --build-arg GITHUB_TOKEN=<personal-access-token> .

And then we can specify the docker registry in our config file:

config.yaml

kind: source
spec:
  name: "my-plugin"
  registry: "docker"
  path: "my-plugin:latest"
  tables:
    ["*"]
  destinations:
    - "sqlite"
---
kind: destination
spec:
  name: sqlite
  path: cloudquery/sqlite
  registry: cloudquery
  version: "v2.9.4"
  spec:
    connection_string: ./db.sql

Releasing and Deploying Your Plugin

Releasing a Java plugin for use by the wider CloudQuery community involves publishing a Docker image to any registry of your choice. We recommend using Docker Hub (opens in a new tab), but you can also use GitHub Container Registry (opens in a new tab) or any other registry that supports Docker images. You can see an example Dockerfile here (opens in a new tab).

Once published, users can then import your plugin by specifying the image path in their config file together with the docker registry, e.g.:

kind: source
spec:
  name: cloudwidgets
  path: ghcr.io/myorg/cloudwidgets
  registry: docker

This will download and run the plugin as a Docker container when cloudquery sync is run.

Real-world Examples

A good way to learn how to create a new plugins in Java is to look at the following examples:

The Bitbucket Source Plugin (opens in a new tab) is an example of dynamically generating tables based on the schema of a third-party API and mapping API types to arrow types.

Resources

Python Source Plugin Publishing an Addon to the Hub