CURL Function

CURL Function

The CURL() function implements a subset of cURL functionality and enables N1QL queries to interact and integrate with external JSON data sources available over HTTP/REST. This allows federated queries against external data sources such as Google geocoding API, Yahoo API, Couchbase Full text service or other Couchbase clusters. The federated queries can leverage full querying capabilities of N1QL, including functions, expressions, sub-queries, JOINs, NESTs, UNNESTs etc.

The CURL() function can be used in various N1QL expressions and in various clauses of the DML statements such as projection, WHERE predicates, FROM data source, and so on. For example, you can use the CURL function to integrate with Couchbase FTS and run fuzzy text searches in N1QL queries, or INSERT-SELECT statement to import external data into Couchbase Server.

Syntax

CURL(url, [options])

Arguments
  • url - is a string representing the URL of the data source. The URL needs to point to a JSON endpoint and must be either http:// or https:// only. No other protocol is supported. The redirection of URL is not allowed.
  • options - is an optional JSON object representing various supported options. This includes options and parameters to be sent to the URL source endpoint.
Options
Table 1. Security Options
Option Description Value
user Server user name and password. USERNAME[:PASSWORD]
basic Use HTTP Basic Authentication. Boolean
insecure Allow connections to SSL sites without certificates (H). Boolean
anyauth CURL to figure out authentication method by itself, and use the most secure one.
Note: This supports only basic authentication.
Boolean
cacert Specify CA signed certificate file name. Certificates should be stored on the local machine, on each query node within a cluster.
  • To store certificates, access /Couchbase/var/lib/couchbase/n1qlcerts. This is not visible to the user.
  • The file name cannot contain a path.
  • If the certificate is not a match to the existing contents of n1qlcerts directory, the function returns an error.
  • All expired and invalid certificates return an error.
FILENAME

For example, this is the certificate .pem file for AWS.

Table 2. Transfer-related Options
Option Description Value
get The Get request for CURL. BOOLEAN

Example: {"get": true}

request Sets the request method. This only accepts GET or POST requests. This is case sensitive. String

Example: {“request”:”POST”}

connect-timeout The maximum time allowed for connection in seconds. Integer

Example: {"connect-timeout":30}

max-time The maximum time allowed for the transfer in seconds. Integer

Example: {"max-time":30}

data POST data to be sent to the HTTP/REST service. The string data should be formatted exactly same as HTTP POST data. String or [...string, string….]

Example: {"data":"address=Half+Moon+Bay"}

{"data":"address=Half+Moon+Bay,california"}

header Passes custom header line to the server. String or [...string, string….]

Example: {"header":"Content-Type: application/json"}

{"header":["Content-Type: application/json", "Content-Length: 115"]}

show-error Displays error message. Boolean

Example: {"show-error": true}

silent The silent mode. Boolean

Example: {"silent": false}

keepalive-time The number of seconds to wait between the keepalive probes. Integer

Example: {"keepalive-time":20}

user-agent The value for the user-agent to send it to the server. String

Example: {"header":[ "user-agent: bsmith","Content-Type: application/json"]}

data-urlencode Encode the data and send it to the server.

This is a test => this%20is%20a%20test

String or [...string, string….]
Return Value
The CURL function returns either a single JSON object or multiple objects in an array. These objects are returned by the HTTP/REST service at the URL. Note that, the result is expected to be JSON, and the function itself doesn’t do any additional processing. However, if the query parameter pretty=true, then the data received is in pretty format. If the returned data from the URL is not a well defined JSON, it may result in errors or undefined behavior. The errors returned by CURL (such as moved permanently), can be in different formats, like HTML, XML, plain strings, and can be a large blob. The N1QL function returns a “Not a JSON endpoint” error when it returns any other format other than JSON.

Security

It is important to understand the potential security implications in using the CURL() function. Note that CURL() function can connect to any REST end point accessible to the Query node (where the function is executed) inside or outside the firewall. To avoid security vulnerabilities, multiple security measures have been implemented. These can be used to control and minimize the risks associated with using the CURL() function. See Table 1 for the list of security options that can be used with the CURL() function.

In addition to the security options, a Full Administrator can also whitelist URLs and REST endpoints that can be accessed by the CURL() function. The CURL() function can access URLs that satisfy a prefix match, which means only URLs specified on the list or the prefixes.

Consider the following use case where a deployment of Couchbase Server and Mobile Sync Gateway exist on the same machine, and the user has access to the query service but not the Sync Gateway admin endpoint. Let's assume that the user has been granted QUERY_EXTERNAL_ACCESS role by the Full Administrator. This means that the user can write queries using the CURL() function and potentially use CURL to access the Mobile Sync Gateway admin endpoint. To avoid this, the Full Administrator can create a whitelist for CURL() access and add the Admin endpoint of Sync Gateway to the disallowed_urls list.

For more details on creating the whitelist and the structure of the whitelist file, see create a whitelist for CURL access.

The following security measures help control risks when using the CURL() function:

  • Enable the CURL() function only for the Full Administrator role.
  • For all other users, the RBAC role QUERY_EXTERNAL_ACCESS is required to run the CURL() function. Only the Full Administrator can assign the QUERY_EXTERNAL_ACCESS role to other users.
  • A whitelist of URLs and REST points must be configured on the query nodes before being able to use the CURL() function. See Creating a Whitelist for CURL() Access for details.
  • Each query node in the cluster must define its whitelist file.
  • The CURL() function internally supports a specific set of SSL ciphers (MEDIUM or HIGH). This is dependent on the COUCHBASE_SSL_CIPHER_LIST.
  • The CURL() function runs on the query node within a cluster. In order to identify a request from N1QL’s CURL() function, a custom header is always set for all requests. The custom header format is: "X-N1QL-User-Agent: couchbase/n1ql/<n1ql-version>". External clients that wish to disallow N1QL CURL() from accessing their REST API endpoints can explicitly check for the above header and block it. The general external endpoints will ignore this header value.
  • A user-agent is always set by default. This can be reset using the -user-agent option. The value set by default is couchbase/n1ql/<n1ql-version>.
    Note: The current <n1ql-version> is "1.7.0-N1QL".

Creating a Whitelist for CURL() Access

A whitelist allows a Full Administrator to list out the permitted REST endpoints and URLs for the CURL() function. To enable access based on the whitelist, a Full Administrator must create the file containing the whitelist, curl_whitelist.json with the all_access field set to false, and save it in the same directory as that of the N1QL certificates(.../Couchbase/var/lib/couchbase/n1qlcerts). Note that this file is available to only someone with access to localhost, such as the Full Administrator, and is not visible to the user.

Each query node in the system needs to define a curl_whitelist.json file.

The CURL() function can access URLs that satisfy a prefix match, which means only URLs on the list (or prefixes) specified in the curl_whitelist.json file.

The whitelist file is specified as a JSON object with the fields described in the following table.
Table 3. Structure of Whitelist for CURL()
Field Type Description Default Value
all_access boolean This field defines whether the user has access to all URLs or only URLs specified in the allowed_urls array.

The curl_whitelist.json file must contain the all_access field set to false to enable whitelisting and restrict access.

Setting this field to true enables access to all endpoints.

false
allowed_urls array List of prefixes for URLs that we wish to grant access. This needs to be a prefix match. For example, if we wish to allow access to all Google APIs then add the URL https://maps.googleapis.com/ to the array. Any URL in a CURL() function that contains this value as a prefix will be allowed.

To allow complete access to localhost, use http://localhost.

Note that the URL must be complete and must include the port/protocol and all other components o the URL.

empty
disallowed_urls array List of prefixes for URLs that will be restricted for all roles. Note that the URL must be complete and must include the port/protocol and all other components o the URL.

If both allowed_urls and disallowed_urls fields are populated, the disallowed_urls field takes precedence over allowed_urls.

empty
The following tables lists the curl_whitelist.json JSON object for different scenarios and CURL whitelist is enabled and
curl_whitelist.json Example
CURL is disabled when the curl_whitelist.json does not exist. CURL is disabled.
To allow access to all endpoints, the all_access field is set to true.
{
  "all_access":true,
  "allowed_urls":[],
  "disallowed_urls":[]
}
{
  "all_access":true,
  "allowed_urls":["https://maps.googleapis.com/maps/api/geocode/json"]
}
All CURL queries can run.
To restrict access to values specified in allowed_urls, the all_access field is set to false and allowed_urls field contains the URLs that can be accessed by CURL.
{
  "all_access":false,
  "allowed_urls":["https://maps.googleapis.com/maps/api/geocode/json"]
}
If the allowed_urls field is changed to contain https://maps.googleapis.com, all Google API queries and endpoints are accessible by CURL.
{
  "all_access":false,
  "allowed_urls":["https://maps.googleapis.com"]
}
Only queries with the specified URL can be run. All other queried return an error with the message "Cannot access <url>. Access denied."
SELECT 
CURL("https://maps.googleapis.com/maps/api/geocode/json", 
  {
    "data":["address=santa+cruz","components=country:ES"],
    "get":true
  }
)
If both allowed_urls and disallowed_urls fields are populated, disallowed_urls takes precedence over allowed_urls.
{
  "All_access":false,
  "allowed_urls":["https://maps.googleapis.com/maps/api/geocode/json"],
  "disallowed_urls":["https://maps.googleapis.com"]
}
Since disallowed_urls takes precedence over allowed_urls, it means all Google API queries and endpoints are inaccessible.
Table 4. CURL Whitelisting Errors
Error Condition Error Message
curl_whitelist.json does not exist. n1qlcerts directory does not exist under .../var/lib/couchbase/

CURL() functionality is disabled. File curl_whitelist.json does not exist under ../Couchbase/var/lib/couchbase/n1qlcerts/ directory.
curl_whitelist.json is empty, or the fields are empty, or all_access field does not exist. Access denied to CURL(). File curl_whitelist.json is empty.
curl_whitelist.json is an invalid JSON object. Access denied to CURL(). File curl_whitelist.json contains invalid JSON. Contents have to be a JSON object.

Design Considerations

Here are some of the design considerations to keep in mind when using CURL:

  • The URL needs to point to a JSON endpoint. The redirection of URL is not allowed.
  • Only HTTP and HTTPS protocols are supported. This means that files on the local file system cannot be accessed.
  • The amount of memory used for the CURL result is controlled using the result-cap option. The default is 20MB.
  • Any values passed to the arguments of CURL() must be static values. That means, they cannot include any references to names, aliases of documents, attributes in the documents, or any N1QL functions or expressions that need to be evaluated.

    Consider the following example:

    SELECT CURL(b.url, { “data” : “address=”||b.data }) 
    FROM bucket b;

    The above example is invalid, because the first argument b.url refers to the alias b and the attribute url in the document. In the second argument, the string concatenation operator (||) cannot be evaluated.

  • When using CURL whitelisting, the

Examples

The following examples are using CURL in the query projection list.

Example 1: The following N1QL query fetches details about the address “Half Moon Bay” using the Google maps API. The Geocoding API from Google Maps allows you to convert static addresses into coordinates. (For more information refer to https://developers.google.com/maps/documentation/geocoding/intro) The corresponding curl command is also provided below.

Curl Request:
curl https://maps.googleapis.com/maps/api/geocode/json?address=Half+Moon+Bay

Query:
SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json",
           {"data":"address=Half+Moon+Bay" , "request":"GET"} );
Results:
[
  {
    "$1": {
      "results": [
        {
          "address_components": [
            {
              "long_name": "Half Moon Bay",
              "short_name": "Half Moon Bay",
              "types": [
                "locality",
                "political"
              ]
            },
            {
              "long_name": "San Mateo County",
              "short_name": "San Mateo County",
              "types": [
                "administrative_area_level_2",
                "political"
              ]
            },
            {
              "long_name": "California",
              "short_name": "CA",
              "types": [
                "administrative_area_level_1",
                "political"
              ]
            },
            {
              "long_name": "United States",
              "short_name": "US",
              "types": [
                "country",
                "political"
              ]
            }
          ],
          "formatted_address": "Half Moon Bay, CA, USA",
          "geometry": {
            "bounds": {
              "northeast": {
                "lat": 37.5226389,
                "lng": -122.4165183
              },
              "southwest": {
                "lat": 37.4249286,
                "lng": -122.4778879
              }
            },
            "location": {
              "lat": 37.4635519,
              "lng": -122.4285862
            },
            "location_type": "APPROXIMATE",
            "viewport": {
              "northeast": {
                "lat": 37.5226389,
                "lng": -122.4165183
              },
              "southwest": {
                "lat": 37.4249286,
                "lng": -122.4774494
              }
            }
          },
          "place_id": "ChIJC8sZCqULj4ARVJvnNcic_V4",
          "types": [
            "locality",
            "political"
          ]
        }
      ],
      "status": "OK"
    }
  }
]

Example 2: This is similar to Example 1, but following N1QL query fetches details about Santa Cruz in Spain using Google geocoding API and extracts the ‘geometry’ field from the result. This query retrieves the address and geographic location bounds of the address, Santa Cruz, ES. We use the address and components parameters from the Geocoding API. The data option represents the HTTP POST data.

Curl Request:
curl https://maps.googleapis.com/maps/api/geocode/json?address=santa+cruz&components=country:ES

Query:
SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json", 
           {"data":["address=santa+cruz","components=country:ES"],"get":true});

Results:
[
  {
    "$1": {
      "results": [
        {
          "address_components": [
            {
              "long_name": "Santa Cruz de Tenerife",
              "short_name": "Santa Cruz de Tenerife",
              "types": [
                "locality",
                "political"
              ]
            },
            {
              "long_name": "Santa Cruz de Tenerife",
              "short_name": "TF",
              "types": [
                "administrative_area_level_2",
                "political"
              ]
            },
            {
              "long_name": "Canary Islands",
              "short_name": "CN",
              "types": [
                "administrative_area_level_1",
                "political"
              ]
            },
            {
              "long_name": "Spain",
              "short_name": "ES",
              "types": [
                "country",
                "political"
              ]
            }
          ],
          "formatted_address": "Santa Cruz de Tenerife, Spain",
          "geometry": {
            "bounds": {
              "northeast": {
                "lat": 28.487616,
                "lng": -16.2356646
              },
              "southwest": {
                "lat": 28.4280248,
                "lng": -16.3370045
              }
            },
            "location": {
              "lat": 28.4636296,
              "lng": -16.2518467
            },
            "location_type": "APPROXIMATE",
            "viewport": {
              "northeast": {
                "lat": 28.487616,
                "lng": -16.2356646
              },
              "southwest": {
                "lat": 28.4280248,
                "lng": -16.3370045
              }
            }
          },
          "place_id": "ChIJcUElzOzMQQwRLuV30nMUEUM",
          "types": [
            "locality",
            "political"
          ]
        }
      ],
      "status": "OK"
    }
  }
]

Example 3: This N1QL query shows how to JOIN two buckets on different Couchbase clusters. It is same as explained in the JOIN Clause example, but with the left and right side buckets for the JOIN are in two different Couchbase clusters.

  • The left side bucket route is the `travel-sample` route documents from cluster running on hostname. If you don’t have a second cluster running, you should substitute the hostname with 127.0.0.1 or the IP-address of the local cluster.
  • The right side bucket airline is the `travel-sample` airline documents in the local cluster.
Note that the results from the CURL() output are embedded in the results[] array under the bucket alias ‘t’ used in the remote query. So, we extract the result documents appropriately with the expression CURL(...).results[*].t and alias it to route as the left side bucket for the JOIN.
Query:
SELECT DISTINCT airline.name, airline.callsign, route.destinationairport, route.stops, route.airline
FROM CURL("http://hostname:8093/query/service", 
     {"data" : "statement=SELECT * FROM `travel-sample` t WHERE t.type = 'route'"
      }).results[*].t  route
JOIN `travel-sample` airline 
ON KEYS route.airlineid 
WHERE airline.type = "airline" AND route.sourceairport = "SFO";

Results:

[
  {
    "airline": "B6",
    "callsign": "JETBLUE",
    "destinationairport": "AUS",
    "name": "JetBlue Airways",
    "stops": 0
  },
  {
    "airline": "B6",
    "callsign": "JETBLUE",
    "destinationairport": "BOS",
    "name": "JetBlue Airways",
    "stops": 0
  },
  ….
]

The credentials are required for RBAC when CURL() is accessing Couchbase Server 5.0 or later version.

SELECT DISTINCT airline.name, airline.callsign, 
   route.destinationairport, route.stops, route.airline
FROM CURL("http://localhost:8093/query/service", 
          {"data" : "statement=SELECT * FROM `travel-sample` t
                     WHERE t.type = 'route'",
  "request" : "POST", "user":"Administrator:password"}).results[*].t  route
JOIN `travel-sample` airline 
ON KEYS route.airlineid 
WHERE airline.type = "airline" AND route.sourceairport = “SFO” 
LIMIT 4;
[
  {
    "airline": "B6",
    "callsign": "JETBLUE",
    "destinationairport": "AUS",
    "name": "JetBlue Airways",
    "stops": 0
  }, … 
]

Example 4: The following example shows how to use fuzzy search from the FTS service in a N1QL query. Assuming the FTS index fts_travel is created on the bucket `travel-sample`, running the following N1QL query finds all documents that have “sanfrancisco” anywhere in the document, using the full-text searching capabilities of the Couchbase FTS service.

Query:
SELECT result.hits[*].id
FROM CURL("http://Administrator:password@127.0.0.1:8094/api/index/fts_index/query",
     {"header":"Content-Type: application/json",
      "request" : "POST",  
      "data":'{"explain":false,"fields": ["*"],"highlight": {},
              "query": {"query": "san fran isco"}}' }) result;

Results:

[
  {
    "id": [
      "hotel_25509",
      "hotel_25508",
      "hotel_26139",
      "hotel_25587",
      "hotel_25503",
      "hotel_25667",
      "hotel_25502",
      "hotel_25597",
      "hotel_26493",
      "hotel_25670"
    ]
  }
]

Example 5: The following example uses the CURL() function with a WHERE clause. It uses the Yahoo finance API to find the day’s low value (i.e DaysLow) of HDP stock and finds all the documents in the default bucket that have ‘min_threshold’ attribute value greater than the DaysLow stock value.

Insert the following documents into the default bucket, representing customers and their minimum thresholds, and then run the SELECT query:

INSERT INTO `travel-sample` (KEY, VALUE)
  VALUES
     ("k1", 
         {"custID" : 12345, "min_threshold" : 4}), 
     ("k2", 
         {"custID" : 44444, "min_threshold" : 12});

Query:
SELECT min_threshold, 
       meta().id, 
       to_number(hdp_low) hdp_low 
FROM `travel-sample`
USE KEYS ["k1", "k2"]
LET hdp_low = curl("https://query.yahooapis.com/v1/public/yql", {"data":"q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22HDP%22)&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback="}).query.results.quote.DaysLow
WHERE to_number(hdp_low) < min_threshold;

Results:
[
  {
    "hdp_low": 9.48,
    "id": "k2",
    "min_threshold": 12
  }
]
Example 6: The following example uses CURL() whitelisting to allow access to all Google Maps API and endpoints except https://maps.googleapis.com/maps/api/geocode/json.
curl_whitelist.json
{
  "All_access":false,
  "allowed_urls":["https://maps.googleapis.com"],
  "disallowed_urls":["https://maps.googleapis.com/maps/api/geocode/json"]
}
The following query will return an error: " Cannot access <url>. Access denied".
SELECT CURL("https://maps.googleapis.com/maps/api/geocode/json", 
  {  "data":["address=santa+cruz","components=country:ES"],
     "get":true
  }
)