SQL API

  • Simply combine all your datasets and express your entire logic as a single SQL statement via the SQL API.
  • Join data from various systems (Kafka, Elasticsearch, HBase, Hive, Teradata, with more to come) in a single SQL statement, as sketched below.
  • Execute your SQL in batch mode or continuous streaming mode.
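
For illustration, a minimal batch sketch of such a cross-system join. The dataset pcatalog.flights_kafka appears later in this section; pcatalog.airports_hive and the column names are hypothetical, assuming a shared airport-code key:

import org.apache.spark.sql.DataFrame

// spark-shell provides sparkSession; executeBatch is covered in detail below
val gsql = com.paypal.gimel.sql.GimelQueryProcessor.executeBatch(_: String, sparkSession)

// One SQL statement joining a Kafka topic with a Hive lookup table
val enriched: DataFrame = gsql(
  """SELECT f.flight_id, a.airport_name
    |FROM pcatalog.flights_kafka f
    |JOIN pcatalog.airports_hive a
    |ON f.origin_code = a.airport_code""".stripMargin)
enriched.show()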

Controlling SQL output and behaviour

Property for SQL results                    Notes
gimel.query.results.show.rows.only          Set to "true" to display only the result rows, suppressing Gimel's informational messages. (Default: false)
gimel.query.results.show.rows.threshold     Maximum number of rows to display in interactive mode. (Default: 1000)

SQL on Scala (spark-shell | spark-submit)

Execute Batch SQL - Spark Shell / Program

// Display only the result rows (equivalent to the SQL "set" below)
sparkSession.conf.set("gimel.query.results.show.rows.only", "true")

// Streaming throttle: micro-batch window size and read parallelism
sparkSession.sql("set gimel.kafka.throttle.streaming.window.seconds=20")
sparkSession.sql("set gimel.kafka.throttle.streaming.parallelism.factor=20")
// Kafka checkpointing: save offsets after each run, do not clear them
sparkSession.sql("set gimel.kafka.kafka.reader.checkpoint.save=true")
sparkSession.sql("set gimel.kafka.kafka.reader.checkpoint.clear=false")
// Batch throttle: cap rows fetched on the first (un-checkpointed) run and per partition
sparkSession.sql("set gimel.kafka.throttle.batch.fetchRowsOnFirstRun=100")
sparkSession.sql("set gimel.kafka.throttle.batch.maxRecordsPerPartition=50")
sparkSession.sql("set gimel.logging.level=INFO")
sparkSession.sql("set gimel.query.results.show.rows.only=true")

// Bind executeBatch to the session, yielding a String => DataFrame function
val gsql = com.paypal.gimel.sql.GimelQueryProcessor.executeBatch(_: String, sparkSession)
val df = gsql("SELECT count(*) FROM pcatalog.flights_kafka")
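
Assuming executeBatch returns a standard Spark DataFrame (as the assignment above suggests), the result can be inspected or transformed with ordinary Spark operations:

// df is an ordinary Spark DataFrame
df.printSchema()
df.show()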

Execute Stream SQL - Spark Shell / Program

// Bind executeStream to the session for continuous execution
val gsqlStream = com.paypal.gimel.sql.GimelQueryProcessor.executeStream(_: String, sparkSession)
gsqlStream("INSERT INTO pcatalog.flights_elastic SELECT count(*) FROM pcatalog.flights_kafka")