SQL API
- Combine all your datasets and express your entire logic as a single SQL statement via the SQL API.
- In a single SQL statement, join data from various systems (Kafka, Elasticsearch, HBase, Hive, Teradata, with more to come).
- Execute your SQL in batch mode or continuous streaming mode.
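For instance, once each dataset is registered in the catalog, a cross-system join reads like ordinary SQL. The snippet below is a minimal sketch: the dataset names (`pcatalog.flights_kafka`, `pcatalog.airports_hive`) and columns are illustrative, assuming such datasets exist in your catalog.

```scala
// Minimal sketch: join a Kafka-backed dataset with a Hive dimension table in one SQL.
// Dataset and column names are illustrative; substitute datasets from your own catalog.
val gsql = com.paypal.gimel.sql.GimelQueryProcessor.executeBatch(_: String, sparkSession)

val enriched = gsql(
  """
    |SELECT f.flight_id, a.airport_name, f.departure_ts
    |FROM pcatalog.flights_kafka f
    |JOIN pcatalog.airports_hive a
    |  ON f.origin_code = a.airport_code
  """.stripMargin)
```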
Controlling SQL output and behaviour
| Property for SQL results | Notes |
| ------------------------ | ----- |
| gimel.query.results.show.rows.only | Set to "true" to display only the result rows, suppressing informational log messages (default: false) |
| gimel.query.results.show.rows.threshold | Number of rows to display in interactive mode (default: 1000) |
SQL on Scala (spark-shell | spark-submit)
Execute Batch SQL - Spark Shell / Program
```scala
// Show only result rows, suppressing informational log messages
sparkSession.conf.set("gimel.query.results.show.rows.only", "true")

// Tune Kafka throttling, checkpointing, and logging before executing SQL
sparkSession.sql("set gimel.kafka.throttle.streaming.window.seconds=20")
sparkSession.sql("set gimel.kafka.throttle.streaming.parallelism.factor=20")
sparkSession.sql("set gimel.kafka.kafka.reader.checkpoint.save=true")
sparkSession.sql("set gimel.kafka.kafka.reader.checkpoint.clear=false")
sparkSession.sql("set gimel.kafka.throttle.batch.fetchRowsOnFirstRun=100")
sparkSession.sql("set gimel.kafka.throttle.batch.maxRecordsPerPartition=50")
sparkSession.sql("set gimel.logging.level=INFO")
sparkSession.sql("set gimel.query.results.show.rows.only=true")

// Bind executeBatch to the active SparkSession, then run SQL through it
val gsql = com.paypal.gimel.sql.GimelQueryProcessor.executeBatch(_: String, sparkSession)
val df = gsql("SELECT count(*) FROM pcatalog.flights_kafka")
```
Execute Stream SQL - Spark Shell / Program
```scala
// Bind executeStream to the active SparkSession, then run a continuous streaming insert
val gsqlStream = com.paypal.gimel.sql.GimelQueryProcessor.executeStream(_: String, sparkSession)
gsqlStream("INSERT INTO pcatalog.flights_elastic SELECT count(*) FROM pcatalog.flights_kafka")
```