Use JDBC to Synchronize Data
The user can load data with INSERT statements through JDBC protocol.
The use of INSERT statements is similar to that of INSERT statements in MySQL and other databases. INSERT statements support the following syntaxes:
* INSERT INTO table SELECT ...
* INSERT INTO table VALUES(...)
Here we only introduce the second way. Refer to INSERT command file for more detailed instructions of INSERT command.
Write-once
Write-once means that the user directly executes an INSERT command. Examples are as follows:
INSERT INTO example_tbl (col1, col2, col3) VALUES (1000, "baidu", 3.25);
An INSERT command, for Palo, is a complete load transaction.
Therefore, whether a piece of data or multiple line data, we do not recommend loading data in the production environment with this method. The operation of high frequency word INSERT will lead to a large number of small files in the storage layer, which will seriously affect the system performance.
This method is only used for simple offline tests or low frequency operation.
Or the user can use the following methods to batch insert operations:
INSERT INTO example_tbl VALUES
(1000, "baidu1", 3.25)
(2000, "baidu2", 4.25)
(3000, "baidu3", 5.25);
We suggest that the number of inserts in a batch should be as large as possible, such as thousands or even 10,000 at a time. Or the user can use PreparedStatement to batch insert in the following way.
JDBC examples
Here is a simple example of JDBC batch INSERT codes:
package demo.palo;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
public class PaloJDBCDemo {
private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
private static final String DB_URL_PATTERN = "jdbc:mysql://%s:%d/%s?rewriteBatchedStatements=true";
private static final String HOST = "127.0.0.1"; // Leader Node host
private static final int PORT = 8030; // http port of Leader Node
private static final String DB = "example_db";
private static final String TBL = "example_tbl";
private static final String USER = "admin";
private static final String PASSWD = "my_pass";
private static final int INSERT_BATCH_SIZE = 10000;
public static void main(String[] args) {
insert();
}
private static void insert() {
// Don't ad semicolon ";" at the end
String query = "insert into " + TBL + " values(?, ?)";
// Set Label to be idempotent.
// String query = "insert into " + TBL + " WITH LABEL my_label values(?, ?)";
Connection conn = null;
PreparedStatement stmt = null;
String dbUrl = String.format(DB_URL_PATTERN, HOST, PORT, DB);
try {
Class.forName(JDBC_DRIVER);
conn = DriverManager.getConnection(dbUrl, USER, PASSWD);
stmt = conn.prepareStatement(query);
for (int i =0; i < INSERT_BATCH_SIZE; i++) {
stmt.setInt(1, i);
stmt.setInt(2, i * 100);
stmt.addBatch();
}
int[] res = stmt.executeBatch();
System.out.println(res);
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
if (stmt != null) {
stmt.close();
}
} catch (SQLException se2) {
se2.printStackTrace();
}
try {
if (conn != null) conn.close();
} catch (SQLException se) {
se.printStackTrace();
}
}
}
}
Please note:
-
The JDBC connection string needs to add the parameter
rewriteBatchedStatements=true
and use the method ofPreparedStatement
.Currently, Palo does not support PrepareStatemnt on the server side, so JDBC Driver will batch Prepare on the client side.
rewriteBatchedStatements=true
ensures Driver to perform batch precessing. Finally, the INSERT statements in the following form are sent to Palo:INSERT INTO example_tbl VALUES (1000, "baidu1", 3.25) (2000, "baidu2", 4.25) (3000, "baidu3", 5.25);
-
Batch size
Pay attention that if a batch is too large, it will occupy the memory resources of the client because batch processing is carried out on the client.
Palo will support PrepareStatemnt on the server in the future. Please wait and see.
-
Load atomicity
INSERT operation itself, like other load methods, supports atomicity. Every INSERT operation is a load transaction, ensuring the write-in of all data atomicity in an INSERT.
As mentioned earlier, we recommend that you load data in a "batch" way when loading data with INSERT, rather than single insert.
Also, we can set a Label for each INSERT operation. Through Label mechanism, we can ensure the idempotence and atomicity of the operation, and finally achieve the data without loss and duplication. Refer to INSERT file for specific use of Label in INSERT.