I am using Spring Batch with JDBC and a PostgreSQL database. All the job and step execution data gets saved in the Spring Batch-created tables in the PostgreSQL DB.
I am using it to save some step context data, which gets saved in the BATCH_STEP_EXECUTION_CONTEXT table in the SERIALIZED_CONTEXT column.
The data I am saving has some MBCS (multi-byte character set) characters.
However, I see that when writing data to the table and reading it back, the ISO-8859-1 charset is used.
Hence my MBCS characters, though serialized by the XStream default serializer, get stored as garbage.
Is there any way to work around this, so I can save and retrieve the data as MBCS?
Please find the code snippet from JdbcExecutionContextDao:
private String serializeContext(ExecutionContext ctx) {
    Map<String, Object> m = new HashMap<String, Object>();
    for (Entry<String, Object> me : ctx.entrySet()) {
        m.put(me.getKey(), me.getValue());
    }

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    String results = "";

    try {
        serializer.serialize(m, out);
        // The charset is hard-coded to ISO-8859-1 when turning the serialized bytes into a String
        results = new String(out.toByteArray(), "ISO-8859-1");
    }
    catch (IOException ioe) {
        throw new IllegalArgumentException("Could not serialize the execution context", ioe);
    }

    return results;
}

private class ExecutionContextRowMapper implements RowMapper<ExecutionContext> {

    @Override
    public ExecutionContext mapRow(ResultSet rs, int i) throws SQLException {
        ExecutionContext executionContext = new ExecutionContext();
        String serializedContext = rs.getString("SERIALIZED_CONTEXT");
        if (serializedContext == null) {
            serializedContext = rs.getString("SHORT_CONTEXT");
        }

        Map<String, Object> map;
        try {
            // The same hard-coded charset is used when reading the context back
            ByteArrayInputStream in = new ByteArrayInputStream(serializedContext.getBytes("ISO-8859-1"));
            map = serializer.deserialize(in);
        }
        catch (IOException ioe) {
            throw new IllegalArgumentException("Unable to deserialize the execution context", ioe);
        }

        for (Map.Entry<String, Object> entry : map.entrySet()) {
            executionContext.put(entry.getKey(), entry.getValue());
        }

        return executionContext;
    }
}
I expect to store and retrieve MBCS data.
The XStreamExecutionContextStringSerializer is deprecated; we recommend using the Jackson2ExecutionContextStringSerializer instead.
That said, if you want to use multi-byte characters in the execution context's data, you need to increase the size of the columns in the meta-data schema. This is explained in the "International and Multi-byte Characters" section of the reference documentation.
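If you build the JobRepository yourself, here is a minimal sketch of swapping in the Jackson serializer; it assumes a JobRepositoryFactoryBean-based setup, and the dataSource/transactionManager wiring should match your own configuration:

@Bean
public JobRepository jobRepository(DataSource dataSource,
        PlatformTransactionManager transactionManager) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    // Replace the deprecated XStream-based serializer with the Jackson one
    factory.setSerializer(new Jackson2ExecutionContextStringSerializer());
    factory.afterPropertiesSet();
    return factory.getObject();
}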
Related
The question was originally asked here: https://vaadin.com/forum/thread/18095407/how-to-create-a-grid-without-binder
However, the Vaadin forum thread was closed, so I want to continue it here.
On Vaadin 14, is there any recommendation on the best way to implement a grid with a dynamically varying number of columns? Using column indexes (1, 2, 3, ...) is not a good choice for me. Let's say I have a simple JSON file (only one level: key-value) to map to a grid, and this JSON has an unknown list of properties.
Which approach is better in terms of performance?
[Option 1]
class Data {
    private Map<String, Object> values = new HashMap<>();

    public void set(String key, Object val) {
        values.put(key, val);
    }

    public Object get(String key) {
        return values.get(key);
    }
}
Grid<Data> myGrid = new Grid<>();
[Option 2]
public class GridDynamicValueProvider implements ValueProvider<GridDynamicRow, Object> {
    private int columnIndex;

    public GridDynamicValueProvider(int columnIndex) {
        this.columnIndex = columnIndex;
    }

    @Override
    public Object apply(GridDynamicRow dynamicRow) {
        return dynamicRow.getValue(columnIndex);
    }
}

public class GridDynamicRow {
    private List<Object> values = new ArrayList<>();

    public void addValue(String value) {
        values.add(value);
    }

    public Object getValue(int columnIndex) {
        return values.get(columnIndex);
    }
}
Vaadin's functional interfaces, such as ValueProvider, accept both method references and lambdas, so it is possible to use non-POJO data types with Grid and Binder in Vaadin, although that is a bit unconventional. The key ingredient is:
Grid<Map<String, Integer>> grid = new Grid<>();
...
grid.addColumn(map -> map.get("column")).setHeader("Column");
You can also wrap the Map in a custom class if you need to protect the internals.
class Data {
    private Map<String, Object> values = new HashMap<>();

    public void set(String key, Object val) {
        values.put(key, val);
    }

    public Object get(String key) {
        return values.get(key);
    }
}
Grid<Data> myGrid = new Grid<>();
As for performance: essentially you're comparing a List, where you fetch by index, against a HashMap, where you fetch by key. Here's a related question: ArrayList .get faster than HashMap .get?
You can also use an ArrayList as the Grid's type if you can index the columns with a number.
Both approaches allow generating Grids with a dynamically varying number of columns, for example if you read the data directly from a file or from raw backend queries, as in the sketch below.
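As an illustration of the Map-based approach with dynamic columns, here is a minimal sketch; loadRows() is a hypothetical helper, and it assumes every row shares the same keys:

// Hypothetical helper: rows parsed from the one-level JSON file
List<Map<String, Object>> rows = loadRows();

Grid<Map<String, Object>> grid = new Grid<>();
// Create one column per key, using the key as the header
rows.get(0).keySet().forEach(key ->
        grid.addColumn(row -> row.get(key)).setHeader(key));
grid.setItems(rows);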
I am building a pipeline that reads Avro generic records. To pass a GenericRecord between stages I need to register an AvroCoder. The documentation says that if I use a generic record, the schema argument can be arbitrary: https://beam.apache.org/releases/javadoc/2.2.0/org/apache/beam/sdk/coders/AvroCoder.html#of-java.lang.Class-org.apache.avro.Schema-
However, when I pass an empty schema to the method AvroCoder.of(Class, Schema), it throws an exception at run time. Is there a way to create an AvroCoder for GenericRecord that does not require a schema? In my case, each GenericRecord has an embedded schema.
The exception and stack trace:
Exception in thread "main" java.lang.NullPointerException
at org.apache.beam.sdk.coders.AvroCoder$AvroDeterminismChecker.checkIndexedRecord(AvroCoder.java:562)
at org.apache.beam.sdk.coders.AvroCoder$AvroDeterminismChecker.recurse(AvroCoder.java:430)
at org.apache.beam.sdk.coders.AvroCoder$AvroDeterminismChecker.check(AvroCoder.java:409)
at org.apache.beam.sdk.coders.AvroCoder.<init>(AvroCoder.java:260)
at org.apache.beam.sdk.coders.AvroCoder.of(AvroCoder.java:141)
I had a similar case and solved it with a custom coder. The simplest (but sub-optimal) solution would be to encode the schema along with each record. If your schemas are not too volatile, you can benefit from caching:
public class GenericRecordCoder extends AtomicCoder<GenericRecord> {

    public static GenericRecordCoder of() {
        return new GenericRecordCoder();
    }

    // Cache one AvroCoder per schema so it is built only once
    private static final ConcurrentHashMap<String, AvroCoder<GenericRecord>> avroCoders = new ConcurrentHashMap<>();

    @Override
    public void encode(GenericRecord value, OutputStream outStream) throws IOException {
        String schemaString = value.getSchema().toString();
        String schemaHash = getHash(schemaString);
        StringUtf8Coder.of().encode(schemaString, outStream);
        StringUtf8Coder.of().encode(schemaHash, outStream);
        AvroCoder<GenericRecord> coder = avroCoders.computeIfAbsent(schemaHash,
                s -> AvroCoder.of(value.getSchema()));
        coder.encode(value, outStream);
    }

    @Override
    public GenericRecord decode(InputStream inStream) throws IOException {
        String schemaString = StringUtf8Coder.of().decode(inStream);
        String schemaHash = StringUtf8Coder.of().decode(inStream);
        AvroCoder<GenericRecord> coder = avroCoders.computeIfAbsent(schemaHash,
                s -> AvroCoder.of(new Schema.Parser().parse(schemaString)));
        return coder.decode(inStream);
    }

    // Assumed helper (not shown in the original answer): any stable digest of the
    // schema string works as the cache key; a real digest such as SHA-256 is safer.
    private static String getHash(String schemaString) {
        return Integer.toHexString(schemaString.hashCode());
    }
}
While this solves the task, in practice I did it slightly differently, using an external schema registry (you can build one on top of Datastore, for example). In this case you don't need to serialize/deserialize the schema. The code looks like:
public class GenericRecordCoder extends AtomicCoder<GenericRecord> {

    public static GenericRecordCoder of() {
        return new GenericRecordCoder();
    }

    private static final ConcurrentHashMap<String, AvroCoder<GenericRecord>> avroCoders = new ConcurrentHashMap<>();

    @Override
    public void encode(GenericRecord value, OutputStream outStream) throws IOException {
        // SchemaRegistry is my own registry abstraction (e.g. built on top of Datastore)
        SchemaRegistry.registerIfAbsent(value.getSchema());
        String schemaName = value.getSchema().getFullName();
        StringUtf8Coder.of().encode(schemaName, outStream);
        AvroCoder<GenericRecord> coder = avroCoders.computeIfAbsent(schemaName,
                s -> AvroCoder.of(value.getSchema()));
        coder.encode(value, outStream);
    }

    @Override
    public GenericRecord decode(InputStream inStream) throws IOException {
        String schemaName = StringUtf8Coder.of().decode(inStream);
        AvroCoder<GenericRecord> coder = avroCoders.computeIfAbsent(schemaName,
                s -> AvroCoder.of(SchemaRegistry.get(schemaName)));
        return coder.decode(inStream);
    }
}
The usage is pretty straightforward:
PCollection<GenericRecord> inputCollection = pipeline
        .apply(AvroIO
                .parseGenericRecords(t -> t)
                .withCoder(GenericRecordCoder.of())
                .from(...));
After reviewing the code for AvroCoder, I do not think the documentation is correct there. Your AvroCoder instance will need a way to figure out the schema for your Avro records, and likely the only way to do that is by providing one.
So, I'd recommend calling AvroCoder.of(GenericRecord.class, schema), where schema is the correct schema for the GenericRecord objects in your PCollection.
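For example, a minimal sketch, where schemaJson stands in for wherever your schema text comes from and records is your PCollection<GenericRecord>:

// schemaJson is an assumed placeholder for the schema source
Schema schema = new Schema.Parser().parse(schemaJson);
records.setCoder(AvroCoder.of(GenericRecord.class, schema));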
I want to fetch data from a web service, but it is showing this error:
D/exception: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was BEGIN_ARRAY at line 1 column 3 path $[0]
There are two arrays within an array. I tried parsing it but was unable to get it. The data shows up in stringResponse, but I don't know why parsing it through Gson is not working.
Model Class (omitted here; see the Dropbox note at the end)
Main Class:
private void getStringResponse() {
    String url = "http://www.mocky.io/v2/5bd7f4683100003508474b3d";
    StringRequest request = new StringRequest(
            Request.Method.GET,
            url,
            new com.android.volley.Response.Listener<String>() {
                @Override
                public void onResponse(String response) {
                    String stringResponse = response.toString();
                    Gson gson = new Gson();
                    try {
                        ModelOne[] data = gson.fromJson(stringResponse, ModelOne[].class);
                        List<ModelOne.First> first = new ArrayList<>();
                        first = data[0].getZero();
                        Log.d("data", first.get(0).getFullName());
                    } catch (Exception e) {
                        Log.d("exception", e.toString());
                    }
                }
            }, new com.android.volley.Response.ErrorListener() {
                @Override
                public void onErrorResponse(VolleyError error) {
                    Log.d("error", error.toString());
                }
            });

    RequestQueue requestQueue = Volley.newRequestQueue(this);
    requestQueue.add(request);
}
I just need some minor guidance on how to handle data when you have arrays within an array, and how to parse it.
(The site said my question had too much code, which is why I uploaded the model class to a Dropbox link.)
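For what it's worth, a minimal sketch of one common way to handle arrays within an array with Gson; ModelOne stands in for the model class from the Dropbox link, and the exact nesting has to match the actual payload:

// For a payload shaped like [[{...}, {...}], [{...}]], a nested-list TypeToken
// lets Gson deserialize the structure without a wrapper class.
Type listType = new TypeToken<List<List<ModelOne>>>() {}.getType();
List<List<ModelOne>> data = new Gson().fromJson(stringResponse, listType);
ModelOne firstItem = data.get(0).get(0);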
I am creating a batch data streamer in Apache Ignite and need to control what happens after the data is received.
My batch has this structure:
public class Batch implements Binarylizable, Serializable {
    private String eventKey;
    private byte[] bytes;
    // etc.
}
Then I try to stream my data:
try (IgniteDataStreamer<Integer, Batch> streamer = serviceGrid.getIgnite().dataStreamer(cacheName);
     StreamBatcher batcher = StreamBatcherFactory.create(event)) {

    streamer.receiver(StreamTransformer.from(new BatchDataProcessor(event)));
    streamer.autoFlushFrequency(1000);
    streamer.allowOverwrite(true);

    statusService.updateStatus(event.getKey(), StatusType.EXECUTING);

    int counter = 0;
    Batch batch = null;
    IgniteFuture<?> future = null;
    while ((batch = batcher.batch()) != null) {
        future = streamer.addData(counter++, batch);
    }

    Object getted = future.get();
}
Just for the test, let's take only the last future and try to analyze this object. In the code above I'm using a BatchDataProcessor, which looks like this:
public class BatchDataProcessor implements CacheEntryProcessor<Integer, Batch, Object> {

    private final Event event;
    private final String eventKey;

    public BatchDataProcessor(Event event) {
        this.event = event;
        this.eventKey = event.getKey();
    }

    @Override
    public Object process(MutableEntry<Integer, Batch> mutableEntry, Object... objects) throws EntryProcessorException {
        Node node = NodeIgniter.node(Ignition.localIgnite().cluster().localNode().id());
        ServiceGridContainer container = (ServiceGridContainer) node.getEnvironmentContainer().getContainerObject(ServiceGridContainer.class);
        ProcessMarshaller marshaller = (ProcessMarshaller) container.getService(ProcessMarshaller.class);
        LocalProcess localProcess = marshaller.intoProccessing(event.getLambdaExecutionKey());
        try {
            localProcess.addBatch(mutableEntry);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            return "111";
        }
    }
}
So after localProcess.addBatch(mutableEntry) I want to send back information about the status of this particular batch. I assumed I should do this via the IgniteFuture object, but I can't find any information on how to control the future object returned by the addData function.
Can anybody help me understand where I can control the future returned by the addData function, or suggest some other way to implement a callback for a streamed batch?
When you use StreamTransformer.from(), you forfeit the result of your BatchDataProcessor, because:
for (Map.Entry<K, V> entry : entries)
    cache.invoke(entry.getKey(), this, entry.getValue());
    // ^ the result of cache.invoke() is discarded here
The DataStreamer is for one-directional streaming of data. It is not supposed to return values, as far as I know.
If you depend on the result of cache.invoke(), I recommend calling it directly instead of relying on the DataStreamer; see the sketch below.
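A minimal sketch of that direct call; cacheName, counter, batch and BatchDataProcessor are taken from the question, and whether a plain put followed by invoke fits your semantics is for you to judge:

IgniteCache<Integer, Batch> cache = serviceGrid.getIgnite().cache(cacheName);
cache.put(counter, batch);
// Unlike streamer.addData(), invoke() hands the processor's result back to the caller
Object status = cache.invoke(counter, new BatchDataProcessor(event));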
By the way, be careful with future.get(). You should call dataStreamer.flush() first, or the DataStreamer's futures will wait indefinitely.
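A minimal sketch of that ordering, using the streamer and future variables from the question's snippet:

streamer.flush();              // push any buffered entries out first
Object result = future.get();  // now the future can actually complete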
I have a PCollection<String>, say "X", that I need to dump into a BigQuery table.
The table destination and its schema are in a PCollection<TableRow>, say "Y".
How can I accomplish this in the simplest manner?
I tried extracting the table and schema from "Y" and saving them in static global variables (tableName and schema, respectively). But, oddly, BigQueryIO.writeTableRows() always sees the tableName variable as null, although it does get the schema. I tried logging the values of those variables and I can see the values are there for both.
Here is my pipeline code:
static String tableName;
static TableSchema schema;

PCollection<String> read = p.apply("Read from input file",
        TextIO.read().from(options.getInputFile()));

PCollection<TableRow> tableRows = p.apply(
        BigQueryIO.read().fromQuery(NestedValueProvider.of(
                options.getfilename(),
                new SerializableFunction<String, String>() {
                    @Override
                    public String apply(String filename) {
                        return "SELECT table,schema FROM `BigqueryTest.configuration` WHERE file='" + filename + "'";
                    }
                })).usingStandardSql().withoutValidation());

final PCollectionView<List<String>> dataView = read.apply(View.asList());

tableRows.apply("Convert data read from file to TableRow",
        ParDo.of(new DoFn<TableRow, TableRow>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                tableName = c.element().get("table").toString();
                String[] schemas = c.element().get("schema").toString().split(",");
                List<TableFieldSchema> fields = new ArrayList<>();
                for (int i = 0; i < schemas.length; i++) {
                    fields.add(new TableFieldSchema()
                            .setName(schemas[i].split(":")[0]).setType(schemas[i].split(":")[1]));
                }
                schema = new TableSchema().setFields(fields);
                // My code to convert data to TableRow format.
            }
        }).withSideInputs(dataView));

tableRows.apply("write to BigQuery",
        BigQueryIO.writeTableRows()
                .withSchema(schema)
                .to("ProjectID:DatasetID." + tableName)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
Everything works fine; only the BigQueryIO.write operation fails, with the error that the TableId is null.
I also tried using a SerializableFunction and returning the value from there, but I still get null.
Here is the code I tried for it:
tableRows.apply("write to BigQuery",
        BigQueryIO.writeTableRows()
                .withSchema(schema)
                .to(new GetTable(tableName))
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));

public static class GetTable implements SerializableFunction<String, String> {
    String table;

    public GetTable(String tableName) {
        this.table = tableName;
    }

    @Override
    public String apply(String arg0) {
        return "ProjectId:DatasetId." + table;
    }
}
I also tried using DynamicDestinations, but I get an error saying the schema is not provided. Honestly, I'm new to the concept of DynamicDestinations and I'm not sure I'm doing it correctly.
Here is the code I tried for it:
tableRows2.apply(BigQueryIO.writeTableRows()
        .to(new DynamicDestinations<TableRow, TableRow>() {
            private static final long serialVersionUID = 1L;

            @Override
            public TableDestination getTable(TableRow dest) {
                List<TableRow> list = sideInput(bqDataView); // bqDataView contains table and schema
                String table = list.get(0).get("table").toString();
                String tableSpec = "ProjectId:DatasetId." + table;
                String tableDescription = "";
                return new TableDestination(tableSpec, tableDescription);
            }

            public String getSideInputs(PCollectionView<List<TableRow>> bqDataView) {
                return null;
            }

            @Override
            public TableSchema getSchema(TableRow destination) {
                return schema; // schema is read from the global variable
            }

            @Override
            public TableRow getDestination(ValueInSingleWindow<TableRow> element) {
                return null;
            }
        }.getSideInputs(bqDataView)));
Please let me know what I'm doing wrong and which path I should take.
Thank you.
Part of the reason you're having trouble is the two stages of pipeline execution. First, the pipeline is constructed on your machine; this is when all of the applications of PTransforms occur. In your first example, this is when the following lines are executed:
BigQueryIO.writeTableRows()
        .withSchema(schema)
        .to("ProjectID:DatasetID." + tableName)
The code within a ParDo, however, runs when your pipeline executes, and it does so on many machines. So the following code runs much later than pipeline construction:
@ProcessElement
public void processElement(ProcessContext c) {
    tableName = c.element().get("table").toString();
    ...
    schema = new TableSchema().setFields(fields);
    ...
}
This means that neither the tableName nor the schema fields will be set when the BigQueryIO sink is created.
Your idea to use DynamicDestinations is correct, but you need to move the code that actually generates the schema and the destination into that class, rather than relying on global variables that aren't available on all of the machines.
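As a rough sketch of that shape (not a drop-in fix: it reuses the question's bqDataView side input and its "table"/"schema" string conventions, and registers the side input via getSideInputs() so that sideInput() works on the workers):

tableRows.apply("write to BigQuery",
        BigQueryIO.writeTableRows()
                .to(new DynamicDestinations<TableRow, String>() {
                    @Override
                    public List<PCollectionView<?>> getSideInputs() {
                        // Register the side input so sideInput() is available at execution time
                        return Collections.<PCollectionView<?>>singletonList(bqDataView);
                    }

                    @Override
                    public String getDestination(ValueInSingleWindow<TableRow> element) {
                        // Read the table name from the side input, not a static field
                        return sideInput(bqDataView).get(0).get("table").toString();
                    }

                    @Override
                    public TableDestination getTable(String tableName) {
                        return new TableDestination("ProjectId:DatasetId." + tableName, "");
                    }

                    @Override
                    public TableSchema getSchema(String tableName) {
                        // Rebuild the schema from the side input on the worker
                        String[] cols = sideInput(bqDataView).get(0).get("schema").toString().split(",");
                        List<TableFieldSchema> fields = new ArrayList<>();
                        for (String col : cols) {
                            fields.add(new TableFieldSchema()
                                    .setName(col.split(":")[0]).setType(col.split(":")[1]));
                        }
                        return new TableSchema().setFields(fields);
                    }
                })
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));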