Avro schema evolution testing and questions - avro

With the following Avro schemas defined and test code, I have a couple questions when considering Avro schema evolution and how the first version of the Avro data can be stored and later retrieved using the second version of the schema. In my example, Person.avsc represents the first version, and PersonWithMiddleName.avsc represents the second version, where we have added a middleName attribute.
Is there a way to store the Avro schema and the binary encoded data as a byte array in Java? We are wanting to store our Avro objects to DynamoDB, and we'd like to store the Avro data as a blob with the schema stored alongside it (just like it is when stored to a file)? As reference, look at my Test Output below (the binary contents didn't copy, so the line just reads The Person is now serialized to a byte array: JoeCool) and compare what gets stored when Person is serialized to a byte array vs. when it is written out during the test to the person.avro file. As you can see, it appears as though the schema is only written out with the file and not with the byte array.
Is the AvroTypeException I encounter during my test truly expected as I have indicated with my comments in the catch block of the test? In this case, I have serialized the Person object as JSON and tried to deserialize it as PersonWithMiddleName.
Java Test Code
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import org.apache.avro.AvroTypeException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class SchemaEvolutionTest {
Logger log = LoggerFactory.getLogger(this.getClass());
#Test
public void createAndReadPerson() {
// Create the Person using the Person schema
var person = new Person();
person.setFirstName("Joe");
person.setLastName("Cool");
log.info("Person has been created: {}", person);
SpecificDatumWriter<Person> personSpecificDatumWriter =
new SpecificDatumWriter<Person>(Person.class);
DataFileWriter<Person> dataFileWriter = new DataFileWriter<Person>(personSpecificDatumWriter);
try {
dataFileWriter.create(person.getSchema(), new File("person.avro"));
dataFileWriter.append(person);
dataFileWriter.close();
} catch (IOException e) {
Assertions.fail();
}
log.info("Person has been written to an Avro file");
// ******************************************************************************************************
// Next, read as Person from the Avro file using the Person schema
DatumReader<Person> personDatumReader =
new SpecificDatumReader<Person>(Person.getClassSchema());
var personAvroFile = new File("person.avro");
DataFileReader<Person> personDataFileReader = null;
try {
personDataFileReader = new DataFileReader<Person>(personAvroFile, personDatumReader);
} catch (IOException e1) {
Assertions.fail();
}
Person personReadFromFile = null;
while (personDataFileReader.hasNext()) {
// Reuse object by passing it to next(). This saves us from
// allocating and garbage collecting many objects for files with
// many items.
try {
personReadFromFile = personDataFileReader.next(person);
} catch (IOException e) {
Assertions.fail();
}
}
log.info("Person read from the file: {}", personReadFromFile.toString());
// ******************************************************************************************************
// Read the Person from the Person file as PersonWithMiddleName using only the
// PersonWithMiddleName schema
DatumReader<PersonWithMiddleName> personWithMiddleNameDatumReader =
new SpecificDatumReader<PersonWithMiddleName>(PersonWithMiddleName.getClassSchema());
DataFileReader<PersonWithMiddleName> personWithMiddleNameDataFileReader = null;
try {
personWithMiddleNameDataFileReader =
new DataFileReader<PersonWithMiddleName>(personAvroFile, personWithMiddleNameDatumReader);
} catch (IOException e1) {
Assertions.fail();
}
PersonWithMiddleName personWithMiddleName = null;
while (personWithMiddleNameDataFileReader.hasNext()) {
// Reuse object by passing it to next(). This saves us from
// allocating and garbage collecting many objects for files with
// many items.
try {
personWithMiddleName = personWithMiddleNameDataFileReader.next(personWithMiddleName);
} catch (IOException e) {
Assertions.fail();
}
}
log.info(
"Now a PersonWithMiddleName has been read from the file that was written as a Person: {}",
personWithMiddleName.toString());
// ******************************************************************************************************
// Serialize the Person to a byte array
byte[] personByteArray = new byte[0];
ByteArrayOutputStream personByteArrayOutputStream = new ByteArrayOutputStream();
Encoder encoder = null;
try {
encoder = EncoderFactory.get().binaryEncoder(personByteArrayOutputStream, null);
personSpecificDatumWriter.write(person, encoder);
encoder.flush();
personByteArray = personByteArrayOutputStream.toByteArray();
} catch (IOException e) {
log.error("Serialization error:" + e.getMessage());
}
log.info("The Person is now serialized to a byte array: {}", new String(personByteArray));
// ******************************************************************************************************
// Deserialize the Person byte array into a Person object
BinaryDecoder binaryDecoder = null;
Person decodedPerson = null;
try {
binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
decodedPerson = personDatumReader.read(null, binaryDecoder);
} catch (IOException e) {
log.error("Deserialization error:" + e.getMessage());
}
log.info("Decoded Person from byte array {}", decodedPerson.toString());
// ******************************************************************************************************
// Deserialize the Person byte array into a PesonWithMiddleName object
PersonWithMiddleName decodedPersonWithMiddleName = null;
try {
binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, binaryDecoder);
} catch (IOException e) {
log.error("Deserialization error:" + e.getMessage());
}
log.info(
"Decoded PersonWithMiddleName from byte array {}", decodedPersonWithMiddleName.toString());
// ******************************************************************************************************
// Serialize the Person to JSON
byte[] jsonByteArray = new byte[0];
personByteArrayOutputStream = new ByteArrayOutputStream();
Encoder jsonEncoder = null;
try {
jsonEncoder =
EncoderFactory.get().jsonEncoder(Person.getClassSchema(), personByteArrayOutputStream);
personSpecificDatumWriter.write(person, jsonEncoder);
jsonEncoder.flush();
jsonByteArray = personByteArrayOutputStream.toByteArray();
} catch (IOException e) {
log.error("Serialization error:" + e.getMessage());
}
log.info("The Person is now serialized to JSON: {}", new String(jsonByteArray));
// ******************************************************************************************************
// Deserialize the Person JSON into a Person object
JsonDecoder jsonDecoder = null;
try {
jsonDecoder =
DecoderFactory.get().jsonDecoder(Person.getClassSchema(), new String(jsonByteArray));
decodedPerson = personDatumReader.read(null, jsonDecoder);
} catch (IOException e) {
log.error("Deserialization error:" + e.getMessage());
}
log.info("Decoded Person from JSON: {}", decodedPerson.toString());
// ******************************************************************************************************
// Deserialize the Person JSON into a PersonWithMiddleName object
try {
jsonDecoder =
DecoderFactory.get()
.jsonDecoder(PersonWithMiddleName.getClassSchema(), new String(jsonByteArray));
decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, jsonDecoder);
} catch (AvroTypeException ae) {
// Do nothing. We expect this since JSON didn't serialize anything out.
log.error(
"An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: {}",ae.getMessage());
} catch (Exception e) {
log.error("Deserialization error:" + e.getMessage());
}
}
}
Person.avsc
{
"type": "record",
"namespace": "org.acme.avro_testing",
"name": "Person",
"fields": [
{
"name": "firstName",
"type": ["null", "string"],
"default": null
},
{
"name": "lastName",
"type": ["null", "string"],
"default": null
}
]
}
PersonWithMiddleName.avsc
{
"type": "record",
"namespace": "org.acme.avro_testing",
"name": "PersonWithMiddleName",
"fields": [
{
"name": "firstName",
"type": ["null", "string"],
"default": null
},
{
"name": "middleName",
"type": ["null", "string"],
"default": null
},
{
"name": "lastName",
"type": ["null", "string"],
"default": null
}
]
}
Test Output
Person has been created: {"firstName": "Joe", "lastName": "Cool"}
Person has been written to an Avro file
Person read from the file: {"firstName": "Joe", "lastName": "Cool"}
Now a PersonWithMiddleName has been read from the file that was written as a Person: {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to a byte array: JoeCool
Decoded Person from byte array {"firstName": "Joe", "lastName": "Cool"}
Decoded PersonWithMiddleName from byte array {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to JSON: {"firstName":{"string":"Joe"},"lastName":{"string":"Cool"}}
Decoded Person from JSON: {"firstName": "Joe", "lastName": "Cool"}
An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: Expected field name not found: middleName
person.avro
Objavro.schema�{"type":"record","name":"Person","namespace":"org.acme.avro_testing","fields":[{"name":"firstName","type":["null","string"],"default":null},{"name":"lastName","type":["null","string"],"default":null}]}

For question one, I'm not a Java expert, but in Python instead of writing to an actual file, there is the concept of a file-like object that has the same interface as a file, but just writes to a byte buffer. So for example, instead doing this:
file = open(file_name, "wb")
# use avro library to write to file
file.close()
You can do something like this:
from io import BytesIO
bytes_interface = BytesIO()
# use bytes_interface the same way you would the previous "file" object
byte_output = bytes_interface.getvalue()
So that final byte_output would be the bytes that normally would have been written to a file but now is just a byte buffer that could be stored anywhere. Does Java have some concept like this? Alternatively I'm assuming there is some way in Java to read the file contents back into a byte buffer if you absolutely have to go through the process of writing an actual temporary file.
For question two, I think you are hitting the same problem mentioned in this Jira ticket: https://issues.apache.org/jira/browse/AVRO-2890. Currently the JSON decoder expects the schema the data was written with and can't do any sort of schema evolution with a different schema than the data was written with.

Related

How to train Open NLP without file

i have the following code for training Open NLP POS Tagger
// Trains an OpenNLP POS-tagger model from a whitespace/line-oriented training file.
// Side effects: assigns the instance fields dataIn, lineStream, and model (declared
// elsewhere in the enclosing class).
// NOTE(review): modelSavePath is accepted but never used in this constructor — the
// trained model is only kept in the `model` field, not saved; confirm intent.
Trainer(String trainingData, String modelSavePath, String dictionary){
try {
// Wrap the training file so the stream can be re-read across training iterations.
dataIn = new MarkableFileInputStreamFactory(
new File(trainingData));
lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
// Each line becomes one POSSample (word_tag word_tag ...).
ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
POSTaggerFactory fac=new POSTaggerFactory();
// Optional tag dictionary; only attached when a non-empty path was supplied.
// NOTE(review): the FileInputStream opened here is never explicitly closed.
if(dictionary!=null && dictionary.length()>0)
{
fac.setDictionary(new Dictionary(new FileInputStream(dictionary)));
}
model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);
} catch (IOException e) {
// Failed to read or parse training data, training failed
e.printStackTrace();
} finally {
if (lineStream != null) {
try {
lineStream.close();
} catch (IOException e) {
// Not an issue, training already finished.
// The exception should be logged and investigated
// if part of a production system.
e.printStackTrace();
}
}
}
}
and this works just fine. Now, is it possible to do the same without involving files? I want to store the training data in a database somewhere. Then i can read it as a stream or chunks and feed it to the trainer. I do not want to create a temp file. Is this possible?
Yes, instead of passing FileInputStream to a dictionary, you can create your own implementation of InputStream, say DatabaseSourceInputStream and use it instead.

How to get the mp4 url for Youtube videos using Youtube v3 API

How do I get the full mp4 url to play the video from it's actual location in my application using some other source except Youtube. The gdata/youtube API has been deprecated so I am having trouble. Any help will be appreciated. Thanks.
i made a very simple API : https://gist.github.com/egyjs/9e60f1ae3168c38cc0f0054c15cd6a83
As Example:
YouTube Video Link: https://www.youtube.com/watch?v=**YGCLs9Bt_KY**
now to get the Direct link
you need to call the api , like this (change example.com to your site) :
https://example.com/?url=https://www.youtube.com/watch?v=YGCLs9Bt_KY
returns:
[
{
"url": "https:\/\/r10---sn-aigllnlr.googlevideo.com\/videoplayback?key=yt6&signature=81D86D3BC3D34D8A3B865464BE7BC54F34C1B0BC.7316033C2DD2F65E4D345CFA890257B63D7FE2A2&mt=1522999783&expire=1523021537&sparams=dur%2Cei%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cexpire&requiressl=yes&ei=gSLHWvuxDMOUVYaTqYgB&dur=244.204&pl=22&itag=22&ip=185.27.134.50&lmt=1522960451860848&id=o-AAoaDzyDCVXS404wfqZoCIdolGU-NM3-4yDxC0t868iL&ratebypass=yes&ms=au%2Conr&fvip=2&source=youtube&mv=m&ipbits=0&mm=31%2C26&mn=sn-aigllnlr%2Csn-5hne6nsy&mime=video%2Fmp4&c=WEB&initcwndbps=710000",
"quality": "hd720",
"itag": "22",
"type": "video\/mp4; codecs=\"avc1.64001F, mp4a.40.2\""
},
{
"url": "https:\/\/r10---sn-aigllnlr.googlevideo.com\/videoplayback?key=yt6&mt=1522999783&gir=yes&expire=1523021537&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cexpire&itag=43&ratebypass=yes&fvip=2&ipbits=0&mime=video%2Fwebm&initcwndbps=710000&signature=71DC48B9BF4B2E3ED46FE0A4CD36FE027DACF31E.4624B7B4BCB947336CEB029E9958B136F79759EB&clen=24203231&requiressl=yes&dur=0.000&pl=22&ip=185.27.134.50&lmt=1522961642553275&ei=gSLHWvuxDMOUVYaTqYgB&ms=au%2Conr&source=youtube&mv=m&id=o-AAoaDzyDCVXS404wfqZoCIdolGU-NM3-4yDxC0t868iL&mm=31%2C26&mn=sn-aigllnlr%2Csn-5hne6nsy&c=WEB",
"quality": "medium",
"itag": "43",
"type": "video\/webm; codecs=\"vp8.0, vorbis\""
},
{
"url": "https:\/\/r10---sn-aigllnlr.googlevideo.com\/videoplayback?key=yt6&mt=1522999783&gir=yes&expire=1523021537&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cexpire&itag=18&ratebypass=yes&fvip=2&ipbits=0&mime=video%2Fmp4&initcwndbps=710000&signature=C83DE33E3DC80981A65DB3FE4E6B3A48BF7500E4.361D0EE6210B30D3D3A80F43228DEF1BD20691A4&clen=15954979&requiressl=yes&dur=244.204&pl=22&ip=185.27.134.50&lmt=1522960340235683&ei=gSLHWvuxDMOUVYaTqYgB&ms=au%2Conr&source=youtube&mv=m&id=o-AAoaDzyDCVXS404wfqZoCIdolGU-NM3-4yDxC0t868iL&mm=31%2C26&mn=sn-aigllnlr%2Csn-5hne6nsy&c=WEB",
"quality": "medium",
"itag": "18",
"type": "video\/mp4; codecs=\"avc1.42001E, mp4a.40.2\""
},
{
"url": "https:\/\/r10---sn-aigllnlr.googlevideo.com\/videoplayback?key=yt6&mt=1522999783&gir=yes&expire=1523021537&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Crequiressl%2Csource%2Cexpire&itag=36&fvip=2&ipbits=0&mime=video%2F3gpp&initcwndbps=710000&signature=3E993D911492DA039A16BB26182ACDC6C6A04FCC.BFB9728C71CD03970B0F15AFD51A7355F9D3F899&clen=6759799&requiressl=yes&dur=244.273&pl=22&ip=185.27.134.50&lmt=1522957367267598&ei=gSLHWvuxDMOUVYaTqYgB&ms=au%2Conr&source=youtube&mv=m&id=o-AAoaDzyDCVXS404wfqZoCIdolGU-NM3-4yDxC0t868iL&mm=31%2C26&mn=sn-aigllnlr%2Csn-5hne6nsy&c=WEB",
"quality": "small",
"itag": "36",
"type": "video\/3gpp; codecs=\"mp4v.20.3, mp4a.40.2\""
},
{
"url": "https:\/\/r10---sn-aigllnlr.googlevideo.com\/videoplayback?key=yt6&mt=1522999783&gir=yes&expire=1523021537&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Crequiressl%2Csource%2Cexpire&itag=17&fvip=2&ipbits=0&mime=video%2F3gpp&initcwndbps=710000&signature=810D13A2C507A4EA220E6DA895B39B237FA22DAF.898D020851087CF3C10BC6E3ED7360736A239904&clen=2443931&requiressl=yes&dur=244.273&pl=22&ip=185.27.134.50&lmt=1522957365473654&ei=gSLHWvuxDMOUVYaTqYgB&ms=au%2Conr&source=youtube&mv=m&id=o-AAoaDzyDCVXS404wfqZoCIdolGU-NM3-4yDxC0t868iL&mm=31%2C26&mn=sn-aigllnlr%2Csn-5hne6nsy&c=WEB",
"quality": "small",
"itag": "17",
"type": "video\/3gpp; codecs=\"mp4v.20.3, mp4a.40.2\""
}
]
update :
to see the source code :
GIST: https://gist.github.com/egyjs/9e60f1ae3168c38cc0f0054c15cd6a83
Sorry sir, You cannot do this with youtube api v3. you have to use a url of youtube which is not an api, but here you can get all the videos related to this.
See
http://www.youtube.com/get_video_info?&video_id='. $my_id.'&asv=3&el=detailpage&hl=en_US
Or you can do to get all the videos download link even it is private or not allow for your country
1st: Go to any webpage of youtube videos link https://www.youtube.com/watch?v=9mdJV5-eias
2nd : view source of that page
3rd : 188 or 187 line, where you find source Javascript codes where location of the videos with mp4 format is also available.
You can do the second Idea by simplehtmldom and some php functions. and the first one can get by using curl which is easy but in little bit hard way by php. Thank you hope this will help you.
For local Java/Android use, here is how I achieved this — credit goes to @abdo-el-zahaby: I converted his PHP script to roughly equivalent Java code. It uses the OkHttp client to fetch the URLs:
final String videoInfoUrl = "http://www.youtube.com/get_video_info?video_id=some_video_id&el=embedded&ps=default&eurl=&gl=US&hl=en";
Request request = new Request.Builder()
.cacheControl(CacheControl.FORCE_NETWORK)
.url(videoInfoUrl)
.build();
final Response response = okHttpClient.newCall(request).execute();
InputStream inputStream = response.body().byteStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
String line;
final StringBuilder contentBuilder = new StringBuilder();
while ((line = bufferedReader.readLine()) != null) {
contentBuilder.append(line);
}
final String streamKey = "url_encoded_fmt_stream_map";
final Map<String, String> map = new HashMap<>();
final String content = contentBuilder.toString();
String[] ampSplit = content.split("&");
for (String s : ampSplit) {
printDivider();
final String[] equalsPlit = s.split("=");
if (equalsPlit.length >= 2) {
String key;
String value;
key = equalsPlit[0];
value = equalsPlit[1];
map.put(key, value);
}
printDivider();
}
int count = 0;
if (map.containsKey(streamKey)) {
String[] streams = map.get(streamKey).split(",");
for (String stream : streams) {
String[] streamSplit = stream.split("&");
for (String s : streamSplit) {
printDivider();
final String urlDecoded = URLDecoder.decode(s, "UTF-8");
String[] details = urlDecoded.split(",");
for (String detail : details) {
System.out.println("Detail " + URLDecoder.decode(detail, "UTF-8"));
final String urlContent= URLDecoder.decode(detail, "UTF-8");
final String url = urlContent.substring(urlContent.indexOf("http"), urlContent.indexOf(";"));
mp4Url.put(Integer.toString(count++), url);
}
}
printDivider();
}
}
This is the code i am using to download and store in sdcard/internal memory
Request request = new Request.Builder()
.cacheControl(CacheControl.FORCE_NETWORK)
.url(url)
.build();
final Response response = okHttpClient.newCall(request).execute();
InputStream inputStream = response.body().byteStream();
final File newFile = new File(location);
boolean created = newFile.createNewFile();
System.out.println(location + " new file created: " + created);
byte[] buff = new byte[4096];
long downloaded = 0;
long target = response.body().contentLength();
System.out.println("File size is: " + Long.toString(target));
OutputStream outStream = new FileOutputStream(newFile);
while (true) {
int read = inputStream.read(buff);
if (read == -1) {
break;
}
outStream.write(buff, 0, read);
//write buff
downloaded += read;
}
System.out.println("Target: " + target +", Downloaded: " + downloaded);
outStream.flush();
I don't know how it works, but check out https://podsync.net/.
When you input a youtube playlist link, it does some magic and generates an mp4 link to be used by podcast catchers. Examine the returned xml file and you'll see lines like this for each video in the playlist:
<enclosure url="http://podsync.net/download/[random-letters]/[video-id].mp4" length="x" type="video/mp4"></enclosure>
In my experience, that URL works no matter what video-id you use.

Unmarshalling XML with (xpath)conditions

I'm trying to unmarshall some XML which is structured like the following example:
<player>
<stat type="first_name">Somebody</stat>
<stat type="last_name">Something</stat>
<stat type="birthday">06-12-1987</stat>
</player>
It's dead easy to unmarshal this into a struct like
type Player struct {
Stats []Stat `xml:"stat"`
}
but I'm looking to find a way to unmarshal it into a struct that's more like
type Player struct {
FirstName string `xml:"stat[@Type='first_name']"`
LastName string `xml:"stat[@Type='last_name']"`
Birthday Time `xml:"stat[@Type='birthday']"`
}
is there any way to do this with the standard encoding/xml package?
If not, can you give me a hint how one would split down such a "problem" in go? (best practices on go software architecture for such a task, basically).
thank you!
The encoding/xml package doesn't implement xpath, but does have a simple set of selection methods it can use.
Here's an example of how you could unmarshal the XML you have using encoding/xml. Because the stats are all of the same type, with the same attributes, the easiest way to decode them will be into a slice of the same type. http://play.golang.org/p/My10GFiWDa
// Sample input: a <player> element whose children are homogeneous <stat>
// elements distinguished only by their "type" attribute.
var doc = []byte(`<player>
<stat type="first_name">Somebody</stat>
<stat type="last_name">Something</stat>
<stat type="birthday">06-12-1987</stat>
</player>`)

// Player maps <player> to a slice of identically-shaped stats, since
// encoding/xml has no xpath-style attribute predicates.
type Player struct {
XMLName xml.Name `xml:"player"`
Stats []PlayerStat `xml:"stat"`
}

// PlayerStat captures one <stat>: its "type" attribute and its text content.
type PlayerStat struct {
Type string `xml:"type,attr"`
Value string `xml:",chardata"`
}
And if it's something you need to transform often, you could do the transformation by using an UnamrshalXML method: http://play.golang.org/p/htoOSa81Cn
// Player is decoded from <player> by the custom UnmarshalXML below, which
// dispatches each <stat> element to a named field based on its "type" attribute.
type Player struct {
XMLName xml.Name `xml:"player"`
FirstName string
LastName string
Birthday string
}

// UnmarshalXML implements xml.Unmarshaler: it walks the decoder's token stream
// and assigns each <stat>'s character data to the field selected by the
// element's first attribute value.
func (p *Player) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
for {
t, err := d.Token()
// EOF simply ends the token walk; any other error is propagated.
if err == io.EOF {
break
} else if err != nil {
return err
}
if se, ok := t.(xml.StartElement); ok {
// Read the token immediately after the start element, expecting the
// element's character data.
t, err = d.Token()
if err != nil {
return err
}
var val string
if c, ok := t.(xml.CharData); ok {
val = string(c)
} else {
// not char data, skip for now
continue
}
// assuming we have exactly one Attr
// NOTE(review): se.Attr[0] panics on an attribute-less element — fine
// for the sample input, but worth guarding for arbitrary XML.
switch se.Attr[0].Value {
case "first_name":
p.FirstName = val
case "last_name":
p.LastName = val
case "birthday":
p.Birthday = val
}
}
}
return nil
}

Formatting Rabl for Ember or Mapping Ember to Rabl?

My rails app used to produce JSON that looked like this:
{"paintings":
[
{"id":1,"name":"a"},
{"id":2,"name":"b"}
]
}
I added rabl json formatting, and now the json looks like this:
[
{"id":1,"name":"a"},
{"id":2,"name":"b"}
]
Ember is telling me
Uncaught Error: assertion failed: Your server returned a hash with the key 0 but you have no mapping for it
How can I make Ember understand this? Or how should I make rabl understandable to ember?
Found a solution. My index.json.rabl looked like this:
collection @paintings
extends 'paintings/show'
Now it looks like this:
collection @paintings => :paintings
extends 'paintings/show'
You can extend DS.RESTSerializer and change extract and extractMany. The following is merely a copy and paste from the serializer I use in .NET, for the same scenario:
// Legacy ember-data (rev 12) setup: reopen the REST adapter's serializer so it
// accepts a rootless JSON array (as produced by rabl) in addition to the
// conventional { root: [...] } payload shape.
window.App = Ember.Application.create();
var adapter = DS.RESTAdapter.create();
var serializer = Ember.get( adapter, 'serializer' );
serializer.reopen({
// Called for findAll/findMany responses. `json` may be either a bare array
// (rootless payload) or a hash keyed by the pluralized type root.
extractMany: function (loader, json, type, records) {
var root = this.rootForType(type);
root = this.pluralize(root);
var objects;
if (json instanceof Array) {
// Rootless payload: the array itself is the record list.
objects = json;
}
else {
// Conventional payload: sideload associations, read meta, then take the
// record list from under the pluralized root key.
this.sideload(loader, type, json, root);
this.extractMeta(loader, type, json);
objects = json[root];
}
if (objects) {
var references = [];
if (records) { records = records.toArray(); }
for (var i = 0; i < objects.length; i++) {
// Pair each raw object with its in-flight record (if any) so ids assigned
// by the server are propagated back.
if (records) { loader.updateId(records[i], objects[i]); }
var reference = this.extractRecordRepresentation(loader, type, objects[i]);
references.push(reference);
}
loader.populateArray(references);
}
},
// Called for single-record responses; treats `json` as the record itself
// (no root key expected).
extract: function (loader, json, type, record) {
if (record) loader.updateId(record, json);
this.extractRecordRepresentation(loader, type, json);
}
});
And before you set your store, you must configure your model to sideload properly:
serializer.configure( 'App.Painting', {
sideloadAs: 'paintings'
} );
App.Store = DS.Store.extend({
adapter: adapter,
revision: 12
});
Now you should be able to load rootless JSON payload into your app.
(see fiddle)

Tika--Extracting Distinct Items from a Compound Document

Question:
Assume an email message with an attachment (assume a JPEG attachment). How do I parse (not using the Tika facade classes) the email message and return the distinct pieces--a) the email text contents and b) the email attachment?
Configuration:
Tika 1.2
Java 1.7
Details:
I have been able to properly parse email messages in basic email message formats. However, after the parsing, I need to know a) the email's text contents and b) the the contents of any attachment to the email. I will store these items in my database as essentially parent email with child attachments.
What I cannot figure out is how I can "get back" the distinct parts and know that the parent email has attachments and be able to separately store those attachments referenced to the mail. This is, I believe, essentially similar to extracting ZipFile contents.
Code Example:
/**
 * Convenience overload: resolves the path string to a File and delegates to
 * {@link #processDocument(File)}. A null path is reported as a failed Message
 * rather than propagating the NullPointerException.
 */
private Message processDocument(String fullfilepath) {
  try {
    // new File(null) throws NPE, which the catch below converts to a failure.
    return this.processDocument(new File(fullfilepath));
  } catch (NullPointerException npe) {
    Message failure = new Message(false);
    failure.appendErrorMessage("The file name was null.");
    return failure;
  }
}
// Extracts text and content type from the given file via Tika's AutoDetectParser.
// Reads/writes the instance fields safehandlerbodytext, meta, documenttype, and diag
// (all declared elsewhere in the enclosing class).
// NOTE(review): this method never closes `stream`, and the method body falls off the
// end without returning the Message on the success path — it will not compile as-is;
// presumably it should `return diag ;` after the parse block.
private Message processDocument(File filename) {
InputStream stream = null;
try {
stream = new FileInputStream(filename) ;
} catch (FileNotFoundException fnfe) {
// TODO Auto-generated catch block
fnfe.printStackTrace();
System.out.println("FileNotFoundException") ;
return diag ;
}
// -1 disables BodyContentHandler's default 100k character write limit.
int writelimit = -1 ;
ContentHandler texthandler = new BodyContentHandler(writelimit);
// SafeContentHandler filters invalid XML characters from the extracted text.
this.safehandlerbodytext = new SafeContentHandler(texthandler);
this.meta = new Metadata() ;
ParseContext context = new ParseContext() ;
AutoDetectParser autodetectparser = new AutoDetectParser() ;
try {
// NOTE(review): texthandler is passed here, so safehandlerbodytext is never
// actually used by the parse — confirm which handler was intended.
autodetectparser.parse(
stream,
texthandler,
meta,
context) ;
// Tika fills in the detected MIME type during parsing.
this.documenttype = meta.get("Content-Type") ;
diag.setSuccessful(true);
} catch (IOException ioe) {
// if the document stream could not be read
System.out.println("TikaTextExtractorHelper IOException " + ioe.getMessage()) ;
//FIXME -- add real handling
} catch (SAXException se) {
// if the SAX events could not be processed
System.out.println("TikaTextExtractorHelper SAXException " + se.getMessage()) ;
//FIXME -- add real handling
} catch (TikaException te) {
// if the document could not be parsed
System.out.println("TikaTextExtractorHelper TikaException " + te.getMessage()) ;
System.out.println("Exception Filename = " + filename.getName()) ;
//FIXME -- add real handling
}
}
When Tika hits an embedded document, it goes to the ParseContext to see if you have supplied a recursing parser. If you have, it'll use that to process any embedded resources. If you haven't, it'll skip.
So, what you probably want to do is something like:
/**
 * Parser registered in the ParseContext so Tika recurses into embedded documents
 * (email attachments, archive entries, ...). Each embedded resource of a supported
 * type is copied out to a temp file and remembered in {@link #found}.
 */
public static class HandleEmbeddedParser extends AbstractParser {
  /** Temp files the embedded documents were saved to, in encounter order. */
  public List<File> found = new ArrayList<File>();

  // Methods must be public to implement the Parser interface (the original
  // snippet's package-private methods would not compile).
  public Set<MediaType> getSupportedTypes(ParseContext context) {
    // Return what you want to handle.
    HashSet<MediaType> types = new HashSet<MediaType>();
    // Fix: HashSet has add(), not put().
    types.add(MediaType.application("pdf"));
    types.add(MediaType.application("zip"));
    return types;
  }

  public void parse(
      InputStream stream, ContentHandler handler,
      Metadata metadata, ParseContext context
  ) throws IOException {
    // Do something with the child documents — here, save each to a temp file.
    File f = File.createTempFile("tika", "tmp");
    found.add(f);
    FileOutputStream fout = new FileOutputStream(f);
    try {
      IOUtils.copy(stream, fout);
    } finally {
      // Close even if the copy fails, so the temp file handle is not leaked.
      fout.close();
    }
  }
}

ParseContext context = new ParseContext();
// Fix: the original was missing the closing parenthesis on this call.
context.set(Parser.class, new HandleEmbeddedParser());
parser.parse(....);

Resources