How to parse resumes (Word docs) using the Tika parser when resumes have different formats - apache-tika

I want to parse resumes to fetch skill set, location, experience, etc. Each resume might have a different format. Once parsed, I will use those fields to index via Solr. Any help, please?
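For what it's worth, a minimal sketch of the extraction step, assuming the community tika-python package (pip install tika), which drives a local Tika server; the keyword pass at the end is purely illustrative:

from tika import parser  # community tika-python package

parsed = parser.from_file("resume.docx")  # handles .doc/.docx/.pdf alike
text = parsed["content"]                  # plain text of the document
meta = parsed["metadata"]                 # e.g. Content-Type, author

# Tika only extracts text; pulling skill set, location and experience out of
# differently formatted resumes still needs your own keyword/NLP pass before
# the fields are indexed in Solr.
skill_lines = [line for line in text.splitlines() if "skill" in line.lower()]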

Related

EventStore - read specific time frame from stream

I have a system that produces thousands of messages per hour: a location tracking system that gathers events from different devices and does different calculations based on those messages.
I'm trying to evaluate whether EventStore suits this use case. My plan is to associate a stream with each device and accumulate messages in those streams.
Now the question: will I be able to read those messages for a specific time frame in the past? I don't want to replay all the messages from the beginning; I just need fast access to messages from date1 to date2.
Any ideas? So far, what I saw in the docs only relates to reading all messages either from the beginning or from the end and filtering during the process. But this pattern doesn't look very optimal to me. Am I doing something wrong?
The EventStoreDB index allows you to read events from a specific stream by event number, but not by date. What you wrote about reading from the beginning or the end is not entirely correct: you can read from any position in the stream, both backwards and forwards, but again that has nothing to do with dates.
Essentially, the date when the event was written to the database is considered unimportant and transient. For example, if you decide to move your data to another store using replication, all the events will get a new date. That's why, if the date is important, it should be stored somewhere in the event data or metadata. EventStoreDB doesn't know about the event payload (or metadata), and doesn't index it.
If you are looking for a kind of database that allows you to query records by time, your best bet is time series databases like Prometheus and InfluxDB. These databases are specifically designed to index primarily by timestamp, and are optimised to store data like sensor readings, where each reading replaces the previous one. EventStoreDB is not designed for that purpose; it's a database built to support event-sourced applications, and sensor readings are not that.
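For illustration, a sketch of carrying the reading's own timestamp in the event body and metadata so it can be filtered on later; the commented-out append is hypothetical pseudocode, not a real client API:

import json, time

event = {
    "type": "LocationReported",
    "data": json.dumps({"device_id": "dev-42", "lat": 52.37, "lon": 4.89}),
    # store the business timestamp yourself: EventStoreDB won't index the write date
    "metadata": json.dumps({"recorded_at": int(time.time())}),
}
# client.append_to_stream("device-dev-42", [event])  # hypothetical client call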

Should I use a YAML file or the DB to store my success/error messages in a Rails app?

In my Rails app, after executing some code I want to send Slack messages to users to notify them of the execution result. There are multiple processes for which I want to send these messages, and in short, I need somewhere to store message templates for successes/errors (they're just short strings, like "Hi, we've successfully done x!", but they differ for each process).
Right now, I have a SlackMessage model in the DB from which you can retrieve the message content. However, I heard that it's better to manage custom messages like this in a YAML file, since it's easier to add/edit the messages later on (similar to how locale files work).
What is the best practice for this kind of scenario? If it's not to use a DB, I'd appreciate pointers or a link on how to do it (in terms of using YAML files, the only material I could find was on internationalisation).
Why don't you use the already existing I18n module in Rails? It is perfect for storing messages, and gives you the ability to use translations should you ever need them in the future.
Getting a message is simple:
Slack.message(I18n.t(:slack_message, scope: 'slack'))
In this case you need a translation file like this:
en:
  slack:
    slack_message: This is the message you are going to select.
Read more on I18n: https://guides.rubyonrails.org/i18n.html
YAML is generally much slower to load data from than a DB. Additionally, YAML parsers usually load all of the data, even if there are multiple documents in the YAML stream.
For programs that have a long run-time and use a large share of the messages, using YAML is usually not a problem. But in short-running programs the loading can be a significant part of the run-time, and techniques like delayed loading and caching might not help. As an example: some time ago I got a PR for my YAML library that delayed the instantiation of regular expressions in the library, because it was slowing the startup of some programs.
If you have many messages, they all stay in memory after loading from YAML, which might be a problem. With a DB it is much more common to retrieve only what is needed, and to rely on the DB to do that efficiently (caching, etc.).
If the advantages and criteria mentioned above don't help you decide, you can also have it both ways: the ease of reading/editing of YAML plus the speed, caching, etc. of a DB. "Just" convert the YAML stream to a DB, either explicitly after editing the YAML document or on first use by your program (by comparing the files' timestamps). That is the approach programs like Postfix take with postmap (although the inputs there are plain text, not YAML files).
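A rough Python sketch of that convert-after-editing idea, assuming a flat messages.yml that maps keys to strings (the file and table names are made up):

import sqlite3
import yaml  # pip install pyyaml

with open("messages.yml") as f:
    # e.g. {"order_success": "Hi, we've successfully done x!"}
    messages = yaml.safe_load(f)

con = sqlite3.connect("messages.db")
con.execute("CREATE TABLE IF NOT EXISTS messages (key TEXT PRIMARY KEY, body TEXT)")
con.executemany("INSERT OR REPLACE INTO messages (key, body) VALUES (?, ?)",
                messages.items())
con.commit()
con.close()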

Timestamp for Contiki

I want to add a timestamp to the data I am collecting in Contiki. When I include time.h from the C library, it reports that the associated files are missing. Can someone show how to add the timestamp, either using the standard C library or the Contiki library?
You might want to have a look at core/sys/clock.h. It might not provide a date and time but at least you will get the seconds since startup.
It is better to output the data to Python, where it is possible to add a timestamp (month and date).
I had a similar requirement in my project. I did the following.
1. Use the microcontroller's RTC to get the date and time.
2. Convert that into a Unix timestamp. A quick search will turn up simple C code for converting a date and time into a Unix timestamp; the conversion is also sketched below.
3. Send the timestamp with the payload. My payload is in JSON format, so I use a key to distinguish it from the other data.
Hope this helps.
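As a sketch of step 2, here is the conversion in Python; the answer suggests plain C on the node, and the same arithmetic ports directly (the payload values are made up):

import calendar

def to_unix(year, month, day, hour, minute, second):
    # calendar.timegm treats the tuple as UTC, matching an RTC kept in UTC
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))

payload = {"temp_c": 21.5, "ts": to_unix(2024, 5, 17, 12, 30, 0)}  # JSON-ready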

Query an XML file from a URL

In my iOS application project I have a string which holds an address, and I was given a URL of an XML file that holds a list of addresses and precinct locations, like so:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <row>
    <res_street_address>448 IVY YOKELEY RD</res_street_address>
    <res_city_desc>WINSTON-SALEM</res_city_desc>
    <state_cd>NC</state_cd>
    <zip_code>27101</zip_code>
    <precinct_desc>GUMTREE # 16</precinct_desc>
  </row>
This file is very large and takes a long time to load in my browser. My job is to find the precinct location for a given address in this app. I have been at this for days and I'm about at my wits' end. Everything I've found so far leads me to believe that I will have to download the file first and then parse it. The file is VERY large and this will slow the app down. Is there a way that I can run a query on the records in the XML file and then download only those results? I've tried
https://thewebaddress/voter.xml/?res_street_address="448 IVY YOKELY RD"&res_city_desc="WINSTON-SALEM"…
but I’ve found this isn’t the way to do this. I have very little experience with web technologies, so a lot of what I find online is over my head. What is the best way to go about something like this?
You can't natively do that with an XML file.
An XML file is purely data and contains no 'programming' that could fulfil your search request. To WORK with an XML file you have to download it. (You CAN work with a byte stream and a SAX-based XML parser and only download as much as is needed to find your result... but that's a small optimisation and not worth the trouble IMO.)
The point is: the DATA you want to work with HAS to be loaded.
(One way WOULD be to install a script (PHP/ASP/whatever) that scans the XML on the server side and then serves you only the relevant XML parts.)
I would recommend this if you can. It saves CPU and bandwidth -> time and battery.
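A sketch of that server-side approach, here in Python with Flask rather than PHP/ASP; the endpoint, file name and parameters are made up for illustration:

import xml.etree.ElementTree as ET
from flask import Flask, Response, request  # pip install flask

app = Flask(__name__)
root = ET.parse("voter.xml").getroot()  # load the big file once at startup

@app.route("/precinct")
def precinct():
    street = request.args.get("res_street_address", "").upper()
    city = request.args.get("res_city_desc", "").upper()
    for row in root.iter("row"):
        if (row.findtext("res_street_address") == street
                and row.findtext("res_city_desc") == city):
            # return only the matching <row>, not the whole document
            return Response(ET.tostring(row), mimetype="application/xml")
    return ("no match", 404)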

Parsing data in an iOS app using XML or JSON parser

I just wanted to clarify my understanding of web services consumed in iOS apps.
So, NSURLConnection is used to invoke a web method from an iOS app and to get the data returned by the web method. And once the data is received, we can use either an XML parser or a JSON parser to parse that data, depending on the format the web service returns. Is that correct?
Just to add onto what others have already said in the comments:
Ideally you'll want to always connect asynchronously, using NSURLConnection's delegate, which will not block the main thread. If you're not already aware, iOS will force-quit applications that block the main thread for too long. Synchronous connections can be OK in certain instances, but I'd say over 90% of the time you'll want asynchronous.
That said, asynchronous connections cause their own set of headaches. If you need to have data before allowing your users access to the application interface, you'll need to throw up loading screens and then remove them. You'll need to manage canceling requests if the user moves to a different part of the application that starts a new request. You'll also want to determine if any ongoing requests need to be registered for background completion in the event the user backgrounds the app.
On the JSON parsing side of things, it's always recommended to use NSData when you can, as converting JSON from NSString adds a lot of overhead from NSString itself. I personally do NOT use Apple's provided JSON parser, but the excellent JSONKit as it's faster than even Apple's binary plist parser. It does however require that your JSON adhere very strictly to the JSON RFC, and that it be UTF-8/16/32 encoded, not ASCII. This is fairly standard amongst the faster JSON parsers available.
Avoid XML altogether when possible. The built-in iPhone parser is a SAX-style parser only, which is obnoxious to implement. If you must use XML, take a look at Ray Wenderlich's XML parser comparison and choose an appropriate one. If you have a large XML document to parse, though, SAX is probably your only option given the limited processing capabilities of iDevices.
--EDIT--
One of the commenters mentions SBJSON. I'd recommend against that entirely. It's slower than Apple's JSON parser (which has been available since iOS 5), and is much slower than JSONKit and several others by an order of magnitude. I specifically moved my current enterprise-level iOS application off of SBJSON because of errors I was receiving in the JSON parsing as well.
