How do I automate a phone number lookup online for a large list of names & addresses I have in a CSV?

I have a large list of names & addresses (currently in an Excel file) and I need to perform a phone number lookup for each entry.
I can't seem to find any APIs for phone number lookup; the only solution I can find is a regular phone book lookup where you type a name and address one by one into the browser and get the number, e.g. canada411.ca.
How can I automate this process and get the numbers I need in bulk?
Thanks

Related

How to get a list of all topics containing specific values known to MQTT broker?

I'm looking for a way to get a list of all topics known to a broker. There are some quite similar questions, but they didn't help me figure it out for my use case.
I've got 3 Raspberry Pis with multiple sensors (temperature, humidity) which are connected over an MQTT network. Every Pi has its own database containing time series of measurements and other system variables (like CPU load).
Now I'm looking for a way to handle the following scenario:
I want to monitor my system and detect anomalies. For that I want to get all sensor time series from the last x seconds and process them in a Python script. Any of the Pis may act as the monitoring node.
Example: I'm on RPi2 and want to monitor the whole distributed network. There is no prior knowledge about which sensors are attached to which Pi. From my Python script running on RPi2 I would initialise an MQTT client and subscribe to all sensor data on the broker. I know about the wildcard #, but I'm not sure how to use it in this case. My magic command would look like the following pseudo code:
1) client subscribe to all sensor data - #/sensor/#
2) get list with all topics
3) client subscribe to all topics from given list list/#
4) analyse data for anomalies every x seconds
First, your wildcard topic patterns are not valid. A topic pattern can only contain a single '#' character, and it can only appear at the end of the pattern, e.g. foo/bar/# is valid, #/foo is not. You can also use the + character, which is a single-level wildcard.
This means a topic pattern of +/sensor/# will match each of the following:
rpi1/sensor/foo
rpi1/sensor/bar/temp
but not
rpi1/foo/sensor/bar
Next, brokers do not keep a list of topics that exist. A topic only really exists at the instant a message is published to it; the broker then checks that topic against the patterns subscribing clients have requested and delivers the message to the clients that match.
Thirdly, when bridging brokers in loops like that you have to be very careful with the bridge filters to make sure that messages don't end up in a constant loop.
The solution is probably to designate a "master" broker, bridge all the others one way to that broker, and then have the client subscribe to either '#' to get everything or something more like '+/sensor/#' to see just the sensor readings.
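For what it's worth, a minimal sketch of that subscriber, assuming the Python paho-mqtt client, a designated master broker (the hostname below is hypothetical) and the <pi>/sensor/<name> topic layout implied above:

```python
# A sketch only: assumes the paho-mqtt package, a bridged "master" broker and
# topics shaped like <pi>/sensor/<name>; hostname and port are placeholders.
import paho.mqtt.client as mqtt

MASTER_BROKER = "rpi2.local"  # hypothetical hostname of the designated master broker

def on_connect(client, userdata, flags, rc):
    # '+' matches exactly one topic level (the Pi id), '#' matches everything after /sensor/
    client.subscribe("+/sensor/#")

def on_message(client, userdata, msg):
    # Every sensor reading bridged into the master broker lands here
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(MASTER_BROKER, 1883)
client.loop_forever()
```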

Per-partition GroupByKey in Beam

Beam's GroupByKey groups records by key across all partitions and outputs a single iterable per key per window. This "brings associated data together into one location."
Is there a way I can group records by key locally, so that I still get a single iterable per key per window as output, but only over the local records in the partition instead of a global group-by-key over all locations?
If I understand your question correctly, you don't want to transfer data over the network when part of it (a partition) was produced on the same machine and can therefore be grouped locally.
Normally, Beam doesn't expose details of where and how your code will run, since that varies with the runner/engine/resource manager. However, if you can fetch some unique information about your worker (like hostname, IP or MAC address), then you can use it as part of your key and group all related data by it. It is quite likely that these data partitions won't then be moved to other machines, since all the needed input data is already sitting on the same machine and can be processed locally. Though, afaik, there is no 100% guarantee of that.
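To illustrate that trick, a minimal sketch in the Beam Python SDK, assuming the worker hostname is a good-enough stand-in for the partition (the pipeline contents are hypothetical):

```python
# A sketch only: composite key = (worker hostname, original key).
import socket
import apache_beam as beam

class KeyByWorker(beam.DoFn):
    def process(self, element):
        key, value = element
        # Prefix the key with this worker's hostname so elements processed on
        # the same machine end up under the same composite key.
        yield ((socket.gethostname(), key), value)

with beam.Pipeline() as p:
    _ = (
        p
        | beam.Create([("sensor-a", 1), ("sensor-a", 2), ("sensor-b", 3)])
        | beam.ParDo(KeyByWorker())
        | beam.GroupByKey()   # still a shuffle, but values for a composite key
        | beam.Map(print)     # mostly originate on the worker named in the key
    )
```

As noted above, GroupByKey still involves a shuffle; the composite key only makes it likely that the grouped values were produced on the same worker.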

Where does raw geoip data come from?

This question is a general version of a more specific question asked here. However, those answers were unusable.
Question: What is the raw source for geoIP data?
Many websites will tell me where my IP is, but they all appear to be using databases from fewer than 5 companies (most use a database from MaxMind). These companies offer limited free versions of their databases, but I'm trying to determine what they use as their source data.
I've tried using Linux/Unix commands such as ping, traceroute, dig, whois, etc., but they don't provide predictably accurate information.
Preamble: I believe this is actually a very valid question for the SO website, as understanding how such things work is important to understanding how such datasets can be used in software. However, the answer to this question is rather complex and full of historical remarks.
First, it is worth mentioning that there is NO unified raw geoip data. Such a thing simply does not exist. Second, the data comes from multiple sources and is often unreliable and/or outdated.
To understand how that came to be, one needs to know how the Internet came into existence and spread around the world. A short summary is below:
IANA is a global [non-profit] organization which manages the assignment of IP blocks to regional organizations: https://www.iana.org/numbers. This happens upon request, and the regional organization asks for a specific block size.
Regional organizations may assign those IP blocks either to ISPs directly or to country-level sub-organizations (which then assign them to ISPs).
ISPs assign IP addresses to local branches, etc.
From above you can easily see that:
There is no single body responsible for assigning an IP block to this or that location.
Decisions on how (and whether) to release information about which IP belongs to which location are not made uniformly; each organization decides for itself how (and whether) to release that information.
All of the above creates a whole lot of mess. It takes a lot of dedication and a long time to obtain, aggregate and sort this data, which is why the most up-to-date and detailed geoip datasets are a commercial commodity.
Whoever takes on the challenge of building their own dataset needs to obtain this information directly from the end of the chain (the ISPs), because higher-level organizations do not know to which location each IP address will be assigned. Higher-level organizations only distribute IP blocks among applicants (and keep some reserve for faster processing); it is the lowest-level organizations who decide which location gets which IP address, and they are not obligated to release this information publicly.
UPD:
To start building your own dataset you can begin with this list of blocks and how they are assigned
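As one concrete starting point (a sketch of the general approach only, not what the commercial providers do), you can ask IANA's whois service which regional registry an address block was delegated to, and then follow the "refer:" line to that registry; whois is just a line of text sent over TCP port 43:

```python
# A sketch only: raw whois query against IANA; the "refer:" line in the
# response points at the regional registry (ARIN, RIPE, APNIC, LACNIC, AFRINIC).
import socket

def whois(query, server="whois.iana.org", port=43):
    # The whois protocol: send the query terminated by CRLF, read until EOF.
    with socket.create_connection((server, port), timeout=10) as s:
        s.sendall((query + "\r\n").encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

print(whois("8.8.8.8"))
```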

Elasticsearch / Kibana: Application-side joins

Is it possible with Kibana (preferably the shiny new version 4 beta) to perform application-side joins?
I know that ES / Kibana is not built to replace relational databases and it is normally a better idea to denormalize my data. In this use case, however, that is not the best approach, since the index size is exploding and performance is dropping:
I'm indexing billions of documents containing session information of network flows like this: source ip, source port, destination ip, destination port, timestamp.
Now I also want to collect additional information for each IP address, such as geolocation, ASN, reverse DNS, etc. Adding this information to every single session document makes the whole database unmanageable: there are millions of documents with the same IP addresses, and the redundancy of adding the same additional information to all of them leads to massive bloat and an unresponsive user experience, even on a cluster with hundreds of gigabytes of RAM.
Instead I would like to create a separate index containing only unique IP addresses and the metadata I have collected for each one of them.
The question is: how can I still analyze my data using Kibana? For each document returned by the query, Kibana should perform a lookup in the IP index and "virtually enrich" each IP address with this information, something like adding virtual fields so the structure would look like this (on the fly):
source ip, source port, source country, source asn, source fqdn
I'm aware that this would come at the cost of multiple queries.
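To make the idea concrete, such an application-side join done outside Kibana might look roughly like the sketch below, assuming a 7.x-era Python elasticsearch client and hypothetical index and field names (sessions, ip-metadata, source_ip):

```python
# A sketch only: index names, field names and client version are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # hypothetical local cluster

# 1) Fetch a page of session documents
sessions = es.search(index="sessions",
                     body={"query": {"match_all": {}}, "size": 100})
hits = sessions["hits"]["hits"]

# 2) Collect the distinct IPs that need enrichment
ips = {h["_source"]["source_ip"] for h in hits}

# 3) One lookup against the metadata index for all of them
meta = es.search(index="ip-metadata",
                 body={"query": {"terms": {"ip": list(ips)}}, "size": len(ips)})
by_ip = {m["_source"]["ip"]: m["_source"] for m in meta["hits"]["hits"]}

# 4) "Virtually enrich" each session on the application side
for h in hits:
    doc = dict(h["_source"])
    doc.update(by_ip.get(doc["source_ip"], {}))
    print(doc)
```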
I don't think there is such a thing, but maybe you could play around with the filters:
1) You create nice and simple visualisations that each filter on a different type and display only one simple piece of data.
2) You put these different visualizations in a dashboard in order to display all the data associated with a type of join.
3) You use the filters as your join key and use the full dashboard, composed of different panels, to get insights about specific join keys (IPs in your case, or sessions).
You need to create 1 dashboard for every type of join that you want to make.
Note that you will need to harmonize the names and mappings of the fields in your different documents!
Keep us updated, that's an interesting problem, I would like to know how it turns out with so many documents.

Liaison Delta ECS Batching

Does anyone know if Liaison's ECS product has the capability to aggregate messages together based on filter criteria and process them after all of the messages have been received?
I need to listen for flat-file data messages, read from each flat file the order number and an integer that indicates how many flat files will be generated for that order number, and then, once all of the flat files for an order have been received, map them into a single outbound EDI transaction message (one outbound message per order number).
This is a standard aggregation pattern; please note I am not asking about EDI batching, which is something different.
Is this something that can be done using ECS functionality, or does an external batching system need to be created?
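In case an external batcher turns out to be the answer, the aggregation logic itself is small; here is a minimal, ECS-agnostic sketch in Python, where the field names and the "all files received" rule are assumptions taken from the description above:

```python
# A sketch only: buffers flat files per order number until the announced count
# has arrived, then hands the whole batch to an EDI mapping step.
from collections import defaultdict

pending = defaultdict(list)   # order_number -> flat-file payloads received so far
expected = {}                 # order_number -> total files announced in each file

def on_flat_file(order_number, total_files, payload):
    pending[order_number].append(payload)
    expected[order_number] = total_files
    if len(pending[order_number]) == expected[order_number]:
        batch = pending.pop(order_number)
        expected.pop(order_number)
        emit_edi(order_number, batch)

def emit_edi(order_number, batch):
    # Placeholder for the actual map to a single outbound EDI transaction.
    print(f"order {order_number}: {len(batch)} files -> one EDI message")
```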
