Find high-volume log messages in Sumo Logic

I am trying to reduce Sumo Logic logging costs for a microservice I run.
My specific goal is to determine which log statements are costing the most money (or, to put it in more easily query-able terms, which log statements are being emitted most frequently by my service, so I can analyze whether they're really adding enough value to be worth ingesting into Sumo).
I've been digging around the Log Operators Cheat Sheet trying to figure this out, and the closest thing I can find seems to be the count_frequent function, but I can't figure out how to use it in a query.
Google has not been of much use in this case (or at least, I haven't really come up with a good search string that Google can latch onto and get me what I'm looking for).
Any help would be appreciated.
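For reference, the closest I've gotten is something along these lines (the source category is a placeholder, and I suspect counting on the raw message isn't quite right, since timestamps make every raw line unique):

_sourceCategory=prod/my-microservice
| count_frequent _raw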

Related

Is there a data model, or principle, for structuring data for scripting, rather than the human eye?

I spend a lot of time writing scripts to automate Google Sheets using Google Apps Script.
Google Apps Script can get relatively slow when the size of your data gets very big, spanning multiple sheets within multiple workbooks. I've spent my fair share of time refactoring my code as much as possible to stay within Apps Script's 6-minute execution quota (excluding the cache service).
I believe the root of the problem is the way data is organized on the sheets. An example of this is splitting 1 column into 5 columns for readability, even though the data is handled entirely by automation. I'm sure there are other factors, such as validating data once it's processed into information and making sure there are no bugs, but beyond that, code doesn't care about what the data looks like.
Additionally, I don't think this is a matter of the XY Problem. The best way, overall, to make Google Apps Script more efficient is to use less of it.
I'm not well versed in data science in general, so I could definitely be wrong, but nonetheless: is this an existing principle I can learn more about? I've tried searching for this already, but it's hard to look for what you don't know.
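To make the contrast concrete, this is the kind of machine-oriented access pattern I have in mind (sheet and column names are made up): one flat table read in a single bulk call, rather than one call per cell or per decorative column.

// Hypothetical sheet "Data" holding one flat table: id | timestamp | value
function summarizeData() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Data');
  var rows = sheet.getDataRange().getValues(); // one bulk read, not one call per cell
  var total = 0;
  for (var i = 1; i < rows.length; i++) { // skip the header row
    total += Number(rows[i][2]);          // the "value" column
  }
  Logger.log('Total value: ' + total);
}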

Rails profiling time taken for each line executed while processing a request

Is there a better way to find the time taken by each piece of logic executed while processing an HTTP request? We need the profiling broken down individually, rather than grouped with a single time reported for multiple things. We tried New Relic and AppSignal to get this data, but no luck.
I have added a screenshot too. There we can see that ActiveRecord took around 1.5 seconds and view rendering (the response) took around 300 ms. When we sum these two it's around 2 seconds, but the total time taken for the request is 10 seconds. We are not able to find where the remaining 8 seconds go. New Relic says 90% of the time is spent in the controller action, but gives no breakdown of it. Are there any better tools to get more detailed information?
Note: most of the time it works fine; the issue only shows up at specific times. We don't know what is causing the slowness, and that is exactly what we are hoping a tool will help us identify.
AppSignal can help you with this; here is an example of an event timeline AppSignal provides when you are looking at the performance of a specific action.
This is provided out of the box for Rails. There will be some cases where you want to narrow down even further; that's where custom instrumentation comes in. It helps you find the specific piece of code that is causing performance problems.
You can reach out to us at AppSignal and I will be happy to help.
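To give a rough idea of what custom instrumentation looks like (the event name and ExchangeRateClient are made up; this assumes the appsignal gem's Appsignal.instrument block helper):

# Wrap a suspect piece of code so it shows up as its own event on the timeline.
def convert_currency(amount)
  Appsignal.instrument("fetch_rates.external_api") do
    ExchangeRateClient.fetch(amount) # the slow call you want to see broken out
  end
end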
There may be tooling options that give you a little more information out of the box. But if you're really trying to identify bottlenecks, you may want to take things into your own hands and do some good old-fashioned debugging. Use Ruby's built-in Benchmark module in your controller actions and log the results; this may help you identify which sections of code are chewing up response time.
https://ruby-doc.org/stdlib-3.0.0/libdoc/benchmark/rdoc/Benchmark.html
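A rough sketch of that idea (the controller, model, and log format are made up, so adapt them to your own actions):

require "benchmark"

class OrdersController < ApplicationController
  def index
    db_time = Benchmark.realtime do
      @orders = Order.includes(:items).where(status: "open").to_a
    end

    serialize_time = Benchmark.realtime do
      @payload = @orders.as_json(include: :items)
    end

    Rails.logger.info(
      "orders#index timings db=#{(db_time * 1000).round}ms serialize=#{(serialize_time * 1000).round}ms"
    )
    render json: @payload
  end
end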

Geodata Querying Optimisations

I am planning to write a Node.js-powered RESTful web service that I will use for a mobile application which provides some sort of location based features. The most basic use case is going to look something like this:
the user can create a resource by sending a request to the web service containing the resource's name and the user's current location (latitude and longitude)
the web service will store the metadata about this resource internally in some sort of collection
the user can query the web service for a list of resources within 5km of his current location
One of the first problems that came to mind was scalability. Let's suppose that at some point in the future the server will hold metadata for 1 million resources. When a user queries for nearby results, looping through 1 million entries to compute distances will take forever.
There are many services out there that have the same flow, so I thought implementing something like this is not going to take me a lot of time. I might have been wrong.
I am now two days into researching proven methods and algorithms. By now I have read everything I could get my hands on about QuadTrees, Geohashes, databases with spatial indexing support, formulas, and so on. However, I still can't get the whole picture of how everything is going to work together.
I was hoping that maybe someone who has worked on something similar could share his insight on what approach might be the most suitable considering this use case and the technologies that I am planning to use. Also, a short description of how it can be implemented would help me a lot!
For those who are also looking for more information on this topic out of curiosity, my answer might not provide much clarity. However, some of the answers here might help you understand how you could achieve proximity searches using Geohashes.
My approach, after doing a little research on Redis, will be not to overcomplicate things and just use the tools that are already out there. Redis has out-of-the-box support for spatial indexing and will most probably meet all my persistence requirements for this project.
Apparently MongoDB also comes with built-in support for geodata. In fact, even RDBMS like MySQL or SQLite do come with such capabilities.
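To give an idea of what this looks like with Redis's geo commands (key names, members, and coordinates are made up; GEOSEARCH is the Redis 6.2+ syntax):

GEOADD resources 13.361389 38.115556 resource:42
GEOADD resources 13.351389 38.125556 resource:99

GEOSEARCH resources FROMLONLAT 13.361389 38.115556 BYRADIUS 5 km ASC WITHDIST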

How do "Indeed.com" and "hotelscombined.com" search other sites?

I'm trying to build a vertical (meta) search engine for a particular industry, something similar to "indeed.com" (job search engine) and "hotelscombined.com" (hotel search engine). I would like to know how these two search engines build up their search results.
1) Is it done using the APIs of the other websites they serve results from? (Seems odd to me, since some results come from small and primitive sites.)
2) Do the other websites push updates to these search engines? (Also seems odd, as above.)
3) Do they internally understand and create a map for each website they serve results from? (If so, they probably need to constantly monitor the structure of these sites for any changes. Seems error-prone to me.)
4) Any other possibilities?
I don't even know where to start, so any pointers in the right direction are much appreciated (books, tutorials, hints, ideas...).
Thanks
It is mostly a mix of 1 and 3. Ideally, the site will have some sort of API they expose and document. If not, you have to do data scraping. Basically, you reverse-engineer their pages. If they get results asynchronously via an undocumented API, you can use that API as well (at least until they make a breaking change). Otherwise, it's simply a matter of pulling the text straight out of the HTML.
I don't know of any more advanced techniques since I don't do this myself, but several of my acquaintances have gone on to work on mobile apps that need to do this sort of thing with sports scores and such (not for searching, but same requirements: get someone else's data into our database). The low-tech "pull it from the HTML until they change the HTML and break everything" approach is standard practice where they work.
2 is possible, but to do it you have to either make business arrangements with every source of data you want to use, or gain enough market presence for everyone to want to upload their data.
Also, you don't do this while actually searching (unless you have other constraints, as Charles Duffy points out in his comment). You run a process that regularly goes out, gets all the data it can find, and inserts it into your own database, which you then search. This lets you decouple data gathering from data searching: your search page won't have to know about and handle errors from the scraper, and the scraper only has to "get all the data" from each source instead of having to translate queries from your site into searches against each source.
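A bare-bones sketch of that gathering process (the URLs, the markup being matched, and the saveListing callback are all made up):

// A scheduler (cron, etc.) runs this regularly; the search page only ever
// queries your own database.
const SOURCES = [
  'https://example-jobs-a.test/listings.html',
  'https://example-jobs-b.test/listings.html',
];

async function gather(saveListing) {
  for (const url of SOURCES) {
    try {
      const html = await (await fetch(url)).text();
      // Crude extraction straight out of the HTML; breaks when the markup changes.
      for (const match of html.matchAll(/<h2 class="job-title">(.*?)<\/h2>/g)) {
        await saveListing({ source: url, title: match[1] });
      }
    } catch (err) {
      // Errors stay inside the gathering process; the search page never sees them.
      console.error('failed to scrape ' + url + ': ' + err.message);
    }
  }
}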

Elastic Search - Get the matching field

I'm using ElasticSearch to implement search on a web app (Rails + Tire). When querying the ES server, is there a way to know which field of the returned JSON matched the query?
The simplest way is to use the highlight feature, see support in Tire: https://github.com/karmi/tire/blob/master/test/integration/highlight_test.rb.
Do not use the Explain API for anything other than debugging purposes, as it will negatively affect performance.
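For illustration, a raw search request with highlighting enabled looks roughly like this (index and field names are made up); each hit in the response then carries a highlight section listing the fields that matched:

POST /articles/_search
{
  "query": {
    "multi_match": { "query": "quick brown fox", "fields": ["title", "body"] }
  },
  "highlight": {
    "fields": { "title": {}, "body": {} }
  }
}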
Have you tried using the Explain API from Elasticsearch? The output of explain gives you a detailed explanation of why a document was matched, and its relevance score.
The algorithm(s) used for searching the records are often much more complex than a single string match. Also, given that a term can match multiple fields (with possibly different weights), it may not be easy to come up with a simple answer. But by looking at the output of the Explain API, you should be able to construct a meaningful message.
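For completeness, an explain request against a single document looks roughly like this in recent Elasticsearch versions (the index name and document id are made up; older versions used the /index/type/id/_explain form):

GET /articles/_explain/1
{
  "query": { "match": { "body": "quick brown fox" } }
}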
