When inserting new data into KairosDB, why is it necessary to include a tag for every datapoint? I really only need to query datapoints by their metric name.
It appears to be a design decision. The creators of KairosDB would probably give you an explanation if you posted the question on the KairosDB discussion group, since they are very active there but not on Stack Overflow.
On the other hand, I don't see why you wouldn't need tags. Maybe your metrics would benefit from more normalization and from using tags (e.g. by data source, host, etc.) to do useful queries.
What kind of data are you storing?
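If the data really has no natural tags, one workaround is to attach a single constant tag to every datapoint. The sketch below only builds the JSON body for KairosDB's REST endpoint POST /api/v1/datapoints; the helper name, the metric name and the fallback tag "source": "default" are assumptions made for illustration.

```python
import json


def make_datapoint(metric, timestamp_ms, value, tags=None):
    """Build one entry for the body of POST /api/v1/datapoints."""
    # KairosDB requires at least one tag per datapoint, so fall back to a
    # single catch-all tag (its name and value are assumptions).
    return {
        "name": metric,
        "timestamp": timestamp_ms,  # milliseconds since the epoch
        "value": value,
        "tags": tags or {"source": "default"},
    }


payload = json.dumps([make_datapoint("cpu.load", 1700000000000, 0.42)])
print(payload)
```

Queries by metric name alone still work, and a real tag such as host can be added later without changing the metric names.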
I have a website I am developing that will be deployed to several different clients. All of the functionality is the same and the vast majority of the language used is the same. However, some of the clients are in different industries, so specific words and phrases within some pages need to change based on the company of the individual logged into the site. What is the best way to accomplish this?
In the past I have seen people use string database tables but that seems rather cumbersome. I thought about using localization but I don't want another developer to get confused because it isn't a change in spoken languages.
For this you can use something like a word list. I don't know whether a word list is a well-known concept, so let me try to explain it.
In one database table, store the information that distinguishes each login by company. In another table, map each company to the words you want to use in place of the corresponding English (or default) words.
I am assuming that these words do not change very often, so on application start you can load them into a convenient in-memory data structure.
All the text you want to process then goes through a word list processor, which is simply code that identifies the group the login belongs to and the words to be replaced. It replaces those words for the appropriate group and returns the transformed text, which you can display in the UI.
So here the advantage is, once the data is loaded into the memory data structure, you don't need to read the values from your DB.
Moreover, if the word lists change, or if you want to let users change the words according to their preference, you can modify the in-memory data structure directly and then write the change back to the DB asynchronously.
Also, since the mapping is looked up directly in memory, it's faster than DB calls.
And since it's just code, typically a method, it's entirely up to you which text to process and which to ignore.
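A minimal sketch of such a word list processor might look like the following. It is an illustration only, not the code from our application: the group names, the word pairs and the function name are all hypothetical, and the dictionary stands in for whatever is loaded from the database at startup.

```python
import re

# Hypothetical in-memory word lists, keyed by client group. In a real
# application these would be loaded from the database on startup.
WORD_LISTS = {
    "healthcare": {"customer": "patient", "order": "appointment"},
    "education": {"customer": "student"},
}


def apply_word_list(text, group):
    """Replace default words in text with the group's preferred words."""
    words = WORD_LISTS.get(group, {})
    if not words:
        return text  # unknown group: keep the default wording
    # Match whole words only, so "customer" does not match inside "customers".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")
    return pattern.sub(lambda match: words[match.group(1)], text)


print(apply_word_list("Each customer has one order.", "healthcare"))
# prints "Each patient has one appointment."
```

The match is case-sensitive and ignores plurals; handling those would need extra entries or a smarter lookup.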
This is a technique we used in our application when we had a similar requirement. I hope this suggestion helps!
Better alternatives and suggestions are always welcome since we would also want to improve our solution to this problem. Thanks.
I've asked a couple of questions around this subject recently, and I think I'm managing to narrow down what I need to do.
I am attempting to create some "metrics" (quotes because these should not be confused with metrics relating to the performance of the application; these are metrics that are generated based on application data) in a Rails app; essentially I would like to be able to use something similar to the following in my view:
@metric(@customer,'total_profit','01-01-2011','31-12-2011').result
This would give the total profit for the given customer for 2011.
I can, of course, create a metric model with a custom result method, but I am confused about the best way to go about creating the custom metrics (e.g. total_profit, total_revenue, etc.) in such a way that they are easily extensible so that custom metrics can be added on a per-user basis.
My initial thoughts were to attempt to store the formula for each custom metric in a structure with operand, operation and operation_type models, but this quickly got very messy and verbose, and was proving very hard to do in terms of adding each metric.
My thoughts now are that perhaps I could create a custom metrics helper method that would hold each of my metrics (thus I could just hard code each one, and pass variables to each method), but how extensible would this be? This option doesn't seem very rails-esque.
Can anyone suggest a better alternative for approaching this problem?
EDIT: The answer below is a good one in that it keeps things very simple, though I'm concerned it may be fraught with danger, as it uses eval (thus there is no prospect of ever using user code). Is there another option for doing this? My previous option, where operands etc. were broken down into chunks, used a combination of constantize and instance_variable_get. Is there a way these could be used to make the execution of a string safer?
This question was largely answered with some discussion here: Rails - Scalable calculation model.
For anyone who comes across this, the solution is essentially to ensure an operation always has two operands, but an operand can either be an attribute, or the result of a previous calculation (i.e. it can be a metric itself), and it is thus highly scalable. This avoids the need to eval anything, and thus avoids the potential security holes that this entails.
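The structure described above can be sketched as a small expression tree. The sketch is in Python rather than Rails, and the class names, attribute names and figures are hypothetical; it only illustrates the idea of two operands per operation, where an operand is either an attribute or another metric.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Only whitelisted operations can run, so nothing is ever eval'd.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


@dataclass
class Attribute:
    """Leaf operand: reads a named value from a record."""
    name: str

    def result(self, record):
        return record[self.name]


@dataclass
class Metric:
    """One operation over exactly two operands (attributes or metrics)."""
    left: Union["Attribute", "Metric"]
    op: str
    right: Union["Attribute", "Metric"]

    def result(self, record):
        func = OPERATIONS[self.op]
        return func(self.left.result(record), self.right.result(record))


total_profit = Metric(Attribute("revenue"), "-", Attribute("costs"))
profit_margin = Metric(total_profit, "/", Attribute("revenue"))

record = {"revenue": 200.0, "costs": 150.0}
print(total_profit.result(record))   # 50.0
print(profit_margin.result(record))  # 0.25
```

Because each node holds only an operator symbol and two operands, user-defined metrics can be stored as rows and rebuilt into this tree without executing any user-supplied code.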
I need to explain the practical problems that might be encountered when transforming transactional (and other) data from diverse sources into the Data Warehouse. According to my knowledge this is about cleansing and scrubbing data. If anyone knows of any practical problems, please help me. Thanks for your help.
That's a broad topic, but I'll offer a few good starting points.
For starters, think about history. If a transaction updates some data point, do you need to apply that retroactively, or do you need to remember what the value was at any given point in time? For example, suppose you have a monthly report of customers by city, and one of your customers moves. How should the DW reflect that?
Think about data acceptance. Is every input row a good input? For example, if you're dealing with web data, there are crawlers and spammers that you might not want to count the same as you count user traffic.
Think about data synchronization. Do all your inputs use the same keys? Do you know how to translate between them? Does Team A mean the same thing by "cust_id" as Team B does? A project glossary is very helpful here.
Think about localization. Are your inputs all in the same time zone? Do they all use the same calendar system? Do you need to handle Unicode?
Think about reporting. Are the data you're capturing able to answer the questions people will ask of the DW? If not, how can you capture data that can?
Think about presentation. Should you be showing customers the same data you're using for internal reporting? Does finance need to see a different slice of the data than marketing?
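To make the history question concrete, here is a minimal sketch of keeping every value with the date it took effect, so a report can ask what the city was on a given day. The data, the cities and the function name are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history for one customer: (valid_from, city), sorted by date.
CITY_HISTORY = [
    (date(2010, 1, 1), "Boston"),
    (date(2011, 6, 15), "Denver"),
]


def city_as_of(history, day):
    """Return the city in effect on the given day, or None if none yet."""
    starts = [valid_from for valid_from, _ in history]
    # Index of the last entry whose valid_from is on or before the day.
    index = bisect_right(starts, day) - 1
    return history[index][1] if index >= 0 else None


print(city_as_of(CITY_HISTORY, date(2011, 3, 31)))  # Boston
print(city_as_of(CITY_HISTORY, date(2011, 6, 30)))  # Denver
```

With this layout the March report keeps counting the customer under the old city even after the move, which is one possible answer to the question; overwriting the value in place is the other.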
This really only scratches the surface of the issues that come up on a major DW project. I would refer you to Ralph Kimball's assorted books on Data Warehousing for a more in-depth discussion of problems and solutions. Hope this helps you get started.
You give the answer in your question.
According to my knowledge this is about cleansing and scrubbing data.
And you are correct. Cleansing data means that you have a company-wide list of clean element attributes, and a mapping that changes the unclean elements into clean elements.
Processing the data against the clean element attributes is a piece of cake compared to creating the company-wide list of clean element attributes.
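As a rough illustration of how small the technical side is, the sketch below cleans one attribute against an agreed list. The clean values, the mapping entries and the function name are made up for the example.

```python
# Hypothetical company-wide list of clean values for one attribute,
# plus a mapping from known unclean spellings onto that list.
CLEAN_STATES = {"CA", "NY"}
STATE_MAPPING = {
    "calif.": "CA",
    "california": "CA",
    "new york": "NY",
    "n.y.": "NY",
}


def clean_state(raw):
    """Return the clean value for raw, or None if it needs manual review."""
    value = raw.strip()
    if value.upper() in CLEAN_STATES:
        return value.upper()
    return STATE_MAPPING.get(value.lower())


print(clean_state(" California "))  # CA
print(clean_state("ny"))            # NY
print(clean_state("Kalifornia"))    # None
```

The code is a dictionary lookup; the effort goes into agreeing on what belongs in the two tables.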
You have to get people from different departments to agree on what data to warehouse, and to agree on what each element means. This is a difficult sociological problem. It's not a terribly hard technical problem.
Good luck getting your company-wide list of clean element attributes.
Sorry if the question sounds weird, but I don't really know how else to put it.
Essentially, my application will have a bunch of objects. Each object has some kind of post/comment structure. The unique thing, though, is that it is more or less static, so I figured it would make no sense to put every single post and comment into my database, because that would cause more database load. Instead, I was thinking about storing the JSON representation of the post with its comments, thus causing only one database access per object. I would then render the JSON object in the controller or view or something. Is this a valid solution?
No!
You lose all ability to query that data, for no benefit unless you are at massive scale. The database's job is to pull that stuff out for you efficiently, and if you create the proper indexes and implement the proper caching strategies, you shouldn't have any issues with database load. You want to replace all the goodness of the Rails ORM with your own decidedly less useful version in the interest of a speed gain, way before you need it.
What if later you want to do a most popular comments sidebar widget? Or you want to page through the comments, regardless of the post they are associated with, in a table for moderation? What if you want your data to be searchable?
Don't sacrifice your ability to easily query and manipulate the data for premature optimization.
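As a small illustration of what stays easy when comments are ordinary rows, here is a sketch using Python's built-in sqlite3 module instead of Rails. The table layout, the votes column and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES posts(id),
        body TEXT,
        votes INTEGER DEFAULT 0
    );
    CREATE INDEX comments_by_votes ON comments (votes);
""")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(1, "First"), (2, "Second")],
)
conn.executemany(
    "INSERT INTO comments (post_id, body, votes) VALUES (?, ?, ?)",
    [(1, "nice", 3), (2, "agreed", 7), (1, "meh", 1)],
)

# "Most popular comments" across all posts is a single query.
top = conn.execute(
    "SELECT body, votes FROM comments ORDER BY votes DESC LIMIT 2"
).fetchall()
print(top)  # [('agreed', 7), ('nice', 3)]
```

With one JSON blob per post, the same widget would mean loading and parsing every post.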
It sounds like a good idea, but I don't think it will work in the long run. Think about what happens when you have many comments on your posts: you will have to fetch the long string from the database, add the new comment to it, and then write it back to the database. This is very inefficient compared to just inserting one more row into a comments table.
Also, think about what happens if at some point you have to give users the option to edit a comment. Finding that particular comment in the long string and then updating it will be a nightmare, don't you think?
In general you want to use JSON and the like as a bit of a last resort. Storing JSON in the db makes sense if your information isn't necessarily known ahead of time. It is not a substitute for proper data modelling and is not a win performance-wise.
To give you an idea where I am looking at using it in a project, in LedgerSMB we want to be able to have consultants track additional information on some db objects. Because we don't know what it will be in advance JSON makes a lot of sense. We don't expect to be searching on the data or support searches on the data but if we did that could be arranged using plv8js.
For example: http://stackoverflow.com/questions/396164/exposing-database-ids-security-risk and http://stackoverflow.com/questions/396164/blah-blah load the same question.
(I guess this is DB id of Questions table? Is this standard in ASP.NET?)
What are the pros and cons of using this type of scheme in your web app?
Well, for one, simple id's are usually sequential, so it's quite easy to guess at and retrieve other data from your application.
https://stackoverflow.com/questions/395858/doesnt-matter-what-I-type-here
Now, having said that, that might also be seen as a bonus, because nobody in their right mind would make their whole security hinge on the fact that you have to click on a link to get to your secure data, and thus easy discoverability of the data might be good.
However, one point is that at some point you're going to reindex your database, and having something that makes the old URLs invalid would be bad, if for no other reason than that search engines would still have the old links.
Also, here on SO it's quite normal to use links like this to other questions, so if they at some point want to reindex and thus renumber things (or move to guid's), they will still have to keep the old structure and id's.
Now, is this likely to ever happen or be needed? Probably not.
I wouldn't worry too much about it, just build your security as though every entrypoint to your application is known and there should be no problems.
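A minimal sketch of what that means in practice: the handler checks ownership on every lookup, so guessing a neighbouring ID gains nothing. The records, user names and exception class are hypothetical.

```python
# Hypothetical in-memory records; each one names its owner.
RECORDS = {
    101: {"owner": "alice", "body": "alice's invoice"},
    102: {"owner": "bob", "body": "bob's invoice"},
}


class Forbidden(Exception):
    """Raised when the current user may not see the requested record."""


def fetch_record(record_id, current_user):
    record = RECORDS.get(record_id)
    # Treat "missing" and "not yours" the same way, so the response does
    # not reveal which IDs exist.
    if record is None or record["owner"] != current_user:
        raise Forbidden(record_id)
    return record


print(fetch_record(101, "alice")["body"])  # alice's invoice
try:
    fetch_record(102, "alice")  # guessed the next sequential ID
except Forbidden:
    print("denied")
```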
The database ID is used to look up the question in the database. It's numerical, which means fast. If you left it out, you would have to look up the title, which is a lot slower.
The question title itself is part of the URL to make it "search engine friendly". It'll be ranked higher by Google etc.
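A small sketch of this routing idea: only the numeric part of the path is used for the lookup, and whatever text follows it is ignored. The regular expression and the function name are assumptions, not how Stack Overflow is actually implemented.

```python
import re
from urllib.parse import urlparse

# /questions/<id> followed by an optional slug that is ignored for lookup.
QUESTION_PATH = re.compile(r"^/questions/(\d+)(?:/[^/]*)?/?$")


def question_id(url):
    """Return the numeric question ID from the URL, or None if absent."""
    match = QUESTION_PATH.match(urlparse(url).path)
    return int(match.group(1)) if match else None


print(question_id("http://stackoverflow.com/questions/396164/exposing-database-ids-security-risk"))
print(question_id("http://stackoverflow.com/questions/396164/blah-blah"))
# both print 396164
```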
Pro:
Super easy to retrieve the page information. Take the ID, call the database, voilà. Your table will (should) be indexed to make this lookup super fast.
Guaranteed unique URL.
Con:
IDs in your system are being publicly displayed. Not a problem in a publicly available system like SO. However, proper security measures on the back end can make this not a problem even on sensitive systems.
Ugly URLs. 6+ digit numbers are hard to remember and make it more difficult to distinguish pages if the number is all that identifies them. This can also have SEO consequences, as URLs with more relevant and well-structured information are generally ranked better. SO compensates by providing the post name in the URL as well. While I still can't rattle off a particular post to my buddy at lunch, I can find it more easily in the browser history.
Slower lookups. Doing text searches on a database is generally slower.
But remember that in a community like this there is a higher (although still minimal) chance of the same question title being posted at the same time, which would break things. Some kind of unique identification is therefore needed, and IDs are probably quite logical in the context in which this particular web application was developed.
I don't think it's bad practice, and it's fairly common in ASP.NET and other frameworks. As @lassevk said, if your security depends on it, then you need some more checks in there (can user X get to record Y), but it comes down more to the SEO-friendliness of the URLs for public sites.
For example, SO's URLs are fairly friendly:
https://stackoverflow.com/questions/407120/pros-and-cons-of-using-db-id-in-the-url
Google rates information at the START of the URL higher than at the end, so having it look like:
https://stackoverflow.com/pros-and-cons-of-using-db-id-in-the-url/q/407120
should get a higher ranking for "pros and cons of using db id in the url". It's not the only factor, but it is quite a major one - look at Amazon's format, they do it for a very good reason:
http://www.amazon.com/Maverick-Ricardo-Semler/dp/0712678867
http://server/book-name/dp/book-id
Wordpress does it like this:
http://server/yyyy/mm/dd/name-of-the-post
however, if you post two posts on the same day called "foo", you get:
http://server/yyyy/mm/dd/foo
http://server/yyyy/mm/dd/foo2
the slug (foo/foo2) isn't a PK, but it IS maintained as unique over the posts table.
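A rough sketch of keeping a slug unique in this way is shown below. It follows the foo/foo2 pattern from the example above rather than WordPress's actual code, and both function names are hypothetical.

```python
import re


def slugify(title):
    """Lower-case the title and join runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def unique_slug(title, existing):
    """Return a slug for title that is not already in existing."""
    base = slugify(title)
    slug, suffix = base, 2
    while slug in existing:
        slug = f"{base}{suffix}"
        suffix += 1
    return slug


taken = set()
for title in ("Foo", "Foo", "Name of the post"):
    slug = unique_slug(title, taken)
    taken.add(slug)
    print(slug)
# prints foo, foo2, name-of-the-post
```

In a real posts table a unique index on the slug column would enforce the same rule at the database level.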
I think putting the ID in the URL isn't a problem, unless your URL is a GUID! Way too long, and hard to type. If it's an int, or some kind of short GUID (e.g. 6-8 chars), then it shouldn't be a problem.