I'm finding it difficult to find any discussion on best practices for dealing with multiple currencies. Can anyone provide some insight or links to help?
I understand there are a number of ways to do this: either transactionally, where you store the value entered as is, or functionally, where you convert to a base rate. In both cases you need to store the exchange rate that applied at the transaction's time for each currency it may need to be converted to in the future.
I like the flexibility of the transactional approach, which allows old exchange rate info to be entered at a later date, but probably has more overhead (as you have to store more exchange rate data) than the functional approach.
Performance & Scalability are major factors. We have (all .net) a win & web client, a reports suite and a set of web services that provide functionality to a database back-end. I can cache the exchange rate information somewhere (e.g. on client) if required.
EDIT: I would really like links to some documents, or answers that include 'gotchas' from previous experience.
I couldn't find any definitive discussion, so I'm posting my findings; I hope they help someone.
The currency table should include the culture code to make use of any Globalisation Classes.
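As a small illustration, here is a minimal sketch of how a stored culture code could feed the .NET globalisation classes when formatting an amount; the culture code value used here is just an example, not a column from the question:

```csharp
using System;
using System.Globalization;

class CurrencyFormattingExample
{
    static void Main()
    {
        // Culture code as it might be stored in the currency table (the value is assumed).
        string cultureCode = "fr-FR";
        decimal amount = 1234.56m;

        var culture = CultureInfo.GetCultureInfo(cultureCode);

        // "C" applies the culture's currency symbol, separators and default decimal places.
        Console.WriteLine(amount.ToString("C", culture));   // e.g. "1 234,56 €"
    }
}
```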
Transactional Method
Store in currency local to customer and store multiple conversion rates for the transaction currency that applied when the transaction occurred.
Requires multiple exchange rates for each currency
Site Settings table would store the input currency
Input & Output of values at client level would have no overhead as it can be assumed the value is in the correct currency
To apply exchange rates, you would need to know the currency of the entered values (which may be different for cross-client reports), then multiply this by its associated entity exchange rate that was valid during the transaction's time period.
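To make that reporting step concrete, here is a hedged sketch of applying a stored rate that was valid at the transaction's time; the record and method names are illustrative assumptions, not an existing API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical exchange rate row: a rate between two currencies, valid over a date range.
public record ExchangeRate(string FromCurrency, string ToCurrency,
                           DateTime ValidFrom, DateTime ValidTo, decimal Rate);

public static class TransactionalConversion
{
    // Find the rate whose validity period covers the transaction date, then convert.
    public static decimal ConvertForReport(decimal amount, string fromCurrency, string toCurrency,
                                           DateTime transactionDate, IEnumerable<ExchangeRate> rates)
    {
        if (fromCurrency == toCurrency) return amount;

        var rate = rates.FirstOrDefault(r =>
            r.FromCurrency == fromCurrency &&
            r.ToCurrency == toCurrency &&
            r.ValidFrom <= transactionDate && transactionDate <= r.ValidTo);

        if (rate is null)
            throw new InvalidOperationException(
                $"No {fromCurrency} -> {toCurrency} rate covers {transactionDate:d}");

        return amount * rate.Rate;
    }
}
```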
Functional Method
Store in one base currency, hold conversion rates for this currency that apply over time
Consideration needs to be given to which point between the front end and the database is the best place to convert values
Input performance is marginally affected as a conversion to the base currency would need to take place. Exchange rate could be cached on the client (note each entity may use a different exchange rate)
This requires only one set of exchange rates (from base to all other required currencies)
To apply exchange rates, every transaction would need to be converted between the base and required currencies
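By contrast, here is a minimal sketch of the functional method's input-side conversion, assuming a single base currency; the base currency and the source of the current rate are assumptions for the example:

```csharp
using System;

public static class FunctionalConversion
{
    const string BaseCurrency = "GBP";   // assumed base currency for the example

    // Convert an entered amount to the base currency at input time, keeping the rate
    // that was used so the stored figure can be audited or re-derived later.
    public static (decimal BaseAmount, decimal RateUsed) ToBase(
        decimal enteredAmount, string enteredCurrency, decimal currentRateToBase)
    {
        if (enteredCurrency == BaseCurrency) return (enteredAmount, 1m);
        return (enteredAmount * currentRateToBase, currentRateToBase);
    }
}
```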
Composite
At point of transaction, store the transactional value and the functional value, so that no exchange rate information would need to be stored. (This would not be a suitable solution as it effectively restricts you to two currencies for any given value)
Comparison
Realistically, you have to choose between the functional and transactional methods. Both have their advantages & disadvantages.
The functional method does not need to store a local currency for each transaction, needs existing DB values to be converted to the base currency, only needs one set of exchange rates, and requires less storage, though it is slightly harder to implement and maintain.
The transactional method is much more flexible, though it does require more exchange rate information to be held, and each transaction needs to be associated with an input currency (though this can be applied to a group of customers rather than to each transaction). It would generally not affect code already in production, as local currencies would still be used at the local level, making this solution easy to implement and maintain. Obviously, though, any reports or values needing to be converted to a different currency would be affected.
In both cases, each transaction needs exchange rates, valid at the time of the transaction, for each currency it may be converted to. The functional method needs them at the point of transaction, whereas the transactional method allows more flexibility because past exchange rate data can be entered at any time (allowing any currency to be used); in other words, with the functional method you lose the ability to use other exchange rates.
Conclusion
A transactional method of currency management provides a flexible approach, avoids any negative impact on client performance, and requires no client code modification. A negative performance impact is likely in reports, which will all need rework if different currencies are required.
Each client site will need to store a currency reference that states what their input currency is.
It should be possible to get away with storing exchange rates at a high level (e.g. a group of customer sites etc), this will minimise the amount of data stored. Problems may occur if exchange rate information is required at a lower level.
There is no single answer, because it very much depends on the way a business handles the transactions in those currencies. Some companies use fairly sophisticated ways to manage foreign currencies. I suggest you read up on multi-currency accounting.
The main thing to do is to capture the data in the unit, value & date in which the business transaction is done without any conversion, or you risk losing something in translation.
For display & reporting, convert on demand, using either the original exchange rate, or any other exchange rate depending on the intent of the user.
Store & compute with values as the 'Decimal' (in C#) type - don't use float/double or you leave yourself vulnerable to rounding errors.
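A quick illustration of the rounding drift that warning is about, comparing double and decimal when summing a repeated amount:

```csharp
using System;

class RoundingExample
{
    static void Main()
    {
        double d = 0.0;
        decimal m = 0.0m;

        // Add ten cents a thousand times: the total should be exactly 100.00.
        for (int i = 0; i < 1000; i++)
        {
            d += 0.10;
            m += 0.10m;
        }

        Console.WriteLine(d);   // something like 99.9999999999986 (binary floating point drifts)
        Console.WriteLine(m);   // 100.00 (decimal stays exact for base-10 amounts)
    }
}
```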
For instance, the way I did a multi currency app in a previous life was:
Every day, the exchange rates for the day would be set and this got stored in a database and cached for conversion in the application.
All transactions would be captured as value + currency + date (i.e. no conversion)
Displaying the transaction in a user's currency was done on the fly. Make it clear this is not the transaction currency, but a display currency. This is similar to a credit card statement when you've gone on holiday: it shows the foreign transaction amount and then how much it ends up costing you in your native currency.
Our company deals with multi-currency accounting and budgeting. The solution we implemented is quite straightforward, and includes the following:
one currency table, with a few fields including the number of decimals to be considered for the currency (yes, some currencies have to be managed with 3 decimals ...) and an exchange rate value, which has no meaning other than being a 'proposed/default exchange rate' used when evaluating 'non-executed' or 'pending' financial transactions (see below)
In this currency table, one of the records has an exchange rate of 1. This is the main/pivot currency in our system
All financial transactions, or all operations with a financial dimension (what we call commitments in our language), are either sorted as 'pending' or 'executed':
Pending transactions are for example invoices that are expected to be received for a certain amount at a certain date. In our budget follow-up system, these amounts are always reevaluated according to the 'proposed/default exchange rate' available in the currency table.
Executed transactions are always saved with the execution date, amount, currency AND exchange rate, which has to be confirmed/typed in when entering the execution data.
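A hedged sketch of the kinds of records that answer describes; the type and field names are illustrative, not the poster's actual schema:

```csharp
using System;

// Currency row: code, number of decimals to keep, and a proposed/default exchange rate
// used only to revalue pending commitments. The pivot currency has DefaultRate = 1.
public record Currency(string Code, int Decimals, decimal DefaultRate);

// Pending commitment: no confirmed rate yet; revalued with the currency's default rate.
public record PendingCommitment(DateTime ExpectedDate, decimal Amount, string CurrencyCode);

// Executed commitment: the confirmed exchange rate is stored with the transaction itself.
public record ExecutedCommitment(DateTime ExecutionDate, decimal Amount, string CurrencyCode, decimal ExchangeRate);
```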
(I'm assuming you already know that you definitely shouldn't store currency data as float and why)
In my opinion, working with a single base currency might be easier; however, you should save the original amount, original currency, conversion rate, and base currency amount - otherwise your Accounting dept. might eat you alive, as they're likely to keep different currencies sort of separately.
Since exchange rates fluctuate, one approach is as you mentioned - store an "entered as is" amount that is not converted but display a companion field which is display only and shows the converted amount. In order to do the conversion, a table of exchange rates and their applicable date ranges would be required. If the size of this is small, caching on the client is an option. Otherwise, a remote call would be required in order to perform the conversion.
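If the rate table is small, a minimal sketch of a client-side cache along those lines might look like this; the shape of the rate rows and how they are loaded are assumptions for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Client-side cache of the exchange rate table, loaded once from the back end.
public class ExchangeRateCache
{
    private readonly List<(string From, string To, DateTime ValidFrom, DateTime ValidTo, decimal Rate)> _rates;

    public ExchangeRateCache(
        IEnumerable<(string From, string To, DateTime ValidFrom, DateTime ValidTo, decimal Rate)> rates)
        => _rates = rates.ToList();

    // Returns the display-only converted amount, or null when no cached rate covers the date,
    // in which case the caller would fall back to a remote call.
    public decimal? TryConvert(decimal amount, string from, string to, DateTime onDate)
    {
        if (from == to) return amount;

        foreach (var r in _rates)
        {
            if (r.From == from && r.To == to && r.ValidFrom <= onDate && onDate <= r.ValidTo)
                return amount * r.Rate;
        }
        return null;
    }
}
```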
This seems like a pretty common use-case. Let's say we have sensitive PII that we want to protect, such as SSNs. We mask that data using dynamic data masking in Snowflake. Now we have an engineer that is writing data transformations, and they need to join two tables using SSN. They don't have clearance to view the SSNs, but they can view the other information on both tables. I want the engineer to be able to join the two tables, and see all the combined unsecured data, while keeping the SSN secret from the engineer. I'm really not sure why Snowflake doesn't use real values for joins behind the scenes while refusing to return them in results. Is there a workaround?
One idea is to make the masking policy return a hash of the initial value. That has a couple of limitations. First, it is explicitly warned against in the Snowflake docs. Second, it requires runtime hashing of all the values, which slows down query execution seemingly needlessly. Third, there is the issue of hash collisions which could break joins. This could result in an engineer spending days working to track down a bug in their code, only to realize that the extra rows in their dataset are the result of a hash collision.
Another potential solution is using an external tokenization provider (docs). I don't understand this option well, but it appears that this would mean that I would need to store the actual values and their tokenized form with a third party service, then make an API call each time I wanted to use the values in a query. That seems less than ideal. I'd rather the solution be contained within Snowflake.
I'd love to hear any thoughts, thanks in advance.
If you care about database integrity and want to avoid errors: don't use SSNs as identifiers.
An SSN can be a property of a person, but don't use it as their primary key.
As the United States Social Security Administration says:
A 1990 OIG, HHS study indicated that 45% of organizations, both public and private, using SSNs make no effort to verify SSN accuracy. This leads to the real possibility that transfers of data from one organization to another could be inaccurate; computer matching of data between different organizations could be invalid; and innocent persons could be subjected to unwarranted intrusions into their privacy or improper changes in their benefits or services or even misidentified with serious results.
Also:
The SSN is the single most widely used record identifier for both government and the private sector, exerting a broad influence on the lives of most Americans. However, by itself, it is not a personal identifier because it lacks systematic assignment to every person and the means to authenticate a person's identity.
https://www.ssa.gov/history/reports/ssnreportc2.html
Instead you could create a unique id for each person within your database, and use that key for joins.
I am trying to answer questions posed by the business (the business generates revenue from multiple apps through a customer-pay model). The business is interested in the following:
new users (trend with respect to previous months)
daily active users
Day 1 retention
I came up with the dimensional model below:
Dimension: users, app, deviceid, useractions, plan, date
Fact: fact_activity(userid, appid, deviceid, actionid)
Actions could be: app installed, app launch, registered, completed purchase, postedcomments, playgame etc
The questions I have are:
Should the fact table contain action_type instead of actionid (to avoid a join with useractions)?
Definition of day 1 retention: number of apps installed / app launches the next day. How do I avoid counting a single user multiple times when they use multiple devices?
Would it be advisable to have device details in the user dimension, or keep them separate?
If I need to measure average session duration, should I use another fact at session level or tweak the activity fact?
Your questions are really unanswerable without significantly more information about your business processes, data definitions, etc. In effect, you are asking someone to design a dimensional model for you before they can answer your questions - which is obviously not going to happen.
However, I can give you some very generic pointers that may help you:
Dimensions
A Dimension describes an entity so if attributes can't be described as belonging to the same entity then they shouldn't be in the same dimension. In your case, I assume a Device and a User are not the same thing and therefore they need to be separate dimensions
Facts
You need to define your measures i.e. what precisely are the things you are going to want to aggregate (count, sum, avg, etc) and how are they defined/calculated.
For each measure, you also need to define its grain i.e. what is the minimum set of dimensions that uniquely identify it. Once you have the grain defined, if multiple measures have the same grain then they can be held in the same fact table and if they don't then they can't
We are building a data warehouse by consuming file feeds from different sources.
The file feeds are all denormalized/flattened (in the Transactions (fact) file, the Account attributes keep repeating in all the records).
Also, the account information changes often (the feed gives an as-is version of the data).
What is the best practice in this situation? Should the data warehouse have a star schema model (with the Account information as a slowly changing dimension and a Transaction fact)? Will re-normalizing make the ETL process complex?
In my company, whenever some input is denormalized, we normalize it and from there we proceed with loading our schemas (whatever your schema is).
The reason is that, being de-normalized, those inputs are difficult to check for inconsistencies (data quality). Apart from that, conforming all of your inputs to some standard allows your code to be more maintainable.
In our case, following the Kimball practices has been a total success, fact table, slow changing dimensions and all that jazz.
Hard to answer without such details as daily volume, latency threshold, resource availability, reporting requirements, platform and tool constraints, etc. A traditional ODS, where you import into and store a normalized structure before creating data marts from that, is great but not optimal for big data or real time analysis. A more modern approach, using a data lake in Hadoop or a virtualization layer, may not be feasible for your organization.
General Opinions:
1) Re-normalizing does seem unnecessary from both a complexity and performance standpoint unless you have some ongoing use for the normalized data store.
2) Whether or not you build a traditional star schema or a graph or whatever should be governed by the reporting requirements and tools, not the source data format. Those sources will change, btw.
3) "Transaction" does not sound like a fact to me. A purchase transaction, e.g., could feed a sales fact, an accumulating snapshot for a sales cycle, a funnel conversion fact, etc.
4) I'm not sure whether "Account" is a customer, or a balance account such as a credit card, online payment service, bank account, etc. They imply different SCD types. In any case, Google will be sufficient to get plenty of information about building those dimensions.
I am working on a project to implement an historian.
I can't really find a difference between an historian and a data warehouse.
Any details would be useful.
Data Historian
Data historians are groups of tables within a database that store historical information about a process or information system.
Data historians are used to keep historical data regarding a manufacturing system. This data can be changes in state of a data point, current values and summary data for these points. Usually this data comes from automated systems like PLCs, DCS or other process controlling system. However some historian data can be human entered.
There are several historians available for commercial use; however, the most common historians have tended to be custom developed. The commercial versions would be products like OsiSoft’s PI or GE’s Data Historian.
Some examples of data that could be stored in a data historian are items (or tags) like:
- Total products manufactured for the day
- Total defects created on a particular crew shift
- Current temperature of a motor on the production line
- Set point for the maximum allowable value being monitored by another tag
- Current speed of a conveyor
- Maximum flow rate of a pump over a period of time
- Human-entered marker showing a manual event occurred
- Total amount of a chemical added to a tank
These items are some of the important data tags that might be captured. However, once captured, the next step is the presentation or reporting of that data. This is where the work of analysis is of great importance. The date/time stamp of one tag can have a huge correlation to another tag or other tags. Carefully storing this in the historian's database is critical to good reporting.
The retrieval of data stored in a data historian is the slowest part of the system to be implemented. Many companies do a great job of putting data into a historian, but then do not go back and retrieve any of the data. Many times this author has gone into a site that claims to have a historian only to find that the data is “in there somewhere”, but has never had a report run against the data to validate the accuracy of the data.
The rule-of-thumb should be to provide feedback on any of the tags entered as soon as possible after storage into the historian. Reporting on the first few entries of a newly added tag is important, but ongoing review is important too. Once the data is incorporated into both a detailed listing and a summarized list the data can be reviewed for accuracy by operations personnel on a regular basis.
This regular review process by the operational personnel is very important. The finest data gathering systems that might historically archive millions of data points will be of little value to anyone if the data is not reviewed for accuracy by those that are experts in that information.
Data Warehouse
Data warehousing combines data from multiple, usually varied, sources into one comprehensive and easily manipulated database. Different methods can then be used by a company or organization to access this data for a wide range of purposes. Analysis can be performed to determine trends over time and to create plans based on this information. Smaller companies often use more limited formats to analyze more precise or smaller data sets, though warehousing can also utilize these methods.
Accessing Data Through Warehousing
Common methods for accessing systems of data warehousing include queries, reporting, and analysis. Because warehousing creates one database, the number of sources can be nearly limitless, provided the system can handle the volume. The final result, however, is homogeneous data, which can be more easily manipulated. System queries are used to gain information from a warehouse and this creates a report for analysis.
Uses for Data Warehouses
Companies commonly use data warehousing to analyze trends over time. They might use it to view day-to-day operations, but its primary function is often strategic planning based on long-term data overviews. From such reports, companies make business models, forecasts, and other projections. Because the data stored in data warehouses is intended to provide more overview-like reporting, it is routinely read-only.
I'm trying to figure out the best (or rather the most practical/efficient) way of doing something in my rails application. Basically what I have is an area where a user must enter some information in several form fields which I currently have DB columns for (income for example).
With that information obtained from the user, some calculations need to be performed (say, for example: income and rrsp contribution need to be run through a simple formula to determine the approximate taxable income of the user).
My question is, would it be best practice to perform said calculation in a method at the model level, and save that processed information in a DB column of its own, or would I perform said calculations of the raw input data at a controller level, requiring processing each time?
I'm guessing it's probably generally best to store the calculated data in the database so it doesn't need to be processed each time, but I'm basically looking for best practices to follow in this case and in general. It probably also depends a great deal on the application's specific requirements, I'd guess.
My preference is to store raw (or lightly sanitized) data only. Then turn the formulae you need into methods on your model, or perhaps library/helper functions, depending on the structure of your project as a whole.
When you start storing processed data, you need to start worrying about the task of syncing when the source data changes. This can be messy and hard.
Since computers are fast at arithmetic, and for a web application relying on a database the arithmetic is not likely to be your performance bottleneck, I wouldn't worry about the performance overhead. If it became a bottleneck, I might start to think about a cache layer.
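To make the "compute on demand" idea concrete, here is a minimal sketch. In a Rails app this would be an instance method on the model in Ruby; it is shown in C# only to stay in one language with the earlier examples, and the formula is a made-up placeholder, not real tax logic:

```csharp
// Only the raw inputs are stored; the derived figure is computed when asked for,
// so there is nothing to keep in sync when income or the contribution changes.
public class TaxProfile
{
    public decimal Income { get; set; }             // stored column
    public decimal RrspContribution { get; set; }   // stored column

    // Placeholder formula purely for illustration.
    public decimal ApproximateTaxableIncome => Income - RrspContribution;
}
```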