Creating tables from unstructured texts about stock market - machine-learning

I am trying to extract figures such as profits and revenues, together with their corresponding dates and quarters, from unstructured text about the stock market and convert them into a report in table form. Because the input text has no fixed format, it is hard to know which entity belongs to which date and quarter, and which value belongs to which entity. Chunking works on a few documents, but not on enough of them. Is there an unsupervised way to link entities with their corresponding dates, values and quarters?

Financial data is highly structured data. Not sure what you are after, but maybe this will help.
# Option 1: daily price history via pandas-datareader
import pandas_datareader as web
import pandas as pd
df = web.DataReader('AAPL', data_source='yahoo', start='2011-01-01', end='2021-01-12')
df.head()
# Option 2: company data via yfinance
import yfinance as yf
aapl = yf.Ticker("AAPL")
aapl
# get stock info
aapl.info
Result:
{'zip': '95014',
'sector': 'Technology',
'fullTimeEmployees': 154000,
'longBusinessSummary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. In addition, the company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; AirPods Max, an over-ear wireless headphone; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, HomePod, and iPod touch. Further, it provides AppleCare support services; cloud services store services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts. Additionally, the company offers various services, such as Apple Arcade, a game subscription service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV+, which offers exclusive original content; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers, wholesalers, retailers, and resellers. Apple Inc. was incorporated in 1977 and is headquartered in Cupertino, California.',
'city': 'Cupertino',
'phone': '408 996 1010',
'state': 'CA',
'country': 'United States',
'companyOfficers': [],
'website': 'https://www.apple.com',
'maxAge': 1,
'address1': 'One Apple Park Way',
'industry': 'Consumer Electronics',
'ebitdaMargins': 0.3343,
'profitMargins': 0.25709,
'grossMargins': 0.43313998,
'operatingCashflow': 118224003072,
'revenueGrowth': 0.019,
'operatingMargins': 0.30533,
'ebitda': 129556996096,
'targetLowPrice': 130,
'recommendationKey': 'buy',
'grossProfits': 152836000000,
etc., etc., etc.
# get historical market data
hist = aapl.history(period="max")
# show actions (dividends, splits)
aapl.actions
# show dividends
aapl.dividends
# show splits
aapl.splits
# show annual financials
aapl.financials
# show quarterly financials
aapl.quarterly_financials
Result:
                                              2022-06-25      2022-03-26      2021-12-25      2021-09-25
Research Development                        6797000000.0    6387000000.0    6306000000.0    5772000000.0
Effect Of Accounting Charges                        None            None            None            None
Income Before Tax                          23066000000.0   30139000000.0   41241000000.0   23248000000.0
Minority Interest                                   None            None            None            None
Net Income                                 19442000000.0   25010000000.0   34630000000.0   20551000000.0
Selling General Administrative              6012000000.0    6193000000.0    6449000000.0    5616000000.0
Gross Profit                               35885000000.0   42559000000.0   54243000000.0   35174000000.0
Ebit                                       23076000000.0   29979000000.0   41488000000.0   23786000000.0
Operating Income                           23076000000.0   29979000000.0   41488000000.0   23786000000.0
Other Operating Expenses                            None            None            None            None
Interest Expense                            -719000000.0    -691000000.0    -694000000.0    -672000000.0
Extraordinary Items                                 None            None            None            None
Non Recurring                                       None            None            None            None
Other Items                                         None            None            None            None
Income Tax Expense                          3624000000.0    5129000000.0    6611000000.0    2697000000.0
Total Revenue                              82959000000.0   97278000000.0  123945000000.0   83360000000.0
Total Operating Expenses                   59883000000.0   67299000000.0   82457000000.0   59574000000.0
Cost Of Revenue                            47074000000.0   54719000000.0   69702000000.0   48186000000.0
Total Other Income Expense Net               -10000000.0     160000000.0    -247000000.0    -538000000.0
Discontinued Operations                             None            None            None            None
Net Income From Continuing Ops             19442000000.0   25010000000.0   34630000000.0   20551000000.0
Net Income Applicable To Common Shares     19442000000.0   25010000000.0   34630000000.0   20551000000.0
Documentation Here:
https://medium.com/codestorm/how-to-get-data-from-yahoo-finance-using-python-8d087fe42b10
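Once the figures are available in this shape (line items against period-end dates), producing the report the question asks for is mostly a reshaping step. Here is a minimal sketch, assuming a few values hand-copied from the output above into a plain dictionary so that it runs without network access. The names `quarterly`, `calendar_quarter` and `to_report_rows` are illustrative and not part of yfinance.

```python
from datetime import date

# Hand-copied sample in the same shape as the quarterly output above:
# {period-end date: {line item: value}}. Only three line items are included.
quarterly = {
    date(2022, 6, 25): {
        "Total Revenue": 82959000000.0,
        "Gross Profit": 35885000000.0,
        "Net Income": 19442000000.0,
    },
    date(2022, 3, 26): {
        "Total Revenue": 97278000000.0,
        "Gross Profit": 42559000000.0,
        "Net Income": 25010000000.0,
    },
}


def calendar_quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2022-06-25 -> '2022Q2'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def to_report_rows(data):
    """Flatten {date: {item: value}} into (item, period_end, quarter, value) rows."""
    rows = []
    for period_end in sorted(data, reverse=True):  # most recent period first
        for item, value in data[period_end].items():
            rows.append(
                (item, period_end.isoformat(), calendar_quarter(period_end), value)
            )
    return rows


report_rows = to_report_rows(quarterly)
for row in report_rows:
    print("{:<15} {:<12} {:<8} {:>16,.0f}".format(*row))
```

The quarter label here is the calendar quarter of the period-end date; a company's fiscal quarter may differ, so treat that column as an assumption. If the data is already a pandas DataFrame, `DataFrame.melt` performs the same wide-to-long reshaping.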

Related

How can Named Entity Recognition work without NLP?

I have a question about machine learning and Named Entity Recognition.
My goal is to extract named entities from an invoice document. Invoices are typically structured text, and this kind of data is usually not well suited to Natural Language Processing (NLP). I have already tried training a model with the NLP library spaCy to detect invoice metadata such as date, total and customer name. This works reasonably well. As far as I understand, though, an invoice does not provide the unstructured plain text that NLP usually expects.
A typical example of invoice text analyzed with NLP and machine learning, which I often find on the Internet, looks like this:
“Partial invoice (€100,000, so roughly 40%) for the consignment C27655 we shipped on 15th August to London from the Make Believe Town depot. INV2345 is for the balance.. Customer contact (Sigourney) says they will pay this on the usual credit terms (30 days).”
NLP loves this kind of text. But text extracted from an invoice PDF (using Apache Tika) usually looks more like this:
Client no: Invoice no: Invoice date: Due date:
1000011128 DEAXXXD220012269 26-Jul-2022 02-Aug-2022
Invoice to: Booking Reference
LOGISTCS GMBH Client Reference :
DEMOSTRASSE 2-6 Comments:
28195 BREMEN
Germany
Vessel : Voy : Place of Receipt : POL: B/LNo:
XXX JUBILEE NUBBBW SAV33NAH, GA ME000243
ETA: Final Destination : POD:
15-Jul-2022 ANTWERP, BELGIUM
Charge Quantity(days) x Rate Currency Total ROE Total EUR VAT
STORAGE_IMP_FOREIGN 1 day(s) x 30,00 EUR EUR 30,00 1,000000 30,00 0,00
So I guess NLP is in general the wrong approach for training the recognition of metadata from an invoice document. I think the problem is more like recognizing cats in a picture.
What would be a more promising approach to Named Entity Recognition on structured text with a machine learning framework?

Income Statement Subtotals In Tableau Worksheet

I am trying to create a basic multi-level income statement:
Gross Sales
Discounts
Net Sales
COGS
Gross Profit
and so on. I have tried turning on subtotals for all columns. I have searched for an answer for days, but the one example on the Tableau forum uses only SUM(Amount). It has several groups covering the income statement categories, that is, a grouping of groups. I can mimic this, but it gets me no closer to calculating, for example, gross profit (net sales minus cost of sales). For presentation purposes, all of the values need to be positive, except of course a negative gross profit, i.e. a loss. Some subtotals therefore have to be computed by subtracting one subtotal from another. I am relatively new to Tableau and I am at a dead end, since I don't know where to turn. My data source is simply an Excel workbook with transaction date, account, product, product category, and amount.

Purchasing in kilos and selling by length: how to calculate cost?

I am stuck in a situation where I need an effective way to calculate the cost of pipes that are bought by weight (kg) and sold by length, and to record that cost for profit calculation in the accounts.
Things to consider:
Even if I weigh the pipes per foot or meter and add conversion quantities, it won't work, because the material used in manufacturing varies: sometimes there are, e.g., 1000 ft in 50 kg and sometimes 1150 ft in the same weight.
The bundles purchased sometimes weigh 52 kg, 49 kg or 50 kg.
Ideas:
(a) I could purchase by unit and sell in feet, with a customization that lets me mark the end of a product after any order. When I mark the product as finished, the purchase cost, e.g. $1000, can be divided by the length sold. The issue is that it might take a week to sell the product, so it won't show an accurate profit at the end of the day. Alternatively, I could use an approximate cost and replace it once the item has sold out. That is the best I can come up with. The downside: what if it is time to close the accounts for the year and only half of the bundle has been sold?
What would be the most accurate way of handling this logic in any ERP or POS? I tagged Magento and SAP because I am curious how they handle this situation.
I feel that accountants and finance people could also chip in, so I am adding the accounting tag.
You need different units of measure (UoM) for this kind of item:
Purchasing UoM, Inventory UoM, and Sales UoM. I will treat the Inventory and Sales UoM as the same.
Pipe_A001:
Purchase = Kg |
Inventory = meter |
Sales = meter
So the challenge is to measure the goods in your Inventory UoM and then derive the unit cost, taking into account what you actually receive versus the cost charged on the invoice.
Ex: you buy on Aug 1; your PO is 50 kg @ $10/kg, so it costs you $500. You actually receive 49 kg, but they still charge you the PO amount, which is $500.
The 49 kg is no longer relevant, since you own the item and will convert it to your own measurement, which is meters. Let's say you measure it and it is 320 m, so the cost per meter is $500 / 320 m = $1.5625/m.
For the next batch, on Aug 15, you buy another 50 kg at the same price. You receive 51 kg, are invoiced $500, and measure a length of 350 m. So the new batch cost is $500 / 350 m ≈ $1.4286/m.
What matters is not the kg received, but the cost the supplier invoiced and the measurement in your Inventory UoM. You might have an agreement with the supplier that they charge only for the weight measured at your receiving point.
From this point, it comes down to your procedure: whether to cost by FIFO with separate batches or by moving average.
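The arithmetic for the two batches above, and the difference between the two costing procedures, can be sketched as follows. This is only an illustration under stated assumptions: the `Batch` class and both helper functions are hypothetical names, the FIFO helper prices a single sale against untouched batches (it does not track stock consumed by earlier sales), and no ERP's actual costing logic is implied.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    invoiced_cost: float  # amount the supplier charged for the batch, in dollars
    length_m: float       # length measured at receiving, in meters (Inventory UoM)

    @property
    def cost_per_meter(self) -> float:
        return self.invoiced_cost / self.length_m


def moving_average_cost(batches) -> float:
    """Average cost per meter across all batches in stock."""
    total_cost = sum(b.invoiced_cost for b in batches)
    total_length = sum(b.length_m for b in batches)
    return total_cost / total_length


def fifo_cost_of_sale(batches, meters_sold: float) -> float:
    """Cost of one sale, consuming the oldest batches first."""
    remaining = meters_sold
    cost = 0.0
    for batch in batches:  # batches are assumed to be ordered oldest first
        if remaining <= 0:
            break
        taken = min(remaining, batch.length_m)
        cost += taken * batch.cost_per_meter
        remaining -= taken
    if remaining > 0:
        raise ValueError("not enough stock to cover the sale")
    return cost


# The Aug 1 and Aug 15 batches from the example: $500 invoiced for each.
batches = [Batch(500.0, 320.0), Batch(500.0, 350.0)]

for b in batches:
    print(f"batch cost per meter: {b.cost_per_meter:.4f}")
print(f"moving average per meter: {moving_average_cost(batches):.4f}")
print(f"FIFO cost of selling 400 m: {fifo_cost_of_sale(batches, 400):.2f}")
print(f"moving-average cost of selling 400 m: {400 * moving_average_cost(batches):.2f}")
```

Note that the received weight never appears in the calculation; only the invoiced amount and the measured length do, which is the point of the answer.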

Is there a dimension modeling design pattern for multi-valued dimensions

I'm working on a data warehouse that seeks to capture website visits and purchases. Our hypothesis is that by identifying patterns in previous site visits, you can gain insight into visitor behavior during the current site visit.
The grain of my fact table is the individual website visit, and we assign a 1 if the customer makes a purchase and a 0 if she does not. Our fact is additive. We would like to be able to explore and understand how the actions of prior visits influence the action of the current visit, so I'm trying to figure out how to model this. On a particular site visit, a visitor could have 1, 2 or 12 prior site visits.
So my question is: how would I model a past-visit dimension that includes the past visit date and past visit activity (purchase or no purchase, time on site, etc.)? Is this a use case for a bridge table?
A bridge table in a data warehouse is primarily (exclusively?) for dealing with many-to-many relationships, which you don't appear to have.
If the grain of your fact table is website visits then you don't need a 'past visit' dimension, since your fact table contains the visit history already.
You have two dimensions here:
Customer
Date
Time on site is presumably a number, and since you are treating purchase/no purchase as a boolean score (1,0) these are both measures and belong in the fact table.
The Customer dimension is for your customer attributes. Don't put measures here (e.g. prior scores). You should also consider how to handle changes (probably SCD type 2).
You could put your date field directly in the fact table but it is more powerful as a separate dimension, since you can much more easily analyze by quarters, financial years, public holidays etc.
So,
Example Fact_Website_Visit table:
Fact_Website_Visit_Key | Dim_Customer_Key | Dim_Date_Key | Purchase(1,0) | Time_On_Site
Example Dim_Customer Dimension:
Dim_Customer_Key | Customer_ID | Customer_Demographic
Example Dim_Date Dimension:
Dim_Date_Key | Full_Date | IsWeekend
To demonstrate how this works I've written an example report to see sale success and average time spent online on weekends grouped by customer demographic:
SELECT
    Dim_Customer.Customer_Demographic,
    COUNT(fact.Fact_Website_Visit_Key) AS [# of Visits],
    SUM(fact.Purchase) AS [Total Purchases],
    AVG(fact.Time_On_Site) AS [Average Minutes Online],
    -- multiply by 100.0 first so integer division does not truncate the ratio to 0
    100.0 * SUM(fact.Purchase) / COUNT(fact.Fact_Website_Visit_Key) AS [% sale success]
FROM
    Fact_Website_Visit fact
    INNER JOIN Dim_Customer ON fact.Dim_Customer_Key = Dim_Customer.Dim_Customer_Key
    INNER JOIN Dim_Date ON fact.Dim_Date_Key = Dim_Date.Dim_Date_Key
WHERE
    Dim_Date.IsWeekend = 'Y'
GROUP BY
    Dim_Customer.Customer_Demographic
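To illustrate the point that the fact table already contains the visit history, here is a minimal sketch that derives prior-visit counts by joining the fact table to itself. It uses Python's built-in sqlite3 module with made-up sample rows. Two things are assumptions, not part of the answer: Dim_Date_Key is taken to be an integer in YYYYMMDD form so that keys sort chronologically, and each customer has at most one visit per date key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Fact_Website_Visit (
        Fact_Website_Visit_Key INTEGER PRIMARY KEY,
        Dim_Customer_Key       INTEGER NOT NULL,
        Dim_Date_Key           INTEGER NOT NULL,  -- assumed YYYYMMDD
        Purchase               INTEGER NOT NULL,  -- 1 = purchase, 0 = none
        Time_On_Site           REAL    NOT NULL
    )
    """
)
# Made-up sample visits: customer 10 visits three times, customer 20 once.
conn.executemany(
    "INSERT INTO Fact_Website_Visit VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 20240101, 0, 5.0),
        (2, 10, 20240105, 1, 12.0),
        (3, 10, 20240110, 0, 3.0),
        (4, 20, 20240102, 1, 8.0),
    ],
)

# For each visit, count the same customer's earlier visits and purchases.
PRIOR_VISITS_SQL = """
SELECT
    cur.Fact_Website_Visit_Key,
    COUNT(prev.Fact_Website_Visit_Key) AS Prior_Visits,
    COALESCE(SUM(prev.Purchase), 0)    AS Prior_Purchases
FROM Fact_Website_Visit AS cur
LEFT JOIN Fact_Website_Visit AS prev
    ON  prev.Dim_Customer_Key = cur.Dim_Customer_Key
    AND prev.Dim_Date_Key     < cur.Dim_Date_Key
GROUP BY cur.Fact_Website_Visit_Key
ORDER BY cur.Fact_Website_Visit_Key
"""

prior_rows = conn.execute(PRIOR_VISITS_SQL).fetchall()
for visit_key, prior_visits, prior_purchases in prior_rows:
    print(visit_key, prior_visits, prior_purchases)
```

The LEFT JOIN keeps first-time visits in the result with a count of zero, so no separate past-visit dimension or bridge table is needed for this kind of analysis.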

How to get stock data from the Yahoo Finance API

I want to get data for some selected stocks from the Yahoo Finance API. These are the pages I need:
Chart
http://finance.yahoo.com/echarts?s=AAPL+Interactive#
Key statistics
http://finance.yahoo.com/q/ks?s=AAPL+Key+Statistics
Competitors
http://finance.yahoo.com/q/co?s=AAPL+Competitors
Analyst Opinion
http://finance.yahoo.com/q/ao?s=AAPL+Analyst+Opinion
Analyst Estimates
http://finance.yahoo.com/q/ae?s=AAPL+Analyst+Estimates
Major Holders
http://finance.yahoo.com/q/mh?s=AAPL+Major+Holders
Income Statement (annual and Quarterly)
http://finance.yahoo.com/q/is?s=AAPL
Balance Sheet (annual and Quarterly)
http://finance.yahoo.com/q/bs?s=AAPL+Balance+Sheet&annual
Cash Flow (annual and Quarterly)
http://finance.yahoo.com/q/cf?s=AAPL+Cash+Flow&annual
Is there any API that gives me exactly this, even a paid one? Please provide correct and sufficient information. Thanks in advance.
You can get most of what you're looking for using the yahoo_fin package in Python. Its documentation is here: http://theautomatic.net/yahoo_fin-documentation/.
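Here are some examples:
Pulling analyst info (e.g. https://finance.yahoo.com/quote/AAPL/analysts?p=AAPL):
# explicit imports cover every function used in the examples here
from yahoo_fin.stock_info import (
    get_analysts_info, get_income_statement, get_cash_flow,
    get_balance_sheet, get_stats, get_holders,
)
get_analysts_info("AAPL")
Income statement:
get_income_statement("AAPL")
Cash flow statement:
get_cash_flow("AAPL")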
Balance sheet:
get_balance_sheet("AAPL")
Key statistics (e.g. https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL):
get_stats("AAPL")
Major holders (e.g. https://finance.yahoo.com/quote/AAPL/holders?p=AAPL):
get_holders("AAPL")
You just need to replace "AAPL" with whatever ticker you want data for.
