Web scraping stock dividend data with F# - f#

I am attempting to scrape stock dividend data from web pages using F# and the FSharp.Data library. An example page can be seen at http://www.nasdaq.com/symbol/ibm/dividend-history.
To request the web page, I set up a simple console app as an example:
open FSharp.Data
[<EntryPoint>]
let main argv =
    let url = "http://www.nasdaq.com/symbol/ibm/dividend-history"
    let result = Http.RequestString(url)
    System.Console.ReadLine() |> ignore
    0 // return an integer exit code
When run, the RequestString method errors with:
"An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in FSharp.Core.dll
Additional information: Length cannot be less than zero."
It looks like the page is formatted in a way that "traditional" scraping approaches won't work with. Any ideas or thoughts would be appreciated.

This is the full stacktrace I get when I run the code:
System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at FSharp.Data.HttpHelpers.getAllCookiesFromHeader#671.Invoke(Int32 i, String cookiePart) in C:\Git\FSharp.Data\src\Net\Http.fs:line 675
at Microsoft.FSharp.Collections.ArrayModule.IterateIndexed[T](FSharpFunc`2 action, T[] array)
at FSharp.Data.HttpHelpers.getAllCookiesFromHeader(String header, Uri responseUri, CookieContainer cookieContainer) in C:\Git\FSharp.Data\src\Net\Http.fs:line 671
at <StartupCode$FSharp-Data>.$Http.InnerRequest#803-5.Invoke(WebResponse _arg2) in C:\Git\FSharp.Data\src\Net\Http.fs:line 803
at Microsoft.FSharp.Control.AsyncBuilderImpl.args#835-1.Invoke(a a)
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.FSharp.Control.AsyncBuilderImpl.commit[a](Result`1 res)
at Microsoft.FSharp.Control.CancellationTokenOps.RunSynchronously[a](CancellationToken token, FSharpAsync`1 computation, FSharpOption`1 timeout)
at Microsoft.FSharp.Control.FSharpAsync.RunSynchronously[T](FSharpAsync`1 computation, FSharpOption`1 timeout, FSharpOption`1 cancellationToken)
at <StartupCode$FSI_0004>.$FSI_0004.main#() in C:\Users\helgeu.COMPODEAL\AppData\Local\Temp\~vs2B9.fsx:line 8
Stopped due to error
Unfortunately, I think you have stumbled upon a bug in this cookie handling code:
https://github.com/fsharp/FSharp.Data/issues/904
<rant>
I have tried to look into that code, but it gives me a headache: it looks like an evil cut and paste of some Google answer on how to handle cookies in C#, badly translated to F#.
</rant>
Adding information to that GitHub issue is probably a better option than posting it here.
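To illustrate the kind of failure the stack trace points at, here is a minimal sketch in Python (not the FSharp.Data code). It assumes, without my having confirmed it against the linked issue, that the Substring call derives its length from the position of a separator that is missing from one cookie part, for example a bare attribute with no "=". The function names are hypothetical.

```python
def parse_cookie_part(part):
    """Split one 'name=value' cookie fragment; return None if it has no '='."""
    part = part.strip()
    eq = part.find("=")  # -1 when no '=' is present, e.g. "Secure" or "HttpOnly"
    if eq < 0:
        # Assumption: an unguarded port would do the equivalent of
        # Substring(0, eq) here, i.e. a negative length, which throws in .NET.
        return None
    return part[:eq], part[eq + 1:]


def parse_set_cookie(header):
    """Return the (name, value) pairs found in a Set-Cookie style header."""
    pairs = []
    for part in header.split(";"):
        parsed = parse_cookie_part(part)
        if parsed is not None:
            pairs.append(parsed)
    return pairs
```

The point is only that every index returned by a search has to be checked before it is used as a length; whether that is the exact defect here is for the GitHub issue to settle.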

Related

How to get a wasm stacktrace

I'm looking for an equivalent of gdb --core for WebAssembly.
Take this example:
//crash.cpp
#include <iostream>
int main() {
    std::cout << "crashing soon..." << std::endl;
    int *a = 0;
    *a = 1;
}
I compile this with:
$ em++ -g4 crash.cpp -o crash.html --source-map-base http://localhost:8080/
And start the server:
$ emrun --no_browser --port 8080 crash.html
So how can I get a stack trace of this core dump/crash? When I visit the page, the console in both Chrome and Firefox shows only a JavaScript stack trace, which doesn't help me. After the crash, Sources => Call Stack in the Chrome dev tools just shows "Not paused".
This is on Debian 11, emscripten 2.0.12~dfsg-2, clang-11.
The reason is that what you're doing is not an error in WebAssembly. As on many embedded platforms, writing to or reading from a zero pointer is a perfectly valid operation in the WebAssembly memory model.
However, because this is a common mistake in C/C++, Emscripten tries to help you catch it: after the program has finished executing, it checks the value at address zero and throws a helpful assertion if that value has been overwritten. This is why your stack trace contains only JavaScript frames: the check is done by JavaScript after the Wasm stack has already been exited.
If you tried a different operation that does cause immediate abort, for example, assert(false), then you would see WebAssembly and/or C/C++ on the stack as expected.
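The timing described above can be pictured with a toy model in Python. This is not how Emscripten is implemented. It is a sketch in which linear memory is a byte array, the write to address 0 succeeds silently, and a host-side check runs only after the program returns. All names are made up for illustration.

```python
class ToyRuntime:
    """Toy stand-in for a Wasm host: linear memory is just a byte array."""

    def __init__(self, size=64):
        self.memory = bytearray(size)  # address 0 is an ordinary, writable cell

    def run(self, program):
        program(self.memory)       # the "Wasm" code runs to completion
        self.check_null_writes()   # the host-side check only happens afterwards

    def check_null_writes(self):
        if self.memory[0] != 0:
            raise AssertionError("address 0 was overwritten")


def crashing_program(memory):
    memory[0] = 1  # like `*a = 1` with a == 0: nothing traps at the write
```

When the assertion fires, the only frames on the stack are run and check_null_writes. The program's own frames are gone, which mirrors the JavaScript-only trace in the question.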

reactjs.net - 'potential stack overflow detected' error

Since I've decided to switch to server-side rendering from client-side react, I began to create my components and use them in the app.
However I came across this error:
Unknown error (RangeError); potential stack overflow detected
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: Microsoft.ClearScript.ScriptEngineException: Unknown error (RangeError); potential stack overflow detected
This is part of the stack trace:
[ScriptEngineException: Unknown error (RangeError); potential stack overflow detected]
V8Exception.ThrowScriptEngineException(V8Exception* ) +169
Microsoft.ClearScript.V8.V8ContextProxyImpl.Execute(String gcDocumentName, String gcCode, Boolean evaluate, Boolean discard) +462
Microsoft.ClearScript.V8.<>c__DisplayClass1b.<Execute>b__19() +197
Microsoft.ClearScript.ScriptEngine.ScriptInvoke(Func`1 func) +70
Microsoft.ClearScript.V8.V8ScriptEngine.BaseScriptInvoke(Func`1 func) +49
Microsoft.ClearScript.V8.<>c__DisplayClass25`1.<ScriptInvoke>b__24() +45
Microsoft.ClearScript.V8.?A0x792c8756.LockCallback(Void* pvArg) +9
Microsoft.ClearScript.V8.V8ContextProxyImpl.InvokeWithLock(Action gcAction) +176
Microsoft.ClearScript.V8.V8ScriptEngine.ScriptInvoke(Func`1 func) +118
Microsoft.ClearScript.V8.V8ScriptEngine.Execute(String documentName, String code, Boolean evaluate, Boolean discard) +118
JavaScriptEngineSwitcher.V8.V8JsEngine.InnerEvaluate(String expression) +89
I don't know what causes this error, but I suspect some code that goes into a loop or something similar. If I refresh the page the error goes away, and if I keep refreshing rapidly it shows up again, which is very frustrating.
I was getting the same error (web app in Azure), and after some investigation and tests, setting SetAllowMsieEngine to true did in fact fix the issue for me too.
As Luke McGregor said, this seems to be an issue with V8ScriptEngine, and SetAllowMsieEngine does the job. However, in the latest version of React.NET this method is deprecated, and the engine is instead meant to be "managed in the JavaScriptEngineSwitcher configuration".
So the solution I have found so far is to configure the JS engine switcher directly in code:
JsEngineSwitcher engineSwitcher = JsEngineSwitcher.Instance;
engineSwitcher.EngineFactories
    .AddChakraCore()
    .AddMsie(new MsieSettings() { EngineMode = JsEngineMode.Auto });
engineSwitcher.DefaultEngineName = ChakraCoreJsEngine.EngineName;
This way I'm using the ChakraCore engine rather than the default V8, which was causing the error.
So far, in our performance tests with around 250 concurrent requests, the error no longer appears, whereas previously it would certainly have occurred under the same conditions.
This is a known issue, see https://github.com/reactjs/React.NET/issues/190
The workaround is to not use V8 for rendering, i.e.:
app.UseReact(config =>
{
    config
        // ..other configuration settings
        .SetAllowMsieEngine(true);
});

F# Data Type Provider ; error on CSVProvider initialization

I am trying to set up a tiny F# console app with FSharp.Data referenced in the solution. I get the following error at runtime:
An unhandled exception of type 'System.TypeInitializationException' occurred in Anot_F1.exe
for this code (error in line 4):
1 open FSharp.Data
2 type Anot_lines = CsvProvider<"anot1.csv",Separators=";">
3 let ll = Anot_lines.Load("anot1.csv")
4 for r in ll.Rows do
5     printfn "%A" r.ToString
In debug mode after line 3, I can see that the variable ll contains the proper Headers but does not show the rows.
My CSV file is :
tline;tcol;bline;bcol;anot
3;1;4;16;"Barack Obama has ... The US president"
3;1;3;12;"Barack Obama"
3;18;3;26;"ratcheted"
4;102;4;109;"agencies"
4;289;4;306;"financial pressure"
4;1;4;320;"The US president ...ure on the regime"
4;1;4;16;"The US president"
I am new to F# and, in particular, have no experience using type providers.
Any help greatly appreciated.
The issue is on line 3: Load reads the CSV from the location (a URL or file path) that you pass in at runtime. To read the sample file the provider was given at compile time, use the GetSample() method instead. Also note that the "%A" format placeholder can print any value and doesn't require a ToString() call.
let ll = Anot_lines.GetSample()
for r in ll.Rows do
    printfn "%A" r

Upgrade to Neo4jClient 1.0.0.651 results in NullReferenceException

I have a working solution that uses Neo4jClient 1.0.0.646 with no problem. When I install the NuGet package for the latest version, 1.0.0.651, I receive a NullReferenceException on every attempt to return query results. Given the stack trace details below, can someone diagnose the issue for me? I am on Json.NET 5.0.6, if that is relevant. I see the REST calls going out and coming back with the correct data, so the Cypher is good.
System.NullReferenceException was unhandled by user code
HResult=-2147467261
Message=Object reference not set to an instance of an object.
Source=Neo4jClient
StackTrace:
at Neo4jClient.Cypher.CypherQuery.b__0(String current, String paramName) in c:\TeamCity\buildAgent\work\5bae2aa9bce99f44\Neo4jClient\Cypher\CypherQuery.cs:line 46
at System.Linq.Enumerable.Aggregate[TSource,TAccumulate](IEnumerable`1 source, TAccumulate seed, Func`3 func)
at Neo4jClient.Cypher.CypherQuery.get_DebugQueryText() in c:\TeamCity\buildAgent\work\5bae2aa9bce99f44\Neo4jClient\Cypher\CypherQuery.cs:line 43
at Neo4jClient.GraphClient.<>c__DisplayClass1e`1.<Neo4jClient.IRawGraphClient.ExecuteGetCypherResultsAsync>b__1d(Task`1 responseTask) in c:\TeamCity\buildAgent\work\5bae2aa9bce99f44\Neo4jClient\GraphClient.cs:line 825
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
InnerException:
You haven't provided enough of the exception detail for me to identify exactly where this is occurring.
Please include the full text of the exception in a new issue at https://github.com/readify/Neo4jClient/issues/new
Here are all the changes since 1.0.0.646, if you want to take a peek for anything related to what you're doing: https://github.com/Readify/Neo4jClient/compare/v1.0.0.646...master

String index out of range error when URL contains two dots

I have a ColdFusion 9 server which serves the following error on any ColdFusion page where the URL contains the characters .. after a / e.g. http://www.example.com/..cfm or http://www.example.com/..foo/bar.cfm :
String index out of range: -1
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.AbstractStringBuilder.delete(AbstractStringBuilder.java:698)
at java.lang.StringBuffer.delete(StringBuffer.java:373)
at coldfusion.util.Utils.collapseDotDots(Utils.java:604)
at coldfusion.util.Utils.canonicalizeURI(Utils.java:558)
at coldfusion.filter.PathFilter.invoke(PathFilter.java:39)
at coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
at coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
at coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
at coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
at coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
at coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
at coldfusion.CfmServlet.service(CfmServlet.java:175)
at coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at com.seefusion.Filter.doFilter(Filter.java:49)
at com.seefusion.SeeFusion.doFilter(SeeFusion.java:1500)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
I haven't been able to reproduce this on every server I've tested, but it seems to occur on the majority. Looking at the error, it looks like it relates to part of ColdFusion rather than any ColdFusion code running on these sites. Can anyone shed any more light on this e.g. how to catch the error?
I was getting the same error and found that it can be handled by the "Site-wide Error Handler" setting in the ColdFusion Administrator. I pointed it at my 404.cfm.
Hope it helps.
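As for why a URL such as /..cfm can produce an index of -1 at all: I don't have the source of coldfusion.util.Utils.collapseDotDots, so the following is only a guess at the class of bug, sketched in Python. A routine that matches the characters "/.." as a substring and then searches backwards for a parent segment will find none at the start of the path. Working on whole path segments avoids that, because "..cfm" and "..foo" are then treated as ordinary names. The function name is hypothetical.

```python
def collapse_dot_dots(path):
    """Resolve '.' and '..' path segments without indexing before the start."""
    resolved = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue              # skip empty and current-directory segments
        if segment == "..":
            if resolved:          # only step up when there is a parent
                resolved.pop()
            continue
        resolved.append(segment)  # "..cfm" or "..foo" are ordinary names
    return "/" + "/".join(resolved)
```

This does not change anything on the server. It only shows why the exception is raised before any of your own ColdFusion code runs, which is why it has to be caught by the site-wide handler.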
