I am learning F# on my own (this is for fun, not for work or school) and I am trying to write a simple parser which counts the number of reviews across multiple markets for a Windows Phone app. There's no doubt that the code I have so far is ugly, but I am trying to improve it and follow the functional programming paradigm. Since I come from the C, C++, C# world, it is pretty hard.
Coming from the C world, I like null values. I know that functional programming / F# doesn't encourage the use of null, but I can't figure out a way to not use it. For example, in the function parse there's a null check. How do I avoid that?
Right now my code only counts the number of reviews on the first page, but it is possible that an app has more than 10 reviews and, as a result, multiple pages. How do I recursively go through all pages (function downloadReviews or parse)?
How could we extend this code to be entirely async?
Below is the code I have so far. In addition to the questions above, I would really like if someone could help me and give me directions on how to improve the overall structure of my code.
open System
open System.IO
open System.Xml
open System.Xml.Linq
open Printf
type DownloadPageResult = {
Uri: System.Uri;
ErrorOccured: bool;
Source: string;
}
type ReviewData = {
CurrentPageUri: System.Uri;
NextPageUri: System.Uri;
NumberOfReviews: int;
}
module ReviewUrl =
let getBaseUri path =
new Uri(sprintf "http://cdn.marketplaceedgeservice.windowsphone.com/%s" path)
let getUri country locale appId =
getBaseUri(sprintf "/v8/ratings/product/%s/reviews?os=8.0.0.0&cc=%s&oc=&lang=%s&hw=520170499&dm=Test&chunksize=10" appId country locale)
let downloadPage (uri: System.Uri) =
try
use webClient = new System.Net.WebClient()
printfn "%s" (uri.ToString())
webClient.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
webClient.Headers.Add("Accept-Encoding", "zip,deflate,sdch")
webClient.Headers.Add("Accept-Language", "en-US,en;q=0.8,fr;q=0.6")
webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1482.0 Safari/537.36")
{ Uri = uri; Source = webClient.DownloadString(uri); ErrorOccured = false }
with error -> { Uri = uri; Source = String.Empty; ErrorOccured = true }
let downloadReview country locale appId =
let uri = ReviewUrl.getUri country locale appId
downloadPage uri
let parse(pageResult: DownloadPageResult) =
if pageResult.ErrorOccured then { CurrentPageUri = pageResult.Uri; NextPageUri = null; NumberOfReviews = 0 }
else
let reader = new StringReader(pageResult.Source)
let doc = XDocument.Load(reader)
let ns = XNamespace.Get("http://www.w3.org/2005/Atom")
let nextUrl = query { for link in doc.Descendants(ns + "link") do
where (link.Attribute(XName.Get("rel")).Value = "next")
select link.Value
headOrDefault }
if nextUrl = null then
{ CurrentPageUri = pageResult.Uri; NextPageUri = null; NumberOfReviews = doc.Descendants(ns + "entry") |> Seq.length }
else
{ CurrentPageUri = pageResult.Uri; NextPageUri = ReviewUrl.getBaseUri(nextUrl); NumberOfReviews = doc.Descendants(ns + "entry") |> Seq.length }
let downloadReviews(locale: string) =
let appId = "4e08377c-1240-4f80-9c35-0bacde2c66b6"
let country = locale.Substring(3)
let pageResult = downloadReview country locale appId
let parseResult = parse pageResult
parseResult
[<EntryPoint>]
let main argv =
let locales = [| "en-US"; "en-GB"; |]
let results = locales |> Array.map downloadReviews
printfn "%A" results
0
I was playing with this problem a bit more and tried using the XML type provider and other features from F# Data. It is not complete code, but it should be enough to give you the idea (and to show that type providers are really nice :-)):
First, I need some references:
#r "System.Xml.Linq.dll"
#r "FSharp.Data.dll"
open FSharp.Data
open FSharp.Net
Next, I wrote the following code to download one sample page.
let data =
Http.Request
( "http://cdn.marketplaceedgeservice.windowsphone.com//v8/ratings/product/4e08377c-1240-4f80-9c35-0bacde2c66b6/reviews",
query=["os", "8.0.0.0"; "cc", "US"; "lang", "en-US"; "hw", "520170499"; "dm", "Test"; "chunksize", "10" ],
headers=["User-Agent", "F#"])
I saved the sample as D:\temp\appstore.xml and then used the XML type provider to get a nice type for parsing the page:
type PageDocument = XmlProvider< @"D:\temp\appstore.xml" >
Then you can download & parse the page like this (this shows how to get the number of reviews and information about the next link):
let parseAsync (locale:string) appId = async {
  let country = locale.Substring(3)
  // Make the request (asynchronously) using the parameters specified
  let! data =
    Http.AsyncRequest
      ( "http://cdn.marketplaceedgeservice.windowsphone.com//v8/ratings/product/"
          + appId + "/reviews",
        query=[ "os", "8.0.0.0"; "cc", country; "lang", locale;
                "hw", "520170499"; "dm", "Test"; "chunksize", "10" ],
        headers=["User-Agent", "F#"])
  // Parse the result using the type-provider generated type
  let page = PageDocument.Parse(data)
  // Now you can type 'page' followed by '.' and explore the results!
  // page.GetLinks() returns all links and page.GetEntries() returns
  // review entries. Each link also has 'Rel' and 'Href' properties:
  let nextLink =
    page.GetLinks()
    |> Seq.tryFind (fun link -> link.Rel = "next")
    |> Option.map (fun link -> link.Href)
  let reviewsCount = page.GetEntries().Length
  return (reviewsCount, nextLink) }
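The question also asked how to walk through every page. One possible sketch, assuming a page fetch returns the same (count, next link) tuple as parseAsync, is a recursive async function that follows the next link until there is none. The names fetchPage, fakePages and countAllReviews are hypothetical, and the in-memory map stands in for the real HTTP request:

```fsharp
// Hypothetical stand-in for the real download: each "URL" maps to
// (number of reviews on that page, link to the next page if any).
let fakePages =
    Map [ "page1", (10, Some "page2")
          "page2", (10, Some "page3")
          "page3", (4, None) ]

// Assumed shape of a page fetch: the same tuple that parseAsync returns.
let fetchPage (url: string) : Async<int * string option> =
    async { return Map.find url fakePages }

// Follow the 'next' links until there are none, summing the counts.
let rec countAllReviews fetch total url =
    async {
        let! (count, next) = fetch url
        match next with
        | Some nextUrl -> return! countAllReviews fetch (total + count) nextUrl
        | None -> return total + count
    }

let totalReviews =
    countAllReviews fetchPage 0 "page1" |> Async.RunSynchronously
```

Because the recursive call is made with return!, the loop does not grow the stack even when an app has many pages.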
The general pattern for making code asynchronous is to find the I/O expensive operation (somewhere down in the call tree) and then go "up" from there and make all code that uses it asynchronous too until you reach a point where you need to block.
In your example, the primitive operation is downloading, so you would start by making downloadPage asynchronous:
let downloadPage (uri: System.Uri) = async {
  try
    use webClient = new System.Net.WebClient()
    printfn "%s" (uri.ToString())
    // (Headers omitted)
    let! source = webClient.AsyncDownloadString(uri)
    return { Uri = uri; Source = source; ErrorOccured = false }
  with error ->
    return { Uri = uri; Source = String.Empty; ErrorOccured = true } }
You need to wrap the code in async { ... }, call the asynchronous version of DownloadString using let!, and return the results using return (in both branches).
Then you need to make functions like downloadReview and downloadReviews asynchronous as well (again, wrap them in an async block and call other asynchronous operations like downloadPage using let! or return!).
In the end, if you're writing console application you'll need to block, but you can run downloads for different locales in parallel. Assuming downloadReviews is asynchronous:
let locales = [| "en-US"; "en-GB"; |]
let results =
  locales
  |> Array.map downloadReviews // Build an array of asynchronous computations
  |> Async.Parallel            // Compose them into a single, parallel computation
  |> Async.RunSynchronously    // Run the computation and wait
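Putting the steps above together, here is a minimal end-to-end sketch that runs without a network. The downloadPage stub (a short Async.Sleep followed by a made-up string) is an assumption standing in for the real asynchronous download, and downloadReviews returns a simplified tuple rather than the ReviewData record:

```fsharp
// Hypothetical stand-in for the network call; the real code would
// perform an asynchronous download here.
let downloadPage (uri: string) : Async<string> =
    async {
        do! Async.Sleep 10
        return sprintf "<feed for %s>" uri
    }

// Every caller becomes asynchronous too and awaits the result with let!.
let downloadReviews (locale: string) : Async<string * int> =
    async {
        let! source = downloadPage ("reviews?lang=" + locale)
        return (locale, source.Length)
    }

// Block only at the very top of the program.
let results =
    [| "en-US"; "en-GB" |]
    |> Array.map downloadReviews
    |> Async.Parallel
    |> Async.RunSynchronously
```

Async.Parallel returns the results in the same order as the input array, so each result can be matched back to its locale by position.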
To answer other questions, I think using null in the example above is probably okay (you are calling LINQ which returns it, so there is no easy way to avoid that). It is actually possible to use option type instead, but it is a bit tricky - see this snippet if you're interested.
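As a rough illustration of the option-based alternative, Option.ofObj (available in FSharp.Core 4.0 and later) converts a possibly-null .NET reference into an option. The sketch below assumes the next-page URL lives in the href attribute of the link element, which may not match the real feed, and tryNextLink is a hypothetical helper name:

```fsharp
open System.Xml.Linq

let atom = XNamespace.Get "http://www.w3.org/2005/Atom"

// Returns Some href for the first <link rel="next">, otherwise None.
// Option.ofObj turns the nulls returned by LINQ to XML into None.
let tryNextLink (doc: XDocument) : string option =
    doc.Descendants(atom + "link")
    |> Seq.tryFind (fun link ->
        link.Attribute(XName.Get "rel")
        |> Option.ofObj
        |> Option.exists (fun rel -> rel.Value = "next"))
    |> Option.bind (fun link -> link.Attribute(XName.Get "href") |> Option.ofObj)
    |> Option.map (fun href -> href.Value)

// Made-up sample documents, used only to exercise the function.
let withNext =
    XDocument.Parse
        """<feed xmlns="http://www.w3.org/2005/Atom">
             <link rel="self" href="/page1" />
             <link rel="next" href="/page2" />
           </feed>"""

let withoutNext =
    XDocument.Parse """<feed xmlns="http://www.w3.org/2005/Atom" />"""

let next = tryNextLink withNext
let missing = tryNextLink withoutNext
```

Unlike the query with headOrDefault, this version also tolerates link elements that have no rel attribute at all.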
Also, you could use the Http.AsyncRequest method from F# Data Library which gives you a bit simpler way to construct complex HTTP requests (but I'm one of the contributors to that library, so I'm biased!)
As Tomas said, it would be more "functional" to create an async-based version of DownloadString (or just use his FSharp.Data library to handle it).
You could also combine FSharp.Data with ExtCore to take advantage of the asyncMaybe or asyncChoice workflows in ExtCore. Those workflows provide very easy-to-use error handling on top of the normal async workflow.
Anyway, I spent a few minutes cleaning up your code. It's not much, but it does simplify your code in a few spots:
open System
open System.IO
open System.Xml
open System.Xml.Linq
open Printf
type DownloadPageResult = {
Uri : System.Uri;
ErrorOccured : bool;
Source : string;
}
type ReviewData = {
CurrentPageUri : System.Uri;
NextPageUri : System.Uri option;
NumberOfReviews : uint32;
}
module ReviewUrl =
let baseUri = Uri ("http://cdn.marketplaceedgeservice.windowsphone.com/", UriKind.Absolute)
let getUri country locale (appId : System.Guid) =
let localUri =
let appIdStr = appId.ToString "D"
sprintf "/v8/ratings/product/%s/reviews?os=8.0.0.0&cc=%s&oc=&lang=%s&hw=520170499&dm=Test&chunksize=10" appIdStr country locale
Uri (baseUri, localUri)
let downloadPage (uri : System.Uri) =
try
use webClient = new System.Net.WebClient()
printfn "%s" (uri.ToString())
webClient.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
webClient.Headers.Add("Accept-Encoding", "zip,deflate,sdch")
webClient.Headers.Add("Accept-Language", "en-US,en;q=0.8,fr;q=0.6")
webClient.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1482.0 Safari/537.36")
{ Uri = uri; Source = webClient.DownloadString uri; ErrorOccured = false }
with error ->
{ Uri = uri; Source = String.Empty; ErrorOccured = true }
let parse (pageResult : DownloadPageResult) =
if pageResult.ErrorOccured then
{ CurrentPageUri = pageResult.Uri; NextPageUri = None; NumberOfReviews = 0u }
else
use reader = new StringReader (pageResult.Source)
let doc = XDocument.Load reader
let ns = XNamespace.Get "http://www.w3.org/2005/Atom"
let nextUrl =
query {
for link in doc.Descendants(ns + "link") do
where (link.Attribute(XName.Get("rel")).Value = "next")
select link.Value
headOrDefault }
{ CurrentPageUri = pageResult.Uri;
NextPageUri =
if System.String.IsNullOrEmpty nextUrl then None
else Some <| Uri (ReviewUrl.baseUri, nextUrl);
NumberOfReviews =
doc.Descendants (ns + "entry") |> Seq.length |> uint32; }
let downloadReviews (locale : string) =
System.Guid "4e08377c-1240-4f80-9c35-0bacde2c66b6"
|> ReviewUrl.getUri (locale.Substring 3) locale
|> downloadPage
|> parse
[<EntryPoint>]
let main argv =
let locales = [| "en-US"; "en-GB"; |]
let results = locales |> Array.map downloadReviews
printfn "%A" results
0
This is not for a practical need, but rather to try to learn something.
I am using FsToolkit.ErrorHandling's asyncResult expression, which is very handy, and I would like to know if there is a way to 'combine' expressions, such as async and result here, or whether a custom expression has to be written.
Here is an example of my function to set the ip to a subdomain, with CloudFlare:
let setSubdomainToIpAsync zoneName url ip =
let decodeResult (r: CloudFlareResult<'a>) =
match r.Success with
| true -> Ok r.Result
| false -> Error r.Errors.[0].Message
let getZoneAsync (client: CloudFlareClient) =
asyncResult {
let! r = client.Zones.GetAsync()
let! d = decodeResult r
return!
match d |> Seq.filter (fun x -> x.Name = zoneName) |> Seq.toList with
| z::_ -> Ok z // take the first one
| _ -> Error $"zone '{zoneName}' not found"
}
let getRecordsAsync (client: CloudFlareClient) zoneId =
asyncResult {
let! r = client.Zones.DnsRecords.GetAsync(zoneId)
return! decodeResult r
}
let updateRecordAsync (client: CloudFlareClient) zoneId (records: DnsRecord seq) =
asyncResult {
return!
match records |> Seq.filter (fun x -> x.Name = url) |> Seq.toList with
| r::_ -> client.Zones.DnsRecords.UpdateAsync(zoneId, r.Id, ModifiedDnsRecord(Name = url, Content = ip, Type = DnsRecordType.A, Proxied = true))
| [] -> client.Zones.DnsRecords.AddAsync(zoneId, NewDnsRecord(Name = url, Content = ip, Proxied = true))
}
asyncResult {
use client = new CloudFlareClient(Credentials.CloudFlare.Email, Credentials.CloudFlare.Key)
let! zone = getZoneAsync client
let! records = getRecordsAsync client zone.Id
let! update = updateRecordAsync client zone.Id records
return! decodeResult update
}
It is interfacing with a C# lib that handles all the calls to the CloudFlare API and returns a CloudFlareResult object which has a success flag, a result and an error.
I remapped that type to a Result<'a, string> type:
let decodeResult (r: CloudFlareResult<'a>) =
match r.Success with
| true -> Ok r.Result
| false -> Error r.Errors.[0].Message
And I could write an expression for it (hypothetically since I've been using them but haven't written my own yet), but then I would be happy to have an asyncCloudFlareResult expression, or even an asyncCloudFlareResultOrResult expression, if that makes sense.
I am wondering if there is a mechanism to combine expressions together, the same way FsToolkit does (although I suspect it's just custom code there).
Again, this is a question to learn something, not about the practicality since it would probably add more code than it's worth.
Following Gus' comment, I realized it would be good to illustrate the point with some simpler code:
function DoA : int -> Async<AWSCallResult<int, string>>
function DoB : int -> Async<Result<int, string>>
AWSCallResultAndResult {
let! a = DoA 3
let! b = DoB a
return b
}
in this example I would end up with two types that can take an int and return an error string, but they are different. Both have their expressions so I can chain them as needed.
And the original question is about how these can be combined together.
It's possible to extend CEs with overloads.
The example below makes it possible to use the CustomResult type with a usual result builder.
open FsToolkit.ErrorHandling
type CustomResult<'T, 'TError> =
    { IsError: bool
      Error: 'TError
      Value: 'T }
type ResultBuilder with
    member inline _.Source(result : CustomResult<'T, 'TError>) =
        if result.IsError then
            Error result.Error
        else
            Ok result.Value
let computeA () = Ok 42
let computeB () = Ok 23
let computeC () =
    { CustomResult.Error = "oops. This went wrong"
      CustomResult.IsError = true
      CustomResult.Value = 64 }
let computedResult =
    result {
        let! a = computeA ()
        let! b = computeB ()
        let! c = computeC ()
        return a + b + c
    }
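To show the mechanism without any library dependency, here is a minimal sketch of a hand-written builder that uses the same trick. MyResultBuilder, its field names and the compute functions are all hypothetical; this is not how FsToolkit implements its builders, only an illustration of how Source overloads let one expression accept several input types:

```fsharp
// Hypothetical result type, standing in for something like CloudFlareResult.
type CustomResult<'T, 'TError> =
    { Failed: bool
      Problem: 'TError
      Payload: 'T }

type MyResultBuilder() =
    member _.Return(value: 'T) : Result<'T, 'TError> = Ok value
    member _.Bind(r: Result<'T, 'TError>, f: 'T -> Result<'U, 'TError>) =
        Result.bind f r
    // Source is applied to the right-hand side of every let!, so each
    // overload normalises one input type to Result before Bind sees it.
    member _.Source(r: Result<'T, 'TError>) : Result<'T, 'TError> = r
    member _.Source(c: CustomResult<'T, 'TError>) : Result<'T, 'TError> =
        if c.Failed then Error c.Problem else Ok c.Payload

let myResult = MyResultBuilder()

let computeA () : Result<int, string> = Ok 42
let computeGood () : CustomResult<int, string> =
    { Failed = false; Problem = ""; Payload = 8 }
let computeBad () : CustomResult<int, string> =
    { Failed = true; Problem = "oops"; Payload = 64 }

// Both input types can be bound in the same expression.
let succeeded : Result<int, string> =
    myResult {
        let! a = computeA ()
        let! b = computeGood ()
        return a + b
    }

let failed : Result<int, string> =
    myResult {
        let! a = computeA ()
        let! c = computeBad ()
        return a + c
    }
```

The same idea should extend to an async-based builder by adding Source overloads for the wrapped types, although each new overload makes type inference a little more fragile.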
I've created a website using WebSharper and have stumbled into a problem. I wish to integrate the site with the VSTS REST API. To do that (seamlessly) I need to forward a session cookie. How do I do that in a WebSharper Ajax call? My current implementation of the Ajax call, written before I needed this, looks like this and works just fine for the other needs I've had so far:
let Ajax (request : Request) =
let httpMethod = request.Method
let url = request.EndPoint
let data = request.AsJson
let success ok =
System.Action<obj,string,JqXHR>(
fun res _ _ ->
let result = (res :?> string |> Json.Parse)
if JS.HasOwnProperty result "error" then
{
ErrorType = result?error
Reason = result?reason
} |> pushError
else
result
|> Success
|> ok
)
let contentType = Union<bool,string>.Union2Of2("application/json")
try
Async.FromContinuations
<| fun (ok, ko, _) ->
let settings = JQuery.AjaxSettings(
Url = url,
DataType = JQuery.DataType.Text,
Type = As<JQuery.RequestType> httpMethod,
Success = success ok,
ContentType = contentType,
Error = System.Action<JqXHR,string,string>(fun jqXHR _ _ ->
let error =
jqXHR?responseText
|> Json.Parse
{
ErrorType = error?error
Reason = error?reason
} |> pushError |> ok
)
)
match data with
Some data ->
settings.Data <- data
| None -> ()
JQuery.Ajax(settings) |> ignore
with e ->
async {
return {
ErrorType ="uncaught exception";
Reason = e.Message
} |> Error
}
It turns out that the solution is pretty easy. After creating the AjaxSettings object, simply use dynamic typing to add the xhrFields object:
settings?xhrFields <- obj()
settings?xhrFields?withCredentials <- true
I'm trying to populate a list with my own type.
let getUsers =
use connection = openConnection()
let getString = "select * from Accounts"
use sqlCommand = new SqlCommand(getString, connection)
try
let usersList = [||]
use reader = sqlCommand.ExecuteReader()
while reader.Read() do
let floresID = reader.GetString 0
let exName = reader.GetString 1
let exPass = reader.GetString 2
let user = [floresID=floresID; exName=exName; exPass=exPass]
// what here?
()
with
| :? SqlException as e -> printfn "A connection-level error occurred:\n %s" e.Message
| _ -> printfn "Unknown exception."
In C# I would just add a new object to usersList. How can I add a new user to the list? Or is there a better approach to getting a list of data from the database?
Easiest way to do this is with a type provider, so you can abstract away the database. You can use SqlDataConnection for SQLServer, SqlProvider for everything (incl. SQLServer), and also SQLClient for SQLServer.
Here is an example with postgres's dvdrental (sample) database for SQLProvider:
#r @"..\packages\SQLProvider.1.0.33\lib\FSharp.Data.SqlProvider.dll"
#r @"..\packages\Npgsql.3.1.8\lib\net451\Npgsql.dll"
open System
open FSharp.Data.Sql
open Npgsql
open NpgsqlTypes
open System.Linq
open System.Xml
open System.IO
open System.Data
let [<Literal>] dbVendor = Common.DatabaseProviderTypes.POSTGRESQL
let [<Literal>] connString1 = @"Server=localhost;Database=dvdrental;User Id=postgres;Password=root"
let [<Literal>] resPath = @"C:\Users\userName\Documents\Visual Studio 2015\Projects\Postgre2\packages\Npgsql.3.1.8\lib\net451"
let [<Literal>] indivAmount = 1000
let [<Literal>] useOptTypes = true
//create the type for the database, based on the connection string, etc. parameters
type sql = SqlDataProvider<dbVendor,connString1,"",resPath,indivAmount,useOptTypes>
//set up the datacontext, ideally you would use `use` here :-)
let ctx = sql.GetDataContext()
let actorTbl = ctx.Public.Actor //alias the table
//set up the type, in this case Records:
type ActorName = {
firstName:string
lastName:string}
//extract the data with a query expression, this gives you type safety and intellisense over SQL (but also see the SqlClient type provider above):
let qry = query {
for row in actorTbl do
select ({firstName=row.FirstName;lastName=row.LastName})
}
//seq is lazy so do all kinds of transformations if necessary then manifest it into a list or array:
qry |> Seq.toArray
The two important parts are defining the ActorName record, and then in the query extracting the fields into a sequence of ActorName records. You can then manifest it into a list or array if necessary.
But you can also stick to your original solution. In that case just wrap the .Read() into a seq:
First define the type:
type User = {
floresID: string
exName: string
exPass: string
}
Then extract the data:
let recs = cmd.ExecuteReader() // execute the SQL Command
//extract the users into a sequence of records:
let users =
    seq {
        while recs.Read() do
            yield { floresID = recs.[0].ToString()
                    exName = recs.[1].ToString()
                    exPass = recs.[2].ToString() }
    } |> Seq.toArray
Taking your code, you can use list expression:
let getUsers () =
    use connection = openConnection()
    let getString = "select * from Accounts"
    use sqlCommand = new SqlCommand(getString, connection)
    try
        [
            use reader = sqlCommand.ExecuteReader()
            while reader.Read() do
                let floresID = reader.GetString 0
                let exName = reader.GetString 1
                let exPass = reader.GetString 2
                // Assumes a record type with the fields floresID, exName and exPass
                yield { floresID = floresID; exName = exName; exPass = exPass }
        ]
    with
    | :? SqlException as e -> failwithf "A connection-level error occurred:\n %s" e.Message
    | _ -> failwith "Unknown exception."
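To try the list-expression approach without a database, the following sketch reads from an in-memory DataTable instead of a SqlCommand. The readUsers helper and the sample rows are made up for illustration; the only assumption about the real table is that it has three string columns in this order:

```fsharp
open System.Data

type User =
    { floresID: string
      exName: string
      exPass: string }

// Works with any IDataReader, including the one a SqlCommand returns.
let readUsers (reader: IDataReader) : User list =
    [ while reader.Read() do
        yield { floresID = reader.GetString 0
                exName = reader.GetString 1
                exPass = reader.GetString 2 } ]

// In-memory stand-in for the Accounts table (made-up rows).
let table = new DataTable()
for name in [ "floresID"; "exName"; "exPass" ] do
    table.Columns.Add(name, typeof<string>) |> ignore
table.Rows.Add("1", "alice", "secret") |> ignore
table.Rows.Add("2", "bob", "hunter2") |> ignore

let users =
    use reader = table.CreateDataReader()
    readUsers (reader :> IDataReader)
```

A list expression is evaluated eagerly, so every row has been read before the reader is disposed. A lazy seq would need more care here, because it could be enumerated after the reader has already been closed.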
That being said, I'd use FSharp.Data.SqlClient library so all of that boiler plate becomes a single line with added benefit of type safety (if you change the query, the code will have compile time error which are obvious to fix).
I'm trying to get F# async working, and I just can't figure out what I'm doing wrong. Here's my sort-of synchronous code that runs:
open System.Net
open System.Runtime.Serialization
open System.Threading.Tasks
[<DataContract>]
type Person = {
[<field: DataMember(Name = "name")>]
Name : string
[<field: DataMember(Name = "phone")>]
Phone : int
}
let url = "http://localhost:5000/app/plugins/anon/CCure"
let js = Json.DataContractJsonSerializer(typeof<Person>)
let main x =
let client = new WebClient()
let url = url + "/" + x
let reader = client.OpenRead(url)
let person = js.ReadObject(reader) :?> Person
printfn "Name: %s, Phone number: %d" person.Name person.Phone
printfn "starting x"
let x = Task.Factory.StartNew(fun () -> main "x")
printfn "starting y"
let y = Task.Factory.StartNew(fun () -> main "y")
Task.WaitAll(x, y)
I was thinking that this would work to run it asynchronously, but it doesn't:
open System.Net
open System.Runtime.Serialization
open System.Threading.Tasks
[<DataContract>]
type Person = {
[<field: DataMember(Name = "name")>]
Name : string
[<field: DataMember(Name = "phone")>]
Phone : int
}
let url = "http://localhost:5000/app/plugins/anon/CCure"
let js = Json.DataContractJsonSerializer(typeof<Person>)
let main x = async {
let client = new WebClient()
let url = url + "/" + x
let! reader = client.OpenReadAsync(url)
let person = js.ReadObject(reader) :?> Person
printfn "Name: %s, Phone number: %d" person.Name person.Phone }
printfn "starting x"
let x = Task.Factory.StartNew(fun () -> main "x")
printfn "starting y"
let y = Task.Factory.StartNew(fun () -> main "y")
Task.WaitAll(x, y)
$ fsharpc -r System.Runtime.Serialization foo.fs && ./foo.exe
F# Compiler for F# 3.1 (Open Source Edition)
Freely distributed under the Apache 2.0 Open Source License
/home/frew/code/foo.fs(19,18): error FS0001: This expression was expected to have type
    Async<'a>
but here has type
    unit
/home/frew/code/foo.fs(20,17): error FS0041: A unique overload for method 'ReadObject' could not be determined based on type information prior to this program point. A type annotation may be needed. Candidates: XmlObjectSerializer.ReadObject(reader: System.Xml.XmlDictionaryReader) : obj, XmlObjectSerializer.ReadObject(reader: System.Xml.XmlReader) : obj, XmlObjectSerializer.ReadObject(stream: System.IO.Stream) : obj
/home/frew/code/foo.fs(20,17): error FS0008: This runtime coercion or type test from type
    'a
to
    Person
involves an indeterminate type based on information prior to this program point. Runtime type tests are not allowed on some types. Further type annotations are needed.
What am I missing here?
OpenReadAsync is part of the .NET BCL and therefore wasn't designed with F# async in mind. You'll notice it returns unit, rather than Async<Stream>, so it won't work with let!.
The API is designed to be used with events (i.e. you have to wire up client.OpenReadCompleted).
You have a couple of options here.
1. There are some nice helper methods in FSharp.Core that can help you convert the API into a more F#-friendly one (see Async.AwaitEvent).
2. Use AsyncDownloadString, an extension method for WebClient that can be found in Microsoft.FSharp.Control.WebExtensions. This is easier, so I've done it below, although it does mean holding the whole stream in memory as a string, so if you have a huge amount of JSON this may not be the best idea.
It's also more idiomatic F# to use async instead of tasks for running things in parallel.
open System.Net
open System.Runtime.Serialization
open System.Threading.Tasks
open Microsoft.FSharp.Control.WebExtensions
open System.Runtime.Serialization.Json
[<DataContract>]
type Person = {
[<field: DataMember(Name = "name")>]
Name : string
[<field: DataMember(Name = "phone")>]
Phone : int
}
let url = "http://localhost:5000/app/plugins/anon/CCure"
let js = Json.DataContractJsonSerializer(typeof<Person>)
let main x = async {
printfn "Starting %s" x
let client = new WebClient()
let url = url + "/" + x
let! json = client.AsyncDownloadString(System.Uri(url))
let bytes = System.Text.Encoding.UTF8.GetBytes(json)
let st = new System.IO.MemoryStream(bytes)
let person = js.ReadObject(st) :?> Person
printfn "Name: %s, Phone number: %d" person.Name person.Phone }
let x = main "x"
let y = main "y"
[x;y] |> Async.Parallel |> Async.RunSynchronously |> ignore<unit[]>
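For completeness, here is a small sketch of the first option, Async.AwaitEvent, using a plain F# event instead of WebClient so that it runs without a server. The completed event and waitForPayload are hypothetical stand-ins; with WebClient you would await client.OpenReadCompleted and then call OpenReadAsync in the same way:

```fsharp
// Hypothetical event-based API, standing in for WebClient.OpenReadCompleted.
let completed = Event<string>()

// Wait for the next occurrence of the event inside an async workflow.
let waitForPayload =
    async {
        let! payload = Async.AwaitEvent completed.Publish
        return payload.ToUpperInvariant()
    }

// StartImmediateAsTask runs the workflow on the current thread up to the
// first wait, so the handler is subscribed before the event is raised.
let pending = Async.StartImmediateAsTask waitForPayload

// Simulate the operation finishing and raising its completion event.
completed.Trigger "stream ready"

let received = pending.Result
```

The ordering matters: if the event fires before the workflow has subscribed, the occurrence is missed and the workflow waits forever.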
I'm trying to crawl a webpage, and get all the links, and add them to a list<string> which will be returned in the end, from the function.
My code:
let getUrls s : seq<string> =
let doc = new HtmlDocument() in
doc.LoadHtml s
doc.DocumentNode.SelectNodes "//a[@href]"
|> Seq.map(fun z -> (string z.Attributes.["href"]))
let crawler uri : seq<string> =
let rec crawl url =
let web = new WebClient()
let data = web.DownloadString url
getUrls data |> Seq.map crawl (* <-- ERROR HERE *)
crawl uri
The problem is that at the last line in the crawl function (the getUrls seq.map...), it simply throws an error:
Type mismatch. Expecting a string -> 'a but given a string -> seq<'a>. The resulting type would be infinite when unifying ''a' and 'seq<'a>'
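crawl ends with Seq.map crawl, so its return type would have to be a sequence of its own return type, which is why the compiler reports an infinite type; it is expected to return seq<string>. I think you want something like: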
let crawler uri =
    let rec crawl url =
        seq {
            let web = new WebClient()
            let data = web.DownloadString url
            for url in getUrls data do
                yield url
                yield! crawl url
        }
    crawl uri
Adding a type annotation to crawl should point out the issue.
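One caveat with a recursive crawler is that pages which link back to each other make it loop forever. A possible way around that, sketched here against a made-up in-memory link graph rather than real downloads, is to remember which URLs have already been visited. The names fakeWeb, getLinks and crawlOnce are hypothetical:

```fsharp
open System.Collections.Generic

// Made-up link graph standing in for real pages; note the cycle c -> a.
let fakeWeb =
    Map [ ("a", [ "b"; "c" ])
          ("b", [ "c" ])
          ("c", [ "a" ]) ]

// Stand-in for downloading a page and extracting its links.
let getLinks (url: string) : string list =
    Map.tryFind url fakeWeb |> Option.defaultValue []

let crawlOnce (start: string) : string list =
    let visited = HashSet<string>()
    let rec crawl url =
        seq {
            // HashSet.Add returns false when the URL was already seen,
            // which stops the recursion on cycles.
            if visited.Add url then
                yield url
                for next in getLinks url do
                    yield! crawl next
        }
    crawl start |> Seq.toList

let reachable = crawlOnce "a"
```

Seq.toList forces the lazy sequence while the visited set is still in scope, so each URL appears once, in depth-first order.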
I think something like this:
let crawler (uri : seq<string>) =
let rec crawl url =
let data = Seq.empty
getUrls data
|> Seq.toList
|> function
| h :: t ->
crawl h
t |> List.iter crawl
| _-> ()
crawl uri
In order to fetch links:
open System.Net
open System.IO
open System.Text.RegularExpressions
type Url(x:string)=
member this.tostring = sprintf "%A" x
member this.request = System.Net.WebRequest.Create(x)
member this.response = this.request.GetResponse()
member this.stream = this.response.GetResponseStream()
member this.reader = new System.IO.StreamReader(this.stream)
member this.html = this.reader.ReadToEnd()
let linkex = "href=\s*\"[^\"h]*(http://[^&\"]*)\""
let getLinks (txt:string) = [
for m in Regex.Matches(txt,linkex)
-> m.Groups.Item(1).Value
]
let collectLinks (url:Url) = url.html
|> getLinks
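The regex step can be tried on its own against a literal HTML string. The pattern below is a simplified variant chosen for this sketch (it also accepts https and only handles double-quoted attributes), and extractLinks is a hypothetical name; a regular expression is not a robust HTML parser, so treat it as a quick approximation:

```fsharp
open System.Text.RegularExpressions

// Simplified pattern for this sketch: an absolute http(s) URL
// inside a double-quoted href attribute.
let linkPattern = @"href\s*=\s*""(https?://[^""]+)"""

let extractLinks (html: string) : string list =
    Regex.Matches(html, linkPattern)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value)
    |> Seq.toList

// Made-up sample markup: two absolute links and one relative link.
let sampleHtml =
    """<a href="http://example.com/a">A</a> <a href="/local">L</a> <a href="https://example.org/b">B</a>"""

let found = extractLinks sampleHtml
```

Relative links such as /local are skipped by this pattern, which matches the intent of the original expression of collecting only absolute URLs.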