I'm exploring Azure Cognitive Services for one of our projects. We have a classified community website called DewaList.com, and the site is battling spam. We have a Robotic Process Automation (RPA) process that identifies and removes ads based on a list of spam keywords that we, as humans, identify and mark as spam.
Our next project is to generate this spam keyword list intelligently using machine learning. The logic is to look regularly at the new ads coming through over a day, a week, or a month, and if something is repeated more than a few times in a day or week, flag it as spam. The keywords will be based on phone number, website, or email.
Can this be done with Cognitive Services? Which API would we use?
Any pointers would be a good starting point for us.
Thanks
I am guessing you're referring to Azure Machine Learning.
You have to configure the inputs to train your model. What did you configure? Below is the code. I would take the div container with the placeholder as an input and correlate that with your spam prediction; by that I mean extract its content and feed that to the engine.
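from azureml.datacollector import ModelDataCollector  # assumed: the legacy Azure ML data-collection package
global inputs_dc, prediction_dc
inputs_dc = ModelDataCollector('model.pkl', identifier="inputs")
prediction_dc = ModelDataCollector('model.pkl', identifier="prediction")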
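The repetition rule described in the question (flag a phone number, website, or email that shows up in more than a few ads within a day or week) could be prototyped in plain Python before reaching for a hosted service. This is only a minimal sketch: the regular expressions are deliberately simplified, and the names `extract_keys` and `spam_candidates`, the 7-day window, and the threshold of 3 are all assumptions rather than anything taken from Azure or from the site itself.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical, simplified patterns; real phone and URL formats vary widely.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_keys(text):
    """Return the set of normalized emails, URLs and phone numbers in one ad."""
    keys = {m.lower() for m in EMAIL_RE.findall(text)}
    keys |= {m.lower().rstrip(".,") for m in URL_RE.findall(text)}
    keys |= {re.sub(r"\D", "", m) for m in PHONE_RE.findall(text)}  # digits only
    return keys


def spam_candidates(ads, now, window=timedelta(days=7), threshold=3):
    """ads: iterable of (posted_at, text) pairs.

    Return the keys that appear in more than `threshold` distinct ads
    posted within `window` before `now`.
    """
    counts = Counter()
    for posted_at, text in ads:
        if now - window <= posted_at <= now:
            counts.update(extract_keys(text))  # a set, so one vote per ad
    return {key for key, n in counts.items() if n > threshold}
```

The returned keys are only candidates; feeding them to a human reviewer before they reach the RPA keyword list keeps a popular legitimate advertiser from being flagged automatically.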
I have recently been tasked to look into Workflow Foundation. The actual goal would be to implement a system in which the end users can define custom workflows in the deployed application (and of course, use them). Personally I have never used WF before (and reading around here on SO people are very doubtful about it - so am I reading those questions/answers), and I am having a hard time finding my way around it given the sparse learning resources available.
Anyway, there are some questions, for example, this, which mention something they call dynamic or user-defined workflows. They point out that WF makes it possible to "rehost" the designer, so that end-users can define their own new workflows after the application is deployed (without developer intervention (?), this is the part I am not really sure about).
I have been told by fellow employees that this way we could implement an application in which once this feature is implemented we would no longer have to keep modifying the application every time a new workflow is to be implemented. However, they also pointed out that they just "heard it", they don't have firsthand experience themselves either.
I have been looking around for samples online, but the best thing I could find was a number-guessing app, barely more than a simple hello world. So there is not much that would point me in the right direction on how this user-defined workflow feature actually works, how it can be used, what its limitations are, etc.
My primary concern is this: it is alright that one can define custom workflows but no workflow is worth a penny without the possibility of actually inputting data throughout the process. For example, even if the only thing I need to do is to register a customer in a complaint management system, I would need the customer's name, contact, etc. If the end user should be able to define any workflow the given toolset makes possible then of course there needs to be a way to provide the workflow consumers with a way of inputting data through forms. If the workflow can be of pretty much any nature then so needs to be the data - otherwise if we need to implement the UIs ourselves then this "end-user throws together a workflow" feature is kind of useless because they would still end up at us requiring to implement a form or some sort of data input for the individual steps.
So I guess that there should be a way of defining the "shape" of the data that needs to be filled at any given user interaction phase of the workflow which I can investigate and dynamically generate forms based on the data. So for example, if I found that the required data was made up of a name and a date of birth, then I would need to render a textbox and a datepicker on the page.
What I couldn't really figure out from the Q&As here and elsewhere is whether this is even possible. Can I define and then later "query" the structure of the data to be passed to the workflow at any point? If so, how? If not, how should this user-defined workflow feature even be used, what is it good for?
To clarify it a little, I could imagine something as specifying a complex type, which would be the view model (input model) in a regular MVC app, and then I could reflect over it, get the properties and render input fields based on that.
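The "reflect over an input model and render fields from it" idea in the last paragraph is independent of Workflow Foundation, so here is a minimal language-neutral sketch of it in Python. Everything in it is hypothetical: the `CustomerRegistration` model, the `WIDGETS` mapping, and the `describe_form` helper are made up for illustration, and in .NET the same thing would be done with reflection over the properties of a view-model type.

```python
from dataclasses import dataclass, fields
from datetime import date

# Hypothetical mapping from a field's type to the widget a form renderer would emit.
WIDGETS = {str: "textbox", int: "number", bool: "checkbox", date: "datepicker"}


@dataclass
class CustomerRegistration:
    """Hypothetical input model for one user-interaction step of a workflow."""
    name: str
    date_of_birth: date


def describe_form(model):
    """Reflect over the model's fields and return (field name, widget) pairs."""
    return [(f.name, WIDGETS.get(f.type, "textbox")) for f in fields(model)]
```

Calling `describe_form(CustomerRegistration)` yields a textbox for `name` and a datepicker for `date_of_birth`, which is the shape of information a generic page would need in order to render the step without a hand-written form.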
Windows Workflow Foundation is about machine workflows, not business workflows. True, it is the foundational tool set Microsoft created for building their business workflow products. But out of the box WWF does not have the components you need to quickly and easily build business workflows. If you want to send an email in a workflow, you have to write that from scratch. Just about anything you can think of doing from a business point of view you have to write from scratch.
If you want to easily create business workflows using Microsoft products check out the workflow stuff in SharePoint. It is the easiest of the Microsoft products to work with (in my experience.) If that does not meet your needs there are other products like BizTalk.
K2 is another company with a business workflow product that uses WWF as its base to make building business workflows easier. The older K2 products actually create web pages automatically to collect the data from the user.
WWF is very low level, and arguably it lost traction after they rewrote the whole thing in 4.0. While not publicly stated by Microsoft, my personal opinion is that Service Fabric (from Microsoft) achieves the goal WWF originally set out to reach, which was a "more robust programming environment."
I want to develop an app that understands text from various inputs and makes decisions based on it. Further, if at any point the system gets confused, the user can manually supply the output, and from then on the system must learn to give that output in such scenarios. Basically, the system must learn from its past experience. The job I want to handle with this system is the mundane one of resolving customer technical problems (production L3 tickets). The first input would be the customer's problem with an order (for example, the state in which the order is stuck and the state to which the customer wants it pushed), and the second input would be the current state of the order (data retrieved for that order from multiple database tables). For these two inputs, the output would be the action to take, such as updating certain columns and firing XML for that order. The tools I think I would need are a natural language processing (NLP) library for understanding the text, and machine learning to learn from past confusing scenarios.
If you want to use Java libraries for your NLP pipeline, have a look at OpenNLP.
It gives you a lot of basic support.
Then there is deeplearning4j, which has a lot of neural network implementations in Java.
Since you want a dynamic model that can learn from past experience rather than a static one, there are a number of neural network implementations in deeplearning4j that you can play with.
Hope this helps!
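The "ask a human when confused, then remember the answer" loop from the question can be illustrated without any particular library. The sketch below is an assumption-heavy stand-in, not OpenNLP or deeplearning4j: it swaps in a plain bag-of-words nearest-neighbour lookup for a real NLP model, and the `TicketResolver` class, the 0.6 similarity cut-off, and the action name used in the usage note are all hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Very crude tokenizer: lowercase words separated by whitespace."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TicketResolver:
    """Nearest-neighbour lookup over past tickets, deferring to a human when unsure."""

    def __init__(self, min_similarity=0.6):
        self.min_similarity = min_similarity
        self.memory = []  # (token counts, action) pairs from resolved tickets

    def suggest(self, ticket_text):
        """Return the action of the most similar past ticket, or None if unsure."""
        query = tokens(ticket_text)
        best = max(self.memory, key=lambda m: cosine(query, m[0]), default=None)
        if best is None or cosine(query, best[0]) < self.min_similarity:
            return None  # confused: the caller should ask a human
        return best[1]

    def learn(self, ticket_text, action):
        """Store the action a human supplied so similar tickets resolve next time."""
        self.memory.append((tokens(ticket_text), action))
```

In use, a `None` from `suggest` is the signal to route the ticket to a person, and whatever they decide is passed to `learn(ticket_text, "update_status_and_fire_xml")` so that a similarly worded ticket is answered automatically afterwards.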
I have had a request from a client to pull in the Lab Name and CLIA information from several different vendors' HL7 feeds. The problem is that I am unsure which node I should really pull this information from.
I notice one vendor is using ZPS, and it appears they have the Lab Name and CLIA there, although I see that others do not use ZPS. What would be the appropriate node to pull these from?
I see that the header nodes look heavily abbreviated with some of my vendors. I need a perfectly readable name like 'Johnson Hospital'. Any suggestions on the field you would use to pull the CLIA and Lab Name?
Welcome to the wild world of HL7. This exact scenario is why interface engines are so prevalent and useful for message exchange in the healthcare industry.
Up until HL7 v2.5.1, I believe, there was no standardization around CLIA identifiers. Assuming you are receiving ORU^R01 messages, you may want to look at field 15 of the OBX segment (OBX-15), which may hold the producer or lab identifier. The catch is that there is only a very slim chance that they are using HL7 2.5.1 or implementing the guidelines as intended. There are a lot of reasons for all of this, but the point is that you should be prepared to do some work for each and every integration.
For the data, be prepared to exchange or ask for a technical specification from your trading partner. If that is not a possibility or if they do not have one, you should either ask for a sample export of representative messages from their system or if they maybe have a vendor reference. Since the data that you are looking for is not quite as established as something like an address there is a high likelihood that you will have to get this data from different segments and fields from each trading partner. The ZPS segment that you have in your example, is a good reference. Any segment that starts with Z is a custom segment and was created because the vendor or trading partner could not find a good, existing place to store that data, so they made a new segment to store that data themselves.
For the identifiers, what I would recommend is to create a translation or a mapping table for identifiers. So, if you receive JHOSP or JH123 you can translate/map that to 'Johnson Hospital'. Each EMR or hospital system will have their own way to represent different values and there is no guarantee that they will be consistent, so you must be prepared to handle that scenario.
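To make the per-partner lookup plus mapping table concrete, here is a rough sketch using plain string splitting on an HL7 v2 message. It assumes the default `|` field and `^` component separators and does no escaping or repetition handling, which an interface engine or a real HL7 library would do for you. The partner names, the ZPS field position, and the contents of `LAB_NAMES` are made up for illustration; the real positions have to come from each trading partner's specification.

```python
# Hypothetical per-partner configuration: which segment and field carry the lab ID.
PARTNER_FIELDS = {
    "vendor_a": ("OBX", 15),  # OBX-15, Producer's ID
    "vendor_b": ("ZPS", 2),   # custom Z-segment; this position is invented
}

# Hypothetical mapping table from partner identifiers to readable names.
LAB_NAMES = {"JHOSP": "Johnson Hospital", "JH123": "Johnson Hospital"}


def lab_identifier(message, partner):
    """Return the first component of the configured field, or None if absent."""
    segment_id, field_no = PARTNER_FIELDS[partner]
    # Segments end with a carriage return; tolerate newlines from text editors.
    for segment in message.replace("\n", "\r").split("\r"):
        parts = segment.split("|")
        if parts[0] == segment_id and len(parts) > field_no and parts[field_no]:
            return parts[field_no].split("^")[0]
    return None


def lab_name(message, partner):
    """Translate the partner's identifier to a readable name when one is mapped."""
    ident = lab_identifier(message, partner)
    return LAB_NAMES.get(ident, ident)  # fall back to the raw identifier
```

Keeping the segment and field position in configuration rather than in code means a new trading partner only needs a new table entry once you have seen their sample messages.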
I have the following problem and was thinking I could use machine learning but I'm not completely certain it will work for my use case.
I have a data set of around a hundred million records containing customer data including names, addresses, emails, phones, etc and would like to find a way to clean this customer data and identify possible duplicates in the data set.
Most of the data has been manually entered using an external system with no validation so a lot of our customers have ended up with more than one profile in our DB, sometimes with different data in each record.
For instance, we might have 5 different entries for a customer John Doe, each with different contact details.
We also have the case where multiple records that represent different customers match on key fields like email. For instance when a customer doesn't have an email address but the data entry system requires it our consultants will use a random email address, resulting in many different customer profiles using the same email address, same applies for phones, addresses etc.
All of our data is indexed in Elasticsearch and stored in a SQL Server database. My first thought was to use Mahout as a machine learning platform (since this is a Java shop) and maybe use HBase to store our data (just because it fits with the Hadoop ecosystem; I'm not sure it will be of any real value). But the more I read about it, the more confused I am as to how it would work in my case. For starters, I'm not sure what kind of algorithm I could use, since I'm not sure what category this problem falls into: can I use a clustering algorithm or a classification algorithm? And of course certain rules will have to be defined as to what constitutes a profile's uniqueness, i.e. which fields.
The idea is to have this deployed initially as a Customer Profile de-duplicator service of sorts that our data entry systems can use to validate and detect possible duplicates when entering a new customer profile and in the future perhaps develop this into an analytics platform to gather insight about our customers.
Any feedback will be greatly appreciated :)
Thanks.
There has actually been a lot of research on this, and people have used many different kinds of machine learning algorithms for this. I've personally tried genetic programming, which worked reasonably well, but personally I still prefer to tune matching manually.
I have a few references for research papers on this subject. StackOverflow doesn't want too many links, but here is bibliographic info that should be sufficient for finding them with Google:
Unsupervised Learning of Link Discovery Configuration, Andriy Nikolov, Mathieu d’Aquin, Enrico Motta
A Machine Learning Approach for Instance Matching Based on Similarity Metrics, Shu Rong, Xing Niu, Evan Wei Xiang, Haofen Wang, Qiang Yang, and Yong Yu
Learning Blocking Schemes for Record Linkage, Matthew Michelson and Craig A. Knoblock
Learning Linkage Rules using Genetic Programming, Robert Isele and Christian Bizer
That's all research, though. If you're looking for a practical solution to your problem I've built an open-source engine for this type of deduplication, called Duke. It indexes the data with Lucene, and then searches for matches before doing more detailed comparison. It requires manual setup, although there is a script that can use genetic programming (see link above) to create a setup for you. There's also a guy who wants to make an ElasticSearch plugin for Duke (see thread), but nothing's done so far.
Anyway, that's the approach I'd take in your case.
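The "find candidates cheaply, then compare in detail" pattern described above can be shown in a few lines of standard-library Python. This is not how Duke is implemented; it is a toy sketch in which the blocking key (first three letters of the name), the compared fields, and the 0.8 threshold are all assumptions chosen for illustration. With a hundred million records the candidate step would come from an index such as the existing Elasticsearch one rather than from an in-memory dictionary.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    """Cheap key so that only records sharing it are compared in detail."""
    return record["name"].strip().lower()[:3]


def similarity(a, b):
    """Average string similarity over the compared fields (a hypothetical choice)."""
    compared = ("name", "email", "phone")
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in compared]
    return sum(scores) / len(scores)


def find_duplicates(records, threshold=0.8):
    """Return (id, id, score) for record pairs scoring at or above the threshold."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):  # detailed comparison within a block only
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs
```

A plain average like this treats a shared email as strong evidence, which is exactly wrong for the placeholder addresses mentioned in the question, so in practice fields known to contain dummy values would need a lower weight or a blacklist.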
I just came across a similar problem, so I did a bit of Googling and found a library called "Dedupe Python Library":
https://dedupe.io/developers/library/en/latest/
The documentation for this library covers common problems and solutions when de-duplicating entries, as well as papers in the de-duplication field. So even if you are not using it, the documentation is still worth reading.
We need to involve our customer development partners in our development process. We're more or less following Agile methodologies. Some customer partners are remote, others closer. We need to minimize travel costs.
Our customers are in health care and tend to be busy, expensive, and hard to schedule.
What practices and technologies have worked to support customer involvement? We're using phone calls, phone conferences and email. We're curious about leveraging wiki techniques and would love to hear what's worked for others.
it doesn't matter whether the customer is in the same cubicle or halfway around the planet, except for communication delays - the critical factor is availability.
a customer that is too busy to answer your emails for several days is going to cause your iteration to be late, or fail
the customer has two critical commitments for agile:
available to answer questions in a timely manner
not to change their mind/priorities during an iteration
the customer must commit to a reasonable service-level agreement (SLA) on availability, e.g. 1-hour response time, or 24-hour response time, etc., and you will need to adjust all estimates and schedules by the lag factor. If the customer will not commit or does not follow through, cancel the iteration and re-plan, bringing the customer's commitment to the forefront again. Do not just "guess" at what you think the customer might want.
Bottom line: without a customer commitment, agile will not work.
My experience with Agile methods is mostly for desktop applications. When our customers are remote, we've spent time to get an engineer to the customer site to configure/install a demo rig. The engineer works with the customer on a test and demo setup/plan that will provide an environment that the customer believes replicates the important aspects of the deployment environment but isolates the demo system from existing infrastructure (so that we can push updates whenever we need to). The engineer also sets up deployment systems to move our applications into production, so that we can "deploy" without being on site. Our applications can self-update (either for each release or each build) and we carefully instrument the releases to log all errors and submit all crashes as bugs to our bug tracker. This way we at least know what went wrong, even if we don't know what's going right.
For each release/build that shows up on the customer's test rig, we provide a (short) screencast, narrated by the project lead or primary developer, demo-ing any new features. The release notes contain any long-term issues or questions we want the customer to think about (i.e. issues that can't be resolved immediately by a phone call or email), and the application displays these notes for the user.
Finally, and possibly most importantly, we get the customer and/or the customer's liaison an account on our calendar server and configure their calendar app to make use of that account. This then goes both ways--we can schedule time (on site, phone, email, etc.) with the customer and they can do the same with our developers.
One option: Install a customer proxy at the "customer partner" site who can extract the information that you need when those customers are available. Have these proxies build the solid relationships that allow them to represent the customer view. Their time is all yours. And when questions arise that they cannot answer, they have ready access to your customer partners - even if in the coffee line.
The whole point of the customer in agile is to have open and free discourse with the developers (i.e., immediate feedback). If your actual customers cannot provide this, then you need an intermediary/proxy that can fill this role. You don't need actual customers, you just need someone that can represent the customers' interests well enough to meet your customers' needs.
Just a few ideas:
If you do choose to use a wiki, make sure it supports a wiki-wide "recent changes" list, and preferably one that is specific to each user. The more distant people are from development, the more likely they are to use email as the metaphor for their computer use. If they can't immediately tell when there's something new for them to see, they will never explore it. You also preferably need ways to signal to them that you need their attention on a matter, or they will treat changes like CCs.
I'm a big believer in creating video screen captures of interactions (narrated) and distributing them to users. Unlike a real demo, customers don't feel like they need to interrupt, and they can rewind and re-watch the same interaction over and over, paying attention to little details.
Finally, if you do distribute prototypes, make sure to send someone (or at least a screen sharing session) to see how the prototypes are used. Contextual design is effective. You can count on people using your prototype differently from the way you expect, and you have to understand how they use it to really understand where the issues are, even if they don't report them.
Have you considered something like LogMeIn?
This would allow customers to either log-in to a PC on your network already running your application, or alternatively allow you to install/update the application on one of their computers.
This would solve the remote customer issue and would also support the ongoing continual customer feedback requirement in the agile process.
I used it at a previous company for technical support, but there is no reason (except maybe cost) that it would not work for your situation.
It is also a great way to actually see how users are using your application and therefore find out what works and what doesn't.
First of all, make sure that you have a product manager or a product owner close to the developers. This person will be managing the relationship with the customer.
Then, the product manager can demonstrate the product to the customer at the end of each iteration and also ask customers questions when the developers need feedback to implement a user story.
It is amazing the positive feedback you can get from customers when you involve them.
We did not use a wiki; most of the communication is done via email, phone, and a screen-sharing application (we are using GoToMeeting, but there are tons of alternatives out there).
You should probably do a kick-off once with everyone at one place. Face-to-face time is invaluable. That includes all developers. Prepare some metaplan questions, but also have enough time to just mingle.
I think by most definitions of Agile processes that have high dependence on customer involvement you've already missed "best practice", which would be for an on-site, and preferably "in-team" customer present at all times. So I suppose we're looking for a "next-best practice". :)
There's the possibility of introducing a "proxy customer" on-site. I have to admit to being very sceptical about the value of a proxy customer. I'm concerned about the risk of introducing some sort of second-rate and otherwise unnecessary business analyst function to the mix, with a reduced signal-to-noise ratio and potential for garbled messages. It also carries the risk of allowing busy real customers to reduce their involvement in the process, which is likely to lead to dissatisfaction. I wonder if there might be someone with good domain knowledge who has recently retired and might be available to act in this capacity as a consultant?
Communication bandwidth with remote customers is astonishingly lower than face-to-face, something I had not fully realised until I started dealing with users in another country. Even with video the loss is significant.
How long are your iterations? How hard is planning iterations? Might it be easier to go for longer iterations and get more planning done less frequently, or reduce iteration length and go to smaller, but more frequent planning sessions? Is more than one customer involved?
Do you have a useable and available build at the end of each iteration? Is there time for involved users to have hands-on time before the next planning session? Keeping users engaged by shipping frequently would seem on the surface to be a Good Idea, which perhaps legislates for small frequent iterations (a week? two weeks?)
The wiki idea might work: have you looked at the FIT Framework? It's a sort of integrated acceptance test/wiki, which might help in getting acceptance tests from remote customers. I think I'd also look to provide some sort of (separate or integrated) "project dashboard", possibly pushed regularly to key customers as well as available on demand. Use it as a substitute for things like post-its on whiteboards, Big Visible Charts and the like. There are a number of open-source or low-cost options that may serve, and writing your own simple alternative need not be too time-consuming or costly, either.
Above all, remember that "Agile" is a kind-of catch-all label for developments that are carried out with an emphasis on the values and principles espoused in the Agile manifesto. What is considered "best" in one situation may not be so in another. If you understand the principles and regularly review your methods with a critical eye then you're probably going to be close enough to the best practice application to your situation.
I haven't looked at it for some time but with Beck and Fowler on the author list, there should be something useful in Planning Extreme Programming.
In my previous position at drchrono.com I aggregated data, feedback, and iteration requests from 20,000 clinicians across the country. The best way to do this is to evangelize a site like uservoice.com. I held "daily live web demonstrations" with sometimes 50 to 100 doctors (doctors signed up right from our website). In these demos I would demonstrate our current product and evangelize user voice to drive their feedback into a useful tool for our development team. All of this was done remotely and led to a 1,400% overall increase in recurring revenue growth.