
Why Use Machine Learning?

Consider how you would write a spam filter using traditional programming techniques (Figure 1-1):

1. First you would look at what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and so on.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.


Since the problem is not trivial, your program will likely become a long list of complex rules that is pretty hard to maintain.
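To make the three steps above concrete, here is a minimal sketch of such a hand-written filter. The pattern list, the function names `count_matches` and `is_spam`, and the threshold of two matches are all assumptions chosen for illustration; a real rule set would be far longer.

```python
import re

# Hypothetical hand-written rules (steps 1 and 2): each pattern was
# noticed by a person reading spam subjects and then encoded by hand.
SUBJECT_PATTERNS = [
    re.compile(r"\b4U\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]


def count_matches(subject):
    """Return how many of the hand-written patterns occur in the subject."""
    return sum(1 for pattern in SUBJECT_PATTERNS if pattern.search(subject))


def is_spam(subject, threshold=2):
    """Flag the email when at least `threshold` patterns are detected.

    The threshold is an assumed value; step 3 (test and repeat) is where
    it and the pattern list would be tuned by hand.
    """
    return count_matches(subject) >= threshold
```

Every new trick spammers come up with means another entry in `SUBJECT_PATTERNS` and another round of manual testing, which is how the list of rules grows.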

In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
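As a rough illustration of "unusually frequent in spam compared to ham," the sketch below learns one weight per word from labeled examples. It uses a smoothed log-ratio of word frequencies, in the spirit of naive Bayes; this particular scoring rule, the whitespace tokenizer, and the names `train` and `is_spam` are assumptions for the example, not the method the text prescribes.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace and lowercase (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(spam_examples, ham_examples):
    """Learn a weight per word: positive means more typical of spam."""
    spam_counts = Counter(w for t in spam_examples for w in tokenize(t))
    ham_counts = Counter(w for t in ham_examples for w in tokenize(t))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())

    weights = {}
    for word in vocab:
        # Add-one smoothing keeps the ratio finite for words seen in one class only.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        weights[word] = math.log(p_spam / p_ham)
    return weights


def is_spam(text, weights):
    """Sum the learned weights; words never seen in training contribute nothing."""
    return sum(weights.get(w, 0.0) for w in tokenize(text)) > 0
```

Nothing in this code mentions “4U” or “free”: the predictors come entirely from the examples passed to `train`.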


Moreover, if spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead. A spam filter using traditional programming techniques would need to be updated to flag “For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging such emails without your intervention (Figure 1-3).
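The adaptation described here can be sketched as a filter that keeps updating its counts as users flag messages. The class name `AdaptiveFilter`, its methods, the use of word pairs as features, and the smoothed log-ratio score are all assumptions made for this illustration.

```python
import math
from collections import Counter


def features(text):
    """Words plus adjacent word pairs, so a phrase like "for u" is a feature."""
    words = text.lower().split()
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + pairs


class AdaptiveFilter:
    """Hypothetical filter that relearns from every user-flagged message."""

    def __init__(self):
        # Feature counts, keyed by whether the message was flagged as spam.
        self.counts = {True: Counter(), False: Counter()}

    def learn(self, text, flagged_as_spam):
        self.counts[flagged_as_spam].update(features(text))

    def weight(self, feature):
        spam, ham = self.counts[True], self.counts[False]
        if feature not in spam and feature not in ham:
            return 0.0  # never seen: no evidence either way
        vocab_size = len(set(spam) | set(ham))
        # Add-one smoothing keeps the ratio finite for one-sided features.
        p_spam = (spam[feature] + 1) / (sum(spam.values()) + vocab_size)
        p_ham = (ham[feature] + 1) / (sum(ham.values()) + vocab_size)
        return math.log(p_spam / p_ham)

    def is_spam(self, text):
        return sum(self.weight(f) for f in features(text)) > 0
```

If the filter has only seen “4U” spam, a subject such as "special offer for u" is not flagged; after users report a few “For U” messages through `learn`, the same subject is flagged, and no rule was written by hand in between.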


Another area where Machine Learning shines is for problems that either are too complex for traditional approaches or have no known algorithm. For example, consider speech recognition: say you want to start simple and write a program capable of distinguishing the words “one” and “two.” You might notice that the word “two” starts with a high-pitch sound (“T”), so you could hardcode an algorithm that measures high-pitch sound intensity and use that to distinguish ones and twos. Obviously this technique will not scale to thousands of words spoken by millions of very different people in noisy environments and in dozens of languages. The best solution (at least today) is to write an algorithm that learns by itself, given many example recordings for each word.

Finally, Machine Learning can help humans learn (Figure 1-4): ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once the spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem.

Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
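For a simple word-frequency model, inspecting what has been learned can amount to sorting the vocabulary by score. The function below is a hypothetical sketch: the name `top_spam_predictors` and the smoothed log-ratio score are assumptions, and more complex models would need other inspection tools.

```python
import math
from collections import Counter


def top_spam_predictors(spam_examples, ham_examples, k=3):
    """Return the k words whose frequency is most skewed toward spam."""
    spam = Counter(w for t in spam_examples for w in t.lower().split())
    ham = Counter(w for t in ham_examples for w in t.lower().split())
    vocab = set(spam) | set(ham)
    spam_total, ham_total = sum(spam.values()), sum(ham.values())

    def score(word):
        # Smoothed log-ratio: higher means more characteristic of spam.
        p_spam = (spam[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham[word] + 1) / (ham_total + len(vocab))
        return math.log(p_spam / p_ham)

    return sorted(vocab, key=score, reverse=True)[:k]
```

Reading the returned list is a small-scale version of the idea in the text: the model surfaces which words matter, and a person can then look for correlations they had not suspected.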

To summarize, Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

