What is Machine Learning?

Overview

Machine learning is a subset of artificial intelligence focused on building systems that learn from historical data, identify patterns, and make decisions with little to no human intervention. It is a data analysis method that automates the building of analytical models using data in many digital forms, including numbers, words, clicks and images.

Machine learning applications learn from the input data and continuously improve the accuracy of outputs using automated optimization methods. The quality of a machine learning model is dependent on two major aspects:

The quality of the input data. A common phrase in machine learning development is “garbage in, garbage out”: if you put in low-quality or messy data, the output of your model will be largely inaccurate.
The model choice itself. A data scientist can choose from a plethora of machine learning algorithms, each with its own specific uses, and it is vital to choose the correct algorithm for each use case. Neural networks are an algorithm type surrounded by significant hype because of the high accuracy and versatility they can deliver. However, when only a small amount of data is available, a simpler model will often perform better.

The better the machine learning model, the more accurately it can find features and patterns in data. That, in turn, makes its decisions and predictions more precise.

Why is Machine Learning important?

Machine learning is growing in importance because of the increasing volume and variety of data, more accessible and affordable computational power, and the availability of high-speed Internet. These digital transformation factors make it possible to rapidly and automatically develop models that can quickly and accurately analyze extraordinarily large and complex data sets.

Machine learning can be applied to a multitude of use cases to cut costs, mitigate risks, and improve overall quality of life, including recommending products and services, detecting cybersecurity breaches, and enabling self-driving cars. With greater access to data and computational power, machine learning is becoming more ubiquitous every day and will soon be integrated into many facets of human life.

How does Machine Learning work?

There are four key steps you would follow when creating a machine learning model.

Choose and prepare a training data set
Training data is information representative of the data the machine learning application will later ingest, and it is used to tune model parameters. Training data is sometimes labeled, meaning it has been tagged with the classifications or expected values the machine learning model is required to predict. Other training data may be unlabeled, so the model has to extract features and assign clusters autonomously.

Labeled data should be divided into a training subset and a testing subset. The former is used to train the model and the latter to evaluate the model's effectiveness and find ways to improve it.
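As a minimal sketch of the split described above, the following shuffles a small labeled data set and holds out a fraction for testing. The function name `train_test_split`, the 20% test fraction, and the toy `(hours_studied, passed_exam)` records are assumptions made for illustration only.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle labeled records and split them into (training, testing) subsets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled data: (hours_studied, passed_exam)
labeled = [(hours, int(hours >= 5)) for hours in range(10)]
train, test = train_test_split(labeled, test_fraction=0.2)
print(len(train), len(test))
```

Shuffling before splitting helps both subsets stay representative when the original records are sorted or grouped.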
Select an algorithm to apply to the training data set
The type of machine learning algorithm you choose will primarily depend on a few aspects:
- Whether the use case is value prediction or classification, which use labeled training data, or clustering or dimensionality reduction, which use unlabeled training data
- How much data is in the training set
- The nature of the problem the model seeks to solve
For prediction or classification use cases, you would usually use regression algorithms such as ordinary least squares regression or logistic regression. With unlabeled data, you are likely to rely on clustering algorithms such as k-means or nearest-neighbor methods. Some algorithms, such as neural networks, can be configured to work with both clustering and prediction use cases.
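To make the value-prediction case concrete, here is a small sketch of ordinary least squares regression with a single input variable, using the closed-form solution. The function name `fit_ols` and the toy data (which follows y = 2x + 1 exactly) are illustrative assumptions.

```python
def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_ols(xs, ys)
prediction = slope * 5.0 + intercept  # predicted value for an unseen input
```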
Train the algorithm to build the model
Training the algorithm is the process of tuning model variables and parameters to predict the appropriate results more accurately. Training the machine learning algorithm is usually iterative and uses a variety of optimization methods, depending on the chosen model. These optimization methods do not require human intervention, which is part of the power of machine learning. The machine learns from the data you give it with little to no specific direction from the user.
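One common optimization method is gradient descent. The sketch below assumes a simple linear model and a mean squared error objective, neither of which the text prescribes. The learning rate, epoch count, and toy data are also illustrative choices.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Iteratively tune w and b so that w * x + b approximates y."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example with the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1
w, b = train_linear_model([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

No one tells the loop what the slope or intercept should be. It only sees the data and repeatedly reduces the error.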
Use and improve the model
The last step is to feed new data to the model as a means of improving its effectiveness and accuracy over time. Where the new information will come from depends on the nature of the problem to be solved. For instance, a machine learning model for self-driving cars will ingest real-world information on road conditions, objects and traffic laws.

Machine Learning methods

What is supervised Machine Learning?

Supervised machine learning algorithms use labeled data as training data where the appropriate outputs to input data are known. The machine learning algorithm ingests a set of inputs and corresponding correct outputs. The algorithm compares its own predicted outputs with the correct outputs to calculate model accuracy and then optimizes model parameters to improve accuracy.
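The compare-and-optimize loop described above can be sketched with a deliberately tiny model: a single cutoff on the transaction amount. The data, the names `accuracy` and `fit_threshold`, and the choice of a threshold model are all assumptions for illustration. Real fraud models use many more inputs.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the known correct outputs."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def fit_threshold(amounts, labels):
    """Choose the cutoff whose predictions best match the labels on the training data."""
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted(set(amounts)):
        predictions = [int(a >= candidate) for a in amounts]
        score = accuracy(predictions, labels)
        if score > best_accuracy:
            best_threshold, best_accuracy = candidate, score
    return best_threshold, best_accuracy


# Hypothetical labeled transactions: amount and label (1 = fraudulent, 0 = legitimate)
amounts = [12, 25, 40, 60, 900, 1500, 2200, 30]
labels = [0, 0, 0, 0, 1, 1, 1, 0]
threshold, train_accuracy = fit_threshold(amounts, labels)
new_prediction = int(1000 >= threshold)  # classify a new, unlabeled transaction
```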

Supervised machine learning relies on learned patterns to predict values for unlabeled data. It is most often used for automation, for large volumes of data records, or where there are too many data inputs for humans to process effectively. For example, the algorithm can pick out credit card transactions that are likely to be fraudulent or identify the insurance customers most likely to file a claim.

What is unsupervised Machine Learning?

Unsupervised machine learning is best applied to data that does not have a structured or objective answer. No correct output is determined in advance for a given input. Instead, the algorithm must understand the input and form the appropriate decision. The aim is to examine the information and identify structure within it.

Unsupervised machine learning works well on transactional information. For example, the algorithm can identify customer segments that share similar attributes. Customers within these segments can then be targeted by similar marketing campaigns. Popular techniques used in unsupervised learning include nearest-neighbor mapping, self-organizing maps, singular value decomposition and k-means clustering. These algorithms are also used to segment topics, identify outliers and recommend items.
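As a rough sketch of k-means clustering applied to customer segmentation, the code below groups customers described by two made-up attributes (visits per month and average basket value). The naive initialization from the first k points, the fixed iteration count, and the data are simplifying assumptions. Production implementations use better initialization and a convergence check.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers: (visits per month, average basket value)
customers = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.5, 9.0), (1.2, 2.2), (9.2, 8.4)]
segments, centers = k_means(customers, k=2)
```

No labels are supplied. The two segments emerge only from how close the customers are to one another.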

What is the difference between supervised and unsupervised Machine Learning?

Aspect	Supervised learning	Unsupervised learning
Process	Input and output variables are provided to train model.	Only input data is provided to train model. No output data is used.
Input Data	Uses labeled data.	Uses unlabeled data.
Algorithms Supported	Supports regression algorithms, instance-based algorithms, classification algorithms, neural networks and decision trees.	Supports clustering algorithms, association algorithms and neural networks.
Complexity	Simpler.	More complex.
Subjectivity	Objective.	Subjective.
Number of Classes	Number of classes is known.	Number of classes is unknown.
Primary Drawback	Classifying massive data with supervised learning is difficult.	Choosing number of clusters can be subjective.
Primary Goal	Train the model to predict output when presented with new inputs.	Find useful insights and hidden patterns.

What can Machine Learning do: Machine Learning in the real world

Although machine learning has been around for decades, it is the more recent ability to automatically apply complex mathematical calculations to big data that has given it unprecedented sophistication. The range of machine learning applications today is vast, from enterprise AIOps to online retail. Some real-world examples of machine learning capabilities today include the following:

Cybersecurity that uses behavioral analytics to detect suspicious or anomalous events that may indicate insider threats, advanced persistent threats (APTs), or zero-day attacks.
Self-driving car projects, such as Waymo (a subsidiary of Alphabet Inc.), and Tesla’s Autopilot, which is a step below actual self-driving cars.
Digital assistants like Siri, Alexa and Google Assistant that search the web for information in response to our voice commands.
User-tailored recommendations that are driven by machine learning algorithms on websites and apps like Netflix, Amazon and YouTube.
Fraud detection and cyber resilience solutions that aggregate data from multiple systems, unearth clients exhibiting high-risk behavior and identify patterns of suspicious activity. These solutions can use supervised and unsupervised machine learning to classify transactions for financial organizations as fraudulent or legitimate. This is why a consumer can get texts from their credit card company asking whether an unusual purchase made with the consumer’s financial credentials is legitimate. Machine learning has become so advanced in fraud detection that many credit card companies advertise no-fault protection to consumers if fraudulent transactions are not caught by the financial organization’s algorithms.
Image recognition, which has advanced significantly and can be reliably used for facial recognition, reading handwriting on deposited checks, traffic monitoring and counting the number of people in a room.
Spam filters that detect and block unwanted mail from inboxes.
Utilities that analyze sensor data to find ways of improving efficiency and cutting costs.
Wearable medical devices that capture valuable data in real time for continuous assessment of patient health.
Taxi apps evaluating traffic conditions in real time and recommending the most efficient route.
Sentiment analysis, which determines the tone of a line of text. Good applications of sentiment analysis include Twitter, customer reviews, and surveys:
- Twitter: One way to evaluate brands is to detect the tone of tweets directed toward a person or company. Companies such as Crimson Hexagon and Nuvi provide this in real time.
- Customer reviews: You can detect the tone of customer reviews to evaluate how your company is doing. This is especially useful if there is no rating system paired with free-text customer reviews.
- Surveys: Using sentiment analysis on free-text survey responses can give you an at-a-glance evaluation of how your survey respondents feel. Qualtrics has implemented this in its surveys.
Market segmentation analysis, which uses unsupervised machine learning to cluster customers according to buying habits and identify different customer types or personas. This helps you better understand your most valuable or underserved customers.
Document search. It is easy to press Ctrl+F to search a document for exact words and phrases, but searching is difficult if you do not know the exact wording you are looking for. Machine learning techniques such as fuzzy methods and topic modelling make this process much easier by letting you search documents without knowing the exact phrasing.
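The fuzzy search idea in the last item can be illustrated with Python's standard `difflib` module, which ranks candidates by string similarity. This is simple approximate matching rather than a trained model, and the helper name `fuzzy_find`, the 0.75 cutoff, and the sample sentence are assumptions for illustration.

```python
import difflib


def fuzzy_find(query, text, cutoff=0.75):
    """Return words in text that closely resemble query, best match first."""
    # Normalize: strip surrounding punctuation, lowercase, and de-duplicate
    words = sorted({w.strip(".,;:!?").lower() for w in text.split()})
    return difflib.get_close_matches(query.lower(), words, n=5, cutoff=cutoff)


document = "The anomaly detection model flagged several fraudulent transactions."
matches = fuzzy_find("fraudulant", document)  # misspelled query still finds the word
```

An exact Ctrl+F search for the misspelled query would find nothing, whereas the similarity score still surfaces the intended word.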

Machine Learning’s role will only continue to grow

As data volumes grow, computing power increases, Internet bandwidth expands and data scientists enhance their expertise, machine learning will only continue to drive greater and deeper efficiency at work and at home.

With the ever-increasing cyber threats that businesses face today, machine learning is needed to secure valuable data and keep hackers out of internal networks. Our premier UEBA SecOps software, ArcSight Intelligence, uses machine learning to detect anomalies that may indicate malicious actions. It has a proven track record of detecting insider threats, zero-day attacks, and even aggressive red team attacks. Take the first step to securing your organization by scheduling a demo of ArcSight Intelligence today!

Resources

AI and Machine Learning 101 - Part 1: Machine vs. Human Learning

AI and Machine Learning 101 - Part 2: The Neural Network and Deep Learning

What is Artificial Intelligence?

What is AIOps?
