Secure Your Code Via AI


Eliezer Kanal
Technical Manager, Cyber Security Foundations, CERT Division
Carnegie Mellon University Software Engineering Institute

Introduction

In this presentation, Eliezer Kanal, a Technical Manager at CERT, discusses how to write secure software that is not vulnerable to cyber-attacks. Developers have created many techniques to help write secure code. While many of these techniques have existed for a while, applying machine learning can make them more efficient to implement.

Eliezer frames the problem from a natural language processing (NLP) perspective. NLP is the use of machine learning algorithms to understand, categorize, and predict language. This is done in three steps. First, the data needs to be acquired, and as is typical of machine learning, the more data the better. This data can be any written text, and it is combined with an algorithm to produce a model.

The next step uses the model to take new raw data and produce a representation of what that raw data means. The final step is to generate new language. This last step can be thought of as the autocomplete function that you might see when performing a Google search or writing a text message.

Watch Eliezer Kanal’s full presentation here

“You try to know a word by the company it keeps.”

How a machine learning algorithm dissects language can be thought of in terms of morphology, lexical analysis, and semantics. Morphology breaks words into their component parts. Lexical analysis converts a sequence of characters into a sequence of tokens, which are strings with an assigned meaning. Finally, semantics tries to determine what you are supposed to do with the information gathered. All of this can become quite complicated when looking at normal speech and text. Fortunately, code is more structured than natural language, which actually makes NLP better suited to applications involving code.
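To make lexical analysis concrete, here is a minimal sketch that uses Python's standard tokenize module to turn a line of code into typed tokens. The helper name lex and the sample line are illustrative assumptions, not something shown in the talk.

```python
import io
import tokenize

# Token types that only describe layout, not content.
LAYOUT = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
          tokenize.INDENT, tokenize.DEDENT)

def lex(source):
    """Convert source text into (token_type, token_text) pairs."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue  # keep only tokens that carry meaning
        tokens.append((tokenize.tok_name[tok.type], tok.string))
    return tokens

print(lex("total = price * 2"))
```

Each pair couples a piece of text with its category (a name, an operator, a number), which is the "assigned meaning" described above.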

One way that machine learning algorithms tackle these NLP problems is through n-grams. An n-gram model discards most of the context within a body of text and analyzes only the last few words to predict the next one, where “n” is the size of the window. A bigram is a 2-gram: a pair of adjacent words, so a bigram model uses the previous word to predict the next. Essentially, these models estimate the probability of the next word given the words immediately before it. Again, since code is much more regular than normal language, Eliezer explains that 3-grams are usually enough for accurate predictions. Code has a further advantage: large data sets to train and test on are easy to obtain through sites like GitHub.
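A trigram model over code tokens can be sketched in a few lines. This is a toy illustration under stated assumptions: the function names train_ngram and predict and the tiny corpus are hypothetical, and a real model would be trained on a large body of code and would smooth its counts.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n=3):
    """Count which token follows each run of n - 1 preceding tokens."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def predict(model, context):
    """Return (token, probability) for the most likely next token, or None."""
    counts = model.get(tuple(context))
    if not counts:
        return None  # this context never appeared in the training data
    token, count = counts.most_common(1)[0]
    return token, count / sum(counts.values())

# Hypothetical training data: a few loop headers, already split into tokens.
corpus = "for i in range ( n ) : for j in range ( m ) : for k in items :".split()
model = train_ngram(corpus, n=3)
print(predict(model, ["in", "range"]))
```

In this corpus, "in range" is always followed by "(", so the model predicts it with probability 1.0. An unseen context returns None, which is why larger training sets help.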

Word2vec (word to vector) is a newer machine learning technique that learns relationships between words from the text itself, rather than relying on an ontology, a giant hand-built linked dictionary of word relationships. Ontologies are accurate but difficult to make. By looking at the words around a single word, the algorithm can start to build a relational understanding between words. It then translates these relationships into vectors, so that they can be expressed as arithmetic. A few examples of how these relationships can be written are:

  • man + many = men
  • king – man + woman = queen
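The second relationship can be reproduced with a toy example. The two-dimensional vectors below are hand-made assumptions chosen for illustration (one axis loosely meaning "royalty", the other "gender"). Real word2vec vectors are learned from text and have hundreds of dimensions, so the match is approximate in practice.

```python
import math

# Hypothetical hand-made vectors: (royalty, gender). Not learned from data.
vectors = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z
                   for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```

The nearest remaining word is chosen by cosine similarity, which compares the direction of two vectors rather than their length.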

Eliezer goes on to give a few examples of how these NLP algorithms are applied specifically to code. A common convention within programming is writing clean code: code that others can understand, so that it can be passed between and shared among individuals and teams. Machine learning can compare new code against an agreed-upon correct code base and give warnings about where the two are similar and where they differ. Additionally, NLP algorithms can be written to find bugs within the code itself. One such approach looks for code that is very similar to other code, with only small differences that might represent errors. If the token sequences are almost identical, a warning notifies the user of a potential error.
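The near-duplicate idea can be sketched with the standard library alone. This is only an illustration, not the method used in the talk: the names code_tokens and near_duplicates, the 0.8 threshold, and the sample lines are assumptions, and difflib.SequenceMatcher stands in for whatever similarity measure a real tool would use.

```python
import difflib
import io
import tokenize

def code_tokens(source):
    """Split a line of Python into token strings, dropping empty layout tokens."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.string.strip()]

def near_duplicates(lines, threshold=0.8):
    """Return pairs of lines whose token sequences are similar but not identical."""
    tokenized = [(line, code_tokens(line)) for line in lines]
    flagged = []
    for i in range(len(tokenized)):
        for j in range(i + 1, len(tokenized)):
            (a, ta), (b, tb) = tokenized[i], tokenized[j]
            ratio = difflib.SequenceMatcher(None, ta, tb).ratio()
            if threshold <= ratio < 1.0:
                flagged.append((a, b, round(ratio, 2)))
    return flagged

# Hypothetical snippet: the second line looks like a copy-paste slip.
snippet = [
    "width = rect.right - rect.left",
    "height = rect.right - rect.left",
    "print(width, height)",
]
for a, b, ratio in near_duplicates(snippet):
    print(f"warning: possible copy-paste error ({ratio}): {a!r} vs {b!r}")
```

Only the first two lines are flagged, because they differ in a single token. A flag is a prompt for a human to look, not proof of a bug.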

NLP is a powerful tool that has been used for sentiment analysis, spell check, and voice text messaging, and in voice assistants like Siri and Alexa. The application base is broad, and NLP seems well suited to cleaning and writing code, as Eliezer points out in this presentation. As these processes continue to improve, so does the security that they can provide.

For more information, please visit the Software Engineering Institute website (www.sei.cmu.edu) or send me an email at [email protected].

