Google Open Sources Magika AI library to identify file types


Google has open-sourced Magika, a library that uses deep learning to identify file types. It is aimed at helping security teams detect malware more efficiently, and it is also useful in any domain that handles a lot of documents. Most forms are now filled in online, and users have to upload photos and other supporting documents to the service. If the service cannot reliably detect what type of file it has received, it may end up accepting malware or scripts embedded in images or Word documents.

Is a library for this already available? Is an AI-based library really required?

A couple of open source libraries for file type detection already exist, for example libmagic and Apache Tika. Every file type has a format specification that defines how its content is stored, so that any parser which understands the format can read it. PDF, PNG and JPEG, for example, each have a specification, and parser and writer libraries are implemented against it.

The libraries that currently detect file formats rely on these format specifications to decide what type a file is. They are right in roughly 80% to 90% of cases, but there is still room for error, and Google wants to address this with machine learning.
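To make the specification-based approach concrete, here is a minimal sketch of signature ("magic number") matching. It is only an illustration of the general idea, not how libmagic or Apache Tika are actually implemented; the `SIGNATURES` table and the `detect_by_signature` function are hypothetical names, and the table covers only a handful of formats.

```python
from typing import Optional

# A few well-known leading byte sequences; an illustrative subset only.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),  # also the container used by .docx and .xlsx
]


def detect_by_signature(data: bytes) -> Optional[str]:
    """Return a label if the data starts with a known signature, else None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


# The check looks only at the first few bytes, so anything appended after a
# valid header goes unnoticed: this payload is still reported as "png".
suspicious = b"\x89PNG\r\n\x1a\n" + b"<script>alert(1)</script>"
print(detect_by_signature(suspicious))
```

Because only the header is inspected, a file that begins with a valid signature passes regardless of what follows, which is the kind of gap a detector that looks at the content as a whole is meant to narrow.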

There are hundreds of file types, and attackers are skilled at embedding a script inside an otherwise valid file. Consider someone uploading malicious content to Google Drive or sending it as a Gmail attachment: it could affect Google's servers as well as the users who download it. Google built this library in-house to detect file types and has now open-sourced it for others to use.

Magika is available as a Python package and can be installed with the command below.

pip install magika
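Once the package is installed, a service could check uploaded bytes along the following lines. This is a hedged sketch rather than official usage: `label_of` and `identify_upload` are hypothetical helper names, and the attribute holding the detected label is assumed to be `output.label` in newer releases and `output.ct_label` in older ones, so check the documentation for the version you have installed.

```python
def label_of(result) -> str:
    """Read the detected content-type label from a Magika result object.

    Assumption: newer releases expose `result.output.label`, older ones
    `result.output.ct_label`.
    """
    output = result.output
    label = getattr(output, "label", None)
    if label is None:
        label = getattr(output, "ct_label")
    return str(label)


def identify_upload(data: bytes) -> str:
    """Identify the content type of uploaded bytes using Magika."""
    # Imported lazily so label_of can be used without loading the model.
    from magika import Magika

    detector = Magika()  # in a real service, create this once and reuse it
    return label_of(detector.identify_bytes(data))
```

Calling `identify_upload(b"function log(msg) {console.log(msg);}")` should return the label Magika assigns to that snippet; a service can then compare the label against the types it expects and reject anything else.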

Editorial Team

About author
This article is published by our editorial team.