Plagiarism is a major problem for online publishers. Large publishers receive more than 2 million views per day, and they want to make sure their content stays on their own websites.

We used machine learning algorithms to develop a solution that is accurate and allows for real-time plagiarism detection.



Our partner specializes in creating content for websites, articles, and e-magazines. Their daily publications reach millions of recipients.

They were fully aware of the problem, so they started looking for an efficient way to find plagiarism (content duplication) across the internet. To achieve this, they needed an AI expert who could help them design and build such a tool.


While “copy-paste” plagiarism was easy to detect with simple word-occurrence statistics, detecting paraphrased text required word embeddings and optimal transport methods. This approach relies on deep neural networks, which are standard practice for semantic text matching but are slower and need dedicated hardware to run smoothly.
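The difference between the two approaches can be illustrated with a minimal sketch. It assumes cosine similarity of word counts as the "word-occurrence statistic" and an optimal transport distance over word vectors in the style of Word Mover's Distance; the source does not say which exact statistic or solver was used. The 2-D vectors in `TOY_EMBEDDINGS` are hypothetical stand-ins for neural word embeddings, and the function names are illustrative only.

```python
import math
import re
from collections import Counter
from itertools import permutations


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_word_overlap(a, b):
    """Cosine similarity of word-count vectors (a word-occurrence statistic).

    Close to 1.0 for copy-paste duplicates; drops when the wording changes.
    """
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical 2-D word vectors; a real system would use neural embeddings.
TOY_EMBEDDINGS = {
    "the": (0.2, 0.2), "on": (0.3, 0.1),
    "cat": (1.0, 0.0), "kitten": (0.9, 0.1),
    "sat": (0.0, 1.0), "rested": (0.1, 0.9),
    "mat": (0.5, 0.5), "rug": (0.55, 0.45),
}


def transport_distance(a, b, embeddings=TOY_EMBEDDINGS):
    """Optimal transport cost between the word vectors of two texts.

    Assumes both texts have the same number of tokens, each with equal weight.
    In that special case an optimal transport plan is a one-to-one matching,
    so trying every permutation gives the exact answer. This is only feasible
    for a handful of words; a real system would use a proper OT solver.
    """
    wa, wb = tokenize(a), tokenize(b)
    if not wa or len(wa) != len(wb):
        raise ValueError("this sketch needs two non-empty texts of equal length")
    best = min(
        sum(math.dist(embeddings[x], embeddings[y]) for x, y in zip(wa, perm))
        for perm in permutations(wb)
    )
    return best / len(wa)


original = "The cat sat on the mat"
paraphrase = "The kitten rested on the rug"

print(cosine_word_overlap(original, original))    # copy-paste: about 1.0
print(cosine_word_overlap(original, paraphrase))  # paraphrase: much lower
print(transport_distance(original, paraphrase))   # paraphrase: still small
```

For the paraphrase, the word-count cosine falls to roughly 0.6 because only "the" and "on" are shared, while the transport distance stays small (about 0.06) because every word moves to a nearby synonym. This is why occurrence statistics suffice for verbatim copies but paraphrases need embeddings.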

For image plagiarism detection, we had to handle cases where an image is cropped, combined with other pictures, or otherwise modified. To detect such copies, we used an object detection system to extract people and other objects, which are then compared in a validation test. We deployed the detector as a separate service so the process runs quickly. As a result, the system is faster, more scalable, and cheaper than relying on external object detection APIs.

The trickiest part of the project was making the mobile version of the service responsive. Some page elements are organized in tables, which are difficult to display in the right proportions on a mobile device. Kacper had to overcome this issue.


  • Easy access to Google's “Copyright Removal” form. Fill in the information and click the Google form link to be directed to the right page straight away.
  • Send a direct email to the person responsible for the plagiarized content.
  • Send an email to your legal team.
  • Get a link to the plagiarized content.
  • Check whether a picture has been plagiarized.


Copysearcher crawls nearly 200,000 web pages every two hours, and machine learning algorithms analyze their text and images. The search results are listed clearly in a CRM, so the user can access them easily.

From there, the user can email the person who plagiarized the content directly, contact the company’s legal department, or open Google’s “Copyright Removal” form. Next steps include a real-time copy checker, SEO text generation, and video copy search.


xBerry is a professional and trustworthy software development partner. Together, we solved one of the biggest problems online publishers struggle with every day. I enjoyed working with xBerry. I understood what the developers were working on, and I was very pleased with how smoothly they helped me turn a concept into a real product. I confidently recommend xBerry to anyone interested in agile software development.

