Meta’s new Gen A.I. clearly shows the urgent need for government regulation to protect ourselves

Last Wednesday, Zuckerberg’s company released its new A.I. image generator as a stand-alone product.

With a slogan that reads: ‘Imagine with Meta A.I.’, their new image-synthesis model is in direct competition with other players in the artificial intelligence market. However, this technology is not entirely new. Previously, in fact, it had already been released as an additional feature in its messaging app as well as on Instagram itself. What has happened now is that they are marketing it as a separate product, under the name ‘Meta A.I.’, which will encompass a whole series of tools based on generative and non-generative A.I.

What is really shocking about Meta’s announcement, however, is the admission about the controversial source of the images in the dataset that allows the A.I. to stitch together ‘new’ images.

Built atop Emu, trained on user-generated Facebook and Instagram images

If you currently have an active Facebook or Instagram profile, it is very likely that the images of you, your friends, your children and your artwork that you personally uploaded have enabled Meta to develop this technology. It is in fact the images posted by users on its two social networks (1.1 billion of them, specifically) that were used to develop this new A.I.


According to Meta, it only used public images in its development. 

“Since it takes such a large amount of data to teach effective models, a combination of sources are used for training. These sources include information that is publicly available online and licensed information, as well as information from Meta’s products and services,” Meta said in a policy document (that you can read here: https://www.facebook.com/privacy/genai).

“When we collect public information from the internet or license data from other providers to train our models, it may include personal information. For example, if we collect a public blog post it may include the author’s name and contact information. When we do get personal information as part of this public and licensed data that we use to train our models, we don’t specifically link this data to any Meta account.”

We managed to get Meta’s A.I. to show that it has been trained even on famous IPs, by misspelling their names in the prompt (otherwise a message appears saying that it cannot generate the image).

How to avoid being scraped and Facebook’s statements on their scraping process

To avoid being scraped any further, you can set your profile or shared images to ‘only for me’ or ‘only for friends’. This way no scraping should take place. This is of course subject to the usage policy, which as we all know can change over time.


But on what basis does Meta believe it has the right to use our public data without even asking?
The company itself explains it to us, again in the statement you can find at the previous link (https://www.facebook.com/privacy/genai):

“The use of public information and licensed data is in our interest; we are committed to being transparent about the legal basis we refer to when processing this information. In the EU and the UK, we refer to the legitimate interest basis for collecting and processing personal information included in public and licensed sources that enable us to train our generative AI models. In other jurisdictions, where applicable, we refer to an appropriate legal basis for collecting and processing this data.”

So basically if they think it may be useful for them to use your kid’s photo, they’ll use it.
As simple as that.

Not only Meta scrapes your images, but hidden actors too.

But in reality Meta is not even your worst enemy.
Although Meta has only now finalised a generative image A.I., images posted publicly on its platforms have already been scraped by other companies and hidden actors. Specifically, it emerged last April that the controversial company Clearview, which used A.I. models to develop facial recognition systems that were later also employed by the police, has scraped about 30 billion images from Facebook (yes, the number is right).

Some of the collages of other people’s pixels, showcased on Meta A.I.’s landing page.

And this is just one piece of news, confirmed by the CEO of Clearview himself. The reality could be far worse, since literally every single image posted publicly on social networks is material that is easily accessible to scraping systems and can be used for the most diverse purposes.

In this case, Meta only achieved what others were already doing with the material published on its platforms.

See for yourself: type ‘Scrape Facebook Images’ into the Google search bar and look at the results that come up.

These are just the first results… of 30 million pages on how to scrape images from Facebook

The number of queries, tutorials, plug-ins and entire software packages dedicated just to this task is impressive. Imagine, then, how easy it is for someone who wants to use your material to take it and build datasets with it.

A solid solution is not there. Yet.

A real solution to this problem unfortunately does not exist. And although you can now protect your works manually with protection tools such as the very powerful Glaze, this would not protect you or other artists from every possible threat.

New images for datasets can in fact be taken from film stills (think Miyazaki’s films, Pixar), video game footage, specialised magazines or even (in the case of specific style and design needs) photographs from art books, manga, museum catalogues, in short anything that is printed or can be broken down into frames.

A significant breakthrough could therefore come from Nightshade, if used en masse: once images treated with it are absorbed into those datasets, it would effectively undermine them, destroying the ‘copy core’ of these Gen A.I.s in their entirety.

But the real solution to this problem could only come from very stringent regulation of generative A.I. By ‘very stringent’ we simply mean an obligation to declare and certify the origin of the material used for the dataset, which would lead to the obvious end of many of these projects, since they would be unable to obtain material lawfully or from lawful sources. This is why there is an urgent need to press national governments to act in this targeted manner.

However, even here new problems are beginning to appear on the horizon. Many of the sources from which data scraping is legal (such as the Internet Archive) are strangely being filled with pirated material. Why is this being allowed?
We will discuss this in a future article.

Submit your rage to Meta’s inbox.

Meanwhile, if you want to contact Meta, contest its use of your personal data or get more information about it, the company has set up a contact form dedicated to generative A.I. questions (that you can access here: https://www.facebook.com/help/contact/510058597920541).

However, it must be emphasised that anyone who has tried to use this form to get information and to discourage Meta from using their personal data has so far achieved no significant results. On the contrary, it would seem that Meta itself asks users to prove that their data have actually been used. Which is obviously impossible.

If any reader has better luck, please contact us directly and we will be happy to hear from you and report your testimony.

The Art Journal