Bitcoin Forum
Author Topic: Spam Methods: Artificial Text Generation  (Read 168 times)
actmyname
Copper Member
Legendary
*
Online Online

Activity: 1596
Merit: 1862


Exchange Bitcoin quicky--https://blockchain.com.do


View Profile WWW
August 06, 2019, 05:58:19 PM
Merited by LoyceV (2), d5000 (1), DdmrDdmr (1)
 #1

Thanks to Ddmr's post linking GLTR, I began my work as a devil's advocate and tried to find methods of breaking the detection.

Wasn't hard.

Take an example topic: thoughts on companies becoming too powerful?
Take a snippet from the OP or title: "Companies become so big and they have enormous power over us, do you think companies like this should be broken up by the government or be allowed to reign supreme?"
Use AI-generated text:
Quote
Let's begin by giving an example. Google now has hundreds if not thousands of apps for Android and iOS. They now control Google search, YouTube, and YouTube's search feature. They now control the search advertising we see on our mobile devices. And they now own the entire YouTube network. In addition, they have built massive research and development centers in Silicon Valley to research, develop, and test their new technology.

If you thought "corporations" sounds like something out of a science fiction novel, think again. We now have Google's self-driving car, Amazon's cloud cloud hosting solutions, Google's mobile phone, Amazon's Web Services, and Google's YouTube, to name a few. Imagine if we did this with Apple's iTunes, Amazon's cloud storage, etc. These big companies own so much power and data that they are a "threat to the entire Internet."

For obvious reasons I won't provide the direct link to how I generated the text but most people will find it easily enough.

Now we plug this into GLTR one paragraph at a time. Lots of green, a few bits of red. We need to obfuscate.

Add complexity with thesaurus flips:

Quote
Let's commence by giving an example. Google now has hundreds if not thousands of apps for Android and iOS. They now control Google search, YouTube, and YouTube's search feature. They now control the search advertising we optically discern on our mobile devices. And they now own the entire YouTube network. In integration, they have built massive research and development centers in Silicon Valley to research, develop, and test their incipient technology.

And simplify:

Quote
Let's begin by giving an example. Google now has hundreds if not thousands of apps for Android and iOS. They now control Google search, YouTube, and YouTube's search feature. They now control the search advertising we optically see on our mobile devices. And they now own the whole YouTube network. In integration, they have built huge research and development centers in Silicon Valley to research, develop, and test their incipent technology.

Full text after processes:

Quote
Let's begin by giving an example. Google now has hundreds if not thousands of apps for Android and iOS. They now control Google search, YouTube, and YouTube's search feature. They now control the search advertising we optically see on our mobile devices. And they now own the whole YouTube network. In integration, they have built huge research and development centers in Silicon Valley to research, develop, and test their incipent technology and if you thought "corporations" sounds like something out of a science fiction novel, think again. We now have Google's self-driving car, Amazon's cloud hosting solutions, Google's mobile phone, Amazon's Web Services, and Google's YouTube, to select/name a few. Imagine if we did this with Apple's iTunes, Amazon's cloud storage, etc. These hugely huge companies own so much power and data that they are a "threat to the whole Internet."

¯\_(ツ)_/¯

By fractional percentage, it looks human-generated. Could you tell the difference?

(results can be improved with grammar checkers or better obfuscation for a 'career spammer')
Even with tools like GLTR, advanced spam methods will be harder to detect. The question is: what should happen to these posts?
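
For readers unfamiliar with the "green" and "red" mentioned above: GLTR colours each word by how highly a language model ranked it as the next word, and the "fractional percentage" is just the share of words in each colour. A minimal sketch of that tally, assuming the thresholds GLTR describes (top 10, top 100, top 1000, everything else). The function names and the example ranks are made up for illustration and are not output from GLTR or any model:

```python
from collections import Counter

# Rank thresholds assumed from GLTR's colour overlay (top 10 / 100 / 1000 / rest).
BUCKETS = [(10, "green"), (100, "yellow"), (1000, "red")]


def bucket_for(rank: int) -> str:
    """Map a 1-based prediction rank to a GLTR-style colour bucket."""
    for limit, colour in BUCKETS:
        if rank <= limit:
            return colour
    return "purple"


def bucket_fractions(ranks: list[int]) -> dict[str, float]:
    """Return the share of tokens that fall into each colour bucket."""
    if not ranks:
        return {}
    counts = Counter(bucket_for(r) for r in ranks)
    return {colour: n / len(ranks) for colour, n in counts.items()}


# Hypothetical ranks for a ten-token passage (not real model output).
example_ranks = [1, 3, 2, 57, 1, 8, 1204, 4, 230, 1]
print(bucket_fractions(example_ranks))
```

A passage that is almost all green is what a detector would lean towards calling machine-generated; the obfuscation steps above work by pushing a few words into the red and purple buckets.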

suchmoon
Legendary
*
Offline Offline

Activity: 2156
Merit: 4402


nanny of the forum


View Profile
August 06, 2019, 06:25:12 PM
 #2

"hugely huge", really? Grin

As to what should happen... report to mods when you see them. Most likely such garbage will be posted in a useless spam megathread with no purpose other than to pad someone's post count, so it's still low-value spam regardless of verbosity.

ETFbitcoin
Legendary
*
Offline Offline

Activity: 1848
Merit: 2124

Use SegWit and enjoy lower fees.


View Profile WWW
August 06, 2019, 06:35:48 PM
 #3

I could almost applaud their effort to use services to generate news/articles automatically.

Maybe it's time to use AI that can detect generated/fake news and combine it with a bot to report such spam posts/threads automatically Tongue

actmyname
August 06, 2019, 06:40:33 PM
 #4

Quote
Most likely such garbage will be posted in a useless spam megathread with no purpose other than to pad someone's post count, so it's still low-value spam regardless of verbosity.

I imagine that in the future, many more posts will be leveraging AI use. Consider an AI-generated text post, where an algorithm generates the message and the human reads and interprets it. A machine could also be used to make recommendations that might not be obvious to the writer, based on some type of intelligence. For example, recommendations for future words or even sentences could be created by AI algorithms. Artificial neural networks are used to understand human speech so they can accurately predict certain words and phrases so it's clear they'll be used for the generation thereof.
Trivia: I wrote about 25% of that. It didn't capture my prose precisely but could definitely be used to mask someone else's effort... or lack thereof Smiley
Quote
I could almost applaud their effort to use services to generate news/articles automatically.

Maybe it's time to use AI that can detect generated/fake news and combine it with a bot to report such spam posts/threads automatically Tongue

Perhaps. Though, with any effort, it will be difficult to decipher what was AI-generated and what was stupid-generated.

Welsh
Staff
Legendary
*
Offline Offline

Activity: 1792
Merit: 1764



View Profile
August 06, 2019, 07:04:44 PM
 #5

This is also used to avoid detection via stylometry, better known as linguistic fingerprinting. The truth is that we all have unique ways of conveying information, and it's no different on this forum. We fall into patterns with the words we use, our punctuation, and our slang (and, if you're TMAN, the number of times you say "fuck" in your replies). Even how often you quote someone, and how much of the quote you edit, can be evaluated and possibly gauged with a probability guess, much like GLTR guesses the next word.

The problem with stylometry is that it can be combated in multiple ways, and the examples in the OP are some of them. However, most users, including those using Tor Browser, usually do leave some sort of fingerprint behind, because they're not willing to change their habits every time they post or to use automatically generated text like the examples in the OP. Luckily, there are tools out there which capture the percentage of text that has been reused, and from that you can make an educated assumption that a post has been altered by one of these methods.

These posts, despite looking very impressive to begin with, are definitely noticeable, and this has happened in the past. I believe hilariousandco has posted about this issue before, and a few users who were abusing these tools were dealt with. Despite their sophisticated appearance, these posts will be dealt with if they are reported. However, they definitely take a lot more attention than general spam.

As these tools get more sophisticated and their corpora get expanded, we'll likely see more and more of this being used.
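
To make the "linguistic fingerprint" idea concrete, here is a rough sketch of how two texts could be compared on word and punctuation habits. It is only an illustration: the feature lists are a small, hypothetical selection, and real stylometry tools use far richer features than this.

```python
import math
import re
from collections import Counter

# A small, hypothetical feature set chosen for illustration only.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "that", "however", "very"]
PUNCTUATION = [",", ".", "!", "?", ";"]


def fingerprint(text: str) -> list[float]:
    """Relative frequency of each function word and punctuation mark."""
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    vec = [word_counts[w] / n_words for w in FUNCTION_WORDS]
    vec += [text.count(p) / n_chars for p in PUNCTUATION]
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


post_a = "However, the tool is very good, and the rest follows."
post_b = "wow!!! amazing?? yes!"
print(cosine(fingerprint(post_a), fingerprint(post_b)))
```

Two posts by the same author would be expected to score closer to 1 than posts by different authors, which is also why swapping in generated or thesaurus-spun text can blur the fingerprint.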

suchmoon
August 06, 2019, 07:05:51 PM
Merited by YOSHIE (1)
 #6

"AI" nowadays seems to be a buzzword substitution for "software". There is nothing "intelligent" about it, not at this level anyway. It might be able to generate a sentence, perhaps even answer a simple question that's been answered before (i.e. google it) but not to create an idea, or to defend an opinion, or pretty much anything else of what makes a discussion forum work. So if this shit spreads, it will either turn into a cesspit of meaningless wordy diarrhea (some say it's already happened) or we'll have to clean it out. We don't even have to know if it's generated by AI - if it's garbage it needs to be deleted, and if AI becomes smart enough to prove me wrong... Come to think of it, some of the trolls might be AI, you don't need much intelligence to repeat "earth is a pancake with a magic metal roof" ad nauseam.

Welsh
August 06, 2019, 07:12:09 PM
 #7

Quote
We don't even have to know if it's generated by AI - if it's garbage it needs to be deleted, and if AI becomes smart enough to prove me wrong... Come to think of it, some of the trolls might be AI, you don't need much intelligence to repeat "earth is a pancake with a magic metal roof" ad nauseam.

This is the current downfall of the software. I also don't really like labeling everything with "AI" these days, and I'm not even sure that's possible at all. This software isn't learning; it generates its content by spinning words with a thesaurus and by generating words from its corpus rather than truly learning. From a spam point of view, all this method really does is look more constructive at a glance; the closer you look, the more you'll see that it is, as you elegantly put it, "meaningless wordy diarrhea".

As I've mentioned, from a spam point of view it's not that good when you've got human moderators and users looking at the response. However, from an anonymity point of view this is surprisingly effective at getting rid of linguistic habits that you might be conveying unintentionally.


actmyname
August 06, 2019, 07:21:22 PM
 #8

Quote
"AI" nowadays seems to be a buzzword substitution for "software". There is nothing "intelligent" about it, not at this level anyway. It might be able to generate a sentence, perhaps even answer a simple question that's been answered before (i.e. google it) but not to create an idea, or to defend an opinion, or pretty much anything else of what makes a discussion forum work. So if this shit spreads, it will either turn into a cesspit of meaningless wordy diarrhea (some say it's already happened) or we'll have to clean it out.

Well, now we venture into philosophy and semantics as we delve further into the topic. The whole idea of "thinking" is a notion that can be debated in and of itself.

Perhaps instead of AI, AG would be a more appropriate term: artificial generation.
When it comes to natural selection vs. artificial selection, there is exponentially less difference between the two as you lower the granularity of what is considered "artificial".
Eventually, either all selection is natural or all selection is artificial.

Quote
This is the current downfall of the software. I also don't really like labeling everything with "AI" these days, and I'm not even sure that's possible at all. This software isn't learning; it generates its content by spinning words with a thesaurus and by generating words from its corpus rather than truly learning. From a spam point of view, all this method really does is look more constructive at a glance; the closer you look, the more you'll see that it is, as you elegantly put it, "meaningless wordy diarrhea".

The software digests an input (or rather several million) and tries to create patterns thereof. It's a glorified plagiarist, to be frank.

crwth
Copper Member
Hero Member
*****
Offline Offline

Activity: 1148
Merit: 733


Semper Paratus | https://gunbot.ph


View Profile WWW
August 06, 2019, 07:33:23 PM
 #9

I didn’t know that something like this could be done now. It was bound to be created sooner or later, I guess. But now that it has been posted and a topic created about it, wouldn't that give it more exposure to other users who might abuse this kind of thing?

I haven't tried the tool to determine if it's AI-generated or not, but it could be useful. I'm just worried that it could give false positives or false negatives.

Isn't everything that we write connected with thoughts that we may have encountered or read about? Is that how the AG also does the deed? Like searching and combining ideas?

suchmoon
August 06, 2019, 07:33:54 PM
 #10

Quote
Well, now we venture into philosophy and semantics as we delve further into the topic. The whole idea of "thinking" is a notion that can be debated in and of itself.

Nice try Grin

How about "personality".

actmyname
August 06, 2019, 07:40:12 PM
 #11

Quote
I didn’t know that something like this could be done now. It was bound to be created sooner or later, I guess. But now that it has been posted and a topic created about it, wouldn't that give it more exposure to other users who might abuse this kind of thing?

Spammers are busy spamming the discussion boards, not looking at meta for spam techniques. Roll Eyes

Quote
Nice try Grin

How about "personality".

Personality can be changed. It is not immutable. Nor is it unique. Lobotomizing someone can alter their personality (e.g. the one guy that got a pole through his head)

suchmoon
August 06, 2019, 07:46:47 PM
 #12

Quote
Personality can be changed. It is not immutable. Nor is it unique. Lobotomizing someone can alter their personality (e.g. the one guy that got a pole through his head)

Yeah but can it be faked by the AI word-spinner that you're using? I bet it could fake lobotomy quite convincingly but that's about it.

Add "snarky attempts at humor" to the list of things AI is shit at.

actmyname
August 06, 2019, 07:59:50 PM
 #13

Quote
Quote
Personality can be changed. It is not immutable. Nor is it unique. Lobotomizing someone can alter their personality (e.g. the one guy that got a pole through his head)

Yeah but can it be faked by the AI word-spinner that you're using? I bet it could fake lobotomy quite convincingly but that's about it.

Add "snarky attempts at humor" to the list of things AI is shit at.

Here's my best AI-generated joke: __________________


Yes, it really just generated the underline. That's pretty meta.

Alternative generations:

Here's my best AI-generated joke: a man walks into a bar, pulls two fake guns out of his pocket, draws one of them while the other is in a holster, goes into that bar and orders a drink, the bartender says 'How much? How many?' and he says 'Zero,' which I think he means 'Zero.'



Here's my best AI-generated joke: a man walks into a bar, he's drunk and wants to make fun of the fact that he's got some dumb ass AI running his system, but the AI decides to run for president, and the man who is drinking the beer is actually a robot.



Here's my best AI-generated joke: a man walks into a bar: "Why aren't you working tonight?" It's my guess that when he says this, he's thinking to himself, "Maybe tonight's my night off," as though he could have a drink with anyone else if he wanted to. It's quite likely that he's thinking it when he says: "You're not getting any sleep tonight, are you?"

And I think you've realized this, too. A man walks into a bar. What does he do? He walks into a bar, and he goes and sits in a corner, looks around and says nothing. He walks straight into a bar!

But that's not the joke. The joke is the fact that in modern society most people in every other part of society—from top to bottom—go to bed before 7:00 in the morning, except for the guy who's been sitting there all night and nobody has told him. His life, the most important experience of his life, is to sit there until 7:00 am and look at pictures and talk nonsense



Note: these are all generated using OpenAI's GPT-2 model.
I think there's some snark there. Lots of dry humor rather than your conventional comedy. Smiley

suchmoon
August 06, 2019, 08:12:26 PM
 #14

Quote
I think there's some snark there. Lots of dry humor rather than your conventional comedy. Smiley

Maybe it needs another AI to pretend-laugh at those.

actmyname
August 06, 2019, 08:17:14 PM
 #15

Quote
Quote
I think there's some snark there. Lots of dry humor rather than your conventional comedy. Smiley

Maybe it needs another AI to pretend-laugh at those.

@LoyceV bring unto me thy laughter

Here's something that might be interesting: using these generative tools as a way to teach English to those that do not speak it natively. Jet Cash had some forum for learning English and this might be a useful instrument in order to see what kind of sentences someone can write.

It isn't perfect: far from it, rather. But it can give some insight into syntax and the use of various words in a variety of forms. Smiley

DdmrDdmr
Hero Member
*****
Offline Offline

Activity: 700
Merit: 2966


There are lies, damned lies and statistics. MTwain


View Profile WWW
August 06, 2019, 08:41:13 PM
Merited by actmyname (1)
 #16

<…>
When I read through the principles that drive GLTR to determine whether a post is human- or machine-generated, I thought that it would not be too easy to deceive. The basic idea is that it predicts what the next word in a sentence will be, and compares the actual word to the sorted list of possible predictions. The closer the word is to the top predictions, the more likely it is to be “AI”-generated. The further away it is from the predictions, the more human it is (according to their algorithm).

I immediately thought that the tool was trying to detect “proper” AI-generated texts, ones that try to be comprehensible and grammatically correct. To throw the algorithm off track, one would need to use uncommon words every now and then instead of simple, straightforward ones, making the text more human according to the algorithm. False positives may also be easily triggered for those who use simple terms in their sentences, since these are more likely to be in the list of predictable words for each position in the sentence. I figure most of what Trump writes would therefore fall under the AI-generated categorization …
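
The prediction-and-rank idea described above can be sketched with a toy model. To be clear about the assumptions: GLTR itself relies on a large neural language model, whereas the stand-in below is a simple bigram counter trained on one made-up sentence, and all function names are hypothetical. It only shows the mechanics of "rank of the actual word among the predictions".

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict:
    """Count which word follows which in a whitespace-tokenised corpus."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return dict(following)


def word_ranks(model: dict, text: str) -> list:
    """For each word after the first, return its 1-based rank among the
    model's predictions for the previous word, or None if never predicted."""
    words = text.lower().split()
    ranks = []
    for prev, actual in zip(words, words[1:]):
        ordered = [w for w, _ in model.get(prev, Counter()).most_common()]
        ranks.append(ordered.index(actual) + 1 if actual in ordered else None)
    return ranks


# Toy training text (an assumption, standing in for a real model's training data).
model = train_bigrams(
    "they now control search they now control video they now own everything"
)

print(word_ranks(model, "they now own everything"))      # [1, 2, 1]
print(word_ranks(model, "they now despise everything"))  # [1, None, None]
```

The first sentence stays near the top of the predictions at every step, which is the "machine-like" signature; the single unusual word in the second sentence pushes it and its successor out of the prediction list entirely, which is the effect an occasional uncommon word is meant to have.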

actmyname
August 06, 2019, 08:53:00 PM
 #17

Something that I've noticed which differentiates AG-constructed content from human content is the mass parallelism.

We have sentence parallelism, which is common. Example: There is a lack of funding, a lack of established rules, and a lack of genuine interest

A number of times, the generated content falls into patterns. You can even see it in the example that I provided in the opening post:

Quote
They now control Google search, YouTube, and YouTube's search feature. They now control the search advertising we see on our mobile devices. And they now own

I see few long, complex sentences spanning multiple conjunctions and prepositions. The generations also have a few errors here and there when it comes to uncommon topics. I assume the larger forms of OpenAI's text generator will provide much better results.
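
The repeated-opener pattern pointed out above is easy to check for mechanically. The following is a rough sketch rather than a real detector: the function name, the choice to skip a leading conjunction, and the thresholds are all assumptions made for illustration.

```python
import re
from collections import Counter

# Leading conjunctions to skip, so "And they now own" lines up with "They now control".
LEADING_CONJUNCTIONS = {"and", "but", "or", "so"}


def repeated_openers(text: str, n_words: int = 2, min_repeats: int = 2) -> dict:
    """Count sentence openers (first n_words words) that occur at least min_repeats times."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    openers = Counter()
    for sentence in sentences:
        words = re.findall(r"[A-Za-z']+", sentence.lower())
        if words and words[0] in LEADING_CONJUNCTIONS:
            words = words[1:]
        if len(words) >= n_words:
            openers[" ".join(words[:n_words])] += 1
    return {opener: n for opener, n in openers.items() if n >= min_repeats}


sample = (
    "They now control Google search, YouTube, and YouTube's search feature. "
    "They now control the search advertising we see on our mobile devices. "
    "And they now own the entire YouTube network."
)
print(repeated_openers(sample))  # {'they now': 3}
```

Humans write parallel sentences too, so a hit here is only a hint; it would need to be weighed against post length and combined with other signals.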

PrimeNumber7
Sr. Member
****
Offline Offline

Activity: 294
Merit: 376


View Profile
August 07, 2019, 02:15:39 AM
 #18

Quote
The question is: what should happen to these posts?

I prompted an AI text "generator" to address this question, and this is the response I received:
Quote
The solution seems to be that they should be removed.  I think that this is too much of a security risk, in that even if a machine is not thinking or acting in bad faith, it could easily create such a post that would be considered offensive.  The AI is also created with too much responsibility in mind, and should not be given the authority or permission to make decisions that can destroy lives.  It's a difficult decision to make, but I agree with the authors that it should be possible to get a machine to do a bad thing with no ill will towards anyone other than the AI itself, for no obvious reason to me.

I would say that I have to agree, at least until it starts to sound like nonsense. Current publicly available AI algorithms are not going to generate anything much better than something tangentially on topic. The quoted text above was the response I got on my first try, but the responses to subsequent topics I posed looked like even more nonsense. I believe most publicly available text AI code is not very good, and probably will not be for some time.

There are multiple AI-related projects researching how to generate speech whose researchers have decided to stop making the underlying code public out of fear that it will be misused. Those who are trying to teach computers how to create algorithms after being fed large sets of data (AI researchers) are making six- and seven-figure salaries, so I don't think they would be interested in profiting from spamming a forum with posts. I believe a bigger risk would be the spread of fake news and the like.
