ArticleTrader.com
  

Duplicate Article Remover
bveditz (Member; Group: Members; Posts: 81; Joined: Oct 13, 2007)

I spent the last few hours working on this since I was catching duplicate content. This SQL code looks for articles with matching titles where the first 10 characters of the article text also match. I figure if that much matches, it's probably a duplicate. You need to paste this code into phpMyAdmin if you've got it. I tried to make it into a plugin, but didn't have much luck figuring that out. If anyone wants to do that, I think it would be pretty handy. Short of a plugin, perhaps a script that can be run as a cron job. By the way, this keeps the original tables as backup tables (backup_ams_articles and backup_ams_articleratings), just in case.

Anyway, thought I'd share:

Code:
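            -- Build a de-duplicated copy, keeping the lowest article_id in each
            -- group of matching title + first 10 characters of text.
            -- The join on MIN(article_id) also works with ONLY_FULL_GROUP_BY enabled.
            CREATE TABLE temp_ams_articles LIKE ams_articles;
            INSERT INTO temp_ams_articles
            SELECT a.* FROM ams_articles AS a
            JOIN (
                SELECT MIN(article_id) AS keep_id
                FROM ams_articles
                GROUP BY article_title, LEFT(article_text, 10)
            ) AS k ON a.article_id = k.keep_id;

            -- Copy the ratings, then remove ratings of articles that were not kept.
            CREATE TABLE temp_ams_articleratings LIKE ams_articleratings;
            INSERT INTO temp_ams_articleratings SELECT * FROM ams_articleratings;

            DELETE FROM temp_ams_articleratings
            WHERE articlerating_articleid NOT IN
                  (SELECT article_id FROM temp_ams_articles);

            -- Swap the new tables in; the originals remain as backup_* tables.
            -- Note: running this a second time drops the previous backups first.
            DROP TABLE IF EXISTS backup_ams_articleratings;
            DROP TABLE IF EXISTS backup_ams_articles;

            RENAME TABLE ams_articles TO backup_ams_articles, temp_ams_articles TO ams_articles;
            RENAME TABLE ams_articleratings TO backup_ams_articleratings, temp_ams_articleratings TO ams_articleratings;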
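
Before running anything that swaps tables, it can help to preview which rows would be dropped. Here is a minimal sketch of such a dry run in Python, using an in-memory SQLite table as a stand-in for the MySQL ams_articles table. The sample rows and the make_demo_db / preview_duplicates names are made up for illustration; only the three column names come from the script above, and SQLite's substr() stands in for MySQL's LEFT().

```python
import sqlite3


def make_demo_db():
    """Build an in-memory SQLite stand-in for ams_articles (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ams_articles ("
        "article_id INTEGER PRIMARY KEY, article_title TEXT, article_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO ams_articles VALUES (?, ?, ?)",
        [
            (1, "Happy Living", "How to make a happy living, part one."),
            (2, "Happy Living", "How to make a happy living, resubmitted."),
            (3, "Happy Living", "A different take on the same title."),
            (4, "Car Care", "How to make your car last longer."),
        ],
    )
    return conn


def preview_duplicates(conn, prefix_len=10):
    """Return ids that would be dropped: same title and same text prefix
    as another article with a lower article_id. Nothing is modified."""
    rows = conn.execute(
        """
        SELECT article_id
        FROM ams_articles
        WHERE article_id NOT IN (
            SELECT MIN(article_id)
            FROM ams_articles
            GROUP BY article_title, substr(article_text, 1, ?)
        )
        ORDER BY article_id
        """,
        (prefix_len,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(preview_duplicates(make_demo_db()))
```

If the preview comes back empty, the script would not remove anything either, which may be what happens on a site with no matching submissions.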

.........................
Brian, Nikira Internet Ventures
Populate.NET Article Directory with Twitter & Media Support - TheHostingCompany.NET Blog Hosting - Arid.NET Technology Innovation Blog

Posted Mar 6, 2008, 6:00 am
tdecker81 (Newbie; Group: Members; Posts: 13; Joined: Jan 24, 2008)

Thanks. I'm sure someone will run with this and make a PHP file to run it with a cron job. I wish I knew how; I'd love to have it too.

Home Business Article Directory

Posted Mar 6, 2008, 7:19 am
Figster (Member; Group: Members; Posts: 53; Joined: Jun 6, 2006)

Maybe instead of a cron job, a button in the admin to check for duplicates, and maybe it can run when an author submits an article, or when you review an article in admin. Something like that would be very useful. Great idea, and thanks for sharing it!
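
The "check when an author submits" idea could look roughly like this. It is only a sketch in Python against an in-memory SQLite table, not ArticleMS code: the open_demo_db, is_probable_duplicate and submit_article names are hypothetical, and the rule (same title plus same first 10 characters) is borrowed from the script earlier in the thread.

```python
import sqlite3


def open_demo_db():
    """Create an empty in-memory stand-in for the ams_articles table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ams_articles ("
        "article_id INTEGER PRIMARY KEY, article_title TEXT, article_text TEXT)"
    )
    return conn


def is_probable_duplicate(conn, title, text, prefix_len=10):
    """True if an existing article has the same title and the same
    first prefix_len characters of text."""
    row = conn.execute(
        "SELECT 1 FROM ams_articles "
        "WHERE article_title = ? AND substr(article_text, 1, ?) = ? LIMIT 1",
        (title, prefix_len, text[:prefix_len]),
    ).fetchone()
    return row is not None


def submit_article(conn, title, text):
    """Store the article unless it looks like a duplicate.
    Returns True when stored, False when rejected."""
    if is_probable_duplicate(conn, title, text):
        return False
    conn.execute(
        "INSERT INTO ams_articles (article_title, article_text) VALUES (?, ?)",
        (title, text),
    )
    return True
```

Checking at submission time means the duplicate never reaches the table, so no table swap or backup is needed for new articles.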

Posted Mar 6, 2008, 2:37 pm
mark1 (Puddle; Group: Members; Posts: 257; Joined: Feb 2, 2008)

Thanks for sharing, I have saved it for when I feel more confident with phpMyAdmin!

If someone needs a speed reader to help with moderating articles, check this out: http://www.softology.com I am averaging 300 words per minute with it after just 3 days.
I haven't sussed out how the trial version works; not sure if it will just stop or what, but so far it works very well.


bveditz said:
I spent the last few hours working on this
Anyway, thought I'd share:

.........................
Free Content For All | Dating Content | Legal Content

Posted Mar 7, 2008, 6:44 pm
klaja26 (Puddle; Group: Members; Posts: 261; Joined: Aug 6, 2007)

I tried it but it didn't remove any duplicate articles. I might play around with it later.
.........................
Dollartwenty.com - My ArticleMS test website.
Relaxitsagame.com - Coming Soon - Age of Conan Database - Highly Modified AMS
SizeOfPaper.com - Coming Soon - Domain for sale

Posted Mar 7, 2008, 11:15 pm
bveditz (Member; Group: Members; Posts: 81; Joined: Oct 13, 2007)

Just double checked and it does work. Of course, whether you have dups or not depends on how many articles you have. One thing you need to do is run the 'recount user article count' function afterwards. I had 72,987 articles, ran the SQL code, and still had 72,987 articles. After I did a 'recount' following the SQL, I was down to 72,923. The first time it got rid of about 700 articles, this time it was about 65.

Now if only there was a way to get rid of useless 'advert' and spam articles.  :)

Posted Mar 17, 2008, 9:31 am
sreeprakash (Pool; Group: Moderators; Posts: 1,332; Joined: Aug 31, 2006)

For removing duplicate titles, I use a simple technique. If a user submits an article twice, or two users submit the same article, a "1" is automatically added to the end of the article's URL.

For example, the first submitted article's URL will be "how-to-make-a-happy-living", and the second one's URL will be "how-to-make-a-happy-living1".

I usually check the URL section, and if I find a "1" I delete that article, because it is already on the site.

For removing duplicate content using a paragraph match, we need a script like the one bveditz posted here.
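
The URL check described above can be sketched in a few lines of Python. This is only an illustration: the find_suffixed_slugs name and the sample slugs are made up, and the sketch assumes the duplicate's URL is simply the existing URL with digits appended.

```python
def find_suffixed_slugs(slugs):
    """Return slugs that are an existing slug with digits appended,
    e.g. 'how-to-make-a-happy-living1' when the slug without the
    trailing '1' is also present."""
    known = set(slugs)
    flagged = []
    for slug in slugs:
        base = slug.rstrip("0123456789")  # drop any trailing digits
        if base != slug and base in known:
            flagged.append(slug)
    return flagged


if __name__ == "__main__":
    demo = [
        "how-to-make-a-happy-living",
        "how-to-make-a-happy-living1",
        "top-10-tips",
    ]
    print(find_suffixed_slugs(demo))
```

A slug is only flagged when the version without the digits also exists, so a title that legitimately ends in a number is left alone unless its shorter twin is on the site too.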

Posted Mar 17, 2008, 10:02 am
mark1 (Puddle; Group: Members; Posts: 257; Joined: Feb 2, 2008)

I have zero knowledge of this stuff, so I just crossed my fingers and "went for what made sense", and it worked. Thanks again for sharing, this is useful to have.

In case there are more people out there who don't have a clue how to use this, and they won't hold me responsible for anything, here's a step by step.

First, refresh the article count on your site and note how many articles you have, then:


1) go to your cPanel

2) click on phpMyAdmin

3) on the left, click on the database link for ArticleMS (if it is a standard installation I think it has 16 tables, so there's a (16) next to the name)

4) now look for "SQL" at the top of the page and click on it

5) you should now be looking at a white box where text can be entered. At the top it says
"Run SQL query/queries on database etc etc"

6) grab the code above, paste it into Notepad to make sure no extra gibberish is added, then copy it and paste it into the "Run SQL query/queries on database etc etc" box

7) now hit the "GO" button (it's on the bottom right of the page).

Now refresh the article count on your site again and see how many articles you have. If there are fewer than before, the thing worked.

Go back to phpMyAdmin. On the left there are some new tables; they are the backups the script made.

I thought, having gone so far, I might as well do more things I had never done, so I deleted the two backup tables.

If you feel as "brave", click on the first one (the link on the left), then at the top there's a link to "drop". I imagined this meant delete, so I clicked it and confirmed when asked again.

It worked for me, lol.

I am not responsible for your possible disaster, if in doubt, never listen to a clueless person.
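
Before dropping the backup tables, the before/after comparison from the steps above can be done against the tables themselves. A minimal sketch in Python with SQLite standing in for MySQL: the function names and the 5% threshold are assumptions for illustration, and the two table names are the ones the script creates.

```python
import sqlite3

# Table names cannot be bound as SQL parameters, so only these are accepted.
KNOWN_TABLES = {"ams_articles", "backup_ams_articles"}


def count_rows(conn, table):
    """Count the rows in one of the known tables."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected table name: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def removal_report(conn):
    """Compare the backup (before) with the live table (after)."""
    before = count_rows(conn, "backup_ams_articles")
    after = count_rows(conn, "ams_articles")
    return {"before": before, "after": after, "removed": before - after}


def looks_sane(report, max_fraction=0.05):
    """True if the run removed no more than max_fraction of the articles.
    The 5% default is an arbitrary illustrative threshold."""
    if report["before"] == 0:
        return False
    return 0 <= report["removed"] <= report["before"] * max_fraction
```

If looks_sane() comes back False, for example because nearly everything disappeared, it is probably wiser to rename the backup tables back than to drop them.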

Posted Mar 17, 2008, 1:06 pm
genuwine4532 (Newbie; Group: Members; Posts: 1; Joined: Mar 17, 2008)

I am new to this, so this may sound stupid, but what aspect of duplicate content are you referring to?

If I submit an article here it will get distributed to many sites, correct? But that is NOT duplicate content?

:stare:

Posted Mar 17, 2008, 11:50 pm
Anne Kirrin (Pool; Group: Moderators; Posts: 1,356; Joined: Apr 11, 2006)

There are two kinds of duplicate content referred to in this forum. One is when an article or other content is on several sites at the same time. The other, as in this case, is when your article site gets duplicate articles submitted. These are either PLR articles used by different authors, or the same author trying to spam or stuff his/her articles in as many times as they can get away with, to get more backlinks.
.........................
Anne Kirrin is in Nepal for the month of November

ArticleMSSkins.net <<:::: New
Directory of Niche Article Sites <::::: Add your article site for free!!

Posted Mar 18, 2008, 12:28 am
bveditz (Member; Group: Members; Posts: 81; Joined: Oct 13, 2007)

Yes. In my case, Populate.net gets its feeds from Article Trader (I think it still does anyway), Article Marketer and iSnare, so duplicate content is bound to come through. The reason I have the script check just the first few characters is that there is a new method out there that people use to 'rewrite' articles so they don't get hit with the search engine "duplicate content penalty" that is referred to in other forums. I would actually prefer the 'duplicate content penalty' to barely readable articles produced because some AI script decided to reword an article haphazardly, to the point where the article either loses its original context or, worse yet, is unreadable and looks like it was translated. :)

I think another variation of the script above could be to remove articles where the title matches and is in the same category, but not based on the article text.  The intent would be that people don't need multiple articles on the same exact subject.  Or.... maybe they do.  Optional per site.
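
A rough sketch of that title-plus-category variation, in plain Python over a list of rows. The article_category field name is a guess (the thread never shows the category column), the same_title_same_category name is made up, and the trimming and lowercasing of titles is an added assumption so that near-identical titles compare equal.

```python
def same_title_same_category(articles):
    """Return ids of articles whose title and category match an earlier
    article (lower article_id). The article text is ignored on purpose."""
    first_seen = {}
    duplicates = []
    for art in sorted(articles, key=lambda a: a["article_id"]):
        # Normalize the title so case and stray spaces do not hide a match.
        key = (art["article_title"].strip().lower(), art["article_category"])
        if key in first_seen:
            duplicates.append(art["article_id"])
        else:
            first_seen[key] = art["article_id"]
    return duplicates


if __name__ == "__main__":
    demo = [
        {"article_id": 1, "article_title": "Happy Living", "article_category": "Health"},
        {"article_id": 2, "article_title": "happy living ", "article_category": "Health"},
        {"article_id": 3, "article_title": "Happy Living", "article_category": "Travel"},
    ]
    print(same_title_same_category(demo))
```

Because the text is not compared, this rule is much more aggressive than the prefix match, which is one reason to keep it optional per site.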

On a side note, has anyone looked at their articles lately? Sometimes I read them and quite a few of them are either spam, PR stuff or other useless variations. I wonder about the idea of a central repository feature where we can mark an article as 'Poor Content'. This would then submit the title and CRC (or whatever it's called these days) of the article text to a central site. Each individual site could be set to periodically compare its articles against it (or check them when they're posted) to see if an article has been marked as poor content. If more than (x) sites considered it poor content, you could have it remove the article from your site.
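
To make the central repository idea a bit more concrete, here is a small in-memory sketch in Python. Everything in it is hypothetical: the PoorContentRegistry class, the site names and the whitespace/case normalization are illustrative choices, and a real service would need storage and a network API. The checksum uses zlib.crc32 from the standard library.

```python
import zlib
from collections import defaultdict


def fingerprint(title, text):
    """Title plus CRC32 of the text, after collapsing whitespace and case."""
    normalized = " ".join(text.split()).lower()
    return (title.strip().lower(), zlib.crc32(normalized.encode("utf-8")))


class PoorContentRegistry:
    """Tracks which sites have marked a given article as 'Poor Content'."""

    def __init__(self):
        self._reports = defaultdict(set)  # fingerprint -> set of site names

    def report(self, site, title, text):
        """Record that `site` flagged this article; repeats count once."""
        self._reports[fingerprint(title, text)].add(site)

    def is_poor(self, title, text, more_than):
        """True if more than `more_than` distinct sites flagged the article."""
        sites = self._reports.get(fingerprint(title, text), ())
        return len(sites) > more_than
```

CRC32 is a short checksum rather than a strong hash, so unrelated texts can occasionally collide; pairing it with the title, as suggested above, makes an accidental match less likely.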

That idea might be too complex to implement; however, we should discuss some similar module which could be done on a site-by-site basis. I know we can 'rank' articles, but perhaps there could just be a 'Spam' flag as well, since 20 low rankings do not necessarily make an article worse than one that was never ranked, whereas 20 'Spam' flags do.

Posted Mar 18, 2008, 1:24 am
mark1 (Puddle; Group: Members; Posts: 257; Joined: Feb 2, 2008)

Yep, I check each one, sometimes more than once. Useless rubbish comes from all over, and so do good articles. It's a minefield.
I like the central repository feature. I read somewhere that during the gold rush the people who made the most money were the ones selling shovels to the miners.

If you went to Rent A Coder and got something going, this could even be a service you could sell to all of us.
Perhaps even 10,000 clients.
Just a thought.


bveditz said:
On a side note, has anyone looked at their articles lately?  Sometimes I read them and quite a few of them are either spam, PR stuff or other useless variations.  I wonder about the idea of a central repository feature

Posted Mar 18, 2008, 12:31 pm
bveditz (Member; Group: Members; Posts: 81; Joined: Oct 13, 2007)

So, I was listening to a marketing podcast which was discussing article submissions. In it, they suggested that people pick up PLR articles, change the title and the first paragraph, then submit the articles to all the article directories to get their links out there. This inspired me to make a supplemental SQL script for purging this type of behavior. What this script does is compare 200 characters of each article (it goes back 500 characters from the end of the article, then reads 200 characters from there), essentially checking the middle of many articles. If those 200 characters match, it eliminates all but the first one it finds. As with the previous script, you're on your own for using this. Mark did a nice write-up on how to use SQL scripts, so you can follow that for this script as well. This script eliminated about 2% of my 76,000 articles (now down to about 75,000). I -think- this could be used instead of the initial SQL script, since I believe it would eliminate the same articles as the above SQL plus those from the more ambitious PLR spammers.

Code:
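            -- Keep the lowest article_id per matching 200-character window
            -- (the window starts 500 characters before the end of the text).
            -- Texts shorter than 500 characters give an empty window in MySQL,
            -- so they are grouped by their own id and never merged.
            CREATE TABLE temp_ams_articles LIKE ams_articles;
            INSERT INTO temp_ams_articles
            SELECT a.* FROM ams_articles AS a
            JOIN (
                SELECT MIN(article_id) AS keep_id
                FROM ams_articles
                GROUP BY IF(CHAR_LENGTH(article_text) >= 500,
                            SUBSTRING(article_text FROM -500 FOR 200),
                            CONCAT('id:', article_id))
            ) AS k ON a.article_id = k.keep_id;

            -- Copy the ratings, then remove ratings of articles that were not kept.
            CREATE TABLE temp_ams_articleratings LIKE ams_articleratings;
            INSERT INTO temp_ams_articleratings SELECT * FROM ams_articleratings;

            DELETE FROM temp_ams_articleratings
            WHERE articlerating_articleid NOT IN
                  (SELECT article_id FROM temp_ams_articles);

            -- Swap the new tables in; the originals remain as backup_* tables.
            -- Note: running this a second time drops the previous backups first.
            DROP TABLE IF EXISTS backup_ams_articleratings;
            DROP TABLE IF EXISTS backup_ams_articles;

            RENAME TABLE ams_articles TO backup_ams_articles, temp_ams_articles TO ams_articles;
            RENAME TABLE ams_articleratings TO backup_ams_articleratings, temp_ams_articleratings TO ams_articleratings;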
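
The 200-character window is easier to reason about outside SQL. Below is a small Python sketch that mimics it; the tail_window and window_duplicates names and the sample texts are made up for illustration. As far as I know, MySQL's SUBSTRING returns an empty string when the negative start position reaches back past the beginning of the text, so every article shorter than 500 characters gets the same empty window. The sketch therefore leaves short texts out of the comparison entirely.

```python
from collections import defaultdict


def tail_window(text, back=500, length=200):
    """Mimic SUBSTRING(text FROM -back FOR length): start `back` characters
    before the end and read `length` characters. Texts shorter than `back`
    yield an empty string."""
    if len(text) < back:
        return ""
    start = len(text) - back
    return text[start:start + length]


def window_duplicates(articles):
    """articles maps article_id -> text. Returns the ids that share a
    non-empty window with a lower id. Short texts are never compared."""
    groups = defaultdict(list)
    for article_id in sorted(articles):
        window = tail_window(articles[article_id])
        if window:  # an empty window means the text was too short to judge
            groups[window].append(article_id)
    return sorted(i for ids in groups.values() for i in ids[1:])


if __name__ == "__main__":
    body = "x" * 100 + "y" * 200 + "z" * 300
    demo = {
        1: "Original intro. " + body,
        2: "Rewritten intro! " + body,
        3: "short one",
        4: "short two",
    }
    print(window_duplicates(demo))
```

Grouping on the raw window without the length check would put articles 3 and 4 in the same group, which is worth keeping in mind on a site with many short articles.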

Posted Mar 27, 2008, 8:35 am
mushin (Member; Group: Members; Posts: 39; Joined: Nov 14, 2007)

Hi bveditz

Your code deleted my whole database. Any idea why?

Best
.........................
Cut and Paste Article Directory

Posted Aug 2, 2008, 8:47 am
sreeprakash (Pool; Group: Moderators; Posts: 1,332; Joined: Aug 31, 2006)

You don't have any backups? If not, contact your host to see if they have one and can restore your site from it...

Posted Aug 2, 2008, 9:21 am
