Scraping data from Magento without privileged access or trust

There appear to be several ways of scraping product data from a Magento site, but each has its upsides and downsides.

We work with sites that have little to no technical resource but have given us permission to scrape their product catalog. There appear to be three ways of doing this, none of which really works:

  • Manual web scraping - developer-intensive, requires updating when the theme changes.

  • Magento Web API - requires setting up an API user, too technical for many users.

  • Magento Plugin - too technical for many users, and exposes sensitive business data, so many companies won't do this.

Are we missing something? Is there a better alternative, or is there a way to adapt any of the three above to work better for scraping?

For example, is it possible to provide a link to a 'one-click setup' style process for API access? Shopify do this in a nice way using OAuth and permission scopes, so we can give our partners a link that grants us read-only access to just their product catalog, in a way that non-technical users can use.










Tags: api, extensions






asked Aug 17 '15 at 17:20 by danpalmer





1 Answer

I'm not sure why a Magento plugin in and of itself would be too technical, especially if your partners are instructed to install it via Magento Connect. The plugin could build an accessible XML feed for you, so you could retrieve the feed over HTTP without worrying about a changing theme layer.
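As a rough illustration of consuming such a feed, here is a minimal Python sketch. The feed layout (`<products>`, `<product sku=…>`, `<name>`, `<price currency=…>`, `<url>`), the sample data, and the function names are all assumptions made up for this example; a real plugin would define its own schema.

```python
from decimal import Decimal
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Hypothetical feed layout, invented for illustration only.
SAMPLE_FEED = """<products>
  <product sku="TSHIRT-RED-M">
    <name>Red T-Shirt (M)</name>
    <price currency="GBP">12.50</price>
    <url>https://shop.example.com/red-t-shirt</url>
  </product>
  <product sku="MUG-BLUE">
    <name>Blue Mug</name>
    <price currency="GBP">7.00</price>
    <url>https://shop.example.com/blue-mug</url>
  </product>
</products>"""


def fetch_feed(url, timeout=30):
    """Download the feed body as bytes (the URL is supplied by the caller)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_feed(xml_text):
    """Turn the (assumed) feed layout into a list of plain dicts."""
    products = []
    for node in ET.fromstring(xml_text).iter("product"):
        price = node.find("price")
        has_price = price is not None and price.text
        products.append({
            "sku": node.get("sku"),
            "name": node.findtext("name", default="").strip(),
            # Decimal avoids float rounding on money values.
            "price": Decimal(price.text.strip()) if has_price else None,
            "currency": price.get("currency") if price is not None else None,
            "url": node.findtext("url", default="").strip(),
        })
    return products


if __name__ == "__main__":
    for product in parse_feed(SAMPLE_FEED):
        print(product["sku"], product["price"], product["currency"])
```

Because the feed format is fixed by the plugin rather than by the theme, the same parser should work across every shop that installs it.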



I don't think this is the one-click answer you're looking for, but an alternative solution could be to have clients upload a custom script that you provide.

That script could be run via cron and would perform periodic dumps of specified DB tables (i.e. none of the tables that contain 'sensitive business data').

Each dump could be retrieved via SSH/SFTP if you have access to that, or via a public-facing folder or email if not. Setting up a cron task via cPanel would be pretty easy for the average user.

That would give you the most complete dataset, although not without its glaring downsides.
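The periodic-dump idea above could look something like the following Python sketch. It only builds and runs a `mysqldump` command restricted to an explicit allowlist of tables. The allowlist contents, the database name, and the file paths are assumptions chosen for illustration, so check the table names against the actual schema before relying on them.

```python
import subprocess

# Illustrative allowlist (an assumption): catalog tables only,
# nothing from sales or customer data.
ALLOWED_TABLES = (
    "catalog_product_entity",
    "catalog_product_entity_varchar",
    "catalog_product_entity_decimal",
)


def build_dump_command(database, tables, defaults_file, allowed=ALLOWED_TABLES):
    """Return a mysqldump argument list limited to allowlisted tables."""
    if not tables:
        # With no table names, mysqldump would dump the whole database.
        raise ValueError("at least one table must be named")
    rejected = [t for t in tables if t not in allowed]
    if rejected:
        raise ValueError(f"tables not on the allowlist: {rejected}")
    return [
        "mysqldump",
        # Must be the first option; keeps credentials off the command line.
        f"--defaults-extra-file={defaults_file}",
        "--single-transaction",
        database,
        *tables,
    ]


def run_dump(command, output_path):
    """Run the dump and write its SQL output to output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical database name and credentials file path.
    print(build_dump_command("magento", list(ALLOWED_TABLES), "/home/shop/.dump.cnf"))
```

A cron entry would then call the script on whatever schedule suits the shop. Rejecting anything off the allowlist, including an empty table list, is the part that addresses the 'sensitive business data' concern.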



As a side note, an XPath parser is an elegant tool for web scraping, and it could be implemented in a way that is mostly theme-agnostic if it comes to that.
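One way to approach theme-agnostic XPath is to key on semantic attributes rather than theme-specific classes. The sketch below assumes the product page emits schema.org microdata (`itemprop` attributes), which not every theme does. The sample markup and the function name are invented for illustration. It uses the standard library's limited XPath support, which only handles well-formed markup, so real pages would likely need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample page; class names stand in for theme markup.
SAMPLE_PAGE = """<html><body>
<div class="theme-specific-wrapper" itemscope="itemscope"
     itemtype="http://schema.org/Product">
  <h1 class="fancy-title"><span itemprop="name">Red T-Shirt</span></h1>
  <div itemprop="offers" itemscope="itemscope"
       itemtype="http://schema.org/Offer">
    <meta itemprop="priceCurrency" content="GBP" />
    <span itemprop="price" content="12.50">GBP 12.50</span>
  </div>
</div>
</body></html>"""


def extract_product(page):
    """Extract name, price and currency using itemprop attributes only."""
    root = ET.fromstring(page)
    product = root.find(".//*[@itemtype='http://schema.org/Product']")
    if product is None:
        return None

    def prop(name):
        node = product.find(f".//*[@itemprop='{name}']")
        if node is None:
            return None
        # Prefer the machine-readable content attribute when present.
        return node.get("content") or (node.text or "").strip()

    return {
        "name": prop("name"),
        "price": prop("price"),
        "currency": prop("priceCurrency"),
    }


if __name__ == "__main__":
    print(extract_product(SAMPLE_PAGE))
```

None of the expressions mention a CSS class or element nesting, so a theme change that keeps the microdata intact should not break them.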






• Thanks for your reply! Unfortunately, Magento Connect looks too complicated for some of our partners; they often use contractors to set up Magento and aren't able to do things like this. It also doesn't solve the permissions issue: plugins can read anything they want. We already use XPath, sitemaps, and lots of other ways to scrape data from the pages, but Magento themes differ enough across the sites we already do this for that we can't share much, if any, scraping code between them.
  – danpalmer Aug 18 '15 at 8:37












Answer posted by baku, Aug 17 '15 at 18:20; edited by Teja Bhagavan Kollepara, Nov 21 '17 at 13:07











