{"id":28801,"date":"2024-08-30T05:00:00","date_gmt":"2024-08-30T03:00:00","guid":{"rendered":"https:\/\/sii.pl\/blog\/przetwarzanie-danych-w-czasie-rzeczywistym-z-uzyciem-google-cloud-storage-i-pubsub\/"},"modified":"2024-08-23T15:58:20","modified_gmt":"2024-08-23T13:58:20","slug":"real-time-data-processing-using-google-cloud-storage-and-pubsub","status":"publish","type":"post","link":"https:\/\/sii.pl\/blog\/en\/real-time-data-processing-using-google-cloud-storage-and-pubsub\/","title":{"rendered":"Real-time data processing using Google Cloud Storage and PubSub"},"content":{"rendered":"\n<p>In today&#8217;s world, where vast amounts of data are generated in real time, processing that data as soon as it arrives has become crucial for many companies. Google Cloud Storage and Google Cloud Pub\/Sub are two services that can work together to support efficient real-time data processing.<\/p>\n\n\n\n<p>This article shows how to integrate these services into a scalable and reliable real-time data processing solution.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Introduction to Google Cloud Storage and Pub\/Sub<\/strong><\/h2>\n\n\n\n<p><strong>Google Cloud Storage<\/strong> is a scalable object storage service that lets you store and access data from anywhere, at any time. It is well suited to storing large amounts of data, such as multimedia files, documents, and application data.<\/p>\n\n\n\n<p><strong>Google Cloud Pub\/Sub<\/strong> is a global, asynchronous messaging service that lets applications publish and receive messages in real time. 
With Pub\/Sub, publishers and subscribers are decoupled, so you can scale data processing independently of the systems that produce the data and build flexible communication architectures.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Integration of Google Cloud Storage and Pub\/Sub<\/strong><\/h2>\n\n\n\n<p>To process data in real time using Google Cloud Storage and Pub\/Sub, you need to connect the two services so that Cloud Storage publishes a message to a Pub\/Sub topic whenever an object in a bucket changes.<\/p>\n\n\n\n<p><strong>Step 1:<\/strong> Configuring a bucket in Google Cloud Storage<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Log in to the Google Cloud Console.<\/li>\n\n\n\n<li>Create a new bucket in the &#8220;Storage&#8221; section.<\/li>\n\n\n\n<li>Set up access permissions to ensure the appropriate level of security.<\/li>\n<\/ol>\n\n\n\n<p><strong>Step 2:<\/strong> Configuring a topic in Google Pub\/Sub<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Go to the &#8220;Pub\/Sub&#8221; section in the Google Cloud Console.<\/li>\n\n\n\n<li>Create a new topic that will carry messages about new data in the bucket.<\/li>\n<\/ol>\n\n\n\n<p><strong>Step 3:<\/strong> Configuring notifications for the bucket<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Select the created bucket in the &#8220;Storage&#8221; section.<\/li>\n\n\n\n<li>Set up notifications for this bucket so that it sends messages to the Pub\/Sub topic when new files are added.<\/li>\n<\/ol>\n\n\n\n<p><strong>Example configuration using CLI:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ngsutil notification create -t &lt;PUBSUB_TOPIC&gt; -f json -e OBJECT_FINALIZE gs:\/\/&lt;BUCKET_NAME&gt;\n<\/pre><\/div>\n\n\n<p>Here, -f json sends the object metadata in the JSON API format as the message body, and -e OBJECT_FINALIZE limits notifications to newly created or overwritten objects. Without -e, Cloud Storage publishes messages for every event type, including deletions and metadata updates.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Real-time data processing<\/strong><\/h2>\n\n\n\n<p>This section covers receiving, processing, and responding to data in real time using Google Cloud Storage and Pub\/Sub.<\/p>\n\n\n\n<p><strong>Step 4:<\/strong> Subscription and message 
processing<\/p>\n\n\n\n<p>To process data in real time, create a subscription to the Pub\/Sub topic and run an application that receives and processes its messages.<\/p>\n\n\n\n<p><strong>Creating a subscription:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Select the created topic from the &#8220;Pub\/Sub&#8221; section.\n<ul class=\"wp-block-list\">\n<li>In the Google Cloud Console, on the left-hand navigation bar, click &#8220;Pub\/Sub.&#8221;<\/li>\n\n\n\n<li>Click the name of the topic you created earlier.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Create a new subscription for this topic.\n<ul class=\"wp-block-list\">\n<li>Click &#8220;Create subscription.&#8221;<\/li>\n\n\n\n<li>Enter the subscription name.<\/li>\n\n\n\n<li>Choose the message delivery type (e.g., &#8220;Pull&#8221; or &#8220;Push&#8221;).<\/li>\n\n\n\n<li>If you choose &#8220;Push,&#8221; provide the endpoint URL to which messages should be sent.<\/li>\n\n\n\n<li>Click &#8220;Create.&#8221;<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Example configuration using CLI:<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ngcloud pubsub subscriptions create &lt;SUBSCRIPTION_NAME&gt; --topic=&lt;PUBSUB_TOPIC&gt;\n<\/pre><\/div>\n\n\n<p><strong>Step 5:<\/strong> Data processing<\/p>\n\n\n\n<p>After creating the subscription, you need an application that receives its messages and processes the data in real time. 
You can use managed services such as Cloud Functions and Dataflow, or write your own consumer in a language such as Python or Java.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Processing messages in Python<\/strong><\/h3>\n\n\n\n<p>Below is an example of a Python application that receives and processes messages:<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Install the required libraries:\n<ul class=\"wp-block-list\">\n<li>Install the Google Cloud Pub\/Sub library:<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\npip install google-cloud-pubsub\n<\/pre><\/div>\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li>Example code for receiving and processing messages:<\/li>\n<\/ol>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nfrom google.cloud import pubsub_v1\n\n# Replace these placeholders with your own values\nproject_id = &#039;your-project-id&#039;\nsubscription_id = &#039;your-subscription-id&#039;\n\n# Initialize the Pub\/Sub subscriber client\nsubscriber = pubsub_v1.SubscriberClient()\n# Builds &#039;projects\/{project_id}\/subscriptions\/{subscription_id}&#039;\nsubscription_path = subscriber.subscription_path(project_id, subscription_id)\n\ndef callback(message):\n    print(f&#039;Received message: {message.data}&#039;)\n    # Process data from the message\n    # For example, read data from Google Cloud Storage and perform analysis\n    # Acknowledge the message so that Pub\/Sub does not redeliver it\n    message.ack()\n\n# Subscribe to messages and set the callback\nstreaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)\nprint(f&#039;Listening for messages on {subscription_path}&#039;)\n\nwith subscriber:\n    try:\n        # Block and process messages until interrupted\n        streaming_pull_future.result()\n    except KeyboardInterrupt:\n        # Stop pulling and wait for the shutdown to finish\n        streaming_pull_future.cancel()\n        streaming_pull_future.result()\n<\/pre><\/div>\n\n\n<p>In the example above, the client library calls callback for every message it receives from the subscription. 
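<\/p>\n\n\n\n<p>If the subscription is attached to the topic configured in Step 3, each message carries a Cloud Storage notification: with the JSON payload format, the message body is the JSON description of the affected object (including its bucket and name), and Cloud Storage also sets message attributes such as eventType, bucketId, and objectId. The following is a minimal sketch, under those assumptions, of a callback that reacts only to newly created objects (eventType OBJECT_FINALIZE). The helper parse_gcs_notification and the FakeMessage class, which stands in for a received pubsub_v1 message so the logic can run without a live subscription, are hypothetical names introduced for illustration.<\/p>\n\n\n

```python
import json


class FakeMessage:
    # Hypothetical stand-in for a received pubsub_v1 message. It exposes only
    # what this sketch uses: data (bytes), attributes (dict), ack() and nack().
    def __init__(self, data, attributes):
        self.data = data
        self.attributes = attributes
        self.acked = False
        self.nacked = False

    def ack(self):
        self.acked = True

    def nack(self):
        self.nacked = True


def parse_gcs_notification(message):
    # Returns (bucket, object_name) for newly finalized objects, otherwise None.
    # OBJECT_FINALIZE is sent when an object is created or overwritten; other
    # event types (deletions, metadata updates, archiving) are skipped.
    if message.attributes.get('eventType') != 'OBJECT_FINALIZE':
        return None
    # With the JSON payload format, the body is the object resource as JSON.
    payload = json.loads(message.data.decode('utf-8'))
    return payload['bucket'], payload['name']


def callback(message):
    try:
        parsed = parse_gcs_notification(message)
    except (ValueError, KeyError):
        # Malformed payload: nack so Pub/Sub redelivers it. A dead-letter topic
        # on the subscription can stop endless redelivery of such messages.
        message.nack()
        return
    if parsed is not None:
        bucket_name, object_name = parsed
        print(f'New object: gs://{bucket_name}/{object_name}')
        # Read and process the object from Cloud Storage here.
    # Acknowledge processed and skipped messages so they are not redelivered.
    message.ack()


if __name__ == '__main__':
    body = json.dumps({'bucket': 'my-bucket', 'name': 'data/file.csv'})
    msg = FakeMessage(body.encode('utf-8'), {'eventType': 'OBJECT_FINALIZE'})
    callback(msg)  # prints: New object: gs://my-bucket/data/file.csv
```

\n\n<p>Acknowledging skipped events matters: a message that is not acknowledged is redelivered after its acknowledgement deadline expires.<\/p>\n\n\n\n<p>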
Place your data processing logic in the callback: for example, read the referenced file from Google Cloud Storage, process it, and save the results.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Processing data in Google Cloud Functions<\/strong><\/h3>\n\n\n\n<p>Google Cloud Functions is a serverless service that runs your code in response to events, so you can use it to process messages from Pub\/Sub without managing servers.<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Creating a function in Cloud Functions:\n<ul class=\"wp-block-list\">\n<li>Go to the &#8220;Cloud Functions&#8221; section in the Google Cloud Console.<\/li>\n\n\n\n<li>Click &#8220;Create function.&#8221;<\/li>\n\n\n\n<li>Choose &#8220;Cloud Pub\/Sub&#8221; as the trigger type.<\/li>\n\n\n\n<li>Select the Pub\/Sub topic that will trigger the function.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Example code for the function:<\/li>\n<\/ol>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nimport base64\nimport json\n\nfrom google.cloud import storage\n\ndef process_pubsub(event, context):\n    # Pub\/Sub delivers the message body base64-encoded in event&#x5B;&#039;data&#039;]\n    message_data = base64.b64decode(event&#x5B;&#039;data&#039;]).decode(&#039;utf-8&#039;)\n    message_json = json.loads(message_data)\n\n    # With the JSON payload format, the message describes the Cloud Storage object\n    client = storage.Client()\n    bucket_name = message_json&#x5B;&#039;bucket&#039;]\n    file_name = message_json&#x5B;&#039;name&#039;]\n\n    # bucket() builds a reference without an extra API call\n    bucket = client.bucket(bucket_name)\n    blob = bucket.blob(file_name)\n    # Download the file that triggered the notification\n    data = blob.download_as_bytes()\n\n    # Perform further data analysis\n    print(f&#039;Processing file: {file_name} from bucket: {bucket_name}&#039;)\n    print(f&#039;Data: {data}&#039;)\n<\/pre><\/div>\n\n\n<p>In this example, which uses the (event, context) signature of a first-generation background function, process_pubsub is triggered whenever a new message is published to the Pub\/Sub topic. 
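<\/p>\n\n\n\n<p>The decoding step can be tried locally before deploying, because it depends only on the shape of the event. The sketch below assumes the first-generation background function format used above, where the Pub\/Sub message body arrives base64-encoded under the data key. decode_notification and make_test_event are hypothetical helpers: the first repeats the decoding logic of process_pubsub, the second builds a matching event by hand, and neither contacts Cloud Storage.<\/p>\n\n\n

```python
import base64
import json


def decode_notification(event):
    # Hypothetical helper repeating the decoding steps of process_pubsub:
    # the Pub/Sub message body arrives base64-encoded under event['data'].
    message_data = base64.b64decode(event['data']).decode('utf-8')
    message_json = json.loads(message_data)
    # With the JSON payload format, the body describes the Cloud Storage object.
    return message_json['bucket'], message_json['name']


def make_test_event(bucket_name, file_name):
    # Hypothetical helper: builds an event dict with the same data encoding,
    # containing only the two object fields that decode_notification reads.
    body = json.dumps({'bucket': bucket_name, 'name': file_name})
    return {'data': base64.b64encode(body.encode('utf-8')).decode('ascii')}


if __name__ == '__main__':
    event = make_test_event('my-bucket', 'reports/2024-08.csv')
    print(decode_notification(event))  # ('my-bucket', 'reports/2024-08.csv')
```

\n\n<p>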
The process_pubsub function decodes the message, retrieves the corresponding file from Google Cloud Storage, and processes it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Monitoring and optimization<\/strong><\/h2>\n\n\n\n<p>To ensure the reliability and performance of your solution, monitor how the system behaves and optimize it as needed.<\/p>\n\n\n\n<p><strong>Monitoring:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Google Cloud Monitoring:\n<ul class=\"wp-block-list\">\n<li>Use Google Cloud Monitoring to track metrics such as the subscription backlog (the number of undelivered messages and the age of the oldest unacknowledged message), processing latency, resource usage, and errors.<\/li>\n\n\n\n<li>Set up alerts so that you are notified of potential issues, such as a steadily growing backlog.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Logging:\n<ul class=\"wp-block-list\">\n<li>Add logging to your data processing applications to gain insight into processing activity and diagnose problems quickly.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Optimization:<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"1\">\n<li>Scaling:\n<ul class=\"wp-block-list\">\n<li>Scale your data processing instances to match the load. Google Cloud Functions and Cloud Run scale automatically in response to traffic.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Code optimization:\n<ul class=\"wp-block-list\">\n<li>Optimize processing code to reduce latency and increase throughput, and manage resources such as memory and CPU efficiently.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>Combining Google Cloud Storage with Pub\/Sub lets you build scalable, flexible real-time data processing solutions that meet the demands of modern applications. 
By properly configuring and monitoring these components, you can keep the pipeline efficient and reliable and act on your data as soon as it arrives.<\/p>\n\n\n\n<p>***<\/p>\n\n\n\n<p>If you are curious about the tools used in the IT industry, be sure to take a look at <a href=\"https:\/\/sii.pl\/blog\/en\/all\/tools\/\" target=\"_blank\" aria-label=\"other articles by our specialists (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">other articles by our specialists<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In today&#8217;s world, where vast amounts of data are generated in real-time, processing this data on the fly becomes crucial &hellip; <a class=\"continued-btn\" 
href=\"https:\/\/sii.pl\/blog\/en\/real-time-data-processing-using-google-cloud-storage-and-pubsub\/\">Continued<\/a><\/p>\n","protected":false},"author":656,"featured_media":28462,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","inline_featured_image":false,"footnotes":""},"categories":[1320],"tags":[2646,2645,1590,1526],"class_list":["post-28801","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hard-development","tag-google-cloud-storage-en","tag-google-cloud-pub-sub-en","tag-tools","tag-guidebook"],"acf":[],"aioseo_notices":[],"republish_history":[],"featured_media_url":"https:\/\/sii.pl\/blog\/wp-content\/uploads\/2024\/07\/Przetwarzanie-danych-w-czasie-rzeczywistym-z-uzyciem-Google-Cloud-Storage-i-PubSub.jpg","category_names":["Hard 
development"],"_links":{"self":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/28801"}],"collection":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/users\/656"}],"replies":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/comments?post=28801"}],"version-history":[{"count":1,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/28801\/revisions"}],"predecessor-version":[{"id":28803,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/posts\/28801\/revisions\/28803"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media\/28462"}],"wp:attachment":[{"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/media?parent=28801"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/categories?post=28801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sii.pl\/blog\/en\/wp-json\/wp\/v2\/tags?post=28801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}