Your Search Results

      • Data warehousing
        January 2011

        Building the Unstructured Data Warehouse

        Architecture, Analysis & Design

        by Bill Inmon, Krish Krishnan

        Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now!

        Answers to many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book prepares you to implement an unstructured data warehouse. Through clear explanations, examples, and case studies, you will learn new techniques and tips for obtaining and analyzing text.

        Master these ten objectives:

          • Build an unstructured data warehouse using the 11-step approach
          • Integrate text and describe it in terms of homogeneity, relevance, medium, volume, and structure
          • Overcome challenges including blather, the Tower of Babel, and lack of natural relationships
          • Avoid the Data Junkyard and combat the Spider's Web
          • Reuse techniques perfected in the traditional data warehouse and Data Warehouse 2.0, including iterative development
          • Apply essential techniques for textual Extract, Transform, and Load (ETL) such as phrase recognition, stop word filtering, and synonym replacement
          • Design the Document Inventory system and link unstructured text to structured data
          • Leverage indexes for efficient text analysis and taxonomies for useful external categorization
          • Manage large volumes of data using advanced techniques such as backward pointers
          • Evaluate technology choices suitable for unstructured data processing, such as data warehouse appliances

        The following outline briefly describes each chapter's content:

          • Chapter 1 defines unstructured data and explains why text is the main focus of this book.
          • Chapter 2 addresses the challenges of managing unstructured data.
          • Chapter 3 discusses the DW 2.0 architecture, which leads into the role of the unstructured data warehouse. The unstructured data warehouse is defined and its benefits are given. Several features of the conventional data warehouse can be leveraged for the unstructured data warehouse, including ETL processing, textual integration, and iterative development.
          • Chapter 4 focuses on the heart of the unstructured data warehouse: Textual Extract, Transform, and Load (ETL).
          • Chapter 5 describes the 11 steps required to develop the unstructured data warehouse.
          • Chapter 6 describes how to inventory documents for maximum analysis value, and how to link unstructured text to structured data for even greater value.
          • Chapter 7 goes through each type of index needed to make text analysis efficient. Indexes range from simple indexes, which are fast to create and work well when the analyst knows before indexing begins what needs to be analyzed, to complex combined indexes, which can be made up of any or all of the other kinds of indexes.
          • Chapter 8 explains taxonomies and how they can be used within the unstructured data warehouse.
          • Chapter 9 explains ways of coping with large amounts of unstructured data, such as keeping the unstructured data at its source and using backward pointers. The chapter also explains why iterative development is so important.
          • Chapter 10 focuses on challenges and technology choices suitable for unstructured data processing, including the data warehouse appliance.
          • Chapters 11, 12, and 13 put all of the previously discussed techniques and approaches in context through three case studies.

        About Bill: Bill Inmon, the father of data warehousing, has written 52 books translated into 9 languages. Bill has written over 1000 articles and has conducted seminars and spoken at conferences on every continent except Antarctica. Bill holds three software patents, and his latest company is Forest Rim Technology, a company dedicated to the access and integration of unstructured data into the structured world.

        About Krish: Krish Krishnan is a recognized thought leader in data warehouse performance and architecture. Krish writes about and teaches Social Intelligence around the world and is a frequent speaker at industry conferences. He provides consulting advice to CxOs on DW strategy and is an independent analyst covering the data warehouse and business intelligence industry.

      • Databases
        September 2019

        Blockchain for Beginners

        by Yathish. R

        Have you gone through a hundred and ten blockchain resources and still not figured out where to start? This book lays the foundation for most of the concepts you need to get started and scratch the surface of this hyped technology. It covers the underlying technicalities and the diversity of platforms, the scenarios where blockchain fits and when it would be overkill, and the two most important platforms, getting you started creating your own applications on top of them. With simple, humorous references and intriguing exercises, the book aims not only to make you comfortable with the technology but also to make you confident enough to think about it more deeply.

      • Databases
        October 2019

        Data Science for Professionals

        by Prof. N.C. Das

        The book is meant for a wide spectrum of readers: empirical scientists, consultants, technocrats, advertisers, researchers, and students learning data science who conduct small replicated or non-replicated experiments or demonstrations. It deals with two-factor, three-level designs conducted in only six experimental units. For this purpose, an exhaustive set of C(9, 6) = C(9, 3) = 84 designs has been generated. The least squares algorithm has been applied to estimate the separate effect of each component and their interaction, along with fitting a response surface, thus meeting both objectives of any factorial experiment. Of these designs, seventy-six were found to be valid (estimable) and the rest non-estimable. The value of the determinant of the least squares matrix for each design indicates its D-optimality status; the same status can also be obtained from the geometric structure of the designs. This exhaustive search identifies (8+4) D-optimal designs, of which only (1+3) were previously known. Also, the rotatable feature of a design does not affect its D-optimality status. A simple method, SAME, has been devised to meet these twin objectives while avoiding matrix formation and related operations. Each design can be combined with another to form a pair of replicates or blocks of an experiment, allowing a statistical significance test based on ANOVA as well as fitting of the response surface.
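        The figure of 84 designs follows from simple counting: two factors at three levels give 3 × 3 = 9 treatment combinations, and choosing 6 of them to run is the same as choosing the 3 to leave out, so C(9, 6) = C(9, 3) = 84. The minimal Python sketch below enumerates those designs and scores each by the determinant of the least squares information matrix X^T X, where a zero determinant marks a non-estimable design and the largest value marks the D-optimal ones. It assumes levels coded -1, 0, +1 and a full second-order model (intercept, two main effects, interaction, two quadratic terms). The coding, the model, and the helper names (`GRID`, `model_row`, `det`, `enumerate_designs`) are illustrative assumptions, not the author's SAME method.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

# Assumed coding: each factor at levels -1, 0, +1, giving 3 * 3 = 9 grid points.
GRID = list(product((-1, 0, 1), repeat=2))


def model_row(x1, x2):
    """Assumed full second-order model terms: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return [1, x1, x2, x1 * x2, x1 * x1, x2 * x2]


def det(matrix):
    """Exact determinant of a square matrix by Gaussian elimination over rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below `col` with a non-zero entry to use as the pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the determinant's sign
        result *= m[col][col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def enumerate_designs():
    """Return (design, det(X^T X)) for every 6-run subset of the 9 grid points."""
    results = []
    for design in combinations(GRID, 6):
        X = [model_row(x1, x2) for x1, x2 in design]
        # X is square (6 runs x 6 model terms), so det(X^T X) = det(X) ** 2.
        results.append((design, det(X) ** 2))
    return results


designs = enumerate_designs()
assert len(designs) == comb(9, 6) == comb(9, 3) == 84

# A zero determinant means not all model terms can be estimated.
estimable = [(d, d_val) for d, d_val in designs if d_val != 0]
best = max(d_val for _, d_val in estimable)
d_optimal = [d for d, d_val in estimable if d_val == best]
print(len(estimable), len(d_optimal))
```

        Exact `Fraction` arithmetic is a deliberate choice here: a singular design then yields a determinant of exactly zero rather than a small floating-point residue, so the estimable/non-estimable split needs no tolerance.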

      • Data mining

        Big Data Analytics

        by Parag Kulkarni, Sarang Joshi, and Meta S. Brown

        The book is an unstructured data mining quest that takes the reader through different features of unstructured data mining while unfolding the practical facets of Big Data. It emphasizes the machine learning and mining methods required for processing and decision-making. The text begins with an introduction to the subject and explores data mining methods and models along with their applications. It then goes into detail on other aspects of Big Data analytics, such as clustering, incremental learning, multi-label association, and knowledge representation. Readers are also introduced to using business analytics to create value. The book ends with a discussion of areas where further research can be explored.

        The book is designed for senior-level undergraduate and postgraduate students of computer science and engineering.

        Key features:

          • Contains numerous examples and case studies.
          • Discusses Apache Hadoop, a software framework that enables distributed processing of large datasets across clusters of computers.
          • Incorporates review questions, MCQs, laboratory assignments, and critical thinking questions at the end of the chapters, wherever required.

        Google Preview: https://bit.ly/3vsWYc5

      • Data warehousing
        February 2012

        Data and Reality

        A Timeless Perspective On Perceiving & Managing Information in Our Imprecise World -- 3rd Edition

        by William Kent

        Let's step back to the year 1978. Sony introduces hip portable music with the Walkman, Illinois Bell Company releases the first mobile phone, Space Invaders kicks off the video game craze, and William Kent writes Data and Reality. We have made amazing progress in the last four decades in terms of portable music, mobile communication, and entertainment, making devices such as the original Sony Walkman and suitcase-sized mobile phones museum pieces today. Yet remarkably, the book Data and Reality is just as relevant to the field of data management today as it was in 1978. Data and Reality gracefully weaves the disciplines of psychology and philosophy with data management to create timeless takeaways on how we perceive and manage information. Although databases and related technology have come a long way since 1978, the process of eliciting business requirements and the way we think about information remain constant. This book will provide valuable insights whether you are a 1970s data-processing expert or a modern-day business analyst, data modeler, database administrator, or data architect.

        This third edition of Data and Reality differs substantially from the first and second editions. Data modeling thought leader Steve Hoberman has updated many of the original examples and references and added his commentary throughout the book, including key points at the end of each chapter. The important takeaways in this book are rich with insight yet presented in a conversational writing style.
        Here are just a few of the issues this book tackles:

          • Has "business intelligence" replaced "artificial intelligence"?
          • Why is a map's geographic landscape analogous to a data model's information landscape?
          • Where do forward and reverse engineering fit in our thought process?
          • Why are we all becoming "data archeologists"?
          • What causes the communication chasm between the business professional and the information technology professional, and how can the logical data model bridge this gap?
          • Why do we invest in hardware and software to solve business problems before determining what the business problems are in the first place?
          • What is the difference between oneness, sameness, and categories?
          • Why does context play a role in every design decision?
          • Why do the more important attributes become entities or relationships?
          • Why do symbols speak louder than words?
          • What's the difference between a data modeler, a philosopher, and an artist?
          • Why is the 1975 dream of mapping all attributes still a dream today?
          • What influence does language have on our perception of reality?
          • Can we distinguish between naming and describing?

        From Graeme Simsion's foreword:

        "While such fundamental issues remain unrecognized and unanswered, Data and Reality, with its lucid and compelling elucidation of the questions, needs to remain in print. I read the book as a database administrator in 1980, as a researcher in 2002, and just recently as the manuscript for the present edition. On each occasion I found something more, and on each occasion I considered it the most important book I had read on data modeling. It has been on my recommended reading list forever. The first chapter in particular should be mandatory reading for anyone involved in data modeling. In publishing this new edition, Steve Hoberman has not only ensured that one of the key books in the data modeling canon remains in print, but has added his own comments and up-to-date examples, which are likely to be helpful to those who have come to data modeling more recently. Don't do any more data modeling work until you've read it."

        About William: William Kent (1936-2005) was a renowned researcher in the field of data modeling. Author of Data and Reality, he wrote scores of papers and spoke at conferences worldwide, posing questions about database design and the management of information that remain unanswered today. Though he earned a bachelor's degree in chemical engineering and a master's in mathematics, he had no formal training in computer science. Kent worked at IBM and later at Hewlett-Packard Laboratories, where he helped develop prototype database systems. He also served on or chaired several international standards committees. Kent lived in New York City and later Menlo Park, Calif., before retiring to Moab, Utah, to pursue his passions of outdoor photography and protecting the environment.

        About Steve: Steve is currently a data modeling consultant and instructor. He taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then. Steve balances the formality and precision of data modeling with the realities of building software systems with severe time, budget, and people constraints. In his consulting and teaching, he focuses on templates, tools, and guidelines to reap the benefits of data modeling with minimal investment. Steve is the author of five books on data modeling, the founder of the Design Challenges group, and inventor of the Data Model Scorecard.
