Friday, December 12, 2008

Privacy concerns-2

The danger arises when the summarized data paints an untrue picture. This can lead a company to take improper actions that harm its prosperity. The threat to an individual’s privacy arises when the compiled data lets the data miner identify specific individuals, especially when the data was originally anonymous. Aggregating data from multiple sources allows profiles of individuals to be created. For the information derived from mined data to be meaningful, one must assume that the data in the repository is accurate and complete, and that the analysis was done in a way that produces a reliable result. A common saying is “garbage in, garbage out”: if the data entered into your repository is of poor quality, your analysis, or output, will also be of poor quality.
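To make the re-identification risk concrete, here is a minimal sketch of how two data sets might be linked. Everything in it is hypothetical: the records, the field names, and the choice of ZIP code, birth year, and sex as the linking fields are assumptions made for illustration only.

```python
# Hypothetical data: a survey with no names, and a separate public list with names.
anonymous_survey = [
    {"zip": "02138", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public_list = [
    {"name": "A. Example", "zip": "02138", "birth_year": 1961, "sex": "F"},
    {"name": "B. Sample", "zip": "02139", "birth_year": 1975, "sex": "M"},
    {"name": "C. Person", "zip": "02139", "birth_year": 1975, "sex": "M"},
]

# Fields that are not names but can still narrow a record down to one person.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def linking_key(record):
    """Build the tuple of quasi-identifier values used to join the two sources."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous, public):
    """Return (name, record) pairs where exactly one named person matches."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(linking_key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        names = names_by_key.get(linking_key(record), [])
        if len(names) == 1:  # a unique match ties the record to one person
            matches.append((names[0], record))
    return matches


matches = reidentify(anonymous_survey, public_list)
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this sketch the first survey record is linked to a single name, while the second stays ambiguous because two people share the same values. The more sources are combined, the fewer people tend to share any given combination.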

To protect both your company and the customers from whom you collect data, specify the purpose of the data collection and of any data mining projects, how the data will be used, who will be able to mine and use the data, and the security surrounding access to it. In addition, provide a way for individuals to update the data collected from them, which also helps keep the data accurate. One may additionally anonymize the data so that individuals cannot be readily identified.
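As one possible illustration of modifying data before it is mined, the sketch below replaces the direct identifier with a keyed hash and coarsens the remaining fields. The field names, the key, and the banding choices are all assumptions for this example. Note that this is pseudonymization rather than full anonymity: the coarsened fields may still allow linking if they are combined with other sources.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored apart from the mined data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(record, key=SECRET_KEY):
    """Return a copy of the record without direct identifiers.

    The e-mail address becomes a keyed hash (HMAC-SHA256), so the same
    customer always maps to the same token but the address is not stored.
    Age and ZIP code are generalized so they are less identifying.
    """
    token = hmac.new(key, record["email"].encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    decade = (record["age"] // 10) * 10
    return {
        "customer_token": token,
        "age_band": f"{decade}s",
        "zip_prefix": record["zip"][:3],
        "purchase_total": record["purchase_total"],
    }


# Hypothetical customer record.
customer = {
    "name": "A. Example",
    "email": "a.example@example.com",
    "age": 34,
    "zip": "02138",
    "purchase_total": 129.50,
}
safe_record = pseudonymize(customer)
print(safe_record)
```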


Thursday, December 11, 2008

Privacy concerns-1

There are also privacy and human rights concerns associated with data mining, specifically regarding the source of the data analyzed. Data mining provides information that may be difficult to obtain otherwise. When the data collected involves individual people, there are many questions concerning privacy, legality, and ethics. In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program, has raised privacy concerns.

Several developments have made protecting the privacy of the individuals about whom data is collected both more urgent and more difficult: the falling cost and growing prevalence of data mining tools, an increase in the amount of data being collected and stored, an increase in the use of data aggregation, and the use of data warehouses to store data from several sources.

“Data mining by itself is ethically neutral”. There are several ethical issues which are raised by the topic of data mining: “the suitability and validity of the methods used in any given data mining application, the degree to which confidentiality and privacy obligations are respected, and the overall aims of a given data mining application”.

One must take into consideration the reliability of the source of the data being mined, the reason the data was originally collected, and any aggregation that has taken place. A danger inherent to data mining projects is the possibility of erroneous information resulting from data aggregation. Data aggregation is the combining of mined data, possibly from various sources, so that it can be analyzed.

Monday, December 8, 2008

Algorithms

Various data mining algorithms can be used to build the mining model, but choosing the right algorithm for the business task is critical. Different algorithms can perform the same business task, but each one produces different results.
The main types of algorithms are as follows:
1. A classification algorithm predicts one or more discrete variables, based on the other attributes in the dataset. Example: Microsoft Decision Trees Algorithm.

2. A regression algorithm predicts one or more continuous variables, such as profit or loss, based on other attributes in the dataset. Example: Microsoft Time Series Algorithm.

3. A segmentation algorithm divides data into groups, or clusters, of items that have similar properties. Example: Microsoft Clustering Algorithm.

4. An association algorithm finds correlations between different attributes in a dataset. The most common application of this kind of algorithm is creating association rules, which can be used in a market basket analysis. Example: Microsoft Association Algorithm.

5. A sequence analysis algorithm summarizes frequent sequences or episodes in data, such as a Web path flow. Example: Microsoft Sequence Clustering Algorithm.
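To give a feel for what the association rules in item 4 look like, here is a minimal sketch in plain Python. It is not the Microsoft Association Algorithm. It simply counts item pairs in a few hypothetical shopping baskets and reports the support and confidence of each rule; the baskets and the threshold are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions for a market basket analysis.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(baskets, min_support=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also hold the consequent
    """
    total = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )

    rules = {}
    for (first, second), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue  # pair is too rare to be worth reporting
        rules[(first, second)] = (support, count / item_counts[first])
        rules[(second, first)] = (support, count / item_counts[second])
    return rules


rules = pair_rules(baskets)
for (antecedent, consequent), (support, confidence) in sorted(rules.items()):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Real association algorithms also consider larger item sets and prune the search space, which this sketch leaves out.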

A data mining application can use different algorithms for different functions. For example, we can use segmentation algorithms to explore the data and regression algorithms to make predictions.
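The sketch below illustrates that division of labour with two deliberately simple stand-ins: a two-cluster k-means on one-dimensional values for the segmentation step, and an ordinary least-squares line for the regression step. The numbers and function names are hypothetical, and the clustering helper assumes the input contains at least two distinct values.

```python
def two_means(values, iterations=20):
    """Split 1-D values into a low and a high cluster (basic k-means, k=2)."""
    low, high = min(values), max(values)  # start the centers at the extremes
    for _ in range(iterations):
        low_group, high_group = [], []
        for value in values:
            # Assign each value to the nearer center.
            if abs(value - low) <= abs(value - high):
                low_group.append(value)
            else:
                high_group.append(value)
        # Move each center to the mean of its group.
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Exploration: hypothetical monthly spend per customer falls into two segments.
monthly_spend = [20, 22, 25, 200, 210, 220]
small_spenders, big_spenders = two_means(monthly_spend)
print("segments:", small_spenders, big_spenders)

# Prediction: hypothetical advertising spend versus profit.
ad_spend = [1, 2, 3, 4]
profit = [3, 5, 7, 9]
slope, intercept = fit_line(ad_spend, profit)
print("predicted profit at ad spend 5:", slope * 5 + intercept)
```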
