Distributed Transactions
• A distributed transaction is a transaction that updates data on two or more networked computer systems. Distributed transactions extend the benefits of transactions to applications that must update distributed data. Implementing robust distributed applications is difficult because these applications are subject to multiple failures, including failure of the client, the server, and the network connection between the client and server. In the absence of distributed transactions, the application program itself must detect and recover from these failures.
• For distributed transactions, each computer has a local transaction manager. When a transaction does work at multiple computers, the transaction managers interact with other transaction managers via either a superior or subordinate relationship. These relationships are relevant only for a particular transaction.
• Each transaction manager performs all the enlistment, prepare, commit, and abort calls for its enlisted resource managers (usually those that reside on that particular computer). Resource managers manage persistent or durable data and work in cooperation with the transaction manager (in Microsoft's implementation, the Distributed Transaction Coordinator, or DTC) to guarantee atomicity and isolation to an application.
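The prepare, commit, and abort calls described above follow the pattern usually called two-phase commit, a name the text itself does not use. The sketch below is a minimal in-memory illustration of that pattern, not the API of any real transaction manager; the class names, the `can_prepare` flag, and the `run_two_phase_commit` method are all hypothetical.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class ResourceManager:
    """Hypothetical resource manager that owns durable data at one site."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # assumption: simulates a failure to prepare
        self.state = "idle"

    def prepare(self):
        # Phase 1: either promise to commit on request, or vote no.
        self.state = "prepared" if self.can_prepare else "aborted"
        return Vote.YES if self.can_prepare else Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class TransactionManager:
    """Hypothetical coordinator that drives its enlisted resource managers."""

    def __init__(self):
        self.enlisted = []

    def enlist(self, resource_manager):
        self.enlisted.append(resource_manager)

    def run_two_phase_commit(self):
        # Phase 1: collect a vote from every enlisted resource manager.
        votes = [rm.prepare() for rm in self.enlisted]
        # Phase 2: commit only if every vote was yes, otherwise abort everywhere.
        if all(vote is Vote.YES for vote in votes):
            for rm in self.enlisted:
                rm.commit()
            return "committed"
        for rm in self.enlisted:
            rm.abort()
        return "aborted"
```

A single "no" vote aborts the work at every site, which is what gives the application atomicity across computers. A real implementation would also log each step durably and handle timeouts, both omitted here.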
Distributed update propagation
• Update propagation in a distributed database is problematic because replication may create more than one copy of a piece of data, and partitioning may split data across sites. Any update made by any user must be propagated to all copies throughout the database. The use of snapshots is one technique for implementing this.
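As a rough illustration of the snapshot technique, the sketch below keeps one primary copy that accepts updates and a set of read-only snapshot copies that are refreshed on demand. The class and method names are hypothetical, and a real system would ship the data over a network, typically on a schedule.

```python
class SnapshotSite:
    """Hypothetical remote site that holds a read-only copy of the data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class PrimarySite:
    """Hypothetical site that owns the primary copy and accepts all updates."""

    def __init__(self):
        self.data = {}
        self.snapshots = []

    def register_snapshot(self, snapshot_site):
        self.snapshots.append(snapshot_site)

    def update(self, key, value):
        # Updates touch only the primary copy.
        self.data[key] = value

    def refresh_snapshots(self):
        # Propagation step: every snapshot receives a fresh copy of the data.
        for snapshot_site in self.snapshots:
            snapshot_site.data = dict(self.data)
```

Between refreshes a snapshot can return stale data, which is the price paid for not contacting every copy on every update.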
Distributed concurrency control
• Concurrency control in distributed databases can be done in several ways. Locking and timestamping are two techniques which can be used. Timestamping is sometimes preferred because it avoids deadlock, although locking remains widely used.
• The problems of concurrency control in a distributed DBMS are more severe than in a centralized DBMS because data may be replicated and partitioned. If a user wants exclusive access to a piece of data, for example to perform an update or a read, the DBMS must be able to guarantee exclusive access to that data, which is difficult if there are copies at several sites in the distributed database.
Distributed Query Optimization
• In a distributed database, the optimization of queries by the DBMS itself is critical to the efficient performance of the overall system. Query optimization must take into account the extra communication cost of moving data from site to site, but it can execute a query against whichever replicated copy of the data is closest. It is therefore a more complex operation than query optimization in a centralized database.
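The replica choice described above can be sketched as a simple cost comparison. The function below assumes a hypothetical `link_cost` table giving a per-byte transfer cost between pairs of sites and treats a local copy as free; real optimizers use much richer cost models.

```python
def cheapest_replica(query_site, replica_sites, link_cost, result_bytes):
    """Return (site, cost) for the replica that is cheapest to ship results from.

    link_cost maps (from_site, to_site) to an assumed cost per byte.
    """

    def shipping_cost(site):
        if site == query_site:
            return 0  # a local copy needs no network transfer
        return link_cost[(site, query_site)] * result_bytes

    best_site = min(replica_sites, key=shipping_cost)
    return best_site, shipping_cost(best_site)
```

For example, with copies at London and Tokyo and a query issued from Paris, the function picks whichever site has the lower per-byte cost on its link to Paris.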
Query optimization overview
• Query optimization is essential if a DBMS is to achieve acceptable performance and efficiency. Systems based on the relational model and relational algebra have the advantage that relational expressions are at a sufficiently high level for automatic optimization to be feasible in the first place. In non-relational systems, user requests are low level and optimization must be done manually by the user; the system cannot help. Hence systems which implement optimization have several advantages over systems that do not.
• The optimization process itself involves several stages, including choosing how each relational operator is implemented. A different approach to query optimization, called semantic optimization, has also been suggested.
• Semantic optimization may be used in combination with the other optimization techniques and uses constraints specified on the database schema. Consider the SQL query:
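SELECT E.LNAME FROM EMPLOYEE E, EMPLOYEE M WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY
This query retrieves the names of employees who earn more than their supervisors. It joins EMPLOYEE to itself, with E as the employee and M as the supervisor, and assumes a SUPERSSN column that holds the supervisor's SSN.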
• Suppose we had a constraint on the database schema stating that no employee can earn more than their supervisor. If the semantic query optimizer checks for the existence of this constraint, then it need not execute the query at all, because the result is known to be empty. This may save considerable time if the constraint check can be done efficiently; however, searching through many constraints to find the ones applicable to a given query can itself be quite time consuming.
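The constraint check can be sketched as follows. This is a deliberately simplified illustration: predicates and constraints are represented as hypothetical (left, operator, right) tuples, and the sketch ignores the join condition under which the constraint holds. A real semantic optimizer would reason over the full schema.

```python
# Maps each comparison operator to its logical negation.
NEGATION = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "=": "!=", "!=": "="}


def is_provably_empty(where_predicates, constraints):
    """Return True if some WHERE predicate directly contradicts a schema constraint."""
    for left, op, right in where_predicates:
        if (left, NEGATION[op], right) in constraints:
            return True
    return False


def run_query(where_predicates, constraints, execute):
    """Skip execution when the constraints already guarantee an empty result."""
    if is_provably_empty(where_predicates, constraints):
        return []
    return execute()
```

With the constraint ("E.SALARY", "<=", "M.SALARY") registered, the predicate ("E.SALARY", ">", "M.SALARY") is found to be contradictory and `execute` is never called.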
Timestamping
• Timestamping is a method of concurrency control in which every transaction is given a timestamp, a unique date/time/site combination, and the database management system uses one of a number of protocols to schedule transactions which require access to the same piece of data.
• While more complex to implement than locking, timestamping avoids deadlock because transactions never wait for one another; a transaction whose operation arrives out of timestamp order is aborted and restarted instead.
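One of the simplest such protocols is basic timestamp ordering, shown below as a minimal sketch. It is only one of the protocols the text alludes to. Timestamps are assumed to be (counter, site) tuples so that ties between sites are broken consistently, and the `DataItem`, `read`, `write`, and `Aborted` names are hypothetical.

```python
class Aborted(Exception):
    """Raised when an operation arrives too late and its transaction must restart."""


class DataItem:
    """A data item that remembers the latest timestamps that read and wrote it."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = (0, "")   # assumption: smaller than any real timestamp
        self.write_ts = (0, "")


def read(item, ts):
    # A read is too late if a younger transaction has already written the item.
    if ts < item.write_ts:
        raise Aborted("read arrived after a younger write")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # A write is too late if a younger transaction has already read or written the item.
    if ts < item.read_ts or ts < item.write_ts:
        raise Aborted("write arrived after a younger read or write")
    item.value = value
    item.write_ts = ts
```

No operation ever blocks: it either succeeds immediately or raises `Aborted`, so no cycle of waiting transactions can form.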